From redmine at redmine.open-bio.org Mon Aug 1 01:24:51 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 1 Aug 2011 05:24:51 +0000 Subject: [Biopython-dev] [Biopython - Feature #3271] Updates to PDBList.py- downloading PDB structures References: Message-ID: Issue #3271 has been updated by David Cain. Hi, Eric. I'm glad you like my changes, and I appreciate your feedback. I made some changes in line with your suggestions and submitted my branch as a pull request. Thank you again for the response. ---------------------------------------- Feature #3271: Updates to PDBList.py- downloading PDB structures https://redmine.open-bio.org/issues/3271 Author: David Cain Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: 1.57 URL: https://github.com/DavidCain/biopython PDBList.py is somewhat out of date: it has support for .Z compression, but the ftp://ftp.wwpdb.org/ server only has .gz archives. It also relies on a system utility to decompress the downloaded archives. The default, gunzip, is effective enough for posix systems, but Windows requires the installation of a command line tool, such as 7zip. I've rewritten it to use the gzip module, and to ignore the compression parameter (as all files are .gz anyway). I left the 'uncompress' and 'compression' parameters for backwards compatibility. I've also made it so that the user can override and use a system decompression tool if desired. I'm not sure if this is the best way to handle it, as the retrieve_pdb_file() function would work just fine removing support for system decompression and the 'compression' parameter. Also, when calling retrieve_pdb_file() repeatedly, urllib can generate too many FTP connections and crash (for example) a script attempting to download some structures in succession. Updating to urllib2 removes this issue. My GitHub branch is linked, and the only file I've modified (PDBList.py) is attached. 
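As a rough illustration of the approach described in the issue (decompressing with Python's gzip module instead of shelling out to gunzip), a minimal sketch might look like the following. The function name gunzip_file and the file names are hypothetical and are not taken from PDBList.py.

```python
import gzip
import os
import shutil
import tempfile


def gunzip_file(gz_path, out_path):
    """Decompress gz_path to out_path without calling an external tool."""
    # gzip.open returns a file-like object yielding the decompressed bytes.
    with gzip.open(gz_path, "rb") as compressed, open(out_path, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return out_path


# Small self-check in a temporary directory (file names are made up).
workdir = tempfile.mkdtemp()
gz_name = os.path.join(workdir, "pdb1abc.ent.gz")
with gzip.open(gz_name, "wb") as handle:
    handle.write(b"HEADER    EXAMPLE\n")
result = gunzip_file(gz_name, os.path.join(workdir, "pdb1abc.ent"))
with open(result, "rb") as handle:
    content = handle.read()
```

Because only the standard library is involved, the same code path should work on Windows without installing a command line decompression tool.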
-- 
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org Mon Aug 1 10:57:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Mon, 1 Aug 2011 14:57:06 +0000
Subject: [Biopython-dev] [Biopython - Feature #3271] (Closed) Updates to PDBList.py- downloading PDB structures
References: 
Message-ID: 

Issue #3271 has been updated by Eric Talevich.

Status changed from New to Closed
% Done changed from 0 to 100

Merged it: https://github.com/biopython/biopython/pull/14

I think we could do more work on the docstrings and comments, generally, but it's out of the scope of this bug.

Thanks again!
----------------------------------------
Feature #3271: Updates to PDBList.py- downloading PDB structures
https://redmine.open-bio.org/issues/3271

Author: David Cain
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: 
Target version: 1.57
URL: https://github.com/DavidCain/biopython

From p.j.a.cock at googlemail.com Tue Aug 2 12:43:30 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 17:43:30 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
Message-ID: 

Hi Brandon,

Would you be able to look at these handle leaks in the PAML unit tests some time?

test_PAML_baseml ... /Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad1.ctl' mode='r' encoding='UTF-8'>
  callableObj(*args, **kwargs)
/Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad2.ctl' mode='r' encoding='UTF-8'>
  callableObj(*args, **kwargs)
/Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='w' encoding='UTF-8'>
  callableObj(*args, **kwargs)
ok
test_PAML_codeml ... ok
test_PAML_yn00 ... /Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad3.ctl' mode='r' encoding='UTF-8'>
  callableObj(*args, **kwargs)
ok

This warning is new under Python 3.2, but this kind of code can cause, and has caused, bugs on Windows (you can't delete a file while there is an open handle to it) and on Jython (garbage collection works differently, so implicit handle closing is stochastic).
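The usual remedy for this kind of leak is to close each handle explicitly instead of relying on garbage collection. A minimal sketch, assuming a hypothetical read_ctl_lines helper and made-up control file contents (this is not the actual Bio.Phylo.PAML code):

```python
import os
import tempfile


def read_ctl_lines(path):
    """Return the non-blank lines of a control file, always closing the handle."""
    handle = open(path)
    try:
        return [line.strip() for line in handle if line.strip()]
    finally:
        # Runs on success and on error, so the handle cannot leak.
        handle.close()


# Write a throwaway control file, read it back, then delete it.
fd, ctl_path = tempfile.mkstemp(suffix=".ctl")
with os.fdopen(fd, "w") as out:
    out.write("seqfile = alignment.phylip\n\noutfile = results.out\n")
lines = read_ctl_lines(ctl_path)
os.remove(ctl_path)  # on Windows this needs every handle to be closed first
```

The try/finally form is used in the helper because it also works on old Python 2 releases; on newer versions a with statement gives the same guarantee.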
See also: http://bugs.python.org/issue10093 Note there are other cases of this, some in PopGen (which may explain a periodic failure under Jython), and in test_SCOP_Astral.py (where the object design makes this difficult to avoid IIRC), etc. Peter From p.j.a.cock at googlemail.com Tue Aug 2 12:47:20 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 2 Aug 2011 17:47:20 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Sat, Jul 30, 2011 at 8:42 AM, Wibowo Arindrarto wrote: > Hi Peter, > I've done some more improvements to the code: > - I've written the check and unittest for the file handle mode. I've set it > so that abi file has to be opened in 'rb' mode, otherwise it'll return an > error. While it's ok to open in 'r' mode in python 2 in Linux, it has to be > specified as 'rb' in Windows and/or Python 3 for the file to be read > correctly. So I decided forcing it to 'rb' is the best. Because of this, I > changed 'test_SeqIO.py:503' to include the mode argument when opening. OK, good. > - I've also checked against test_Emboss.py for seqret output, after > including the abi format in it. My EMBOSS version is 6.4.0. There was a > slight problem with this testing, since for some reason the ID returned by > seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS > installation, since when I previously tested it against 6.1.0, the ID was > correct (although the qual values not, so I had to upgrade). As expected, if > I comment out the code that tests for sequence id ('test_Emboss.py:168-172') > the tests pass. Maybe you could try testing it as well and see if EMBOSS > also returns the default id instead of the sample name? EMBOSS 6.3.1 is fine, so I think we should report this as a bug in EMBOSS 6.4.0 > - Finally, I did some small cosmetic changes to the code (typos, etc). > All changes have been pushed to my github fork. Now I still have time for > the weekend to improve whatever needs to be improved :). 
> Regards,

There appears to be another Python 3 problem, consider this at the
python prompt:

from Bio import SeqIO
record = SeqIO.read("Tests/Abi/310.ab1", "abi")
record.letter_annotations["phred_quality"]

I expect a list of integers, e.g. [0, 0, 0, ..., 0] not ['\x00', '\x00', '\x00', ..., '\x00']

Peter

From w.arindrarto at gmail.com Tue Aug 2 12:53:46 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Tue, 2 Aug 2011 18:53:46 +0200
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: 
References: 
Message-ID: 

Hi Peter,

I noticed that bug was because I did not add the _bytes_to_string() converter for a data type. I already fixed this with my latest push, adding the appropriate if clause at AbiIO.py:293-294.

Regards,
---
Wibowo Arindrarto (bow)
http://bow.web.id

From p.j.a.cock at googlemail.com Tue Aug 2 13:57:56 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 2 Aug 2011 18:57:56 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: 
References: 
Message-ID: 

On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto wrote:
> Hi Peter,
> I noticed that bug was because I did not add the _bytes_to_string()
> converter for a data type. I already fixed this with my latest push, adding
> the appropriate if clause at AbiIO.py:293-294.
> Regards,

Was that only half the fix?
This made it work for me: https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45 and: https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e Peter From p.j.a.cock at googlemail.com Tue Aug 2 14:03:24 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 2 Aug 2011 19:03:24 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Tue, Aug 2, 2011 at 6:57 PM, Peter Cock wrote: > On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto > wrote: >> Hi Peter, >> I noticed that bug was because I did not add the _bytes_to_string() >> converter for a data type. I already fixed this with my latest push, adding >> the appropriate if clause at AbiIO.py:293-294. >> Regards, > > Was that only half the fix? This made it work for me: > > https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45 > > and: > > https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e > > Peter > Could you test this branch, which I think is ready to be merged to the trunk now: https://github.com/peterjc/biopython/tree/seqio-abi Thanks, Peter From w.arindrarto at gmail.com Wed Aug 3 08:14:53 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 3 Aug 2011 14:14:53 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, My bad, I forgot to change that one line and didn't test before comitting. Thanks for fixing it. I've ran the tests on your tree on py2.6.5 and py3.1.2, here are the results: - On both py2.6.5 and py3.1.2, I have the following test case error: "NameError: global name 'embossversion' is not defined", on line 257. I didn't have "EMBOSS_ROOT" in my os.environ paths (I installed 6.4.0 from source, by the way), so this must be what's causing it. Is there another way to automatically detect EMBOSS_ROOT other than this? 
Or perhaps we should avoid emboss 6.4.0's bug by only checking if the id is EMBOSS_001? The only case I think this would fail is if the user inputs "EMBOSS_001" before the sequencing run as the sample id, which is possible but unlikely. - On a related note, I noticed you set the minimum Emboss requirement to 6.1.0 patch 3. I'm not sure if this the one I use previously, but my previous Emboss 6.1.0 installation failed to extract the proper quality values. Perhaps we should set the minimum version to 6.3.1? (well, making it the only Emboss version that works with Biopython because of that 6.4.0 bug). - Other than those two, everything's tip top :). Regards, Wibowo Arindrarto (bow) http://bow.web.id On Tue, Aug 2, 2011 at 20:03, Peter Cock wrote: > On Tue, Aug 2, 2011 at 6:57 PM, Peter Cock > wrote: > > On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto > > wrote: > >> Hi Peter, > >> I noticed that bug was because I did not add the _bytes_to_string() > >> converter for a data type. I already fixed this with my latest push, > adding > >> the appropriate if clause at AbiIO.py:293-294. > >> Regards, > > > > Was that only half the fix? This made it work for me: > > > > > https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45 > > > > and: > > > > > https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e > > > > Peter > > > > Could you test this branch, which I think is ready to be merged to the > trunk now: > > https://github.com/peterjc/biopython/tree/seqio-abi > > Thanks, > > Peter > From macrozhu at gmail.com Wed Aug 3 09:47:07 2011 From: macrozhu at gmail.com (Hongbo Zhu) Date: Wed, 3 Aug 2011 15:47:07 +0200 Subject: [Biopython-dev] inconsistent return values Bio.PDB.NeighborSearch.search() Message-ID: Hi, python-developers, In the current version of BioPython (source code as of 3 Aug. 
2011), it seems the outcome of *Bio.PDB.NeighborSearch.search()* is inconsistent if different levels are specified when the returned list is empty. e.g. > ns.search(center, radius, 'A') > [] > ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S' > IndexError: list index out of range Obviously, this is because the Bio.PDB.NeighborSearch.search() functions tries to convert returned list to levels other than 'A' using function Bio.PDB.Selection.unfold_entities() (see line 92 in NeighborSearch.py). In function unfold_entities(), the first element of input argument entity_list is evaluated without entity_list being checked for emptiness (see line 47 in Selection.py). An IndexError is raised when entity_list is empty. So, I think either the length of the returned list in Bio.PDB.NeighborSearch.search() should be checked before invoking Bio.PDB.Selection.unfold_entities(), or the function Bio.PDB.Selection.unfold_entities() should be revised so that it simply returns an empty list if the argument entity_list is empty. I prefer the latter solution because this would also fix other similar situations when Bio.PDB.Selection.unfold_entities() is invoked in other functions. And it seems "Sorry, entering bugs into the product Biopython has been disabled." regards, Hongbo Zhu From p.j.a.cock at googlemail.com Wed Aug 3 09:58:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 3 Aug 2011 14:58:13 +0100 Subject: [Biopython-dev] inconsistent return values Bio.PDB.NeighborSearch.search() In-Reply-To: References: Message-ID: On Wed, Aug 3, 2011 at 2:47 PM, Hongbo Zhu wrote: > > And it seems "Sorry, entering bugs into the product Biopython has been > disabled." We moved from Bugzilla to Redmine, links on the main homepage were updated: http://redmine.open-bio.org/projects/biopython I wonder if we can change that message text or something... 
Peter

From p.j.a.cock at googlemail.com Wed Aug 3 10:04:46 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 3 Aug 2011 15:04:46 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto wrote:
> Hi Peter,
> My bad, I forgot to change that one line and didn't test before comitting.
> Thanks for fixing it.
> I've ran the tests on your tree on py2.6.5 and py3.1.2, here are the
> results:
> - On both py2.6.5 and py3.1.2, I have the following test case error:
> "NameError: global name 'embossversion' is not defined", on line 257.
>...

It was simpler than that - I'd checked it in with a typo, emboss_version
was what I wanted. Sorry about that confusion!

> - On a related note, I noticed you set the minimum Emboss requirement to
> 6.1.0 patch 3. I'm not sure if this the one I use previously, but my
> previous Emboss 6.1.0 installation failed to extract the proper quality
> values. Perhaps we should set the minimum version to 6.3.1? (well, making it
> the only Emboss version that works with Biopython because of that 6.4.0
> bug).

We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later,
which is why that requirement exists. Asking for at least EMBOSS
6.3.1 makes no practical difference as far as I can see.

If you meant require EMBOSS 6.4.1 that hasn't been released yet.

I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after
I've tested the proposed patch Peter Rice sent), but that will still
report itself as EMBOSS 6.4.0 (based on past patch behaviour,
something I consider annoying but have to live with).

> - Other than those two, everything's tip top :).
>

Great. I've pushed the code to the main repository, and have
just set off the buildbot slaves as a final sanity test.
This revealed a minor Python 2.4 breakage (not a big issue - it only
seems to be me still trying to keep testing this - and I'm about
ready to give up), and another probable EMBOSS bug in an
older version installed on one buildslave.

Congratulations, your code will be in the next Biopython release.

Thank you,

Peter

From redmine at redmine.open-bio.org Wed Aug 3 10:52:32 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 3 Aug 2011 14:52:32 +0000
Subject: [Biopython-dev] [Biopython - Bug #3276] (New) inconsistent returns of Bio.PDB.NeighborSearch.search()
Message-ID: 

Issue #3276 has been reported by Hongbo Zhu.

----------------------------------------
Bug #3276: inconsistent returns of Bio.PDB.NeighborSearch.search()
https://redmine.open-bio.org/issues/3276

Author: Hongbo Zhu
Status: New
Priority: Normal
Assignee: 
Category: 
Target version: 
URL: 

In the current version of BioPython (source code as of 3 Aug. 2011), it seems the outcome of Bio.PDB.NeighborSearch.search() is inconsistent if different levels are specified when the returned list is empty. i.e.
@
ns.search(center, radius, 'A')
[]
ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S'
IndexError: list index out of range
@

Obviously, this is because the Bio.PDB.NeighborSearch.search() function tries to convert the returned list to levels other than 'A' using the function Bio.PDB.Selection.unfold_entities() (see line 92 in NeighborSearch.py). In unfold_entities(), the first element of the input argument entity_list is evaluated without entity_list being checked for emptiness (see line 47 in Selection.py). An IndexError is raised when entity_list is empty.

So, I think either the length of the returned list in Bio.PDB.NeighborSearch.search() should be checked before invoking Bio.PDB.Selection.unfold_entities(), or the function Bio.PDB.Selection.unfold_entities() should be revised so that it simply returns an empty list if the argument entity_list is empty.
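The second option described above, an early return when the list is empty, can be sketched with a toy stand-in. The function unfold_to_residues and the dictionary-based atoms below are hypothetical simplifications for illustration, not the real Bio.PDB.Selection code:

```python
def unfold_to_residues(atom_list):
    """Toy unfold step: map atoms to their unique parent residues."""
    if not atom_list:
        # Guard first: without it, atom_list[0] below raises IndexError.
        return []
    if atom_list[0]["kind"] != "A":
        raise ValueError("expected a list of atoms")
    residues = []
    for atom in atom_list:
        if atom["residue"] not in residues:
            residues.append(atom["residue"])
    return residues


# Made-up search results: three atoms belonging to two residues.
atoms = [
    {"kind": "A", "residue": "GLY1"},
    {"kind": "A", "residue": "GLY1"},
    {"kind": "A", "residue": "ALA2"},
]
empty_result = unfold_to_residues([])
residue_result = unfold_to_residues(atoms)
```

Putting the guard inside the unfolding function means every caller gets the same behaviour for an empty search result, whichever level was requested.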
I prefer the latter solution because this would also fix other similar situations when Bio.PDB.Selection.unfold_entities() is invoked in other functions. cheers, hongbo ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From w.arindrarto at gmail.com Wed Aug 3 11:11:13 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 3 Aug 2011 17:11:13 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, On Wed, Aug 3, 2011 at 16:04, Peter Cock wrote: > On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto > wrote: > > Hi Peter, > > My bad, I forgot to change that one line and didn't test before > comitting. > > Thanks for fixing it. > > I've ran the tests on your tree on py2.6.5 and py3.1.2, here are the > > results: > > - On both py2.6.5 and py3.1.2, I have the following test case error: > > "NameError: global name 'embossversion' is not defined", on line 257. > >... It was simpler than that - I'd checked it in with a typo, emboss_version > was what I wanted. Sorry about that confusion! Silly me, I should've noticed you used emboss_version when I was looking at the code checking Emboss dependency :/. > > - On a related note, I noticed you set the minimum Emboss requirement to > > 6.1.0 patch 3. I'm not sure if this the one I use previously, but my > > previous Emboss 6.1.0 installation failed to extract the proper quality > > values. Perhaps we should set the minimum version to 6.3.1? (well, making > it > > the only Emboss version that works with Biopython because of that 6.4.0 > > bug). > > We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later, > which is why that requirement exists. 
Asking for at least EMBOSS > 6.3.1 makes no practical difference as far as I can see. > > If you meant require EMBOSS 6.4.1 that hasn't been released yet. > > I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after > I've tested the proposed patch Peter Rice sent), but that will still > report itself as EMBOSS 6.4.0 (based on past patch behaviour, > something I consider annoying but have to live with). I meant Emboss 6.3.1, since that seems to be one that works best with the current AbiIO implementation. But yeah, I guess as long as the tests work it's fine. > > - Other than those two, everything's tip top :). > > > > Great. I've pushed the code to the main repository, and have > just set off the buildbot slaves as a final sanity test. > > This reveal a minor Python 2.4 breakage (not a big issue - it only > seems to be me still trying to keep testing this - and I'm about > ready to give up), and another probable EMBOSS bug in an > older version installed on one buildslave. > > Congratulations, your code will be in the next Biopython release. > > Thank you, > > Peter > This really made my day :)! You're welcome and thank you reviewing my code, too! Regards, --- Wibowo Arindrarto (bow) http://bow.web.id From w.arindrarto at gmail.com Thu Aug 4 07:30:44 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 4 Aug 2011 13:30:44 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, Ah yes, I didn't know there could be handles without .seek() and .tell(), and I thought those two are the proper way of traversing files, so I used them. I also didn't realize you could use SeqIO with network handles, too. This is really neat :). In any case, sure, I'd love to make some changes to the current AbiIO code so it works without .seek() and .tell(). Is there any other input types that does not use .seek() and .tell() other than network handles? 
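For handles that lack .seek() and .tell(), one common workaround is to count the bytes as they are read and to move forward only by reading and discarding. A minimal sketch, assuming a hypothetical OffsetTrackingReader wrapper and made-up file contents (this is not the code used in the SFF or ABI parsers):

```python
import io


class OffsetTrackingReader:
    """Wrap a read-only handle and remember how many bytes were consumed."""

    def __init__(self, handle):
        self._handle = handle
        self.offset = 0  # replaces handle.tell()

    def read(self, size):
        data = self._handle.read(size)
        self.offset += len(data)
        return data

    def skip_to(self, target):
        """Forward-only replacement for handle.seek(target)."""
        if target < self.offset:
            raise ValueError("cannot move backwards on a non-seekable handle")
        while self.offset < target:
            # A network handle may return fewer bytes than requested.
            if not self.read(target - self.offset):
                raise EOFError("handle ended before the requested offset")


# Made-up layout: 4 byte magic, 12 bytes of padding, 4 byte payload.
raw = io.BytesIO(b"ABIF" + b"\x00" * 12 + b"DATA")
reader = OffsetTrackingReader(raw)
magic = reader.read(4)
reader.skip_to(16)
payload = reader.read(4)
```

This only works if the parser visits the file in increasing offset order; anything that needs to go backwards has to be buffered in memory first.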
Here's my new branch from the current master: https://github.com/bow/biopython/tree/seqio-abi_handlefix, nothing different for now but I'll push my updates soon. Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Thu, Aug 4, 2011 at 13:03, Peter Cock wrote: > On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto > wrote: > > On Wed, Aug 3, 2011 at 16:04, Peter Cock > wrote: > >> ... > >> Congratulations, your code will be in the next Biopython release. > >> ... > > > > This really made my day :)! You're welcome and thank you reviewing my > code, > > too! > > I found something else to work on (sorry!). You're using seek and tell, > which > may not exist. Network handles are a good example of this situation. Try: > > from urllib import urlopen > from Bio import SeqIO > handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1") > record = SeqIO.read(handle, "abi") > handle.close() > > I've added some code to test_SeqIO.py to simulate this, which revealed that > the SFF parser was also using the tell method. In that case we must track > the > offset explicitly (it is needed for handling SFF index blocks). You can see > how > I did this here - note I avoid the overhead of tracking the offset in > general: > > https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc > > I've tried the same trick in the ABI parser, but this reveals your code > likes to > seek backwards. Try the attached patch against this revision to confirm > this. > > Having looked over your code, I don't believe you need to use seek and tell > at all. This isn't critical to fix right now, but I would like us to > solve it. Would > you like to try? Make a new branch from the current master for this please. 
> > Regards, > > Peter > From p.j.a.cock at googlemail.com Thu Aug 4 07:03:27 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 4 Aug 2011 12:03:27 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto wrote: > On Wed, Aug 3, 2011 at 16:04, Peter Cock wrote: >> ... >> Congratulations, your code will be in the next Biopython release. >> ... > > This really made my day :)! You're welcome and thank you reviewing my code, > too! I found something else to work on (sorry!). You're using seek and tell, which may not exist. Network handles are a good example of this situation. Try: from urllib import urlopen from Bio import SeqIO handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1") record = SeqIO.read(handle, "abi") handle.close() I've added some code to test_SeqIO.py to simulate this, which revealed that the SFF parser was also using the tell method. In that case we must track the offset explicitly (it is needed for handling SFF index blocks). You can see how I did this here - note I avoid the overhead of tracking the offset in general: https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc I've tried the same trick in the ABI parser, but this reveals your code likes to seek backwards. Try the attached patch against this revision to confirm this. Having looked over your code, I don't believe you need to use seek and tell at all. This isn't critical to fix right now, but I would like us to solve it. Would you like to try? Make a new branch from the current master for this please. Regards, Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tell_hack.patch Type: application/octet-stream Size: 1466 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Thu Aug 4 07:47:49 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 4 Aug 2011 12:47:49 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto wrote: > Hi Peter, > Ah yes, I didn't know there could be handles without .seek() and .tell(), > and I thought those two are the proper way of traversing files, so I used > them. I also didn't realize you could use SeqIO with network handles, too. > This is really neat :). Yes - having a handle focused API makes some clever stuff possible :) Of course, parsing sequences directly from network handles isn't always a good idea, but it can be useful. > In any case, sure, I'd love to make some changes to the current AbiIO code > so it works without .seek() and .tell(). Is there any other input types that > does not use .seek() and .tell() other than network handles? I suspect some specialised handles for accessing compressed files might have similar limitations. In the case of gzip at least, I think it does support seek and tell. > Here's my new branch from the current master: > https://github.com/bow/biopython/tree/seqio-abi_handlefix > nothing different for now but I'll push my updates soon. Don't rush yourself - I'm away for a long weekend so won't be testing any updates till next week anyway. Thanks, Peter From b.invergo at gmail.com Thu Aug 4 11:38:23 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 04 Aug 2011 17:38:23 +0200 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: References: <1312366681.1302.9.camel@localhost.localdomain> Message-ID: <1312472309.8916.15.camel@localhost.localdomain> Hi Peter, (I'm CCing this to the dev list for the info in the second paragraph) Thanks for the reply. I solved the Python2 problem by fixing my PYTHONPATH. 
Running the tests from the Tests directory couldn't find the Bio module due to a mistake in the PYTHONPATH, so I tried to run them from the parent directory, resulting in test failures. A dumb mistake but anyway it's fixed. Sorry for wasting your time with that.

I still have the following error with Python 3.2, though, which prevents me from figuring out the leaked handle problem in Py3k:

[brandon at brandon-linux Tests]$ python test_PAML_baseml.py
Traceback (most recent call last):
  File "test_PAML_baseml.py", line 10, in <module>
    from Bio.Phylo.PAML import baseml
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py", line 12, in <module>
    from Bio.Phylo._io import parse, read, write, convert
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line 12, in <module>
    from Bio.Phylo import BaseTree, NewickIO, NexusIO
  File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py", line 222
    return u'%s(%s)' % (self.__class__.__name__,

SyntaxError: invalid syntax

Regarding that specific error, I think all strings are implicitly unicode in Python 3, aren't they? I don't have much experience with maintaining Py2/3 compatibility, though, so I don't know how best to handle this. Searching for the unicode string prefix (u') in the entire Bio file tree shows that it only exists in Phylo/PhyloXML.py and Phylo/BaseTree.py.

-brandon

On Wed, 2011-08-03 at 13:33 +0100, Peter Cock wrote:
> On Wed, Aug 3, 2011 at 11:18 AM, Brandon Invergo wrote:
> > Hi Peter,
> > I'm still in the process of looking at them now but I'm running into a
> > side issue that maybe you can help with. I've tried running the unit
> > tests myself using both Python 2.7.2 and Python 3.2.1, the two versions
> > I have, and both times it fails.
>
> Python 3 takes a bit more effort to debug due to the 2to3 thing
> and different paths - so I'd focus on Python 2.7 initially.
>
> > Just looking at test_PAML_baseml.py, for example, with Python 2 I get a
> > lot of test failures due to baseml.py now (correctly) throwing IOErrors
> > rather than AttributeErrors or TypeErrors. With Python 3, on the other
> > hand, I get syntax errors in BaseTree.py (I'll include the output of
> > both below). I did a git pull upstream master before doing this, so my
> > code should be up-to-date (it seems like the unit tests are out-of-date,
> > re: the error types). Now, clearly these have passed on the build
> > machine so I'm wondering what I could be doing wrong. Being able to
> > replicate the test failures in Python 3 on my machine will really help
> > in fixing them.
> > Sorry about the probable-newbie question...
>
> What does "git status" give you?
>
> My usual routine is as follows, but I clone from the official repository
> (which is therefore called origin), and have my personal one set up
> as peterjc via "git remote add ...":
>
> git checkout master #if not there already
> git fetch origin
> git status #should say behind and can FF merge
> git merge origin/master #should now have latest code
>
> I'm guessing you're working from a clone of your github repo?
>
> An easy thing to try is a fresh clone of the official biopython.
>
> The other key point is all the unit tests expect the current
> directory to be the Tests directory NOT the parent directory
> where setup.py lives.
>
> Note if you just do "python test_PAML_baseml.py" this will
> pick up the installed Biopython (via PYTHONPATH etc).
>
> One option is "run_tests.py test_PAML_baseml.py" which
> will use the local code for you.
>
> If you do "python Tests/test_PAML_baseml.py" this should
> pick up the source code for Biopython (won't work for any
> compiled modules IIRC).
>
> Peter

From p.j.a.cock at googlemail.com Thu Aug 4 11:59:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 16:59:42 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <1312472309.8916.15.camel@localhost.localdomain>
References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain>
Message-ID:

On Thu, Aug 4, 2011 at 4:38 PM, Brandon Invergo wrote:
> Hi Peter,
> (I'm CCing this to the dev list for the info in the second paragraph)
> Thanks for the reply. I solved the Python2 problem by fixing my
> PYTHONPATH. Running the tests from the Tests directory couldn't find the
> Bio module due to a mistake in the PYTHONPATH, so I tried to run them
> from the parent directory, resulting in test failures. A dumb mistake
> but anyway it's fixed. Sorry for wasting your time with that.

No problem - learning about paths and imports is a bit tricky.

> I still have the following error with Python 3.2, though, which prevents
> me from figuring out the leaked handle problem in Py3k:
> [brandon@brandon-linux Tests]$ python test_PAML_baseml.py
> Traceback (most recent call last):
>   File "test_PAML_baseml.py", line 10, in <module>
>     from Bio.Phylo.PAML import baseml
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py",
> line 12, in <module>
>     from Bio.Phylo._io import parse, read, write, convert
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line
> 12, in <module>
>     from Bio.Phylo import BaseTree, NewickIO, NexusIO
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py",
> line 222
>     return u'%s(%s)' % (self.__class__.__name__,
>
> SyntaxError: invalid syntax

Hang on - that looks like you ran it with "python" meaning Python 2.x

Working with Python 3 the following should "just work":

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
python3 setup.py test
python3 setup.py install #Use sudo or --prefix etc if you want

However, if you want to run the offline tests only, you need to go into
the Python3 converted Tests directory, not the unconverted Python2 Tests
directory. Note that this is Biopython specific (but based on what NumPy
does). e.g.

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
cd build/py3.2/Tests
python3 run_tests.py --offline

Likewise if you want to test just one module,

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py build
cd build/py3.2/Tests
python3 run_tests.py test_PAML_baseml.py

In the above, run_tests.py should take care of the path settings to ensure
the freshly built Biopython is used (not whatever old version may be
installed elsewhere). If the above works nicely for you, stick with that.

Alternatively, I often just install in-development versions of Biopython on
my personal machine under my home directory (where Python 3 was also
installed using the --prefix option so I don't need to mess about with the
PYTHONPATH):

cd /home/brandon/Projects/pypaml/biopython
python3 setup.py install --prefix=$HOME
cd build/py3.2/Tests
python3 test_PAML_baseml.py

If your Python 3 is installed at system level you can do this but it isn't
very clean (certainly don't do it on a shared machine):

cd /home/brandon/Projects/pypaml/biopython
sudo python3 setup.py install
cd build/py3.2/Tests
python3 test_PAML_baseml.py

Alternatively if your Python 3 is at the system level you can install
Biopython under your home directory but then you have to mess about with
PYTHONPATH and keep changing it for Python2 vs Python3, since they use the
same variable (a design choice I fail to see any advantages in).
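
One way to see which copy of a package a given interpreter would pick up,
before running any tests, is to ask the import system directly. The snippet
below is only a sketch for a modern Python 3 (it would not run on the
2.7/3.2 interpreters discussed in this thread); describe_import is a
hypothetical helper name, not part of Biopython or run_tests.py.

```python
import importlib.util
import sys


def describe_import(module_name):
    """Return (interpreter, origin) for module_name without importing it.

    origin is the file the import system would load for a top-level
    module, or None when the module cannot be found on sys.path.
    """
    spec = importlib.util.find_spec(module_name)
    origin = spec.origin if spec is not None else None
    return sys.executable, origin


# "json" is used here only because it ships with every Python install.
interpreter, origin = describe_import("json")
```

Calling describe_import("Bio") from the Tests directory with each
interpreter would show whether the freshly built tree or an older installed
copy wins.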
Confusing isn't it? There are other potential solutions to having multiple copies of Python installed, like using virtualenv... Peter From p.j.a.cock at googlemail.com Thu Aug 4 13:32:38 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 4 Aug 2011 18:32:38 +0100 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: <1312478530.8916.20.camel@localhost.localdomain> References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain> <1312478530.8916.20.camel@localhost.localdomain> Message-ID: On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo wrote: > > The above does work nicely for me. So nicely, in fact, that the PAML > tests all pass! So I'm still having trouble replicating the leaked > handles. I'm still trying to figure out what's happening... > It could be something silly with warning silencing being global and not local, and thus depends on the order the tests are run in. Did you try running all the (offline) tests in one go under Python 3.2? Peter From b.invergo at gmail.com Thu Aug 4 14:21:59 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 04 Aug 2011 20:21:59 +0200 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain> <1312478530.8916.20.camel@localhost.localdomain> Message-ID: <1312482121.8916.22.camel@localhost.localdomain> Ok, now I've got the errors. Now I can actually get to work. Thanks for your help with this. I had no idea about the special Py3 building (I've just been using the raw tests from the repository) I'll see what I can do now. -brandon On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote: > On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo wrote: > > > > The above does work nicely for me. So nicely, in fact, that the PAML > > tests all pass! So I'm still having trouble replicating the leaked > > handles. 
I'm still trying to figure out what's happening... > > > > It could be something silly with warning silencing being global > and not local, and thus depends on the order the tests are run in. > > Did you try running all the (offline) tests in one go under Python 3.2? > > Peter From b.invergo at gmail.com Fri Aug 5 09:58:27 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 05 Aug 2011 15:58:27 +0200 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain> <1312478530.8916.20.camel@localhost.localdomain> Message-ID: <1312552714.8916.28.camel@localhost.localdomain> Ok the leaks have been taken care of. The problem arises when an exception is raised within a block of text in which a file handle is currently open. I simply had to close the handle just before raising the exception. There was another one, however, that came up from using stdout=open('/dev/null', 'w') in the subprocess.call() to PAML programs (which, come to think of it, is *nix-specific anyway, and probably wouldn't work with Windows). Instead, I set stdout to a subprocess.PIPE and get rid of the /dev/null handle altogether. Cheers, Brandon On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote: > On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo wrote: > > > > The above does work nicely for me. So nicely, in fact, that the PAML > > tests all pass! So I'm still having trouble replicating the leaked > > handles. I'm still trying to figure out what's happening... > > > > It could be something silly with warning silencing being global > and not local, and thus depends on the order the tests are run in. > > Did you try running all the (offline) tests in one go under Python 3.2? 
> > Peter From w.arindrarto at gmail.com Sat Aug 6 05:52:13 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 6 Aug 2011 11:52:13 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter & everyone, I've been trying to improve the parser so it works with forward-only handles, but I'm drawing a blank for now. I realized the reason I use seek in the first place was because of the file structure. In an Abi file we've got three data blocks: the header that contains the file information, the sequencing data, and the directories which serve as indexes to the sequencing data. To unpack the sequencing data bytes, we need the information stored in the directories. Depending on its size, it could be stored outside the directories block, or in the directory itself. This is why .seek() helps, because it allows for jumping between the directories and the sequencing data as it is being parsed. Now, I thought the three blocks were stored in this order: header - directory - sequencing data. I've thought of a way of parsing the file if the structure is like this. As it turns out, it's possible (or even this might be the norm) that the order is: header - sequencing data - directory. So as soon as I finished parsing the information on how to retrieve the data from the directories, I've already gone past the data block. In forward-only handles, this makes the data irretrievable. There should be other ways to retrieve the sequencing data in forward-only handles. I thought about reading the entire handle stream first and storing it into a variable. This way, we could replace seek() with slicing operators. The trade off is we store the entire handle stream in memory at once (abi files are probably ~300-500kb in size). I'm sure there are other ways, but I couldn't think of any now. So what do you think? Or maybe anyone else have ideas that I could try? 
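
A minimal sketch of the buffering idea described above: read the stream once
into memory, then wrap the bytes so that seek() works again. The three-part
layout used here is a toy invented for illustration (a 4-byte directory
offset, a data block, then one directory entry); it is NOT the real ABI
layout, and ForwardOnlyHandle, buffered and read_trailing_directory_record
are hypothetical names.

```python
import io
import struct


class ForwardOnlyHandle:
    """Hypothetical stand-in for a handle that only supports read()."""

    def __init__(self, payload):
        self._stream = io.BytesIO(payload)

    def read(self, size=-1):
        return self._stream.read(size)


def buffered(handle):
    """Read the whole stream once and return a seekable in-memory handle."""
    return io.BytesIO(handle.read())


def read_trailing_directory_record(handle):
    """Parse the toy 'header - data - directory' layout.

    Header: 4-byte big-endian offset of the directory.
    Directory: two 4-byte big-endian integers (data_offset, data_length).
    """
    buf = buffered(handle)
    (dir_offset,) = struct.unpack(">I", buf.read(4))
    buf.seek(dir_offset)  # jump forward to the directory at the end
    data_offset, data_length = struct.unpack(">II", buf.read(8))
    buf.seek(data_offset)  # jump back to the data block we already passed
    return buf.read(data_length)


# Build a toy file: header, then data, then the directory at the end.
data = b"ACGTACGT"
toy = struct.pack(">I", 4 + len(data)) + data + struct.pack(">II", 4, len(data))
record = read_trailing_directory_record(ForwardOnlyHandle(toy))
```

The cost is the one mentioned above: the whole file sits in memory at once,
which is modest for files of a few hundred kilobytes.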
Regards & have a nice weekend all, --- Wibowo Arindrarto (bow) http://bow.web.id On Thu, Aug 4, 2011 at 13:47, Peter Cock wrote: > On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto > wrote: > > Hi Peter, > > Ah yes, I didn't know there could be handles without .seek() and .tell(), > > and I thought those two are the proper way of traversing files, so I used > > them. I also didn't realize you could use SeqIO with network handles, > too. > > This is really neat :). > > Yes - having a handle focused API makes some clever stuff possible :) > Of course, parsing sequences directly from network handles isn't always > a good idea, but it can be useful. > > > In any case, sure, I'd love to make some changes to the current AbiIO > code > > so it works without .seek() and .tell(). Is there any other input types > that > > does not use .seek() and .tell() other than network handles? > > I suspect some specialised handles for accessing compressed files might > have similar limitations. In the case of gzip at least, I think it does > support > seek and tell. > > > Here's my new branch from the current master: > > https://github.com/bow/biopython/tree/seqio-abi_handlefix > > nothing different for now but I'll push my updates soon. > > Don't rush yourself - I'm away for a long weekend so won't be testing > any updates till next week anyway. > > Thanks, > > Peter > From derjogi at web.de Sun Aug 7 09:44:03 2011 From: derjogi at web.de (Jogi) Date: Sun, 07 Aug 2011 15:44:03 +0200 Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map') + correction Message-ID: <1312724643.2148.5.camel@JogiDesk> I'm new to the field of 'bug reporting', so please, if someone knows where I should post this message please tell me or do it yourself :) I've found a bug in the Bio.Restriction module when calling Analysis.print_as('map'). The bugs (that I know of and that I corrected): 1. 
When there is a restriction site within the first 60 basepairs in the
sequence this one isn't added to a list and thus raises a KeyError: 0
2. Sometimes (I don't know exactly how to reproduce it any more) an Enzyme
is repeated in every line although there is no restriction site.

Solution:
Replace from line 310 in PrintFormat.py:

    x, counter, length = 0, 0, len(self.sequence)
    for x in xrange(60, length, 60):
        counter = x - 60
        l=[]
        for key in mapping:
            if key <= x:
                l.append(key)
            else:
                cutloc[counter] = l
                mapping = mapping[mapping.index(key):]
                break
        cutloc[x] = l
    cutloc[x] = mapping
    sequence = self.sequence.tostring()

With

    upper, lower, length = 0, 0, len(self.sequence)
    for upper in xrange(60, length+60, 60):
        lower = upper - 60
        l=[]
        for key in mapping:
            if key <= upper and key > lower:
                l.append(key)
            else:
                mapping = mapping[mapping.index(key):]
                break
        cutloc[lower] = l
    sequence = self.sequence.tostring()

Hope this bug report/solution was/is helpful and at the right place :)

J.Kuhn

From p.j.a.cock at googlemail.com Tue Aug 9 09:40:18 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 9 Aug 2011 14:40:18 +0100
Subject: [Biopython-dev] SeqIO Abi Parser
In-Reply-To:
References:
Message-ID:

On Sat, Aug 6, 2011 at 10:52 AM, Wibowo Arindrarto wrote:
> Hi Peter & everyone,
> I've been trying to improve the parser so it works with forward-only
> handles, but I'm drawing a blank for now.
> I realized the reason I use seek in the first place was because of the file
> structure. In an Abi file we've got three data blocks: the header that
> contains the file information, the sequencing data, and the directories
> which serve as indexes to the sequencing data. To unpack the sequencing data
> bytes, we need the information stored in the directories. Depending on its
> size, it could be stored outside the directories block, or in the directory
> itself. This is why .seek() helps, because it allows for jumping between the
> directories and the sequencing data as it is being parsed.

Yes - this design makes sense, especially given the computer
capabilities back when the format was designed.

> Now, I thought the three blocks were stored in this order: header -
> directory - sequencing data. I've thought of a way of parsing the file if
> the structure is like this. As it turns out, it's possible (or even this
> might be the norm) that the order is: header - sequencing data - directory.
> So as soon as I finished parsing the information on how to retrieve the data
> from the directories, I've already gone past the data block. In forward-only
> handles, this makes the data irretrievable.

I see now, that is unfortunate. I presume the current order was chosen
to make writing the data easy (do the directory last). A simple forward
only parser would be possible IF the data was reordered, but we can't
require that.

> There should be other ways to retrieve the sequencing data in forward-only
> handles. I thought about reading the entire handle stream first and storing
> it into a variable. This way, we could replace seek() with slicing
> operators. The trade off is we store the entire handle stream in memory at
> once (abi files are probably ~300-500kb in size). I'm sure there are other
> ways, but I couldn't think of any now.
> So what do you think? Or maybe anyone else have ideas that I could try?
> Regards & have a nice weekend all,

I think we have to accept that typical ABI files are not suitable for
forward only parsing. Thanks for looking into this - I hope you found it
interesting.

Regards,

Peter

From redmine at redmine.open-bio.org Tue Aug 9 10:29:53 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 14:29:53 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] (New) SeqIO tries to use Gapped without import
Message-ID:

Issue #3278 has been reported by Paul Agapow.
----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL:

@to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@)
but does not actually import @Gapped@. Thus a @NameError@ results.

Although the method is labelled obsolete, it is used by @SeqIO@ in write
when an @AlignIO@ writer must be used (e.g. when trying to write sequences
to a Nexus file).
Solution: @from Bio.Alphabet import Gapped@

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org Tue Aug 9 10:47:22 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 14:47:22 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] SeqIO tries to use Gapped without import
References:
Message-ID:

Issue #3278 has been updated by Peter Cock.

Looking at Biopython 1.53 (December 2009) you appear to be correct.
However, the function was explicitly made obsolete in Biopython 1.54 (with
a deprecation warning), and at that point this error did not exist.

Unless there is a related problem in the current release, I will close this
report. Thanks.

----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL:

@to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@)
but does not actually import @Gapped@. Thus a @NameError@ results.

Although the method is labelled obsolete, it is used by @SeqIO@ in write
when an @AlignIO@ writer must be used (e.g. when trying to write sequences
to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Tue Aug 9 10:49:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Aug 2011 15:49:30 +0100 Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map') + correction In-Reply-To: <1312724643.2148.5.camel@JogiDesk> References: <1312724643.2148.5.camel@JogiDesk> Message-ID: On Sun, Aug 7, 2011 at 2:44 PM, Jogi wrote: > I'm new to the field of 'bug reporting', so please, if someone knows > where I should post this message please tell me or do it yourself :) > > I've found a bug in the Bio.Restriction module when calling > Analysis.print_as('map'). > > The bugs (that I know of and that I corrected): > 1. When there is a restriction site within the first 60 basepairs in the > sequence this one isn't added to a list and thus raises an KeyError: 0 Could you give a short example script showing the problem? It could then be used for a unit test. > 2. Sometimes (I don't know exactly how to reproduce it any more) an > Enzyme is repeated in every line although there is no restriction site. I'm not familiar with that problem - without an example that will be hard to look into. Peter From w.arindrarto at gmail.com Tue Aug 9 10:59:37 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 9 Aug 2011 16:59:37 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, You're welcome :)! Although a bit disappointing, it was nice when I understood why my forward parser didn't work. Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Tue, Aug 9, 2011 at 15:40, Peter Cock wrote: > On Sat, Aug 6, 2011 at 10:52 AM, Wibowo Arindrarto > wrote: > > Hi Peter & everyone, > > I've been trying to improve the parser so it works with forward-only > > handles, but I'm drawing a blank for now. 
> > I realized the reason I use seek in the first place was because of the > file > > structure. In an Abi file we've got three data blocks: the header that > > contains the file information, the sequencing data, and the directories > > which serve as indexes to the sequencing data. To unpack the sequencing > data > > bytes, we need the information stored in the directories. Depending on > its > > size, it could be stored outside the directories block, or in the > directory > > itself. This is why .seek() helps, because it allows for jumping between > the > > directories and the sequencing data as it is being parsed. > > Yes - this design makes sense, especially given the computer > capabilities back when the format was designed. > > > Now, I thought the three blocks were stored in this order: header - > > directory - sequencing data. I've thought of a way of parsing the file if > > the structure is like this. As it turns out, it's possible (or even this > > might be the norm) that the order is: header - sequencing data - > directory. > > So as soon as I finished parsing the information on how to retrieve the > data > > from the directories, I've already gone past the data block. In > forward-only > > handles, this makes the data irretrievable. > > I see now, that is unfortunate. I presume the current order was chosen > to make writing the data easy (do the directory last). A simple forward > only parser would be possible IF the data was reordered, but we can't > require that. > > > There should be other ways to retrieve the sequencing data in > forward-only > > handles. I thought about reading the entire handle stream first and > storing > > it into a variable. This way, we could replace seek() with slicing > > operators. The trade off is we store the entire handle stream in memory > at > > once (abi files are probably ~300-500kb in size). I'm sure there are > other > > ways, but I couldn't think of any now. > > So what do you think? 
Or maybe anyone else have ideas that I could try?
> > Regards & have a nice weekend all,
>
> I think we have to accept that typical ABI files are not suitable for forward
> only parsing. Thanks for looking into this - I hope you found it interesting.
>
> Regards,
>
> Peter
>

From redmine at redmine.open-bio.org Tue Aug 9 11:48:06 2011
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Tue, 9 Aug 2011 15:48:06 +0000
Subject: [Biopython-dev] [Biopython - Bug #3278] (Closed) SeqIO tries to use Gapped without import
References:
Message-ID:

Issue #3278 has been updated by Peter Cock.

Status changed from New to Closed
% Done changed from 0 to 100

I realised this deprecated function was due for removal, it will be gone in
Biopython 1.58,
https://github.com/biopython/biopython/commit/9eb934ee0425b4636b26f310a0f1454f53745b17

Marking this bug as closed.

----------------------------------------
Bug #3278: SeqIO tries to use Gapped without import
https://redmine.open-bio.org/issues/3278

Author: Paul Agapow
Status: Closed
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version: 1.53
URL:

@to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@)
but does not actually import @Gapped@. Thus a @NameError@ results.

Although the method is labelled obsolete, it is used by @SeqIO@ in write
when an @AlignIO@ writer must be used (e.g. when trying to write sequences
to a Nexus file).

Solution: @from Bio.Alphabet import Gapped@

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Wed Aug 10 13:12:25 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 10 Aug 2011 18:12:25 +0100 Subject: [Biopython-dev] pypaml In-Reply-To: References: Message-ID: On Fri, Jan 14, 2011 at 2:11 PM, Brandon Invergo wrote: >> By the way, have you ever tried using this under Windows? > > I haven't yet but by the looks of it it should work fine assuming the > programs are in the system path and thus can be called by name from > any location in the file system. I see one line where I accidentally > made it *nix-specific (default working directory is "./") but other > than that, all files/directories are located via os.path or by > user-inputted strings (as they would be in the control file). I have > both a Linux and a Windows 7 machine at home though so I can do some > testing. Obviously the unit tests here will help catch system-specific > errors such as entering file locations incorrectly (I can see a few > exceptions that I'm currently not handling). Hi Brandon, Have you looked into PAML under Windows yet? Regards, Peter From b.invergo at gmail.com Wed Aug 10 13:16:08 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 10 Aug 2011 19:16:08 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: Message-ID: <1312996570.1339.12.camel@localhost.localdomain> On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote: > Hi Brandon, > > Have you looked into PAML under Windows yet? > > Regards, > > Peter Hi Peter, Unfortunately, I don't have a Windows machine at my disposal to test it on! Has anyone reported any problems yet? 
-brandon From p.j.a.cock at googlemail.com Thu Aug 11 07:36:41 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Aug 2011 12:36:41 +0100 Subject: [Biopython-dev] pypaml In-Reply-To: <1312996570.1339.12.camel@localhost.localdomain> References: <1312996570.1339.12.camel@localhost.localdomain> Message-ID: On Wed, Aug 10, 2011 at 6:16 PM, Brandon Invergo wrote: > On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote: >> Hi Brandon, >> >> Have you looked into PAML under Windows yet? >> >> Regards, >> >> Peter > > Hi Peter, > Unfortunately, I don't have a Windows machine at my disposal to test it > on! Has anyone reported any problems yet? > > -brandon Hi Brandon, It's a shame you don't still have access to the Windows 7 box. I've just grabbed the current PAML 4.4 pre-compiled for Windows and put it on my Windows machine which runs as a buildslave, and put the binaries on the PATH: http://abacus.gene.ucl.ac.uk/software/paml.html http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz None of the current unit tests actually use the binaries do they? Could you add a basic test (in a separate file which raises the missing dependency exception to skip the test if the binary is not on the path) for calling the tools? Peter From b.invergo at gmail.com Thu Aug 11 07:51:26 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 11 Aug 2011 13:51:26 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: <1312996570.1339.12.camel@localhost.localdomain> Message-ID: <1313063488.1339.28.camel@localhost.localdomain> On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote: > It's a shame you don't still have access to the Windows 7 box. 
> > I've just grabbed the current PAML 4.4 pre-compiled for Windows > and put it on my Windows machine which runs as a buildslave, > and put the binaries on the PATH: > > http://abacus.gene.ucl.ac.uk/software/paml.html > http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz > > None of the current unit tests actually use the binaries do they? > Could you add a basic test (in a separate file which raises the > missing dependency exception to skip the test if the binary is > not on the path) for calling the tools? > > Peter No, I didn't include any tests that use the binaries because I wasn't sure if they would be on the main test machine. Also, generating the output which is used in other tests can take a lot of time in some cases. Instead, I've generated the output files myself and then accessed those from the tests. The one problem I have with this approach is that it's not very reproducible; if someone else wishes to add data files from later versions of PAML, they won't know how I generated them. Again the goal is to make sure that we're parsing each new version correctly, since the output format has been known to change between versions. I could create a readme file which contains the info and put it in the paml Tests subfolder. Sound reasonable? I can create a Tests/test_PAML.py file to contain the proposed test. In it, I can try to run codeml, baseml and yn00 directly using Subprocess, each on some bogus input. If the binaries are there, they'll throw an error which the test will catch. If they aren't Subprocess itself will throw an error. I can't do this check using Bio.Phylo.PAML because we, of course, aim to prevent bogus input from ever even reaching the binary. How does that sound? Is that what you had in mind? 
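
A minimal sketch of the kind of check proposed above, written for a modern
Python 3 rather than the 2.7/3.2 interpreters in use on this list. It
assumes the binaries are named codeml, baseml and yn00 on the PATH, and it
uses unittest.SkipTest as a stand-in for Biopython's own missing-dependency
exception; require_binaries and runs are hypothetical helper names.

```python
import shutil
import subprocess
import sys
import unittest


def require_binaries(names):
    """Return {name: path}, or raise unittest.SkipTest if any is missing."""
    found = {name: shutil.which(name) for name in names}
    missing = sorted(name for name, path in found.items() if path is None)
    if missing:
        raise unittest.SkipTest("missing binaries: " + ", ".join(missing))
    return found


def runs(command):
    """Run command with all output discarded; True if it exits with 0."""
    result = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,   # portable, no open('/dev/null') handle
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


# The current interpreter stands in for a PAML binary so the sketch runs
# even on a machine without PAML installed.
python_ok = runs([sys.executable, "-c", "pass"])

try:
    require_binaries(["codeml", "baseml", "yn00"])
    paml_available = True
except unittest.SkipTest:
    paml_available = False
```

Looking the binaries up first keeps the "not installed" case separate from
the "installed but failed on bogus input" case, and subprocess.DEVNULL
avoids both the *nix-only /dev/null path and the extra open handle.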
-brandon From p.j.a.cock at googlemail.com Thu Aug 11 09:49:39 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Aug 2011 14:49:39 +0100 Subject: [Biopython-dev] pypaml In-Reply-To: <1313063488.1339.28.camel@localhost.localdomain> References: <1312996570.1339.12.camel@localhost.localdomain> <1313063488.1339.28.camel@localhost.localdomain> Message-ID: On Thu, Aug 11, 2011 at 12:51 PM, Brandon Invergo wrote: > On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote: >> It's a shame you don't still have access to the Windows 7 box. >> >> I've just grabbed the current PAML 4.4 pre-compiled for Windows >> and put it on my Windows machine which runs as a buildslave, >> and put the binaries on the PATH: >> >> http://abacus.gene.ucl.ac.uk/software/paml.html >> http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz >> >> None of the current unit tests actually use the binaries do they? >> Could you add a basic test (in a separate file which raises the >> missing dependency exception to skip the test if the binary is >> not on the path) for calling the tools? >> >> Peter > > No, I didn't include any tests that use the binaries because I wasn't > sure if they would be on the main test machine. Also, generating the > output which is used in other tests can take a lot of time in some > cases. Instead, I've generated the output files myself and then accessed > those from the tests. The one problem I have with this approach is that > it's not very reproducible; if someone else wishes to add data files > from later versions of PAML, they won't know how I generated them. Next time there is a PAML release, you'll have to make some more test files ;) > Again > the goal is to make sure that we're parsing each new version correctly, > since the output format has been known to change between versions. I > could create a readme file which contains the info and put it in the > paml Tests subfolder. Sound reasonable? Yes. 
> I can create a Tests/test_PAML.py file to contain the proposed test. In
> it, I can try to run codeml, baseml and yn00 directly using Subprocess,
> each on some bogus input. If the binaries are there, they'll throw an
> error which the test will catch. If they aren't Subprocess itself will
> throw an error. I can't do this check using Bio.Phylo.PAML because we,
> of course, aim to prevent bogus input from ever even reaching the
> binary. How does that sound? Is that what you had in mind?

I believe we're thinking on the same lines here - have a look at
test_Muscle_tool.py or test_Emboss.py and others like it. There
is some header code which tries to locate the binaries, and
perhaps check their version. Some tools have a switch like -v
or --help or similar which makes them immediately exit, sometimes
with a version number. This is less trouble than trying to run
them with a dummy input file.

Having had a quick play with ds.exe it generally seems to insist
on asking for an input file, so you may have to go that route.
But see if this is useful - probably you'd need /dev/null on Unix
machines:

C:\repositories\biopython\Tests>ds nul
results go into out.txt

(1) collecting min, max, and mean
0:00
(2) variance-covariance matrix
0:00
(3) median, percentiles & serial correlation
0:00
(4) Histograms and 1-D densities

If the binaries are missing or the wrong version, we raise
MissingExternalDependencyError and the test gets skipped.

If the binaries are present (and the right version), use the
normal unittest framework. Try to make the examples quick
to run (aim for well under a minute for the whole test), so
smaller datafiles than might be typical.

Peter

From p.j.a.cock at googlemail.com Thu Aug 11 12:06:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 17:06:48 +0100
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready to go?
Message-ID:

Hi Tiago & Bartek,

Looking over the DEPRECATED file, the following are about due for removal
in Bio.PopGen and Bio.Motif - do you guys have time to make these changes
yourselves?

Thanks,

Peter

> Bio.PopGen.FDist
> ================
> The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete
> in Release 1.54, and deprecated in Release 1.55 final. Their functionality is
> now available through a read() function.

and:

> Bio.Motif
> =========
> ...
> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete
> in Release 1.53 and deprecated in Release 1.55 final; their functionality is
> now available through a read() function in Bio.Motif.Parsers.AlignAce.
> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser,
> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and
> deprecated in Release 1.55 final; their functionality is now available through
> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST,
> respectively.

P.S. We don't usually need to mention private classes like _MEMEScanner in
the DEPRECATED file.

From tiagoantao at gmail.com Thu Aug 11 12:15:08 2011
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 11 Aug 2011 17:15:08 +0100
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready to go?
In-Reply-To:
References:
Message-ID:

I will do it over the weekend for Bio.PopGen

2011/8/11, Peter Cock :
> Hi Tiago & Bartek,
>
> Looking over the DEPRECATED file, the following are about due for removal
> in Bio.PopGen and Bio.Motif - do you guys have time to make these changes
> yourselves?
>
> Thanks,
>
> Peter
>
>> Bio.PopGen.FDist
>> ================
>> The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete
>> in Release 1.54, and deprecated in Release 1.55 final. Their functionality is
>> now available through a read() function.
>
> and:
>
>> Bio.Motif
>> =========
>> ...
>> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared >> obsolete >> in Release 1.53 and deprecated in Release 1.55 final; their functionality >> is >> now available through a read() function in Bio.Motif.Parsers.AlignAce. >> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser, >> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and >> deprecated in Release 1.55 final; their functionality is now available >> through >> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST, >> respectively. > > P.S. We don't usually need to mention private classes like _MEMEScanner in > the DEPRECATE file. > -- Enviada a partir do meu dispositivo m?vel "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From barwil at gmail.com Thu Aug 11 12:28:01 2011 From: barwil at gmail.com (Bartek Wilczynski) Date: Thu, 11 Aug 2011 09:28:01 -0700 Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready to go? In-Reply-To: References: Message-ID: Hi, I'll do the necessary changes in Bio.Motif by the end of the week. best Bartek 2011/8/11 Peter Cock : > Hi Tiago & Bartek, > > Looking over the DEPRECATED file, the following are about due for removal > in Bio.PopGen and Bio.Motif - do you guys have time to make these changes > yourselves? > > Thanks, > > Peter > >> Bio.PopGen.FDist >> ================ >> The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete >> in Release 1.54, and deprecated in Release 1.55 final. Their functionality is >> now available through a read() function. > > and: > >> Bio.Motif >> ========= >> ... >> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete >> in Release 1.53 and deprecated in Release 1.55 final; their functionality is >> now available through a read() function in Bio.Motif.Parsers.AlignAce. 
>> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser, >> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and >> deprecated in Release 1.55 final; their functionality is now available through >> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST, >> respectively. > > P.S. We don't usually need to mention private classes like _MEMEScanner in > the DEPRECATE file. > -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From redmine at redmine.open-bio.org Mon Aug 15 05:59:39 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 15 Aug 2011 09:59:39 +0000 Subject: [Biopython-dev] [Biopython - Bug #3188] (Closed) Test bug, please ignore References: Message-ID: Issue #3188 has been updated by Peter Cock. Status changed from New to Closed % Done changed from 0 to 100 Should have closed this test bug a while ago. ---------------------------------------- Bug #3188: Test bug, please ignore https://redmine.open-bio.org/issues/3188 Author: Peter Cock Status: Closed Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: The aim of this bug is to test the Redmine "Email on New Issue" option from the Newissuealerts module. This issue should get emailed to the biopython-dev email list automatically... Peter -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Aug 15 06:04:41 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 15 Aug 2011 11:04:41 +0100 Subject: [Biopython-dev] Release blockers? PAML? Message-ID: Hi all, We're about due to make a Biopython release, and I could do it early this week - but then I'm away for a fortnight. 
I am fortunate to be attending the BioHackathon 2011 in Kyoto
next week, http://2011.biohackathon.org/

I think we're in a good position with the code on the trunk to
release Biopython 1.58, bar the PAML code which has not
yet been tested on Windows. Also, I'd be keen for Tiago and
Brandon to take a look at the application calling code to see
if there is any scope for a more common approach between
the PAML wrappers and the PopGen tools. Note that both
sets of tools are not 'nicely behaved' Unix style tools (which
is what the Bio.Applications API targets). To do anything
useful with these tools you have to do nasty things like
switch the current working directory and so on.

If we want to do the release this week, we could just warn
that the PAML code is considered to be "in beta" and that
the API may well change in non-backwards compatible
ways?

What else should be addressed before the next release?

There are some open bugs, but at first glance nothing
critical.

Regards,

Peter

From b.invergo at gmail.com Mon Aug 15 06:15:04 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 15 Aug 2011 12:15:04 +0200
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To:
References:
Message-ID: <1313403306.3107.5.camel@localhost.localdomain>

Hi,

Regarding PAML, I'm sorry I haven't implemented the binary tests yet.
I'll put it on my to-do for today. Turns out it's a Spanish national
holiday today so I guess I don't have to go to the lab.

I have a Windows 7 laptop that up until now has been quarantined and
used only for music software, with no other software allowed on it, not
allowed near the interwebs, etc (it's a fickle machine), but last night
I broke the rules and installed Python 2.7 on it. I'll try running the
PAML tests on it and I'll let everyone know how it goes.
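A minimal sketch of the binary check discussed in this thread, assuming a modern Python 3 where shutil.which is available. The helper names find_paml_binaries and require_paml_binaries are hypothetical, and unittest.SkipTest stands in for Biopython's MissingExternalDependencyError so the example is self-contained; this is not the code in Tests/test_PAML_tools.py.

```python
import shutil
import unittest

# Binary names are the ones mentioned in the thread; the helpers are hypothetical.
PAML_BINARIES = ("codeml", "baseml", "yn00")


def find_paml_binaries(names=PAML_BINARIES):
    """Map each binary name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


def require_paml_binaries(names=PAML_BINARIES):
    """Raise unittest.SkipTest if any of the named binaries is missing."""
    found = find_paml_binaries(names)
    missing = sorted(name for name, path in found.items() if path is None)
    if missing:
        raise unittest.SkipTest("PAML binaries not found: " + ", ".join(missing))
```

Calling require_paml_binaries() at the top of a test module would skip the whole module when the tools are absent, which seems to be the behaviour Peter describes for the other tool tests.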
Until later, -brandon On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote: > Hi all, > > We're about due to make a Biopython release, and I could > do it early this week - but then I'm away for a fortnight. I am > fortunate to be attending the BioHackathon 2011 in Kyoto > next week, http://2011.biohackathon.org/ > > I think we're in a good position with the code on the trunk to > release Biopython 1.58, bar the PAML code which has not > yet been tested on Windows. Also, I'd be keen for Tiago and > Brandon to take a look at the application calling code to see > if the is any scope for a more common approach between > the PAML wrappers and the PopGen tools. Note that both > sets of tools are not 'nicely behaved' Unix style tools (which > is what the Bio.Applications API targets). To do anything > useful with these tools you have to do nasty things like > switch the current working directory and so on. > > If we want to do the release this week, we could just warn > that the PAML code is consider to be "in beta" and that > the API may well change in non-backwards compatible > ways? > > What else should be addressed before the next release? > > There are some open bugs, but at first glance nothing > critical. > > Regards, > > Peter From eric.talevich at gmail.com Mon Aug 15 11:02:57 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 15 Aug 2011 11:02:57 -0400 Subject: [Biopython-dev] Release blockers? PAML? In-Reply-To: References: Message-ID: On Mon, Aug 15, 2011 at 6:04 AM, Peter Cock wrote: > Hi all, > > We're about due to make a Biopython release, and I could > do it early this week - but then I'm away for a fortnight. I am > fortunate to be attending the BioHackathon 2011 in Kyoto > next week, http://2011.biohackathon.org/ > > [...] > What else should be addressed before the next release? > > There are some open bugs, but at first glance nothing > critical. > > A while ago I pushed a new function, Phylo.draw(). 
It draws rooted phylograms much like Phylip's drawgram or ape's plot.tree function. There's a lot of room for personal preferences here, so I'd appreciate if someone else could try it out and suggest changes. Usage: >>> from Bio import Phylo >>> tree = Phylo.read('some_tree.nwk', 'newick') >>> Phylo.draw(tree) Code: https://github.com/biopython/biopython/blob/master/Bio/Phylo/_utils.py The function only takes a few arguments, but since it's based on matplotlib/pylab, the aesthetics of a plot can easily be changed after the initial plotting. If we're happy with it, then I'll add a mention of it to the Tutorial. While I'm at it, has anyone else used Bio.Applications.PhymlCommandline and found any issues? Thanks, Eric From b.invergo at gmail.com Tue Aug 16 16:06:24 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Tue, 16 Aug 2011 22:06:24 +0200 Subject: [Biopython-dev] Release blockers? PAML? In-Reply-To: References: Message-ID: <1313525186.3107.7.camel@localhost.localdomain> Hi everyone, I wrote some tests for the presence of the PAML binaries and I've run all the unit tests in Python 2.7 on Windows 7 and they all pass. Cheers, Brandon On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote: > Hi all, > > We're about due to make a Biopython release, and I could > do it early this week - but then I'm away for a fortnight. I am > fortunate to be attending the BioHackathon 2011 in Kyoto > next week, http://2011.biohackathon.org/ > > I think we're in a good position with the code on the trunk to > release Biopython 1.58, bar the PAML code which has not > yet been tested on Windows. Also, I'd be keen for Tiago and > Brandon to take a look at the application calling code to see > if the is any scope for a more common approach between > the PAML wrappers and the PopGen tools. Note that both > sets of tools are not 'nicely behaved' Unix style tools (which > is what the Bio.Applications API targets). 
To do anything > useful with these tools you have to do nasty things like > switch the current working directory and so on. > > If we want to do the release this week, we could just warn > that the PAML code is consider to be "in beta" and that > the API may well change in non-backwards compatible > ways? > > What else should be addressed before the next release? > > There are some open bugs, but at first glance nothing > critical. > > Regards, > > Peter From p.j.a.cock at googlemail.com Wed Aug 17 11:28:16 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Aug 2011 16:28:16 +0100 Subject: [Biopython-dev] PAML yn00 under Windows Message-ID: Hi Brandon, It looks like the stats line parsing in yn00 needs a little adjustment for this platform, ====================================================================== ERROR: Test that the yn00 binary runs and generates correct output. ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBot\win26\build\Tests\test_PAML_tools.py", line 139, in testYn00Binary results = self.yn.run() File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py", line 106, in run results = read(self.out_file) File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py", line 131, in read sequences) File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\_parse_yn00.py", line 110, in parse_others value = stats_split[i+2].strip("()") IndexError: list index out of range ---------------------------------------------------------------------- Ran 157 tests in 282.385 seconds I added this commit for a more helpful error message: https://github.com/biopython/biopython/commit/420430164d258aae27714d907705cd729626f3c6 C:\repositories\biopython\Tests>c:\python26\python test_PAML_tools.py Test that the baseml binary runs and generates correct output ... 
ok
Test that the codeml binary runs and generates correct output ... ok
Test that the yn00 binary runs and generates correct output. ... ERROR

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_PAML_tools.py", line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 106, in run
    results = read(self.out_file)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 131, in read
    sequences)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\_parse_yn00.py", line 113, in parse_others
    raise ValueError("Problem with stats line: %r" % line)
ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN = -1.#IND w =-1.#IND S = -1.$ N = -1.$ (rho = -1.#IO)\n'

----------------------------------------------------------------------
Ran 3 tests in 1.312s

FAILED (errors=1)

It looks like you're not expecting a bracket pattern quite like that
(and/or this is a cross platform C float representation issue).
Hopefully that string is enough to work out how to fix the parser,
even if you can't reproduce this on your own machine. I can try
and find the output file if you like... might have to disable the tool's
clean up code temporarily to leave it behind.

Regards,

Peter

From p.j.a.cock at googlemail.com Wed Aug 17 11:39:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 16:39:41 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To:
References:
Message-ID:

On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock wrote:
> Hi Brandon,
>
> It looks like the stats line parsing in yn00 needs a little adjustment
> for this platform,
> ...
>     value = stats_split[i+2].strip("()")
> IndexError: list index out of range
>
>
> ...
>     raise ValueError("Problem with stats line: %r" % line)
> ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN =
> -1.#IND w =-1.#IND S =   -1.$ N =   -1.$ (rho = -1.#IO)\n'

I think you need an adjustment to the bounds on i, given you want to use
stats_split[i] and stats_split[i+2]. Not sure if you want a -3 or -2 on the
upper bound...

C:\repositories\biopython\Tests>git diff
diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py
index 221b6de..e4967fb 100644
--- a/Bio/Phylo/PAML/_parse_yn00.py
+++ b/Bio/Phylo/PAML/_parse_yn00.py
@@ -103,7 +103,7 @@ def parse_others(lines, results, sequences):
         stats = {}
         line_stats = line.split(":")[1].strip()
         stats_split = line_stats.split()
-        for i in range(0, len(stats_split), 3):
+        for i in range(0, len(stats_split)-3, 3):
             stat = stats_split[i].strip("()")
             if stat == "w":
                 stat = "omega"

I don't know why this didn't come up under Linux, something subtle
going on between the PAML versions maybe?

Regards,

Peter

From p.j.a.cock at googlemail.com Wed Aug 17 13:02:24 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 18:02:24 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To:
References:
Message-ID:

Hi again,

You may have noticed from the buildbot emails that there is a
separate issue with the PAML tests on Python (2.4 and) 2.5,
applying to executing all three binaries tried: yn00, baseml
and codeml, e.g.

http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.4/builds/259/steps/shell/logs/stdio

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBot\win24\build\Tests\test_PAML_tools.py", line 139, in testYn00Binary results = self.yn.run() File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\yn00.py", line 104, in run Paml.run(self, ctl_file, verbose, command) File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\_paml.py", line 148, in run raise EnvironmentError, "The %s process was killed." % command EnvironmentError: The yn00 process was killed. ---------------------------------------------------------------------- I can reproduce this at the terminal window, and it is specific to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as are Python 3.1 and 3.2. Peter From p.j.a.cock at googlemail.com Wed Aug 17 13:56:28 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Aug 2011 18:56:28 +0100 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: References: Message-ID: On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock wrote: > Hi again, > > You may have noticed from the buildbot emails that there is a > separate issue with the PAML tests on Python (2.4 and) 2.5, > applying to executing all three binaries tried: yn00, baseml > and codeml, e.g. > ... > I can reproduce this at the terminal window, and it is specific > to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as > are Python 3.1 and 3.2. I'm getting -1 back from the subprocess.call(...) https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca Some debugging later I realised the paths in the control file were using Unix slashes rather than Windows slashes: https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa That should now just leave the yn00 stats parsing for you to check (which offset should the fix use, assuming that is the right fix). 
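One way to make the stats-line parsing independent of token positions is to match "name = value" pairs with a regular expression, so that missing spaces (as in "w =-1.#IND") and brackets no longer shift the indices. The sketch below is only an illustration under that assumption: the function name parse_stats_line is hypothetical, it is not the code in Bio/Phylo/PAML/_parse_yn00.py, and mapping unparseable values to NaN is a choice made here rather than something the thread specifies.

```python
import re

# A name, an equals sign, then a value that ends at whitespace or a closing bracket.
_STAT_PAIR = re.compile(r"(\w+)\s*=\s*([^\s)]+)")


def parse_stats_line(line):
    """Return {stat_name: float} for one 'METHOD: name = value ...' line."""
    # Drop the method label (e.g. 'LWL85m') that precedes the first colon.
    body = line.split(":", 1)[1]
    stats = {}
    for name, raw in _STAT_PAIR.findall(body):
        if name == "w":
            name = "omega"
        try:
            stats[name] = float(raw)
        except ValueError:
            # Non-numeric values such as '-1.#IND' or '-1.$' from the Windows run.
            stats[name] = float("nan")
    return stats
```

Applied to the failing line from the traceback, this yields the six keys dS, dN, omega, S, N and rho, each set to NaN, instead of raising an IndexError.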
It was worth insisting on more tests and running them on Windows :) Regards, Peter From b.invergo at gmail.com Wed Aug 17 14:43:04 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 17 Aug 2011 20:43:04 +0200 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: References: Message-ID: <1313606586.3107.9.camel@localhost.localdomain> Hi, Just got home and saw the emails. Yes, in the end it was good to do the extra tests! So the path separator problem is solved, right? That indexing is a weird one. I'll look at it now. -brandon On Wed, 2011-08-17 at 18:56 +0100, Peter Cock wrote: > On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock wrote: > > Hi again, > > > > You may have noticed from the buildbot emails that there is a > > separate issue with the PAML tests on Python (2.4 and) 2.5, > > applying to executing all three binaries tried: yn00, baseml > > and codeml, e.g. > > ... > > I can reproduce this at the terminal window, and it is specific > > to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as > > are Python 3.1 and 3.2. > > I'm getting -1 back from the subprocess.call(...) > https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca > > Some debugging later I realised the paths in the control file > were using Unix slashes rather than Windows slashes: > https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa > > That should now just leave the yn00 stats parsing for you > to check (which offset should the fix use, assuming that > is the right fix). > > It was worth insisting on more tests and running them on Windows :) > > Regards, > > Peter From b.invergo at gmail.com Wed Aug 17 17:28:32 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 17 Aug 2011 23:28:32 +0200 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: References: Message-ID: <1313616514.3107.27.camel@localhost.localdomain> Ok, I just sent a pull request. 
It turns out that either due to the way C works in Windows or due to the way PAML was coded, what was a nice "-nan" in Linux is printed as "-1.#IND" in Windows, which messed up everything. Rather than parsing it in an algorithmic manner, I got angry and threw some regex fu at it, which works a lot nicer than what I had before. Tested successfully in Linux and Windows 7, Python 2.7.2 -brandon On Wed, 2011-08-17 at 16:39 +0100, Peter Cock wrote: > On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock wrote: > > Hi Brandon, > > > > It looks like the stats line parsing in yn00 needs a little adjustment > > for this platform, > > ... > > value = stats_split[i+2].strip("()") > > IndexError: list index out of range > > > > > > ... > > raise ValueError("Problem with stats line: %r" % line) > > ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN = > > -1.#IND w =-1.#IND S = -1.$ N = -1.$ (rho = -1.#IO)\n' > > I think you need to adjustment to the bounds on i given you want to use > stats_split[i] and stats_split[i+2]. Note sure if want a -3 or -2 on the upper > bound... > > C:\repositories\biopython\Tests>git diff > diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py > index 221b6de..e4967fb 100644 > --- a/Bio/Phylo/PAML/_parse_yn00.py > +++ b/Bio/Phylo/PAML/_parse_yn00.py > @@ -103,7 +103,7 @@ def parse_others(lines, results, sequences): > stats = {} > line_stats = line.split(":")[1].strip() > stats_split = line_stats.split() > - for i in range(0, len(stats_split), 3): > + for i in range(0, len(stats_split)-3, 3): > stat = stats_split[i].strip("()") > if stat == "w": > stat = "omega" > > > I don't know why this didn't come up under Linux, something subtle > going on between the PAML versions maybe? 
> > Regards, > > Peter From p.j.a.cock at googlemail.com Wed Aug 17 17:43:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Aug 2011 22:43:13 +0100 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: <1313616514.3107.27.camel@localhost.localdomain> References: <1313616514.3107.27.camel@localhost.localdomain> Message-ID: On Wed, Aug 17, 2011 at 10:28 PM, Brandon Invergo wrote: > Ok, I just sent a pull request. It turns out that either due to the way > C works in Windows or due to the way PAML was coded, what was a nice > "-nan" in Linux is printed as "-1.#IND" in Windows, which messed up > everything. That sounds like the C float libraries, the oddities of which are something which later versions of Python have done a better and better job of hiding from us ;) > Rather than parsing it in an algorithmic manner, I got angry > and threw some regex fu at it, which works a lot nicer than what > I had before. > > Tested successfully in Linux and Windows 7, Python 2.7.2 > > -brandon Sounds good - I'll have a look on github (possibly tomorrow), Peter From p.j.a.cock at googlemail.com Thu Aug 18 12:10:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Aug 2011 17:10:15 +0100 Subject: [Biopython-dev] Commit freeze for release 1.58 Message-ID: Hi all, Unless anyone objects I propose to do the Biopython 1.58 release in the next hour. If this runs into any issues, it will have to wait until I'm back at work in two weeks time, or someone else (with access to a Windows 32 bit machine with all the compilers setup) can tackle it instead. I will be active online next week however - and coding - but on Japan time: http://2011.biohackathon.org/ I'm assuming the NEWS file is up to date, and will as usual be basing the release notice on that. If there is anything missing, please reply by email. 
Thank you all, Peter From p.j.a.cock at googlemail.com Thu Aug 18 13:19:32 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Aug 2011 18:19:32 +0100 Subject: [Biopython-dev] Commit freeze for release 1.58 In-Reply-To: References: Message-ID: On Thu, Aug 18, 2011 at 5:10 PM, Peter Cock wrote: > Hi all, > > Unless anyone objects I propose to do the Biopython 1.58 > release in the next hour. If this runs into any issues, it will > have to wait until I'm back at work in two weeks time, or > someone else (with access to a Windows 32 bit machine > with all the compilers setup) can tackle it instead. > > I will be active online next week however - and coding - > but on Japan time: http://2011.biohackathon.org/ > > I'm assuming the NEWS file is up to date, and will as > usual be basing the release notice on that. If there is > anything missing, please reply by email. > > Thank you all, > > Peter > Ok, that's done. And in news that will no doubt please some of you, I've finally given up on keeping Python 2.4 support going. Feel free to start cleaning up some of the nastier hacks (like the ElementTree imports). Peter From p.j.a.cock at googlemail.com Thu Aug 18 15:32:57 2011 From: p.j.a.cock at googlemail.com (Peter) Date: Thu, 18 Aug 2011 20:32:57 +0100 Subject: [Biopython-dev] Biopython 1.58 released Message-ID: <75327C54-CF88-43BC-BACF-87139456FE67@googlemail.com> Dear All, Biopython 1.58 is out: http://news.open-bio.org/news/2011/08/biopython-1-58-released/ Thank you to everyone who has contributed. Peter P.S. 
We're on Twitter as @Biopython From updates at feedmyinbox.com Sun Aug 21 03:49:13 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sun, 21 Aug 2011 03:49:13 -0400 Subject: [Biopython-dev] 8/21 newest questions tagged biopython - Stack Overflow Message-ID: <0adf58b4241f2a58161d1a41524288d1@74.63.51.88> // A PWM with gapped alignments in Biopython // August 9, 2011 at 11:28 AM http://stackoverflow.com/questions/6998727/a-pwm-with-gapped-alignments-in-biopython I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments? from Bio.Alphabet import Gapped alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped) m = Motif.Motif() for a in alignment: m.add_instance(a.seq) m.pwm() -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/837947/00ae8e456ba91bb32a32b795eb392f971eee04e9/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Sun Aug 21 03:48:37 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sun, 21 Aug 2011 03:48:37 -0400 Subject: [Biopython-dev] 8/21 biopython Questions - BioStar Message-ID: <44c53445166933a51ab21f5d53e72577@74.63.51.88> // Error using Entrez.esummary from biopython // August 16, 2011 at 8:47 AM http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython Can someone please explain this error? I hava a smal script that tries to fetch information from the a NCBI BioAssay using the Entrez module form Bipython. I get an error I do not understand. I try to run: from Bio import Entrez Entrez.email="yourname at mail.se" handle_esummary=Entrez.esummary(db='pcassay',id='1337') record_esummary=Entrez.read(handle_esummary) I get the error: File "smaltest.py", line 5, in record_esummary=Entrez.read(handle_esummary) File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read record = handler.run(handle) File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run self.parser.ParseFile(handle) File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement itemtype = str(attrs["Type"]) # convert from Unicode KeyError: 'Type' // Import fasta sequences to a motif // August 15, 2011 at 11:54 AM http://biostar.stackexchange.com/questions/11204/import-fasta-sequences-to-a-motif I need to construct a PWM from every sequence in a fasta file, using biopython. The way I'm trying to do this is to import each line of sequence into a motif, then run a PWM on each instance of the motif. 
Currently, I'm trying it this way, but different variations of it have generated their fair share of errors, mostly "Wrong Alphabet" and "NoneType object is not iterable": alphabet = IUPAC.unambiguous_dna m = Motif.Motif(alphabet) for seq_record in SeqIO.parse("10fasta.fasta", "fasta"): m.add_instance(seq_record.seq) print m1.pwm() Does anyone see what's wrong with the way I'm adding instances to the motif? Of course, if there's a better way to do this that I'm completely missing, feel free to comment on that too. // A PWM with gapped alignments in Biopython // August 9, 2011 at 1:47 PM http://biostar.stackexchange.com/questions/11070/a-pwm-with-gapped-alignments-in-biopython I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments? from Bio.Alphabet import Gapped alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped) m = Motif.Motif() for a in alignment: m.add_instance(a.seq) m.pwm() -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/837946/d83641150d25e0f52255e3fcfa9e7ccb2b83405f/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068

From p.j.a.cock at googlemail.com Mon Aug 22 02:53:17 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 22 Aug 2011 07:53:17 +0100
Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via BioStar)
Message-ID:

Hi all,

On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox wrote:
> // Error using Entrez.esummary from biopython
> // August 16, 2011 at 8:47 AM
>
> http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython
> Can someone please explain this error?
>
> I have a small script that tries to fetch information from an
> NCBI BioAssay using the Entrez module from Biopython. I get
> an error I do not understand. I try to run:
>
> from Bio import Entrez
> Entrez.email="yourname at mail.se"
>
> handle_esummary=Entrez.esummary(db='pcassay',id='1337')
> record_esummary=Entrez.read(handle_esummary)
>
>
> I get the error:
>
> File "smaltest.py", line 5, in
>     record_esummary=Entrez.read(handle_esummary)
>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read
>     record = handler.run(handle)
>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run
>     self.parser.ParseFile(handle)
>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement
>     itemtype = str(attrs["Type"]) # convert from Unicode
> KeyError: 'Type'
>

I can reproduce this, and the cause is the NCBI using
lowercase in one tag's attribute:



We're expecting the attributes to be Name and Type, and
that is the case for all the other tags in this file.

Michiel - do you think we should just add a fallback for
type if we get a KeyError on Type? Do you think we should
report this inconsistency/bug to the NCBI?
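The fallback suggested above could look roughly like the following. This is a hedged sketch, not the actual Bio.Entrez.Parser code: get_item_type is a hypothetical helper, and attrs is assumed to be the dict-like attribute mapping that the traceback shows being indexed in startElement. The attribute value "String" in the usage note is illustrative only.

```python
def get_item_type(attrs):
    """Return the item type, preferring 'Type' and falling back to 'type'."""
    try:
        return str(attrs["Type"])
    except KeyError:
        # Tolerate the lowercase spelling; a KeyError is still raised
        # if neither spelling of the attribute is present.
        return str(attrs["type"])
```

With this helper, both {"Name": "x", "Type": "String"} and {"Name": "x", "type": "String"} give "String", while a tag with neither attribute still fails loudly.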
Peter From p.j.a.cock at googlemail.com Mon Aug 22 03:03:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Aug 2011 08:03:30 +0100 Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via BioStar) In-Reply-To: References: Message-ID: On Mon, Aug 22, 2011 at 7:53 AM, Peter Cock wrote: > Hi all, > > On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox wrote: >> // Error using Entrez.esummary from biopython >> // August 16, 2011 at 8:47 AM >> >> http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython >> Can someone please explain this error? >> >> I have a small script that tries to fetch information from an >> NCBI BioAssay using the Entrez module from Biopython. I get >> an error I do not understand. I try to run: >> >> from Bio import Entrez >> Entrez.email="yourname at mail.se" >> >> handle_esummary=Entrez.esummary(db='pcassay',id='1337') >> record_esummary=Entrez.read(handle_esummary) >> >> >> I get the error: >> >> File "smaltest.py", line 5, in >>     record_esummary=Entrez.read(handle_esummary) >>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read >>     record = handler.run(handle) >>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run >>     self.parser.ParseFile(handle) >>   File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement >>     itemtype = str(attrs["Type"]) # convert from Unicode >> KeyError: 'Type' >> > > I can reproduce this, and the cause is the NCBI using > lowercase in one tag's attribute: > > > > We're expecting the attributes to be Name and Type, and > that is the case for all the other tags in this file. > > Michiel - do you think we should just add a fallback for > type if we get a KeyError on Type?
Do you think we should > report this inconsistency/bug to the NCBI? Actually it clearly violates the DTD, and thus fails XML validation - so it is clearly a NCBI bug. Peter From chapmanb at 50mail.com Tue Aug 23 15:31:34 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 23 Aug 2011 15:31:34 -0400 Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository In-Reply-To: References: Message-ID: <20110823193134.GB507@kunkel> Peter; Awesome, thanks for doing this. I didn't even realize there was a git solution that could transfer histories across repositories like this; how did you do it? Everything looks great on a first pass. Do you think some of the scripts would also be useful to include in the script directory? They handle some of the common cases people have asked about; 'access_gff_index.py' uses bx-python so might be excluded, but the others are Biopython specific. Thanks again, Brad > I managed to do a git script to select out the GFF code and tests from > your bcbb repository and get it into the Biopython source tree. The > folder changes made it interesting ;) > > Input: https://github.com/chapmanb/bcbb (master branch) > > Output: https://github.com/peterjc/biopython/tree/brad_gff > > The tests pass, but that is as far as I have got with this. Brad, > could you have a look at this new branch for sanity checking please? > > Peter From p.j.a.cock at googlemail.com Tue Aug 23 22:33:21 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 24 Aug 2011 03:33:21 +0100 Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository In-Reply-To: <20110823193134.GB507@kunkel> References: <20110823193134.GB507@kunkel> Message-ID: On Tue, Aug 23, 2011 at 8:31 PM, Brad Chapman wrote: > Peter; > Awesome, thanks for doing this. I didn't even realize there was a > git solution that could transfer histories across repositories like > this; how did you do it? Well, it wasn't an off the shelf solution, it was a hack. 
See https://gist.github.com/1167169 and https://github.com/gitpython-developers/GitPython I used the Python library (import git) to query the source repository, basically doing "git log -- gff/BCBio gff/Tests" to find only the commits of interest, then "git show XXX" to extract the diff which I then had to modify to change the paths, then a system call to patch to apply each patch to the destination repository, git add, git commit. Note for git commit you can specify the message via a file (-F) so I could preserve the original long message, plus you can preserve the authored date (--date) and the author too. There were several steps where I couldn't work out how you were meant to do something via the git wrapper's API (e.g. get a diff as a patch), but it also lets you easily call git commands directly which was easier for me. Bit hacky but seemed to get the job done. > Everything looks great on a first pass. Do you think some of the > scripts would also be useful to include in the script directory? > They handle some of the common cases people have asked about; > 'access_gff_index.py' uses bx-python so might be excluded, but the > others are Biopython specific. > > Thanks again, > Brad Good point - that could be mapped to the Biopython scripts folder. I'll take a look. Peter From updates at feedmyinbox.com Thu Aug 25 03:48:40 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 25 Aug 2011 03:48:40 -0400 Subject: [Biopython-dev] 8/25 biopython Questions - BioStar Message-ID: <738da676fc97903dba65147015733dc5@74.63.51.88> // How to fetch genomics sequnce using coordinates in BIOPython // August 24, 2011 at 10:56 PM http://biostar.stackexchange.com/questions/11454/how-to-fetch-genomics-sequnce-using-coordinates-in-biopython Hi everyone, I'm a newbie of biopython. My question may be stupid but please help. I want to use (chromosome number, start position, end position, strand) to fetch the corresponding sequence in mouse genome. 
How can this be done with biopython connecting to NCBI database? Could anyone help me please? Thanks a lot. From p.j.a.cock at googlemail.com Fri Aug 26 03:44:32 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Aug 2011 08:44:32 +0100 Subject: [Biopython-dev] Biopython under Python from Cygwin on Windows? Message-ID: Hi all, I was just wondering if anyone has tried this recently (Biopython under Cygwin), and if it would be worth adding as another platform for the buildbot. There are likely enough differences from Linux to cause potential cross platform issues - especially for calling external tools... Regards, Peter From updates at feedmyinbox.com Fri Aug 26 04:05:18 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Fri, 26 Aug 2011 04:05:18 -0400 Subject: [Biopython-dev] 8/26 newest questions tagged biopython - Stack Overflow Message-ID: // How do I set the PYTHONPATH on Cygwin?
// August 25, 2011 at 9:16 PM http://stackoverflow.com/questions/7199082/how-do-i-set-the-pythonpath-on-cygwin In the Biopython installation instructions, it says that if Biopython doesn't work I'm supposed to do this: export PYTHONPATH = $PYTHONPATH':/directory/where/you/put/Biopython' I tried doing that in Cygwin from the ~ directory using the name of the Biopython directory (or everything of it past the ~ directory), but when I tested it by going into the Python interpreter and typing in From Bio.Seq import Seq It said the module doesn't exist. How do I make it so that I don't have to be in the Biopython directory to be able to import Seq? -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/837947/00ae8e456ba91bb32a32b795eb392f971eee04e9/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From clements at galaxyproject.org Mon Aug 29 17:29:28 2011 From: clements at galaxyproject.org (Dave Clements) Date: Mon, 29 Aug 2011 14:29:28 -0700 Subject: [Biopython-dev] Galaxy is Hiring In-Reply-To: References: Message-ID: Hello all The Galaxy Project is growing and has open positions in both the Penn State and Emory groups (http://wiki.g2.bx.psu.edu/News/Galaxy%20is%20Hiring). *Penn State: System administrators/analysts* The Nekrutenko Lab at the Huck Institutes of Life Sciences at Penn State is currently recruiting system analysts/administrators with experience in building and maintaining complex performance compute environments. 
The areas of immediate need include: - Storage balancing and tiered storage - Virtualization - Schedulers - Deployment of Galaxy instances and dependency management - Relational databases and query optimization - User management A minimum of 5 years of experience with UNIX/Linux system administration is required. Applicants should submit a CV and list of references to jobs at galaxyproject.org. *Emory: Software Engineers and Post-Docs* The Taylor Lab in the Biology and Mathematics & Computer Science at Emory University is looking for software engineers and postdoctoral scholars to work on the Galaxy project. We are seeking software engineers with expertise in distributed computing and systems programming, web-based visualization and visual analytics, informatics and data analysis and integration, and bioinformatics application areas such as re-sequencing, de novo assembly, metagenomics, transcriptome analysis and epigenetics. These are full time positions located in Atlanta, GA. See the official posting (http://bx.mathcs.emory.edu/joining/sw/) for full details. Postdoctoral applicants should have expertise in Bioinformatics and Computational Biology and research interests that complement but extend the lab's current interests: The Galaxy project; distributed and high-performance computing for data intensive science; vertebrate functional genomics; and genomics and epigenomic mechanisms of gene regulation, the role of transcription factors and chromatin structure in global gene expression, development, and differentiation. See the announcement (http://bx.mathcs.emory.edu/joining/postdocs/) for full details. If any of these openings describes you then please consider applying. Thanks, Dave C.
-- http://galaxyproject.org/ http://getgalaxy.org/ http://usegalaxy.org/ http://galaxyproject.org/wiki/
From p.j.a.cock at googlemail.com Tue Aug 2 16:43:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 2 Aug 2011 17:43:30 +0100 Subject: [Biopython-dev] Leaked handles in PAML unit tests Message-ID: Hi Brandon, Would you be able to look at these handle leaks in the PAML unit tests some time? test_PAML_baseml ... /Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad1.ctl' mode='r' encoding='UTF-8'> callableObj(*args, **kwargs) /Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad2.ctl' mode='r' encoding='UTF-8'> callableObj(*args, **kwargs) /Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='/dev/null' mode='w' encoding='UTF-8'> callableObj(*args, **kwargs) ok test_PAML_codeml ... ok test_PAML_yn00 ...
/Users/pjcock/lib/python3.2/unittest/case.py:574: ResourceWarning: unclosed file <_io.TextIOWrapper name='PAML/bad3.ctl' mode='r' encoding='UTF-8'> callableObj(*args, **kwargs) ok This warning is new under Python 3.2, but this kind of code can cause, and has caused, bugs on Windows (can't delete files if there is an open handle) and Jython (different garbage collection, so implicit handle closing is stochastic). See also: http://bugs.python.org/issue10093 Note there are other cases of this, some in PopGen (which may explain a periodic failure under Jython), and in test_SCOP_Astral.py (where the object design makes this difficult to avoid IIRC), etc. Peter From p.j.a.cock at googlemail.com Tue Aug 2 16:47:20 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 2 Aug 2011 17:47:20 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Sat, Jul 30, 2011 at 8:42 AM, Wibowo Arindrarto wrote: > Hi Peter, > I've done some more improvements to the code: > - I've written the check and unittest for the file handle mode. I've set it > so that abi file has to be opened in 'rb' mode, otherwise it'll return an > error. While it's ok to open in 'r' mode in python 2 in Linux, it has to be > specified as 'rb' in Windows and/or Python 3 for the file to be read > correctly. So I decided forcing it to 'rb' is the best. Because of this, I > changed 'test_SeqIO.py:503' to include the mode argument when opening. OK, good. > - I've also checked against test_Emboss.py for seqret output, after > including the abi format in it. My EMBOSS version is 6.4.0. There was a > slight problem with this testing, since for some reason the ID returned by > seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS > installation, since when I previously tested it against 6.1.0, the ID was > correct (although the qual values not, so I had to upgrade).
As expected, if > I comment out the code that tests for sequence id ('test_Emboss.py:168-172') > the tests pass. Maybe you could try testing it as well and see if EMBOSS > also returns the default id instead of the sample name? EMBOSS 6.3.1 is fine, so I think we should report this as a bug in EMBOSS 6.4.0 > - Finally, I did some small cosmetic changes to the code (typos, etc). > All changes have been pushed to my github fork. Now I still have time for > the weekend to improve whatever needs to be improved :). > Regards, There appears to be another Python 3 problem, consider this at the python prompt: from Bio import SeqIO record = SeqIO.read("Tests/Abi/310.ab1", "abi") record.letter_annotations["phred_quality"] I expect as list of integers, e.g. [0, 0, 0, ..., 0] not ['\x00', '\x00', '\x00', ..., '\x00'] Peter From w.arindrarto at gmail.com Tue Aug 2 16:53:46 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 2 Aug 2011 18:53:46 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, I noticed that bug was because I did not add the _bytes_to_string() converter for a data type. I already fixed this with my latest push, adding the appropriate if clause at AbiIO.py:293-294. Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Tue, Aug 2, 2011 at 18:47, Peter Cock wrote: > On Sat, Jul 30, 2011 at 8:42 AM, Wibowo Arindrarto > wrote: > > Hi Peter, > > I've done some more improvements to the code: > > - I've written the check and unittest for the file handle mode. I've set > it > > so that abi file has to be opened in 'rb' mode, otherwise it'll return an > > error. While it's ok to open in 'r' mode in python 2 in Linux, it has to > be > > specified as 'rb' in Windows and/or Python 3 for the file to be read > > correctly. So I decided forcing it to 'rb' is the best. Because of this, > I > > changed 'test_SeqIO.py:503' to include the mode argument when opening. > > OK, good. 
> > > - I've also checked against test_Emboss.py for seqret output, after > > including the abi format in it. My EMBOSS version is 6.4.0. There was a > > slight problem with this testing, since for some reason the ID returned > by > > seqret is always "EMBOSS_001". Something might be wrong with my EMBOSS > > installation, since when I previously tested it against 6.1.0, the ID was > > correct (although the qual values not, so I had to upgrade). As expected, > if > > I comment out the code that tests for sequence id > ('test_Emboss.py:168-172') > > the tests pass. Maybe you could try testing it as well and see if EMBOSS > > also returns the default id instead of the sample name? > > EMBOSS 6.3.1 is fine, so I think we should report this as a bug in EMBOSS > 6.4.0 > > > - Finally, I did some small cosmetic changes to the code (typos, etc). > > All changes have been pushed to my github fork. Now I still have time for > > the weekend to improve whatever needs to be improved :). > > Regards, > > There appears to be another Python 3 problem, consider this at the > python prompt: > > from Bio import SeqIO > record = SeqIO.read("Tests/Abi/310.ab1", "abi") > record.letter_annotations["phred_quality"] > > I expect as list of integers, e.g. [0, 0, 0, ..., 0] not ['\x00', > '\x00', '\x00', ..., '\x00'] > > Peter > From p.j.a.cock at googlemail.com Tue Aug 2 17:57:56 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 2 Aug 2011 18:57:56 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto wrote: > Hi Peter, > I noticed that bug was because I did not add the _bytes_to_string() > converter for a data type. I already fixed this with my latest push, adding > the appropriate if clause at AbiIO.py:293-294. > Regards, Was that only half the fix? 
This made it work for me: https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45 and: https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e Peter From p.j.a.cock at googlemail.com Tue Aug 2 18:03:24 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 2 Aug 2011 19:03:24 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Tue, Aug 2, 2011 at 6:57 PM, Peter Cock wrote: > On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto > wrote: >> Hi Peter, >> I noticed that bug was because I did not add the _bytes_to_string() >> converter for a data type. I already fixed this with my latest push, adding >> the appropriate if clause at AbiIO.py:293-294. >> Regards, > > Was that only half the fix? This made it work for me: > > https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45 > > and: > > https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e > > Peter > Could you test this branch, which I think is ready to be merged to the trunk now: https://github.com/peterjc/biopython/tree/seqio-abi Thanks, Peter From w.arindrarto at gmail.com Wed Aug 3 12:14:53 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 3 Aug 2011 14:14:53 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, My bad, I forgot to change that one line and didn't test before comitting. Thanks for fixing it. I've ran the tests on your tree on py2.6.5 and py3.1.2, here are the results: - On both py2.6.5 and py3.1.2, I have the following test case error: "NameError: global name 'embossversion' is not defined", on line 257. I didn't have "EMBOSS_ROOT" in my os.environ paths (I installed 6.4.0 from source, by the way), so this must be what's causing it. Is there another way to automatically detect EMBOSS_ROOT other than this? 
Or perhaps we should avoid emboss 6.4.0's bug by only checking if the id is EMBOSS_001? The only case I think this would fail is if the user inputs "EMBOSS_001" before the sequencing run as the sample id, which is possible but unlikely. - On a related note, I noticed you set the minimum Emboss requirement to 6.1.0 patch 3. I'm not sure if this the one I use previously, but my previous Emboss 6.1.0 installation failed to extract the proper quality values. Perhaps we should set the minimum version to 6.3.1? (well, making it the only Emboss version that works with Biopython because of that 6.4.0 bug). - Other than those two, everything's tip top :). Regards, Wibowo Arindrarto (bow) http://bow.web.id On Tue, Aug 2, 2011 at 20:03, Peter Cock wrote: > On Tue, Aug 2, 2011 at 6:57 PM, Peter Cock > wrote: > > On Tue, Aug 2, 2011 at 5:53 PM, Wibowo Arindrarto > > wrote: > >> Hi Peter, > >> I noticed that bug was because I did not add the _bytes_to_string() > >> converter for a data type. I already fixed this with my latest push, > adding > >> the appropriate if clause at AbiIO.py:293-294. > >> Regards, > > > > Was that only half the fix? This made it work for me: > > > > > https://github.com/peterjc/biopython/commit/8fc1e141173a735740f91a1338a3fbb747fa2a45 > > > > and: > > > > > https://github.com/peterjc/biopython/commit/a44e206e482ca5904b395aaca3576a232769ce2e > > > > Peter > > > > Could you test this branch, which I think is ready to be merged to the > trunk now: > > https://github.com/peterjc/biopython/tree/seqio-abi > > Thanks, > > Peter > From macrozhu at gmail.com Wed Aug 3 13:47:07 2011 From: macrozhu at gmail.com (Hongbo Zhu) Date: Wed, 3 Aug 2011 15:47:07 +0200 Subject: [Biopython-dev] inconsistent return values Bio.PDB.NeighborSearch.search() Message-ID: Hi, python-developers, In the current version of BioPython (source code as of 3 Aug. 
2011), it seems the outcome of *Bio.PDB.NeighborSearch.search()* is inconsistent if different levels are specified when the returned list is empty, e.g. > ns.search(center, radius, 'A') > [] > ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S' > IndexError: list index out of range Obviously, this is because the Bio.PDB.NeighborSearch.search() function tries to convert the returned list to levels other than 'A' using function Bio.PDB.Selection.unfold_entities() (see line 92 in NeighborSearch.py). In function unfold_entities(), the first element of input argument entity_list is evaluated without entity_list being checked for emptiness (see line 47 in Selection.py). An IndexError is raised when entity_list is empty. So, I think either the length of the returned list in Bio.PDB.NeighborSearch.search() should be checked before invoking Bio.PDB.Selection.unfold_entities(), or the function Bio.PDB.Selection.unfold_entities() should be revised so that it simply returns an empty list if the argument entity_list is empty. I prefer the latter solution because this would also fix other similar situations when Bio.PDB.Selection.unfold_entities() is invoked in other functions. And it seems "Sorry, entering bugs into the product Biopython has been disabled." regards, Hongbo Zhu From p.j.a.cock at googlemail.com Wed Aug 3 13:58:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 3 Aug 2011 14:58:13 +0100 Subject: [Biopython-dev] inconsistent return values Bio.PDB.NeighborSearch.search() In-Reply-To: References: Message-ID: On Wed, Aug 3, 2011 at 2:47 PM, Hongbo Zhu wrote: > > And it seems "Sorry, entering bugs into the product Biopython has been > disabled." We moved from Bugzilla to Redmine; links on the main homepage were updated: http://redmine.open-bio.org/projects/biopython I wonder if we can change that message text or something...
Peter From p.j.a.cock at googlemail.com Wed Aug 3 14:04:46 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 3 Aug 2011 15:04:46 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto wrote: > Hi Peter, > My bad, I forgot to change that one line and didn't test before comitting. > Thanks for fixing it. > I've ran the tests on your tree on py2.6.5 and py3.1.2, here are the > results: > - On both py2.6.5 ?and py3.1.2, I have the following test case error: > "NameError: global name 'embossversion' is not defined", on line 257. >... It was simpler than that - I'd checked it in with a typo, emboss_version was what I wanted. Sorry about that confusion! > - On a related note, I noticed you set the minimum Emboss requirement to > 6.1.0 patch 3. I'm not sure if this the one I use previously, but my > previous Emboss 6.1.0 installation failed to extract the proper quality > values. Perhaps we should set the minimum version to 6.3.1? (well, making it > the only Emboss version that works with Biopython because of that 6.4.0 > bug). We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later, which is why that requirement exists. Asking for at least EMBOSS 6.3.1 makes no practical difference as far as I can see. If you meant require EMBOSS 6.4.1 that hasn't been released yet. I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after I've tested the proposed patch Peter Rice sent), but that will still report itself as EMBOSS 6.4.0 (based on past patch behaviour, something I consider annoying but have to live with). > - Other than those two, everything's tip top :). > Great. I've pushed the code to the main repository, and have just set off the buildbot slaves as a final sanity test. 
This revealed a minor Python 2.4 breakage (not a big issue - it only seems to be me still trying to keep testing this - and I'm about ready to give up), and another probable EMBOSS bug in an older version installed on one buildslave. Congratulations, your code will be in the next Biopython release. Thank you, Peter From redmine at redmine.open-bio.org Wed Aug 3 14:52:32 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 3 Aug 2011 14:52:32 +0000 Subject: [Biopython-dev] [Biopython - Bug #3276] (New) inconsistent returns of Bio.PDB.NeighborSearch.search() Message-ID: Issue #3276 has been reported by Hongbo Zhu. ---------------------------------------- Bug #3276: inconsistent returns of Bio.PDB.NeighborSearch.search() https://redmine.open-bio.org/issues/3276 Author: Hongbo Zhu Status: New Priority: Normal Assignee: Category: Target version: URL: In the current version of BioPython (source code as of 3 Aug. 2011), it seems the outcome of Bio.PDB.NeighborSearch.search() is inconsistent if different levels are specified when the returned list is empty, e.g. @ ns.search(center, radius, 'A') [] ns.search(center, radius, 'R') # similar for levels 'C', 'M', 'S' IndexError: list index out of range @ Obviously, this is because the Bio.PDB.NeighborSearch.search() function tries to convert the returned list to levels other than 'A' using function Bio.PDB.Selection.unfold_entities() (see line 92 in NeighborSearch.py). In function unfold_entities(), the first element of input argument entity_list is evaluated without entity_list being checked for emptiness (see line 47 in Selection.py). An IndexError is raised when entity_list is empty. So, I think either the length of the returned list in Bio.PDB.NeighborSearch.search() should be checked before invoking Bio.PDB.Selection.unfold_entities(), or the function Bio.PDB.Selection.unfold_entities() should be revised so that it simply returns an empty list if the argument entity_list is empty.
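(A minimal sketch of the empty-list guard described above, for illustration only. It does not import Biopython: unfold_or_empty is a hypothetical wrapper, and naive_unfold is a stand-in that merely mimics the reported behaviour of looking at entity_list[0] without checking for emptiness.)

```python
def naive_unfold(entity_list, target_level):
    """Stand-in for an unfold function that inspects the first element."""
    first = entity_list[0]  # raises IndexError when entity_list is empty
    return [(target_level, entity) for entity in entity_list]


def unfold_or_empty(entity_list, target_level, unfold=naive_unfold):
    """Return [] for an empty input instead of letting IndexError escape.

    Hypothetical wrapper; `unfold` is whatever function does the real work.
    """
    if not entity_list:
        return []
    return unfold(entity_list, target_level)
```

Placing the check inside the unfolding function itself, rather than in each caller, is what would make every call site benefit at once.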
I prefer the latter solution because this would also fix other similar situations when Bio.PDB.Selection.unfold_entities() is invoked in other functions. cheers, hongbo ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From w.arindrarto at gmail.com Wed Aug 3 15:11:13 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 3 Aug 2011 17:11:13 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, On Wed, Aug 3, 2011 at 16:04, Peter Cock wrote: > On Wed, Aug 3, 2011 at 1:14 PM, Wibowo Arindrarto > wrote: > > Hi Peter, > > My bad, I forgot to change that one line and didn't test before > comitting. > > Thanks for fixing it. > > I've ran the tests on your tree on py2.6.5 and py3.1.2, here are the > > results: > > - On both py2.6.5 and py3.1.2, I have the following test case error: > > "NameError: global name 'embossversion' is not defined", on line 257. > >... It was simpler than that - I'd checked it in with a typo, emboss_version > was what I wanted. Sorry about that confusion! Silly me, I should've noticed you used emboss_version when I was looking at the code checking Emboss dependency :/. > > - On a related note, I noticed you set the minimum Emboss requirement to > > 6.1.0 patch 3. I'm not sure if this the one I use previously, but my > > previous Emboss 6.1.0 installation failed to extract the proper quality > > values. Perhaps we should set the minimum version to 6.3.1? (well, making > it > > the only Emboss version that works with Biopython because of that 6.4.0 > > bug). > > We test a lot of FASTQ stuff which requires 6.1.0 patch 3 or later, > which is why that requirement exists. 
Asking for at least EMBOSS > 6.3.1 makes no practical difference as far as I can see. > > If you meant require EMBOSS 6.4.1 that hasn't been released yet. > > I'm expecting them to release EMBOSS 6.4.0 patch 1 soon (after > I've tested the proposed patch Peter Rice sent), but that will still > report itself as EMBOSS 6.4.0 (based on past patch behaviour, > something I consider annoying but have to live with). I meant Emboss 6.3.1, since that seems to be one that works best with the current AbiIO implementation. But yeah, I guess as long as the tests work it's fine. > > - Other than those two, everything's tip top :). > > > > Great. I've pushed the code to the main repository, and have > just set off the buildbot slaves as a final sanity test. > > This reveal a minor Python 2.4 breakage (not a big issue - it only > seems to be me still trying to keep testing this - and I'm about > ready to give up), and another probable EMBOSS bug in an > older version installed on one buildslave. > > Congratulations, your code will be in the next Biopython release. > > Thank you, > > Peter > This really made my day :)! You're welcome and thank you reviewing my code, too! Regards, --- Wibowo Arindrarto (bow) http://bow.web.id From w.arindrarto at gmail.com Thu Aug 4 11:30:44 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 4 Aug 2011 13:30:44 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, Ah yes, I didn't know there could be handles without .seek() and .tell(), and I thought those two are the proper way of traversing files, so I used them. I also didn't realize you could use SeqIO with network handles, too. This is really neat :). In any case, sure, I'd love to make some changes to the current AbiIO code so it works without .seek() and .tell(). Is there any other input types that does not use .seek() and .tell() other than network handles? 
Here's my new branch from the current master: https://github.com/bow/biopython/tree/seqio-abi_handlefix, nothing different for now but I'll push my updates soon. Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Thu, Aug 4, 2011 at 13:03, Peter Cock wrote: > On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto > wrote: > > On Wed, Aug 3, 2011 at 16:04, Peter Cock > wrote: > >> ... > >> Congratulations, your code will be in the next Biopython release. > >> ... > > > > This really made my day :)! You're welcome and thank you reviewing my > code, > > too! > > I found something else to work on (sorry!). You're using seek and tell, > which > may not exist. Network handles are a good example of this situation. Try: > > from urllib import urlopen > from Bio import SeqIO > handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1") > record = SeqIO.read(handle, "abi") > handle.close() > > I've added some code to test_SeqIO.py to simulate this, which revealed that > the SFF parser was also using the tell method. In that case we must track > the > offset explicitly (it is needed for handling SFF index blocks). You can see > how > I did this here - note I avoid the overhead of tracking the offset in > general: > > https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc > > I've tried the same trick in the ABI parser, but this reveals your code > likes to > seek backwards. Try the attached patch against this revision to confirm > this. > > Having looked over your code, I don't believe you need to use seek and tell > at all. This isn't critical to fix right now, but I would like us to > solve it. Would > you like to try? Make a new branch from the current master for this please. 
> > Regards, > > Peter > From p.j.a.cock at googlemail.com Thu Aug 4 11:03:27 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 4 Aug 2011 12:03:27 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Wed, Aug 3, 2011 at 4:11 PM, Wibowo Arindrarto wrote: > On Wed, Aug 3, 2011 at 16:04, Peter Cock wrote: >> ... >> Congratulations, your code will be in the next Biopython release. >> ... > > This really made my day :)! You're welcome and thank you reviewing my code, > too! I found something else to work on (sorry!). You're using seek and tell, which may not exist. Network handles are a good example of this situation. Try: from urllib import urlopen from Bio import SeqIO handle = urlopen("http://biopython.org/SRC/biopython/Tests/Abi/310.ab1") record = SeqIO.read(handle, "abi") handle.close() I've added some code to test_SeqIO.py to simulate this, which revealed that the SFF parser was also using the tell method. In that case we must track the offset explicitly (it is needed for handling SFF index blocks). You can see how I did this here - note I avoid the overhead of tracking the offset in general: https://github.com/biopython/biopython/commit/9a3c44b28aae256b8da825c3c1553d71dbe329cc I've tried the same trick in the ABI parser, but this reveals your code likes to seek backwards. Try the attached patch against this revision to confirm this. Having looked over your code, I don't believe you need to use seek and tell at all. This isn't critical to fix right now, but I would like us to solve it. Would you like to try? Make a new branch from the current master for this please. Regards, Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tell_hack.patch Type: application/octet-stream Size: 1466 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Thu Aug 4 11:47:49 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 4 Aug 2011 12:47:49 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto wrote: > Hi Peter, > Ah yes, I didn't know there could be handles without .seek() and .tell(), > and I thought those two are the proper way of traversing files, so I used > them. I also didn't realize you could use SeqIO with network handles, too. > This is really neat :). Yes - having a handle focused API makes some clever stuff possible :) Of course, parsing sequences directly from network handles isn't always a good idea, but it can be useful. > In any case, sure, I'd love to make some changes to the current AbiIO code > so it works without .seek() and .tell(). Is there any other input types that > does not use .seek() and .tell() other than network handles? I suspect some specialised handles for accessing compressed files might have similar limitations. In the case of gzip at least, I think it does support seek and tell. > Here's my new branch from the current master: > https://github.com/bow/biopython/tree/seqio-abi_handlefix > nothing different for now but I'll push my updates soon. Don't rush yourself - I'm away for a long weekend so won't be testing any updates till next week anyway. Thanks, Peter From b.invergo at gmail.com Thu Aug 4 15:38:23 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 04 Aug 2011 17:38:23 +0200 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: References: <1312366681.1302.9.camel@localhost.localdomain> Message-ID: <1312472309.8916.15.camel@localhost.localdomain> Hi Peter, (I'm CCing this to the dev list for the info in the second paragraph) Thanks for the reply. I solved the Python2 problem by fixing my PYTHONPATH. 
Running the tests from the Tests directory couldn't find the Bio module due to a mistake in the PYTHONPATH, so I tried to run them from the parent directory, resulting in test failures. A dumb mistake but anyway it's fixed. Sorry for wasting your time with that. I still have the following error with Python 3.2, though, which prevents me from figuring out the leaked handle problem in Py3k: [brandon at brandon-linux Tests]$ python test_PAML_baseml.py Traceback (most recent call last): File "test_PAML_baseml.py", line 10, in from Bio.Phylo.PAML import baseml File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py", line 12, in from Bio.Phylo._io import parse, read, write, convert File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line 12, in from Bio.Phylo import BaseTree, NewickIO, NexusIO File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py", line 222 return u'%s(%s)' % (self.__class__.__name__, SyntaxError: invalid syntax Regarding that specific error, I think all strings are implicitly unicode in Python 3, aren't they? I don't have much experience with maintaing Py2/3 compatibility, though, so I don't know how to best handle this. Searching for the unicode operator (u') in the entire Bio file tree shows that it only exists in Phylo/PhyloXML.py and Phylo/BaseTree.py. -brandon On Wed, 2011-08-03 at 13:33 +0100, Peter Cock wrote: > On Wed, Aug 3, 2011 at 11:18 AM, Brandon Invergo wrote: > > Hi Peter, > > I'm still in the process of looking at them now but I'm running into a > > side issue that maybe you can help with. I've tried running the unit > > tests myself using both Python 2.7.2 and Python 3.2.1, the two versions > > I have, and both times it fails. > > Python 3 takes a bit more effort to debug due to the 2to3 thing > and different paths - so I'd focus on Python 2.7 initially. 
> > > Just looking at test_PAML_baseml.py, for example, with Python 2 I get a > > lot of test failures due to baseml.py now (correctly) throwing IOErrors > > rather than AttributeErrors or TypeErrors. With Python 3, on the other > > hand, I get syntax errors in BaseTree.py (I'll include the output of > > both below). I did a git pull upstream master before doing this, so my > > code should be up-to-date (it seems like the unit tests are out-of-date, > > re: the error types). Now, clearly these have passed on the build > > machine so I'm wondering what I could be doing wrong. Being able to > > replicate the test failures in Python 3 on my machine will really help > > in fixing them. > > Sorry about the probable-newbie question... > > What does "git status" give you? > > My usual routine is as follows, but I clone from the official repository > (which is therefore called origin), and have my personal one setup > as peterjc via "git remote add ...": > > git checkout master #if not there already > git fetch origin > git status #should say behind and can FF merge > git merge origin/master #should now have latest code > > I'm guessing you're working from a clone of your github repo? > > An easy thing to try is a fresh clone of the official biopython. > > The other key point is all the unit tests expect the current > directory to be the Tests directory NOT the parent directory > where setup.py lives. > > Note if you just do "python test_PAML_baseml.py" this will > pickup the installed Biopython (via PYTHONPATH etc). > > One option is "runtests.py test_PAML_baseml.py" which > will use the local code for you. > > If you do "python Tests/test_PAML_baseml.py" this should > pickup the source code for Biopython (won't work for any > compiled modules IIRC). > > Peter -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 490 bytes Desc: This is a digitally signed message part URL:
From p.j.a.cock at googlemail.com Thu Aug 4 15:59:42 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 4 Aug 2011 16:59:42 +0100
Subject: [Biopython-dev] Leaked handles in PAML unit tests
In-Reply-To: <1312472309.8916.15.camel@localhost.localdomain>
References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain>
Message-ID:

On Thu, Aug 4, 2011 at 4:38 PM, Brandon Invergo wrote:
> Hi Peter,
> (I'm CCing this to the dev list for the info in the second paragraph)
> Thanks for the reply. I solved the Python2 problem by fixing my
> PYTHONPATH. Running the tests from the Tests directory couldn't find the
> Bio module due to a mistake in the PYTHONPATH, so I tried to run them
> from the parent directory, resulting in test failures. A dumb mistake
> but anyway it's fixed. Sorry for wasting your time with that.

No problem - learning about paths and imports is a bit tricky.

> I still have the following error with Python 3.2, though, which prevents
> me from figuring out the leaked handle problem in Py3k:
> [brandon@brandon-linux Tests]$ python test_PAML_baseml.py
> Traceback (most recent call last):
>   File "test_PAML_baseml.py", line 10, in
>     from Bio.Phylo.PAML import baseml
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/__init__.py",
> line 12, in
>     from Bio.Phylo._io import parse, read, write, convert
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/_io.py", line
> 12, in
>     from Bio.Phylo import BaseTree, NewickIO, NexusIO
>   File "/home/brandon/Projects/pypaml/biopython/Bio/Phylo/BaseTree.py",
> line 222
>     return u'%s(%s)' % (self.__class__.__name__,
>
> SyntaxError: invalid syntax

Hang on - that looks like you ran it with "python" meaning Python 2.x

Working with Python 3 the following should "just work":

    cd /home/brandon/Projects/pypaml/biopython
    python3 setup.py build
    python3 setup.py test
    python3 setup.py install #Use sudo or --prefix etc if you want

However, if you want to run the offline test only, you need to go into the Python3 converted Tests directory, not the unconverted Python2 Tests directory. Note that this is Biopython specific (but based on what NumPy does). e.g.

    cd /home/brandon/Projects/pypaml/biopython
    python3 setup.py build
    cd build/py3.2/Tests
    python3 run_tests.py --offline

Likewise if you want to test just one module,

    cd /home/brandon/Projects/pypaml/biopython
    python3 setup.py build
    cd build/py3.2/Tests
    python3 run_tests.py test_PAML_baseml.py

In the above, run_tests.py should take care of the path settings to ensure the freshly built Biopython is used (not whatever old version may be installed elsewhere). If the above works nicely for you, stick with that.

Alternatively, I often just install in-development versions of Biopython on my personal machine under my home directory (where Python 3 was also installed using the --prefix option so I don't need to mess about with the PYTHONPATH):

    cd /home/brandon/Projects/pypaml/biopython
    python3 setup.py install --prefix=$HOME
    cd build/py3.2/Tests
    python3 test_PAML_baseml.py

If your Python 3 is installed at system level you can do this but it isn't very clean (certainly don't do it on a shared machine):

    cd /home/brandon/Projects/pypaml/biopython
    sudo python3 setup.py install
    cd build/py3.2/Tests
    python3 test_PAML_baseml.py

Alternatively if your Python 3 is at the system level you can install Biopython under your home directory but then you have to mess about with PYTHONPATH and keep changing it for Python2 vs Python3, since they use the same variable (a design choice I fail to see any advantages in).
Confusing isn't it? There are other potential solutions to having multiple copies of Python installed, like using virtualenv... Peter From p.j.a.cock at googlemail.com Thu Aug 4 17:32:38 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 4 Aug 2011 18:32:38 +0100 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: <1312478530.8916.20.camel@localhost.localdomain> References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain> <1312478530.8916.20.camel@localhost.localdomain> Message-ID: On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo wrote: > > The above does work nicely for me. So nicely, in fact, that the PAML > tests all pass! So I'm still having trouble replicating the leaked > handles. I'm still trying to figure out what's happening... > It could be something silly with warning silencing being global and not local, and thus depends on the order the tests are run in. Did you try running all the (offline) tests in one go under Python 3.2? Peter From b.invergo at gmail.com Thu Aug 4 18:21:59 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 04 Aug 2011 20:21:59 +0200 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain> <1312478530.8916.20.camel@localhost.localdomain> Message-ID: <1312482121.8916.22.camel@localhost.localdomain> Ok, now I've got the errors. Now I can actually get to work. Thanks for your help with this. I had no idea about the special Py3 building (I've just been using the raw tests from the repository) I'll see what I can do now. -brandon On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote: > On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo wrote: > > > > The above does work nicely for me. So nicely, in fact, that the PAML > > tests all pass! So I'm still having trouble replicating the leaked > > handles. 
I'm still trying to figure out what's happening... > > > > It could be something silly with warning silencing being global > and not local, and thus depends on the order the tests are run in. > > Did you try running all the (offline) tests in one go under Python 3.2? > > Peter From b.invergo at gmail.com Fri Aug 5 13:58:27 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 05 Aug 2011 15:58:27 +0200 Subject: [Biopython-dev] Leaked handles in PAML unit tests In-Reply-To: References: <1312366681.1302.9.camel@localhost.localdomain> <1312472309.8916.15.camel@localhost.localdomain> <1312478530.8916.20.camel@localhost.localdomain> Message-ID: <1312552714.8916.28.camel@localhost.localdomain> Ok the leaks have been taken care of. The problem arises when an exception is raised within a block of text in which a file handle is currently open. I simply had to close the handle just before raising the exception. There was another one, however, that came up from using stdout=open('/dev/null', 'w') in the subprocess.call() to PAML programs (which, come to think of it, is *nix-specific anyway, and probably wouldn't work with Windows). Instead, I set stdout to a subprocess.PIPE and get rid of the /dev/null handle altogether. Cheers, Brandon On Thu, 2011-08-04 at 18:32 +0100, Peter Cock wrote: > On Thu, Aug 4, 2011 at 6:22 PM, Brandon Invergo wrote: > > > > The above does work nicely for me. So nicely, in fact, that the PAML > > tests all pass! So I'm still having trouble replicating the leaked > > handles. I'm still trying to figure out what's happening... > > > > It could be something silly with warning silencing being global > and not local, and thus depends on the order the tests are run in. > > Did you try running all the (offline) tests in one go under Python 3.2? 
> > Peter From w.arindrarto at gmail.com Sat Aug 6 09:52:13 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 6 Aug 2011 11:52:13 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter & everyone, I've been trying to improve the parser so it works with forward-only handles, but I'm drawing a blank for now. I realized the reason I use seek in the first place was because of the file structure. In an Abi file we've got three data blocks: the header that contains the file information, the sequencing data, and the directories which serve as indexes to the sequencing data. To unpack the sequencing data bytes, we need the information stored in the directories. Depending on its size, it could be stored outside the directories block, or in the directory itself. This is why .seek() helps, because it allows for jumping between the directories and the sequencing data as it is being parsed. Now, I thought the three blocks were stored in this order: header - directory - sequencing data. I've thought of a way of parsing the file if the structure is like this. As it turns out, it's possible (or even this might be the norm) that the order is: header - sequencing data - directory. So as soon as I finished parsing the information on how to retrieve the data from the directories, I've already gone past the data block. In forward-only handles, this makes the data irretrievable. There should be other ways to retrieve the sequencing data in forward-only handles. I thought about reading the entire handle stream first and storing it into a variable. This way, we could replace seek() with slicing operators. The trade off is we store the entire handle stream in memory at once (abi files are probably ~300-500kb in size). I'm sure there are other ways, but I couldn't think of any now. So what do you think? Or maybe anyone else have ideas that I could try? 
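The in-memory idea described above (read the whole stream once, then move around inside the copy) can be sketched as follows. This is a rough illustration in modern Python 3 and not the AbiIO code: ReadOnlyStream and buffer_handle are made-up names, and the 12-byte layout is a toy example rather than the real ABI file structure.

```python
import io


class ReadOnlyStream:
    """Toy forward-only handle: exposes read() but no seek() or tell()."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def read(self, size=-1):
        return self._inner.read(size)


def buffer_handle(handle):
    """Read the whole stream into memory and return a seekable BytesIO."""
    return io.BytesIO(handle.read())


# Made-up layout, not the real ABI format: a 4-byte header, a 4-byte data
# block, then a 4-byte "directory" stored at the end of the file.
raw = b"HEAD" + b"DATA" + b"DIR!"
buffered = buffer_handle(ReadOnlyStream(raw))

buffered.seek(8)              # jump forward to the trailing directory
directory = buffered.read(4)
buffered.seek(4)              # then seek backwards to the data block
payload = buffered.read(4)
```

The cost is the one already mentioned: the entire file sits in memory at once, which is probably acceptable for files of a few hundred kilobytes.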
Regards & have a nice weekend all, --- Wibowo Arindrarto (bow) http://bow.web.id On Thu, Aug 4, 2011 at 13:47, Peter Cock wrote: > On Thu, Aug 4, 2011 at 12:30 PM, Wibowo Arindrarto > wrote: > > Hi Peter, > > Ah yes, I didn't know there could be handles without .seek() and .tell(), > > and I thought those two are the proper way of traversing files, so I used > > them. I also didn't realize you could use SeqIO with network handles, > too. > > This is really neat :). > > Yes - having a handle focused API makes some clever stuff possible :) > Of course, parsing sequences directly from network handles isn't always > a good idea, but it can be useful. > > > In any case, sure, I'd love to make some changes to the current AbiIO > code > > so it works without .seek() and .tell(). Is there any other input types > that > > does not use .seek() and .tell() other than network handles? > > I suspect some specialised handles for accessing compressed files might > have similar limitations. In the case of gzip at least, I think it does > support > seek and tell. > > > Here's my new branch from the current master: > > https://github.com/bow/biopython/tree/seqio-abi_handlefix > > nothing different for now but I'll push my updates soon. > > Don't rush yourself - I'm away for a long weekend so won't be testing > any updates till next week anyway. > > Thanks, > > Peter > From derjogi at web.de Sun Aug 7 13:44:03 2011 From: derjogi at web.de (Jogi) Date: Sun, 07 Aug 2011 15:44:03 +0200 Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map') + correction Message-ID: <1312724643.2148.5.camel@JogiDesk> I'm new to the field of 'bug reporting', so please, if someone knows where I should post this message please tell me or do it yourself :) I've found a bug in the Bio.Restriction module when calling Analysis.print_as('map'). The bugs (that I know of and that I corrected): 1. 
When there is a restriction site within the first 60 basepairs in the sequence this one isn't added to a list and thus raises a KeyError: 0
2. Sometimes (I don't know exactly how to reproduce it any more) an Enzyme is repeated in every line although there is no restriction site.

Solution: Replace from line 310 in PrintFormat.py:

    x, counter, length = 0, 0, len(self.sequence)
    for x in xrange(60, length, 60):
        counter = x - 60
        l=[]
        for key in mapping:
            if key <= x:
                l.append(key)
            else:
                cutloc[counter] = l
                mapping = mapping[mapping.index(key):]
                break
        cutloc[x] = l
    cutloc[x] = mapping
    sequence = self.sequence.tostring()

With

    upper, lower, length = 0, 0, len(self.sequence)
    for upper in xrange(60, length+60, 60):
        lower = upper - 60
        l=[]
        for key in mapping:
            if key <= upper and key > lower:
                l.append(key)
            else:
                mapping = mapping[mapping.index(key):]
                break
        cutloc[lower] = l
    sequence = self.sequence.tostring()

Hope this bug report/solution was/is helpful and at the right place :) J.Kuhn
From p.j.a.cock at googlemail.com Tue Aug 9 13:40:18 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Aug 2011 14:40:18 +0100 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: On Sat, Aug 6, 2011 at 10:52 AM, Wibowo Arindrarto wrote: > Hi Peter & everyone, > I've been trying to improve the parser so it works with forward-only > handles, but I'm drawing a blank for now. > I realized the reason I use seek in the first place was because of the file > structure. In an Abi file we've got three data blocks: the header that > contains the file information, the sequencing data, and the directories > which serve as indexes to the sequencing data. To unpack the sequencing data > bytes, we need the information stored in the directories. Depending on its > size, it could be stored outside the directories block, or in the directory > itself.
This is why .seek() helps, because it allows for jumping between the > directories and the sequencing data as it is being parsed. Yes - this design makes sense, especially given the computer capabilities back when the format was designed. > Now, I thought the three blocks were stored in this order: header - > directory - sequencing data. I've thought of a way of parsing the file if > the structure is like this. As it turns out, it's possible (or even this > might be the norm) that the order is: header - sequencing data - directory. > So as soon as I finished parsing the information on how to retrieve the data > from the directories, I've already gone past the data block. In forward-only > handles, this makes the data irretrievable. I see now, that is unfortunate. I presume the current order was chosen to make writing the data easy (do the directory last). A simple forward only parser would be possible IF the data was reordered, but we can't require that. > There should be other ways to retrieve the sequencing data in forward-only > handles. I thought about reading the entire handle stream first and storing > it into a variable. This way, we could replace seek() with slicing > operators. The trade off is we store the entire handle stream in memory at > once (abi files are probably ~300-500kb in size). I'm sure there are other > ways, but I couldn't think of any now. > So what do you think? Or maybe anyone else have ideas that I could try? > Regards & have a nice weekend all, I think we have to accept that typical ABI files are not suitable for forward only parsing. Thanks for looking into this - I hope you found it interesting. Regards, Peter From redmine at redmine.open-bio.org Tue Aug 9 14:29:53 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 9 Aug 2011 14:29:53 +0000 Subject: [Biopython-dev] [Biopython - Bug #3278] (New) SeqIO tries to use Gapped without import Message-ID: Issue #3278 has been reported by Paul Agapow.
---------------------------------------- Bug #3278: SeqIO tries to use Gapped without import https://redmine.open-bio.org/issues/3278 Author: Paul Agapow Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.53 URL: @to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file). Solution: @from Bio.Alphabet import Gapped@ ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Aug 9 14:29:54 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 9 Aug 2011 14:29:54 +0000 Subject: [Biopython-dev] [Biopython - Bug #3278] (New) SeqIO tries to use Gapped without import Message-ID: Issue #3278 has been reported by Paul Agapow. ---------------------------------------- Bug #3278: SeqIO tries to use Gapped without import https://redmine.open-bio.org/issues/3278 Author: Paul Agapow Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.53 URL: @to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file).
Solution: @from Bio.Alphabet import Gapped@ -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Aug 9 14:47:22 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 9 Aug 2011 14:47:22 +0000 Subject: [Biopython-dev] [Biopython - Bug #3278] SeqIO tries to use Gapped without import References: Message-ID: Issue #3278 has been updated by Peter Cock. Looking at Biopython 1.53 (December 2009) you appear to be correct. However, the function was explicitly made obsolete in Biopython 1.54 (with a deprecation warning), and at that point this error did not exist. Unless there is a related problem in the current release, I will close this report. Thanks. ---------------------------------------- Bug #3278: SeqIO tries to use Gapped without import https://redmine.open-bio.org/issues/3278 Author: Paul Agapow Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.53 URL: @to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@) but does not actually import @Gapped@. Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file). Solution: @from Bio.Alphabet import Gapped@ -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Tue Aug 9 14:49:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 9 Aug 2011 15:49:30 +0100 Subject: [Biopython-dev] Bug in Bio.Restriction.Analysis.print_as('map') + correction In-Reply-To: <1312724643.2148.5.camel@JogiDesk> References: <1312724643.2148.5.camel@JogiDesk> Message-ID: On Sun, Aug 7, 2011 at 2:44 PM, Jogi wrote: > I'm new to the field of 'bug reporting', so please, if someone knows > where I should post this message please tell me or do it yourself :) > > I've found a bug in the Bio.Restriction module when calling > Analysis.print_as('map'). > > The bugs (that I know of and that I corrected): > 1. When there is a restriction site within the first 60 basepairs in the > sequence this one isn't added to a list and thus raises an KeyError: 0 Could you give a short example script showing the problem? It could then be used for a unit test. > 2. Sometimes (I don't know exactly how to reproduce it any more) an > Enzyme is repeated in every line although there is no restriction site. I'm not familiar with that problem - without an example that will be hard to look into. Peter From w.arindrarto at gmail.com Tue Aug 9 14:59:37 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 9 Aug 2011 16:59:37 +0200 Subject: [Biopython-dev] SeqIO Abi Parser In-Reply-To: References: Message-ID: Hi Peter, You're welcome :)! Although a bit disappointing, it was nice when I understood why my forward parser didn't work. Regards, --- Wibowo Arindrarto (bow) http://bow.web.id On Tue, Aug 9, 2011 at 15:40, Peter Cock wrote: > On Sat, Aug 6, 2011 at 10:52 AM, Wibowo Arindrarto > wrote: > > Hi Peter & everyone, > > I've been trying to improve the parser so it works with forward-only > > handles, but I'm drawing a blank for now. 
> > I realized the reason I use seek in the first place was because of the > file > > structure. In an Abi file we've got three data blocks: the header that > > contains the file information, the sequencing data, and the directories > > which serve as indexes to the sequencing data. To unpack the sequencing > data > > bytes, we need the information stored in the directories. Depending on > its > > size, it could be stored outside the directories block, or in the > directory > > itself. This is why .seek() helps, because it allows for jumping between > the > > directories and the sequencing data as it is being parsed. > > Yes - this design makes sense, especially given the computer > capabilities back when the format was designed. > > > Now, I thought the three blocks were stored in this order: header - > > directory - sequencing data. I've thought of a way of parsing the file if > > the structure is like this. As it turns out, it's possible (or even this > > might be the norm) that the order is: header - sequencing data - > directory. > > So as soon as I finished parsing the information on how to retrieve the > data > > from the directories, I've already gone past the data block. In > forward-only > > handles, this makes the data irretrievable. > > I see now, that is unfortunate. I presume the current order was chosen > to make writing the data easy (do the directory last). A simple forward > only parser would be possible IF the data was reordered, but we can't > require that. > > > There should be other ways to retrieve the sequencing data in > forward-only > > handles. I thought about reading the entire handle stream first and > storing > > it into a variable. This way, we could replace seek() with slicing > > operators. The trade off is we store the entire handle stream in memory > at > > once (abi files are probably ~300-500kb in size). I'm sure there are > other > > ways, but I couldn't think of any now. > > So what do you think? 
Or maybe anyone else have ideas that I could try? > > Regards & have a nice weekend all, > > I think we have to accept that typical ABI files are not suitable for > forward > only parsing. Thanks for looking into this - I hope you found it > interesting. > > Regards, > > Peter > From redmine at redmine.open-bio.org Tue Aug 9 15:48:06 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 9 Aug 2011 15:48:06 +0000 Subject: [Biopython-dev] [Biopython - Bug #3278] (Closed) SeqIO tries to use Gapped without import References: Message-ID: Issue #3278 has been updated by Peter Cock. Status changed from New to Closed % Done changed from 0 to 100 I realised this deprecated function was due for removal, it will be gone in Biopython 1.58, https://github.com/biopython/biopython/commit/9eb934ee0425b4636b26f310a0f1454f53745b17 Marking this bug as closed. ---------------------------------------- Bug #3278: SeqIO tries to use Gapped without import https://redmine.open-bio.org/issues/3278 Author: Paul Agapow Status: Closed Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: 1.53 URL: @to_alignment@ in @SeqIO@ uses @Gapped@ (@ isinstance(alphabet, Gapped)@) but does not actually import @Gapped at . Thus a @NameError@ results. Although the method is labelled obsolete, it is used by @SeqIO@ in write when an @AlignIO@ writer must be used (e.g. when trying to write sequences to a Nexus file). Solution: @from Bio.Alphabet import Gapped@ -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Wed Aug 10 17:12:25 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 10 Aug 2011 18:12:25 +0100 Subject: [Biopython-dev] pypaml In-Reply-To: References: Message-ID: On Fri, Jan 14, 2011 at 2:11 PM, Brandon Invergo wrote: >> By the way, have you ever tried using this under Windows? > > I haven't yet but by the looks of it it should work fine assuming the > programs are in the system path and thus can be called by name from > any location in the file system. I see one line where I accidentally > made it *nix-specific (default working directory is "./") but other > than that, all files/directories are located via os.path or by > user-inputted strings (as they would be in the control file). I have > both a Linux and a Windows 7 machine at home though so I can do some > testing. Obviously the unit tests here will help catch system-specific > errors such as entering file locations incorrectly (I can see a few > exceptions that I'm currently not handling). Hi Brandon, Have you looked into PAML under Windows yet? Regards, Peter From b.invergo at gmail.com Wed Aug 10 17:16:08 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 10 Aug 2011 19:16:08 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: Message-ID: <1312996570.1339.12.camel@localhost.localdomain> On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote: > Hi Brandon, > > Have you looked into PAML under Windows yet? > > Regards, > > Peter Hi Peter, Unfortunately, I don't have a Windows machine at my disposal to test it on! Has anyone reported any problems yet? 
-brandon From p.j.a.cock at googlemail.com Thu Aug 11 11:36:41 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Aug 2011 12:36:41 +0100 Subject: [Biopython-dev] pypaml In-Reply-To: <1312996570.1339.12.camel@localhost.localdomain> References: <1312996570.1339.12.camel@localhost.localdomain> Message-ID: On Wed, Aug 10, 2011 at 6:16 PM, Brandon Invergo wrote: > On Wed, 2011-08-10 at 18:12 +0100, Peter Cock wrote: >> Hi Brandon, >> >> Have you looked into PAML under Windows yet? >> >> Regards, >> >> Peter > > Hi Peter, > Unfortunately, I don't have a Windows machine at my disposal to test it > on! Has anyone reported any problems yet? > > -brandon Hi Brandon, It's a shame you don't still have access to the Windows 7 box. I've just grabbed the current PAML 4.4 pre-compiled for Windows and put it on my Windows machine which runs as a buildslave, and put the binaries on the PATH: http://abacus.gene.ucl.ac.uk/software/paml.html http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz None of the current unit tests actually use the binaries do they? Could you add a basic test (in a separate file which raises the missing dependency exception to skip the test if the binary is not on the path) for calling the tools? Peter From b.invergo at gmail.com Thu Aug 11 11:51:26 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 11 Aug 2011 13:51:26 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: <1312996570.1339.12.camel@localhost.localdomain> Message-ID: <1313063488.1339.28.camel@localhost.localdomain> On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote: > It's a shame you don't still have access to the Windows 7 box. 
> > I've just grabbed the current PAML 4.4 pre-compiled for Windows > and put it on my Windows machine which runs as a buildslave, > and put the binaries on the PATH: > > http://abacus.gene.ucl.ac.uk/software/paml.html > http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz > > None of the current unit tests actually use the binaries do they? > Could you add a basic test (in a separate file which raises the > missing dependency exception to skip the test if the binary is > not on the path) for calling the tools? > > Peter No, I didn't include any tests that use the binaries because I wasn't sure if they would be on the main test machine. Also, generating the output which is used in other tests can take a lot of time in some cases. Instead, I've generated the output files myself and then accessed those from the tests. The one problem I have with this approach is that it's not very reproducible; if someone else wishes to add data files from later versions of PAML, they won't know how I generated them. Again the goal is to make sure that we're parsing each new version correctly, since the output format has been known to change between versions. I could create a readme file which contains the info and put it in the paml Tests subfolder. Sound reasonable? I can create a Tests/test_PAML.py file to contain the proposed test. In it, I can try to run codeml, baseml and yn00 directly using Subprocess, each on some bogus input. If the binaries are there, they'll throw an error which the test will catch. If they aren't Subprocess itself will throw an error. I can't do this check using Bio.Phylo.PAML because we, of course, aim to prevent bogus input from ever even reaching the binary. How does that sound? Is that what you had in mind? 
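The check proposed above could be sketched roughly as follows. This is a minimal illustration, not the test that was eventually written: `MissingBinaryError`, `find_binaries` and `binary_starts` are hypothetical names, the exception is only a stand-in for whatever "missing dependency" error the test framework uses to skip a test, and how the PAML programs behave when started with no arguments is not established here.

```python
import shutil
import subprocess


class MissingBinaryError(Exception):
    """Hypothetical stand-in for a 'missing external dependency' error."""


def find_binaries(names, which=shutil.which):
    """Return {name: path} for each tool, or raise if any is not on the PATH."""
    found = {}
    for name in names:
        path = which(name)
        if path is None:
            raise MissingBinaryError("%s not found on the PATH" % name)
        found[name] = path
    return found


def binary_starts(path, timeout=30):
    """Return True if the program at `path` can be launched at all.

    The program gets an empty stdin so that an interactive prompt cannot
    block the test; its exit status is deliberately ignored.
    """
    try:
        subprocess.run(
            [path],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers "file not found" and "not executable".
        return False
    return True
```

A test module would call `find_binaries(["codeml", "baseml", "yn00"])` once at import time and let the exception propagate, so that the whole file is skipped when the tools are absent.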
-brandon From p.j.a.cock at googlemail.com Thu Aug 11 13:49:39 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 11 Aug 2011 14:49:39 +0100 Subject: [Biopython-dev] pypaml In-Reply-To: <1313063488.1339.28.camel@localhost.localdomain> References: <1312996570.1339.12.camel@localhost.localdomain> <1313063488.1339.28.camel@localhost.localdomain> Message-ID: On Thu, Aug 11, 2011 at 12:51 PM, Brandon Invergo wrote: > On Thu, 2011-08-11 at 12:36 +0100, Peter Cock wrote: >> It's a shame you don't still have access to the Windows 7 box. >> >> I've just grabbed the current PAML 4.4 pre-compiled for Windows >> and put it on my Windows machine which runs as a buildslave, >> and put the binaries on the PATH: >> >> http://abacus.gene.ucl.ac.uk/software/paml.html >> http://abacus.gene.ucl.ac.uk/software/paml4.4e.tar.gz >> >> None of the current unit tests actually use the binaries do they? >> Could you add a basic test (in a separate file which raises the >> missing dependency exception to skip the test if the binary is >> not on the path) for calling the tools? >> >> Peter > > No, I didn't include any tests that use the binaries because I wasn't > sure if they would be on the main test machine. Also, generating the > output which is used in other tests can take a lot of time in some > cases. Instead, I've generated the output files myself and then accessed > those from the tests. The one problem I have with this approach is that > it's not very reproducible; if someone else wishes to add data files > from later versions of PAML, they won't know how I generated them. Next time there is a PAML release, you'll have to make some more test files ;) > Again > the goal is to make sure that we're parsing each new version correctly, > since the output format has been known to change between versions. I > could create a readme file which contains the info and put it in the > paml Tests subfolder. Sound reasonable? Yes. 
> I can create a Tests/test_PAML.py file to contain the proposed test. In
> it, I can try to run codeml, baseml and yn00 directly using Subprocess,
> each on some bogus input. If the binaries are there, they'll throw an
> error which the test will catch. If they aren't Subprocess itself will
> throw an error. I can't do this check using Bio.Phylo.PAML because we,
> of course, aim to prevent bogus input from ever even reaching the
> binary. How does that sound? Is that what you had in mind?

I believe we're thinking along the same lines here - have a look at
test_Muscle_tool.py or test_Emboss.py and others like it. There is
some header code which tries to locate the binaries, and perhaps
check their version.

Some tools have a switch like -v or --help or similar which makes
them exit immediately, sometimes with a version number. This is
less trouble than trying to run them with a dummy input file.

Having had a quick play with ds.exe, it generally seems to insist
on asking for an input file, so you may have to go that route.
But see if this is useful - you'd probably need /dev/null on Unix
machines:

C:\repositories\biopython\Tests>ds nul
results go into out.txt
(1) collecting min, max, and mean 0:00
(2) variance-covariance matrix 0:00
(3) median, percentiles & serial correlation 0:00
(4) Histograms and 1-D densities

If the binaries are missing or the wrong version, we raise
MissingExternalDependencyError and the test gets skipped.

If the binaries are present (and the right version), use the
normal unittest framework. Try to make the examples quick to
run (aim for well under a minute for the whole test), so use
smaller data files than might be typical.

Peter

From p.j.a.cock at googlemail.com Thu Aug 11 16:06:48 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 11 Aug 2011 17:06:48 +0100
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready to go?
Message-ID: Hi Tiago & Bartek, Looking over the DEPRECATED file, the following are about due for removal in Bio.PopGen and Bio.Motif - do you guys have time to make these changes yourselves? Thanks, Peter > Bio.PopGen.FDist > ================ > The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete > in Release 1.54, and deprecated in Release 1.55 final. Their functionality is > now available through a read() function. and: > Bio.Motif > ========= > ... > AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete > in Release 1.53 and deprecated in Release 1.55 final; their functionality is > now available through a read() function in Bio.Motif.Parsers.AlignAce. > MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser, > _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and > deprecated in Release 1.55 final; their functionality is now available through > a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST, > respectively. P.S. We don't usually need to mention private classes like _MEMEScanner in the DEPRECATE file. From tiagoantao at gmail.com Thu Aug 11 16:15:08 2011 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 11 Aug 2011 17:15:08 +0100 Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready to go? In-Reply-To: References: Message-ID: I will do it over the weekend for bio.popgen 2011/8/11, Peter Cock : > Hi Tiago & Bartek, > > Looking over the DEPRECATED file, the following are about due for removal > in Bio.PopGen and Bio.Motif - do you guys have time to make these changes > yourselves? > > Thanks, > > Peter > >> Bio.PopGen.FDist >> ================ >> The RecordParser, _Scanner, and _RecordConsumer classes were declared >> obsolete >> in Release 1.54, and deprecated in Release 1.55 final. Their functionality >> is >> now available through a read() function. > > and: > >> Bio.Motif >> ========= >> ... 
>> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete
>> in Release 1.53 and deprecated in Release 1.55 final; their functionality is
>> now available through a read() function in Bio.Motif.Parsers.AlignAce.
>> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser,
>> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and
>> deprecated in Release 1.55 final; their functionality is now available through
>> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST,
>> respectively.
>
> P.S. We don't usually need to mention private classes like _MEMEScanner in
> the DEPRECATED file.
>

--
Sent from my mobile device

"If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa

From barwil at gmail.com Thu Aug 11 16:28:01 2011
From: barwil at gmail.com (Bartek Wilczynski)
Date: Thu, 11 Aug 2011 09:28:01 -0700
Subject: [Biopython-dev] Deprecated code in Bio.PopGen and Bio.Motif ready to go?
In-Reply-To:
References:
Message-ID:

Hi,

I'll make the necessary changes in Bio.Motif by the end of the week.

best
Bartek

2011/8/11 Peter Cock :
> Hi Tiago & Bartek,
>
> Looking over the DEPRECATED file, the following are about due for removal
> in Bio.PopGen and Bio.Motif - do you guys have time to make these changes
> yourselves?
>
> Thanks,
>
> Peter
>
>> Bio.PopGen.FDist
>> ================
>> The RecordParser, _Scanner, and _RecordConsumer classes were declared obsolete
>> in Release 1.54, and deprecated in Release 1.55 final. Their functionality is
>> now available through a read() function.
>
> and:
>
>> Bio.Motif
>> =========
>> ...
>> AlignAceConsumer, AlignAceParser, and AlignAceScanner were declared obsolete
>> in Release 1.53 and deprecated in Release 1.55 final; their functionality is
>> now available through a read() function in Bio.Motif.Parsers.AlignAce.
>> MEMEParser, _MEMEScanner, _MEMEConsumer, _MASTConsumer, MASTParser, >> _MASTScanner, and MASTRecord were declared obsolete in Release 1.54 and >> deprecated in Release 1.55 final; their functionality is now available through >> a read() function in Bio.Motif.Parsers.MEME and Bio.Motif.Parsers.MAST, >> respectively. > > P.S. We don't usually need to mention private classes like _MEMEScanner in > the DEPRECATE file. > -- Bartek Wilczynski ================== Institute of Informatics University of Warsaw http://www.mimuw.edu.pl/~bartek From redmine at redmine.open-bio.org Mon Aug 15 09:59:39 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 15 Aug 2011 09:59:39 +0000 Subject: [Biopython-dev] [Biopython - Bug #3188] (Closed) Test bug, please ignore References: Message-ID: Issue #3188 has been updated by Peter Cock. Status changed from New to Closed % Done changed from 0 to 100 Should have closed this test bug a while ago. ---------------------------------------- Bug #3188: Test bug, please ignore https://redmine.open-bio.org/issues/3188 Author: Peter Cock Status: Closed Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: The aim of this bug is to test the Redmine "Email on New Issue" option from the Newissuealerts module. This issue should get emailed to the biopython-dev email list automatically... Peter -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Aug 15 10:04:41 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 15 Aug 2011 11:04:41 +0100 Subject: [Biopython-dev] Release blockers? PAML? Message-ID: Hi all, We're about due to make a Biopython release, and I could do it early this week - but then I'm away for a fortnight. 
I am fortunate to be attending the BioHackathon 2011 in Kyoto
next week, http://2011.biohackathon.org/

I think we're in a good position with the code on the trunk to
release Biopython 1.58, bar the PAML code which has not
yet been tested on Windows. Also, I'd be keen for Tiago and
Brandon to take a look at the application calling code to see
if there is any scope for a more common approach between
the PAML wrappers and the PopGen tools. Note that both
sets of tools are not 'nicely behaved' Unix style tools (which
is what the Bio.Application API targets). To do anything
useful with these tools you have to do nasty things like
switch the current working directory and so on.

If we want to do the release this week, we could just warn
that the PAML code is considered to be "in beta" and that
the API may well change in non-backwards compatible
ways?

What else should be addressed before the next release?

There are some open bugs, but at first glance nothing
critical.

Regards,

Peter

From b.invergo at gmail.com Mon Aug 15 10:15:04 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Mon, 15 Aug 2011 12:15:04 +0200
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To:
References:
Message-ID: <1313403306.3107.5.camel@localhost.localdomain>

Hi,

Regarding PAML, I'm sorry I haven't implemented the binary tests yet.
I'll put it on my to-do list for today. It turns out it's a Spanish
national holiday today, so I guess I don't have to go to the lab.

I have a Windows 7 laptop that up until now has been quarantined and
used only for music software, with no other software allowed on it, not
allowed near the interwebs, etc. (it's a fickle machine), but last night
I broke the rules and installed Python 2.7 on it. I'll try running the
PAML tests on it and I'll let everyone know how it goes.
Until later, -brandon On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote: > Hi all, > > We're about due to make a Biopython release, and I could > do it early this week - but then I'm away for a fortnight. I am > fortunate to be attending the BioHackathon 2011 in Kyoto > next week, http://2011.biohackathon.org/ > > I think we're in a good position with the code on the trunk to > release Biopython 1.58, bar the PAML code which has not > yet been tested on Windows. Also, I'd be keen for Tiago and > Brandon to take a look at the application calling code to see > if the is any scope for a more common approach between > the PAML wrappers and the PopGen tools. Note that both > sets of tools are not 'nicely behaved' Unix style tools (which > is what the Bio.Applications API targets). To do anything > useful with these tools you have to do nasty things like > switch the current working directory and so on. > > If we want to do the release this week, we could just warn > that the PAML code is consider to be "in beta" and that > the API may well change in non-backwards compatible > ways? > > What else should be addressed before the next release? > > There are some open bugs, but at first glance nothing > critical. > > Regards, > > Peter From eric.talevich at gmail.com Mon Aug 15 15:02:57 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 15 Aug 2011 11:02:57 -0400 Subject: [Biopython-dev] Release blockers? PAML? In-Reply-To: References: Message-ID: On Mon, Aug 15, 2011 at 6:04 AM, Peter Cock wrote: > Hi all, > > We're about due to make a Biopython release, and I could > do it early this week - but then I'm away for a fortnight. I am > fortunate to be attending the BioHackathon 2011 in Kyoto > next week, http://2011.biohackathon.org/ > > [...] > What else should be addressed before the next release? > > There are some open bugs, but at first glance nothing > critical. > > A while ago I pushed a new function, Phylo.draw(). 
It draws rooted phylograms much like Phylip's drawgram or ape's plot.phylo
function. There's a lot of room for personal preferences here, so I'd
appreciate it if someone else could try it out and suggest changes.

Usage:

>>> from Bio import Phylo
>>> tree = Phylo.read('some_tree.nwk', 'newick')
>>> Phylo.draw(tree)

Code:
https://github.com/biopython/biopython/blob/master/Bio/Phylo/_utils.py

The function only takes a few arguments, but since it's based on
matplotlib/pylab, the aesthetics of a plot can easily be changed after the
initial plotting.

If we're happy with it, then I'll add a mention of it to the Tutorial.

While I'm at it, has anyone else used Bio.Phylo.Applications.PhymlCommandline
and found any issues?

Thanks,
Eric

From b.invergo at gmail.com Tue Aug 16 20:06:24 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 16 Aug 2011 22:06:24 +0200
Subject: [Biopython-dev] Release blockers? PAML?
In-Reply-To:
References:
Message-ID: <1313525186.3107.7.camel@localhost.localdomain>

Hi everyone,

I wrote some tests for the presence of the PAML binaries and I've run
all the unit tests in Python 2.7 on Windows 7 and they all pass.

Cheers,
Brandon

On Mon, 2011-08-15 at 11:04 +0100, Peter Cock wrote:
> Hi all,
>
> We're about due to make a Biopython release, and I could
> do it early this week - but then I'm away for a fortnight. I am
> fortunate to be attending the BioHackathon 2011 in Kyoto
> next week, http://2011.biohackathon.org/
>
> I think we're in a good position with the code on the trunk to
> release Biopython 1.58, bar the PAML code which has not
> yet been tested on Windows. Also, I'd be keen for Tiago and
> Brandon to take a look at the application calling code to see
> if there is any scope for a more common approach between
> the PAML wrappers and the PopGen tools. Note that both
> sets of tools are not 'nicely behaved' Unix style tools (which
> is what the Bio.Application API targets).
To do anything > useful with these tools you have to do nasty things like > switch the current working directory and so on. > > If we want to do the release this week, we could just warn > that the PAML code is consider to be "in beta" and that > the API may well change in non-backwards compatible > ways? > > What else should be addressed before the next release? > > There are some open bugs, but at first glance nothing > critical. > > Regards, > > Peter From p.j.a.cock at googlemail.com Wed Aug 17 15:28:16 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Aug 2011 16:28:16 +0100 Subject: [Biopython-dev] PAML yn00 under Windows Message-ID: Hi Brandon, It looks like the stats line parsing in yn00 needs a little adjustment for this platform, ====================================================================== ERROR: Test that the yn00 binary runs and generates correct output. ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBot\win26\build\Tests\test_PAML_tools.py", line 139, in testYn00Binary results = self.yn.run() File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py", line 106, in run results = read(self.out_file) File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\yn00.py", line 131, in read sequences) File "c:\repositories\BuildBot\win26\build\build\lib.win32-2.6\Bio\Phylo\PAML\_parse_yn00.py", line 110, in parse_others value = stats_split[i+2].strip("()") IndexError: list index out of range ---------------------------------------------------------------------- Ran 157 tests in 282.385 seconds I added this commit for a more helpful error message: https://github.com/biopython/biopython/commit/420430164d258aae27714d907705cd729626f3c6 C:\repositories\biopython\Tests>c:\python26\python test_PAML_tools.py Test that the baseml binary runs and generates correct output ... 
ok
Test that the codeml binary runs and generates correct output ... ok
Test that the yn00 binary runs and generates correct output. ... ERROR

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_PAML_tools.py", line 139, in testYn00Binary
    results = self.yn.run()
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 106, in run
    results = read(self.out_file)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\yn00.py", line 131, in read
    sequences)
  File "c:\python26\Lib\site-packages\Bio\Phylo\PAML\_parse_yn00.py", line 113, in parse_others
    raise ValueError("Problem with stats line: %r" % line)
ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN = -1.#IND w =-1.#IND S = -1.$ N = -1.$ (rho = -1.#IO)\n'

----------------------------------------------------------------------
Ran 3 tests in 1.312s

FAILED (errors=1)

It looks like you're not expecting a bracket pattern quite like
that (and/or this is a cross platform C float representation issue).
Hopefully that string is enough to work out how to fix the parser,
even if you can't reproduce this on your own machine. I can try
and find the output file if you like... I might have to disable the
tool's clean up code temporarily to leave it behind.

Regards,

Peter

From p.j.a.cock at googlemail.com Wed Aug 17 15:39:41 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 16:39:41 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To:
References:
Message-ID:

On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock wrote:
> Hi Brandon,
>
> It looks like the stats line parsing in yn00 needs a little adjustment
> for this platform,
> ...
>     value = stats_split[i+2].strip("()")
> IndexError: list index out of range
>
>
> ...
>     raise ValueError("Problem with stats line: %r" % line)
> ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN =
> -1.#IND w =-1.#IND S = -1.$ N = -1.$ (rho = -1.#IO)\n'

I think you need an adjustment to the bounds on i, given you want to use
stats_split[i] and stats_split[i+2]. Not sure if you want a -3 or -2 on
the upper bound...

C:\repositories\biopython\Tests>git diff
diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py
index 221b6de..e4967fb 100644
--- a/Bio/Phylo/PAML/_parse_yn00.py
+++ b/Bio/Phylo/PAML/_parse_yn00.py
@@ -103,7 +103,7 @@ def parse_others(lines, results, sequences):
                 stats = {}
                 line_stats = line.split(":")[1].strip()
                 stats_split = line_stats.split()
-                for i in range(0, len(stats_split), 3):
+                for i in range(0, len(stats_split)-3, 3):
                     stat = stats_split[i].strip("()")
                     if stat == "w":
                         stat = "omega"

I don't know why this didn't come up under Linux; something subtle
going on between the PAML versions maybe?

Regards,

Peter

From p.j.a.cock at googlemail.com Wed Aug 17 17:02:24 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 17 Aug 2011 18:02:24 +0100
Subject: [Biopython-dev] PAML yn00 under Windows
In-Reply-To:
References:
Message-ID:

Hi again,

You may have noticed from the buildbot emails that there is a
separate issue with the PAML tests on Python (2.4 and) 2.5,
affecting all three binaries tried: yn00, baseml and codeml, e.g.
http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.4/builds/259/steps/shell/logs/stdio

======================================================================
ERROR: Test that the yn00 binary runs and generates correct output.
---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBot\win24\build\Tests\test_PAML_tools.py", line 139, in testYn00Binary results = self.yn.run() File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\yn00.py", line 104, in run Paml.run(self, ctl_file, verbose, command) File "c:\repositories\BuildBot\win24\build\build\lib.win32-2.4\Bio\Phylo\PAML\_paml.py", line 148, in run raise EnvironmentError, "The %s process was killed." % command EnvironmentError: The yn00 process was killed. ---------------------------------------------------------------------- I can reproduce this at the terminal window, and it is specific to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as are Python 3.1 and 3.2. Peter From p.j.a.cock at googlemail.com Wed Aug 17 17:56:28 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Aug 2011 18:56:28 +0100 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: References: Message-ID: On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock wrote: > Hi again, > > You may have noticed from the buildbot emails that there is a > separate issue with the PAML tests on Python (2.4 and) 2.5, > applying to executing all three binaries tried: yn00, baseml > and codeml, e.g. > ... > I can reproduce this at the terminal window, and it is specific > to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as > are Python 3.1 and 3.2. I'm getting -1 back from the subprocess.call(...) https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca Some debugging later I realised the paths in the control file were using Unix slashes rather than Windows slashes: https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa That should now just leave the yn00 stats parsing for you to check (which offset should the fix use, assuming that is the right fix). 
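The open yn00 question can be explored using only the stats line reported earlier in this thread. Splitting that line on whitespace gives 17 tokens rather than a multiple of three, because "w =-1.#IND" has no space after the equals sign, so a loop that steps by three and reads element i+2 runs off the end. The sketch below shows that, plus one tolerant alternative. It is an illustration only, not the Biopython parser: the regular expression and the helper names `to_float` and `parse_stats` are assumptions, as is the choice to treat unparseable values as NaN.

```python
import re

# Stats line copied from the ValueError message reported in this thread.
LINE = "LWL85m: dS = -1.#IND dN = -1.#IND w =-1.#IND S = -1.$ N = -1.$ (rho = -1.#IO)\n"

line_stats = LINE.split(":")[1].strip()
tokens = line_stats.split()
# "w =-1.#IND" yields two tokens instead of three, so len(tokens) is 17
# and range(0, len(tokens), 3) ends at i = 15, where tokens[i + 2] fails.

# Assumed pattern: optional "(", a name, "=", a value, optional ")".
PAIR = re.compile(r"\(?(\w+)\s*=\s*([^\s()]+)\)?")


def to_float(text):
    """Convert a value; tokens float() rejects (e.g. '-1.#IND') become NaN."""
    try:
        return float(text)
    except ValueError:
        return float("nan")


def parse_stats(stats_text):
    """Return {name: float} for every 'name = value' pair in stats_text."""
    return {name: to_float(value) for name, value in PAIR.findall(stats_text)}


stats = parse_stats(line_stats)
```

Matching "name = value" pairs directly avoids any dependence on how many whitespace-separated tokens a platform happens to print.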
It was worth insisting on more tests and running them on Windows :) Regards, Peter From b.invergo at gmail.com Wed Aug 17 18:43:04 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 17 Aug 2011 20:43:04 +0200 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: References: Message-ID: <1313606586.3107.9.camel@localhost.localdomain> Hi, Just got home and saw the emails. Yes, in the end it was good to do the extra tests! So the path separator problem is solved, right? That indexing is a weird one. I'll look at it now. -brandon On Wed, 2011-08-17 at 18:56 +0100, Peter Cock wrote: > On Wed, Aug 17, 2011 at 6:02 PM, Peter Cock wrote: > > Hi again, > > > > You may have noticed from the buildbot emails that there is a > > separate issue with the PAML tests on Python (2.4 and) 2.5, > > applying to executing all three binaries tried: yn00, baseml > > and codeml, e.g. > > ... > > I can reproduce this at the terminal window, and it is specific > > to Python (2.4 and) 2.5, using Python 2.6 and 2.7 is fine, as > > are Python 3.1 and 3.2. > > I'm getting -1 back from the subprocess.call(...) > https://github.com/biopython/biopython/commit/2d94a24ca223851d9fd895a82780dd0f23dc2dca > > Some debugging later I realised the paths in the control file > were using Unix slashes rather than Windows slashes: > https://github.com/biopython/biopython/commit/4125e55b291922053380b5fe688bd687c70035fa > > That should now just leave the yn00 stats parsing for you > to check (which offset should the fix use, assuming that > is the right fix). > > It was worth insisting on more tests and running them on Windows :) > > Regards, > > Peter From b.invergo at gmail.com Wed Aug 17 21:28:32 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 17 Aug 2011 23:28:32 +0200 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: References: Message-ID: <1313616514.3107.27.camel@localhost.localdomain> Ok, I just sent a pull request. 
It turns out that either due to the way C works in Windows or due to the way PAML was coded, what was a nice "-nan" in Linux is printed as "-1.#IND" in Windows, which messed up everything. Rather than parsing it in an algorithmic manner, I got angry and threw some regex fu at it, which works a lot nicer than what I had before. Tested successfully in Linux and Windows 7, Python 2.7.2 -brandon On Wed, 2011-08-17 at 16:39 +0100, Peter Cock wrote: > On Wed, Aug 17, 2011 at 4:28 PM, Peter Cock wrote: > > Hi Brandon, > > > > It looks like the stats line parsing in yn00 needs a little adjustment > > for this platform, > > ... > > value = stats_split[i+2].strip("()") > > IndexError: list index out of range > > > > > > ... > > raise ValueError("Problem with stats line: %r" % line) > > ValueError: Problem with stats line: 'LWL85m: dS = -1.#IND dN = > > -1.#IND w =-1.#IND S = -1.$ N = -1.$ (rho = -1.#IO)\n' > > I think you need to adjustment to the bounds on i given you want to use > stats_split[i] and stats_split[i+2]. Note sure if want a -3 or -2 on the upper > bound... > > C:\repositories\biopython\Tests>git diff > diff --git a/Bio/Phylo/PAML/_parse_yn00.py b/Bio/Phylo/PAML/_parse_yn00.py > index 221b6de..e4967fb 100644 > --- a/Bio/Phylo/PAML/_parse_yn00.py > +++ b/Bio/Phylo/PAML/_parse_yn00.py > @@ -103,7 +103,7 @@ def parse_others(lines, results, sequences): > stats = {} > line_stats = line.split(":")[1].strip() > stats_split = line_stats.split() > - for i in range(0, len(stats_split), 3): > + for i in range(0, len(stats_split)-3, 3): > stat = stats_split[i].strip("()") > if stat == "w": > stat = "omega" > > > I don't know why this didn't come up under Linux, something subtle > going on between the PAML versions maybe? 
> > Regards, > > Peter From p.j.a.cock at googlemail.com Wed Aug 17 21:43:13 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 17 Aug 2011 22:43:13 +0100 Subject: [Biopython-dev] PAML yn00 under Windows In-Reply-To: <1313616514.3107.27.camel@localhost.localdomain> References: <1313616514.3107.27.camel@localhost.localdomain> Message-ID: On Wed, Aug 17, 2011 at 10:28 PM, Brandon Invergo wrote: > Ok, I just sent a pull request. It turns out that either due to the way > C works in Windows or due to the way PAML was coded, what was a nice > "-nan" in Linux is printed as "-1.#IND" in Windows, which messed up > everything. That sounds like the C float libraries, the oddities of which are something which later versions of Python have done a better and better job of hiding from us ;) > Rather than parsing it in an algorithmic manner, I got angry > and threw some regex fu at it, which works a lot nicer than what > I had before. > > Tested successfully in Linux and Windows 7, Python 2.7.2 > > -brandon Sounds good - I'll have a look on github (possibly tomorrow), Peter From p.j.a.cock at googlemail.com Thu Aug 18 16:10:15 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Aug 2011 17:10:15 +0100 Subject: [Biopython-dev] Commit freeze for release 1.58 Message-ID: Hi all, Unless anyone objects I propose to do the Biopython 1.58 release in the next hour. If this runs into any issues, it will have to wait until I'm back at work in two weeks time, or someone else (with access to a Windows 32 bit machine with all the compilers setup) can tackle it instead. I will be active online next week however - and coding - but on Japan time: http://2011.biohackathon.org/ I'm assuming the NEWS file is up to date, and will as usual be basing the release notice on that. If there is anything missing, please reply by email. 
Thank you all, Peter From p.j.a.cock at googlemail.com Thu Aug 18 17:19:32 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Aug 2011 18:19:32 +0100 Subject: [Biopython-dev] Commit freeze for release 1.58 In-Reply-To: References: Message-ID: On Thu, Aug 18, 2011 at 5:10 PM, Peter Cock wrote: > Hi all, > > Unless anyone objects I propose to do the Biopython 1.58 > release in the next hour. If this runs into any issues, it will > have to wait until I'm back at work in two weeks time, or > someone else (with access to a Windows 32 bit machine > with all the compilers setup) can tackle it instead. > > I will be active online next week however - and coding - > but on Japan time: http://2011.biohackathon.org/ > > I'm assuming the NEWS file is up to date, and will as > usual be basing the release notice on that. If there is > anything missing, please reply by email. > > Thank you all, > > Peter > Ok, that's done. And in news that will no doubt please some of you, I've finally given up on keeping Python 2.4 support going. Feel free to start cleaning up some of the nastier hacks (like the ElementTree imports). Peter From p.j.a.cock at googlemail.com Thu Aug 18 19:32:57 2011 From: p.j.a.cock at googlemail.com (Peter) Date: Thu, 18 Aug 2011 20:32:57 +0100 Subject: [Biopython-dev] Biopython 1.58 released Message-ID: <75327C54-CF88-43BC-BACF-87139456FE67@googlemail.com> Dear All, Biopython 1.58 is out: http://news.open-bio.org/news/2011/08/biopython-1-58-released/ Thank you to everyone who has contributed. Peter P.S. 
We're on Twitter as @Biopython From updates at feedmyinbox.com Sun Aug 21 07:49:13 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sun, 21 Aug 2011 03:49:13 -0400 Subject: [Biopython-dev] 8/21 newest questions tagged biopython - Stack Overflow Message-ID: <0adf58b4241f2a58161d1a41524288d1@74.63.51.88> // A PWM with gapped alignments in Biopython // August 9, 2011 at 11:28 AM http://stackoverflow.com/questions/6998727/a-pwm-with-gapped-alignments-in-biopython I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments? from Bio.Alphabet import Gapped alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped) m = Motif.Motif() for a in alignment: m.add_instance(a.seq) m.pwm() -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/837947/00ae8e456ba91bb32a32b795eb392f971eee04e9/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Sun Aug 21 07:48:37 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sun, 21 Aug 2011 03:48:37 -0400 Subject: [Biopython-dev] 8/21 biopython Questions - BioStar Message-ID: <44c53445166933a51ab21f5d53e72577@74.63.51.88> // Error using Entrez.esummary from biopython // August 16, 2011 at 8:47 AM http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython Can someone please explain this error? I have a small script that tries to fetch information from an NCBI BioAssay using the Entrez module from Biopython. I get an error I do not understand. I try to run: from Bio import Entrez Entrez.email="yourname at mail.se" handle_esummary=Entrez.esummary(db='pcassay',id='1337') record_esummary=Entrez.read(handle_esummary) I get the error: File "smaltest.py", line 5, in record_esummary=Entrez.read(handle_esummary) File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read record = handler.run(handle) File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run self.parser.ParseFile(handle) File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement itemtype = str(attrs["Type"]) # convert from Unicode KeyError: 'Type' // Import fasta sequences to a motif // August 15, 2011 at 11:54 AM http://biostar.stackexchange.com/questions/11204/import-fasta-sequences-to-a-motif I need to construct a PWM from every sequence in a fasta file, using biopython. The way I'm trying to do this is to import each line of sequence into a motif, then run a PWM on each instance of the motif.
Currently, I'm trying it this way, but different variations of it have generated their fair share of errors, mostly "Wrong Alphabet" and "NoneType object is not iterable": alphabet = IUPAC.unambiguous_dna m = Motif.Motif(alphabet) for seq_record in SeqIO.parse("10fasta.fasta", "fasta"): m.add_instance(seq_record.seq) print m1.pwm() Does anyone see what's wrong with the way I'm adding instances to the motif? Of course, if there's a better way to do this that I'm completely missing, feel free to comment on that too. // A PWM with gapped alignments in Biopython // August 9, 2011 at 1:47 PM http://biostar.stackexchange.com/questions/11070/a-pwm-with-gapped-alignments-in-biopython I'm trying to generate a Position-Weighted Matrix (PWM) in Biopython from Clustalw multiple sequence alignments. I get a "Wrong Alphabet" error every time I do it with gapped alignments. From reading the documentation, I think I need to utilize the Gapped Alphabet to deal with the '-' character in gapped alignments. But when I do this, it still doesn't resolve the error. Does anyone see the problem with this code, or have a better way to generate a PWM from gapped Clustal alignments? from Bio.Alphabet import Gapped alignment = AlignIO.read("filename.clustalw", "clustal", alphabet=Gapped) m = Motif.Motif() for a in alignment: m.add_instance(a.seq) m.pwm() -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/837946/d83641150d25e0f52255e3fcfa9e7ccb2b83405f/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068 From p.j.a.cock at googlemail.com Mon Aug 22 06:53:17 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Aug 2011 07:53:17 +0100 Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via BioStar) Message-ID: Hi all, On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox wrote: > // Error using Entrez.esummary from biopython > // August 16, 2011 at 8:47 AM > > http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython > Can someone please explain this error? > > I have a small script that tries to fetch information from an > NCBI BioAssay using the Entrez module from Biopython. I get > an error I do not understand. I try to run: > > from Bio import Entrez > Entrez.email="yourname at mail.se" > > handle_esummary=Entrez.esummary(db='pcassay',id='1337') > record_esummary=Entrez.read(handle_esummary) > > > I get the error: > > File "smaltest.py", line 5, in > record_esummary=Entrez.read(handle_esummary) > File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read > record = handler.run(handle) > File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run > self.parser.ParseFile(handle) > File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement > itemtype = str(attrs["Type"]) # convert from Unicode > KeyError: 'Type' > I can reproduce this, and the cause is the NCBI using lowercase in one tag's attribute: We're expecting the attributes to be Name and Type, and that is the case for all the other tags in this file. Michiel - do you think we should just add a fallback for type if we get a KeyError on Type? Do you think we should report this inconsistency/bug to the NCBI?
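To make the fallback idea concrete, here is a minimal sketch. It is not a patch against Bio/Entrez/Parser.py: the helper name get_attribute and the example attribute dictionaries are made up for illustration, and the real NCBI XML is not reproduced here. It only assumes the parser receives the tag attributes as a dictionary, as the attrs["Type"] lookup in the traceback suggests.

```python
def get_attribute(attrs, name):
    """Hypothetical helper: look up an XML attribute, tolerating lowercase."""
    try:
        return attrs[name]
    except KeyError:
        # Fall back to e.g. "type" when "Type" is missing. If neither
        # spelling is present this still raises KeyError, so genuinely
        # malformed input is not silently accepted.
        return attrs[name.lower()]


# Made-up attribute dictionaries standing in for what the parser receives.
well_formed = {"Name": "Example", "Type": "String"}
lowercase = {"Name": "Example", "type": "String"}

# Equivalent of the failing line, itemtype = str(attrs["Type"]).
itemtype = str(get_attribute(lowercase, "Type"))
print(itemtype)
```

Trying the documented spelling first keeps the common case unchanged, and the lowercase lookup only runs for the odd tag that breaks the convention.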
Peter From p.j.a.cock at googlemail.com Mon Aug 22 07:03:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 22 Aug 2011 08:03:30 +0100 Subject: [Biopython-dev] Type vs type in Entrez.esummary XML (via BioStar) In-Reply-To: References: Message-ID: On Mon, Aug 22, 2011 at 7:53 AM, Peter Cock wrote: > Hi all, > > On Sun, Aug 21, 2011 at 8:48 AM, Feed My Inbox wrote: >> // Error using Entrez.esummary from biopython >> // August 16, 2011 at 8:47 AM >> >> http://biostar.stackexchange.com/questions/11232/error-using-entrez-esummary-from-biopython >> Can someone please explain this error? >> >> I have a small script that tries to fetch information from an >> NCBI BioAssay using the Entrez module from Biopython. I get >> an error I do not understand. I try to run: >> >> from Bio import Entrez >> Entrez.email="yourname at mail.se" >> >> handle_esummary=Entrez.esummary(db='pcassay',id='1337') >> record_esummary=Entrez.read(handle_esummary) >> >> >> I get the error: >> >> File "smaltest.py", line 5, in >> record_esummary=Entrez.read(handle_esummary) >> File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/__init__.py", line 297, in read >> record = handler.run(handle) >> File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 90, in run >> self.parser.ParseFile(handle) >> File "/usr/common/schrodinger/mmshare-v20109/lib/Linux-x86_64/lib/python2.7/site-packages/Bio/Entrez/Parser.py", line 105, in startElement >> itemtype = str(attrs["Type"]) # convert from Unicode >> KeyError: 'Type' >> > > I can reproduce this, and the cause is the NCBI using > lowercase in one tag's attribute: > > > > We're expecting the attributes to be Name and Type, and > that is the case for all the other tags in this file. > > Michiel - do you think we should just add a fallback for > type if we get a KeyError on Type?
Do you think we should > report this inconsistency/bug to the NCBI? Actually it clearly violates the DTD, and thus fails XML validation - so it is clearly an NCBI bug. Peter From chapmanb at 50mail.com Tue Aug 23 19:31:34 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 23 Aug 2011 15:31:34 -0400 Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository In-Reply-To: References: Message-ID: <20110823193134.GB507@kunkel> Peter; Awesome, thanks for doing this. I didn't even realize there was a git solution that could transfer histories across repositories like this; how did you do it? Everything looks great on a first pass. Do you think some of the scripts would also be useful to include in the script directory? They handle some of the common cases people have asked about; 'access_gff_index.py' uses bx-python so might be excluded, but the others are Biopython specific. Thanks again, Brad > I managed to do a git script to select out the GFF code and tests from > your bcbb repository and get it into the Biopython source tree. The > folder changes made it interesting ;) > > Input: https://github.com/chapmanb/bcbb (master branch) > > Output: https://github.com/peterjc/biopython/tree/brad_gff > > The tests pass, but that is as far as I have got with this. Brad, > could you have a look at this new branch for sanity checking please? > > Peter From p.j.a.cock at googlemail.com Wed Aug 24 02:33:21 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 24 Aug 2011 03:33:21 +0100 Subject: [Biopython-dev] Brad's GFF parser in a Biopython repository In-Reply-To: <20110823193134.GB507@kunkel> References: <20110823193134.GB507@kunkel> Message-ID: On Tue, Aug 23, 2011 at 8:31 PM, Brad Chapman wrote: > Peter; > Awesome, thanks for doing this. I didn't even realize there was a > git solution that could transfer histories across repositories like > this; how did you do it? Well, it wasn't an off the shelf solution, it was a hack.
See https://gist.github.com/1167169 and https://github.com/gitpython-developers/GitPython I used the Python library (import git) to query the source repository, basically doing "git log -- gff/BCBio gff/Tests" to find only the commits of interest, then "git show XXX" to extract the diff which I then had to modify to change the paths, then a system call to patch to apply each patch to the destination repository, git add, git commit. Note for git commit you can specify the message via a file (-F) so I could preserve the original long message, plus you can preserve the authored date (--date) and the author too. There were several steps where I couldn't work out how you were meant to do something via the git wrapper's API (e.g. get a diff as a patch), but it also lets you easily call git commands directly which was easier for me. Bit hacky but seemed to get the job done. > Everything looks great on a first pass. Do you think some of the > scripts would also be useful to include in the script directory? > They handle some of the common cases people have asked about; > 'access_gff_index.py' uses bx-python so might be excluded, but the > others are Biopython specific. > > Thanks again, > Brad Good point - that could be mapped to the Biopython scripts folder. I'll take a look. Peter From updates at feedmyinbox.com Thu Aug 25 07:48:40 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 25 Aug 2011 03:48:40 -0400 Subject: [Biopython-dev] 8/25 biopython Questions - BioStar Message-ID: <738da676fc97903dba65147015733dc5@74.63.51.88> // How to fetch genomics sequence using coordinates in BioPython // August 24, 2011 at 10:56 PM http://biostar.stackexchange.com/questions/11454/how-to-fetch-genomics-sequnce-using-coordinates-in-biopython Hi everyone, I'm a newbie of biopython. My question may be stupid but please help. I want to use (chromosome number, start position, end position, strand) to fetch the corresponding sequence in mouse genome.
How can this be done with biopython connecting to NCBI database? Could anyone help me please? Thanks a lot. From p.j.a.cock at googlemail.com Fri Aug 26 07:44:32 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 26 Aug 2011 08:44:32 +0100 Subject: [Biopython-dev] Biopython under Python from Cygwin on Windows? Message-ID: Hi all, I was just wondering if anyone has tried this recently (Biopython under Cygwin), and if it would be worth adding as another platform for the buildbot. There are likely enough differences from Linux to cause potential cross platform issues - especially for calling external tools... Regards, Peter From updates at feedmyinbox.com Fri Aug 26 08:05:18 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Fri, 26 Aug 2011 04:05:18 -0400 Subject: [Biopython-dev] 8/26 newest questions tagged biopython - Stack Overflow Message-ID: // How do I set the PYTHONPATH on Cygwin?
// August 25, 2011 at 9:16 PM http://stackoverflow.com/questions/7199082/how-do-i-set-the-pythonpath-on-cygwin In the Biopython installation instructions, it says that if Biopython doesn't work I'm supposed to do this: export PYTHONPATH = $PYTHONPATH':/directory/where/you/put/Biopython' I tried doing that in Cygwin from the ~ directory using the name of the Biopython directory (or everything of it past the ~ directory), but when I tested it by going into the Python interpreter and typing in From Bio.Seq import Seq It said the module doesn't exist. How do I make it so that I don't have to be in the Biopython directory to be able to import Seq? From clements at galaxyproject.org Mon Aug 29 21:29:28 2011 From: clements at galaxyproject.org (Dave Clements) Date: Mon, 29 Aug 2011 14:29:28 -0700 Subject: [Biopython-dev] Galaxy is Hiring In-Reply-To: References: Message-ID: Hello all The Galaxy Project is growing and has open positions in both the Penn State and Emory groups (http://wiki.g2.bx.psu.edu/News/Galaxy%20is%20Hiring). *Penn State: System administrators/analysts* The Nekrutenko Lab at the Huck Institutes of Life Sciences at Penn State is currently recruiting system analysts/administrators with experience in building and maintaining complex performance compute environments.
The areas of immediate need include: - Storage balancing and tiered storage - Virtualization - Schedulers - Deployment of Galaxy instances and dependency management - Relational databases and query optimization - User management A minimum of 5 years' experience with UNIX/Linux system administration is required. Applicants should submit a CV and list of references to jobs at galaxyproject.org. *Emory: Software Engineers and Post-Docs* The Taylor Lab in the Biology and Mathematics & Computer Science departments at Emory University is looking for software engineers and postdoctoral scholars to work on the Galaxy project. We are seeking software engineers with expertise in distributed computing and systems programming, web-based visualization and visual analytics, informatics and data analysis and integration, and bioinformatics application areas such as re-sequencing, de novo assembly, metagenomics, transcriptome analysis and epigenetics. These are full time positions located in Atlanta, GA. See the official posting (http://bx.mathcs.emory.edu/joining/sw/) for full details. Postdoctoral applicants should have expertise in Bioinformatics and Computational Biology and research interests that complement but extend the lab's current interests: The Galaxy project; distributed and high-performance computing for data intensive science; vertebrate functional genomics; and genomics and epigenomic mechanisms of gene regulation, the role of transcription factors and chromatin structure in global gene expression, development, and differentiation. See the announcement (http://bx.mathcs.emory.edu/joining/postdocs/) for full details. If any of these openings describes you then please consider applying. Thanks, Dave C. -- http://galaxyproject.org/ http://getgalaxy.org/ http://usegalaxy.org/ http://galaxyproject.org/wiki/