From p.j.a.cock at googlemail.com Tue Dec 3 05:38:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 3 Dec 2013 10:38:43 +0000
Subject: [Biopython-dev] [biopython] Missing DTD files (#260)
In-Reply-To: <1385868525.19183.YahooMailBasic@web164002.mail.gq1.yahoo.com>
References: <1385868525.19183.YahooMailBasic@web164002.mail.gq1.yahoo.com>

On Sun, Dec 1, 2013 at 3:28 AM, Michiel de Hoon wrote:
> How would people feel about Biopython always downloading DTD files
> on the fly instead of distributing them with Biopython?
>
> After downloading and parsing a DTD file, we can keep it in memory
> so we won't need to parse the same DTD file over and over again.
> So the impact on speed will be minimal.
>
> If we do so, we'll never run into the problem of missing DTD files. The
> downside of course is that we will need internet access to parse any
> XML file through Bio.Entrez. But maybe in today's world that is acceptable.

Requiring network access would be annoying for offline work (e.g. how we usually run the automated tests), but most of the NCBI Entrez XML files will (I expect) be downloaded and immediately parsed. So for usability this seems OK.

Automatic caching to disk (without a scary warning) seems like a better idea than always downloading the DTD files on demand (which seems wasteful of bandwidth and more likely to give intermittent errors), although, as you have noted before, there is the open question of where to put these files (including where on Windows):
http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008310.html

Regards,

Peter

From tra at popgen.net Tue Dec 3 09:31:41 2013
From: tra at popgen.net (Tiago Antao)
Date: Tue, 3 Dec 2013 14:31:41 -0000
Subject: [Biopython-dev] 1.63 Release attempt
Message-ID: <28b5ab441650c2b5a6a8fc1f9cf2d60e.squirrel@webmail.popgen.net>

Dear all,

Tomorrow I will try to release 1.63. If possible, please keep the number of commits to the trunk to a minimum.
I intend to pull the source in the morning (Western European time) and release 1.63 by the end of the day. If you have any serious issues with this plan, please get in touch ASAP.

Thanks,
Tiago

From tiagoantao at gmail.com Wed Dec 4 16:02:24 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Dec 2013 21:02:24 +0000
Subject: [Biopython-dev] 1.63 delayed

Dear all,

My sincere apologies, but the 1.63 release will be delayed. I hope to be able to release it tomorrow (instead of today).

Regards,
Tiago
--
"The truth may be out there, but the lies are already in your head" - Terry Pratchett

From tiagoantao at gmail.com Wed Dec 4 16:13:21 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Wed, 4 Dec 2013 21:13:21 +0000
Subject: [Biopython-dev] Issue with test_Phylo_CDAO

Dear all,

I am trying to release 1.63 and I am getting a strange problem with the test_Phylo_CDAO module. If we run the test on a recent Ubuntu Linux machine, the following error occurs (Python 2.7, rdflib 2.4.2):

ERROR: test_parse_0 (__main__.ParseTests)
Parse the phylogenies in test.cdao.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_Phylo_CDAO.py", line 43, in test_parse
    trees = list(bp._io.parse(filename, 'cdao'))
  File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/_io.py", line 53, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 63, in parse
    return Parser(handle).parse(**kwargs)
  File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 94, in parse
    self.parse_handle_to_graph(**kwargs)
  File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 115, in parse_handle_to_graph
    graph.parse(file=self.handle, publicID=base_uri, format=parse_format)
TypeError: parse() takes at least 2 arguments (3 given)

We suspect that the RDFLib interface might have changed recently and that this might have broken the test code. Could someone more experienced with this code (Eric, Ben?) give a suggestion and comment on how important it is to fix this before releasing 1.63?

Regards,
Tiago

From tiagoantao at gmail.com Thu Dec 5 04:03:06 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 5 Dec 2013 09:03:06 +0000
Subject: [Biopython-dev] 1.63 release

Dear all,

As far as I know there are only two issues discovered (and not fully resolved) during the release attempt:

1. TogoWS testing, where the field ti is no longer supported on the pubmed database. I will log a bug after this email. The test code was temporarily amended.

2. The failing test_Phylo_CDAO. I think this is a long-running problem with the interface of the RDF library. I speculate that the test stopped working when there was some change in the RDF library interface.
Unfortunately we did not have testing for this module set up anywhere. Also, I suspect that this bug exists in previous Biopython versions.

My suggestion is to go ahead with the release (today) in spite of the problems above, unless someone feels that any of the above should be solved before...

From p.j.a.cock at googlemail.com Thu Dec 5 05:30:27 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 10:30:27 +0000
Subject: [Biopython-dev] [Biopython] type object 'RestrictionType' has no attribute 'size'

On Thu, Dec 5, 2013 at 10:01 AM, Wibowo Arindrarto wrote:
> Hi everyone,
>
> Christopher: Ah yes, I actually meant the IPython notebook (but I
> guess it turns out the same occurs in the IPython console then :) ).
>
> Antony: That may be the case, too. But thanks for the pull request (I
> think Peter has just looked at it, actually
> https://github.com/biopython/biopython/pull/148). I do think there is
> room for improvement there, especially since the code probably
> predates modern Python conventions (& assumptions).
>
> Cheers,
> Bow

Even Frederic Sohm (the original author) agreed that the current Bio.Restriction code is too complicated (I called it 'magic' in our discussion back in 2010 regarding a Python 2.6 problem with super, http://bugzilla.open-bio.org/show_bug.cgi?id=2604 or now https://redmine.open-bio.org/issues/2604). I also dislike the fact that it uses one-based counting. However, none of our currently active developers really understand the code, so changing it is hard - and backward compatibility constrains us greatly.
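As background to the one-based counting Peter dislikes: GenBank-style locations are written as one-based, inclusive ranges (start..end), while Python slices are zero-based and half-open, so mixing the two needs an off-by-one conversion. The helper names below are hypothetical and this is not Bio.Restriction's API; it is only a sketch of that conversion.

```python
def one_based_to_slice(start, end):
    """Convert a one-based, inclusive range (GenBank-style start..end)
    into the equivalent zero-based, half-open Python slice."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end, got %r..%r" % (start, end))
    return slice(start - 1, end)


def slice_to_one_based(sl):
    """Inverse: zero-based, half-open slice -> one-based inclusive (start, end)."""
    return sl.start + 1, sl.stop


seq = "ACGTACGT"
# Positions 2..4 in one-based, inclusive counting are "CGT".
sl = one_based_to_slice(2, 4)
print(sl, seq[sl])  # slice(1, 4, None) CGT
```

A single one-based position n becomes the slice [n-1:n], which is why a lone residue such as 196 shows up as [195:196] once Biopython has parsed it.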
I think the best route forward is to replace Bio.Restriction with a new, less complicated implementation following modern Python conventions (using zero-based counting!), likely based on Antony's branch https://github.com/biopython/biopython/pull/148, and then deprecate and later remove Bio.Restriction. (We should continue that debate on the biopython-dev list, CC'd.)

In terms of Christopher's problems - it would not surprise me if they are specific to IPython, since introspection of the 'magic' classes seems problematic.

Regards,

Peter

From tiagoantao at gmail.com Thu Dec 5 08:53:27 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 5 Dec 2013 13:53:27 +0000
Subject: [Biopython-dev] 1.63 release

Dear all,

The CDAO issue was caused by an old version of rdflib being the default on Ubuntu. As far as I can see, we can go ahead and release today?

Tiago

On 5 December 2013 09:03, Tiago Antão wrote:
> Dear all,
>
> As far as I know there are only two issues discovered (and not fully
> resolved) during the release attempt:
>
> 1. TogoWS testing, where the field ti is not supported anymore on the
> pubmed database. I will log a bug after this email. The test code was
> temporarily amended.
>
> 2. The failing of test_Phylo_CDAO. I think this is a long running problem
> with the interface of the RDF library. I speculate that the test stopped
> working when there was some change on the RDF library interface.
> Unfortunately we did not have testing on this module setup anywhere. Also,
> I suspect that this bugs exists on previous biopython versions.
>
> My suggestion is to go ahead with the release (today) in spite of the
> problems above. Unless someone feels that any of the above should be solved
> before...
>
> --
> "The truth may be out there, but the lies are already in your head" -
> Terry Pratchett
>

From binni at binnisb.com Thu Dec 5 11:29:02 2013
From: binni at binnisb.com (Brynjar Smári Bjarnason)
Date: Thu, 05 Dec 2013 17:29:02 +0100
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
In-Reply-To: <52A0A8E4.2080806@binnisb.com>
References: <52A0A8E4.2080806@binnisb.com>
Message-ID: <52A0A9CE.9080400@binnisb.com>

Hello.

I see CompoundLocation is quite new. I am currently using Anaconda (Python 2.7.6 :: Anaconda 1.8.0 (64-bit)) and Biopython 1.62.

I am fetching GI values and using SeqIO to parse them. So far most of them work, but I found one that fails.

Code:

from Bio import Entrez, SeqIO
p = Entrez.efetch(db="protein", rettype="gp", retmode="text", id="494379")
seq = SeqIO.read(p, "gb")

Gives error:
ValueError: CompoundLocation should have at least 2 parts

With quite a long stack trace, the last frame being:

/Bio/SeqFeature.pyc:
    996         if len(self.parts) < 2:
--> 997             raise ValueError("CompoundLocation should have at least 2 parts")

Any suggestions on how to fix this, and maybe what is different about this GI compared with the rest of them (one GI that works: 10342)?

Brynjar

From p.j.a.cock at googlemail.com Thu Dec 5 11:43:20 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 16:43:20 +0000
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
In-Reply-To: <52A0A9CE.9080400@binnisb.com>
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On Thu, Dec 5, 2013 at 4:29 PM, Brynjar Smári Bjarnason wrote:
>
> Hello.
>
> I see CompoundLocation is quite new. I am currently using anaconda
> (Python 2.7.6 :: Anaconda 1.8.0 (64-bit)) and BioPython 1.62.
>
> I am fetching gi values and using SeqIO to parse them. So far most of
> them work but I found one that fail.
>
> Code:
>
> p = Entrez.efetch(db="protein", rettype="gp", retmode="text",id="494379")
> seq = SeqIO.read(p,"gb")
>
> Gives error:
> ValueError: CompoundLocation should have at least 2 parts
>
> With quite long stack trace and the last one being:
>
> /Bio/SeqFeature.pyc:
> 996 if len(self.parts) < 2:
> --> 997 raise ValueError("CompoundLocation should have at
> least 2 parts")
>
> Any suggestions on how to fix this, and maybe what is different with
> this gi from the rest of them (one gi that works: 10342)?
>
> Brynjar

Hi Brynjar,

Hmm. Right now the website is very slow & won't load http://www.ncbi.nlm.nih.gov/protein/494379, and via Entrez I am getting a network error:

urllib2.HTTPError: HTTP Error 502: Bad Gateway

Were you able to save the file, and could you post it online (e.g. at http://gist.github.com)?

Regards,

Peter

From p.j.a.cock at googlemail.com Thu Dec 5 11:46:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 16:46:46 +0000
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On Thu, Dec 5, 2013 at 4:43 PM, Peter Cock wrote:
> On Thu, Dec 5, 2013 at 4:29 PM, Brynjar Smári Bjarnason
> wrote:
>>
>> Hello.
>>
>> I see CompoundLocation is quite new. I am currently using anaconda
>> (Python 2.7.6 :: Anaconda 1.8.0 (64-bit)) and BioPython 1.62.
>>
>> I am fetching gi values and using SeqIO to parse them. So far most of
>> them work but I found one that fail.
>>
>> Code:
>>
>> p = Entrez.efetch(db="protein", rettype="gp", retmode="text",id="494379")
>> seq = SeqIO.read(p,"gb")
>>
>> Gives error:
>> ValueError: CompoundLocation should have at least 2 parts
>>
>> With quite long stack trace and the last one being:
>>
>> /Bio/SeqFeature.pyc:
>> 996 if len(self.parts) < 2:
>> --> 997 raise ValueError("CompoundLocation should have at
>> least 2 parts")
>>
>> Any suggestions on how to fix this, and maybe what is different with
>> this gi from the rest of them (one gi that works: 10342)?
>>
>> Brynjar
>
> Hi Brynjar,
>
> Hmm. Right now the website is very slow & won't load
> http://www.ncbi.nlm.nih.gov/protein/494379
> and via Entrez I am getting a network error:
> urllib2.HTTPError: HTTP Error 502: Bad Gateway
>
> Where you able to save the file, and could you post it online
> (e.g. at http://gist.github.com)?
>
> Regards,
>
> Peter

Not to worry - the site did respond when I retried a bit later, and I can reproduce the parser error:

>>> from Bio import SeqIO
>>> r = SeqIO.read("1MRR_A.gp", "genbank")
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(84),bond(115),bond(118),bond(238))'
  % (location_line)))
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(115),bond(204),bond(238),bond(241))'
  % (location_line)))
/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(194),bond(272))'
  % (location_line)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 646, in read
    first = next(iterator)
  File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 582, in parse
    for r in i:
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 467, in parse_records
    record = self.parse(handle, do_features)
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 451, in parse
    if self.feed(handle, consumer, do_features):
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 423, in feed
    self._feed_feature_table(consumer, self.parse_features(skip=False))
  File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 374, in _feed_feature_table
    consumer.location(location_string)
  File "/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py", line 1083, in location
    operator=location_line[:i])
  File "/Library/Python/2.7/site-packages/Bio/SeqFeature.py", line 1003, in __init__
    raise ValueError("CompoundLocation should have at least 2 parts")
ValueError: CompoundLocation should have at least 2 parts

Peter

From p.j.a.cock at googlemail.com Thu Dec 5 12:12:04 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 17:12:04 +0000
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock wrote:
>
> Not to worry - the site did respond when I retried a bit later, and
> I can reproduce the parser error:
>
>>>> from Bio import SeqIO
>>>> r = SeqIO.read("1MRR_A.gp", "genbank")
> BiopythonParserWarning: Couldn't parse feature location:
> 'join(bond(84),bond(115),bond(118),bond(238))'
> BiopythonParserWarning: Couldn't parse feature location:
> 'join(bond(115),bond(204),bond(238),bond(241))'
> BiopythonParserWarning: Couldn't parse feature location:
> 'join(bond(194),bond(272))'
> ...
> ValueError: CompoundLocation should have at least 2 parts

The problem is the bond locations. In particular, while the parser gave up on the ones with a warning, it fell over on the single bond entry, bond(196).

This is partly due to a change in the use of the bond term, which used to be a compound entry like bond(194,272).
Also, the GenBank parser was and is primarily used on nucleotide sequences rather than GenPept files, which are occasionally weirder (like here!).

A short-term hack would be to strip out the bond term (with a warning) and parse the remainder as a simple join or single residue accordingly.

Would that work for you - do you need the bond bit?

Peter

From tiagoantao at gmail.com Thu Dec 5 12:24:41 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Thu, 5 Dec 2013 17:24:41 +0000
Subject: [Biopython-dev] 1.63 (almost)

Dear all,

I have almost completed the process of releasing 1.63:
Source and binaries are already available at http://biopython.org/wiki/Download
The API docs have been updated

Please feel free to test/comment. If there are no problems I will finalize the release (announcements + PyPI) tomorrow.

From p.j.a.cock at googlemail.com Thu Dec 5 13:03:41 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 18:03:41 +0000
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On Thu, Dec 5, 2013 at 5:12 PM, Peter Cock wrote:
>
> A short term hack would be to strip out the bond term
> (with a warning) and parse the remainder as a simple
> join or single residue accordingly.
>
> Would that work for you - do you need the bond bit?

Proposed branch with that change: https://github.com/peterjc/biopython/tree/gp_bond

Sample output, using the same example GenPept file:

>>> from Bio import SeqIO
>>> r = SeqIO.read("1MRR_A.gp", "genbank")
Bio/GenBank/__init__.py:1011: BiopythonParserWarning: Dropping bond qualifier in feature location
  warnings.warn("Dropping bond qualifier in feature location", BiopythonParserWarning)
>>> for f in r.features: print f.type, f.location
...
source [0:375]
Region [27:340]
SecStr [34:46]
Site order{[36:37], [43:44], [108:110], [112:113], [115:117], [119:120], [122:123], [136:138], [140:141]}
Site order{[47:48], [83:84], [114:115], [117:118], [121:122], [235:237], [240:241]}
SecStr [56:65]
SecStr [66:87]
Site order{[83:84], [114:115], [117:118], [203:204], [237:238], [240:241]}
Het join{[83:84], [114:115], [117:118], [237:238]}
SecStr [101:129]
Het join{[114:115], [203:204], [237:238], [240:241]}
Site [121:122]
SecStr [132:140]
SecStr [142:151]
SecStr [152:169]
SecStr [171:177]
SecStr [179:185]
SecStr [185:216]
Het join{[193:194], [271:272]}
Het [195:196]
Het join{[195:196], [195:196]}
Het join{[209:210], [213:214], [213:214]}
SecStr [224:253]
SecStr [259:269]
Bond bond{[267:268], [271:272]}
SecStr [269:285]
Het join{[283:284], [304:305], [308:309], [304:305]}
SecStr [300:319]

Useful?

Peter

From binni at binnisb.com Thu Dec 5 13:06:45 2013
From: binni at binnisb.com (Brynjar Smári Bjarnason)
Date: Thu, 5 Dec 2013 19:06:45 +0100
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

I'll ask someone who knows, but I think I could skip using the bonds. Can you suggest how I can ignore the bonds in the efetch response, or in the parser?

Thanks a lot for looking at this!
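One rough way to act on Brynjar's question, following the "short term hack" Peter describes, is to rewrite each location string before it reaches the location parser. The following is a minimal standalone sketch, not the code on Peter's gp_bond branch (which is not shown in this thread), so its behaviour may differ from the branch. The function name strip_bond and its regular expression are hypothetical, and it only handles bond(...) terms containing plain residue numbers. It removes bond(...) wrappers with a warning, leaves the remainder of a join(...) intact, turns the older compound form bond(194,272) into join(194,272), and reduces a single bond(196) to the plain residue 196. Hooking it into Bio.GenBank is not shown.

```python
import re
import warnings

# One bond(...) term whose contents are plain residue numbers, e.g.
# "bond(196)" or the older compound form "bond(194,272)".
_BOND = re.compile(r"bond\((\d+(?:,\d+)*)\)")


def strip_bond(location):
    """Drop bond(...) wrappers from a GenPept location string.

    Hypothetical helper illustrating "strip out the bond term and parse
    the remainder as a simple join or single residue".
    """
    if not _BOND.search(location):
        return location  # no bond term, nothing to do
    warnings.warn("Dropping bond qualifier in feature location: %r" % location)
    stripped = _BOND.sub(r"\1", location)
    # A bare "194,272" (left over from the old compound bond form) is not a
    # valid location on its own, so wrap it as a join.
    if "," in stripped and not stripped.startswith(("join(", "order(")):
        stripped = "join(%s)" % stripped
    return stripped
```

For example, strip_bond("join(bond(84),bond(115),bond(118),bond(238))") gives "join(84,115,118,238)", which a GenBank-style parser can handle as an ordinary join.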
On 5 Dec 2013 18:12, "Peter Cock" wrote:
> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock
> wrote:
> >
> > Not to worry - the site did respond when I retried a bit later, and
> > I can reproduce the parser error:
> >
> >>>> from Bio import SeqIO
> >>>> r = SeqIO.read("1MRR_A.gp", "genbank")
> > BiopythonParserWarning: Couldn't parse feature location:
> > 'join(bond(84),bond(115),bond(118),bond(238))'
> > BiopythonParserWarning: Couldn't parse feature location:
> > 'join(bond(115),bond(204),bond(238),bond(241))'
> > BiopythonParserWarning: Couldn't parse feature location:
> > 'join(bond(194),bond(272))'
> > ...
> > ValueError: CompoundLocation should have at least 2 parts
>
> The problem is the bond locations, and in particular while the
> parser gave up on the ones with a warning, it fell over the
> single bond entry, bond(196).
>
> This is partly due to a change in the use of the bond term,
> which used to be a compound entry like bond(194,272).
> Also the GenBank parser was and is primarily used on
> nucleotide sequences rather than GenPept files which are
> occasionally more weird (like here!).
>
> A short term hack would be to strip out the bond term
> (with a warning) and parse the remainder as a simple
> join or single residue accordingly.
>
> Would that work for you - do you need the bond bit?
>
> Peter
>

From p.j.a.cock at googlemail.com Thu Dec 5 13:06:48 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 5 Dec 2013 18:06:48 +0000
Subject: [Biopython-dev] 1.63 (almost)

On Thu, Dec 5, 2013 at 5:24 PM, Tiago Antão wrote:
> Dear all,
>
> I have almost completed the process of releasing 1.63:
> Source and binaries are already available at
> http://biopython.org/wiki/Download
> The API docs have been updated
>
> Please feel free to test/comment.
> If there are no problems I will finalize
> the release (announcements + pypi) tomorrow
>

Thanks Tiago,

The fact that we got three issues reported that very evening is just bad luck (PDB handles vs filenames, SeqIO.index_db relative paths, and GenPept bond location parsing), but none of these are regressions - they all existed in Biopython 1.62 as well.

Maybe we can aim to get the Biopython 1.64 release out early next year?

Regards,

Peter

From binni at binnisb.com Thu Dec 5 13:08:20 2013
From: binni at binnisb.com (Brynjar Smári Bjarnason)
Date: Thu, 5 Dec 2013 19:08:20 +0100
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

Thanks, will look at this when I'm at the computer :-)
On 5 Dec 2013 19:06, "Brynjar Smári Bjarnason" wrote:
> I'll ask one who knows but I think I could skip using the bonds. Can you
> suggest how I can ignore the bonds in efetch response, or the parser?
>
> Thanks a lot for looking at this!
> On 5 Dec 2013 18:12, "Peter Cock" wrote:
>
>> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock
>> wrote:
>> >
>> > Not to worry - the site did respond when I retried a bit later, and
>> > I can reproduce the parser error:
>> >
>> >>>> from Bio import SeqIO
>> >>>> r = SeqIO.read("1MRR_A.gp", "genbank")
>> > BiopythonParserWarning: Couldn't parse feature location:
>> > 'join(bond(84),bond(115),bond(118),bond(238))'
>> > BiopythonParserWarning: Couldn't parse feature location:
>> > 'join(bond(115),bond(204),bond(238),bond(241))'
>> > BiopythonParserWarning: Couldn't parse feature location:
>> > 'join(bond(194),bond(272))'
>> > ...
>> > ValueError: CompoundLocation should have at least 2 parts
>>
>> The problem is the bond locations, and in particular while the
>> parser gave up on the ones with a warning, it fell over the
>> single bond entry, bond(196).
>>
>> This is partly due to a change in the use of the bond term,
>> which used to be a compound entry like bond(194,272).
>> Also the GenBank parser was and is primarily used on
>> nucleotide sequences rather than GenPept files which are
>> occasionally more weird (like here!).
>>
>> A short term hack would be to strip out the bond term
>> (with a warning) and parse the remainder as a simple
>> join or single residue accordingly.
>>
>> Would that work for you - do you need the bond bit?
>>
>> Peter
>>
>

From binni at binnisb.com Fri Dec 6 03:03:45 2013
From: binni at binnisb.com (Brynjar Smári Bjarnason)
Date: Fri, 6 Dec 2013 09:03:45 +0100
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On 5 December 2013 19:08, Brynjar Smári Bjarnason wrote:
> Thanks, will look at this when I'm at the computer :-)
> On 5 Dec 2013 19:06, "Brynjar Smári Bjarnason" wrote:
>
>> I'll ask one who knows but I think I could skip using the bonds. Can you
>> suggest how I can ignore the bonds in efetch response, or the parser?
>>
>> Thanks a lot for looking at this!
>> On 5 Dec 2013 18:12, "Peter Cock" wrote:
>>
>>> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock
>>> wrote:
>>> >
>>> > Not to worry - the site did respond when I retried a bit later, and
>>> > I can reproduce the parser error:
>>> >
>>> >>>> from Bio import SeqIO
>>> >>>> r = SeqIO.read("1MRR_A.gp", "genbank")
>>> > BiopythonParserWarning: Couldn't parse feature location:
>>> > 'join(bond(84),bond(115),bond(118),bond(238))'
>>> > BiopythonParserWarning: Couldn't parse feature location:
>>> > 'join(bond(115),bond(204),bond(238),bond(241))'
>>> > BiopythonParserWarning: Couldn't parse feature location:
>>> > 'join(bond(194),bond(272))'
>>> > ...
>>> > ValueError: CompoundLocation should have at least 2 parts
>>>
>>> The problem is the bond locations, and in particular while the
>>> parser gave up on the ones with a warning, it fell over the
>>> single bond entry, bond(196).
>>>
>>> This is partly due to a change in the use of the bond term,
>>> which used to be a compound entry like bond(194,272).
>>> Also the GenBank parser was and is primarily used on
>>> nucleotide sequences rather than GenPept files which are
>>> occasionally more weird (like here!).
>>>
>>> A short term hack would be to strip out the bond term
>>> (with a warning) and parse the remainder as a simple
>>> join or single residue accordingly.
>>>
>>> Would that work for you - do you need the bond bit?
>>>
>>> Peter
>>>
>>

I believe for our part that leaving the bond bit out is fine, so your patch should work well.

Any suggestions on a good way to apply this patch? Should I build Biopython from that branch, or clone the latest stable version and apply the patch before building?

Thank you

Brynjar

From binni at binnisb.com Fri Dec 6 04:55:18 2013
From: binni at binnisb.com (Brynjar Smári Bjarnason)
Date: Fri, 6 Dec 2013 10:55:18 +0100
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On 6 December 2013 09:03, Brynjar Smári Bjarnason wrote:
> On 5 December 2013 19:08, Brynjar Smári Bjarnason wrote:
>
>> Thanks, will look at this when I'm at the computer :-)
>> On 5 Dec 2013 19:06, "Brynjar Smári Bjarnason" wrote:
>>
>>> I'll ask one who knows but I think I could skip using the bonds. Can you
>>> suggest how I can ignore the bonds in efetch response, or the parser?
>>>
>>> Thanks a lot for looking at this!
>>> On 5 Dec 2013 18:12, "Peter Cock" wrote:
>>>
>>>> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock
>>>> wrote:
>>>> >
>>>> > Not to worry - the site did respond when I retried a bit later, and
>>>> > I can reproduce the parser error:
>>>> >
>>>> >>>> from Bio import SeqIO
>>>> >>>> r = SeqIO.read("1MRR_A.gp", "genbank")
>>>> > BiopythonParserWarning: Couldn't parse feature location:
>>>> > 'join(bond(84),bond(115),bond(118),bond(238))'
>>>> > BiopythonParserWarning: Couldn't parse feature location:
>>>> > 'join(bond(115),bond(204),bond(238),bond(241))'
>>>> > BiopythonParserWarning: Couldn't parse feature location:
>>>> > 'join(bond(194),bond(272))'
>>>> > ...
>>>> > ValueError: CompoundLocation should have at least 2 parts
>>>>
>>>> The problem is the bond locations, and in particular while the
>>>> parser gave up on the ones with a warning, it fell over the
>>>> single bond entry, bond(196).
>>>>
>>>> This is partly due to a change in the use of the bond term,
>>>> which used to be a compound entry like bond(194,272).
>>>> Also the GenBank parser was and is primarily used on
>>>> nucleotide sequences rather than GenPept files which are
>>>> occasionally more weird (like here!).
>>>>
>>>> A short term hack would be to strip out the bond term
>>>> (with a warning) and parse the remainder as a simple
>>>> join or single residue accordingly.
>>>>
>>>> Would that work for you - do you need the bond bit?
>>>>
>>>> Peter
>>>>
>>>
> I believe for our part that leaving the bond bit out is fine so your patch
> should work well.
>
> Any suggestions on a good way to apply this patch? Should I build
> Biopython from that branch or clone latest stable and apply the patch
> before building?
>
> Thank you
>
> Brynjar
>

The patch works for me! Thank you.

I cloned the official Biopython repository and applied your commit as a patch before building. The GIs that were failing now work.
Brynjar

From p.j.a.cock at googlemail.com Fri Dec 6 05:19:42 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 6 Dec 2013 10:19:42 +0000
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On Fri, Dec 6, 2013 at 9:55 AM, Brynjar Smári Bjarnason wrote:
>
> The patch works for me! Thank you.
>
> I cloned the official Biopython and applied your commit as patch before
> building. My gis' that were failing now work.
>
> Brynjar

Great - thank you for confirming this fixes the problem.

I think we just missed the window for getting this into the Biopython 1.63 release (so you'll effectively be running Biopython 1.63 + this patch).

Regards,

Peter

From binni at binnisb.com Fri Dec 6 05:26:46 2013
From: binni at binnisb.com (Brynjar Smári Bjarnason)
Date: Fri, 6 Dec 2013 11:26:46 +0100
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On 6 December 2013 11:19, Peter Cock wrote:
> On Fri, Dec 6, 2013 at 9:55 AM, Brynjar Smári Bjarnason
> wrote:
> >
> > The patch works for me! Thank you.
> >
> > I cloned the official Biopython and applied your commit as patch before
> > building. My gis' that were failing now work.
> >
> > Brynjar
>
> Great - thank you for confirming this fixes the problem.
>
> I think we just missed the window for getting this into the
> Biopython 1.63 release (so you'll effectively be running
> Biopython 1.63 + this patch).
>
> Regards,
>
> Peter
>

I saw that 1.63 was just released, so yes, I am running Biopython 1.63 + patch.

Is it likely that this will find its way into 1.64? Not that I am in any rush since it is working; I am just asking for whoever needs to maintain our Biopython versions.
Best regards,
Brynjar

From p.j.a.cock at googlemail.com Fri Dec 6 05:38:08 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 6 Dec 2013 10:38:08 +0000
Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format
References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com>

On Fri, Dec 6, 2013 at 10:26 AM, Brynjar Smári Bjarnason wrote:
>
> I saw that 1.63 was just released, so yes, I run Biopython 1.63 + patch.

The official release announcement should be later today, but yes, you can already get it from GitHub (as you have) ;)

> Is it likely that this will find its way into 1.64? Not that I am in any
> rush since it is working, just wondering for the one that needs to maintain
> the Biopython versions.

Yes, once the release is formally announced I intend to commit this change (and a test - probably using the very same example GenPept file you reported). It is possible that we'll have a better solution in place for Biopython 1.64 that handles these locations without simply dropping the bond term.

Peter

From tiagoantao at gmail.com Fri Dec 6 06:27:28 2013
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 6 Dec 2013 11:27:28 +0000
Subject: [Biopython-dev] Biopython 1.63 released

Source distributions and Windows installers for Biopython 1.63 are now available from the downloads page on the official Biopython website and (soon) from the Python Package Index (PyPI).

This release no longer requires the 2to3 library, which was made possible by dropping support for Python 2.5 (and Jython 2.5). This release of Biopython supports Python 2.6 and 2.7, and also Python 3.3.

The Biopython Tutorial & Cookbook, and the docstring examples in the source code, now use the Python 3 style print function in place of the Python 2 style print statement.
This language feature is available under Python 2.6 and 2.7 via: from __future__ import print_function. Similarly, we now use the Python 3 style built-in next function in place of the Python 2 style iterators' .next() method. This language feature is also available under Python 2.6 and 2.7. The restriction enzyme list in Bio.Restriction has been updated to the December 2013 release of REBASE. Many thanks to the Biopython developers and community for making this release possible, especially the following contributors: Chris Mitchell (first contribution) Christian Brueffer Eric Talevich Gokcen Eraslan (first contribution) Josha Inglis (first contribution) Konstantin Tretyakov (first contribution) Lenna Peterson Martin Mokrejs Nigel Delaney (first contribution) Peter Cock Sergei Lebedev (first contribution) Tiago Antao Wayne Decatur (first contribution) Wibowo "Bow" Arindrarto From tiagoantao at gmail.com Fri Dec 6 07:26:09 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Fri, 6 Dec 2013 12:26:09 +0000 Subject: [Biopython-dev] Releasing biopython - lessons Message-ID: Dear all, With much help from Peter, 1.63 was released. I have a few comments/ideas I would like to share: 1. I think that there is a need to maintain an exhaustive list of dependencies that a full biopython distribution will need (python packages and external applications). I am offering to do that here: http://biopython.org/wiki/List_of_applications_executed_via_Biopython (over the next few days). 2. I was planning on creating a Linux virtual box image (and make it available) with everything that is needed to fully test and run a Biopython distribution. This would allow to have an extremely stable environment for testing (testing PCs normally have other uses and things can be broken by other stuff). 3. I think that it would be nice to change run_tests to have an option to run in an "extremely picky mode": Basically fail if there is a warning that is not a Deprecation warning.
Some silent warnings can actually be somewhat irritating if not understood/acted upon (e.g. the RDFlib case). This would be run before release and maybe once a week on buildbot? 1 and 2 are trivial (and go very well together) and I am offering to do it in the very next few days if that is OK. Any views on 3? This would require changing run_tests and that might not be easy/desirable... My 2p, Tiago -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From chapmanb at 50mail.com Fri Dec 6 10:19:52 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 06 Dec 2013 10:19:52 -0500 Subject: [Biopython-dev] Releasing biopython - lessons In-Reply-To: References: Message-ID: <87wqjij9bb.fsf@fastmail.fm> Tiago; > With much help from Peter, 1.63 was released. I have a few comments/ideas I > would like to share: > > 1. I think that there is a need to maintain an exhaustive list of > dependencies that a full biopython distribution will need (python packages > and external applications). I am offering to do that here: > http://biopython.org/wiki/List_of_applications_executed_via_Biopython (over > the next few days). > > 2. I was planning on creating a Linux virtual box image (and make it > available) with everything that is needed to fully test and run a Biopython > distribution. This would allow to have an extremely stable environment for > testing (testing PCs normally have other uses and things can be broken by > other stuff). Great idea. One of the things we've been doing a lot of work on in CloudBioLinux is making it easy to automate these types of local installs. The advantage is that you have full documentation (lists of packages), an automated way to install it, and a script you can use for generating images for testing/distribution purposes.
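Tiago's third suggestion, an "extremely picky mode" that fails on any warning other than a DeprecationWarning, maps naturally onto the standard library's warnings filters. The sketch below assumes a hypothetical run_picky() wrapper around a single test callable. It does not reflect how Biopython's run_tests.py is actually structured.

```python
import warnings


def run_picky(test_func):
    """Run test_func, turning every warning except DeprecationWarning into an error.

    Hypothetical helper for illustration only; returns test_func's result.
    """
    with warnings.catch_warnings():
        # simplefilter() inserts at the front of the filter list, so the
        # filter added last is consulted first:
        # DeprecationWarning -> ignored, any other warning -> raised.
        warnings.simplefilter("error")
        warnings.simplefilter("ignore", DeprecationWarning)
        return test_func()


def noisy_test():
    warnings.warn("something odd happened", UserWarning)
    return "passed"


def deprecated_test():
    warnings.warn("old API", DeprecationWarning)
    return "passed"


if __name__ == "__main__":
    print(run_picky(deprecated_test))  # the DeprecationWarning is tolerated
    try:
        run_picky(noisy_test)
    except UserWarning as err:
        print("Picky mode failure:", err)
```

Because catch_warnings() restores the previous filters on exit, the strict settings only apply while the wrapped test runs.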
I put together a starter package for a Biopython 'flavor' here: https://github.com/chapmanb/cloudbiolinux/tree/master/contrib/flavor/biopython You can run with: git clone https://github.com/chapmanb/cloudbiolinux.git fab -H localhost install_biolinux:flavor=biopython By default it will install an Anaconda Python, Biopython dependencies and associated tools into ~/biopython. If you like this approach, happy to help with pointers on adding more tools. Thanks again for taking this on, Brad From mjldehoon at yahoo.com Mon Dec 9 01:33:01 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 8 Dec 2013 22:33:01 -0800 (PST) Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: Message-ID: <1386570781.7166.YahooMailBasic@web164001.mail.gq1.yahoo.com> Currently we are using os.path.expanduser('~') /.biopython/Bio/Entrez/DTDs to look for locally stored DTDs. This should work on Windows also. Then I would suggest the following if a DTD file is missing: 1) Print a non-scary warning message that we will attempt to download the DTD; 2) Download the DTD; 3) Try to store it in the local DTD directory. If this fails (e.g. due to file permissions or whatnot), print another warning message; 4) Use the downloaded DTD to parse the XML. Any final objections? Best, -Michiel. -------------------------------------------- On Tue, 12/3/13, Peter Cock wrote: Subject: Re: [Biopython-dev] [biopython] Missing DTD files (#260) To: "Michiel de Hoon" Cc: "Biopython-Dev Mailing List" Date: Tuesday, December 3, 2013, 5:38 AM On Sun, Dec 1, 2013 at 3:28 AM, Michiel de Hoon wrote: > How would people feel about Biopython always downloading DTD files > on the fly instead of distributing them with Biopython? > > After downloading and parsing a DTD file, we can keep it in memory > so we won't need to parse the same DTD file over and over again. > So the impact on speed will be minimal. > > If we do so, we'll never run into the problem of missing DTD files.
The > downside of course is that we will need internet access to parse any > XML file through Bio.Entrez. But maybe in today's world that is acceptable. Requiring network access would be annoying for offline work (e.g. how we usually run the automated tests), but most of the NCBI Entrez XML files will (I expect) be downloaded and immediately parsed. So for usability this seems OK. Automatic caching to disk (without a scary warning) seems like a better idea than always downloading the DTD files on demand (which seems wasteful of bandwidth and more likely to give intermittent errors), although as you have noted before there is the open question of where to put these files (including where on Windows): http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008310.html Regards, Peter From p.j.a.cock at googlemail.com Mon Dec 9 05:13:14 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 10:13:14 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: <1386570781.7166.YahooMailBasic@web164001.mail.gq1.yahoo.com> References: <1386570781.7166.YahooMailBasic@web164001.mail.gq1.yahoo.com> Message-ID: On Mon, Dec 9, 2013 at 6:33 AM, Michiel de Hoon wrote: > Currently we are using os.path.expanduser('~') /.biopython/Bio/Entrez/DTDs to look for locally stored DTDs. > This should work on Windows also. Well partly - it would create a "scary" .biopython folder which would NOT be hidden by default. On Windows we could deliberately mark that folder as hidden (a file system attribute, used instead of the leading dot convention from Unix). However, I think we should really be using something under: $HOME\Local Settings\Application Data. A little research might be needed for how to get that setting (if possible without reading the registry and the additional Python dependency that would entail).
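One way to act on Peter's suggestion is to prefer a per-user application-data folder on Windows and keep the hidden dot-directory everywhere else. This is a sketch under stated assumptions. It reads the LOCALAPPDATA and APPDATA environment variables rather than the registry, so no extra dependency is needed. As far as I know, LOCALAPPDATA is not defined on older Windows versions such as XP, so the code falls back to APPDATA. The helper name dtd_cache_dir() and the "Biopython" folder name are hypothetical. ntpath and posixpath are used explicitly so that each branch produces the right separators and can be exercised from any OS.

```python
import ntpath
import os
import posixpath


def dtd_cache_dir(os_name=None, environ=None, home=None):
    """Return a per-user directory for cached Entrez DTD files.

    Hypothetical helper: os_name, environ and home default to the current
    platform, but can be overridden to see what another platform would get.
    """
    os_name = os.name if os_name is None else os_name
    environ = os.environ if environ is None else environ
    if os_name == "nt":
        # Prefer the non-roaming local folder, then the roaming one, and
        # only then fall back to a dot-directory in the home folder.
        base = environ.get("LOCALAPPDATA") or environ.get("APPDATA")
        if base:
            return ntpath.join(base, "Biopython", "Bio", "Entrez", "DTDs")
        home = home or ntpath.expanduser("~")
        return ntpath.join(home, ".biopython", "Bio", "Entrez", "DTDs")
    home = home or posixpath.expanduser("~")
    return posixpath.join(home, ".biopython", "Bio", "Entrez", "DTDs")
```

Passing os_name and environ explicitly, e.g. dtd_cache_dir("nt", {"APPDATA": r"C:\Documents and Settings\pcock\Application Data"}), shows what a Windows user would get without needing a Windows machine.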
> Then I would suggest the following if a DTD file is missing: > 1) Print a non-scary warning message that we will attempt to download the DTD; > 2) Download the DTD; > 3) Try to store it in the local DTD directory. If this fails (e.g. due to file permissions or whatnot), print another warning message; > 4) Use the downloaded DTD to parse the XML. > Any final objections? Only with regard to the location of the cache on Windows. Peter From mjldehoon at yahoo.com Mon Dec 9 09:20:23 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 9 Dec 2013 06:20:23 -0800 (PST) Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: Message-ID: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> Can somebody with a Windows computer check what os.path.expanduser('~') refers to on Windows? Or how matplotlib solves this on Windows (they are storing files under $HOME/.matplotlib on unix-like systems; I don't know what they use on Windows). Thanks, -Michiel. -------------------------------------------- On Mon, 12/9/13, Peter Cock wrote: Subject: Re: [Biopython-dev] [biopython] Missing DTD files (#260) To: "Michiel de Hoon" Cc: "Biopython-Dev Mailing List" Date: Monday, December 9, 2013, 5:13 AM On Mon, Dec 9, 2013 at 6:33 AM, Michiel de Hoon wrote: > Currently we are using os.path.expanduser('~') /.biopython/Bio/Entrez/DTDs to look for locally stored DTDs. > This should work on Windows also. Well partly - it would create a "scary" .biopython folder which would NOT be hidden by default. On Windows we could deliberately mark that folder as hidden (a file system attribute, used instead of the leading dot convention from Unix). However, I think we should really be using something under: $HOME\Local Settings\Application Data. A little research might be needed for how to get that setting (if possible without reading the registry and the additional Python dependency that would entail).
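Michiel's four-step fallback for a missing DTD (warn, download, try to cache, then parse with the downloaded copy) could be sketched roughly as follows. Everything here is hypothetical. load_dtd() is an illustrative name, and fetch_dtd stands in for whatever download function Bio.Entrez would actually use. It is passed in as a parameter so the logic can be tried offline. The warning texts are placeholders. This is not Biopython's implementation.

```python
import os
import warnings


def load_dtd(name, cache_dir, fetch_dtd):
    """Return the DTD contents as bytes, downloading and caching it if missing.

    Hypothetical helper: fetch_dtd(name) must return the DTD as bytes.
    """
    path = os.path.join(cache_dir, name)
    if os.path.isfile(path):
        with open(path, "rb") as handle:
            return handle.read()

    # 1) Non-scary warning that we are about to download the DTD.
    warnings.warn("DTD file %s not found locally, downloading it" % name)
    # 2) Download the DTD.
    data = fetch_dtd(name)
    # 3) Try to store it locally; a failure here only deserves a second warning.
    try:
        if not os.path.isdir(cache_dir):
            os.makedirs(cache_dir)
        with open(path, "wb") as handle:
            handle.write(data)
    except (IOError, OSError) as err:
        warnings.warn("Could not cache DTD file %s: %s" % (name, err))
    # 4) Return the downloaded copy so the XML can still be parsed.
    return data
```

Because the download function is supplied by the caller, the offline situation Peter mentions (how the automated tests usually run) can be simulated by passing a function that raises an exception.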
> Then I would suggest the following if a DTD file is missing: > 1) Print a non-scary warning message that we will attempt to download the DTD; > 2) Download the DTD; > 3) Try to store it in the local DTD directory. If this fails (e.g. due to file permissions or whatnot), print another warning message; > 4) Use the downloaded DTD to parse the XML. > Any final objections? Only with regard to the location of the cache on Windows. Peter From tiagoantao at gmail.com Mon Dec 9 09:40:12 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 9 Dec 2013 14:40:12 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, I do not have access here to a Windows machine (later in the day I can). But: /.biopython/Bio/Entrez/DTDs should not be correct. You probably want to replace "/" with os.sep, say: os.sep.join([".biopython", "Bio", "Entrez", "DTDs"]) which will give you \.biopython\Bio\Entrez\DTDs On 9 December 2013 14:20, Michiel de Hoon wrote: > Can somebody with a Windows computer check what os.path.expanduser('~') > refers to on Windows? Or how matplotlib solves this on Windows (they are > storing files under $HOME/.matplotlib on unix-like systems; I don't know > what they use on Windows). > > Thanks, > -Michiel. > > > -------------------------------------------- > On Mon, 12/9/13, Peter Cock wrote: > > Subject: Re: [Biopython-dev] [biopython] Missing DTD files (#260) > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Monday, December 9, 2013, 5:13 AM > > On Mon, Dec 9, 2013 at 6:33 AM, > Michiel de Hoon > wrote: > > Currently we are using os.path.expanduser('~') > /.biopython/Bio/Entrez/DTDs to look for locally stored > DTDs. > > This should work on Windows also.
> > Well partly - it would create a "scary" .biopython folder > which would NOT be hidden by default. On Windows > we could deliberately mark that folder as hidden (a > file system attribute, used instead of the leading dot > convention from Unix). However, I think we should > really be using something under: > > $HOME\Local Settings\Application Data > > A little research might be needed for how to get that > setting (if possible without reading the registry and the > additional Python dependency that would entail). > > > Then I would suggest the following if a DTD file is > missing: > > 1) Print a non-scary warning message that we will > attempt to download the DTD; > > 2) Download the DTD; > > 3) Try to store it in the local DTD directory. If this > fails (e.g. due to file permissions or whatnot), print > another warning message; > > 4) Use the downloaded DTD to parse the XML. > > Any final objections? > > Only with regard to the location of the cache on Windows. > > Peter -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From tiagoantao at gmail.com Mon Dec 9 09:41:43 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 9 Dec 2013 14:41:43 +0000 Subject: [Biopython-dev] Releasing biopython - lessons In-Reply-To: <87wqjij9bb.fsf@fastmail.fm> References: <87wqjij9bb.fsf@fastmail.fm> Message-ID: Hi Brad, > If you like this approach, happy to help with pointers on adding more > tools. Thanks again for taking this on, > > This definitely seems the way to go. I will try to build up a full dependency list along with testing this tomorrow.
Tiago -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From p.j.a.cock at googlemail.com Mon Dec 9 10:09:50 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 15:09:50 +0000 Subject: [Biopython-dev] Releasing biopython - lessons In-Reply-To: References: <87wqjij9bb.fsf@fastmail.fm> Message-ID: On Mon, Dec 9, 2013 at 2:41 PM, Tiago Antão wrote: > Hi Brad, > >> If you like this approach, happy to help with pointers on adding more >> tools. Thanks again for taking this on, > > This definitely seems the way to go. I will try to build up a full > dependency list along with testing this tomorrow. > > Tiago Excellent plan - thank you both for tackling this :) Peter From p.j.a.cock at googlemail.com Mon Dec 9 10:52:22 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 15:52:22 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: References: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: > On 9 December 2013 14:20, Michiel de Hoon wrote: >> >> Can somebody with a Windows computer check what os.path.expanduser('~') >> refers to on Windows? Or how matplotlib solves this on Windows (they are >> storing files under $HOME/.matplotlib on unix-like systems; I don't know >> what they use on Windows). >> >> Thanks, >> -Michiel. On Windows XP, using Python 2.6, 2.7, 3.3, or PyPy 2.2, C:\Documents and Settings\pc40583>c:\python26\python Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> print(os.path.expanduser('~')) C:\Documents and Settings\pcock i.e. as expected, it found my home directory. On Mon, Dec 9, 2013 at 2:40 PM, Tiago Antão wrote: > Hi, > > I do not have access here to a windows machine (later in the day I can). > But: > /.biopython/Bio/Entrez/DTDs > > should not be correct.
You probably want to replace "/" with os.sep, say: > os.sep.join([".biopython", "Bio", "Entrez", "DTDs"]) > > which will give you > \.biopython\Bio\Entrez\DTDs Yes in theory, but in practice using the Unix style slashes works just fine in my experience (and they are used in many of our unit tests without problem). Even a mix of slashes works. Note style-wise it would be preferable to use os.path.join(...) rather than the string join method os.sep.join(...) as shown. Peter From kashyap.cc at gmail.com Mon Dec 9 11:29:05 2013 From: kashyap.cc at gmail.com (Kashyap Chhatbar) Date: Mon, 9 Dec 2013 16:29:05 +0000 Subject: [Biopython-dev] Introduction from a newbie; Tiny contribution Message-ID: Hi, I am happy to join the dev mailing list for Biopython. I would be definitely interested in learning more and hopefully at some point contributing to the biopython source. A few days ago, I created an issue on the official github repository of biopython (#267). I think I have come up with a noobish contribution to convert the file name (if a relative path is given) to an absolute path when creating the sqlite3 database using the Bio.SeqIO.index_db() function (commit de9ba35). I have not looked at enough of the biopython code to conclude whether introducing absolute paths for filenames is a good idea or not, but I thought of this because Peter J Cock hinted at improving the code for relative paths. Cheers, Kashyap From p.j.a.cock at googlemail.com Mon Dec 9 11:42:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 16:42:43 +0000 Subject: [Biopython-dev] Introduction from a newbie; Tiny contribution In-Reply-To: References: Message-ID: On Mon, Dec 9, 2013 at 4:29 PM, Kashyap Chhatbar wrote: > Hi, > > I am happy to join the dev mailing list for Biopython. I would be > definitely interested in learning more and hopefully at some point > contributing to the biopython source.
Thank you and welcome :) > A few days ago, I created an issue on > the official github repository of biopython > (#267). Yes, thank you for raising this - it is something we can improve. > I think I have come up with a noobish contribution to convert the file name > (if a relative path is given) to an absolute path when creating the sqlite3 > database using the Bio.SeqIO.index_db() function (commit > de9ba35). > I have not looked at enough of the biopython code to conclude > whether introducing absolute paths for filenames is a good idea > or not, but I thought of this because Peter J Cock hinted at improving the > code for relative paths. I don't think storing the absolute path is the best plan - I have tried to better explain what I meant by using relative filenames on the GitHub issue, https://github.com/biopython/biopython/issues/267 Is that clearer? Thanks, Peter From p.j.a.cock at googlemail.com Tue Dec 10 05:43:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 10 Dec 2013 10:43:32 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Fri, Dec 6, 2013 at 10:38 AM, Peter Cock wrote: > On Fri, Dec 6, 2013 at 10:26 AM, Brynjar Smári Bjarnason wrote: > >> Is it likely that this will find its way into 1.64? Not that I am in any >> rush since it is working, just wondering for the one that needs to maintain >> the Biopython versions. > > Yes, once the release is formally announced I intend to commit this > change (and a test - probably the very same example GenPept file > you reported). It is possible that we'll have a better solution in place > for Biopython 1.64 to handle these locations without simply dropping > the bond term.
This workaround has now been committed to the main branch, https://github.com/biopython/biopython/commit/ddbe6dc5dc08aa5079dfd29bc240651675c48427 along with the example file as a test case: https://github.com/biopython/biopython/commit/44ec2b2ce4687055a50f246ac264690ed664326c Thank you, Peter From kashyap.cc at gmail.com Tue Dec 10 08:15:22 2013 From: kashyap.cc at gmail.com (Kashyap Chhatbar) Date: Tue, 10 Dec 2013 13:15:22 +0000 Subject: [Biopython-dev] Introduction from a newbie; Tiny contribution In-Reply-To: References: Message-ID: On Mon, Dec 9, 2013 at 4:42 PM, Peter Cock wrote: > I don't think storing the absolute path is the best plan - I have > tried to better explain what I meant by using relative filenames on > the GitHub issue, > > https://github.com/biopython/biopython/issues/267 > > Is that clearer? > I have tried to understand the options and have commented further. Cheers, Kashyap From yeyanbo289 at gmail.com Thu Dec 12 10:09:10 2013 From: yeyanbo289 at gmail.com (Yanbo Ye) Date: Thu, 12 Dec 2013 23:09:10 +0800 Subject: [Biopython-dev] tutorial translation and rst files Message-ID: Hi guys, We have almost completed the Chinese translation of the Biopython tutorial and fixed the format errors caused by the latex to rst conversion. It's based on the 22 March 2013 update and the repository is here: https://github.com/bigwiv/Biopython-cn , including the English version. I heard there was a discussion about switching the tutorial format from latex to rst or a Sphinx/reStructuredText port. I don't know whether these files are useful for this task. I'm not an expert in rst format and there must be other errors in those files. Any suggestions?
Best, Yanbo -- *Yanbo Ye* *Guangzhou Institutes of Biomedicine and Health, * *Chinese Academy of Sciences* *190 Kaiyuan Avenue, Science Park, Guangzhou, China* *Email: ye_yanbo at gibh.ac.cn * *Web: http://www.yeyanbo.com * *Phone: (86)-020-32093810* From mjldehoon at yahoo.com Thu Dec 12 10:16:14 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 12 Dec 2013 07:16:14 -0800 (PST) Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: Message-ID: <1386861374.64762.YahooMailBasic@web164003.mail.gq1.yahoo.com> Thanks Peter. I believe that matplotlib uses os.path.expanduser('~') /.matplotlib/matplotlibrc. Then shall we do the same for Biopython, i.e. os.path.expanduser('~')/.biopython/Bio/Entrez/DTDs? Best, -Michiel. On Windows XP, using Python 2.6, 2.7, 3.3, or PyPy 2.2, C:\Documents and Settings\pc40583>c:\python26\python Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> print(os.path.expanduser('~')) C:\Documents and Settings\pcock i.e. as expected, it found my home directory. From p.j.a.cock at googlemail.com Thu Dec 12 10:58:21 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Dec 2013 15:58:21 +0000 Subject: [Biopython-dev] tutorial translation and rst files In-Reply-To: References: Message-ID: On Thu, Dec 12, 2013 at 3:09 PM, Yanbo Ye wrote: > Hi guys, > > We have almost completed the Chinese translation of the Biopython tutorial and > fixed the format errors caused by the latex to rst conversion. It's based on > the 22 March 2013 update and the repository is here: > https://github.com/bigwiv/Biopython-cn , including the English version. > > I heard there was a discussion about switching the tutorial format from > latex to rst or a Sphinx/reStructuredText port. I don't know whether these files are > useful for this task. I'm not an expert in rst format and there must be > other errors in those files.
Any suggestions? > > Best, > Yanbo Hi Yanbo, That looks impressive - the individual chapters are fast on GitHub, but even the whole English document displays quite quickly there: https://github.com/bigwiv/Biopython-cn/blob/master/en/allinone.rst Sadly the all-in-one document makes GitHub struggle: https://github.com/bigwiv/Biopython-cn/blob/master/cn/allinone.rst Did you have any trouble converting particular parts of the Tutorial? There are a few places where we used LaTeX for complex mathematical formulas - that seems to be an rst weakness. Can you post compiled HTML and PDF output (English & Chinese) as well? That would be a fairer way to look at the output, rather than just seeing how GitHub renders it. Regards, Peter P.S. The six month old Tutorial.rst in the repository root does not seem to work? From p.j.a.cock at googlemail.com Thu Dec 12 11:30:07 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Dec 2013 16:30:07 +0000 Subject: [Biopython-dev] Fwd: [biopython] TreeConstruction and Consensus modules from GSoC 2013 (#270) In-Reply-To: References: Message-ID: Hello Biopythoneers, For those of you not following the GitHub repository, this is quite a big and important pull request :) Please take a look! Thanks, Peter ---------- Forwarded message ---------- From: yeyanbo Date: Thu, Dec 12, 2013 at 2:16 PM Subject: [biopython] TreeConstruction and Consensus modules from GSoC 2013 (#270) To: biopython/biopython two module files created in Bio/Phylo: TreeConstruction.py, Consensus.py ; two test files created in Tests: test_TreeConstruction.py , test_Consensus.py ; directory created for testing files: Tests\TreeConstruction . 
________________________________ You can merge this Pull Request by running git pull https://github.com/lijax/biopython master Or view, comment on, or merge it at: https://github.com/biopython/biopython/pull/270 Commit Summary add TreeConststruction and Consensus modules implement upgma and nj algorithms" add parsimony scorer rewrite and test parsimony score minor change add NNITreeSearcher._get_neighbors complete parsimony method fix the bug that the nj tree may have 1 child instead of 3 at the root add convertion from SubsMat to Matrix for protein submatrix BitString and strict_consensus reorganize different tests add automatic test for consensus tree add `repr` funcionn for Matrix class and improve the document move `delitem` and `insert` from DistanceMatrix to Matrix fix the index bug of Matrix.insert() function and improve the document test files for consensus algorithms improve document of DiscanceCalculator and DistanceTreeConstructor improve document for parsimony tree classes majority and adam consensus methods fix majority bug, finish adam consensus, doc improvement add branch support method change DistanceCalculator parameters as msa should be independent restructure TreeConstructor classes fix nj bug assign 0 lenght to root clade of nj and upgma add bootstrap method adapt "identity" model in DistanceCalculator to protein;None condition of starting_tree in ParsimonyTreeConstructor convert list to generator in bootstrap methods fix adam consensus bug test cleanup minor change make assistant classes private remove import * File Changes A Bio/Phylo/Consensus.py (570) A Bio/Phylo/TreeConstruction.py (1011) A Tests/TreeConstruction/adam_refs.tre (3) A Tests/TreeConstruction/bootstrap_consensus.tre (1) A Tests/TreeConstruction/consensus_refs.tre (3) A Tests/TreeConstruction/majority_ref.tre (2) A Tests/TreeConstruction/msa.phy (6) A Tests/TreeConstruction/neighbor_trees.tre (4) A Tests/TreeConstruction/nj.tre (1) A Tests/TreeConstruction/pars1.tre (1) A 
Tests/TreeConstruction/pars2.tre (1) A Tests/TreeConstruction/pars3.tre (1) A Tests/TreeConstruction/strict_refs.tre (3) A Tests/TreeConstruction/test.log (36) A Tests/TreeConstruction/trees.tre (3) A Tests/TreeConstruction/upgma.tre (1) A Tests/test_Consensus.py (152) A Tests/test_TreeConstruction.py (245) Patch Links: https://github.com/biopython/biopython/pull/270.patch https://github.com/biopython/biopython/pull/270.diff From yeyanbo289 at gmail.com Thu Dec 12 12:36:48 2013 From: yeyanbo289 at gmail.com (Yanbo Ye) Date: Fri, 13 Dec 2013 01:36:48 +0800 Subject: [Biopython-dev] Fwd: Re: tutorial translation and rst files In-Reply-To: References: Message-ID: Forgot to add the list. ---------- Forwarded message ---------- From: "Yanbo Ye" Date: Fri, Dec 13, 2013 at 1:09 AM Subject: Re: [Biopython-dev] tutorial translation and rst files To: "Peter Cock" Hi Peter, I added the compiled html files to the repository and the format seems to be OK. https://github.com/bigwiv/Biopython-cn/blob/master/cn/allinone.html https://github.com/bigwiv/Biopython-cn/blob/master/en/allinone.html On Thu, Dec 12, 2013 at 11:58 PM, Peter Cock wrote: > On Thu, Dec 12, 2013 at 3:09 PM, Yanbo Ye wrote: > > Hi guys, > > > > We have almost completed the Chinese translation of the Biopython tutorial and > > fixed the format errors caused by the latex to rst conversion. It's based on > > the 22 March 2013 update and the repository is here: > > https://github.com/bigwiv/Biopython-cn , including the English version. > > > > I heard there was a discussion about switching the tutorial format from > > latex to rst or a Sphinx/reStructuredText port. I don't know whether these files > are > > useful for this task. I'm not an expert in rst format and there must be > > other errors in those files. Any suggestions?
> > > > Best, > > Yanbo > > Hi Yanbo, > > That looks impressive - the individual chapters are fast on GitHub, > but even the whole English document displays quite quickly there: > https://github.com/bigwiv/Biopython-cn/blob/master/en/allinone.rst > > Sadly the all-in-one document makes GitHub struggle: > https://github.com/bigwiv/Biopython-cn/blob/master/cn/allinone.rst > > Did you have any trouble converting particular parts of the Tutorial? > There are a few places where we used LaTeX for complex > mathematical formulas - that seems to be an rst weakness. There were many format errors with the hyperlinks, lists, tables and formulas. You can check the original converted file Tutorial.rst for reference. Now most of them are fixed, including the formulas (by using the latex code from the original latex file). Another big problem is inconsistent title levels. For example, in chapter 15 there are three standalone subsections before section 15.1 starts, so the file cannot be compiled. For now I have just changed them to section level to avoid the error. Any other solution? > Can you post compiled HTML and PDF output (English & Chinese) > as well? That would be a fairer way to look at the output, rather > than just seeing how GitHub renders it. > > Regards, > > Peter > > P.S. The six month old Tutorial.rst in the repository root does > not seem to work? > This is the original converted file and is just for reference. Best, Yanbo -- *Yanbo Ye* *Guangzhou Institutes of Biomedicine and Health, * *Chinese Academy of Sciences* *190 Kaiyuan Avenue, Science Park, Guangzhou, China* *Email: ye_yanbo at gibh.ac.cn * *Web: http://www.yeyanbo.com * *Phone: (86)-020-32093810* From yeyanbo289 at gmail.com Thu Dec 12 20:23:56 2013 From: yeyanbo289 at gmail.com (Yanbo Ye) Date: Fri, 13 Dec 2013 09:23:56 +0800 Subject: [Biopython-dev] [biopython] TreeConstruction and Consensus modules from GSoC 2013 (#270) In-Reply-To: References: Message-ID: Thanks, Peter.
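For the Python 3.3 'StringIO' import error reported just below, one common cross-version pattern is to try the Python 2 module first and fall back to io.StringIO. This sketch assumes the failure comes from a bare "from StringIO import StringIO" (that module was removed in Python 3). The pull request itself may need a different fix.

```python
try:
    # Python 2 only: the StringIO module was removed in Python 3.
    from StringIO import StringIO
except ImportError:
    # Python 3: io.StringIO holds text (str) data in memory.
    from io import StringIO

# Example: wrap an in-memory Newick tree string as a file-like handle.
handle = StringIO("(A,(B,C));")
tree_text = handle.read()
print(tree_text)
```

The resulting handle can be passed anywhere a text-mode file object is expected, which is typically how such tests feed small example trees to a parser.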
I noticed the Travis CI build failed because of the 'StringIO' import error under version 3.3. Need to fix this version issue. On Fri, Dec 13, 2013 at 12:30 AM, Peter Cock wrote: > Hello Biopythoneers, > > For those of you not following the GitHub repository, > this is quite a big and important pull request :) > > Please take a look! > > Thanks, > > Peter > > ---------- Forwarded message ---------- > From: yeyanbo > Date: Thu, Dec 12, 2013 at 2:16 PM > Subject: [biopython] TreeConstruction and Consensus modules from GSoC > 2013 (#270) > To: biopython/biopython > > > two module files created in Bio/Phylo: TreeConstruction.py, Consensus.py ; > two test files created in Tests: test_TreeConstruction.py , > test_Consensus.py ; > directory created for testing files: Tests\TreeConstruction . > > ________________________________ > > You can merge this Pull Request by running > > git pull https://github.com/lijax/biopython master > > Or view, comment on, or merge it at: > > https://github.com/biopython/biopython/pull/270 > > Commit Summary > > add TreeConststruction and Consensus modules > implement upgma and nj algorithms" > add parsimony scorer > rewrite and test parsimony score > minor change > add NNITreeSearcher._get_neighbors > complete parsimony method > fix the bug that the nj tree may have 1 child instead of 3 at the root > add convertion from SubsMat to Matrix for protein submatrix > BitString and strict_consensus > reorganize different tests > add automatic test for consensus tree > add `repr` funcionn for Matrix class and improve the document > move `delitem` and `insert` from DistanceMatrix to Matrix > fix the index bug of Matrix.insert() function and improve the document > test files for consensus algorithms > improve document of DiscanceCalculator and DistanceTreeConstructor > improve document for parsimony tree classes > majority and adam consensus methods > fix majority bug, finish adam consensus, doc improvement > add branch support method > change 
DistanceCalculator parameters as msa should be independent > restructure TreeConstructor classes > fix nj bug > assign 0 lenght to root clade of nj and upgma > add bootstrap method > adapt "identity" model in DistanceCalculator to protein;None condition > of starting_tree in ParsimonyTreeConstructor > convert list to generator in bootstrap methods > fix adam consensus bug > test cleanup > minor change > make assistant classes private > remove import * > > File Changes > > A Bio/Phylo/Consensus.py (570) > A Bio/Phylo/TreeConstruction.py (1011) > A Tests/TreeConstruction/adam_refs.tre (3) > A Tests/TreeConstruction/bootstrap_consensus.tre (1) > A Tests/TreeConstruction/consensus_refs.tre (3) > A Tests/TreeConstruction/majority_ref.tre (2) > A Tests/TreeConstruction/msa.phy (6) > A Tests/TreeConstruction/neighbor_trees.tre (4) > A Tests/TreeConstruction/nj.tre (1) > A Tests/TreeConstruction/pars1.tre (1) > A Tests/TreeConstruction/pars2.tre (1) > A Tests/TreeConstruction/pars3.tre (1) > A Tests/TreeConstruction/strict_refs.tre (3) > A Tests/TreeConstruction/test.log (36) > A Tests/TreeConstruction/trees.tre (3) > A Tests/TreeConstruction/upgma.tre (1) > A Tests/test_Consensus.py (152) > A Tests/test_TreeConstruction.py (245) > > Patch Links: > > https://github.com/biopython/biopython/pull/270.patch > https://github.com/biopython/biopython/pull/270.diff > -- *Yanbo Ye* *Guangzhou Institutes of Biomedicine and Health, * *Chinese Academy of Sciences* *190 Kaiyuan Avenue, Science Park, Guangzhou, China* *Email: ye_yanbo at gibh.ac.cn * *Web: http://www.yeyanbo.com * *Phone: (86)-020-32093810* From p.j.a.cock at googlemail.com Sat Dec 14 16:24:52 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 14 Dec 2013 21:24:52 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: <1386861374.64762.YahooMailBasic@web164003.mail.gq1.yahoo.com> References: <1386861374.64762.YahooMailBasic@web164003.mail.gq1.yahoo.com> Message-ID: On Thu, Dec 12, 
2013 at 3:16 PM, Michiel de Hoon wrote: > Thanks Peter. > I believe that matplotlib uses os.path.expanduser('~') /.matplotlib/matplotlibrc. > Then shall we use the analogous for Biopython, so > os.path.expanduser('~')/.biopython/Bio/Entrez/DTDs? That or ~/.config/biopython makes sense under Linux and Mac, but I think we want something like this on Windows (untested, based on some Google reading): os.path.join(os.getenv("APPDATA"), "biopython") Regards, Peter From mjldehoon at yahoo.com Thu Dec 26 05:28:32 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 26 Dec 2013 02:28:32 -0800 (PST) Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: Message-ID: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Fixed; please let us know if you encounter any problems. -Michiel. -------------------------------------------- On Mon, 9/23/13, Peter Cock wrote: Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings To: "Biopython-Dev Mailing List" Date: Monday, September 23, 2013, 4:58 PM Hi all, I'm seeing the following warning from NumPy 1.7 with Python 3.3 on Mac OS X, and on Linux too. I believe the NumPy version is the critical factor: building 'Bio.Cluster.cluster' extension building 'Bio.KDTree._CKDTree' extension building 'Bio.Motif._pwm' extension building 'Bio.motifs._pwm' extension all give: /Users/peterjc/lib/python3.3/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using
deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] According to this page, http://docs.scipy.org/doc/numpy-dev/reference/c-api.deprecations.html If we add this line it should confirm our code is clean for NumPy 1.7 (and implies no side effects on older NumPy): #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION Unfortunately it seems all four modules have problems doing that, presumably planned NumPy C API changes we need to handle via a version conditional #ifdef? Peter _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From mjldehoon at yahoo.com Sun Dec 1 03:28:45 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 30 Nov 2013 19:28:45 -0800 (PST) Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: Message-ID: <1385868525.19183.YahooMailBasic@web164002.mail.gq1.yahoo.com> How would people feel about Biopython always downloading DTD files on the fly instead of distributing them with Biopython? After downloading and parsing a DTD file, we can keep it in memory so we won't need to parse the same DTD file over and over again. So the impact on speed will be minimal. If we do so, we'll never run into the problem of missing DTD files. The downside of course is that we will need internet access to parse any XML file through Bio.Entrez. But maybe in today's world that is acceptable. Best, -Michiel. -------------------------------------------- On Thu, 11/28/13, Karol M. Langner wrote: Subject: Re: [biopython] Missing DTD files (#260) To: "biopython/biopython" Date: Thursday, November 28, 2013, 9:46 AM It's true, and I didn't bother to look at the newer version. Thanks for the explanation. On the plus side, biopython still downloads the missing DTD, so I don't critically need to install a newer local version. Reply to this email directly or view it on GitHub.
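The on-demand approach Michiel describes (download a DTD the first time it is needed, then keep it in memory), combined with the caching to disk that Peter prefers, could look roughly like the sketch below. This is a hypothetical illustration, not Bio.Entrez code: `dtd_cache_dir`, `DTDCache` and the injected `fetch` callable are names invented here, the directory layout is just one of the options floated on the list (`%APPDATA%` on Windows, a hidden directory under the home directory elsewhere), and the cache holds raw DTD bytes rather than parsed DTDs.

```python
import os
import sys


def dtd_cache_dir(app_name="biopython"):
    """Return a per-user directory for cached DTD files (hypothetical helper).

    Follows the options floated on the list: under %APPDATA% on Windows,
    otherwise under a hidden directory in the user's home directory.
    """
    appdata = os.getenv("APPDATA")
    if sys.platform == "win32" and appdata:
        base = os.path.join(appdata, app_name)
    else:
        base = os.path.join(os.path.expanduser("~"), "." + app_name)
    return os.path.join(base, "Bio", "Entrez", "DTDs")


class DTDCache(object):
    """Look up DTDs in memory, then on disk, and only then download them."""

    def __init__(self, fetch, directory=None):
        # fetch: any callable taking a DTD filename and returning its bytes,
        # e.g. a wrapper around an HTTP download from the NCBI (assumption).
        self._fetch = fetch
        self._directory = directory or dtd_cache_dir()
        self._memory = {}

    def get(self, filename):
        # 1. Already loaded during this Python session?
        if filename in self._memory:
            return self._memory[filename]
        path = os.path.join(self._directory, filename)
        if os.path.isfile(path):
            # 2. Saved to disk by an earlier session.
            with open(path, "rb") as handle:
                data = handle.read()
        else:
            # 3. Download once, then keep a copy on disk for next time.
            data = self._fetch(filename)
            if not os.path.isdir(self._directory):
                os.makedirs(self._directory)
            with open(path, "wb") as handle:
                handle.write(data)
        self._memory[filename] = data
        return data
```

Passing the download function in as `fetch` also means offline runs, such as the automated tests, can supply a stub instead of touching the network; only the first request for a given DTD on a given machine would need internet access.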
From p.j.a.cock at googlemail.com Tue Dec 3 10:38:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 3 Dec 2013 10:38:43 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: <1385868525.19183.YahooMailBasic@web164002.mail.gq1.yahoo.com> References: <1385868525.19183.YahooMailBasic@web164002.mail.gq1.yahoo.com> Message-ID: On Sun, Dec 1, 2013 at 3:28 AM, Michiel de Hoon wrote: > How would people feel about Biopython always downloading DTD files > on the fly instead of distributing them with Biopython? > > After downloading and parsing a DTD file, we can keep it in memory > so we won't need to parse the same DTD file over and over again. > So the impact on speed will be minimal. > > If we do so, we'll never run into the problem of missing DTD files. The > downside of course is that we will need internet access to parse any > XML file through Bio.Entrez. But maybe in today's world that is acceptable. Requiring network access would be annoying for offline work (e.g. how we usually run the automated tests), but most of the NCBI Entrez XML files will (I expect) be downloaded and immediately parsed. So for usability this seems OK. Automatic caching to disk (without a scary warning) seems like a better idea than always downloading the DTD files on demand (which seems wasteful of bandwidth and more likely to give intermittent errors), although as you have noted before there is the open question of where to put these files (including where on Windows): http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008310.html Regards, Peter From tra at popgen.net Tue Dec 3 14:31:41 2013 From: tra at popgen.net (Tiago Antao) Date: Tue, 3 Dec 2013 14:31:41 -0000 Subject: [Biopython-dev] 1.63 Release attempt Message-ID: <28b5ab441650c2b5a6a8fc1f9cf2d60e.squirrel@webmail.popgen.net> Dear all, Tomorrow I will try to release 1.63. If possible please keep the number of commits to the trunk to a minimum.
I intend to pull the source in the morning (Western European time) and release 1.63 by the end of the day. If you have any serious issues with this plan, please get in touch ASAP. Thanks, Tiago From tiagoantao at gmail.com Wed Dec 4 21:02:24 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Wed, 4 Dec 2013 21:02:24 +0000 Subject: [Biopython-dev] 1.63 delayed Message-ID: Dear all, My sincere apologies, but the 1.63 release will be delayed. I hope to be able to release it tomorrow (instead of today). Regards, Tiago -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From tiagoantao at gmail.com Wed Dec 4 21:13:21 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Wed, 4 Dec 2013 21:13:21 +0000 Subject: [Biopython-dev] Issue with test_Phylo_CDAO Message-ID: Dear all, I am trying to release 1.63 and I am getting a strange problem with the test_Phylo_CDAO module: If we run the test on a recent Ubuntu Linux machine the following error occurs (Python 2.7, rdflib 2.4.2): ERROR: test_parse_0 (__main__.ParseTests) Parse the phylogenies in test.cdao.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_Phylo_CDAO.py", line 43, in test_parse trees = list(bp._io.parse(filename, 'cdao')) File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/_io.py", line 53, in parse for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs): File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 63, in parse return Parser(handle).parse(**kwargs) File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 94, in parse self.parse_handle_to_graph(**kwargs) File "/home/tiago/Dropbox/soft/bp-release/release/biopython-1.63/Bio/Phylo/CDAOIO.py", line 115, in parse_handle_to_graph graph.parse(file=self.handle, publicID=base_uri, format=parse_format) TypeError: parse() takes at least 2 arguments (3 given) We suspect that the RDFLib interface might have changed recently and that this might have broken the test code. Can someone (Eric, Ben?) who is more experienced with this code give a suggestion and comment on the potential importance of correcting this before releasing 1.63? Regards, Tiago -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From tiagoantao at gmail.com Thu Dec 5 09:03:06 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 5 Dec 2013 09:03:06 +0000 Subject: [Biopython-dev] 1.63 release Message-ID: Dear all, As far as I know there are only two issues discovered (and not fully resolved) during the release attempt: 1. TogoWS testing, where the field ti is not supported anymore on the pubmed database. I will log a bug after this email. The test code was temporarily amended. 2. The failing of test_Phylo_CDAO. I think this is a long-running problem with the interface of the RDF library. I speculate that the test stopped working when there was some change in the RDF library interface.
Unfortunately we did not have testing for this module set up anywhere. Also, I suspect that this bug exists in previous Biopython versions. My suggestion is to go ahead with the release (today) in spite of the problems above. Unless someone feels that any of the above should be solved before... -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From p.j.a.cock at googlemail.com Thu Dec 5 10:30:27 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 10:30:27 +0000 Subject: [Biopython-dev] [Biopython] type object 'RestrictionType' has no attribute 'size' In-Reply-To: References: Message-ID: On Thu, Dec 5, 2013 at 10:01 AM, Wibowo Arindrarto wrote: > Hi everyone, > > Christopher: Ah yes, I actually meant the IPython notebook (but I > guess it turns out the same occurs in the IPython console then :) ). > > Antony: That may be the case, too. But thanks for the pull request (I > think Peter has just looked at it, actually > https://github.com/biopython/biopython/pull/148). I do think there is > room for improvement there, especially since the code probably > predates modern Python conventions (& assumptions). > > Cheers, > Bow Even Frederic Sohm (the original author) agreed that the current Bio.Restriction code is too complicated (I called it 'magic' in our discussion back in 2010 regarding a Python 2.6 problem with super http://bugzilla.open-bio.org/show_bug.cgi?id=2604 or now https://redmine.open-bio.org/issues/2604 ). And also I dislike the fact it does one-based counting. However, none of our currently active developers really understand the code so changing it is hard - and backward compatibility constrains us greatly.
I think the best route forward is to replace Bio.Restriction with a new, less complicated implementation trying to follow modern Python conventions (using zero-based counting!), likely based on Antony's branch https://github.com/biopython/biopython/pull/148 and then deprecate and later remove Bio.Restriction. (We should continue that debate on the biopython-dev list, CC'd) In terms of Christopher's problems - it would not surprise me if they are specific to IPython since introspection of the 'magic' classes seems problematic. Regards, Peter From tiagoantao at gmail.com Thu Dec 5 13:53:27 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 5 Dec 2013 13:53:27 +0000 Subject: [Biopython-dev] 1.63 release In-Reply-To: References: Message-ID: Dear all, The CDAO issue was caused by an old version of rdflib being the default on Ubuntu. As far as I can see we can go ahead and release today? Tiago On 5 December 2013 09:03, Tiago Antão wrote: > Dear all, > > As far as I know there are only two issues discovered (and not fully > resolved) during the release attempt: > > 1. TogoWS testing, where the field ti is not supported anymore on the > pubmed database. I will log a bug after this email. The test code was > temporarily amended. > > 2. The failing of test_Phylo_CDAO. I think this is a long-running problem > with the interface of the RDF library. I speculate that the test stopped > working when there was some change in the RDF library interface. > Unfortunately we did not have testing for this module set up anywhere. Also, > I suspect that this bug exists in previous Biopython versions. > > My suggestion is to go ahead with the release (today) in spite of the > problems above. Unless someone feels that any of the above should be solved > before...
> > -- > "The truth may be out there, but the lies are already in your head" - > Terry Pratchett > -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From binni at binnisb.com Thu Dec 5 16:29:02 2013 From: binni at binnisb.com (Brynjar Smári Bjarnason) Date: Thu, 05 Dec 2013 17:29:02 +0100 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: <52A0A8E4.2080806@binnisb.com> References: <52A0A8E4.2080806@binnisb.com> Message-ID: <52A0A9CE.9080400@binnisb.com> Hello. I see CompoundLocation is quite new. I am currently using anaconda (Python 2.7.6 :: Anaconda 1.8.0 (64-bit)) and Biopython 1.62. I am fetching gi values and using SeqIO to parse them. So far most of them work but I found one that fails. Code: p = Entrez.efetch(db="protein", rettype="gp", retmode="text",id="494379") seq = SeqIO.read(p,"gb") Gives error: ValueError: CompoundLocation should have at least 2 parts With quite long stack trace and the last one being: /Bio/SeqFeature.pyc: 996 if len(self.parts) < 2: --> 997 raise ValueError("CompoundLocation should have at least 2 parts") Any suggestions on how to fix this, and maybe what is different with this gi from the rest of them (one gi that works: 10342)? Brynjar From p.j.a.cock at googlemail.com Thu Dec 5 16:43:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 16:43:20 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: <52A0A9CE.9080400@binnisb.com> References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Thu, Dec 5, 2013 at 4:29 PM, Brynjar Smári Bjarnason wrote: > > Hello. > > I see CompoundLocation is quite new. I am currently using anaconda > (Python 2.7.6 :: Anaconda 1.8.0 (64-bit)) and Biopython 1.62. > > I am fetching gi values and using SeqIO to parse them. So far most of > them work but I found one that fails.
> > Code: > > p = Entrez.efetch(db="protein", rettype="gp", retmode="text",id="494379") > seq = SeqIO.read(p,"gb") > > Gives error: > ValueError: CompoundLocation should have at least 2 parts > > With quite long stack trace and the last one being: > > /Bio/SeqFeature.pyc: > 996 if len(self.parts) < 2: > --> 997 raise ValueError("CompoundLocation should have at > least 2 parts") > > Any suggestions on how to fix this, and maybe what is different with > this gi from the rest of them (one gi that works: 10342)? > > Brynjar Hi Brynjar, Hmm. Right now the website is very slow & won't load http://www.ncbi.nlm.nih.gov/protein/494379 and via Entrez I am getting a network error: urllib2.HTTPError: HTTP Error 502: Bad Gateway Were you able to save the file, and could you post it online (e.g. at http://gist.github.com)? Regards, Peter From p.j.a.cock at googlemail.com Thu Dec 5 16:46:46 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 16:46:46 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Thu, Dec 5, 2013 at 4:43 PM, Peter Cock wrote: > On Thu, Dec 5, 2013 at 4:29 PM, Brynjar Smári Bjarnason > wrote: >> >> Hello. >> >> I see CompoundLocation is quite new. I am currently using anaconda >> (Python 2.7.6 :: Anaconda 1.8.0 (64-bit)) and Biopython 1.62. >> >> I am fetching gi values and using SeqIO to parse them. So far most of >> them work but I found one that fails.
>> >> Code: >> >> p = Entrez.efetch(db="protein", rettype="gp", retmode="text",id="494379") >> seq = SeqIO.read(p,"gb") >> >> Gives error: >> ValueError: CompoundLocation should have at least 2 parts >> >> With quite long stack trace and the last one being: >> >> /Bio/SeqFeature.pyc: >> 996 if len(self.parts) < 2: >> --> 997 raise ValueError("CompoundLocation should have at >> least 2 parts") >> >> Any suggestions on how to fix this, and maybe what is different with >> this gi from the rest of them (one gi that works: 10342)? >> >> Brynjar > > Hi Brynjar, > > Hmm. Right now the website is very slow & won't load > http://www.ncbi.nlm.nih.gov/protein/494379 > and via Entrez I am getting a network error: > urllib2.HTTPError: HTTP Error 502: Bad Gateway > > Were you able to save the file, and could you post it online > (e.g. at http://gist.github.com)? > > Regards, > > Peter Not to worry - the site did respond when I retried a bit later, and I can reproduce the parser error: >>> from Bio import SeqIO >>> r = SeqIO.read("1MRR_A.gp", "genbank") /Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(84),bond(115),bond(118),bond(238))' % (location_line))) /Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(115),bond(204),bond(238),bond(241))' % (location_line))) /Library/Python/2.7/site-packages/Bio/GenBank/__init__.py:1096: BiopythonParserWarning: Couldn't parse feature location: 'join(bond(194),bond(272))' % (location_line))) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 646, in read first = next(iterator) File "/Library/Python/2.7/site-packages/Bio/SeqIO/__init__.py", line 582, in parse for r in i: File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 467, in parse_records record = self.parse(handle, do_features) File
"/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 451, in parse if self.feed(handle, consumer, do_features): File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 423, in feed self._feed_feature_table(consumer, self.parse_features(skip=False)) File "/Library/Python/2.7/site-packages/Bio/GenBank/Scanner.py", line 374, in _feed_feature_table consumer.location(location_string) File "/Library/Python/2.7/site-packages/Bio/GenBank/__init__.py", line 1083, in location operator=location_line[:i]) File "/Library/Python/2.7/site-packages/Bio/SeqFeature.py", line 1003, in __init__ raise ValueError("CompoundLocation should have at least 2 parts") ValueError: CompoundLocation should have at least 2 parts Peter From p.j.a.cock at googlemail.com Thu Dec 5 17:12:04 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 17:12:04 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock wrote: > > Not to worry - the site did respond when I retried a bit later, and > I can reproduce the parser error: > >>>> from Bio import SeqIO >>>> r = SeqIO.read("1MRR_A.gp", "genbank") > BiopythonParserWarning: Couldn't parse feature location: > 'join(bond(84),bond(115),bond(118),bond(238))' > BiopythonParserWarning: Couldn't parse feature location: > 'join(bond(115),bond(204),bond(238),bond(241))' > BiopythonParserWarning: Couldn't parse feature location: > 'join(bond(194),bond(272))' > ... > ValueError: CompoundLocation should have at least 2 parts The problem is the bond locations, and in particular while the parser gave up on the ones with a warning, it fell over the single bond entry, bond(196). This is partly due to a change in the use of the bond term, which used to be a compound entry like bond(194,272). 
Also the GenBank parser was and is primarily used on nucleotide sequences rather than GenPept files which are occasionally more weird (like here!). A short term hack would be to strip out the bond term (with a warning) and parse the remainder as a simple join or single residue accordingly. Would that work for you - do you need the bond bit? Peter From tiagoantao at gmail.com Thu Dec 5 17:24:41 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 5 Dec 2013 17:24:41 +0000 Subject: [Biopython-dev] 1.63 (almost) Message-ID: Dear all, I have almost completed the process of releasing 1.63: Source and binaries are already available at http://biopython.org/wiki/Download and the API docs have been updated. Please feel free to test/comment. If there are no problems I will finalize the release (announcements + PyPI) tomorrow. -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From p.j.a.cock at googlemail.com Thu Dec 5 18:03:41 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 18:03:41 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Thu, Dec 5, 2013 at 5:12 PM, Peter Cock wrote: > > A short term hack would be to strip out the bond term > (with a warning) and parse the remainder as a simple > join or single residue accordingly. > > Would that work for you - do you need the bond bit? Proposed branch with that change: https://github.com/peterjc/biopython/tree/gp_bond Sample output, using the same example GenPept file: >>> from Bio import SeqIO >>> r = SeqIO.read("1MRR_A.gp", "genbank") Bio/GenBank/__init__.py:1011: BiopythonParserWarning: Dropping bond qualifier in feature location warnings.warn("Dropping bond qualifier in feature location", BiopythonParserWarning) >>> for f in r.features: print f.type, f.location ...
source [0:375] Region [27:340] SecStr [34:46] Site order{[36:37], [43:44], [108:110], [112:113], [115:117], [119:120], [122:123], [136:138], [140:141]} Site order{[47:48], [83:84], [114:115], [117:118], [121:122], [235:237], [240:241]} SecStr [56:65] SecStr [66:87] Site order{[83:84], [114:115], [117:118], [203:204], [237:238], [240:241]} Het join{[83:84], [114:115], [117:118], [237:238]} SecStr [101:129] Het join{[114:115], [203:204], [237:238], [240:241]} Site [121:122] SecStr [132:140] SecStr [142:151] SecStr [152:169] SecStr [171:177] SecStr [179:185] SecStr [185:216] Het join{[193:194], [271:272]} Het [195:196] Het join{[195:196], [195:196]} Het join{[209:210], [213:214], [213:214]} SecStr [224:253] SecStr [259:269] Bond bond{[267:268], [271:272]} SecStr [269:285] Het join{[283:284], [304:305], [308:309], [304:305]} SecStr [300:319] Useful? Peter From binni at binnisb.com Thu Dec 5 18:06:45 2013 From: binni at binnisb.com (Brynjar Smári Bjarnason) Date: Thu, 5 Dec 2013 19:06:45 +0100 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: I'll ask one who knows but I think I could skip using the bonds. Can you suggest how I can ignore the bonds in efetch response, or the parser? Thanks a lot for looking at this!
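Brynjar's question (how to ignore the bonds, either in the efetch response or in the parser) can be illustrated with a small preprocessing step in the spirit of Peter's short-term hack. The helper below is hypothetical and is not the code in Peter's gp_bond branch: `strip_bond` works on one location string, unwraps `bond(N)` terms around a single residue, turns a lone `bond(196)` (the entry Peter identified as triggering the error) into a plain residue position, collapses a leftover one-part `join(N)` as an extra safeguard, and leaves two-residue entries such as `bond(268,272)` alone. The warning text simply mirrors the one in Peter's sample output.

```python
import re
import warnings

# A bond() wrapper around a single residue number, e.g. "bond(196)".
# Two-residue entries such as "bond(268,272)" deliberately do not match.
_SINGLE_BOND = re.compile(r"bond\((\d+)\)")
# A join() left with only one part after stripping, e.g. "join(196)".
_ONE_PART_JOIN = re.compile(r"^join\((\d+)\)$")


def strip_bond(location):
    """Drop single-residue bond(...) wrappers from a GenPept location string.

    Hypothetical helper, for illustration only:
    'join(bond(84),bond(115))' -> 'join(84,115)'
    'bond(196)'                -> '196'
    Strings without single-residue bond() terms are returned unchanged.
    """
    stripped, count = _SINGLE_BOND.subn(r"\1", location)
    if not count:
        return location
    warnings.warn("Dropping bond qualifier in feature location", stacklevel=2)
    # A join() with only one part would again fail as a CompoundLocation,
    # so reduce it to the bare residue position.
    match = _ONE_PART_JOIN.match(stripped)
    if match:
        stripped = match.group(1)
    return stripped
```

Peter's branch does the equivalent inside the GenBank parser. Outside the parser, one blunt option would be to apply the same substitution to the raw efetch text before calling SeqIO.read, assuming no qualifier text happens to contain `bond(N)`.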
On 5 Dec 2013 18:12, "Peter Cock" wrote: > On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock > wrote: > > > > Not to worry - the site did respond when I retried a bit later, and > > I can reproduce the parser error: > > > >>>> from Bio import SeqIO > >>>> r = SeqIO.read("1MRR_A.gp", "genbank") > > BiopythonParserWarning: Couldn't parse feature location: > > 'join(bond(84),bond(115),bond(118),bond(238))' > > BiopythonParserWarning: Couldn't parse feature location: > > 'join(bond(115),bond(204),bond(238),bond(241))' > > BiopythonParserWarning: Couldn't parse feature location: > > 'join(bond(194),bond(272))' > > ... > > ValueError: CompoundLocation should have at least 2 parts > > The problem is the bond locations, and in particular while the > parser gave up on the ones with a warning, it fell over the > single bond entry, bond(196). > > This is partly due to a change in the use of the bond term, > which used to be a compound entry like bond(194,272). > Also the GenBank parser was and is primarily used on > nucleotide sequences rather than GenPept files which are > occasionally more weird (like here!). > > A short term hack would be to strip out the bond term > (with a warning) and parse the remainder as a simple > join or single residue accordingly. > > Would that work for you - do you need the bond bit? > > Peter > From p.j.a.cock at googlemail.com Thu Dec 5 18:06:48 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 5 Dec 2013 18:06:48 +0000 Subject: [Biopython-dev] 1.63 (almost) In-Reply-To: References: Message-ID: On Thu, Dec 5, 2013 at 5:24 PM, Tiago Antão wrote: > Dear all, > > I have almost completed the process of releasing 1.63: > Source and binaries are already available at > http://biopython.org/wiki/Download > The API docs have been updated > > Please feel free to test/comment.
If there are no problems I will finalize > the release (announcements + pypi) tomorrow > Thanks Tiago, The fact we got three issues reported that very evening is just bad luck (PDB handles vs filenames, SeqIO.index_db relative paths, and GenPept bond location parsing), but none of these are regressions - they all existed in Biopython 1.62 as well. Maybe we can aim to get the Biopython 1.64 release out early next year? Regards, Peter From binni at binnisb.com Thu Dec 5 18:08:20 2013 From: binni at binnisb.com (Brynjar Smári Bjarnason) Date: Thu, 5 Dec 2013 19:08:20 +0100 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: Thanks, will look at this when I'm at the computer :-) On 5 Dec 2013 19:06, "Brynjar Smári Bjarnason" wrote: > I'll ask one who knows but I think I could skip using the bonds. Can you > suggest how I can ignore the bonds in efetch response, or the parser? > > Thanks a lot for looking at this! > On 5 Dec 2013 18:12, "Peter Cock" wrote: > >> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock >> wrote: >> > >> > Not to worry - the site did respond when I retried a bit later, and >> > I can reproduce the parser error: >> > >> >>>> from Bio import SeqIO >> >>>> r = SeqIO.read("1MRR_A.gp", "genbank") >> > BiopythonParserWarning: Couldn't parse feature location: >> > 'join(bond(84),bond(115),bond(118),bond(238))' >> > BiopythonParserWarning: Couldn't parse feature location: >> > 'join(bond(115),bond(204),bond(238),bond(241))' >> > BiopythonParserWarning: Couldn't parse feature location: >> > 'join(bond(194),bond(272))' >> > ... >> > ValueError: CompoundLocation should have at least 2 parts >> >> The problem is the bond locations, and in particular while the >> parser gave up on the ones with a warning, it fell over the >> single bond entry, bond(196).
>> >> This is partly due to a change in the use of the bond term, >> which used to be a compound entry like bond(194,272). >> Also the GenBank parser was and is primarily used on >> nucleotide sequences rather than GenPept files which are >> occasionally more weird (like here!). >> >> A short term hack would be to strip out the bond term >> (with a warning) and parse the remainder as a simple >> join or single residue accordingly. >> >> Would that work for you - do you need the bond bit? >> >> Peter >> > From binni at binnisb.com Fri Dec 6 08:03:45 2013 From: binni at binnisb.com (Brynjar Smári Bjarnason) Date: Fri, 6 Dec 2013 09:03:45 +0100 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On 5 December 2013 19:08, Brynjar Smári Bjarnason wrote: > Thanks, will look at this when I'm at the computer :-) > On 5 Dec 2013 19:06, "Brynjar Smári Bjarnason" wrote: > >> I'll ask one who knows but I think I could skip using the bonds. Can you >> suggest how I can ignore the bonds in efetch response, or the parser? >> >> Thanks a lot for looking at this! >> On 5 Dec 2013 18:12, "Peter Cock" wrote: >> >>> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock >>> wrote: >>> > >>> > Not to worry - the site did respond when I retried a bit later, and >>> > I can reproduce the parser error: >>> > >>> >>>> from Bio import SeqIO >>> >>>> r = SeqIO.read("1MRR_A.gp", "genbank") >>> > BiopythonParserWarning: Couldn't parse feature location: >>> > 'join(bond(84),bond(115),bond(118),bond(238))' >>> > BiopythonParserWarning: Couldn't parse feature location: >>> > 'join(bond(115),bond(204),bond(238),bond(241))' >>> > BiopythonParserWarning: Couldn't parse feature location: >>> > 'join(bond(194),bond(272))' >>> > ...
>>> > ValueError: CompoundLocation should have at least 2 parts >>> >>> The problem is the bond locations, and in particular while the >>> parser gave up on the ones with a warning, it fell over the >>> single bond entry, bond(196). >>> >>> This is partly due to a change in the use of the bond term, >>> which used to be a compound entry like bond(194,272). >>> Also the GenBank parser was and is primarily used on >>> nucleotide sequences rather than GenPept files which are >>> occasionally more weird (like here!). >>> >>> A short term hack would be to strip out the bond term >>> (with a warning) and parse the remainder as a simple >>> join or single residue accordingly. >>> >>> Would that work for you - do you need the bond bit? >>> >>> Peter >>> >> I believe for our part that leaving the bond bit out is fine so your patch should work well. Any suggestions on a good way to apply this patch? Should I build Biopython from that branch or clone latest stable and apply the patch before building? Thank you Brynjar From binni at binnisb.com Fri Dec 6 09:55:18 2013 From: binni at binnisb.com (Brynjar Smári Bjarnason) Date: Fri, 6 Dec 2013 10:55:18 +0100 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On 6 December 2013 09:03, Brynjar Smári Bjarnason wrote: > On 5 December 2013 19:08, Brynjar Smári Bjarnason wrote: > >> Thanks, will look at this when I'm at the computer :-) >> On 5 Dec 2013 19:06, "Brynjar Smári Bjarnason" wrote: >> >>> I'll ask one who knows but I think I could skip using the bonds. Can you >>> suggest how I can ignore the bonds in efetch response, or the parser? >>> >>> Thanks a lot for looking at this!
>>> On 5 Dec 2013 18:12, "Peter Cock" wrote: >>> >>>> On Thu, Dec 5, 2013 at 4:46 PM, Peter Cock >>>> wrote: >>>> > >>>> > Not to worry - the site did respond when I retried a bit later, and >>>> > I can reproduce the parser error: >>>> > >>>> >>>> from Bio import SeqIO >>>> >>>> r = SeqIO.read("1MRR_A.gp", "genbank") >>>> > BiopythonParserWarning: Couldn't parse feature location: >>>> > 'join(bond(84),bond(115),bond(118),bond(238))' >>>> > BiopythonParserWarning: Couldn't parse feature location: >>>> > 'join(bond(115),bond(204),bond(238),bond(241))' >>>> > BiopythonParserWarning: Couldn't parse feature location: >>>> > 'join(bond(194),bond(272))' >>>> > ... >>>> > ValueError: CompoundLocation should have at least 2 parts >>>> >>>> The problem is the bond locations, and in particular while the >>>> parser gave up on the ones with a warning, it fell over the >>>> single bond entry, bond(196). >>>> >>>> This is partly due to a change in the use of the bond term, >>>> which used to be a compound entry like bond(194,272). >>>> Also the GenBank parser was and is primarily used on >>>> nucleotide sequences rather than GenPept files which are >>>> occasionally more weird (like here!). >>>> >>>> A short term hack would be to strip out the bond term >>>> (with a warning) and parse the remainder as a simple >>>> join or single residue accordingly. >>>> >>>> Would that work for you - do you need the bond bit? >>>> >>>> Peter >>>> >>> > I believe for our part that leaving the bond bit out is fine so your patch > should work well. > > Any suggestions on a good way to apply this patch? Should I build > Biopython from that branch or clone latest stable and apply the patch > before building? > > Thank you > > Brynjar > The patch works for me! Thank you. I cloned the official Biopython and applied your commit as patch before building. My gis' that were failing now work. 
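Peter's short-term hack discussed above (strip the bond term with a warning, then parse the remainder as a simple join or a single residue) can be sketched as a pre-processing step on the location string. This is a minimal sketch, not the workaround actually committed to Biopython: `strip_bond_terms` is a hypothetical helper, and it only handles the forms quoted in this thread (`join(bond(...),...)`, a lone `bond(196)`, and the older `bond(194,272)` style).

```python
import re
import warnings

# Matches bond(...) with no nested parentheses, e.g. bond(84) or bond(194,272)
_BOND = re.compile(r"bond\(([^()]*)\)")
# Matches a join(...) holding exactly one simple part, e.g. join(196)
_SINGLE_JOIN = re.compile(r"join\(([^(),]*)\)")


def strip_bond_terms(location):
    """Drop bond(...) wrappers from a GenPept location string (hypothetical helper).

    join(bond(84),bond(115)) -> join(84,115)
    bond(196)                -> 196
    bond(194,272)            -> join(194,272)
    """
    stripped = _BOND.sub(r"\1", location)
    if stripped == location:
        return location  # no bond term, leave the location untouched
    warnings.warn("Dropping bond term from feature location %r" % location)
    if not stripped.startswith("join(") and "," in stripped:
        # Old-style compound entry: treat the listed residues as a join
        stripped = "join(%s)" % stripped
    match = _SINGLE_JOIN.fullmatch(stripped)
    if match:
        # A join with only one part would not make a valid compound location
        stripped = match.group(1)
    return stripped
```

The cleaned string could then be handed to the normal location parsing, so that the single `bond(196)` case becomes a plain residue position instead of a one-part compound location.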
Brynjar From p.j.a.cock at googlemail.com Fri Dec 6 10:19:42 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 6 Dec 2013 10:19:42 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Fri, Dec 6, 2013 at 9:55 AM, Brynjar Smári Bjarnason wrote: > > The patch works for me! Thank you. > > I cloned the official Biopython and applied your commit as patch before > building. My gis' that were failing now work. > > Brynjar Great - thank you for confirming this fixes the problem. I think we just missed the window for getting this into the Biopython 1.63 release (so you'll effectively be running Biopython 1.63 + this patch). Regards, Peter From binni at binnisb.com Fri Dec 6 10:26:46 2013 From: binni at binnisb.com (=?ISO-8859-1?Q?Brynjar_Sm=E1ri_Bjarnason?=) Date: Fri, 6 Dec 2013 11:26:46 +0100 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On 6 December 2013 11:19, Peter Cock wrote: > On Fri, Dec 6, 2013 at 9:55 AM, Brynjar Smári Bjarnason > wrote: > > > > The patch works for me! Thank you. > > > > I cloned the official Biopython and applied your commit as patch before > > building. My gis' that were failing now work. > > > > Brynjar > > Great - thank you for confirming this fixes the problem. > > I think we just missed the window for getting this into the > Biopython 1.63 release (so you'll effectively be running > Biopython 1.63 + this patch). > > Regards, > > Peter > I saw that 1.63 was just released, so yes, I run Biopython 1.63 + patch. Is it likely that this will find its way into 1.64? Not that I am in any rush since it is working, just wondering for the one that needs to maintain the Biopython versions.
Best regards, Brynjar From p.j.a.cock at googlemail.com Fri Dec 6 10:38:08 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 6 Dec 2013 10:38:08 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Fri, Dec 6, 2013 at 10:26 AM, Brynjar Smári Bjarnason wrote: > > I saw that 1.63 was just released, so yes, I run Biopython 1.63 + patch. The official release announcement should be later today, but yes, on GitHub you can (and did) get it already ;) > Is it likely that this will find its way into 1.64? Not that I am in any > rush since it is working, just wondering for the one that needs to maintain > the Biopython versions. Yes, once the release is formally announced I intend to commit this change (and a test - probably the very same example GenPept file you reported). It is possible that we'll have a better solution in place for Biopython 1.64 to handle these locations without simply dropping the bond term. Peter From tiagoantao at gmail.com Fri Dec 6 11:27:28 2013 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 6 Dec 2013 11:27:28 +0000 Subject: [Biopython-dev] Biopython 1.63 released Message-ID: Source distributions and Windows installers for Biopython 1.63 are now available from the downloads page on the official Biopython website and (soon) from the Python Package Index (PyPI). This version removes the requirement for the 2to3 library. This was made possible by dropping Python 2.5 (and Jython 2.5). This release of Biopython supports Python 2.6 and 2.7, and also Python 3.3. The Biopython Tutorial & Cookbook, and the docstring examples in the source code, now use the Python 3 style print function in place of the Python 2 style print statement.
This language feature is available under Python 2.6 and 2.7 via: from __future__ import print_function Similarly we now use the Python 3 style built-in next function in place of the Python 2 style iterators' .next() method. This language feature is also available under Python 2.6 and 2.7. The restriction enzyme list in Bio.Restriction has been updated to the December 2013 release of REBASE. Many thanks to the Biopython developers and community for making this release possible, especially the following contributors: Chris Mitchell (first contribution) Christian Brueffer Eric Talevich Gokcen Eraslan (first contribution) Josha Inglis (first contribution) Konstantin Tretyakov (first contribution) Lenna Peterson Martin Mokrejs Nigel Delaney (first contribution) Peter Cock Sergei Lebedev (first contribution) Tiago Antao Wayne Decatur (first contribution) Wibowo "Bow" Arindrarto From tiagoantao at gmail.com Fri Dec 6 12:26:09 2013 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 6 Dec 2013 12:26:09 +0000 Subject: [Biopython-dev] Releasing biopython - lessons Message-ID: Dear all, With much help from Peter, 1.63 was released. I have a few comments/ideas I would like to share: 1. I think that there is a need to maintain an exhaustive list of dependencies that a full biopython distribution will need (python packages and external applications). I am offering to do that here: http://biopython.org/wiki/List_of_applications_executed_via_Biopython (over the next few days). 2. I was planning on creating a Linux virtual box image (and making it available) with everything that is needed to fully test and run a Biopython distribution. This would allow us to have an extremely stable environment for testing (testing PCs normally have other uses and things can be broken by other stuff). 3. I think that it would be nice to change run_tests to have an option to run in an "extremely picky mode": Basically fail if there is a warning that is not a Deprecation warning.
Some silent warnings can actually be somewhat irritating if not understood/acted upon (e.g. the RDFlib case). This would be run before release and maybe once a week on buildbot? 1 and 2 are trivial (and go very well together) and I am offering to do it in the very next few days if that is OK. Any views on 3? This would require changing run_tests and that might not be easy/desirable... My 2p, Tiago -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From chapmanb at 50mail.com Fri Dec 6 15:19:52 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 06 Dec 2013 10:19:52 -0500 Subject: [Biopython-dev] Releasing biopython - lessons In-Reply-To: References: Message-ID: <87wqjij9bb.fsf@fastmail.fm> Tiago; > With much help from Peter, 1.63 was released. I have a few comments/ideas I > would like to share: > > 1. I think that there is a need to maintain an exhaustive list of > dependencies that a full biopython distribution will need (python packages > and external applications). I am offering to do that here: > http://biopython.org/wiki/List_of_applications_executed_via_Biopython (over > the next few days). > > 2. I was planning on creating a Linux virtual box image (and making it > available) with everything that is needed to fully test and run a Biopython > distribution. This would allow us to have an extremely stable environment for > testing (testing PCs normally have other uses and things can be broken by > other stuff). Great idea. One of the things we've been doing a lot of work on in CloudBioLinux is making it easy to automate these types of local installs. The advantage is that you have full documentation (lists of packages), an automated way to install it, and a script you can use for generating images for testing/distribution purposes.
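Tiago's third suggestion, an "extremely picky mode" that fails on any warning other than a deprecation warning, maps fairly directly onto Python's standard `warnings` filters. A minimal sketch; `run_picky` is a hypothetical wrapper, not an existing `run_tests.py` option:

```python
import warnings


def run_picky(test_func):
    """Call test_func with every non-deprecation warning promoted to an error."""
    with warnings.catch_warnings():
        # Filters added later are checked first, so the DeprecationWarning
        # rule below takes priority over the blanket "error" rule.
        warnings.simplefilter("error")
        warnings.simplefilter("default", DeprecationWarning)
        return test_func()
```

Without changing run_tests at all, the interpreter's `-W` option should give much the same effect, e.g. `python -W error -W default::DeprecationWarning run_tests.py`, because the action of the last matching `-W` option is used. This assumes the test runner does not reset the warning filters itself.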
I put together a starter package for a Biopython 'flavor' here: https://github.com/chapmanb/cloudbiolinux/tree/master/contrib/flavor/biopython You can run with: git clone https://github.com/chapmanb/cloudbiolinux.git fab -H localhost install_biolinux:flavor=biopython By default it will install an Anaconda Python, Biopython dependencies and associated tools into ~/biopython. If you like this approach, happy to help with pointers on adding more tools. Thanks again for taking this on, Brad From mjldehoon at yahoo.com Mon Dec 9 06:33:01 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 8 Dec 2013 22:33:01 -0800 (PST) Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: Message-ID: <1386570781.7166.YahooMailBasic@web164001.mail.gq1.yahoo.com> Currently we are using os.path.expanduser('~') /.biopython/Bio/Entrez/DTDs to look for locally stored DTDs. This should work on Windows also. Then I would suggest the following if a DTD file is missing: 1) Print a non-scary warning message that we will attempt to download the DTD; 2) Download the DTD; 3) Try to store it in the local DTD directory. If this fails (e.g. due to file permissions or whatnot), print another warning message; 4) Use the downloaded DTD to parse the XML. Any final objections? Best, -Michiel. -------------------------------------------- On Tue, 12/3/13, Peter Cock wrote: Subject: Re: [Biopython-dev] [biopython] Missing DTD files (#260) To: "Michiel de Hoon" Cc: "Biopython-Dev Mailing List" Date: Tuesday, December 3, 2013, 5:38 AM On Sun, Dec 1, 2013 at 3:28 AM, Michiel de Hoon wrote: > How would people feel about Biopython always downloading DTD files > on the fly instead of distributing them with Biopython? > > After downloading and parsing a DTD file, we can keep it in memory > so we won't need to parse the same DTD file over and over again. > So the impact on speed will be minimal. > > If we do so, we'll never run into the problem of missing DTD files.
The > downside of course is that we will need internet access to parse any > XML file through Bio.Entrez. But maybe in today's world that is acceptable. Requiring network access would be annoying for offline work (e.g. how we usually run the automated tests), but most of the NCBI Entrez XML files will (I expect) be downloaded and immediately parsed. So for usability this seems OK. Automatic caching to disk (without a scary warning) seems like a better idea than always downloading the DTD files on demand (which seems wasteful of bandwidth and more likely to give intermittent errors), although as you have noted before there is the open question of where to put these files (including where on Windows): http://lists.open-bio.org/pipermail/biopython-dev/2010-October/008310.html Regards, Peter From p.j.a.cock at googlemail.com Mon Dec 9 10:13:14 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 10:13:14 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: <1386570781.7166.YahooMailBasic@web164001.mail.gq1.yahoo.com> References: <1386570781.7166.YahooMailBasic@web164001.mail.gq1.yahoo.com> Message-ID: On Mon, Dec 9, 2013 at 6:33 AM, Michiel de Hoon wrote: > Currently we are using os.path.expanduser('~') /.biopython/Bio/Entrez/DTDs to look for locally stored DTDs. > This should work on Windows also. Well partly - it would create a "scary" .biopython folder which would NOT be hidden by default. On Windows we could deliberately mark that folder as hidden (a file system attribute, used instead of the leading dot convention from Unix). However, I think we should really be using something under: $HOME\Local Settings\Application Data A little research might be needed for how to get that setting (if possible without reading the registry and the additional Python dependency that would entail).
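Peter's Windows point could be handled by checking the standard per-user application-data environment variables before falling back to a hidden folder in the home directory, which avoids reading the registry. A minimal sketch under stated assumptions: `dtd_cache_dir` is a hypothetical helper, the `biopython\Bio\Entrez\DTDs` layout under the Windows folder is an illustrative choice, and it relies on `LOCALAPPDATA` (normally set on Vista and later) or `APPDATA` being present in the environment.

```python
import os
import sys


def dtd_cache_dir(platform=None, environ=None):
    """Pick a per-user directory for cached Entrez DTD files (hypothetical helper)."""
    platform = sys.platform if platform is None else platform
    environ = os.environ if environ is None else environ
    if platform == "win32":
        # LOCALAPPDATA may be missing on older Windows such as XP,
        # so fall back to the roaming APPDATA folder.
        base = environ.get("LOCALAPPDATA") or environ.get("APPDATA")
        if base:
            return os.path.join(base, "biopython", "Bio", "Entrez", "DTDs")
    # Unix convention (and last resort on Windows): hidden folder in $HOME
    return os.path.join(os.path.expanduser("~"), ".biopython", "Bio", "Entrez", "DTDs")
```

Passing `platform` and `environ` explicitly lets the Windows branch be exercised on any operating system, which is handy for the offline test runs mentioned above.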
> Then I would suggest the following if a DTD file is missing: > 1) Print a non-scary warning message that we will attempt to download the DTD; > 2) Download the DTD; > 3) Try to store it in the local DTD directory. If this fails (e.g. due to file permissions or whatnot), print another warning message; > 4) Use the downloaded DTD to parse the XML. > Any final objections? Only with regard to the location of the cache on Windows. Peter From mjldehoon at yahoo.com Mon Dec 9 14:20:23 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 9 Dec 2013 06:20:23 -0800 (PST) Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: Message-ID: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> Can somebody with a Windows computer check what os.path.expanduser('~') refers to on Windows? Or how matplotlib solves this on Windows (they are storing files under $HOME/.matplotlib on unix-like systems; I don't know what they use on Windows). Thanks, -Michiel. -------------------------------------------- On Mon, 12/9/13, Peter Cock wrote: Subject: Re: [Biopython-dev] [biopython] Missing DTD files (#260) To: "Michiel de Hoon" Cc: "Biopython-Dev Mailing List" Date: Monday, December 9, 2013, 5:13 AM On Mon, Dec 9, 2013 at 6:33 AM, Michiel de Hoon wrote: > Currently we are using os.path.expanduser('~') /.biopython/Bio/Entrez/DTDs to look for locally stored DTDs. > This should work on Windows also. Well partly - it would create a "scary" .biopython folder which would NOT be hidden by default. On Windows we could deliberately mark that folder as hidden (a file system attribute, used instead of the leading dot convention from Unix). However, I think we should really be using something under: $HOME\Local Settings\Application Data A little research might be needed for how to get that setting (if possible without reading the registry and the additional Python dependency that would entail).
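Michiel's four steps for a missing DTD can be sketched as below. This is an illustration only, not Bio.Entrez code: `get_dtd` is hypothetical, and the download is delegated to a `fetch` callable so the sketch runs offline; in practice that callable might wrap `urllib` and the DTD URL given in the XML file's DOCTYPE declaration.

```python
import os
import warnings


def get_dtd(name, cache_dir, fetch):
    """Return the text of DTD file `name`, downloading and caching it if needed.

    `fetch` is any callable mapping a DTD name to its text (for example a
    wrapper around urllib); it is a parameter so this sketch runs offline.
    """
    path = os.path.join(cache_dir, name)
    if os.path.isfile(path):
        with open(path) as handle:
            return handle.read()
    # Step 1: a non-scary notice that we are about to download
    warnings.warn("DTD file %s not found locally, attempting to download it" % name)
    # Step 2: download
    text = fetch(name)
    # Step 3: try to cache it; a failure here only produces a second warning
    try:
        if not os.path.isdir(cache_dir):
            os.makedirs(cache_dir)
        with open(path, "w") as handle:
            handle.write(text)
    except (IOError, OSError) as err:
        warnings.warn("Could not save DTD file %s to %s: %s" % (name, cache_dir, err))
    # Step 4: use the downloaded copy either way
    return text
```

Michiel's earlier idea of keeping parsed DTDs in memory could sit in front of this as a dictionary keyed by DTD name, so the disk cache is only consulted once per session.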
> Then I would suggest the following if a DTD file is missing: > 1) Print a non-scary warning message that we will attempt to download the DTD; > 2) Download the DTD; > 3) Try to store it in the local DTD directory. If this fails (e.g. due to file permissions or whatnot), print another warning message; > 4) Use the downloaded DTD to parse the XML. > Any final objections? Only with regard to the location of the cache on Windows. Peter From tiagoantao at gmail.com Mon Dec 9 14:40:12 2013 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 9 Dec 2013 14:40:12 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, I do not have access here to a windows machine (later in the day I can). But: /.biopython/Bio/Entrez/DTDs should not be correct. You probably want to replace "/" with os.sep, say: os.sep.join([".biopython", "Bio", "Entrez", "DTDs"]) which will give you \.biopython\Bio\Entrez\DTDs On 9 December 2013 14:20, Michiel de Hoon wrote: > Can somebody with a Windows computer check what os.path.expanduser('~') > refers to on Windows? Or how matplotlib solves this on Windows (they are > storing files under $HOME/.matplotlib on unix-like systems; I don't know > what they use on Windows). > > Thanks, > -Michiel. > > > -------------------------------------------- > On Mon, 12/9/13, Peter Cock wrote: > > Subject: Re: [Biopython-dev] [biopython] Missing DTD files (#260) > To: "Michiel de Hoon" > Cc: "Biopython-Dev Mailing List" > Date: Monday, December 9, 2013, 5:13 AM > > On Mon, Dec 9, 2013 at 6:33 AM, > Michiel de Hoon > wrote: > > Currently we are using os.path.expanduser('~') > /.biopython/Bio/Entrez/DTDs to look for locally stored > DTDs. > > This should work on Windows also.
> > Well partly - it would create a "scary" .biopython folder > which would NOT be hidden by default. On Windows > we could deliberately mark that folder as hidden (a > file system attribute, used instead of the leading dot > convention from Unix). However, I think we should > really be using something under: > > $HOME\Local Settings\Application Data > > A little research might be needed for how to get that > setting (if possible without reading the registry and the > additional Python dependency that would entail). > > > Then I would suggest the following if a DTD file is > missing: > > 1) Print a non-scary warning message that we will > attempt to download the DTD; > > 2) Download the DTD; > > 3) Try to store it in the local DTD directory. If this > fails (e.g. due to file permissions or whatnot), print > another warning message; > > 4) Use the downloaded DTD to parse the XML. > > Any final objections? > > Only with regard to the location of the cache on Windows. > > Peter > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From tiagoantao at gmail.com Mon Dec 9 14:41:43 2013 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 9 Dec 2013 14:41:43 +0000 Subject: [Biopython-dev] Releasing biopython - lessons In-Reply-To: <87wqjij9bb.fsf@fastmail.fm> References: <87wqjij9bb.fsf@fastmail.fm> Message-ID: Hi Brad, > If you like this approach, happy to help with pointers on adding more > tools. Thanks again for taking this on, > > This definitely seems the way to go. I will try to build up a full dependency list along with testing this tomorrow. 
Tiago -- "The truth may be out there, but the lies are already in your head" - Terry Pratchett From p.j.a.cock at googlemail.com Mon Dec 9 15:09:50 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 15:09:50 +0000 Subject: [Biopython-dev] Releasing biopython - lessons In-Reply-To: References: <87wqjij9bb.fsf@fastmail.fm> Message-ID: On Mon, Dec 9, 2013 at 2:41 PM, Tiago Antão wrote: > Hi Brad, > >> If you like this approach, happy to help with pointers on adding more >> tools. Thanks again for taking this on, > > This definitely seems the way to go. I will try to build up a full > dependency list along with testing this tomorrow. > > Tiago Excellent plan - thank you both for tackling this :) Peter From p.j.a.cock at googlemail.com Mon Dec 9 15:52:22 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 15:52:22 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: References: <1386598823.52509.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: > On 9 December 2013 14:20, Michiel de Hoon wrote: >> >> Can somebody with a Windows computer check what os.path.expanduser('~') >> refers to on Windows? Or how matplotlib solves this on Windows (they are >> storing files under $HOME/.matplotlib on unix-like systems; I don't know >> what they use on Windows). >> >> Thanks, >> -Michiel. On Windows XP, using Python 2.6, 2.7, 3.3, or PyPy 2.2, C:\Documents and Settings\pc40583>c:\python26\python Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> print(os.path.expanduser('~')) C:\Documents and Settings\pcock i.e. As expected, it found my home directory. On Mon, Dec 9, 2013 at 2:40 PM, Tiago Antão wrote: > Hi, > > I do not have access here to a windows machine (later in the day I can). > But: > /.biopython/Bio/Entrez/DTDs > > should not be correct.
You probably want to replace "/" with os.sep, say: > os.sep.join([".biopython", "Bio", "Entrez", "DTDs"]) > > which will give you > \.biopython\Bio\Entrez\DTDs Yes in theory, but in practice using the Unix style slashes works just fine in my experience (and they are used in many of our unit tests without problem). Even a mix of slashes works. Note style-wise it would be preferable to use os.path.join(...) rather than the string join method os.sep.join(...) as shown. Peter From kashyap.cc at gmail.com Mon Dec 9 16:29:05 2013 From: kashyap.cc at gmail.com (Kashyap Chhatbar) Date: Mon, 9 Dec 2013 16:29:05 +0000 Subject: [Biopython-dev] Introduction from a newbie; Tiny contribution Message-ID: Hi, I am happy to join the dev mailing list for Biopython. I would definitely be interested in learning more and hopefully at some point contributing to the biopython source. A few days ago, I created an issue on the official github repository of biopython (#267). I think I have come up with a noobish contribution to convert the file name (if relative path is given) to absolute paths when creating sqlite3 database using Bio.SeqIO.index_db() function. (commit de9ba35). I have not looked up too much of the biopython code so as to conclude whether introducing the absolute path for filenames is as such a good idea or not but I thought of this because Peter J Cock hinted at improving the code for relative paths. Cheers, Kashyap From p.j.a.cock at googlemail.com Mon Dec 9 16:42:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 9 Dec 2013 16:42:43 +0000 Subject: [Biopython-dev] Introduction from a newbie; Tiny contribution In-Reply-To: References: Message-ID: On Mon, Dec 9, 2013 at 4:29 PM, Kashyap Chhatbar wrote: > Hi, > > I am happy to join the dev mailing list for Biopython. I would definitely be > interested in learning more and hopefully at some point > contributing to the biopython source.
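Kashyap's commit stores the absolute path of each sequence file in the index database, whereas Peter prefers relative filenames (his explanation is on GitHub issue #267, not in this thread). A minimal sketch comparing the two, assuming "relative" means relative to the folder holding the index file; the helper names are hypothetical and not part of `Bio.SeqIO.index_db`. It also follows Peter's style advice of using `os.path.join` rather than `os.sep.join`.

```python
import os


def absolute_name(seq_filename):
    """Option 1 (as in Kashyap's commit): store the absolute path."""
    return os.path.abspath(seq_filename)


def stored_name(seq_filename, index_filename):
    """Option 2 (hypothetical): store the path relative to the index file's folder."""
    index_dir = os.path.dirname(os.path.abspath(index_filename))
    try:
        return os.path.relpath(os.path.abspath(seq_filename), index_dir)
    except ValueError:
        # On Windows relpath fails across drives (e.g. C: vs D:); keep it absolute
        return os.path.abspath(seq_filename)


def resolve_name(stored, index_filename):
    """Turn a stored (relative or absolute) name back into a usable path."""
    index_dir = os.path.dirname(os.path.abspath(index_filename))
    # os.path.join ignores index_dir if `stored` is already absolute
    return os.path.normpath(os.path.join(index_dir, stored))
```

The trade-off: absolute paths break as soon as the data folder is moved or shared with someone else, while paths relative to the index keep working as long as the index and the sequence files move together.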
Thank you and welcome :) > A few days ago, I created an issue on > the official github repository of biopython > (#267). Yes, thank you for raising this - it is something we can improve. > I think I have come up with a noobish contribution to convert the file name > (if relative path is given) to absolute paths when creating sqlite3 > database using Bio.SeqIO.index_db() function. (commit > de9ba35). > I have not looked up too much of the biopython code so as to conclude > whether introducing the absolute path for filenames is as such a good idea > or not but I thought of this because Peter J Cock hinted at improving the > code for relative paths. I don't think storing the absolute path is the best plan - I have tried to better explain what I meant by using relative filenames on the GitHub issue, https://github.com/biopython/biopython/issues/267 Is that clearer? Thanks, Peter From p.j.a.cock at googlemail.com Tue Dec 10 10:43:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 10 Dec 2013 10:43:32 +0000 Subject: [Biopython-dev] Error in SeqFeature.CompoundLocation parsing NCBI efetch format In-Reply-To: References: <52A0A8E4.2080806@binnisb.com> <52A0A9CE.9080400@binnisb.com> Message-ID: On Fri, Dec 6, 2013 at 10:38 AM, Peter Cock wrote: > On Fri, Dec 6, 2013 at 10:26 AM, Brynjar Smári Bjarnason wrote: > >> Is it likely that this will find its way into 1.64? Not that I am in any >> rush since it is working, just wondering for the one that needs to maintain >> the Biopython versions. > > Yes, once the release is formally announced I intend to commit this > change (and a test - probably the very same example GenPept file > you reported). It is possible that we'll have a better solution in place > for Biopython 1.64 to handle these locations without simply dropping > the bond term.
This workaround has now been committed to the main branch, https://github.com/biopython/biopython/commit/ddbe6dc5dc08aa5079dfd29bc240651675c48427 Along with the example file as a test case: https://github.com/biopython/biopython/commit/44ec2b2ce4687055a50f246ac264690ed664326c Thank you, Peter From kashyap.cc at gmail.com Tue Dec 10 13:15:22 2013 From: kashyap.cc at gmail.com (Kashyap Chhatbar) Date: Tue, 10 Dec 2013 13:15:22 +0000 Subject: [Biopython-dev] Introduction from a newbie; Tiny contribution In-Reply-To: References: Message-ID: On Mon, Dec 9, 2013 at 4:42 PM, Peter Cock wrote: > I don't think storing the absolute path is the best plan - I have > tried to better explain what I meant by using relative filenames on > the GitHub issue, > > https://github.com/biopython/biopython/issues/267 > > Is that clearer? > I have tried to understand the options and have commented further. Cheers, Kashyap From yeyanbo289 at gmail.com Thu Dec 12 15:09:10 2013 From: yeyanbo289 at gmail.com (Yanbo Ye) Date: Thu, 12 Dec 2013 23:09:10 +0800 Subject: [Biopython-dev] tutorial translation and rst files Message-ID: Hi guys, We have almost completed the Chinese translation of the Biopython tutorial and fixed the format errors caused by the latex to rst conversion. It's based on the 22 March 2013 update and the repository is here: https://github.com/bigwiv/Biopython-cn , including the English version. I heard there was a discussion about switching the tutorial format from latex to rst or Sphinx/reStructuredText port. I don't know whether these files are useful for this task. I'm not an expert in rst format and there must be other errors in those files. Any suggestions?
Best, Yanbo -- *Yanbo Ye* *Guangzhou Institutes of Biomedicine and Health, * *Chinese Academy of Sciences* *190 Kaiyuan Avenue, Science Park, Guangzhou, China* *Email: ye_yanbo at gibh.ac.cn * *Web: http://www.yeyanbo.com * *Phone: (86)-020-32093810* From mjldehoon at yahoo.com Thu Dec 12 15:16:14 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 12 Dec 2013 07:16:14 -0800 (PST) Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: Message-ID: <1386861374.64762.YahooMailBasic@web164003.mail.gq1.yahoo.com> Thanks Peter. I believe that matplotlib uses os.path.expanduser('~') /.matplotlib/matplotlibrc. Then shall we use the analogous path for Biopython, so os.path.expanduser('~')/.biopython/Bio/Entrez/DTDs? Best, -Michiel. On Windows XP, using Python 2.6, 2.7, 3.3, or PyPy 2.2, C:\Documents and Settings\pc40583>c:\python26\python Python 2.6 (r26:66721, Oct 2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> import os >>> print(os.path.expanduser('~')) C:\Documents and Settings\pcock i.e. As expected, it found my home directory. From p.j.a.cock at googlemail.com Thu Dec 12 15:58:21 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Dec 2013 15:58:21 +0000 Subject: [Biopython-dev] tutorial translation and rst files In-Reply-To: References: Message-ID: On Thu, Dec 12, 2013 at 3:09 PM, Yanbo Ye wrote: > Hi guys, > > We have almost completed the Chinese translation of the Biopython tutorial and > fixed the format errors caused by the latex to rst conversion. It's based on > the 22 March 2013 update and the repository is here: > https://github.com/bigwiv/Biopython-cn , including the English version. > > I heard there was a discussion about switching the tutorial format from > latex to rst or Sphinx/reStructuredText port. I don't know whether these files are > useful for this task. I'm not an expert in rst format and there must be > other errors in those files.
Any suggestions? > > Best, > Yanbo Hi Yanbo, That looks impressive - the individual chapters are fast on GitHub, but even the whole English document displays quite quickly there: https://github.com/bigwiv/Biopython-cn/blob/master/en/allinone.rst Sadly the all-in-one document makes GitHub struggle: https://github.com/bigwiv/Biopython-cn/blob/master/cn/allinone.rst Did you have any trouble converting particular parts of the Tutorial? There are a few places where we used LaTeX for complex mathematical formulas - that seems to be an rst weakness. Can you post compiled HTML and PDF output (English & Chinese) as well? That would be a fairer way to look at the output, rather than just seeing how GitHub renders it. Regards, Peter P.S. The six month old Tutorial.rst in the repository root does not seem to work? From p.j.a.cock at googlemail.com Thu Dec 12 16:30:07 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 12 Dec 2013 16:30:07 +0000 Subject: [Biopython-dev] Fwd: [biopython] TreeConstruction and Consensus modules from GSoC 2013 (#270) In-Reply-To: References: Message-ID: Hello Biopythoneers, For those of you not following the GitHub repository, this is quite a big and important pull request :) Please take a look! Thanks, Peter ---------- Forwarded message ---------- From: yeyanbo Date: Thu, Dec 12, 2013 at 2:16 PM Subject: [biopython] TreeConstruction and Consensus modules from GSoC 2013 (#270) To: biopython/biopython two module files created in Bio/Phylo: TreeConstruction.py, Consensus.py ; two test files created in Tests: test_TreeConstruction.py , test_Consensus.py ; directory created for testing files: Tests\TreeConstruction . 
________________________________ You can merge this Pull Request by running git pull https://github.com/lijax/biopython master Or view, comment on, or merge it at: https://github.com/biopython/biopython/pull/270 Commit Summary add TreeConststruction and Consensus modules implement upgma and nj algorithms" add parsimony scorer rewrite and test parsimony score minor change add NNITreeSearcher._get_neighbors complete parsimony method fix the bug that the nj tree may have 1 child instead of 3 at the root add convertion from SubsMat to Matrix for protein submatrix BitString and strict_consensus reorganize different tests add automatic test for consensus tree add `repr` funcionn for Matrix class and improve the document move `delitem` and `insert` from DistanceMatrix to Matrix fix the index bug of Matrix.insert() function and improve the document test files for consensus algorithms improve document of DiscanceCalculator and DistanceTreeConstructor improve document for parsimony tree classes majority and adam consensus methods fix majority bug, finish adam consensus, doc improvement add branch support method change DistanceCalculator parameters as msa should be independent restructure TreeConstructor classes fix nj bug assign 0 lenght to root clade of nj and upgma add bootstrap method adapt "identity" model in DistanceCalculator to protein;None condition of starting_tree in ParsimonyTreeConstructor convert list to generator in bootstrap methods fix adam consensus bug test cleanup minor change make assistant classes private remove import * File Changes A Bio/Phylo/Consensus.py (570) A Bio/Phylo/TreeConstruction.py (1011) A Tests/TreeConstruction/adam_refs.tre (3) A Tests/TreeConstruction/bootstrap_consensus.tre (1) A Tests/TreeConstruction/consensus_refs.tre (3) A Tests/TreeConstruction/majority_ref.tre (2) A Tests/TreeConstruction/msa.phy (6) A Tests/TreeConstruction/neighbor_trees.tre (4) A Tests/TreeConstruction/nj.tre (1) A Tests/TreeConstruction/pars1.tre (1) A 
Tests/TreeConstruction/pars2.tre (1) A Tests/TreeConstruction/pars3.tre (1) A Tests/TreeConstruction/strict_refs.tre (3) A Tests/TreeConstruction/test.log (36) A Tests/TreeConstruction/trees.tre (3) A Tests/TreeConstruction/upgma.tre (1) A Tests/test_Consensus.py (152) A Tests/test_TreeConstruction.py (245) Patch Links: https://github.com/biopython/biopython/pull/270.patch https://github.com/biopython/biopython/pull/270.diff From yeyanbo289 at gmail.com Thu Dec 12 17:36:48 2013 From: yeyanbo289 at gmail.com (Yanbo Ye) Date: Fri, 13 Dec 2013 01:36:48 +0800 Subject: [Biopython-dev] Fwd: Re: tutorial translation and rst files In-Reply-To: References: Message-ID: Forgot to add the list. ---------- Forwarded message ---------- From: "Yanbo Ye" Date: 13 Dec 2013, 1:09 AM Subject: Re: [Biopython-dev] tutorial translation and rst files To: "Peter Cock" Hi Peter, I have added the compiled HTML files to the repository and the formatting seems to be OK. https://github.com/bigwiv/Biopython-cn/blob/master/cn/allinone.html https://github.com/bigwiv/Biopython-cn/blob/master/en/allinone.html On Thu, Dec 12, 2013 at 11:58 PM, Peter Cock wrote: > On Thu, Dec 12, 2013 at 3:09 PM, Yanbo Ye wrote: > > Hi guys, > > > > We have almost completed the Chinese translation of the Biopython tutorial and > > fixed the format errors caused by the LaTeX to rst conversion. It's based on > > the 22 March 2013 update and the repository is here: > > https://github.com/bigwiv/Biopython-cn, including the English version. > > > > I heard there was a discussion about switching the tutorial format from > > LaTeX to rst or a Sphinx/reStructuredText port. I don't know whether these files > are > > useful for this task. I'm not an expert in rst format and there must be > > other errors in those files. Any suggestions?
> > > > Best, > > Yanbo > > Hi Yanbo, > > That looks impressive - the individual chapters are fast on GitHub, > but even the whole English document displays quite quickly there: > https://github.com/bigwiv/Biopython-cn/blob/master/en/allinone.rst > > Sadly the all-in-one document makes GitHub struggle: > https://github.com/bigwiv/Biopython-cn/blob/master/cn/allinone.rst > > Did you have any trouble converting particular parts of the Tutorial? > There are a few places where we used LaTeX for complex > mathematical formulas - that seems to be an rst weakness. There were many format errors with the hyperlinks, lists, tables and formulas. You can check the original converted file Tutorial.rst for reference. Most of them are now fixed, including the formulas (by reusing the LaTeX code from the original LaTeX file). Another big problem is inconsistent heading levels. For example, in chapter 15 there are three individual subsections before section 15.1 starts, so the document cannot be compiled. For now I have just changed them to section level to avoid the error. Is there a better solution? > Can you post compiled HTML and PDF output (English & Chinese) > as well? That would be a fairer way to look at the output, rather > than just seeing how GitHub renders it. > > Regards, > > Peter > > P.S. The six-month-old Tutorial.rst in the repository root does > not seem to work? > This is the original converted file and is just for reference. Best, Yanbo -- *Yanbo Ye* *Guangzhou Institutes of Biomedicine and Health, * *Chinese Academy of Sciences* *190 Kaiyuan Avenue, Science Park, Guangzhou, China* *Email: ye_yanbo at gibh.ac.cn * *Web: http://www.yeyanbo.com * *Phone: (86)-020-32093810* From yeyanbo289 at gmail.com Fri Dec 13 01:23:56 2013 From: yeyanbo289 at gmail.com (Yanbo Ye) Date: Fri, 13 Dec 2013 09:23:56 +0800 Subject: [Biopython-dev] [biopython] TreeConstruction and Consensus modules from GSoC 2013 (#270) In-Reply-To: References: Message-ID: Thanks, Peter.
I noticed the Travis CI build failed because of a 'StringIO' import error under Python 3.3. I need to fix this Python version issue. On Fri, Dec 13, 2013 at 12:30 AM, Peter Cock wrote: > Hello Biopythoneers, > > For those of you not following the GitHub repository, > this is quite a big and important pull request :) > > Please take a look! > > Thanks, > > Peter > > ---------- Forwarded message ---------- > From: yeyanbo > Date: Thu, Dec 12, 2013 at 2:16 PM > Subject: [biopython] TreeConstruction and Consensus modules from GSoC > 2013 (#270) > To: biopython/biopython > > > two module files created in Bio/Phylo: TreeConstruction.py, Consensus.py ; > two test files created in Tests: test_TreeConstruction.py , > test_Consensus.py ; > directory created for testing files: Tests\TreeConstruction . > > ________________________________ > > You can merge this Pull Request by running > > git pull https://github.com/lijax/biopython master > > Or view, comment on, or merge it at: > > https://github.com/biopython/biopython/pull/270 > > Commit Summary > > add TreeConststruction and Consensus modules > implement upgma and nj algorithms" > add parsimony scorer > rewrite and test parsimony score > minor change > add NNITreeSearcher._get_neighbors > complete parsimony method > fix the bug that the nj tree may have 1 child instead of 3 at the root > add convertion from SubsMat to Matrix for protein submatrix > BitString and strict_consensus > reorganize different tests > add automatic test for consensus tree > add `repr` funcionn for Matrix class and improve the document > move `delitem` and `insert` from DistanceMatrix to Matrix > fix the index bug of Matrix.insert() function and improve the document > test files for consensus algorithms > improve document of DiscanceCalculator and DistanceTreeConstructor > improve document for parsimony tree classes > majority and adam consensus methods > fix majority bug, finish adam consensus, doc improvement > add branch support method > change
DistanceCalculator parameters as msa should be independent > restructure TreeConstructor classes > fix nj bug > assign 0 lenght to root clade of nj and upgma > add bootstrap method > adapt "identity" model in DistanceCalculator to protein;None condition > of starting_tree in ParsimonyTreeConstructor > convert list to generator in bootstrap methods > fix adam consensus bug > test cleanup > minor change > make assistant classes private > remove import * > > File Changes > > A Bio/Phylo/Consensus.py (570) > A Bio/Phylo/TreeConstruction.py (1011) > A Tests/TreeConstruction/adam_refs.tre (3) > A Tests/TreeConstruction/bootstrap_consensus.tre (1) > A Tests/TreeConstruction/consensus_refs.tre (3) > A Tests/TreeConstruction/majority_ref.tre (2) > A Tests/TreeConstruction/msa.phy (6) > A Tests/TreeConstruction/neighbor_trees.tre (4) > A Tests/TreeConstruction/nj.tre (1) > A Tests/TreeConstruction/pars1.tre (1) > A Tests/TreeConstruction/pars2.tre (1) > A Tests/TreeConstruction/pars3.tre (1) > A Tests/TreeConstruction/strict_refs.tre (3) > A Tests/TreeConstruction/test.log (36) > A Tests/TreeConstruction/trees.tre (3) > A Tests/TreeConstruction/upgma.tre (1) > A Tests/test_Consensus.py (152) > A Tests/test_TreeConstruction.py (245) > > Patch Links: > > https://github.com/biopython/biopython/pull/270.patch > https://github.com/biopython/biopython/pull/270.diff > -- *Yanbo Ye* *Guangzhou Institutes of Biomedicine and Health, * *Chinese Academy of Sciences* *190 Kaiyuan Avenue, Science Park, Guangzhou, China* *Email: ye_yanbo at gibh.ac.cn * *Web: http://www.yeyanbo.com * *Phone: (86)-020-32093810* From p.j.a.cock at googlemail.com Sat Dec 14 21:24:52 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 14 Dec 2013 21:24:52 +0000 Subject: [Biopython-dev] [biopython] Missing DTD files (#260) In-Reply-To: <1386861374.64762.YahooMailBasic@web164003.mail.gq1.yahoo.com> References: <1386861374.64762.YahooMailBasic@web164003.mail.gq1.yahoo.com> Message-ID: On Thu, Dec 12, 
2013 at 3:16 PM, Michiel de Hoon wrote: > Thanks, Peter. > I believe that matplotlib uses os.path.expanduser('~')/.matplotlib/matplotlibrc. > Then shall we use the analogous path for Biopython, i.e. > os.path.expanduser('~')/.biopython/Bio/Entrez/DTDs? That or ~/.config/biopython makes sense under Linux and Mac, but I think we want something like this on Windows (untested, based on some Google reading): os.path.join(os.getenv("APPDATA"), "biopython") Regards, Peter From mjldehoon at yahoo.com Thu Dec 26 10:28:32 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 26 Dec 2013 02:28:32 -0800 (PST) Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: Message-ID: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Fixed; please let us know if you encounter any problems. -Michiel. -------------------------------------------- On Mon, 9/23/13, Peter Cock wrote: Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings To: "Biopython-Dev Mailing List" Date: Monday, September 23, 2013, 4:58 PM Hi all, I'm seeing the following warning from NumPy 1.7 with Python 3.3 on Mac OS X, and on Linux too. I believe the NumPy version is the critical factor: building 'Bio.Cluster.cluster' extension building 'Bio.KDTree._CKDTree' extension building 'Bio.Motif._pwm' extension building 'Bio.motifs._pwm' extension all give: /Users/peterjc/lib/python3.3/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using
deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] According to this page, http://docs.scipy.org/doc/numpy-dev/reference/c-api.deprecations.html If we add this line it should confirm our code is clean for NumPy 1.7 (and implies no side effects on older NumPy): #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION Unfortunately it seems all four modules have problems with that, presumably due to planned NumPy C API changes that we need to handle via a version-conditional #ifdef? Peter _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev
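Peter's message puts the define directly into each C file. Another option, sketched here under our own assumptions, is to pass it from setup.py through the `define_macros` argument of a distutils/setuptools `Extension`. The helper name `numpy_api_macros` is hypothetical and not part of Biopython. The version check is optional: Peter notes the define should have no side effects on older NumPy, so the check only avoids passing an unused macro.

```python
import re


def numpy_api_macros(numpy_version):
    """Return a define_macros list for building C extensions against NumPy.

    Hypothetical helper (not part of Biopython). For NumPy >= 1.7 it is
    equivalent to putting this line at the top of each C file:
        #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
    """
    match = re.match(r"(\d+)\.(\d+)", numpy_version)
    if match is None:
        raise ValueError("Unrecognised NumPy version: %r" % numpy_version)
    # Compare numerically so that e.g. "1.10" counts as newer than "1.7".
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) >= (1, 7):
        return [("NPY_NO_DEPRECATED_API", "NPY_1_7_API_VERSION")]
    # NPY_1_7_API_VERSION is only provided from NumPy 1.7 onwards.
    return []


print(numpy_api_macros("1.7.1"))
print(numpy_api_macros("1.6.2"))
```

In setup.py this might be used as `Extension("Bio.Cluster.cluster", sources, define_macros=numpy_api_macros(numpy.__version__))`. Either way, the define only hides the deprecated API; the C code in the four modules would still need porting first, which is the problem reported above.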