From mjldehoon at yahoo.com Fri Feb 1 00:35:18 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 31 Jan 2013 21:35:18 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To:
Message-ID: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Peter, Bow,

> > I'm OK with using the setUp and tearDown arguments to
> > doctest.DocTestSuite to do the directory magic, but keeping the
> > test files under Tests/.
>
> As a more elegant version of the Bio._utils.run_doctest() function?

Exactly. Bow, do you want to give this approach a try? (Assuming that
there are no further objections from the other developers).

Best,
-Michiel.

From w.arindrarto at gmail.com Fri Feb 1 05:29:59 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 11:29:59 +0100
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID:

Hi Michiel, Peter, everyone,

>> > I'm OK with using the setUp and tearDown arguments to
>> > doctest.DocTestSuite to do the directory magic, but keeping the
>> > test files under Tests/.
>>
>> As a more elegant version of the Bio._utils.run_doctest() function?
>
> Exactly. Bow, do you want to give this approach a try? (Assuming that
> there are no further objections from the other developers).
Just to be clear, we are:

* changing all module's doctest file path to use relative paths (with
  respect to the module's location),
* replacing the run_doctest() import with a simpler doctest import and
  `doctest.testmod()` in each module having this doctest
* resorting to setUp and tearDown in the DocTestSuite in
  `run_tests.py` so that each module / submodule can find their test
  files
* and refactoring all string functions in Bio._utils to Bio.Phylo and
  Bio.SearchIO, so that we can remove Bio._utils,

right?

I'd be happy to give this a shot if everyone feels the same :).

Regards,
Bow

From p.j.a.cock at googlemail.com Fri Feb 1 06:07:22 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 11:07:22 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To:
References: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID:

On Fri, Feb 1, 2013 at 10:29 AM, Wibowo Arindrarto wrote:
> Hi Michiel, Peter, everyone,
>
>>> > I'm OK with using the setUp and tearDown arguments to
>>> > doctest.DocTestSuite to do the directory magic, but keeping the
>>> > test files under Tests/.
>>>
>>> As a more elegant version of the Bio._utils.run_doctest() function?
>>
>> Exactly. Bow, do you want to give this approach a try?
>> (Assuming that there are no further objections from the other developers).
>
> Just to be clear, we are:
>
> * changing all module's doctest file path to use relative paths (with
> respect to the module's location),
> * replacing the run_doctest() import with a simpler doctest import and
> `doctest.testmod()` in each module having this doctest
> * resorting to setUp and tearDown in the DocTestSuite in
> `run_tests.py` so that each module / submodule can find their test
> files

That wasn't my understanding - I thought we were just talking about
making Bio._utils.run_doctest() use setUp and tearDown to take care of
the path changes (although I'm not sure if that will actually be any
shorter - we'd find out).

> * and refactoring all string functions in Bio._utils to Bio.Phylo and
> Bio.SearchIO, so that we can remove Bio._utils,

I'm not particularly bothered either way on this. Having misc
utilities like this under Bio.Phylo or Bio.SearchIO makes it clear
where they are used, and makes it easier to compartmentalise
functionality.

Regards,

Peter

From mjldehoon at yahoo.com Fri Feb 1 06:23:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 03:23:15 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To:
Message-ID: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Bow,

Yes, that is correct.
Responding to Peter's email: Peter, do you agree with this approach?

Best,
-Michiel.

--- On Fri, 2/1/13, Wibowo Arindrarto wrote:

> From: Wibowo Arindrarto
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon"
> Cc: "Peter Cock", "BioPython-Dev Mailing List"
> Date: Friday, February 1, 2013, 5:29 AM
> Hi Michiel, Peter, everyone,
>
> >> > I'm OK with using the setUp and tearDown arguments to
> >> > doctest.DocTestSuite to do the directory magic, but keeping
> >> > the test files under Tests/.
> >>
> >> As a more elegant version of the Bio._utils.run_doctest()
> >> function?
> >
> > Exactly.
> > Bow, do you want to give this approach a try? (Assuming that there
> > are no further objections from the other developers).
>
> Just to be clear, we are:
>
> * changing all module's doctest file path to use relative paths (with
> respect to the module's location),
> * replacing the run_doctest() import with a simpler doctest import and
> `doctest.testmod()` in each module having this doctest
> * resorting to setUp and tearDown in the DocTestSuite in
> `run_tests.py` so that each module / submodule can find their test
> files
> * and refactoring all string functions in Bio._utils to Bio.Phylo and
> Bio.SearchIO, so that we can remove Bio._utils,
>
> right?
>
> I'd be happy to give this a shot if everyone feels the same :).
>
> Regards,
> Bow

From p.j.a.cock at googlemail.com Fri Feb 1 06:51:16 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 11:51:16 +0000
Subject: [Biopython-dev] Trie with_prefix doesn't work as expected
In-Reply-To:
References:
Message-ID:

On Thu, Jan 31, 2013 at 11:38 AM, Peter Cock wrote:
>
> Thanks to Jeff Chang for a very speedy fix (sent as an attachment off list),
> which I have applied to the repository:
> https://github.com/biopython/biopython/commit/cd7cc7174fd4b0607381e9c58f6ae0d17cca8f74
>
> I've also added a unit test based on Kevin's example:
> https://github.com/biopython/biopython/commit/efc289c8fe2e78ad12481973e42554fa40f2ea0a
>
> Thank you for reporting this Kevin.
>
> Peter
>
> P.S. Nice to hear from you again Jeff :)
>
> I think your last commit was before we moved from CVS to git, please
> let us know if you want commit access on github.
Thanks again to Kevin for another test case, and to Jeff for another
quick code fix where a trie key exceeded the MAX_KEY_LENGTH buffer:
https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

Peter

From redmine at redmine.open-bio.org Fri Feb 1 06:51:51 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Feb 2013 11:51:51 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets
References:
Message-ID:

Issue #3395 has been updated by Peter Cock.

Kevin Wu reported a related issue, which we discussed with Jeff Chang
(off list), where a key in the trie exceeded 1000 bytes (the original
value of MAX_KEY_LENGTH). See:
http://lists.open-bio.org/pipermail/biopython-dev/2013-February/010284.html
https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

(Ideally we could give a specific ValueError exception here, but
nevertheless the current print message is an improvement)

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL:

Imagine I have Biopython trie:

    from Bio import trie
    import gzip

    f = gzip.open('/tmp/trie.dat.gz', 'w')
    tr = trie.trie()
    #fill in the trie
    trie.save(f, tr)

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

    from Bio import trie
    import gzip

    f = gzip.open('/tmp/trie.dat.gz', 'r')
    tr = trie.load(f)

Unfortunately I'm getting meaningless error saying:

"loading failed for some reason"

Any hints?

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From p.j.a.cock at googlemail.com Fri Feb 1 07:14:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 12:14:49 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID:

> Wibowo Arindrarto wrote:
>> Just to be clear, we are:
>>
>> * changing all module's doctest file path to use relative paths (with
>> respect to the module's location),
>> * replacing the run_doctest() import with a simpler doctest import and
>> `doctest.testmod()` in each module having this doctest
>> * resorting to setUp and tearDown in the DocTestSuite in
>> `run_tests.py` so that each module / submodule can find their test
>> files
>> * and refactoring all string functions in Bio._utils to Bio.Phylo and
>> Bio.SearchIO, so that we can remove Bio._utils,
>>
>> right?
>>
>> I'd be happy to give this a shot if everyone feels the same :).
>>

On Fri, Feb 1, 2013 at 11:23 AM, Michiel de Hoon wrote:
> Hi Bow,
>
> Yes, that is correct.
> Responding to Peter's email: Peter, do you agree with this approach?
>
> Best,
> -Michiel.

No. I think we have misunderstood each other on the doctest
discussion :(

If we keep the test files under Tests/ (and I think that is best)
then for example look at this doctest in Bio/SeqRecord.py

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

That is currently written to assume it is run from the Tests/
folder. If we write this assuming it is in the Bio/ folder where
the Python file SeqRecord.py lives, it becomes:

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("../Tests/Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

I think a beginner would find that more confusing.
It is also longer and we already have trouble with some lines
exceeding 80 chars.

Ideally there would be a nice way for doctests to specify the folder,
and then we could use a simple filename like "sweetpea.nu" with
no directories at all. But I don't think that is possible without us
making the testing infrastructure even more complicated.

--

If we want to get rid of Bio._utils.run_doctest() (and the whole of
the file Bio/_utils.py) then I would prefer reverting to the old
situation prior to adding the Bio._utils.run_doctest() helper function.

If the repetitive code snippets to run the doctests of a module are a
problem, they can be shortened to something less flexible. For example,
Bio/SeqRecord.py could use something very short like this:

    if __name__ == "__main__":
        import os
        assert os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
        import doctest
        doctest.testmod(verbose=2)

Or, as I suggested before, we can remove these development
convenience hooks completely?

--

On the subject of the string functions in Bio/_utils.py, I have no
objection to moving them back under Bio.SearchIO and/or
Bio.Phylo - which has advantages in terms of modularity (a
good thing for preventing accidental side effects).

Regards,

Peter

From mjldehoon at yahoo.com Fri Feb 1 08:54:46 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 05:54:46 -0800 (PST)
Subject: [Biopython-dev] Namespace for online resources?
Message-ID: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>

Hi Lenna,

> Regarding point (2), is your primary concern namespace clutter or
> importing efficiency?

Regarding point (2), my primary concern is that a Bio.WWW module would
group together modules that don't have much in common with each other.
I agree with your point that the category of internet access is more
fundamental than the category of parsers. But still, which modules
should then go into a Bio.WWW module?
Any module whose sole purpose is to use the internet (that would
exclude Bio.Entrez)? Any module whose main purpose is to use the
internet? This would be unclear; for example, Bio.Entrez may or may not
fall in that category, depending on how you use the module. Any module
whose functionality includes internet access? Then if one day we add
access to the JASPAR database over the internet to Bio.Motif, it would
have to move to Bio.WWW.

Currently most modules are organized by theme (Bio.Seq, Bio.Motif,
Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one
module, one chapter in the documentation, one set of unit tests, one
set of doctests, which I think is a huge advantage both in terms of
clarity and in terms of user experience.

Best,
-Michiel.

--- On Wed, 1/30/13, Lenna Peterson wrote:

From: Lenna Peterson
Subject: Re: [Biopython-dev] Namespace for online resources?
To: "Michiel de Hoon"
Cc: "Biopython-Dev Mailing List"
Date: Wednesday, January 30, 2013, 12:10 PM

Michiel,

You raise an excellent point that separating the modules in this way
will complicate doctests.

Regarding point (2), is your primary concern namespace clutter or
importing efficiency?

I still maintain that the category of internet access is more
fundamental than the category of parsers. For point (1), if every
database is accessed using a WWW submodule, a user will know to look
there. Obviously moving everything would be a lot of work...

Cheers,

Lenna

On Tue, Jan 29, 2013 at 9:00 PM, Michiel de Hoon wrote:

Bio.WWW was one of those modules that seem a good idea at first, but
then failed to gain general acceptance. There are three problems with
Bio.WWW:

1) From the module name, it's not clear what you would find in it. For
example, if you want to access the Entrez database, would you first
look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look
for it in Bio.TAIR, or in Bio.WWW?
2) The modules in Bio.WWW don't have much to do with each other, except
that they access the internet. But any given user probably is mainly
interested in Entrez, or ExPASy, or some other database, not in all of
them at the same time.

3) The flip side of this is that a user accessing e.g. ExPASy would
have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy.

Doctests get more complicated also, as they would span more than one
module. Here is an example from Bio.Entrez that accesses the database,
and then parses the results:

    >>> from Bio import Entrez
    >>> Entrez.email = "Your.Name.Here@example.org"
    >>> handle = Entrez.einfo() # or esearch, efetch, ...
    >>> record = Entrez.read(handle)
    >>> handle.close()

The ultimate question is whether we organize the code in Biopython by
its functionality from a user perspective, or by the kind of thing it
does. Almost all of Biopython is organized according to the former. For
example, we don't have a Bio.Parsers module for all the parsers;
similarly, we don't have Bio.WWW for internet access.

Best,
-Michiel.

--- On Tue, 1/29/13, Peter Cock wrote:

> From: Peter Cock
> Subject: Re: [Biopython-dev] Namespace for online resources?
> To: "Wibowo Arindrarto"
> Cc: "Biopython-Dev Mailing List"
> Date: Tuesday, January 29, 2013, 4:11 PM
> On Tue, Jan 29, 2013 at 9:03 PM, Peter Cock wrote:
> > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto wrote:
> >> Hi everyone,
> >>
> >> Why was Bio.WWW deprecated in the first place?
> >>
> >
> > The flippant answer is everything under Bio.WWW was moved
> > or deprecated:
> > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html
> >
> > I'm trying to identify the discussions prior to that covering the moves:
> >
> > Bio.WWW.ExPASy -> Bio.ExPASy
> > Bio.WWW.InterPro -> Bio.InterPro
> > Bio.WWW.NCBI -> Bio.Entrez
> > Bio.WWW.SCOP -> Bio.SCOP
>
> Probably this thread,
> http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html
>
> Also a bit more background on the NCBI Entrez side:
> http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev

From p.j.a.cock at googlemail.com Fri Feb 1 09:14:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:14:56 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon wrote:
> Hi Lenna,
>
>> Regarding point (2), is your primary concern namespace clutter or
>> importing efficiency?
>
> Regarding point (2), my primary concern is that a Bio.WWW module would
> group together modules that don't have much in common with each other. I
> agree to your point that the category of internet access is more fundamental
> than the category of parsers. But still, which modules should then go into a
> Bio.WWW module? Any module whose sole purpose is to use the internet (that
> would exclude Bio.Entrez)? Any module whose main purpose is to use the
> internet?
> This would be unclear; for example, Bio.Entrez may or may not fall
> in that category, depending on how you use the module. Any module whose
> functionality includes internet access? Then if one day we add access to the
> JASPAR database over the internet to Bio.Motif, it would have to move to
> Bio.WWW.
>
> Currently most modules are organized by theme (Bio.Seq, Bio.Motif,
> Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one
> module, one chapter in the documentation, one set of unit tests, one set of
> doctests, which I think is a huge advantage both in terms of clarity and in
> terms of user experience.

Also with the theme approach, most (if not all) the themes are
likely to have some online resources (databases or remote
APIs). On those grounds it makes sense to keep online motif
functionality (like weblogo) under Bio.Motif, and so on.

People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
(which could be a big disruption with lots of code relocation)

People leaning against a Bio.WWW grouping: Michiel, Peter (me)
(which would also be the status quo, so no disruption).

In the specific case of Kevin's TAIR code for fetching Arabidopsis
sequences, Bio.TAIR (lower case?) is consistent with current
usage. Somewhere under Bio.Seq* also seems sensible to me,
as I wrote at the start of this thread.

Regards,

Peter

From mjldehoon at yahoo.com Fri Feb 1 09:12:38 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 06:12:38 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To:
Message-ID: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Peter,

As we misunderstood each other, let me try once to make the case for
putting test files in Bio/*. If I fail to convince you, let's either go
back to the situation before Bio._utils, or remove the
"if __name__ == '__main__':" stuff altogether.
First of all, if we use "if __name__ == '__main__':" to run the
docstring tests, then those tests should pass if a user executes the
script. Otherwise, we have installed some code that makes no sense
outside of the distribution. This is also a problem with the
    os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
solution, as after installation there is no Tests/ folder any more.

Suppose we make a subdirectory Examples in each module that uses
docstring tests which need some data files, and put the data files in
the Examples subdirectory. The docstring tests are supposed to be
simple (full testing is done by the unittests), so the example data
files can be tiny.

The docstring tests can then use
    >>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta")
which is simple enough.
The unit tests can switch to the appropriate directory when running the
docstring tests.
A user, finding the example in the docstring tests, can try out the
example directly, since the data file is provided together with the
relevant module.
And since the data file is in the subdirectory Examples/, there is
still some separation between the code and the data.

Best,
-Michiel.

From p.j.a.cock at googlemail.com Fri Feb 1 09:32:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:32:46 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID:

On Fri, Feb 1, 2013 at 2:12 PM, Michiel de Hoon wrote:
> Hi Peter,
>
> As we misunderstood each other, let me try once to make the case for
> putting test files in Bio/*.
> If I fail to convince you, let's either go back
> to the situation before Bio._utils, or remove the "if __name__ ==
> '__main__':" stuff altogether.

I'm not convinced about putting test files under Bio/* so let's
revert the use of the helper function Bio._utils.run_doctest(),
and if you wish proceed with removing Bio/_utils.py as well.

Shall I go ahead and revert 8b59d89bb4e282192ddee751e24ceef4afa63528
then remove run_doctest and find_test_dir from Bio/_utils.py now?

> First of all, if we use "if __name__ == '__main__':" to run the docstring
> tests, then those tests should pass if a user executes the script.
> Otherwise, we have installed some code that makes no sense outside of the
> distribution. This is also a problem with the
> os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder"
> solution, as after installation there is no Tests/ folder any more.

That is a good point, this has always been a weakness of the
__main__ hook to run the doctests.

> Suppose we make a subdirectory Examples in each module that uses docstring
> tests which need some data files, and put the data files in the Examples
> subdirectory. The docstring tests are supposed to be simple (full testing is
> done by the unittests), so the example data files can be tiny.
>
> The docstring tests can then use
> >>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta")
> which is simple enough.
> The unit tests can switch to the appropriate directory when running the
> docstring tests.
> A user, finding the example in the docstring tests, can try out the
> example directly, since the data file is provided together with the relevant
> module.
> And since the data file is in the subdirectory Examples/, there is still
> some separation between the code and the data.

Did you envision installing the examples subdirectories next
to the code under site-packages?
Technically that is doable, but
I'm not sure if that is considered good practice (does anyone
know the relevant Debian policies for example - they're quite
keen on this kind of thing?).

I much prefer the simplicity of having all the test files in one
place (under the Tests/ folder) especially as things like simple
FASTA files get used in doctests and unittests for multiple
different areas of Biopython.

Regards,

Peter

From p.j.a.cock at googlemail.com Fri Feb 1 09:56:02 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 14:56:02 +0000
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID:

On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon wrote:
> Hi Peter and all,
>
> --- On Tue, 1/29/13, Peter Cock wrote:
>> We need to say something about this in the NEWS file too.
>
> Done.
>
>> I think it would make sense to add a PendingDeprecationWarning
>> to Bio.Motif now.
>
> Done.

Thanks.

>> Also, if you feel the new Bio.motifs API isn't quite
>> settled yet, adding the new BiopythonExperimentalWarning to
>> that makes sense.
>
> I don't expect big changes in the API, so I think we can do without the
> BiopythonExperimentalWarning. Also we should avoid the situation
> that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a
> BiopythonExperimentalWarning.

Agreed.

>> (And once this is settled, I think we can schedule the
>> release)
>
> We should also check whether we can remove deprecated modules,
> or deprecate modules that are currently declared obsolete. Or has
> somebody done that already?

I went over the list in the DEPRECATED file last month, but a
second check would be a good idea.
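The two kinds of warning mentioned in this exchange could be raised
along the following lines. This is a minimal sketch under stated
assumptions, not the actual Biopython code: the
BiopythonExperimentalWarning class is redefined locally as a stand-in
so the example runs without Biopython installed, and the helper names
warn_obsolete and warn_experimental are hypothetical.

```python
import warnings


class BiopythonExperimentalWarning(Warning):
    """Local stand-in (assumption) for the warning class named above."""


def warn_obsolete(old_name, new_name):
    # A module that is obsolete but not yet formally deprecated can
    # announce itself with the built-in PendingDeprecationWarning.
    warnings.warn(
        "%s is obsolete and is expected to be deprecated in a future "
        "release; please consider %s instead." % (old_name, new_name),
        PendingDeprecationWarning,
        stacklevel=2,
    )


def warn_experimental(name):
    # New code whose API may still change can say so explicitly.
    warnings.warn(
        "%s is experimental and its API may change in future releases."
        % name,
        BiopythonExperimentalWarning,
        stacklevel=2,
    )
```

Both helpers would typically be called once at import time from the
module concerned. Note that Python hides PendingDeprecationWarning by
default, so users only see it if they enable it (for example with the
-W option).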
Peter

From mjldehoon at yahoo.com Fri Feb 1 09:53:06 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 06:53:06 -0800 (PST)
Subject: [Biopython-dev] Bio.Motif update
In-Reply-To:
Message-ID: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com>

Hi Peter and all,

--- On Tue, 1/29/13, Peter Cock wrote:
> We need to say something about this in the NEWS file too.

Done.

> I think it would make sense to add a PendingDeprecationWarning
> to Bio.Motif now.

Done.

> Also, if you feel the new Bio.motifs API isn't quite
> settled yet, adding the new BiopythonExperimentalWarning to
> that makes sense.

I don't expect big changes in the API, so I think we can do without the
BiopythonExperimentalWarning. Also we should avoid the situation that
Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a
BiopythonExperimentalWarning.

> (And once this is settled, I think we can schedule the
> release)

We should also check whether we can remove deprecated modules, or
deprecate modules that are currently declared obsolete. Or has somebody
done that already?

Best,
-Michiel

From p.j.a.cock at googlemail.com Fri Feb 1 10:03:24 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:03:24 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
Message-ID:

Hello all,

I think we're overdue for a Biopython release now, and I would
like to do this next week. There are still plenty more additions
and enhancements waiting in the wings, but right now I just
want any remaining bug fixes addressed.

Are there any release blocking issues?

Thanks,

Peter

From w.arindrarto at gmail.com Fri Feb 1 10:29:09 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 16:29:09 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To:
References:
Message-ID:

Hi Peter,

> I think we're overdue for a Biopython release now, and I would
> like to do this next week.
> There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?

There's still one bug for Bio.SearchIO that I would prefer to be fixed
(https://redmine.open-bio.org/issues/3400). Is it possible to wait a
few more days (no later than next week I hope) to sort this bug out?

Also, since this is our first release with the
BiopythonExperimentalWarning, I was thinking maybe we can include some
modules that have been in the waiting line. One that I can think of
right now is Andrew's MafIO (re: the recent mention as well).
Considering that some people have started using it, maybe we can
release it under a BiopythonExperimentalWarning.

And later down the line, perhaps we can include Brad's GTF/GFF parser
(seeing that this is already included in the wiki, maybe it's a good
time to start considering where to put it)? Brad, if you don't mind,
perhaps we can start working on this as well.

Regards,
Bow

From p.j.a.cock at googlemail.com Fri Feb 1 10:40:03 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 15:40:03 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To:
References:
Message-ID:

On Fri, Feb 1, 2013 at 3:29 PM, Wibowo Arindrarto wrote:
> Hi Peter,
>
>> I think we're overdue for a Biopython release now, and I would
>> like to do this next week. There are still plenty more additions
>> and enhancements waiting in the wings, but right now I just
>> want any remaining bug fixes addressed.
>>
>> Are there any release blocking issues?
>
> There's still one bug for Bio.SearchIO that I would prefer to be fixed
> (https://redmine.open-bio.org/issues/3400). Is it possible to wait a
> few more days (no later than next week I hope) to sort this bug out?
A few days sure - but that is a small enough issue (and in a clearly marked 'here be dragons experimental code' section) that I don't think it should delay the whole release. > Also, since this is our first release with the > BiopythonExperimentalWarning, I was thinking maybe we can include some > modules that have been in the waiting line. One that I can think of > right now is Andrew's MafIO (re: the recent mention as well). > Considering that some people have started using it, maybe we can > release it under a BiopythonExperimentalWarning. > > And later down the line, perhaps we can include Brad's GTF/GFF parser > (seeing that this is already included in the wiki, maybe it's a good > time to start considering where to put it)? Brad, if you don't mind, > perhaps we can start working on this as well. Both examples of things I would like to do *after* shipping Biopython 1.61, which I feel is already overdue. Regards, Peter From mjldehoon at yahoo.com Fri Feb 1 10:39:15 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 1 Feb 2013 07:39:15 -0800 (PST) Subject: [Biopython-dev] Bio.Motif update In-Reply-To: Message-ID: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com> --- On Fri, 2/1/13, Peter Cock wrote: > I went over the list in the DEPRECATED file last month, but > a second check would be a good idea. 
The following were declared obsolete in Biopython 1.60, and can
in principle be declared deprecated in Biopython 1.61:

----------
Bio/Blast/Applications.py:
    BlastallCommandline
    BlastpgpCommandline
    RpsBlastCommandline

Bio/Blast/NCBIStandalone.py overall, and specifically:
    blastall
    blastpgp
    rpsblast

Bio/ParserSupport.py overall

Bio/PDB/AbstractPropertyMap.py:
    The has_key function in class AbstractPropertyMap

Bio/PDB/FragmentMapper.py:
    The has_key function in class FragmentMapper

Bio/UniGene/UniGene.py overall

In BioSQL/BioSeqDatabase.py:
    class DBServer:
        remove_database
    class BioSeqDatabase:
        get_all_primary_ids
        get_Seq_by_primary_id
-----------

These functions were deprecated in Biopython 1.59 or earlier, and
could be removed for Biopython 1.61:

Bio/Align/__init__.py:
    class MultipleSeqAlignment:
        get_column
        add_sequence

Bio/Align/Generic.py:
    class Alignment overall
        get_all_seqs
        get_seq_by_num

Bio/File.py:
    class StringHandle

Bio/Graphics/GenomeDiagram/_AbstractDrawer.py:
    class AbstractDrawer:
        _set_xcentre, _set_ycentre

Bio/Graphics/GenomeDiagram/_Graph.py:
    class GraphData:
        _set_centre

Bio/ParserSupport.py:
    SGMLStrippingConsumer

Bio/Seq.py:
    class Seq:
        .data property

Bio/SeqIO/SffIO.py:
    _sff_read_roche_index_xml
--------------------

The tostring() method of the class Seq in Bio/Seq.py:
Can we declare this obsolete?

-Michiel

From w.arindrarto at gmail.com Fri Feb 1 10:47:14 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 16:47:14 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <510BE201.4090002@biotech.uni-tuebingen.de>
References: <510BE201.4090002@biotech.uni-tuebingen.de>
Message-ID:

Hi Peter, Kai,

>> There's still one bug for Bio.SearchIO that I would prefer to be
>> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to
>> wait a few more days (no later than next week I hope) to sort this
>> bug out?
> > Sorry for letting this slip for so long, but I never got around to > write an actual test case. > > Bow, did we agree to use optionalcascade for now and then maybe > refactor? I'm pretty confident the code works as-is, all the BioPython > issues I've been running into with my production site have been in the > GenBank/EMBL parsers. :) Yes, we did :). I meant to do the optionalcascade refactor so the code is more maintainable (and to prevent a corner case bug). But in general, the optionalcascade fix looks to be fine. And for code marked with the BiopythonExperimentalWarning, having a fix without test cases seems better than no fix at all. Peter, if you're fine with Kai's fix, I think we can mark this bug solved and go on with the release. I'll add the test cases and refactor the code later on. Regards, Bow From p.j.a.cock at googlemail.com Fri Feb 1 10:51:07 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 15:51:07 +0000 Subject: [Biopython-dev] Bio.Motif update In-Reply-To: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com> References: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon wrote: > --- On Fri, 2/1/13, Peter Cock wrote: >> I went over the list in the DEPRECATED file last month, but >> a second check would be a good idea. > > The following were declared obsolete in Biopython 1.60, and can > in principle be declared deprecated in Biopython 1.61: > > ---------- > Bio/Blast/Applications.py: > BlastallCommandline > BlastpgpCommandline > RpsBlastCommandline My impression is there is still a sizeable group of people still using blastall and the rest of legacy BLAST as it is mature reliable code, while BLAST+ still has some rough edges. But as the NCBI themselves have now stopped updating legacy BLAST, perhaps the time has come. So if you want, deprecating the legacy BLAST wrappers seems OK. 
> Bio/Blast/NCBIStandalone.py overall, and specifically: > blastall > blastpgp > rpsblast Given the SearchIO use of the plain text BLAST parser, I think we agreed to leave that as is in the short term. The command line calling functions blastall, blastpgp & rpsblast the same applies as for BlastallCommandline, BlastpgpCommandline and RpsBlastCommandline (above). > Bio/ParserSupport.py overall Given the SearchIO use of the plain text BLAST parser which uses this, I think we agreed to leave that as is in the short term. > Bio/PDB/AbstractPropertyMap.py: > The has_key function in class AbstractPropertyMap > > Bio/PDB/FragmentMapper.py: > The has_key function in class FragmentMapper The Python dict lost the has_key function in Python 3, so it does make sense to proceed with those deprecations. > Bio/UniGene/UniGene.py overall > Yes, ready to deprecate. > In BioSQL/BioSeqDatabase.py: > class DBServer: > remove_database > class BioSeqDatabase: > get_all_primary_ids > get_Seq_by_primary_id Yes, ready to deprecate. Thanks, Peter From p.j.a.cock at googlemail.com Fri Feb 1 11:02:33 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 16:02:33 +0000 Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues? Message-ID: On Fri, Feb 1, 2013 at 3:40 PM, Kai Blin wrote: > > PS: I'd have replied on the bug tracker, but for some reason I can't > log in again, even after resetting the password. For some reason > redmine hates me. > Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/ (it was left as a read only legacy listing, but it broke last year when the old server started to die and isn't really worth fixing). This was moved over to RedMine, along with all the other OBF projects. This does have some git integration, but I'm not that taken with it - and it is yet another service for the OBF team to maintain. What do people think of moving over to using GitHub issues? 
This would link in very well with pull requests and makes linking to commits much simpler too. One potential issue is if and how we could have bug reports sent to the biopython-dev mailing list (something we touched on recently for pull requests). A full automated move could be possible (NumPy did this), but I think a gradual move would be fine - stop filing new issues on RedMine and use GitHub issues in future. There are only about 100 issues open at the moment anyway, and a manual migration would also be a good way to review some of the older tickets. Thoughts?, Peter From p.j.a.cock at googlemail.com Fri Feb 1 11:04:10 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 16:04:10 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: <510BE201.4090002@biotech.uni-tuebingen.de> Message-ID: On Fri, Feb 1, 2013 at 3:47 PM, Wibowo Arindrarto wrote: > Hi Peter, Kai, > > >>> There's still one bug for Bio.SearchIO that I would prefer to be >>> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to >>> wait a few more days (no later than next week I hope) to sort this >>> bug out? >> >> Sorry for letting this slip for so long, but I never got around to >> write an actual test case. >> >> Bow, did we agree to use optionalcascade for now and then maybe >> refactor? I'm pretty confident the code works as-is, all the BioPython >> issues I've been running into with my production site have been in the >> GenBank/EMBL parsers. :) > > Yes, we did :). I meant to do the optionalcascade refactor so the code > is more maintainable (and to prevent a corner case bug). But in > general, the optionalcascade fix looks to be fine. And for code marked > with the BiopythonExperimentalWarning, having a fix without test cases > seems better than no fix at all. That sounds OK for now. > Peter, if you're fine with Kai's fix, I think we can mark this bug > solved and go on with the release. 
I'll add the test cases and > refactor the code later on. You mean this patch from https://redmine.open-bio.org/issues/3400 ?: https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch I can apply that if you want. Peter From redmine at redmine.open-bio.org Fri Feb 1 11:04:20 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Fri, 1 Feb 2013 16:04:20 +0000 Subject: [Biopython-dev] [Biopython - Bug #3405] (New) to_networkx converts weights as string Message-ID: Issue #3405 has been reported by Aleksey Kladov. ---------------------------------------- Bug #3405: to_networkx converts weights as string https://redmine.open-bio.org/issues/3405 Author: Aleksey Kladov Status: New Priority: Normal Assignee: Category: Target version: URL: in the file /Bio/Phylo/_utils.py in the method add_edge(graph, n1, n2) there is a line
 graph.add_edge(n1, n2, weight=str(n2.branch_length or 1.0)) 
It's strange, because if weights are strings, then you are unable to find shortest paths due to
TypeError: unsupported operand type(s) for +: 'int' and 'str'
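
A minimal sketch of the failure reported above and one possible fix, using only the standard library. The helper name edge_weight is hypothetical and is not part of Bio.Phylo. It is also an assumption that a missing branch length shows up as None; the helper falls back to the default only in that case, which differs slightly from the `or 1.0` in the reported line, where a zero branch length would also become 1.0.

```python
# Hypothetical helper, not part of Bio.Phylo: return a numeric edge weight.
def edge_weight(branch_length, default=1.0):
    """Return the branch length as a float, or default if it is None."""
    if branch_length is None:
        return default
    return float(branch_length)


# Branch lengths along some path; None stands for a missing length.
branch_lengths = [0.5, None, 0.25]

# Weights stored as strings, as in the reported line, cannot be summed.
string_weights = [str(length or 1.0) for length in branch_lengths]
try:
    sum(string_weights)
    error_message = None
except TypeError as err:
    error_message = str(err)

# Numeric weights add up the way a path-length computation expects.
total = sum(edge_weight(length) for length in branch_lengths)

print(error_message)
print(total)
```

With numeric weights, code that adds edge weights together (as a shortest-path search does) should no longer run into the TypeError shown above.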
---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From kai.blin at biotech.uni-tuebingen.de Fri Feb 1 10:40:49 2013 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 01 Feb 2013 16:40:49 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: <510BE201.4090002@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2013-02-01 16:29, Wibowo Arindrarto wrote: > There's still one bug for Bio.SearchIO that I would prefer to be > fixed (https://redmine.open-bio.org/issues/3400). Is it possible to > wait a few more days (no later than next week I hope) to sort this > bug out? Sorry for letting this slip for so long, but I never got around to write an actual test case. Bow, did we agree to use optionalcascade for now and then maybe refactor? I'm pretty confident the code works as-is, all the BioPython issues I've been running into with my production site have been in the GenBank/EMBL parsers. :) Cheers, Kai PS: I'd have replied on the bug tracker, but for some reason I can't log in again, even after resetting the password. For some reason redmine hates me. - -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRC+IBAAoJEKM5lwBiwTTPlisH/1QSF+4jIx2jKycRCys1NPMj
6YwFTdKoGmIDYjEB+qge5PKNIHplN3EsGz6l4bRYZiWbqTlyvb5IUPHgwxFRigXg
VuSnR/k8faSLNuGJpoFezLmZ0yJoLslXztCUJ+HbWXB02K9uzYXovRg8AtfHlnOu
Qd9aNbyX/nzFrsayllTvYy9ZxcQNCH5Lrgm+EWMkuBptcMdBLjqSGkov5iE2g1bV
ItHacrQUPJXVIAMTXW9mSy3AXzTqjOjqfBwXsthLSyXHEv8ppcnIi4bmVX+XS//n
4vc+LdaxzgkENaw4P+60bikkFqey/GFoxaIzLACh4HFupRAjK+6NaUzGYPSEQXM=
=efd0
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com Fri Feb 1 11:25:56 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:25:56 +0000
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
In-Reply-To:
References:
Message-ID:

On Fri, Feb 1, 2013 at 4:02 PM, Peter Cock wrote:
>
> What do people think of moving over to using GitHub issues?
> This would link in very well with pull requests and makes linking
> to commits much simpler too. One potential issue is if and how
> we could have bug reports sent to the biopython-dev mailing list
> (something we touched on recently for pull requests).
>

I've filed an issue for that (I couldn't find any open issue like it):
https://github.com/gitlabhq/gitlabhq/issues/2884

Peter

From kai.blin at biotech.uni-tuebingen.de Fri Feb 1 11:27:13 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 17:27:13 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <510BEAEF.4070108@biotech.uni-tuebingen.de>
References: <510BE201.4090002@biotech.uni-tuebingen.de> <510BEAEF.4070108@biotech.uni-tuebingen.de>
Message-ID: <510BECE1.4020306@biotech.uni-tuebingen.de>

On 2013-02-01 17:18, Kai Blin wrote:
> That's not quite it. Let me update my bug3400 branch and submit a
> pull request. Will be ready in a minute.

https://github.com/biopython/biopython/pull/150

Cheers,
Kai

--
Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From kai.blin at biotech.uni-tuebingen.de Fri Feb 1 11:18:55 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Fri, 01 Feb 2013 17:18:55 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To:
References: <510BE201.4090002@biotech.uni-tuebingen.de>
Message-ID: <510BEAEF.4070108@biotech.uni-tuebingen.de>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-02-01 17:04, Peter Cock wrote:

Hi Peter,

>> Peter, if you're fine with Kai's fix, I think we can mark this
>> bug solved and go on with the release. I'll add the test cases
>> and refactor the code later on.
>
> You mean this patch from https://redmine.open-bio.org/issues/3400
> ?:
> https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch
>
> I can apply that if you want.

That's not quite it. Let me update my bug3400 branch and submit a pull
request. Will be ready in a minute.

Cheers,
Kai

- --
Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28 Phone : ++49 7071 29-78841
D-72076 Tübingen Fax : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJRC+rvAAoJEKM5lwBiwTTPYH4H+QGiY5cyN7tFjT2RZGN28Pp8
2t/RbW9bYakVqKHtZR6xXu4QF48jCmHkkER0cMvDuKcWrjko/xAWSGuNqWK59rHe
b7t9CgGywYC9KdhPih+pG5HzKuc9ZP1c2unK/e+c+y8rrFZTUoB1e2AbGqzg163S
qplu0RGv8kSOMXmGVFNj+iZ/AJnN735Tp5gfzFHfudS13kzfqW+Mq1+DlSG1GOwM
Y99kc6Uc5WFHmHME4pDdlLBGyKVd+9LlQnTeApBjWnBDcRBMyXI0HIck6Bw64swH
BvPIz2yq3PEnhvgI0v0A9lO1xR0Yj9wGQGr8XGPLq0UHh0W0O0P1I8YbMCVHkPg=
=kCtp
-----END PGP SIGNATURE-----

From p.j.a.cock at googlemail.com Fri Feb 1 11:50:57 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 16:50:57 +0000
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To:
References:
Message-ID:

On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock wrote:
> Hello all,
>
> I think we're overdue for a Biopython release now, and I would
> like to do this next week. There are still plenty more additions
> and enhancements waiting in the wings, but right now I just
> want any remaining bug fixes addressed.
>
> Are there any release blocking issues?
>
> Thanks,
>
> Peter

I won't have time to look at it today, but the BLAST+ wrappers
need updating for the BLAST 2.2.27+ release, e.g. new arg
frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).

Any volunteers? This should be a small job...

Peter

From w.arindrarto at gmail.com Fri Feb 1 12:37:57 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:37:57 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To:
References:
Message-ID:

Hi Peter,

>> I think we're overdue for a Biopython release now, and I would
>> like to do this next week. There are still plenty more additions
>> and enhancements waiting in the wings, but right now I just
>> want any remaining bug fixes addressed.
>>
>> Are there any release blocking issues?
>>
>> Thanks,
>>
>> Peter
>
> I won't have time to look at it today, but the BLAST+ wrappers
> need updating for the BLAST 2.2.27+ release, e.g. new arg
> frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).
>
> Any volunteers? This should be a small job...

I've submitted a pull request here:
https://github.com/biopython/biopython/pull/151

From w.arindrarto at gmail.com Fri Feb 1 12:43:23 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:43:23 +0100
Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To:
References:
Message-ID:

> Hi Peter,
>
>>> I think we're overdue for a Biopython release now, and I would
>>> like to do this next week. There are still plenty more additions
>>> and enhancements waiting in the wings, but right now I just
>>> want any remaining bug fixes addressed.
>>>
>>> Are there any release blocking issues?
>>>
>>> Thanks,
>>>
>>> Peter
>>
>> I won't have time to look at it today, but the BLAST+ wrappers
>> need updating for the BLAST 2.2.27+ release, e.g. new arg
>> frame_shift_penalty (checked via test_NCBI_BLAST_tools.py).
>>
>> Any volunteers? This should be a small job...
>
> I've submitted a pull request here:
> https://github.com/biopython/biopython/pull/151

Whoops, sorry for sending an incomplete mail ~ I wanted to add that
test_NCBI_BLAST_tools.py doesn't correctly detect my BLAST installation
(even though I have it). I had to comment out the "Install BLAST+ ..."
notice and the rpsblast test (for some reason it keeps saying I don't
have rpsblast too, even though I do).
Anyway, these are not in the pull request, just something I did when
writing this fix. Could you confirm that the fixes are OK?

Hope that helps,
Bow

From w.arindrarto at gmail.com Fri Feb 1 12:48:09 2013
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 1 Feb 2013 18:48:09 +0100
Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues?
In-Reply-To:
References:
Message-ID:

>> PS: I'd have replied on the bug tracker, but for some reason I can't
>> log in again, even after resetting the password. For some reason
>> redmine hates me.
>>
>
> Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/
> (it was left as a read only legacy listing, but it broke last year when
> the old server started to die and isn't really worth fixing).
>
> This was moved over to RedMine, along with all the other OBF
> projects. This does have some git integration, but I'm not that
> taken with it - and it is yet another service for the OBF team
> to maintain.
>
> What do people think of moving over to using GitHub issues?
> This would link in very well with pull requests and makes linking
> to commits much simpler too. One potential issue is if and how
> we could have bug reports sent to the biopython-dev mailing list
> (something we touched on recently for pull requests).
>
> A full automated move could be possible (NumPy did this), but I
> think a gradual move would be fine - stop filing new issues on
> RedMine and use GitHub issues in future. There are only about
> 100 issues open at the moment anyway, and a manual migration
> would also be a good way to review some of the older tickets.
>
> Thoughts?,

Moving to GitHub sounds good to me. I'd prefer if we go over the
issues manually (removing the obsolete ones and keeping the current
ones).

As for sending the bug reports to the mailing list, could we perhaps
create our own custom hooks? e.g.
anytime a pull request is issued, an email would be sent (see
https://github.com/github/github-services and
http://developer.github.com/v3/repos/hooks/#create-a-hook)

Regards,
Bow

From arklenna at gmail.com Fri Feb 1 14:05:18 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Fri, 1 Feb 2013 14:05:18 -0500
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To:
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock wrote:
>
> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
> (which could be a big disruption with lots of code relocation)
>
> People leaning against a Bio.WWW grouping: Michiel, Peter (me)
> (which would also be the status quo, so no disruption).
>

I concede that the potential benefit of refactoring to separate WWW is
outweighed both by potential downsides and the disruption and effort
involved.

> In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences,
> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
> Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.
>

Populating the top level namespace with a submodule for each web-only
service has the risk of creating too many submodules. Bio.Seq* makes
sense, because the TAIR code pulls data into a Seq. Web services that
connect to a single biopython representation can be organized under that
submodule. Web services that return multiple types of information (e.g.
Entrez) are big enough to logically comprise their own submodule.

Is my interpretation of the biopython classification scheme more or less
correct?

Cheers,
Lenna

From p.j.a.cock at googlemail.com Fri Feb 1 16:00:10 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 21:00:10 +0000
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To: References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson wrote: > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock > wrote: >> >> >> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin >> (which could be a big disruption with lots of code relocation) >> >> People leaning against a Bio.WWW grouping: Michiel, Peter (me) >> (which would also be the status quo, so no disruption). >> > > I concede that the potential benefit of refactoring to separate WWW is > outweighed both by potential downsides and the disruption and effort > involved. > >> In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences, >> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under >> Bio.Seq* also seems sensible to me, as I wrote at the start of this >> thread. >> > > Populating the top level namespace with a submodule for each web-only > service has the risk of creating too many submodules. Bio.Seq* makes sense, > because the TAIR code pulls data into a Seq. Web services that connect to a > single biopython representation can be organized under that submodule. Web > services that return multiple types of information (e.g. Entrez) are big > enough to logically comprise their own submodule. > > Is my interpretation of the biopython classification scheme more or less > correct? 
Yes that sounds about right :)

Of course, the historical muddle of Bio.Seq* is something we've talked
about addressing recently - see this thread from October,
http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html

Peter

From natemsutton at yahoo.com Fri Feb 1 16:54:42 2013
From: natemsutton at yahoo.com (Nate Sutton)
Date: Fri, 1 Feb 2013 13:54:42 -0800 (PST)
Subject: [Biopython-dev] New BioPython member
In-Reply-To:
References: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com>
Message-ID: <1359755682.16563.YahooMailNeo@web122605.mail.ne1.yahoo.com>

Thanks for the welcome! Also, I looked briefly through the code with the
files you wrote about and I see the command line app wrapping components
you described. I appreciate the advice about how to do the wrapper and I
am glad to know of that pattern of command line app wrapping that is
consistent with code in other places of BioPython. Thanks for the other
advice, including possibly asking for guidance. I'll just give it a shot
and hopefully things go smoothly, but with it being my first BioPython
coding I appreciate the support.

Thanks,
Nate

________________________________
From: Peter Cock
To: Nate Sutton
Cc: "biopython-dev at lists.open-bio.org"
Sent: Wednesday, January 30, 2013 2:31 AM
Subject: Re: [Biopython-dev] New BioPython member

On Tue, Jan 29, 2013 at 9:22 PM, Nate Sutton wrote:
> Dear all,
>
> I just recently joined the BioPython developers group and am
> looking forward to contributing to BioPython! I have worked for a while
> in programming, genetics, and biology and have
> a m.s. in Biomedical Informatics. After
> talking with some fellow contributors I have decided to try working on
> https://redmine.open-bio.org/issues/3360 but I will also work on writing
> some documentation on examples from the
> cookbook, especially if I am stuck on the bug.
> If anyone wants to work on
> the same things, I'd be glad to hear that, I
> may be slow on the work because I am still learning Python after coming
> from
> other languages.
>
> -Nate

Hi Nate, and welcome.

Eric is in charge of the Bio.Phylo module, but within that the command
line application wrappers under Bio.Phylo.Applications follow a pattern
used elsewhere in Biopython.

To add a wrapper for fasttree http://www.microbesonline.org/fasttree/
have a look at the existing wrappers for PHYML and RAXML, defined in
Bio/Phylo/Applications/_Phyml.py and Bio/Phylo/Applications/_Raxml.py
(leading underscores mean private modules in Python), which are exposed
to the user via Bio/Phylo/Applications/__init__.py

In this case, I'd suggest putting the new wrapper in a new file,
Bio/Phylo/Applications/_fastree.py

Other similar wrappers exist under Bio.Emboss, Bio.Align, etc.

Don't be shy about asking for guidance on this, or git and github.
Ultimately what I'm hoping you'll be able to do is take a fork
(a personal copy of the repository) on GitHub, create a new fasttree
branch, commit your enhancements, and make a pull request. If that's
all too much for now, simply writing the new file and letting us do
the git side would be fine.

Regards,

Peter

From k.d.murray.91 at gmail.com Fri Feb 1 18:59:57 2013
From: k.d.murray.91 at gmail.com (Kevin Murray)
Date: Sat, 2 Feb 2013 10:59:57 +1100
Subject: [Biopython-dev] Namespace for online resources?
In-Reply-To:
References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

Hi All,

How about this:
In the vein of Lenna's last email, we create a module WebSeq (or Seq.Web,
or whatever), containing modules whose sole purpose is to get sequences
(Seq/SeqRecord objects) from an internet database. This would, I think,
provide a good balance between a messy top-level domain full of modules
like Bio.tair, and the absolutism of having anything vaguely web related
in a single WWW module.
It should also provide the unified theme per module
which Michiel talks of, and unit/doctests should be fine, as no modules
will be split (simply moved in their entirety from Bio.x to Bio.WebSeq.x).

From a quick look, the only candidate (apart from TAIR) for a shift is
TogoWS, and even then I'm not sure, as TogoWS isn't used just for Seq's
(and does not return them).

Regards
Kevin Murray

On 2 February 2013 08:00, Peter Cock wrote:

> On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson wrote:
> > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock
> > wrote:
> >>
> >>
> >> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin
> >> (which could be a big disruption with lots of code relocation)
> >>
> >> People leaning against a Bio.WWW grouping: Michiel, Peter (me)
> >> (which would also be the status quo, so no disruption).
> >>
> >
> > I concede that the potential benefit of refactoring to separate WWW is
> > outweighed both by potential downsides and the disruption and effort
> > involved.
> >
> >> In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences,
> >> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under
> >> Bio.Seq* also seems sensible to me, as I wrote at the start of this
> >> thread.
> >>
> >
> > Populating the top level namespace with a submodule for each web-only
> > service has the risk of creating too many submodules. Bio.Seq* makes sense,
> > because the TAIR code pulls data into a Seq. Web services that connect to a
> > single biopython representation can be organized under that submodule. Web
> > services that return multiple types of information (e.g. Entrez) are big
> > enough to logically comprise their own submodule.
> >
> > Is my interpretation of the biopython classification scheme more or less
> > correct?
> > Yes that sounds about right :) > > Of course, the historical muddle of Bio.Seq* is something we've talked > about addressing recently - see this thread from October, > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mjldehoon at yahoo.com Fri Feb 1 20:36:03 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 1 Feb 2013 17:36:03 -0800 (PST) Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: Message-ID: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com> In principle I am OK with this, but is TAIR only used for sequences? Or is it possible / likely that in the future we may want to add other functionality to TAIR? Anyway, if TAIR is predominantly used for sequences, then Bio.Seq.Web is a good option I think. Best, -Michiel. --- On Fri, 2/1/13, Kevin Murray wrote: > From: Kevin Murray > Subject: Re: [Biopython-dev] Namespace for online resources? > To: "Peter Cock" > Cc: "Biopython-Dev Mailing List" > Date: Friday, February 1, 2013, 6:59 PM > Hi All, > > How about this: > In the vein of Lenna's last email, we create a module WebSeq > (or Seq.Web, > or whatever), containing modules whose sole purpose is to > get sequences > (Seq/SeqRecord objects) from an internet database. This > would i think > provide a good balance between a messy top-level domain full > of modules > like Bio.tair, and the absolutisim of having anything vaugly > web related in > a single WWW module. It should also provide the unified > theme per module > which Michiel talks of, and unit/doctests should be fine, as > no modules > will be split (simply moved in their entirety from Bio.x to > Bio.WebSeq.x). 
> > >From a quick look, the only candiate (apart from TAIR) > for a shift is > TogoWS, and even then I'm not sure, as TogoWS isn't used > just for Seq's > (and does not return them). > > Regards > Kevin Murray > > > On 2 February 2013 08:00, Peter Cock > wrote: > > > On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson > wrote: > > > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock > > > wrote: > > >> > > >> > > >> People leaning for a Bio.WWW grouping: Bow, > Lenna, Kevin > > >> (which could be a big disruption with lots of > code relocation) > > >> > > >> People leaning against a Bio.WWW grouping: > Michiel, Peter (me) > > >> (which would also be the status quo, so no > disruption). > > >> > > > > > > I concede that the potential benefit of > refactoring to separate WWW is > > > outweighed both by potential downsides and the > disruption and effort > > > involved. > > > > > >> In the specific case of Kevin's TAIR code for > fetch Arabidopsis > > sequences, > > >> Bio.TAIR (lower case?) is consistent with > current usage. Somewhere under > > >> Bio.Seq* also seems sensible to me, as I wrote > at the start of this > > >> thread. > > >> > > > > > > Populating the top level namespace with a > submodule for each web-only > > > service has the risk of creating too many > submodules. Bio.Seq* makes > > sense, > > > because the TAIR code pulls data into a Seq. Web > services that connect > > to a > > > single biopython representation can be organized > under that submodule. > > Web > > > services that return multiple types of information > (e.g. Entrez) are big > > > enough to logically comprise their own submodule. > > > > > > Is my interpretation of the biopython > classification scheme more or less > > > correct? 
> > > > Yes that sounds about right :) > > > > Of course, the historical muddle of Bio.Seq* is > something we've talked > > about addressing recently - see this thread from > October, > > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > > > > Peter > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From k.d.murray.91 at gmail.com Sat Feb 2 01:00:34 2013 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Sat, 2 Feb 2013 17:00:34 +1100 Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: Michiel, TAIR (http://www.arabidopsis.org/) is primarily a sequence repository. I have no intention to extend it beyond that, and any other features would not be easily scriptable, or would be pointless to include in Biopython. Regards Kevin Murray On 2 February 2013 12:36, Michiel de Hoon wrote: > In principle I am OK with this, but is TAIR only used for sequences? Or is > it possible / likely that in the future we may want to add other > functionality to TAIR? Anyway, if TAIR is predominantly used for sequences, > then Bio.Seq.Web is a good option I think. > > Best, > -Michiel. > > --- On Fri, 2/1/13, Kevin Murray wrote: > > > From: Kevin Murray > > Subject: Re: [Biopython-dev] Namespace for online resources? 
> > To: "Peter Cock" > > Cc: "Biopython-Dev Mailing List" > > Date: Friday, February 1, 2013, 6:59 PM > > Hi All, > > > > How about this: > > In the vein of Lenna's last email, we create a module WebSeq > > (or Seq.Web, > > or whatever), containing modules whose sole purpose is to > > get sequences > > (Seq/SeqRecord objects) from an internet database. This > > would i think > > provide a good balance between a messy top-level domain full > > of modules > > like Bio.tair, and the absolutisim of having anything vaugly > > web related in > > a single WWW module. It should also provide the unified > > theme per module > > which Michiel talks of, and unit/doctests should be fine, as > > no modules > > will be split (simply moved in their entirety from Bio.x to > > Bio.WebSeq.x). > > > > >From a quick look, the only candiate (apart from TAIR) > > for a shift is > > TogoWS, and even then I'm not sure, as TogoWS isn't used > > just for Seq's > > (and does not return them). > > > > Regards > > Kevin Murray > > > > > > On 2 February 2013 08:00, Peter Cock > > wrote: > > > > > On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson > > wrote: > > > > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock < > p.j.a.cock at googlemail.com> > > > > wrote: > > > >> > > > >> > > > >> People leaning for a Bio.WWW grouping: Bow, > > Lenna, Kevin > > > >> (which could be a big disruption with lots of > > code relocation) > > > >> > > > >> People leaning against a Bio.WWW grouping: > > Michiel, Peter (me) > > > >> (which would also be the status quo, so no > > disruption). > > > >> > > > > > > > > I concede that the potential benefit of > > refactoring to separate WWW is > > > > outweighed both by potential downsides and the > > disruption and effort > > > > involved. > > > > > > > >> In the specific case of Kevin's TAIR code for > > fetch Arabidopsis > > > sequences, > > > >> Bio.TAIR (lower case?) is consistent with > > current usage. 
Somewhere under > > > >> Bio.Seq* also seems sensible to me, as I wrote > > at the start of this > > > >> thread. > > > >> > > > > > > > > Populating the top level namespace with a > > submodule for each web-only > > > > service has the risk of creating too many > > submodules. Bio.Seq* makes > > > sense, > > > > because the TAIR code pulls data into a Seq. Web > > services that connect > > > to a > > > > single biopython representation can be organized > > under that submodule. > > > Web > > > > services that return multiple types of information > > (e.g. Entrez) are big > > > > enough to logically comprise their own submodule. > > > > > > > > Is my interpretation of the biopython > > classification scheme more or less > > > > correct? > > > > > > Yes that sounds about right :) > > > > > > Of course, the historical muddle of Bio.Seq* is > > something we've talked > > > about addressing recently - see this thread from > > October, > > > > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > > > > > > Peter > > > _______________________________________________ > > > Biopython-dev mailing list > > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > From eric.talevich at gmail.com Sat Feb 2 17:29:57 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 2 Feb 2013 17:29:57 -0500 Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock wrote: > On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon > wrote: > > Hi Lenna, > > > >> Regarding point (2), is your primary concern namespace clutter or > >> importing efficiency? 
> > > > Regarding point (2), my primary concern is that a Bio.WWW module would > > group together modules that don't have much in common with each other. I > > agree with your point that the category of internet access is more > fundamental > > than the category of parsers. But still, which modules should then go > into a > > Bio.WWW module? Any module whose sole purpose is to use the internet > (that > > would exclude Bio.Entrez)? Any module whose main purpose is to use the > > internet? This would be unclear; for example, Bio.Entrez may or may not > fall > > in that category, depending on how you use the module. Any module whose > > functionality includes internet access? Then if one day we add access to > the > > JASPAR database over the internet to Bio.Motif, it would have to move to > > Bio.WWW. > > > > Currently most modules are organized by theme (Bio.Seq, Bio.Motif, > > Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one > > module, one chapter in the documentation, one set of unit tests, one > set of > > doctests, which I think is a huge advantage both in terms of clarity and > in > > terms of user experience. > > Also with the theme approach, most (if not all) of the themes are likely to > have some online resources (databases or remote APIs). On those > grounds it makes sense to keep online motif functionality (like weblogo) > under Bio.Motif, and so on. > I agree. From an engineering perspective, it's usually best to organize code around data types. (To be clear: think classes and structures, not ints and strings.) The SeqIO, AlignIO, SearchIO, Phylo, Motif, PDB, etc. modules each have a core data type that serves as the "theme" for the sub-package. Within the sub-package we can have modules for different file formats, data transformations/manipulations, web servers, and command-line program wrappers, and keep all the interdependencies within the same small region of the code base.
Since most users will not read the documentation in its entirety (if at all), this also makes it easier to look up how to do things with the data type in question. The core data type for a WWW module would be a network handle, I suppose -- but that's already part of the Python standard library. I've suggested before that we can justify the current placement of sequence-related modules at the top level, rather than under a new "Seq" sub-package, by considering sequences to be the default/implicit data type. As we've covered, many online resources can serve up several different data types, although sequences are probably the most common. In terms of namespace clutter, perhaps I've gotten too used to R, but I don't think we've reached the point where the number of modules and functions visible from the top level harms the user experience. In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences, > Bio.TAIR (lower case?) is consistent with current usage. Somewhere under > Bio.Seq* also seems sensible to me, as I wrote at the start of this thread. > Bio.TAIR or Bio.Seq.TAIR or perhaps Bio.Seq.WWW.TAIR seem sensible to me, too. No preference on casing. -Eric From p.j.a.cock at googlemail.com Mon Feb 4 07:01:49 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 12:01:49 +0000 Subject: [Biopython-dev] Deprecations for Biopython 1.61 release; Was: Bio.Motif update Message-ID: On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon wrote: > --- On Fri, 2/1/13, Peter Cock wrote: >> I went over the list in the DEPRECATED file last month, but >> a second check would be a good idea. 
> > The following were declared obsolete in Biopython 1.60, and can > in principle be declared deprecated in Biopython 1.61: > > ---------- > Bio/Blast/Applications.py: > BlastallCommandline > BlastpgpCommandline > RpsBlastCommandline > > Bio/Blast/NCBIStandalone.py overall, and specifically: > blastall > blastpgp > rpsblast > > Bio/ParserSupport.py overall > > Bio/PDB/AbstractPropertyMap.py: > The has_key function in class AbstractPropertyMap > > Bio/PDB/FragmentMapper.py: > The has_key function in class FragmentMapper > > Bio/UniGene/UniGene.py overall > > In BioSQL/BioSeqDatabase.py: > class DBServer: > remove_database > class BioSeqDatabase: > get_all_primary_ids > get_Seq_by_primary_id > > ----------- > > These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61: > > Bio/Align/__init__.py: > class MultipleSeqAlignment: > get_column > add_sequence > > Bio/Align/Generic.py: > class Alignment overall > get_all_seqs > get_seq_by_num > > Bio/File.py: > class StringHandle > > Bio/Graphics/GenomeDiagram/_AbstractDrawer.py: > class AbstractDrawer: > _set_xcentre, _set_ycentre > > Bio/Graphics/GenomeDiagram/_Graph.py: > class GraphData: > _set_centre > > Bio/ParserSupport.py: > SGMLStrippingConsumer > > Bio/Seq.py: > class Seq: > .data property > > Bio/SeqIO/SffIO.py: > _sff_read_roche_index_xml > > -------------------- > > The tostring() method of the class Seq in Bio/Seq.py: > Can we declare this obsolete? 
> > -Michiel Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done: https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1 Bio/File.py and Bio/ParserSupport.py bits done: https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a GenomeDiagram centre setters done: https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288 Peter From ben at bendmorris.com Mon Feb 4 10:17:36 2013 From: ben at bendmorris.com (Ben Morris) Date: Mon, 4 Feb 2013 10:17:36 -0500 Subject: [Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo In-Reply-To: References: Message-ID: On Fri, Jan 18, 2013 at 8:20 PM, Eric Talevich wrote: > On Fri, Dec 28, 2012 at 10:50 AM, Ben Morris wrote: >> >> On Tue, Dec 25, 2012 at 2:18 AM, Eric Talevich >> wrote: >> > >> > On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris wrote: >> >> >> >> Hi all, >> >> >> >> I've implemented support for two new phylogenetic tree formats: NeXML >> >> and >> >> RDF (conforming to the Comparative Data Analysis Ontology). >> >> >> >> I noticed that NeXML support was planned, but I didn't see anyone >> >> working >> >> on it on GitHub and the feature request hadn't been updated in about a >> >> year, so I went ahead and implemented a simple version. At first I >> >> tried >> >> the generateDS.py approach, but the generated writer doesn't give very >> >> much >> >> control over the output, so I ended up writing my own parser/writer >> >> using >> >> ElementTree. >> >> >> >> As for the RDF/CDAO format, AFAIK this is not a format that's supported >> >> by >> >> any other phylogenetic libraries, so I'm not sure how useful this is to >> >> everyone else. It provides a simple, standards-compliant format that >> >> can be >> >> imported to a triple store and supports annotation. We'll be using it >> >> at >> >> NESCent so I wanted to make it available to everyone else as well. The >> >> parser and writer require the Redlands Python bindings. 
>> >> >> >> The code is available in my fork of Biopython, >> >> >> >> https://github.com/bendmorris/biopython >> >> >> >> under branches "cdao" and "nexml." I'd love to get everyone's thoughts >> >> and >> >> see if these contributions would be a good fit for the Biopython >> >> project. >> > >> > >> > >> > Thanks for letting us know! I'll try it out soonish. Looking at the code >> > on your nexml branch, I have a few comments: >> > >> > - The parser uses ElementTree.parse rather than iterparse, so in its >> > current state it would not be able to parse massive files (those larger than >> > available RAM). Worth fixing eventually? >> >> Great point. I rewrote it to use iterparse instead. >> >> > - The parser creates Newick.Tree and Newick.Clade objects, which is >> > nearly correct in my opinion. I would suggest subclassing BaseTree.Tree and >> > BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you >> > don't have any additional attributes to attach to those classes at the >> > moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and >> > PhyloXMLIO.py.) >> >> Went ahead and did this as well. > > > Thanks! Sorry for the pace of this, I'm in the midst of a dissertation. > > >> > - The 'confidence' or 'confidences' attribute isn't used (for e.g. >> > bootstrap support values). Does NeXML define it? >> >> Not that I'm aware of, but I'm not sure. I searched >> http://nexml.org/nexml/html/doc/schema-1/ and didn't find anything. >> I'm going to ask some people who know more about this than I do. > > > I would like for Bio.Phylo's I/O modules to be able to successfully > round-trip a file from Newick to phyloXML to NeXML and back to Newick > without losing support values. 
I found these two examples of how to add this > data to a NeXML document by referencing CDAO: > https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_using_the_.22meta.22_tag > https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_without_new_tags_or_elements > > That's the standard way to store bootstrap supports in NeXML (Hilmar > confirms). How do your NeXML and CDAO modules interact, if at all? Would the > CDAO modules be useful to properly support NeXML metadata like > support/confidence values, or would it be simpler to just hard-code the few > tags we're specifically interested in? > > Relatedly, those look like good test files. I see you've started writing > NeXML unit tests already; if you would like help with any of this, just let > me know. > > -Eric No worries! I just returned from a NESCent-sponsored hackathon where we used BioPython as part of a Virtuoso-backed RDF treestore (https://github.com/phylotastic/rdf-treestore). Now that I'm back, I'll work on the bootstrap support values and annotations for NeXML as I have time. I think it's probably much easier to just hard-code specific tags for now. The CDAO module can convert the more readable CDAO prefix names to OBO numeric identifiers (e.g. cdao:has_Root -> obo:CDAO_0000148) but other than that I don't see a good way for them to interact. I gave a short demo of Bio.Phylo at the hackathon, and people were very impressed. We had some issues with Newick and Nexus parsing as well, so I'll open issues on the bug tracker. ~Ben From redmine at redmine.open-bio.org Mon Feb 4 10:20:38 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 4 Feb 2013 15:20:38 +0000 Subject: [Biopython-dev] [Biopython - Bug #3407] (New) Handling of bootstrap support values in Bio.Phylo Newick parser Message-ID: Issue #3407 has been reported by Ben Morris. 
---------------------------------------- Bug #3407: Handling of bootstrap support values in Bio.Phylo Newick parser https://redmine.open-bio.org/issues/3407 Author: Ben Morris Status: New Priority: Normal Assignee: Category: Target version: URL: This was reported to me by Arlin Stoltzfus (quote): "There is a description of Newick here: http://evolution.genetics.washington.edu/phylip/newicktree.html and a BNF here: http://evolution.genetics.washington.edu/phylip/newick_doc.html Note that this allows square-bracketed comments. Bootstrap values commonly are represented in 2 ways, one of which is wrong. The wrong way to represent bootstrap values is to present them as internal node labels. Labels for internal nodes are given as follows: ((( human: 0.1, chimp:0.1 ) primates: 0.2, (rat:0.1, mouse:0.1) rodents:0.2), cat:0.3 ) where "primates" and "rodents" are internal node labels. They go between the right paren and the (optional) colon and distance. If you put numbers in the label position, a graphic renderer may place them on the nodes, which is why some people represent bootstrap values this way. However, the preferred way to represent bootstrap values is to make them syntactic comments (enclosed in square brackets) placed after all other node information, i.e., after the optional colon & branch length. Both examples are shown here: ((raccoon:19.19959,bear:6.80041)50:0.84600,((sea_lion:11.99700, seal:12.00300)100:7.52973,((monkey:100.85930,cat:47.14069)80:20.59201, weasel:18.87953)75:2.09460)50:3.87382,dog:25.46154); or ((raccoon:19.19959,bear:6.80041):0.84600[50],((sea_lion:11.99700, seal:12.00300):7.52973[100],((monkey:100.85930,cat:47.14069):20.59201[80], weasel:18.87953):2.09460[75]):3.87382[50],dog:25.46154); I recommend that you only support the second version, and treat the first version as a case of internal node labels. Arlin ------- Arlin Stoltzfus (arlin at umd.edu) Fellow, IBBR; Adj. Assoc. 
Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850 tel: 240 314 6208; web: www.molevol.org" ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Mon Feb 4 10:26:31 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 4 Feb 2013 15:26:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3408] (New) Parsing of Nexus files generated by TreeBase fails (Bio.Phylo) Message-ID: Issue #3408 has been reported by Ben Morris. ---------------------------------------- Bug #3408: Parsing of Nexus files generated by TreeBase fails (Bio.Phylo) https://redmine.open-bio.org/issues/3408 Author: Ben Morris Status: New Priority: Normal Assignee: Category: Target version: URL: Steps to reproduce: Pick a tree on TreeBase (e.g. http://treebase.org/treebase-web/search/study/trees.html?id=12003 or http://treebase.org/treebase-web/search/study/trees.html?id=1029) and click on "download reconstructed NEXUS file." Attempt to parse the file using Bio.Phylo.read. Exception:
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 62, in read
    tree = tree_gen.next()
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 50, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NexusIO.py", line 39, in parse
    nex = Nexus.Nexus(handle)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 572, in __init__
    self.read(input)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 623, in read
    self._parse_nexus_block(title, contents)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 664, in _parse_nexus_block
    getattr(self,'_'+line.command)(line.options)
AttributeError: 'Nexus' object has no attribute '_link'
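The last frame of the traceback shows a name-based dispatch: the parser looks up a method named "_" plus the block command and calls it, so a command with no matching method (here "link") raises AttributeError. A minimal sketch of that pattern, with a tolerant variant that records unknown commands instead of failing, might look like the following. The classes StrictBlockParser and TolerantBlockParser and the sample option strings are hypothetical, made up for illustration; this is not the Bio.Nexus code and not a proposed patch.

```python
class StrictBlockParser:
    """Toy parser that dispatches each command to a '_<command>' method."""

    def __init__(self):
        self.titles = []

    def _title(self, options):
        # The only command this toy parser knows about.
        self.titles.append(options)

    def handle(self, command, options):
        # Same shape as the failing line in the traceback: an unknown
        # command has no '_<command>' attribute, so getattr raises.
        getattr(self, "_" + command)(options)


class TolerantBlockParser(StrictBlockParser):
    """Variant that remembers unknown commands instead of raising."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle(self, command, options):
        method = getattr(self, "_" + command, None)
        if method is None:
            # Keep the unrecognised command so the caller can inspect it.
            self.unknown.append((command, options))
            return
        method(options)


# Sample commands; the option strings are invented for this sketch.
strict = StrictBlockParser()
strict.handle("title", "Taxa1")
try:
    strict.handle("link", "taxa = Taxa1")
    strict_error = None
except AttributeError as err:
    strict_error = str(err)

tolerant = TolerantBlockParser()
tolerant.handle("title", "Taxa1")
tolerant.handle("link", "taxa = Taxa1")

print(strict_error)
print(tolerant.unknown)
```

Whether skipping an unknown command is acceptable depends on what the command means for the file; the sketch only shows where the error originates.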
DendroPy is able to parse the same files. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Feb 4 11:49:07 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 16:49:07 +0000 Subject: [Biopython-dev] Deprecations for Biopython 1.61 release; Was: Bio.Motif update In-Reply-To: References: Message-ID: On Mon, Feb 4, 2013 at 12:01 PM, Peter Cock wrote: > On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon wrote: >> --- On Fri, 2/1/13, Peter Cock wrote: >>> I went over the list in the DEPRECATED file last month, but >>> a second check would be a good idea. >> >> The following were declared obsolete in Biopython 1.60, and can >> in principle be declared deprecated in Biopython 1.61: >> >> ---------- >> Bio/Blast/Applications.py: >> BlastallCommandline >> BlastpgpCommandline >> RpsBlastCommandline >> >> Bio/Blast/NCBIStandalone.py overall, and specifically: >> blastall >> blastpgp >> rpsblast >> >> Bio/ParserSupport.py overall >> >> Bio/PDB/AbstractPropertyMap.py: >> The has_key function in class AbstractPropertyMap >> >> Bio/PDB/FragmentMapper.py: >> The has_key function in class FragmentMapper >> >> Bio/UniGene/UniGene.py overall >> >> In BioSQL/BioSeqDatabase.py: >> class DBServer: >> remove_database >> class BioSeqDatabase: >> get_all_primary_ids >> get_Seq_by_primary_id >> >> ----------- >> >> These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61: >> >> Bio/Align/__init__.py: >> class MultipleSeqAlignment: >> get_column >> add_sequence >> >> Bio/Align/Generic.py: >> class Alignment overall >> get_all_seqs >> get_seq_by_num >> >> Bio/File.py: >> class StringHandle >> >> 
Bio/Graphics/GenomeDiagram/_AbstractDrawer.py: >> class AbstractDrawer: >> _set_xcentre, _set_ycentre >> >> Bio/Graphics/GenomeDiagram/_Graph.py: >> class GraphData: >> _set_centre >> >> Bio/ParserSupport.py: >> SGMLStrippingConsumer >> >> Bio/Seq.py: >> class Seq: >> .data property >> >> Bio/SeqIO/SffIO.py: >> _sff_read_roche_index_xml >> >> -------------------- >> >> The tostring() method of the class Seq in Bio/Seq.py: >> Can we declare this obsolete? >> >> -Michiel > > Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done: > https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1 > > Bio/File.py and Bio/ParserSupport.py bits done: > https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a > > GenomeDiagram centre setters done: > https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288 Michiel already did most of the others, https://github.com/biopython/biopython/commit/1b2025bee868b0282b913690a999833d13598ea4 I've just removed the Seq object's deprecated data property: https://github.com/biopython/biopython/commit/e3cf12a1bf28c1cd52e4b5492fb1cd76731b486b For the Seq object's tostring() method, let's review Bow's pull request after this release? https://github.com/biopython/biopython/pull/137 Regards, Peter From p.j.a.cock at googlemail.com Mon Feb 4 12:26:44 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 17:26:44 +0000 Subject: [Biopython-dev] Bio.Motif update In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com> References: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon wrote: > Hi Peter and all, > > --- On Tue, 1/29/13, Peter Cock wrote: >> We need to say something about this in the NEWS file too. > > Done. > >> I think it would make sense to add a PendingDeprecationWarning >> to Bio.Motif now. > > Done. 
> >> Also, if you feel the new Bio.motifs API isn't quite >> settled yet, adding the new BiopythonExperimentalWarning to >> that makes sense. > > I don't expect big changes in the API, so I think we can do without the > BiopythonExperimentalWarning. Also we should avoid the situation > that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a > BiopythonExperimentalWarning. > >> (And once this is settled, I think we can schedule the >> release) Hi Michiel, Rather than having two (very similar) chapters in the Tutorial for the old Bio.Motif and new Bio.motifs modules, I've downgraded the old chapter to just a section of the new chapter: https://github.com/biopython/biopython/commit/ee5cccf6bc661befc924cb7fc2a422c07f3eeee1 There is still a lot of redundant content - would you be able to shorten this? Or can we just cut it and refer anyone interested to the tutorial shipped with Biopython 1.60 instead? I think a summary of the differences would be more useful, to help people convert from the old module to the new motifs module. Also, what is the point of the Bio.motifs.create function? Is there a reason not to initialise a Motif object directly? Thanks, Peter From p.j.a.cock at googlemail.com Mon Feb 4 12:57:42 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 17:57:42 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock wrote: > Hello all, > > I think we're overdue for a Biopython release now, and I would > like to do this next week. There are still plenty more additions > and enhancements waiting in the wings, but right now I just > want any remaining bug fixes addressed. > > Are there any release blocking issues?
> > Thanks, > > Peter Hi all, I've posted the current tutorial as HTML and PDF online [*], http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf It would be great to have you all re-read chapters you've contributed to or are familiar with - and fix or report any more typos etc. Note that some of the embedded examples in the LaTeX source are now tested via doctest using test_Tutorial.py, so if you do make some local edits run that before you commit them. Thanks, Peter [*] Those URLs used to be updated nightly, something I've not yet restored since the website was moved from the old OBF hardware to an Amazon cloud server. The simplest option here would be to install latex on the server... From redmine at redmine.open-bio.org Mon Feb 4 13:14:19 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 4 Feb 2013 18:14:19 +0000 Subject: [Biopython-dev] [Biopython - Bug #3409] (New) Newick parser fails to parse Greengenes tree (Bio.Phylo) Message-ID: Issue #3409 has been reported by Ben Morris. ---------------------------------------- Bug #3409: Newick parser fails to parse Greengenes tree (Bio.Phylo) https://redmine.open-bio.org/issues/3409 Author: Ben Morris Status: New Priority: Normal Assignee: Category: Target version: URL: The file is available here: http://www.evoio.org/wg/evoio/images/f/f9/Greengenes2011.txt (9.2 MB) The problem may be related to the use of single-quoted node labels which sometimes contain parentheses, e.g.
'p__Fusobacteria; c__Fusobacteria (class); o__Fusobacteriales; f__Fusobacteriaceae':0.11021
Exception:
  ...
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 87, in _parse_subtree
    raise NewickError("Parentheses do not match in (sub)tree: " + text)
Bio.Phylo.NewickIO.NewickError: Parentheses do not match in (sub)tree: 139839:0.04507):0.02429
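One plausible way a quoted label containing parentheses could confuse a parser: if the closing parenthesis of a clade is located by searching for the last ")" in the text, a ")" inside a quoted label is found instead. The sketch below contrasts that naive search with one that skips single-quoted labels. It is an assumption about the failure mode, not the NewickIO code; last_unquoted_close is a hypothetical helper, the leaf names A and B are invented, and the label is a shortened form of the one quoted in the report. A doubled '' inside a quoted label is treated as an escaped quote, which is the usual Newick convention.

```python
def last_unquoted_close(text):
    """Return the index of the last ')' outside single-quoted labels, or -1."""
    last = -1
    in_quote = False
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "'":
            if in_quote and text[i + 1:i + 2] == "'":
                # A doubled quote inside a label is an escaped quote: skip both.
                i += 2
                continue
            in_quote = not in_quote
        elif ch == ")" and not in_quote:
            last = i
        i += 1
    return last


# Invented two-leaf clade carrying a shortened version of the reported label.
subtree = "(A:0.1,B:0.2)'c__Fusobacteria (class)':0.11021"

naive = subtree.rfind(")")            # lands inside the quoted label
aware = last_unquoted_close(subtree)  # lands on the clade's own ')'

print(subtree[naive + 1:])  # → ':0.11021   (label cut in half)
print(subtree[aware + 1:])  # → 'c__Fusobacteria (class)':0.11021
```

The same quote-skipping loop would apply to splitting child clades on commas or trees on semicolons, since the reported label contains semicolons as well.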
Other Newick parsers (ete and dendropy) are able to parse this file. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From mjldehoon at yahoo.com Mon Feb 4 23:01:26 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 4 Feb 2013 20:01:26 -0800 (PST) Subject: [Biopython-dev] Bio.Motif update In-Reply-To: Message-ID: <1360036886.33220.YahooMailClassic@web164006.mail.gq1.yahoo.com> Hi Peter, --- On Mon, 2/4/13, Peter Cock wrote: > Rather than having two (very similar) chapters in the > Tutorial for the old Bio.Motif and new Bio.motifs modules, > I've downgraded the old chapter to just a section of > the new chapter: ... > There is still a lot of redundant content - would you be > able to shorten this? I think it's OK if it is redundant. Anyway the chapter on the older Bio.Motif will be removed a few releases later. > I think a summary of the differences would be more useful, > to help people convert from the old module to the new > motifs module. Maybe, but for me it doesn't have a high priority. It's easier to understand the new chapter on Bio.motifs. > Also, what is the point of the Bio.motifs.create function? > Is there a reason not to initialise a Motif object directly? There are two ways to initialize a Motif: either by specifying the alignment from which the motif is created, or directly from a position-weight matrix. This can be a bit confusing. To separate the two, the Bio.motifs.create function only initializes a Motif from an alignment; some of the motif parsers initialize a Motif from a position-weight matrix. Best, -Michiel.
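The split described above, one entry point for alignments and another for matrices, can be sketched in plain Python. ToyMotif, create and from_counts below are hypothetical names for illustration and do not reproduce the Bio.motifs implementation; the sketch also uses a raw count matrix where the message speaks of a position-weight matrix.

```python
class ToyMotif:
    """Minimal motif holding per-letter counts for each column."""

    def __init__(self, alphabet, counts):
        self.alphabet = alphabet
        self.counts = counts  # dict: letter -> list of counts per position
        self.length = len(next(iter(counts.values())))

    @property
    def consensus(self):
        # Most frequent letter per column; ties go to the earlier
        # letter in the alphabet.
        return "".join(
            max(self.alphabet, key=lambda letter: self.counts[letter][i])
            for i in range(self.length)
        )


def create(instances, alphabet="ACGT"):
    """Build a motif from aligned, equal-length sequences."""
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all instances must have the same length")
    counts = {letter: [0] * length for letter in alphabet}
    for seq in instances:
        for i, letter in enumerate(seq):
            counts[letter][i] += 1
    return ToyMotif(alphabet, counts)


def from_counts(counts, alphabet="ACGT"):
    """Build a motif directly from a count matrix, as a file parser might."""
    return ToyMotif(alphabet, {letter: list(counts[letter]) for letter in alphabet})


from_alignment = create(["TACAA", "TACGC", "TACAC"])
from_matrix = from_counts(from_alignment.counts)

print(from_alignment.consensus)  # → TACAC
print(from_matrix.consensus)     # → TACAC
```

Keeping the two routes as separate functions means a caller never has to guess which kind of argument a single constructor expects.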
From p.j.a.cock at googlemail.com Tue Feb 5 07:32:47 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:32:47 +0000
Subject: [Biopython-dev] KEGG enhancements
Message-ID:

Hi all,

We have a couple of new pull requests for KEGG enhancements, which we can look at after the imminent Biopython 1.61 release goes out this week.

Kevin's working on the REST API,
https://github.com/biopython/biopython/pull/152

Leighton's working on KGML and graphics,
https://github.com/biopython/biopython/pull/153

There is a tiny bit of online access code in Leighton's code which can probably be changed to use Kevin's work - I've not had time to examine the overlap yet.

Peter

---------- Forwarded message ----------
From: kevin
Date: Mon, Feb 4, 2013 at 8:03 PM
Subject: [biopython] Add KEGG API Querying Support (#152)
To: biopython/biopython

This adds support to query KEGG's REST API (http://www.kegg.jp/kegg/docs/keggapi.html) along with simple tests which ensure that the correct url is hit and documentation in the cookbook.

This has been discussed on the mailing list in the following thread: http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009981.html.

________________________________

You can merge this Pull Request by running
git pull https://github.com/kevinwuhoo/biopython master

Or view, comment on, or merge it at:
https://github.com/biopython/biopython/pull/152

Commit Summary
Added a KEGG API Wrapper
Forgot copyright
Added a general parser and a KEGG section in the tutorial.
Updated querying code and corresponding tests.
Updated documentation to reflect changes in KEGG module.

File Changes
M Bio/KEGG/__init__.py (196)
M Doc/Tutorial.tex (88)
M Tests/output/test_KEGG (41)
M Tests/test_KEGG.py (159)

Patch Links:
https://github.com/biopython/biopython/pull/152.patch
https://github.com/biopython/biopython/pull/152.diff

From p.j.a.cock at googlemail.com Tue Feb 5 07:33:52 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 5 Feb 2013 12:33:52 +0000
Subject: [Biopython-dev] KEGG enhancements
In-Reply-To:
References:
Message-ID:

On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock wrote:
> Hi all,
>
> We have a couple of new pull requests for KEGG enhancements,
> which we can look at after the imminent Biopython 1.61 release
> goes out this week.
>
> Kevin's working on the REST API,
> https://github.com/biopython/biopython/pull/152
>
> Leighton's working on KGML and graphics,

Sorry, the correct URL,
https://github.com/biopython/biopython/pull/155

Details below,

Peter

---------- Forwarded message ----------
From: Leighton Pritchard
Date: Tue, Feb 5, 2013 at 12:28 PM
Subject: [biopython] KGML files (#155)
To: biopython/biopython

As we discussed - not an ideal pull request (rebasing added the recent Biopython changes to the KEGG branch, rather than what was expected), but if it's workable, here's the code in a way that doesn't seem to break Biopython ;)

L.

________________________________

You can merge this Pull Request by running
git pull https://github.com/widdowquinn/biopython kegg

Or view, comment on, or merge it at:
https://github.com/biopython/biopython/pull/155

Commit Summary
First addition of KGML module (with tests)
Moved Bio.KGML to Bio.KEGG.KGML and split KGML tests
Modified comments to indicate TODO
Removed accidentally-committed files
Fix typo in error message
Fix typo in blastall wrapper
Add new Blast 2.2.27+ arguments to wrappers
Ignore new blastx arguments if testing with old BLAST+
BLAST 2.2.27+ dropped -frame_shift_penalty argument
Remove deprecated Bio.File.StringHandle and SGMLStripper
Remove centre setters, add explicit deprecation warning to getters.
Clarify docstrings of deprecated BLAST functions.
Avoid ResourceWarning: unclosed file in these doctests
Close handle in this doctest
Remove the deprecated Seq object's data property
Remove duplicated section labels in Tutorial (in repeated Motifs text)
Downgrade Bio.Motif chapter to a section at the end of the Bio.motifs chapter
Fix a typo
Clarify docstring for obsolete Bio.Motif module
Explain Bio.motifs replaces Bio.Motif in its docstring
Update date in Tutorial
Fix 2 typos.
Add links to SearchIO tutorial files Update SearchIO tutorial language style Add links to SearchIO documentation pages Tutorial specific example files have previously gone under Doc/examples Update paths in tutorial after moving example files File Changes M Bio/Blast/Applications.py (36) M Bio/Blast/NCBIStandalone.py (21) M Bio/File.py (65) M Bio/Graphics/GenomeDiagram/_AbstractDrawer.py (30) M Bio/Graphics/GenomeDiagram/_Graph.py (14) A Bio/Graphics/KGML_vis.py (422) A Bio/KEGG/KGML/KGML_parser.py (184) A Bio/KEGG/KGML/KGML_pathway.py (766) A Bio/KEGG/KGML/KGML_scrape.py (109) A Bio/KEGG/KGML/__init__.py (15) M Bio/Motif/__init__.py (13) M Bio/ParserSupport.py (34) M Bio/Seq.py (33) M Bio/SeqIO/SffIO.py (1) M Bio/SeqRecord.py (6) M Bio/motifs/__init__.py (7) M DEPRECATED (8) M Doc/Tutorial.tex (164) A Doc/examples/my_blast.xml (0) A Doc/examples/my_blat.psl (0) A Tests/KEGG/ko01100.kgml (17805) A Tests/KEGG/ko01100.xml (25176) A Tests/KEGG/ko01100_mod_original.pdf (98) A Tests/KEGG/ko01100_original.pdf (98) A Tests/KEGG/ko01120.xml (11425) A Tests/KEGG/ko03070.kgml (249) A Tests/KEGG/ko03070.xml (413) A Tests/KEGG/ko03070_mod_original.pdf (113) A Tests/KEGG/ko03070_original.pdf (113) A Tests/KEGG/map01100.png (0) A Tests/KEGG/map03070.png (0) D Tests/Tutorial/README.txt (9) M Tests/test_File.py (13) A Tests/test_KGML_graphics.py (138) A Tests/test_KGML_nographics.py (99) A Tests/test_KGML_online.py (68) M Tests/test_NCBI_BLAST_tools.py (9) M Tests/test_ParserSupport.py (9) M setup.py (1) Patch Links: https://github.com/biopython/biopython/pull/155.patch https://github.com/biopython/biopython/pull/155.diff From p.j.a.cock at googlemail.com Tue Feb 5 07:36:55 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 12:36:55 +0000 Subject: [Biopython-dev] KEGG enhancements In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 12:33 PM, Peter Cock wrote: > On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock wrote: >> Hi all, >> >> We have a couple of 
new pull requests for KEGG enhancements, >> which we can look at after the imminent Biopython 1.61 release >> goes out this week. >> >> Kevin's working on the REST API, >> https://github.com/biopython/biopython/pull/152 >> >> Leighton's working on KGML and graphics, > > Sorry, the correct URL, https://github.com/biopython/biopython/pull/155 > > Details below, See also Leighton's blog posts about this work (with pictures): http://armchairbiology.blogspot.co.uk/2013/01/keggwatch-part-i.html http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-ii.html http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-iii.html Regards, Peter From p.j.a.cock at googlemail.com Tue Feb 5 08:55:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 13:55:20 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: Hi all, I'm going to try and do the release this afternoon, so no commits to the master branch until further notice please. Thanks, Peter From p.j.a.cock at googlemail.com Tue Feb 5 09:49:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 14:49:20 +0000 Subject: [Biopython-dev] Biopython 1.61 release Message-ID: On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock wrote: > Hi all, > > I'm going to try and do the release this afternoon, so > no commits to the master branch until further notice > please. > > Thanks, > > Peter The release is in progress... The Windows installers are on the website for some quick pre-announcement testing. If anyone spots an issue, please email me ASAP: http://biopython.org/DIST/ Last time we put 'beta' in the Python 3.2 installer to emphasise this was still not quite ready for prime time. Should we do that again? How comfortable are we all about encouraging more use under Python 3?
Thanks, Peter From p.j.a.cock at googlemail.com Tue Feb 5 13:14:24 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 18:14:24 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 2:49 PM, Peter Cock wrote: > On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock wrote: >> Hi all, >> >> I'm going to try and do the release this afternoon, so >> no commits to the master branch until further notice >> please. >> >> Thanks, >> >> Peter > > The release is in progress... > > The Windows installers are on the website for some quick > pre-announcement testing. If anyone spots an issue, please > email me ASAP: http://biopython.org/DIST/ > > Last time we put 'beta' in the Python 3.2 installer to emphasise > this was still not quite ready for prime time. Should we do that > again? How comfortable are we all about encouraging more > use under Python 3? I'm planning to do the same in terms of putting beta in the Windows installer for Python 3.2. After some trouble, I now have the epydoc API files updated (a manual refresh might be needed to see the changes): http://biopython.org/DIST/docs/api/ Bow - the `backtick` markup doesn't do anything in epydoc, but perhaps for the next release we can turn the SearchIO markup into reStructuredText instead? I think last time I didn't have the docutils dependency installed in order for epydoc to try and parse the reStructuredText (used in Bio.Phylo). Running epydoc also showed a few more epydoc formatting errors, fixed in git - I will now regenerate the installers, and tag this in git etc.
Peter From w.arindrarto at gmail.com Tue Feb 5 13:22:46 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 5 Feb 2013 19:22:46 +0100 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: Hi Peter, > Bow - the `backtick` markup doesn't do anything in epydoc, but > perhaps for the next release we can turn the SearchIO markup > into reStructuredText instead? > > I think last time I didn't have the docutils dependency installed > in order for epydoc to try and parse the reStructuredText (used > in Bio.Phylo). Running epydoc also showed a few more epydoc > formatting errors, fixed in git - I will now regenerate the installers, > and tag this in git etc. Hmm... IIRC, I did write the entire SearchIO doc using reStructuredText markup; in hindsight probably not wise since we still rely on epydoc. Using reST for the next release sounds good. On a related note, do we have any solid plans to move out of epydoc (and into Sphinx?) for the next release? regards, Bow From p.j.a.cock at googlemail.com Tue Feb 5 13:30:29 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 18:30:29 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 6:22 PM, Wibowo Arindrarto wrote: > Hi Peter, > >> Bow - the `backtick` markup doesn't do anything in epydoc, but >> perhaps for the next release we can turn the SearchIO markup >> into reStructuredText instead? >> >> I think last time I didn't have the docutils dependency installed >> in order for epydoc to try and parse the reStructuredText (used >> in Bio.Phylo). Running epydoc also showed a few more epydoc >> formatting errors, fixed in git - I will now regenerate the installers, >> and tag this in git etc. > > Hmm... IIRC, I did write the entire SearchIO doc using reStructuredText > markup; in hindsight probably not wise since we still rely on epydoc. > Using reST for the next release sounds good.
Using reStructuredText (like Eric did with Bio.Phylo) would have been (and is) fine, however you had __docformat__ = 'epytext en' in the file. > On a related note, do we have any solid plans to move out of epydoc > (and into Sphinx?) for the next release? Not yet - but moving all the docstrings to reStructuredText is a very good step towards that, and a chance to review/update all the plain text docstrings in particular to look nicer and be more consistent. Peter From p.j.a.cock at googlemail.com Tue Feb 5 13:57:58 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 18:57:58 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: Hi all, The Biopython 1.61 release files are live, http://biopython.org/DIST/ and this is tagged on GitHub now, i.e. this commit: https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5 I've not yet pushed this to PyPI, nor done the announcement. If anyone would like to write a draft based on the NEWS file and the previous announcements during the next hour or two, that would be great. Otherwise I'll do this after dinner... Thanks, Peter From p.j.a.cock at googlemail.com Tue Feb 5 16:30:45 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 21:30:45 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 6:57 PM, Peter Cock wrote: > Hi all, > > The Biopython 1.61 release files are live, http://biopython.org/DIST/ > and this is tagged on GitHub now, i.e. this commit: > https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5 > > I've not yet pushed this to PyPI, nor done the announcement. > > If anyone would like to write a draft based on the NEWS file > and the previous announcements during the next hour or two, > that would be great. Otherwise I'll do this after dinner...
> > Thanks, > > Peter Draft text below, based heavily on the NEWS file - any comments? I'll post the new Tutorial online now, and then update the Downloads page on the wiki before posting this. Peter -- Biopython 1.61 released Source distributions and Windows installers for Biopython 1.61 are now available from the downloads page on the Biopython website and from the Python Package Index (PyPI). The updated Biopython Tutorial and Cookbook is online (PDF). Platforms/Deployment We currently support Python 2.5, 2.6 and 2.7 and also test under Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C extensions). We are still encouraging early adopters to help test on these platforms, and have included a "beta" installer for Python 3.2 (and Python 3.3 too follow soon) under 32-bit Windows. Please note we are phasing out support for Python 2.5. We will continue support for at least one further release (Biopython 1.62). This could be extended given feedback from our users. Focusing on Python 2.6 and 2.7 only will make writing Python 3 compatible code easier. Features GenomeDiagram has three new sigils (shapes to illustrate features). OCTO shows an octagonal shape, like the existing BOX sigil but with the corners cut off. JAGGY shows a box with jagged edges at the start and end, intended for things like NNNNN regions in draft genomes. Finally BIGARROW is like the existing ARROW sigil but is drawn straddling the axis. This is useful for drawing vertically compact figures where you do not have overlapping genes. New module Bio.Graphics.ColorSpiral can generate colors along a spiral path through HSV color space. This can be used to make arbitrary "rainbow" scales, for example to color features or cross-links on a GenomeDiagram figure. The Bio.SeqIO module now supports reading sequences from PDB files in two different ways. The "pdb-atom"
format determines the sequence as it appears in the structure based on the atom coordinate section of the file (via Bio.PDB, so NumPy is currently required for this). Alternatively, you can use the "pdb-seqres" format to read the complete protein sequence as it is listed in the PDB header, if available. The Bio.SeqUtils module now has a seq1 function to turn a sequence using three letter amino acid codes into one using the more common one letter codes. This acts as the inverse of the existing seq3 function. The multiple-sequence-alignment object used by Bio.AlignIO etc now supports an annotation dictionary. Additional support for per-column annotation is planned, with addition and splicing to work like that for the SeqRecord per-letter annotation. The Bio.Motif module has been updated and reorganized. To allow for a clean deprecation of the old code, the new motif code is stored in a new module Bio.motifs, and a PendingDeprecationWarning was added to Bio.Motif. Experimental Code - SearchIO This release also includes Bow's Google Summer of Code work writing a unified parsing framework for NCBI BLAST (assorted formats including tabular and XML), HMMER, BLAT, and other sequence searching tools. This is currently available with the new BiopythonExperimentalWarning to indicate that this is still somewhat experimental. We're bundling it with the main release to get more public feedback, but with the big warning that the API is likely to change. In fact, even the current name of Bio.SearchIO may change since unless you are familiar with BioPerl its purpose isn't immediately clear.
Contributors Brandon Invergo Bryan Lunt (first contribution) Christian Brueffer (first contribution) David Cain Eric Talevich Grace Yeo (first contribution) Jeffrey Chang Jingping Li (first contribution) Kai Blin (first contribution) Leighton Pritchard Lenna Peterson Lucas Sinclair (first contribution) Michiel de Hoon Nick Semenkovich (first contribution) Peter Cock Robert Ernst (first contribution) Tiago Antao Wibowo "Bow" Arindrarto From p.j.a.cock at googlemail.com Tue Feb 5 16:42:06 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 21:42:06 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 9:34 PM, Lenna Peterson wrote: > Hi Peter, > > Looks great. Very small typo: in the last sentence of the paragraph about > platforms, "Python 3.3 too follow" should be "Python 3.3 to follow". Thanks Lenna :) I didn't make an installer for Python 3.3 this afternoon, but I will tomorrow having heard back from the NumPy 1.7 release manager that there shouldn't be any problems from compiling against their release candidate: http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065369.html On a related point, NumPy are looking at if they can include pre-compiled installers for 64bit Windows - once that happens (and it may have to wait until NumPy 1.8), we will need to look at this too: http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065339.html Peter From p.j.a.cock at googlemail.com Tue Feb 5 17:05:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 22:05:25 +0000 Subject: [Biopython-dev] Biopython 1.61 released Message-ID: Dear Biopythoneers, Source distributions and Windows installers for Biopython 1.61 are now available from the downloads page on the Biopython website and from the Python Package Index (PyPI). The updated Biopython Tutorial and Cookbook is online (PDF).
Platforms/Deployment: We currently support Python 2.5, 2.6 and 2.7 and also test under Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C extensions). We are still encouraging early adopters to help test on these platforms, and have included a "beta" installer for Python 3.2 (and Python 3.3 to follow soon) under 32-bit Windows. Please note we are phasing out support for Python 2.5. We will continue support for at least one further release (Biopython 1.62). This could be extended given feedback from our users. Focusing on Python 2.6 and 2.7 only will make writing Python 3 compatible code easier. New Features: GenomeDiagram has three new sigils (shapes to illustrate features). OCTO shows an octagonal shape, like the existing BOX sigil but with the corners cut off. JAGGY shows a box with jagged edges at the start and end, intended for things like NNNNN regions in draft genomes. Finally BIGARROW is like the existing ARROW sigil but is drawn straddling the axis. This is useful for drawing vertically compact figures where you do not have overlapping genes. New module Bio.Graphics.ColorSpiral can generate colors along a spiral path through HSV color space. This can be used to make arbitrary "rainbow" scales, for example to color features or cross-links on a GenomeDiagram figure. The Bio.SeqIO module now supports reading sequences from PDB files in two different ways. The "pdb-atom" format determines the sequence as it appears in the structure based on the atom coordinate section of the file (via Bio.PDB, so NumPy is currently required for this). Alternatively, you can use the "pdb-seqres" format to read the complete protein sequence as it is listed in the PDB header, if available. The Bio.SeqUtils module now has a seq1 function to turn a sequence using three letter amino acid codes into one using the more common one letter codes. This acts as the inverse of the existing seq3 function.
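As a rough illustration of the three-letter / one-letter conversion described above, here is a stand-alone sketch in plain Python. It is not the Bio.SeqUtils code: the names three_to_one and one_to_three are hypothetical, only the twenty standard amino acids are covered, and the handling of unknown codes ("X" / "Xaa") is an assumption made for this example.

```python
# Twenty standard amino acids only; the real module may cover more codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def three_to_one(sequence, unknown="X"):
    """Turn e.g. 'MetAlaIle' into 'MAI' (hypothetical helper)."""
    if len(sequence) % 3:
        raise ValueError("length must be a multiple of three")
    # Split into chunks of three letters and normalise the case.
    codes = (sequence[i:i + 3].capitalize() for i in range(0, len(sequence), 3))
    return "".join(THREE_TO_ONE.get(code, unknown) for code in codes)


def one_to_three(sequence, unknown="Xaa"):
    """Turn e.g. 'MAI' into 'MetAlaIle' (hypothetical helper)."""
    return "".join(ONE_TO_THREE.get(letter.upper(), unknown) for letter in sequence)


short = three_to_one("MetAlaIleVal")
print(short)
print(one_to_three(short))
```

For sequences made up of the standard codes the two helpers are inverses of each other, which is the relationship the announcement describes between seq1 and seq3.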
The multiple-sequence-alignment object used by Bio.AlignIO etc now supports an annotation dictionary. Additional support for per-column annotation is planned, with addition and splicing to work like that for the SeqRecord per-letter annotation. The Bio.Motif module has been updated and reorganized. To allow for a clean deprecation of the old code, the new motif code is stored in a new module Bio.motifs, and a PendingDeprecationWarning was added to Bio.Motif. Experimental Code - SearchIO: This release also includes Bow's Google Summer of Code work writing a unified parsing framework for NCBI BLAST (assorted formats including tabular and XML), HMMER, BLAT, and other sequence searching tools. This is currently available with the new BiopythonExperimentalWarning to indicate that this is still somewhat experimental. We're bundling it with the main release to get more public feedback, but with the big warning that the API is likely to change. In fact, even the current name of Bio.SearchIO may change since unless you are familiar with BioPerl its purpose isn't immediately clear. Contributors: Brandon Invergo Bryan Lunt (first contribution) Christian Brueffer (first contribution) David Cain Eric Talevich Grace Yeo (first contribution) Jeffrey Chang Jingping Li (first contribution) Kai Blin (first contribution) Leighton Pritchard Lenna Peterson Lucas Sinclair (first contribution) Michiel de Hoon Nick Semenkovich (first contribution) Peter Cock Robert Ernst (first contribution) Tiago Antao Wibowo "Bow" Arindrarto Thank you all. Release announcement here (RSS feed available): http://news.open-bio.org/news/2013/02/biopython-1-61-released/ P.S.
You can follow @Biopython on Twitter https://twitter.com/Biopython From p.j.a.cock at googlemail.com Tue Feb 5 17:38:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 22:38:32 +0000 Subject: [Biopython-dev] More 'fun' with GenBank In-Reply-To: <50FD0F2B.1080606@biotech.uni-tuebingen.de> References: <50F57BC5.7020607@biotech.uni-tuebingen.de> <50F66496.8000109@biotech.uni-tuebingen.de> <50FD0F2B.1080606@biotech.uni-tuebingen.de> Message-ID: On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin wrote: > >> Kai - would you mind retesting with f_loc5 (the rebased branch)? > > The location of the feature that caused trouble for me still looks > correct. I'm currently running some more sequences, but I'm pretty > confident that the code will work just fine. The tests I added to the > genbank parser code for all the problem cases I had pass, after all. :) > >> Everyone - does it seem sensible to include this now, ready for the >> upcoming release (*)? Or perhaps just after the release? > > I'd prefer having this in the next release if possible, but of course > if the release after that is coming up within a reasonable time frame, > that would work as well. > > Cheers, > Kai Unless anyone objects, I will apply the (rebased) version of this f_loc4 / f_loc5 branch later this week (now that Biopython 1.61 is out). This replaces the SeqFeature use of sub_features with a new CompoundLocation which I think is a far more natural way to handle join locations in EMBL/GenBank files.
Also, it means we can offer parsing of GenBank/EMBL style location lines into (Compound)Location objects directly :) Regards, Peter From w.arindrarto at gmail.com Tue Feb 5 19:03:52 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 6 Feb 2013 01:03:52 +0100 Subject: [Biopython-dev] [Biopython] Biopython 1.61 released In-Reply-To: References: Message-ID: Hi Peter, > Dear Biopythoneers, > > Source distributions and Windows installers for Biopython 1.61 are now > available from the downloads page on the Biopython website and from > the Python Package Index (PyPI). > > The updated Biopython Tutorial and Cookbook is online (PDF). > > Platforms/Deployment: > > We currently support Python 2.5, 2.6 and 2.7 and also test under > Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython > 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C > extensions). We are still encouraging early adopters to help test on > these platforms, and have included a ?beta? installer for Python 3.2 > (and Python 3.3 to follow soon) under 32-bit Windows. > > Please note we are phasing out support for Python 2.5. We will > continue support for at least one further release (Biopython 1.62). > This could be extended given feedback from our users. Focusing on > Python 2.6 and 2.7 only will make writing Python 3 compatible code > easier. > > New Features: > > GenomeDiagram has three new sigils (shapes to illustrate features). > OCTO shows an octagonal shape, like the existing BOX sigil but with > the corners cut off. JAGGY shows a box with jagged edges at the start > and end, intended for things like NNNNN regions in draft genomes. > Finally BIGARROW is like the existing ARROW sigil but is drawn > straddling the axis. This is useful for drawing vertically compact > figures where you do not have overlapping genes. > > New module Bio.Graphics.ColorSpiral can generate colors along a spiral > path through HSV color space. 
This can be used to make arbitrary > ?rainbow? scales, for example to color features or cross-links on a > GenomeDiagram figure. > > The Bio.SeqIO module now supports reading sequences from PDB files in > two different ways. The ?pdb-atom? format determines the sequence as > it appears in the structure based on the atom coordinate section of > the file (via Bio.PDB, > so NumPy is currently required for this). Alternatively, you can use > the ?pdb-seqres? format to read the complete protein sequence as it is > listed in the PDB header, if available. > > The Bio.SeqUtils module how has a seq1 function to turn a sequence > using three letter amino acid codes into one using the more common one > letter codes. This acts as the inverse of the existing seq3 function. > > The multiple-sequence-alignment object used by Bio.AlignIO etc now > supports an annotation dictionary. Additional support for per-column > annotation is planned, with addition and splicing to work like that > for the SeqRecord per-letter annotation. > > The Bio.Motif module has been updated and reorganized. To allow for a > clean deprecation of the old code, the new motif code is stored in a > new module Bio.motifs, and a PendingDeprecationWarning was added to > Bio.Motif. > > Experimental Code ? SearchIO: > > This release also includes Bow?s Google Summer of Code work writing a > unified parsing framework for NCBI BLAST (assorted formats including > tabular and XML), HMMER, BLAT, and other sequence searching tools. > This is currently available with the new BiopythonExperimentalWarning > to indicate that this is still somewhat experimental. We?re bundling > it with the main release to get more public feedback, but with the big > warning that the API is likely to change. In fact, even the current > name of Bio.SearchIO may change since unless you are familiar with > BioPerl its purpose isn?t immediately clear. 
> > Contributors: > > Brandon Invergo > Bryan Lunt (first contribution) > Christian Brueffer (first contribution) > David Cain > Eric Talevich > Grace Yeo (first contribution) > Jeffrey Chang > Jingping Li (first contribution) > Kai Blin (first contribution) > Leighton Pritchard > Lenna Peterson > Lucas Sinclair (first contribution) > Michiel de Hoon > Nick Semenkovich (first contribution) > Peter Cock > Robert Ernst (first contribution) > Tiago Antao > Wibowo ?Bow? Arindrarto > > Thank you all. > > Release announcement here (RSS feed available): > http://news.open-bio.org/news/2013/02/biopython-1-61-released/ > > P.S. You can follow @Biopython on Twitter > https://twitter.com/Biopython Thanks for doing the release! It feels exciting to see SearchIO code finally live in the distributions :). Hopefully this will result in more feedback (and then more improvements ~ likewise for the whole Biopython as well). Also, thank you as well to everyone who has criticized / commented / contributed code to the module :). cheers, Bow From mjldehoon at yahoo.com Tue Feb 5 20:03:30 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 5 Feb 2013 17:03:30 -0800 (PST) Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released In-Reply-To: Message-ID: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com> Thanks Peter! Great to see this new code out. Best, -Michiel. --- On Tue, 2/5/13, Peter Cock wrote: > From: Peter Cock > Subject: [Biopython-announce] Biopython 1.61 released > To: biopython-announce at lists.open-bio.org, "Biopython Mailing List" , "Biopython-Dev Mailing List" > Date: Tuesday, February 5, 2013, 5:05 PM > Dear Biopythoneers, > > Source distributions and Windows installers for Biopython > 1.61 are now > available from the downloads page on the Biopython website > and from > the Python Package Index (PyPI). > > The updated Biopython Tutorial and Cookbook is online > (PDF). 
> > Platforms/Deployment: > > We currently support Python 2.5, 2.6 and 2.7 and also test > under > Python 3.1, 3.2 and 3.3 (including modules using NumPy), and > Jython > 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our > C > extensions). We are still encouraging early adopters to help > test on > these platforms, and have included a ?beta? installer > for Python 3.2 > (and Python 3.3 to follow soon) under 32-bit Windows. > > Please note we are phasing out support for Python 2.5. We > will > continue support for at least one further release (Biopython > 1.62). > This could be extended given feedback from our users. > Focusing on > Python 2.6 and 2.7 only will make writing Python 3 > compatible code > easier. > > New Features: > > GenomeDiagram has three new sigils (shapes to illustrate > features). > OCTO shows an octagonal shape, like the existing BOX sigil > but with > the corners cut off. JAGGY shows a box with jagged edges at > the start > and end, intended for things like NNNNN regions in draft > genomes. > Finally BIGARROW is like the existing ARROW sigil but is > drawn > straddling the axis. This is useful for drawing vertically > compact > figures where you do not have overlapping genes. > > New module Bio.Graphics.ColorSpiral can generate colors > along a spiral > path through HSV color space. This can be used to make > arbitrary > ?rainbow? scales, for example to color features or > cross-links on a > GenomeDiagram figure. > > The Bio.SeqIO module now supports reading sequences from PDB > files in > two different ways. The ?pdb-atom? format determines the > sequence as > it appears in the structure based on the atom coordinate > section of > the file (via Bio.PDB, > so NumPy is currently required for this). Alternatively, you > can use > the ?pdb-seqres? format to read the complete protein > sequence as it is > listed in the PDB header, if available. 
> > The Bio.SeqUtils module how has a seq1 function to turn a > sequence > using three letter amino acid codes into one using the more > common one > letter codes. This acts as the inverse of the existing seq3 > function. > > The multiple-sequence-alignment object used by Bio.AlignIO > etc now > supports an annotation dictionary. Additional support for > per-column > annotation is planned, with addition and splicing to work > like that > for the SeqRecord per-letter annotation. > > The Bio.Motif module has been updated and reorganized. To > allow for a > clean deprecation of the old code, the new motif code is > stored in a > new module Bio.motifs, and a PendingDeprecationWarning was > added to > Bio.Motif. > > Experimental Code ? SearchIO: > > This release also includes Bow?s Google Summer of Code > work writing a > unified parsing framework for NCBI BLAST (assorted formats > including > tabular and XML), HMMER, BLAT, and other sequence searching > tools. > This is currently available with the new > BiopythonExperimentalWarning > to indicate that this is still somewhat experimental. > We?re bundling > it with the main release to get more public feedback, but > with the big > warning that the API is likely to change. In fact, even the > current > name of Bio.SearchIO may change since unless you are > familiar with > BioPerl its purpose isn?t immediately clear. > > Contributors: > > Brandon Invergo > Bryan Lunt (first contribution) > Christian Brueffer (first contribution) > David Cain > Eric Talevich > Grace Yeo (first contribution) > Jeffrey Chang > Jingping Li (first contribution) > Kai Blin (first contribution) > Leighton Pritchard > Lenna Peterson > Lucas Sinclair (first contribution) > Michiel de Hoon > Nick Semenkovich (first contribution) > Peter Cock > Robert Ernst (first contribution) > Tiago Antao > Wibowo ?Bow? Arindrarto > > Thank you all. 
>
> Release announcement here (RSS feed available):
> http://news.open-bio.org/news/2013/02/biopython-1-61-released/
>
> P.S. You can follow @Biopython on Twitter
> https://twitter.com/Biopython
>
> _______________________________________________
> Biopython-announce mailing list - Biopython-announce at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-announce
>

From mjldehoon at yahoo.com Tue Feb 5 20:07:53 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 5 Feb 2013 17:07:53 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
Message-ID: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>

With Biopython 1.61 now out, perhaps this is a good time to tackle
Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
to replace this with a plain C module, or perhaps with a pure-Python
parser. This issue was previously discussed here:

http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html

Or is anybody else already looking at this module?

Best,
-Michiel.

From arklenna at gmail.com Tue Feb 5 20:31:16 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Tue, 5 Feb 2013 20:31:16 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>
References: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com>
Message-ID:

Hi Michiel,

I worked on that a bit early last year. See thread on this bug:

https://redmine.open-bio.org/issues/2619

Namely, I determined that the flex headers aren't required to compile the
flex-generated C, which is a great start.

I also started work on a PLY-based pure Python reimplementation. Pull
request here:

https://github.com/biopython/biopython/pull/33

I haven't looked at this code in quite a long time! Let me know if you
have any questions about what I did and I will do my best to remember...
Cheers, Lenna On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon wrote: > With Biopython 1.61 now out, perhaps this is a good time to tackle > Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like > to replace this with a plain C module, or perhaps with a pure-Python > parser. This issue was previously discussed here: > > http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html > > Or is anybody else already looking at this module? > > Best, > -Michiel. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From kieran.mace at gmail.com Tue Feb 5 21:05:19 2013 From: kieran.mace at gmail.com (Kieran Mace) Date: Tue, 5 Feb 2013 18:05:19 -0800 Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released In-Reply-To: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com> References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: Hi. I'm wondering if the MafIO module is going to be included in this release? -Kieran On Feb 5, 2013, at 17:03, Michiel de Hoon wrote: > Thanks Peter! > Great to see this new code out. > > Best, > -Michiel. > > --- On Tue, 2/5/13, Peter Cock wrote: > >> From: Peter Cock >> Subject: [Biopython-announce] Biopython 1.61 released >> To: biopython-announce at lists.open-bio.org, "Biopython Mailing List" , "Biopython-Dev Mailing List" >> Date: Tuesday, February 5, 2013, 5:05 PM >> Dear Biopythoneers, >> >> Source distributions and Windows installers for Biopython >> 1.61 are now >> available from the downloads page on the Biopython website >> and from >> the Python Package Index (PyPI). >> >> The updated Biopython Tutorial and Cookbook is online >> (PDF). 
>>
>> Release announcement here (RSS feed available):
>> http://news.open-bio.org/news/2013/02/biopython-1-61-released/
>>
>> P.S. You can follow @Biopython on Twitter
>> https://twitter.com/Biopython
>>
>> _______________________________________________
>> Biopython-announce mailing list - Biopython-announce at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython-announce
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev

From p.j.a.cock at googlemail.com Wed Feb 6 03:37:05 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Wed, 6 Feb 2013 08:37:05 +0000
Subject: [Biopython-dev] Biopython 1.61 released
In-Reply-To:
References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com>
Message-ID:

On Wednesday, February 6, 2013, Kieran Mace wrote:

> Hi.
>
> I'm wondering if the MafIO module is going to be included in this release?
>
> -Kieran

I'm not promising but I would hope so. There is some work to be done first
with locations and start/end information in the SeqRecord. See also the
CompoundLocation discussion.

Peter

From mjldehoon at yahoo.com Wed Feb 6 03:36:26 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 6 Feb 2013 00:36:26 -0800 (PST)
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To:
Message-ID: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>

Hi Lenna,

Thanks for your reply.
Are you planning to continue your work on the PLY-based mmCIF parser?

Best,
-Michiel

--- On Tue, 2/5/13, Lenna Peterson wrote:

From: Lenna Peterson
Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
To: "Michiel de Hoon"
Cc: "BioPython-Dev Mailing List"
Date: Tuesday, February 5, 2013, 8:31 PM

Hi Michiel,

I worked on that a bit early last year. See thread on this bug:
https://redmine.open-bio.org/issues/2619

Namely, I determined that the flex headers aren't required to compile the
flex-generated C, which is a great start.

I also started work on a PLY-based pure Python reimplementation. Pull
request here:

https://github.com/biopython/biopython/pull/33

I haven't looked at this code in quite a long time! Let me know if you
have any questions about what I did and I will do my best to remember...

Cheers,

Lenna

On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon wrote:

With Biopython 1.61 now out, perhaps this is a good time to tackle
Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
to replace this with a plain C module, or perhaps with a pure-Python
parser. This issue was previously discussed here:

http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html

Or is anybody else already looking at this module?

Best,
-Michiel.
_______________________________________________
Biopython-dev mailing list
Biopython-dev at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython-dev

From redmine at redmine.open-bio.org Wed Feb 6 16:39:04 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Wed, 6 Feb 2013 21:39:04 +0000
Subject: [Biopython-dev] [Biopython - Bug #3411] (New) Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
Message-ID:

Issue #3411 has been reported by Tom McCoy.

----------------------------------------
Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST)
https://redmine.open-bio.org/issues/3411

Author: Tom McCoy
Status: New
Priority: Normal
Assignee:
Category:
Target version:
URL:

"Either a single UID or a comma-delimited list of UIDs may be provided.
All of the UIDs must be from the database specified by db.
There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*." -- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Feb 7 05:20:30 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 7 Feb 2013 10:20:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) References: Message-ID: Issue #3411 has been updated by Peter Cock. Assignee set to Biopython Dev Mailing List I don't recall that guideline being in the earlier requirements/documentation when Bio.Entrez was first written, but the fix proposed looks sensible. (Note - do we need to worry about the ids being a string or a list at that point, and therefore how to count the entries?) P.S. Resetting assignee to default of the dev mailing list. ---------------------------------------- Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) https://redmine.open-bio.org/issues/3411 Author: Tom McCoy Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: "Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. 
There is no set maximum for the number of UIDs that can be passed to
EFetch, *but if more than about 200 UIDs are to be provided, the request
should be made using the HTTP POST method*."
-- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch

Entrez.efetch uses this API endpoint via GET regardless of the number of
UIDs supplied. The attached patch corrects this behavior.

--
You have received this notification because you have either subscribed to
it, or are involved in it. To change your notification preferences, please
click here and login: http://redmine.open-bio.org

From p.j.a.cock at googlemail.com Thu Feb 7 06:33:25 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:33:25 +0000
Subject: [Biopython-dev] Biopython 1.61 released
In-Reply-To:
References:
Message-ID:

On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock wrote:
> Dear Biopythoneers,
>
> Source distributions and Windows installers for Biopython 1.61 are now
> available from the downloads page on the Biopython website and from
> the Python Package Index (PyPI).
>
> The updated Biopython Tutorial and Cookbook is online (PDF).
>
> Platforms/Deployment:
>
> We currently support Python 2.5, 2.6 and 2.7 and also test under
> Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython
> 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C
> extensions). We are still encouraging early adopters to help test on
> these platforms, and have included a "beta" installer for Python 3.2
> (and Python 3.3 to follow soon) under 32-bit Windows.

For those of you wanting to try Biopython on Python 3.3 on Windows,
there is now an installer for Biopython 1.61 built against NumPy 1.7.0rc2.
NumPy 1.7 is their first release to support Python 3.3, and the official
release is expected to be near-identical to this second release candidate,
see:

http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html

Regards,

Peter

From p.j.a.cock at googlemail.com Thu Feb 7 06:53:40 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 11:53:40 +0000
Subject: [Biopython-dev] [Biopython] Fwd: Bug in bgzf module
In-Reply-To:
References:
Message-ID:

On Wed, Feb 6, 2013 at 10:35 PM, Petra Kubincová wrote:
> Hi Peter,
>
> based on your unit test for tell method I've created this:
> http://dl.dropbox.com/u/...
> I hope it's at least partially usable.
>
> Regards,
> Petra

Thanks, I turned that into this commit:
https://github.com/biopython/biopython/commit/194bda7cd4bc292b37fd219f1f95a19e1316ac5a

That led me to notice a special case with offsets on a block boundary,
see this fix and test:
https://github.com/biopython/biopython/commit/fef7659dacaf93ddeb6270103d8ded6fb89414b7

Peter

From p.j.a.cock at googlemail.com Thu Feb 7 08:30:31 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Feb 2013 13:30:31 +0000
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To:
References: <50F57BC5.7020607@biotech.uni-tuebingen.de> <50F66496.8000109@biotech.uni-tuebingen.de> <50FD0F2B.1080606@biotech.uni-tuebingen.de>
Message-ID:

On Tue, Feb 5, 2013 at 10:38 PM, Peter Cock wrote:
> On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin
> wrote:
>>
>>> Kai - would you mind retesting with f_loc5 (the rebased branch)?
>>
>> The location of the feature that caused trouble for me still looks
>> correct. I'm currently running some more sequences, but I'm pretty
>> confident that the code will work just fine. The tests I added to the
>> genbank parser code for all the problem cases I had pass, after all. :)
>>
>>> Everyone - does it seem sensible to include this now, ready for the
>>> upcoming release (*)? Or perhaps just after the release?
>>
>> I'd prefer having this in the next release if possible, but of course
>> if the release after that is coming up within a reasonable time frame,
>> that would work as well.
>>
>> Cheers,
>> Kai
>
> Unless anyone objects, I will apply the (rebased) version of this
> f_loc4 / f_loc5 branch later this week (now that Biopython 1.61
> is out).
>
> This replaces the SeqFeature use of sub_features with a new
> CompoundLocation which I think is a far more natural way to
> handle join locations in EMBL/GenBank files.
>
> Also, it means we can offer parsing of GenBank/EMBL style
> location lines into (Compound)Location objects directly :)
>
> Regards,
>
> Peter

Applied to master,
https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b

Peter

From kai.blin at biotech.uni-tuebingen.de Thu Feb 7 09:47:37 2013
From: kai.blin at biotech.uni-tuebingen.de (Kai Blin)
Date: Thu, 07 Feb 2013 15:47:37 +0100
Subject: [Biopython-dev] More 'fun' with GenBank
In-Reply-To:
References: <50F57BC5.7020607@biotech.uni-tuebingen.de> <50F66496.8000109@biotech.uni-tuebingen.de> <50FD0F2B.1080606@biotech.uni-tuebingen.de>
Message-ID: <5113BE89.3050303@biotech.uni-tuebingen.de>

On 2013-02-07 14:30, Peter Cock wrote:

Hi Peter,

> Applied to master,
> https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b

Thanks for that.

Cheers,
Kai

--
Dipl.-Inform.
Kai Blin                 kai.blin at biotech.uni-tuebingen.de
Institute for Microbiology and Infection Medicine
Division of Microbiology/Biotechnology
Eberhard-Karls-Universität Tübingen
Auf der Morgenstelle 28                 Phone : ++49 7071 29-78841
D-72076 Tübingen                        Fax : ++49 7071 29-5979
Germany
Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben

From arklenna at gmail.com Thu Feb 7 13:21:37 2013
From: arklenna at gmail.com (Lenna Peterson)
Date: Thu, 7 Feb 2013 13:21:37 -0500
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
References: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID:

Hi Michiel,

If there are well-defined problems with the PLY parser, I can work on
fixing them. I am not currently working with mmCIF so I am not in the best
position to evaluate where and how the parser needs to be improved. I am
working with X-ray PDB files and I am not sure if my collaborators are
familiar with mmCIF. I have not dealt with NMR files of any type, either.

Cheers,

Lenna

On Wed, Feb 6, 2013 at 3:36 AM, Michiel de Hoon wrote:

> Hi Lenna,
>
> Thanks for your reply.
> Are you planning to continue your work on the PLY-based mmCIF parser?
>
> Best,
> -Michiel
>
> --- On *Tue, 2/5/13, Lenna Peterson * wrote:
>
>
> From: Lenna Peterson
> Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
> To: "Michiel de Hoon"
> Cc: "BioPython-Dev Mailing List"
> Date: Tuesday, February 5, 2013, 8:31 PM
>
>
> Hi Michiel,
>
> I worked on that a bit early last year. See thread on this bug:
>
> https://redmine.open-bio.org/issues/2619
>
> Namely, I determined that the flex headers aren't required to compile the
> flex-generated C, which is a great start.
>
> I also started work on a PLY-based pure Python reimplementation. Pull
> request here:
>
> https://github.com/biopython/biopython/pull/33
>
> I haven't looked at this code in quite a long time!
> Let me know if you have any questions about what I did and I will do my
> best to remember...
>
> Cheers,
>
> Lenna
>
>
> On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon wrote:
>
> With Biopython 1.61 now out, perhaps this is a good time to tackle
> Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
> to replace this with a plain C module, or perhaps with a pure-Python
> parser. This issue was previously discussed here:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html
>
> Or is anybody else already looking at this module?
>
> Best,
> -Michiel.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
>
>

From anaryin at gmail.com Thu Feb 7 13:25:37 2013
From: anaryin at gmail.com (João Rodrigues)
Date: Thu, 7 Feb 2013 19:25:37 +0100
Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
In-Reply-To:
References: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com>
Message-ID:

Hi,

In our NMR lab I am pretty sure mmCIF files are not even known.. How
widely used is the format in x-ray labs? I have never seen it outside this
mailing list to be honest.

Best,

João

From p.j.a.cock at googlemail.com Fri Feb 8 10:21:46 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 8 Feb 2013 15:21:46 +0000
Subject: [Biopython-dev] Fwd: [biopython] Newick parser (#156)
In-Reply-To:
References:
Message-ID:

Eric,

Could you take a look at this please?

Thanks,

Peter

---------- Forwarded message ----------
From: Ben Morris
Date: Fri, Feb 8, 2013 at 3:12 PM
Subject: [biopython] Newick parser (#156)
To: biopython/biopython

In light of three issues with the Newick parser:

https://redmine.open-bio.org/issues/3409
https://redmine.open-bio.org/issues/3386
https://redmine.open-bio.org/issues/3407

this is a rewrite of the parser from scratch.
It supports quoted node labels and can handle support values either as they were previously handled or from square-bracketed comments, as requested by Arlin. Additionally, it's consistently quite fast: [image: newick_parse_times] The unit tests still pass with these changes, and I'm now able to parse trees that previously raised exceptions. ------------------------------ You can merge this Pull Request by running git pull https://github.com/bendmorris/biopython newick Or view, comment on, or merge it at: https://github.com/biopython/biopython/pull/156 Commit Summary - A more efficient implementation of a Newick parser (linear time vs. quadratic) that makes only a single pass over the text and handles quoted labels correctly. - Implementing support values and fixing issue when external parentheses are missing. File Changes - *M* Bio/Phylo/NewickIO.py(198) Patch Links: - https://github.com/biopython/biopython/pull/156.patch - https://github.com/biopython/biopython/pull/156.diff From mjldehoon at yahoo.com Fri Feb 8 20:42:23 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 8 Feb 2013 17:42:23 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi Lenna, --- On Thu, 2/7/13, Lenna Peterson wrote: > If there are well-defined problems with the PLY parser, I can work on > fixing them. I am not currently working with mmCIF so I am not in the > best position to evaluate where and how the parser needs to be improved. I don't know of any problems with the PLY parser, but since it relies on PLY, it would add another dependency to Biopython. On the other hand, a pure-Python solution may be preferable, as it's easier to maintain and runs with Jython. The C implementation is considerably faster, but I doubt that it really matters since the Python (PLY) parser seems to be fast enough. 
I see three options then:

1) Remove the lex stuff from lex.yy.c, and optionally convert the
   remaining C code to Python.
2) Remove the PLY dependency from the PLY-based parser.
3) Write a new pure-Python parser from scratch.

I'm guessing that 1) will be the most straightforward. Other opinions?

Best,
-Michiel.

--- On Thu, 2/7/13, Lenna Peterson wrote:

If there are well-defined problems with the PLY parser, I can work on
fixing them. I am not currently working with mmCIF so I am not in the best
position to evaluate where and how the parser needs to be improved. I am
working with X-ray PDB files and I am not sure if my collaborators are
familiar with mmCIF. I have not dealt with NMR files of any type, either.

Cheers,

Lenna

On Wed, Feb 6, 2013 at 3:36 AM, Michiel de Hoon wrote:

Hi Lenna,

Thanks for your reply.
Are you planning to continue your work on the PLY-based mmCIF parser?

Best,
-Michiel

--- On Tue, 2/5/13, Lenna Peterson wrote:

From: Lenna Peterson
Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619)
To: "Michiel de Hoon"
Cc: "BioPython-Dev Mailing List"
Date: Tuesday, February 5, 2013, 8:31 PM

Hi Michiel,

I worked on that a bit early last year. See thread on this bug:

https://redmine.open-bio.org/issues/2619

Namely, I determined that the flex headers aren't required to compile the
flex-generated C, which is a great start.

I also started work on a PLY-based pure Python reimplementation. Pull
request here:

https://github.com/biopython/biopython/pull/33

I haven't looked at this code in quite a long time! Let me know if you
have any questions about what I did and I will do my best to remember...

Cheers,

Lenna

On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon wrote:

With Biopython 1.61 now out, perhaps this is a good time to tackle
Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like
to replace this with a plain C module, or perhaps with a pure-Python parser.
This issue was previously discussed here: http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html Or is anybody else already looking at this module? Best, -Michiel. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From redmine at redmine.open-bio.org Sat Feb 9 03:22:31 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sat, 9 Feb 2013 08:22:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) References: Message-ID: Issue #3411 has been updated by Michiel de Hoon. Fixed (using a slightly different code); see revision f1836165. ---------------------------------------- Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) https://redmine.open-bio.org/issues/3411 Author: Tom McCoy Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: "Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*." -- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Sat Feb 9 06:53:31 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 11:53:31 +0000 Subject: [Biopython-dev] Deprecating Bio.Index? 
Message-ID: Hello all, Does anyone still use Bio.Index? I don't think any of Biopython itself does nowadays, so perhaps we can deprecate this? https://github.com/biopython/biopython/blob/master/Bio/Index.py (We should of course ask on the main list first just in case) Regards, Peter From colin.aibn at gmail.com Sat Feb 9 08:06:13 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sat, 9 Feb 2013 23:06:13 +1000 Subject: [Biopython-dev] SearchIO HSP indexing Message-ID: Hi everyone, I have a question about the implementation of high-scoring segment pairs (HSPs) in SearchIO. I currently have an BLAST output file in XML format I am parsing and this is one of the hits (removed the alignment details to save space): 1 gnl|BL_ORD_ID|111 ref|NC_007779|:125695-127587 111 1893 1 3352.79 1815 0 1 1893 1 1893 1 1 1867 1867 0 2 399.997 216 2.88061e-111 331 881 22 581 1 1 452 452 19 565 Using Hsp1 as an example, the query and hit starts ("Hsp_query_to" and "Hsp_hit_to") are both 1 in the XML but when I access the Hsp objects from the BlastResult, both values are equal to 0: >>> blast_record[0][0].query_start 0 >>> blast_record[0][0].hit_start 0 However, when I access the end objects for the query and hit, the result isn't 1892 (zero based 1893) but 1893: >>> blast_record[0][0].query_end 1893 >>> blast_record[0][0].hit_end 1893 Is this correct? I find it a little confusing that one result is zero-based and the other one-based. Thanks Colin From p.j.a.cock at googlemail.com Sat Feb 9 08:16:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 13:16:43 +0000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer wrote: > Hi everyone, > I have a question about the implementation of > high-scoring segment pairs (HSPs) in SearchIO. 
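I currently have a BLAST output file in XML format I am parsing and this
is one of the hits (removed the alignment details to save space):

1
gnl|BL_ORD_ID|111
ref|NC_007779|:125695-127587
111
1893

1
3352.79
1815
0
1
1893
1
1893
1
1
1867
1867
0

2
399.997
216
2.88061e-111
331
881
22
581
1
1
452
452
19
565

Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
"Hsp_hit-from") are both 1 in the XML but when I access the Hsp objects
from the BlastResult, both values are equal to 0:

>>> blast_record[0][0].query_start
0
>>> blast_record[0][0].hit_start
0

However, when I access the end objects for the query and hit, the result
isn't 1892 (zero based 1893) but 1893:

>>> blast_record[0][0].query_end
1893
>>> blast_record[0][0].hit_end
1893

Is this correct? I find it a little confusing that one result is
zero-based and the other one-based.

Thanks
Colin

From p.j.a.cock at googlemail.com Sat Feb 9 08:16:43 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 9 Feb 2013 13:16:43 +0000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To:
References:
Message-ID:

On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer wrote:
> Hi everyone,
> I have a question about the implementation of
> high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
> output file in XML format I am parsing and this is one of the hits (removed
> the alignment details to save space):
>
>
> 1
> gnl|BL_ORD_ID|111
> ref|NC_007779|:125695-127587
> 111
> 1893
>
>
> 1
> 3352.79
> 1815
> 0
> 1
> 1893
> 1
> 1893
> 1
> 1
> 1867
> 1867
> 0
>
>
> 2
> 399.997
> 216
> 2.88061e-111
> 331
> 881
> 22
> 581
> 1
> 1
> 452
> 452
> 19
> 565
>
>
> Using Hsp1 as an example, the query and hit starts ("Hsp_query-from" and
> "Hsp_hit-from") are both 1 in the XML but when I access the Hsp objects from
> the BlastResult, both values are equal to 0:
>
>>>> blast_record[0][0].query_start
> 0
>>>> blast_record[0][0].hit_start
> 0
>
> However, when I access the end objects for the query and hit, the result
> isn't 1892 (zero based 1893) but 1893:
>
>>>> blast_record[0][0].query_end
> 1893
>>>> blast_record[0][0].hit_end
> 1893
>
> Is this correct? I find it a little confusing that one result is zero-based
> and the other one-based.
>
> Thanks
> Colin

Hi Colin,

The SearchIO positions like elsewhere in Biopython should be
using Python style counting. Looking at this one:

1
1893

That is like a GenBank/EMBL location 1..1893 which in Python string
slicing is [0:1893], so the start has -1 but the end is unchanged. The
nice thing is the length is 1893 and is given as the difference of the
Python slicing style end and start.

Perhaps we need to work on the help text? Any suggestions?

Thanks,

Peter

From colin.aibn at gmail.com Sat Feb 9 08:54:42 2013
From: colin.aibn at gmail.com (Colin Archer)
Date: Sat, 9 Feb 2013 23:54:42 +1000
Subject: [Biopython-dev] SearchIO HSP indexing
In-Reply-To:
References:
Message-ID:

Hi Peter,
Thanks for getting back to me so quickly.
I'm curious about the benefits of having these values in Python string
slicing format? I haven't come across this very often, I'm used to seeing
values systematically zero or one-based.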
Would it be easier to keep the range variables hit_range and hit_range_all in slicing format and the start and end variables in sequence position format so that they represent the actual BLAST results? I had a look at some of the code and I can't see the slicing format mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end, query_start, and query_end so that if people are interested they can have a look at the files and see what they mean. Thanks Colin On Sat, Feb 9, 2013 at 11:16 PM, Peter Cock wrote: > On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer wrote: > > Hi everyone, > > I have a question about the implementation of > > high-scoring segment pairs (HSPs) in SearchIO. I currently have an BLAST > > output file in XML format I am parsing and this is one of the hits > (removed > > the alignment details to save space): > > > > > > 1 > > gnl|BL_ORD_ID|111 > > ref|NC_007779|:125695-127587 > > 111 > > 1893 > > > > > > 1 > > 3352.79 > > 1815 > > 0 > > 1 > > 1893 > > 1 > > 1893 > > 1 > > 1 > > 1867 > > 1867 > > 0 > > > > > > 2 > > 399.997 > > 216 > > 2.88061e-111 > > 331 > > 881 > > 22 > > 581 > > 1 > > 1 > > 452 > > 452 > > 19 > > 565 > > > > > > Using Hsp1 as an example, the query and hit starts ("Hsp_query_to" and > > "Hsp_hit_to") are both 1 in the XML but when I access the Hsp objects > from > > the BlastResult, both values are equal to 0: > > > >>>> blast_record[0][0].query_start > > 0 > >>>> blast_record[0][0].hit_start > > 0 > > > > However, when I access the end objects for the query and hit, the result > > isn't 1892 (zero based 1893) but 1893: > > > >>>> blast_record[0][0].query_end > > 1893 > >>>> blast_record[0][0].hit_end > > 1893 > > > > Is this correct? I find it a little confusing that one result is > zero-based > > and the other one-based. 
> > > > Thanks > > Colin > > Hi Colin, > > The SearchIO positions like elsewhere in Biopython should be > using Python style counting. Looking at this one: > > 1 > 1893 > > That is like a GenBank/EMBL location 1..1893 which in Python string > slicing is [0:1893], so the start has -1 but the end is unchanged. The > nice thing is the length is 1893 and is given as the difference of the > Python slicing style end and start. > > Perhaps we need to work on the help text? Any suggestions? > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Sat Feb 9 09:30:26 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 14:30:26 +0000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer wrote: > Hi Peter, > Thanks for getting back to me so quickly. > Thank you - the main reason for including SearchIO in Biopython 1.61 as 'experimental code' is to get wider testing and feedback (hopefully an approach that will work well and we can use this more in future for other new code). > I'm curious about the benefits of having these values in Python string > slicing format? I haven't come across this very often, I'm used to seeing > values systematically zero or one-based. Once you're used to Python slicing it becomes very natural. > Would it be easier to keep the range variables hit_range and hit_range_all > in slicing format and the start and end variables in sequence position > format so that they represent the actual BLAST results? One reason for this is to be consistent across all the formats supported in SearchIO, and since Biopython is a Python library following Python norms seems most natural. 
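To make the convention concrete, here is a minimal sketch in plain Python of the conversion described above. It is not SearchIO's own code: the two helper names are made up for illustration, and the sequence is a stand-in rather than real data. It takes the 1..1893 range from this thread, turns it into the slicing form, and shows that the length falls out as end minus start.

```python
def to_python_range(start_1based, end_1based):
    """Convert a 1-based closed range (e.g. 1..1893) to 0-based half-open."""
    # Only the start moves; the end is unchanged.
    return start_1based - 1, end_1based


def to_one_based_range(start, end):
    """Convert a 0-based half-open range back to 1-based closed."""
    return start + 1, end


# The range quoted in this thread: 1..1893 in the BLAST XML.
start, end = to_python_range(1, 1893)

# With half-open coordinates the length is just end - start.
length = end - start

# The same pair can be used directly as a slice.
seq = "N" * 2000  # stand-in sequence, not real data
fragment = seq[start:end]
```

Going the other way, to_one_based_range(start, end) gives back the values as they appear in the BLAST output, for anyone who needs to report them in that form.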
> I had a look at some of the code and I can't see the slicing format > mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be > helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end, > query_start, and query_end so that if people are interested they can have a > look at the files and see what they mean. > > Thanks > Colin OK, so some clarification with examples in the docstrings is needed. How about the Tutorial chapter? Thanks, Peter From chapmanb at 50mail.com Sat Feb 9 09:43:26 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Sat, 09 Feb 2013 09:43:26 -0500 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: <87a9rdy2cx.fsf@fastmail.fm> Colin; >> I'm curious about the benefits of having these values in Python string >> slicing format? I haven't come across this very often, I'm used to seeing >> values systematically zero or one-based. To clarify further, in addition to Peter's response: the 0-based half-open and 1-based closed systems are the two systems you're referring to. Python, and most programming languages, use the 0-based half-open indexing approach, which is what SearchIO converts to. Aaron has a nice response on BioStars which explains the differences in more detail: http://www.biostars.org/p/6373/#6377 Brad From colin.aibn at gmail.com Sat Feb 9 10:18:33 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sun, 10 Feb 2013 01:18:33 +1000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: <87a9rdy2cx.fsf@fastmail.fm> References: <87a9rdy2cx.fsf@fastmail.fm> Message-ID: Interesting commentary from Edsger Dijkstra as well: http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF If possible, I would definitely add some of these links to either the tutorial or the code. Colin On Sun, Feb 10, 2013 at 12:43 AM, Brad Chapman wrote: > > Colin; > > >> I'm curious about the benefits of having these values in Python string > >> slicing format?
I haven't come across this very often, I'm used to > seeing > >> values systematically zero or one-based. > > To clarify further, in addition to Peter's response: the 0-based > half-open and 1-based closed systems are the two systems you're > referring to. Python, and most programming languages, use the 0-based > half-open indexing approach, which is what SearchIO converts to. > Aaron has a nice response on BioStars which explains the differences in > more detail: > > http://www.biostars.org/p/6373/#6377 > > Brad From Markus.Piotrowski at ruhr-uni-bochum.de Sat Feb 9 10:12:12 2013 From: Markus.Piotrowski at ruhr-uni-bochum.de (Markus Piotrowski) Date: 9 Feb 2013 16:12:12 +0100 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the XML result. So query_start and sbjct_start (BTW, not hit_start) return the values from <Hsp_query-from> and <Hsp_hit-from>. Thus, my first guess would be that a search function that can return an entity 'query_start' will return the value that is written in the file. Markus On 2013-02-09 15:30, Peter Cock wrote: > On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer > wrote: >> Hi Peter, >> Thanks for getting back to me so quickly. >> > > Thank you - the main reason for including SearchIO in Biopython 1.61 > as 'experimental code' is to get wider testing and feedback > (hopefully > an approach that will work well and we can use this more in future > for > other new code). > >> I'm curious about the benefits of having these values in Python >> string >> slicing format? I haven't come across this very often, I'm used to >> seeing >> values systematically zero or one-based. > > Once you're used to Python slicing it becomes very natural.
> >> Would it be easier to keep the range variables hit_range and >> hit_range_all >> in slicing format and the start and end variables in sequence >> position >> format so that they represent the actual BLAST results? > > One reason for this is to be consistent across all the formats > supported > in SearchIO, and since Biopython is a Python library following Python > norms seems most natural. > >> I had a look at some of the code and I can't see the slicing format >> mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would >> probably be >> helpful to explain the values in Hsp.py as a ** mark on hsp_start, >> hsp_end, >> query_start, and query_end so that if people are interested they can >> have a >> look at the files and see what they mean. >> >> Thanks >> Colin > > OK, so some clarification with examples in the docstrings is needed. > How about the Tutorial chapter? > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From colin.aibn at gmail.com Sat Feb 9 10:19:26 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sun, 10 Feb 2013 01:19:26 +1000 Subject: [Biopython-dev] Fwd: SearchIO HSP indexing In-Reply-To: References: Message-ID: > Hi Peter, > > Thanks for getting back to me so quickly. > > > > Thank you - the main reason for including SearchIO in Biopython 1.61 > as 'experimental code' is to get wider testing and feedback (hopefully > an approach that will work well and we can use this more in future for > other new code). > > I've been using it for a couple months now and i definitely prefer it over the existing parser. > > I'm curious about the benefits of having these values in Python string > > slicing format? I haven't come across this very often, I'm used to seeing > > values systematically zero or one-based. > > Once you're used to Python slicing it becomes very natural. 
> > > Would it be easier to keep the range variables hit_range and hit_range_all > > in slicing format and the start and end variables in sequence position > > format so that they represent the actual BLAST results? > > One reason for this is to be consistent across all the formats supported > in SearchIO, and since Biopython is a Python library following Python > norms seems most natural. > > > I had a look at some of the code and I can't see the slicing format > > mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably > be > > helpful to explain the values in Hsp.py as a ** mark on hsp_start, > hsp_end, > > query_start, and query_end so that if people are interested they can > have a > > look at the files and see what they mean. > > > > Thanks > > Colin > > OK, so some clarification with examples in the docstrings is needed. > How about the Tutorial chapter? > > I would definitely add comments to the Hsp.py file and if there is a tutorial that people use, I would also update that as that would be the first place most people would look. I was wondering if there was any code in SearchIO to align high-scoring segment pairs against the same hit? I see the fragmentation code but that seems specific to BLAT results and when I look at the HSPFragments in the QueryResult object it does not seem to combine multiple HSPs against the same hit even if they are not overlapping. Thanks Colin From p.j.a.cock at googlemail.com Sat Feb 9 10:36:34 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 15:36:34 +0000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: > Am 2013-02-09 15:30, schrieb Peter Cock: >> One reason for this is to be consistent across all the formats supported >> in SearchIO, and since Biopython is a Python library following Python >> norms seems most natural. 
On Sat, Feb 9, 2013 at 3:12 PM, Markus Piotrowski wrote: > Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the xml > result. Yes, the old Bio.Blast parsers do not try to convert the co-ordinates. Given that they were only handling BLAST output, that was a justifiable option. With Bio.SearchIO we're not just modelling BLAST output though - it covers multiple formats with different conventions. Peter From w.arindrarto at gmail.com Sat Feb 9 11:56:46 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 9 Feb 2013 17:56:46 +0100 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: Hi everyone, Colin, thanks for the feedback! Peter has explained the rationale behind the decision, so I would just like to add that there is indeed an explanation of this behavior in the tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and the code (https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100). I do admit that the explanation in the code could be made clearer with some comments in hsp.py ~ which I can add :). As for your point about the alignment code: > I was wondering if there was any code in SearchIO to align high-scoring > segment pairs against the same hit? I see the fragmentation code but that > seems specific to BLAT results and when I look at the HSPFragments in the > QueryResult object it does not seem to combine multiple HSPs against the > same hit even if they are not overlapping. SearchIO relies on BLAST to do this ~ BLAST has already grouped the HSPs aligning to the same database sequence into one group (all of which are accessible through the Hit object). I've always assumed that if two HSPs came from the same database entry (Hit), they are grouped into one Hit by BLAST, regardless of whether they overlap or not. Have you seen any results from BLAST that show otherwise?
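As a side note on the grouping above: with the half-open coordinates, checking whether two HSPs in the same Hit overlap is a simple comparison. The following is only a rough sketch in plain Python, not SearchIO code. The (start, end) tuples stand in for the hsp.query_start and hsp.query_end values, the helper name is hypothetical, and the second and third ranges are made up for illustration.

```python
def ranges_overlap(a, b):
    """Return True if two 0-based half-open (start, end) ranges overlap."""
    # Half-open ranges overlap only if each one starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


# Stand-ins for (hsp.query_start, hsp.query_end) of HSPs in one Hit.
hsp1_query = (0, 1893)     # the 1..1893 range from this thread
hsp2_query = (330, 881)    # assumed values, for illustration only
next_region = (1893, 2000) # assumed: starts exactly where hsp1 ends

overlapping = ranges_overlap(hsp1_query, hsp2_query)
touching = ranges_overlap(hsp1_query, next_region)
```

Because the end position is exclusive, a range that starts exactly at another range's end (as next_region does here) is adjacent rather than overlapping, so no +1 or -1 adjustment is needed in the comparison.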
cheers, Bow From arklenna at gmail.com Sat Feb 9 12:14:01 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Sat, 9 Feb 2013 12:14:01 -0500 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 8, 2013 at 8:42 PM, Michiel de Hoon wrote: > Hi Lenna, > > > --- On *Thu, 2/7/13, Lenna Peterson * wrote: > > If there are well-defined problems with the PLY parser, I can work on > > fixing them. I am not currently working with mmCIF so I am not in the > > best position to evaluate where and how the parser needs to be improved. > > I don't know of any problems with the PLY parser, but since it relies on > PLY, it would add another dependency to Biopython. > > On the other hand, a pure-Python solution may be preferable, as it's > easier to maintain and runs with Jython. > As far as I can tell, PLY works with Jython, discussion on this thread: http://permalink.gmane.org/gmane.comp.python.ply/402 Not sure about pypy. One option would be to deploy the PLY parser for non-CPython platforms and tell them to manually install PLY if they want to use mmCIF. Not ideal, but is that preferred to an explicit dependency? > > I see three options then: > 1) Remove the lex stuff from lex.yy.c, and optionally convert the > remaining C code to Python. > As is, the C compiles cross platform with no dependencies. There is nothing but lex stuff in lex.yy.c - I'm not quite sure what you mean here. > 2) Remove the PLY dependency from the PLY-based parser. > 3) Write a new pure-Python parser from scratch. > > I'm not sure whether there is an appreciable difference between options 2 and 3. 
Cheers, Lenna From mjldehoon at yahoo.com Sat Feb 9 22:55:37 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 9 Feb 2013 19:55:37 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi Lenna, >--- On Sat, 2/9/13, Lenna Peterson wrote: > > 1) Remove the lex stuff from lex.yy.c, and optionally convert the remaining > > C code to Python. > As is, the C compiles cross platform with no dependencies. There is nothing > but lex stuff in lex.yy.c - I'm not quite sure what you mean here. Currently lex.yy.c contains lots of code that is generated automatically by lex but is not actually needed for the mmCIF parser. I was thinking to remove those parts, and to clean up the remainder so that the code is understandable (allowing us to fix any bugs, or to convert it to pure Python). Best, -Michiel From colin.aibn at gmail.com Sun Feb 10 02:28:36 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sun, 10 Feb 2013 17:28:36 +1000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: On Sun, Feb 10, 2013 at 2:56 AM, Wibowo Arindrarto wrote: > Hi everyone, > > Colin, thanks for the feedback! Peter has explained the rationale > behind the decision, so I would like to add that there has been indeed > an explanation of this behavior in the tutorial > (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and > the code ( > https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100 > ). > I do admit that the explanation in the code could be made clearer with > some comments in hsp.py ~ which I can add :). > > As for your point about the alignment code: > > > I was wondering if there was any code in SearchIO to align high-scoring > > segment pairs against the same hit?
I see the fragmentation code but that > > seems specific to BLAT results and when I look at the HSPFragments in the > > QueryResult object it does not seem to combine multiple HSPs against the > > same hit even if they are not overlapping. > > SearchIO relies on BLAST to do this ~ which has already grouped each > HSP aligning to the same database sequence in one group (all of which > is accessible through the Hit object). I've always assumed that if two > HSPs came from the same database entry (Hit), they are grouped into > one Hit by BLAST, regardless of whether they overlap or not. Have you > seen any results from BLAST that shows otherwise? > > I have a couple of examples where BLAST doesn't combine the HSPs as you would expect. It seems to mainly occur because the HSP alignments overlap and to combine them would mean including more gaps in each hsp. For example, *ftsK* in *E. coli* (ftsK.blast) or *aceF* in *E. coli* (aceF.blast). In the second case, the first HSP spans the entire query and there are two additional HSPs that are overlapped by it. I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the HSPs somewhat when required but some people are hesitant to use their method in certain situations (e.g., with tblastn results that overestimate some of the metrics). They also implement additional functionality so that the user could do a complete smith-waterman alignment if they wanted to. Thanks Colin -------------- next part -------------- A non-text attachment was scrubbed... Name: aceF.blast Type: application/octet-stream Size: 12124 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ftsK.blast Type: application/octet-stream Size: 18537 bytes Desc: not available URL: From w.arindrarto at gmail.com Sun Feb 10 10:31:51 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sun, 10 Feb 2013 16:31:51 +0100 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: Hi Colin, >> As for your point about the alignment code: >> >> > I was wondering if there was any code in SearchIO to align high-scoring >> > segment pairs against the same hit? I see the fragmentation code but >> > that >> > seems specific to BLAT results and when I look at the HSPFragments in >> > the >> > QueryResult object it does not seem to combine multiple HSPs against the >> > same hit even if they are not overlapping. >> >> SearchIO relies on BLAST to do this ~ which has already grouped each >> HSP aligning to the same database sequence in one group (all of which >> is accessible through the Hit object). I've always assumed that if two >> HSPs came from the same database entry (Hit), they are grouped into >> one Hit by BLAST, regardless of whether they overlap or not. Have you >> seen any results from BLAST that shows otherwise? >> > > I have a couple of examples where BLAST doesn't combine the HSPs as you > would expect. It seems to mainly occur because the HSP alignments overlap > and to combine them would mean including more gaps in each hsp. For example, > ftsK in E. coli (ftsK.blast) or aceF in E. coli (aceF.blast). In the second > case, the first HSP spans the entire query and there are two additional HSPs > that are overlapped by it. > > I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the > HSPs somewhat when required but some people are hesitant to use their method > in certain situations (e.g., with tblastn results that overestimate some of > the metrics). 
They also implement additional functionality so that the user > could do a complete smith-waterman alignment if they wanted to. Thanks for including the files! At the moment, no, SearchIO doesn't have any code to 'assemble'/'tile' overlapping HSPs. The 'fragment' you're seeing in the BLAT parser is simply the name we use to refer to noncontiguous blocks inside a reported HSP. We may be able to add some functions to return the intervals for such overlapping HSPs, given a Hit object. But I'm a bit hesitant to go further than that (i.e. to the point where we merge the statistics of each HSP to assign to the assembled HSP). This is mostly because such assembly seems very specific to the program's statistics and format (BLAST's merge would be different from BLAT's, and BLAST XML's merge may be different from tabular BLAST). If anything, perhaps these functions deserve their own space in SearchUtils (taking parallels from Bio.SeqIO and Bio.SeqUtils)? regards, Bow From redmine at redmine.open-bio.org Sun Feb 10 17:13:20 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 10 Feb 2013 22:13:20 +0000 Subject: [Biopython-dev] [Biopython - Bug #3412] (New) Bad URL in docs for module NCBIWWW.qblast Message-ID: Issue #3412 has been reported by Vincent Davis. ---------------------------------------- Bug #3412: Bad URL in docs for module NCBIWWW.qblast https://redmine.open-bio.org/issues/3412 Author: Vincent Davis Status: New Priority: Low Assignee: Biopython Dev Mailing List Category: Documentation Target version: URL: At the bottom of "help(NCBIWWW.qblast)" is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html This link is not valid. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Sun Feb 10 17:40:21 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 10 Feb 2013 22:40:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3412] (Resolved) Bad URL in docs for module NCBIWWW.qblast References: Message-ID: Issue #3412 has been updated by Peter Cock. Status changed from New to Resolved % Done changed from 0 to 100 The NCBI seem to have broken that link, and if they did setup a redirect for a while it has stopped now. I'll use http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html instead I think, https://github.com/biopython/biopython/commit/ae84cc8cb828e868883c75a980fcd83585c338f8 Thanks!
---------------------------------------- Bug #3412: Bad URL in docs for module NCBIWWW.qblast https://redmine.open-bio.org/issues/3412 Author: Vincent Davis Status: Resolved Priority: Low Assignee: Biopython Dev Mailing List Category: Documentation Target version: URL: At the bottom of "help(help(NCBIWWW.qblast) is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html This link is not valid. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From eric.talevich at gmail.com Sun Feb 10 21:11:54 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 10 Feb 2013 21:11:54 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo Message-ID: Hi Ben, I've noticed a couple new characteristics of the Newick parser that I had questions about. 1. There is no longer a way to tell the parser to treat internal node labels as confidence values. Lots of files in the wild do record the support values here, including those generated by RAxML, PhyML, FastTree and MrBayes, so I'd like to restore this option, and perhaps make it the default. I think the condition is: if not (self.values_are_confidence or self.comments_are_confidence or current_clade.is_terminal()): # parse confidence from node label Is there an easy way to add this option to the parser? I'm trying to get this to work in the "else" clause in parse_tree, where unquoted node labels are handled. 2. Confidence values are required to be between 0.0 and 1.0. Also, support values recorded as integers are treated as percentages and divided by 100 automatically. The phyloXML spec doesn't have this range requirement. RAxML scales bootstraps to 100, but PhyML records the raw number of supporting bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap replicates). 
So, I'd prefer to leave the confidence values as they are, requiring only that they be numeric. Thoughts? Thanks, Eric From ben at bendmorris.com Sun Feb 10 21:39:24 2013 From: ben at bendmorris.com (Ben Morris) Date: Sun, 10 Feb 2013 21:39:24 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich wrote: > Hi Ben, > > I've noticed a couple new characteristics of the Newick parser that I had > questions about. > > 1. There is no longer a way to tell the parser to treat internal node labels > as confidence values. Lots of files in the wild do record the support values > here, including those generated by RAxML, PhyML, FastTree and MrBayes, so > I'd like to restore this option, and perhaps make it the default. I think > the condition is: > > if not (self.values_are_confidence or self.comments_are_confidence or > current_clade.is_terminal()): # parse confidence from node label > > Is there an easy way to add this option to the parser? I'm trying to get > this to work in the "else" clause in parse_tree, where unquoted node labels > are handled. > > > 2. Confidence values are required to be between 0.0 and 1.0. Also, support > values recorded as integers are treated as percentages and divided by 100 > automatically. The phyloXML spec doesn't have this range requirement. RAxML > scales bootstraps to 100, but PhyML records the raw number of supporting > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap > replicates). So, I'd prefer to leave the confidence values as they are, > requiring only that they be numeric. Thoughts? > > > Thanks, > Eric 1. One issue is that current_clade.is_terminal() will always be true at that point because current_clade's children haven't been parsed yet. Putting the check in the "process_clade" function (which is called when the closing paren is hit, and therefore all children should have been parsed) should fix this. 
So, if values_are_confidence and comments_are_confidence are both false and a node label is numeric, it should be treated as confidence, and clade.name should be set to None - is that correct? 2. This should be as simple as removing current lines 123-127. ~Ben From eric.talevich at gmail.com Sun Feb 10 22:30:47 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 10 Feb 2013 22:30:47 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris wrote: > On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich > wrote: > > Hi Ben, > > > > I've noticed a couple new characteristics of the Newick parser that I had > > questions about. > > > > 1. There is no longer a way to tell the parser to treat internal node > labels > > as confidence values. Lots of files in the wild do record the support > values > > here, including those generated by RAxML, PhyML, FastTree and MrBayes, so > > I'd like to restore this option, and perhaps make it the default. I think > > the condition is: > > > > if not (self.values_are_confidence or self.comments_are_confidence or > > current_clade.is_terminal()): # parse confidence from node label > > > > Is there an easy way to add this option to the parser? I'm trying to get > > this to work in the "else" clause in parse_tree, where unquoted node > labels > > are handled. > > > > > > 2. Confidence values are required to be between 0.0 and 1.0. Also, > support > > values recorded as integers are treated as percentages and divided by 100 > > automatically. The phyloXML spec doesn't have this range requirement. > RAxML > > scales bootstraps to 100, but PhyML records the raw number of supporting > > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap > > replicates). So, I'd prefer to leave the confidence values as they are, > > requiring only that they be numeric. Thoughts? > > > > > > Thanks, > > Eric > > 1. 
One issue is that current_clade.is_terminal() will always be true > at that point because current_clade's children haven't been parsed > yet. Putting the check in the "process_clade" function (which is > called when the closing paren is hit, and therefore all children > should have been parsed) should fix this. > > So, if values_are_confidence and comments_are_confidence are both > false and a node label is numeric, it should be treated as confidence, > and clade.name should be set to None - is that correct? > > 2. This should be as simple as removing current lines 123-127. > > ~Ben > Thanks. Here's #2: https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a I agree with your assessment of #1, but haven't been able to get it working yet. I'm leaving Bug #3407 open for now: https://redmine.open-bio.org/issues/3407 From ben at bendmorris.com Sun Feb 10 23:04:45 2013 From: ben at bendmorris.com (Ben Morris) Date: Sun, 10 Feb 2013 23:04:45 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich wrote: > On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris wrote: >> >> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich >> wrote: >> > Hi Ben, >> > >> > I've noticed a couple new characteristics of the Newick parser that I >> > had >> > questions about. >> > >> > 1. There is no longer a way to tell the parser to treat internal node >> > labels >> > as confidence values. Lots of files in the wild do record the support >> > values >> > here, including those generated by RAxML, PhyML, FastTree and MrBayes, >> > so >> > I'd like to restore this option, and perhaps make it the default. I >> > think >> > the condition is: >> > >> > if not (self.values_are_confidence or self.comments_are_confidence or >> > current_clade.is_terminal()): # parse confidence from node label >> > >> > Is there an easy way to add this option to the parser? 
I'm trying to get >> > this to work in the "else" clause in parse_tree, where unquoted node >> > labels >> > are handled. >> > >> > >> > 2. Confidence values are required to be between 0.0 and 1.0. Also, >> > support >> > values recorded as integers are treated as percentages and divided by >> > 100 >> > automatically. The phyloXML spec doesn't have this range requirement. >> > RAxML >> > scales bootstraps to 100, but PhyML records the raw number of supporting >> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap >> > replicates). So, I'd prefer to leave the confidence values as they are, >> > requiring only that they be numeric. Thoughts? >> > >> > >> > Thanks, >> > Eric >> >> 1. One issue is that current_clade.is_terminal() will always be true >> at that point because current_clade's children haven't been parsed >> yet. Putting the check in the "process_clade" function (which is >> called when the closing paren is hit, and therefore all children >> should have been parsed) should fix this. >> >> So, if values_are_confidence and comments_are_confidence are both >> false and a node label is numeric, it should be treated as confidence, >> and clade.name should be set to None - is that correct? >> >> 2. This should be as simple as removing current lines 123-127. >> >> ~Ben > > > > Thanks. Here's #2: > https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a > > I agree with your assessment of #1, but haven't been able to get it working > yet. I'm leaving Bug #3407 open for now: > https://redmine.open-bio.org/issues/3407 > I think this should do it: https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63 I also updated the test case to make sure this is working correctly and changed the default value of comments_are_confidences from True to False. If that looks correct, feel free to pull. 
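For anyone following along without opening the commit, the behaviour being discussed can be sketched roughly as below. This is a simplified stand-alone illustration and not the code from the commit or from Bio.Phylo: the Clade class here is a tiny stand-in, and label_to_confidence is a hypothetical helper name. The one point it relies on from this thread is that the check only makes sense once a clade's children have been attached, since before that every clade looks terminal.

```python
class Clade:
    """Tiny stand-in for a tree node; not Bio.Phylo's Clade class."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.confidence = None
        self.clades = clades or []

    def is_terminal(self):
        return not self.clades


def label_to_confidence(clade, values_are_confidence=False,
                        comments_are_confidence=False):
    """Move a numeric internal-node label into clade.confidence.

    Call this only after the clade's children are known.
    """
    if values_are_confidence or comments_are_confidence:
        return  # confidence comes from elsewhere; leave labels alone
    if clade.is_terminal() or clade.name is None:
        return  # leaf labels are names, even if they look numeric
    try:
        clade.confidence = float(clade.name)
    except ValueError:
        return  # non-numeric label: keep it as the name
    clade.name = None


leaf_a, leaf_b = Clade("A"), Clade("B")
inner = Clade("95", [leaf_a, leaf_b])            # numeric internal label
named = Clade("root", [Clade("C"), Clade("7")])  # text label, numeric leaf
for node in (leaf_a, inner, named, named.clades[1]):
    label_to_confidence(node)
```

Note that the value is stored as given (95.0 here) with no rescaling, in line with leaving confidence values as they are and requiring only that they be numeric.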
~Ben From eric.talevich at gmail.com Sun Feb 10 23:20:20 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 10 Feb 2013 23:20:20 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 11:04 PM, Ben Morris wrote: > On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich > wrote: > > On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris wrote: > >> > >> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich > > >> wrote: > >> > Hi Ben, > >> > > >> > I've noticed a couple new characteristics of the Newick parser that I > >> > had > >> > questions about. > >> > > >> > 1. There is no longer a way to tell the parser to treat internal node > >> > labels > >> > as confidence values. Lots of files in the wild do record the support > >> > values > >> > here, including those generated by RAxML, PhyML, FastTree and MrBayes, > >> > so > >> > I'd like to restore this option, and perhaps make it the default. I > >> > think > >> > the condition is: > >> > > >> > if not (self.values_are_confidence or self.comments_are_confidence or > >> > current_clade.is_terminal()): # parse confidence from node label > >> > > >> > Is there an easy way to add this option to the parser? I'm trying to > get > >> > this to work in the "else" clause in parse_tree, where unquoted node > >> > labels > >> > are handled. > >> > > >> > > >> > 2. Confidence values are required to be between 0.0 and 1.0. Also, > >> > support > >> > values recorded as integers are treated as percentages and divided by > >> > 100 > >> > automatically. The phyloXML spec doesn't have this range requirement. > >> > RAxML > >> > scales bootstraps to 100, but PhyML records the raw number of > supporting > >> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap > >> > replicates). So, I'd prefer to leave the confidence values as they > are, > >> > requiring only that they be numeric. Thoughts? > >> > > >> > > >> > Thanks, > >> > Eric > >> > >> 1. 
One issue is that current_clade.is_terminal() will always be true > >> at that point because current_clade's children haven't been parsed > >> yet. Putting the check in the "process_clade" function (which is > >> called when the closing paren is hit, and therefore all children > >> should have been parsed) should fix this. > >> > >> So, if values_are_confidence and comments_are_confidence are both > >> false and a node label is numeric, it should be treated as confidence, > >> and clade.name should be set to None - is that correct? > >> > >> 2. This should be as simple as removing current lines 123-127. > >> > >> ~Ben > > > > > > > > Thanks. Here's #2: > > > https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a > > > > I agree with your assessment of #1, but haven't been able to get it > working > > yet. I'm leaving Bug #3407 open for now: > > https://redmine.open-bio.org/issues/3407 > > > > I think this should do it: > > > https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63 > > I also updated the test case to make sure this is working correctly > and changed the default value of comments_are_confidences from True to > False. > > If that looks correct, feel free to pull. > > ~Ben > Works for me, thanks! I cherry-picked it here: https://github.com/biopython/biopython/commit/f382f550f49f73301663ad949a6c1e40f5d71c0c From p.j.a.cock at googlemail.com Mon Feb 11 06:46:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 11 Feb 2013 11:46:20 +0000 Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support? In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 6:55 PM, Peter Cock wrote: > > My only significant concern is for Jython users, since this will also > mean dropping support for Jython 2.5 (which implements the > Python 2.5 language). The replacement Jython 2.7 is still only > at the alpha release stage. 
Good news for Jython fans: although originally expected last year, they have now released a beta of Jython 2.7 (which supports the same language features as CPython 2.7): http://fwierzbicki.blogspot.co.uk/2013/02/jython-27-beta1-released.html Hopefully the Biopython unit tests will all be fine under this... and if so that is good news for phasing out support of Python 2.5. Regards, Peter From tiagoantao at gmail.com Mon Feb 11 06:50:10 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 11 Feb 2013 11:50:10 +0000 Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support? In-Reply-To: References: Message-ID: On Mon, Feb 11, 2013 at 11:46 AM, Peter Cock wrote: > Good news for Jython fans, although originally expected last year, > they have now released a beta of Jython 2.7 (which supports the > same language features as C Python 2.7): I am going to set up buildbot for this now. I will set up my slave first. If you have any slaves that you want to add this to, please tell me. Tiago -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From saketkc at gmail.com Tue Feb 12 04:51:54 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Tue, 12 Feb 2013 15:21:54 +0530 Subject: [Biopython-dev] BWA Wrapper Message-ID: Hi, I am writing a bwa wrapper for Biopython. I have in fact got the "index" option working. However, I have a concern. bwa has these commands: bwa index -a bwtsw database.fasta bwa aln database.fasta short_read.fastq > aln_sa.sai bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam bwa bwasw database.fasta long_read.fastq > aln.sam If you read the documentation here (http://bio-bwa.sourceforge.net/bwa.shtml), you will see that "-r" is an option of the "aln" command as well as of the "samse" command. In the former it is of type INT and in the latter of type STR. Now I am not sure how this can be taken care of in the wrapper, because I also plan to implement a checker_function.
One way is to make a new class, lets say BwaAlignCommand which will take care of all options inside the "aln" command and separately implement another class say "BwaSamseCommand", and implement all the options of the "samse" command. But I am not sure if that is indeed the correct way of addressing the problem. Any pointers on this issue ? Thanks Saket Choudhary Undergraduate Student IIT Bombay,India From p.j.a.cock at googlemail.com Tue Feb 12 12:38:46 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 12 Feb 2013 17:38:46 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary wrote: > Hi, > > I am writing a bwa wrapper for bio-python. I have infact got the "index" > option working. However I have a concern: > > bwa has these options : > > bwa index -a bwtsw database.fasta > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam > > bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam > > bwa bwasw database.fasta long_read.fastq > aln.sam > > > If you read the documentation here, > you will see that "-r" is an option with "aln" command as well as the > "samse" command. In the former it is of type INT and in the latter of type > STR. Now I am not sure how can this be taken care of in the wrapper, > because I also plan to implement a checker_function. One way is to make a > new class, lets say BwaAlignCommand which will take care of all options > inside the "aln" command and separately implement another class say > "BwaSamseCommand", and implement all the options of the "samse" command. > But I am not sure if that is indeed the correct way of addressing the > problem. > > > Any pointers on this issue ? I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and write a wrapper class for each of them. This would probably fit under the Bio.Sequencing.Applications namespace. 
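A minimal standalone sketch of the one-class-per-subcommand idea, so that the same flag can carry a different type check in each tool. This does not use the Bio.Application API; the class names, the option_checkers table and the error messages are hypothetical, and the types given for "-r" simply follow the description in the original message.

```python
class _BwaSubcommandSketch(object):
    """Hypothetical base: builds 'bwa <subcommand> [options] <files>'."""

    subcommand = None
    # Maps an option name to a checker function returning True if valid.
    option_checkers = {}

    def __init__(self, *positional, **options):
        for flag, value in options.items():
            checker = self.option_checkers.get(flag)
            if checker is None:
                raise ValueError("Unknown option -%s for bwa %s"
                                 % (flag, self.subcommand))
            if not checker(value):
                raise ValueError("Invalid value %r for -%s in bwa %s"
                                 % (value, flag, self.subcommand))
        self.positional = positional
        self.options = options

    def __str__(self):
        parts = ["bwa", self.subcommand]
        for flag in sorted(self.options):
            parts.extend(["-" + flag, str(self.options[flag])])
        parts.extend(self.positional)
        return " ".join(parts)


class BwaAlignSketch(_BwaSubcommandSketch):
    subcommand = "aln"
    # Assumption taken from the message: "-r" is an INT for "aln".
    option_checkers = {"r": lambda value: isinstance(value, int)}


class BwaSamseSketch(_BwaSubcommandSketch):
    subcommand = "samse"
    # Assumption taken from the message: "-r" is a STR for "samse".
    option_checkers = {"r": lambda value: isinstance(value, str)}


print(BwaAlignSketch("database.fasta", "short_read.fastq", r=5))
```

Because each class owns its own checker table, the clash over "-r" never arises: the integer check lives only in the "aln" class and the string check only in the "samse" class.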
Peter From p.j.a.cock at googlemail.com Tue Feb 12 12:51:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 12 Feb 2013 17:51:15 +0000 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) Message-ID: Hello all, Google recently confirmed they will be running Google Summer of Code 2013, and we (Biopython and the other Bio* projects) would hope to be accepted again under the Open Bioinformatics Foundation as in previous years: http://lists.open-bio.org/pipermail/gsoc/2013/000196.html It would be great to start coming up with potential project ideas, both larger pieces of work suitable for GSoC but also smaller tasks for other project students, or 'low hanging fruit' for potential contributors to cut their teeth on. See also http://biopython.org/wiki/Active_projects and the ideas list there. Regards, Peter From w.arindrarto at gmail.com Tue Feb 12 13:29:02 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 12 Feb 2013 19:29:02 +0100 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: References: Message-ID: Hi everyone, It's more or less a 'low hanging fruit', but I've been thinking perhaps it may be useful if we have our own interface to the HMMER3 online service? The corresponding SearchIO parsers may be written for this as well (they return different formats for which we haven't any parsers currently). And I think there are more things being worked on, not yet mentioned in the wiki: 1. Porting our docs to Sphinx[1] 2. Converting some/all of the print and compare tests to unit tests. For example, our Bio.Seq's tests are still print and compare tests. 
regards, Bow [1] See the original feature request here: https://redmine.open-bio.org/issues/3221 https://redmine.open-bio.org/issues/3220 https://redmine.open-bio.org/issues/3219 From eric.talevich at gmail.com Tue Feb 12 15:00:11 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 12 Feb 2013 15:00:11 -0500 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: References: Message-ID: On Tue, Feb 12, 2013 at 12:51 PM, Peter Cock wrote: > Hello all, > > Google recently confirmed they will be running Google Summer of Code 2013, > and we (Biopython and the other Bio* projects) would hope to be accepted > again > under the Open Bioinformatics Foundation as in previous years: > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html > > It would be great to start coming up with potential project ideas, both > larger > pieces of work suitable for GSoC but also smaller tasks for other project > students, or 'low hanging fruit' for potential contributors to cut > their teeth on. > One interesting GSoC project would be to implement support for phylogenetic placements. The programs pplacer and EPA (part of RAxML) can place sequence reads from metagenomic samples onto a reference phylogeny: http://matsen.fhcrc.org/pplacer/ http://sysbio.oxfordjournals.org/content/60/3/291 The output format of those programs has been standardized as something I suppose we could call the "jplace" format: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0031009 http://arxiv.org/abs/1201.3397 It's based on JSON and Newick, with a small extension to Newick that shouldn't be too hard to support. The GSoC project would be to implement a parser for this and implement querying as well as integration with the rest of Bio.Phylo to some reasonable extent. I would be available to mentor this. In terms of low-hanging fruit, there are some small but important functions that could be added to Bio.Phylo. 
My top three: Robinson-Foulds distance, majority-rules consensus, and drawing an unrooted tree using Felsenstein's Equal Daylight algorithm (which starts by computing the layout for a radial tree). -Eric From saketkc at gmail.com Tue Feb 12 15:45:46 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Wed, 13 Feb 2013 02:15:46 +0530 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: References: Message-ID: Hi, I was thinking of a synteny viewer along the lines of GSV, if that makes sense. Saket On 12 February 2013 23:21, Peter Cock wrote: > Hello all, > > Google recently confirmed they will be running Google Summer of Code 2013, > and we (Biopython and the other Bio* projects) would hope to be accepted > again > under the Open Bioinformatics Foundation as in previous years: > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html > > It would be great to start coming up with potential project ideas, both > larger > pieces of work suitable for GSoC but also smaller tasks for other project > students, or 'low hanging fruit' for potential contributors to cut > their teeth on. > > See also http://biopython.org/wiki/Active_projects and the ideas list > there. > > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From sefakilic at gmail.com Tue Feb 12 18:18:17 2013 From: sefakilic at gmail.com (Sefa Kılıç) Date: Tue, 12 Feb 2013 18:18:17 -0500 Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence In-Reply-To: References: Message-ID: Hi all, I am working on comparative genomics and I frequently use the Motif module of Biopython. One of the most frequent operations that I do is to build a motif out of sites and search a sequence to find instances that are similar to the motif [Bio.Motif._Motif.search_instances()].
The problem is that the sequence in which instances are searched is huge. Mostly it is the genome sequence itself, with its reverse complement. For example, scanning the E. coli genome + its reverse complement with a motif of length ~20 takes almost a minute on my machine. To make it faster, I implemented a C version of it and a Python interface so that you can call it from Python. It is pretty fast: it takes about 2.5 seconds. The current implementation can be found at: https://github.com/sefakilic/yassi If anyone is interested and it is appropriate, I would like to modify the current implementation and integrate it into Biopython. Thanks! Sefa Kilic From mjldehoon at yahoo.com Tue Feb 12 21:06:33 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 12 Feb 2013 18:06:33 -0800 (PST) Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence In-Reply-To: Message-ID: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi Sefa, Bio.Motif._Motif.search_instances() searches for exact instances of a motif, but it looks like your code searches for motifs based on their PSSM score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or Bio/motifs/_pwm.c)? Best, -Michiel. --- On Tue, 2/12/13, Sefa Kılıç wrote: > From: Sefa Kılıç > Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence > To: biopython-dev at biopython.org > Date: Tuesday, February 12, 2013, 6:18 PM > Hi all, > > I am working on comparative genomics and I frequently use > Motif module of > Biopython. One of the most frequent operations that I do is > to build a > motif out of sites and search a sequence to find instances > that are similar > to the motif [Bio.Motif._Motif.search_instances()]. > > The problem is that the sequence that instances are searched > is huge. > Mostly it is the genome sequence itself, with its reverse > complement.
For > example, scanning the E.coli genome + its reverse complement > with a motif > of length ~20 takes almost a minute in my machine. > > To make it faster, I implemented a C version of it and a > Python interface > so that you can call it from Python. It is pretty fast, it > takes about ~2.5 > seconds. > > Current implementation can be found at: > > https://github.com/sefakilic/yassi > > If anyone is interested and it is appropriate, I would like > to modify the > current implementation and integrate it into Biopython. > > Thanks! > > Sefa Kilic > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mjldehoon at yahoo.com Tue Feb 12 21:08:26 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 12 Feb 2013 18:08:26 -0800 (PST) Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: Message-ID: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com> It would be great to have better support for microarray analysis in Biopython. Something like lumi/limma in R. Perhaps this is an option for the GSoC? Best, -Michiel. --- On Tue, 2/12/13, Peter Cock wrote: > From: Peter Cock > Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) > To: "Biopython-Dev Mailing List" > Date: Tuesday, February 12, 2013, 12:51 PM > Hello all, > > Google recently confirmed they will be running Google Summer > of Code 2013, > and we (Biopython and the other Bio* projects) would hope to > be accepted again > under the Open Bioinformatics Foundation as in previous > years: > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html > > It would be great to start coming up with potential project > ideas, both larger > pieces of work suitable for GSoC but also smaller tasks for > other project > students, or 'low hanging fruit' for potential contributors > to cut > their teeth on. 
> > See also http://biopython.org/wiki/Active_projects > and the ideas list there. > > Regards, > > Peter > From sefakilic at gmail.com Tue Feb 12 21:40:12 2013 From: sefakilic at gmail.com (Sefa Kılıç) Date: Tue, 12 Feb 2013 21:40:12 -0500 Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence In-Reply-To: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi Michiel, Thanks for the reply. It seems that _pwm.c does the same thing, as you said. I missed that part of the code. However, it seems that it is not mentioned in the tutorial and it might be useful to mention it there. Anyway, re-implementing it was good practice. Thank you! Sefa Kilic On Tue, Feb 12, 2013 at 9:06 PM, Michiel de Hoon wrote: > Hi Sefa, > > Bio.Motif._Motif.search_instances() searches for exact instances of a > motif, but it looks like your code searches for motifs based on its PSSM > score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or > Bio/motifs/_pwm.c)? > > Best, > -Michiel. > > --- On Tue, 2/12/13, Sefa Kılıç wrote: > > > From: Sefa Kılıç > > Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence > > To: biopython-dev at biopython.org > > Date: Tuesday, February 12, 2013, 6:18 PM > > Hi all, > > > > I am working on comparative genomics and I frequently use > > Motif module of > > Biopython. One of the most frequent operations that I do is > > to build a > > motif out of sites and search a sequence to find instances > > that are similar > > to the motif [Bio.Motif._Motif.search_instances()]. > > > > The problem is that the sequence that instances are searched > > is huge.
> > Mostly it is the genome sequence itself, with its reverse > > complement. For > > example, scanning the E.coli genome + its reverse complement > > with a motif > > of length ~20 takes almost a minute in my machine. > > > > To make it faster, I implemented a C version of it and a > > Python interface > > so that you can call it from Python. It is pretty fast, it > > takes about ~2.5 > > seconds. > > > > Current implementation can be found at: > > > > https://github.com/sefakilic/yassi > > > > If anyone is interested and it is appropriate, I would like > > to modify the > > current implementation and integrate it into Biopython. > > > > Thanks! > > > > Sefa Kilic > > From saketkc at gmail.com Thu Feb 14 11:02:21 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Thu, 14 Feb 2013 21:32:21 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: There's one more issue that I have run into. Consider the following command, where the generated output is written by redirecting it to a file called aln_sa.sai: bwa aln database.fasta short_read.fastq > aln_sa.sai Now if we look into the __call__ method here, it takes a boolean as its input for stdout. So should I modify this so that it can take 'stdout' as an opened file instance, which I can pass when invoking my BwaAlnCommandLine wrapper as follows: a = BwaAlnCommandLine() b = a(stdout=open("aln_sa.sai", "wb")) On 12 February 2013 23:08, Peter Cock wrote: > On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary > wrote: > > Hi, > > > > I am writing a bwa wrapper for bio-python. I have infact got the "index" > > option working.
However I have a concern: > > > > bwa has these options : > > > > bwa index -a bwtsw database.fasta > > > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > > > bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam > > > > bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > > aln.sam > > > > bwa bwasw database.fasta long_read.fastq > aln.sam > > > > > > If you read the documentation here< > http://bio-bwa.sourceforge.net/bwa.shtml>, > > you will see that "-r" is an option with "aln" command as well as the > > "samse" command. In the former it is of type INT and in the latter of > type > > STR. Now I am not sure how can this be taken care of in the wrapper, > > because I also plan to implement a checker_function. One way is to make > a > > new class, lets say BwaAlignCommand which will take care of all options > > inside the "aln" command and separately implement another class say > > "BwaSamseCommand", and implement all the options of the "samse" command. > > But I am not sure if that is indeed the correct way of addressing the > > problem. > > > > > > Any pointers on this issue ? > > I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and > write a wrapper class for each of them. This would probably fit under the > Bio.Sequencing.Applications namespace. > > Peter > From p.j.a.cock at googlemail.com Thu Feb 14 11:19:59 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Feb 2013 16:19:59 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary wrote: > Theres one more issue that I have run into . Consider the following command > , the outout generated is written by piping it to a file called aln_sa.sai, > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > Now if we look into the _call method here , it takes as its inout a boolean > for stdout. 
So should I modify this so that it can take 'stdout' as on > opened file instance which I can invoke while unvoking my BwaAlnCommandLine > functions as follwos: > > a=BwaAlnCommandLine() > b=a(stdout=open("aln_sa.sai","wb")) Is that possible? For complex use of subprocess and pipes, we've previously recommended that the user handle this explicitly themselves: just use str() on the command line wrapper object to get 'bwa aln database.fasta short_read.fastq' in this case. There are some examples in the Tutorial with (multiple sequence) alignment tools. Peter From saketkc at gmail.com Thu Feb 14 12:04:04 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Thu, 14 Feb 2013 22:34:04 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: I was thinking of adding one more parameter to the __call__ function, let's say 'stdout_to_filepath'. If this is set, then I add one more if condition here to set stdout as stdout_arg = open(stdout_to_filepath, "w") I tried it and it did work, but I am not sure whether this approach can be incorporated in the Biopython codebase? Thanks Saket On 14 February 2013 21:49, Peter Cock wrote: > On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary > wrote: > > Theres one more issue that I have run into . Consider the following > command > > , the outout generated is written by piping it to a file called > aln_sa.sai, > > > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > > > Now if we look into the _call method here , it takes as its inout a > boolean > > for stdout. So should I modify this so that it can take 'stdout' as on > > opened file instance which I can invoke while unvoking my > BwaAlnCommandLine > > functions as follwos: > > > > a=BwaAlnCommandLine() > > b=a(stdout=open("aln_sa.sai","wb")) > > Is that possible?
> > For complex use of subprocess and pipes, we've previously recommend > the user handle this explicitly themselves, just use str() on the command > line wrapper object to get 'bwa aln database.fasta short_read.fastq' in > this > case. There are some examples in the Tutorial with (multiple sequence) > alignment tools. > > Peter > From saketkc at gmail.com Thu Feb 14 13:52:31 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Fri, 15 Feb 2013 00:22:31 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: In short , am I allowed to play with this extra parameter thing as per the code standards of biopython ? On 14 February 2013 22:34, Saket Choudhary wrote: > I was thinking of adding one more parameter to the _call function, lets say > 'stdout_to_filepath'. > If this is set then I add one more if condition here to set the stdout as > > > stdout_arg = open(stdout_to_filepath, "w") > > I tried it and it did work, but I am not sure if it this standard can be > incorporated in the biopython codebase ? > > Thanks > > Saket > > > On 14 February 2013 21:49, Peter Cock wrote: >> >> On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary >> wrote: >> > Theres one more issue that I have run into . Consider the following >> > command >> > , the outout generated is written by piping it to a file called >> > aln_sa.sai, >> > >> > bwa aln database.fasta short_read.fastq > aln_sa.sai >> > >> > Now if we look into the _call method here , it takes as its inout a >> > boolean >> > for stdout. So should I modify this so that it can take 'stdout' as on >> > opened file instance which I can invoke while unvoking my >> > BwaAlnCommandLine >> > functions as follwos: >> > >> > a=BwaAlnCommandLine() >> > b=a(stdout=open("aln_sa.sai","wb")) >> >> Is that possible? 
>> >> For complex use of subprocess and pipes, we've previously recommend >> the user handle this explicitly themselves, just use str() on the command >> line wrapper object to get 'bwa aln database.fasta short_read.fastq' in >> this >> case. There are some examples in the Tutorial with (multiple sequence) >> alignment tools. >> >> Peter > > From arklenna at gmail.com Thu Feb 14 14:43:18 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 14 Feb 2013 14:43:18 -0500 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: > > On 14 February 2013 22:34, Saket Choudhary wrote: > > I was thinking of adding one more parameter to the _call function, lets > say > > 'stdout_to_filepath'. > > If this is set then I add one more if condition here to set the stdout > as > > > > > > stdout_arg = open(stdout_to_filepath, "w") > > > > > What's wrong with accepting the stdout string that the current implementation provides and explicitly writing it to your file? Cheers, Lenna From arklenna at gmail.com Thu Feb 14 14:52:54 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 14 Feb 2013 14:52:54 -0500 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Sat, Feb 9, 2013 at 10:55 PM, Michiel de Hoon wrote: > > > Currently lex.yy.c contains lots of code that is generated automatically > by lex but is not actually needed for the mmCIF parser. I was thinking to > remove those parts, and to clean up the remainder so that the code is > understandable (allowing us to fix any bugs, or to convert it to pure > Python). > Whoops, failed to reply all. Sorry for the double email, Michiel. --- But generated C is by definition not understandable or debuggable. The only function of lex.yy.c is to tokenize the mmCIF input. 
All of the communication to Python is handled by MMCIFlexmodule.c, which is 70 lines and a header with 3 statements. In parallel to the PLY version, I rewrote the C to be object-oriented, which pushed it to 101 lines. Cheers, Lenna From p.j.a.cock at googlemail.com Thu Feb 14 15:33:37 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Feb 2013 20:33:37 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Thu, Feb 14, 2013 at 7:43 PM, Lenna Peterson wrote: > > What's wrong with accepting the stdout string that the current > implementation provides and explicitly writing it to your file? > That is only a good idea for short output, say up to a few kb. With bwa (and samtools etc), quite often the output defaults to (or only goes to) stdout - and can be very large. It can also be binary rather than text, which is an additional complication with Python 2 vs Python 3 (byte strings versus unicode strings). See http://bio-bwa.sourceforge.net/bwa.shtml Peter From p.j.a.cock at googlemail.com Thu Feb 14 15:38:59 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Feb 2013 20:38:59 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary wrote: > In short , am I allowed to play with this extra parameter thing as per > the code standards of biopython ? If you can come up with a nice extension to the current interface for the application wrapper's __call__ method, which is backward compatible, then we could be convinced. One idea would be stdout=True and stderr=True are treated as subprocess.PIPE (as now), and a false value would continue to mean don't capture the output (send it to /dev/null), but a (non-empty) string argument could be interpreted as a filename instead. You might be able to accept a handle, but I'm not sure if all Python handles would work or not here - it requires some careful cross platform testing. 
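The backward-compatible idea above can be sketched roughly as follows. This is a standalone illustration built on the standard subprocess module, not the actual __call__ method of the Biopython application wrappers; the function name run_command is hypothetical, and only stdout is handled for brevity.

```python
import os
import subprocess


def run_command(cmd_args, stdout=True):
    """Run cmd_args (a list of strings) and return (return_code, captured).

    stdout=True      capture the output through a pipe (current behaviour)
    stdout=False     discard the output
    stdout="a.sai"   a non-empty string is treated as an output filename
    """
    if isinstance(stdout, str) and stdout:
        # Stream straight to disk in binary mode, so large or binary
        # output is never held in memory as a Python string.
        with open(stdout, "wb") as handle:
            return_code = subprocess.call(cmd_args, stdout=handle)
        return return_code, None
    if stdout:
        process = subprocess.Popen(cmd_args, stdout=subprocess.PIPE)
        captured, _ = process.communicate()
        return process.returncode, captured
    with open(os.devnull, "wb") as sink:
        return_code = subprocess.call(cmd_args, stdout=sink)
    return return_code, None
```

For example, run_command(["bwa", "aln", "database.fasta", "short_read.fastq"], stdout="aln_sa.sai") would write the alignment to aln_sa.sai, assuming bwa is on the PATH. Passing an already open handle is deliberately left out here, since that is the case flagged above as needing cross-platform testing.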
Peter From mjldehoon at yahoo.com Fri Feb 15 21:46:00 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 15 Feb 2013 18:46:00 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi Lenna, Maybe we are confusing each other.. I am looking for a solution that (a) doesn't introduce new dependencies, (b) is pure-Python so it can run on Jython, and (c) if that is not possible and we do need to use C, then that C code should be understandable so that it can be debugged if necessary. I was suggesting to clean up lex.yy.c so that we can at least achieve (c). The alternative is to start from the PLY-based parser and remove the dependency on PLY. Best, -Michiel. --- On Thu, 2/14/13, Lenna Peterson wrote: > From: Lenna Peterson > Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) > To: "BioPython-Dev Mailing List" > Date: Thursday, February 14, 2013, 2:52 PM > On Sat, Feb 9, 2013 at 10:55 PM, > Michiel de Hoon wrote: > > > > > > > Currently lex.yy.c contains lots of code that is > generated automatically > > by lex but is not actually needed for the mmCIF parser. > I was thinking to > > remove those parts, and to clean up the remainder so > that the code is > > understandable (allowing us to fix any bugs, or to > convert it to pure > > Python). > > > > Whoops, failed to reply all. Sorry for the double email, > Michiel. > > --- > > But generated C is by definition not understandable or > debuggable. The only > function of lex.yy.c is to tokenize the mmCIF input. > > All of the communication to Python is handled by > MMCIFlexmodule.c, which is > 70 lines and a header with 3 statements. In parallel to the > PLY version, I > rewrote the C to be object-oriented, which pushed it to 101 > lines. 
> > Cheers, > > Lenna > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From saketkc at gmail.com Sat Feb 16 02:08:46 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Sat, 16 Feb 2013 12:38:46 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On 15 February 2013 02:08, Peter Cock wrote: > On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary wrote: >> In short , am I allowed to play with this extra parameter thing as per >> the code standards of biopython ? > > If you can come up with a nice extension to the current interface > for the application wrapper's __call__ method, which is backward > compatible, then we could be convinced. > > One idea would be stdout=True and stderr=True are treated as > subprocess.PIPE (as now), and a false value would continue > to mean don't capture the output (send it to /dev/null), but a > (non-empty) string argument could be interpreted as a filename > instead. You might be able to accept a handle, but I'm not sure > if all Python handles would work or not here - it requires some > careful cross platform testing. > > Peter HI Everyone, I have pushed the wrapper to https://github.com/saketkc/biopython/tree/bwa_wrapper Should I send a pull request ? I am in the middle of my University mid-semester examinations and hence this is not completely tested. I need to perform some more tests with more parameters after I am done with my examinations the next week. I would like to hear comments or have it code-reviewed, since this is the first time I am contributing to biopython and I might have missed out on some of the coding practices being followed. 
Thanks Saket From p.j.a.cock at googlemail.com Sat Feb 16 05:42:50 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 16 Feb 2013 10:42:50 +0000 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon wrote: > Hi Lenna, > > Maybe we are confusing each other.. > I am looking for a solution that (a) doesn't introduce new dependencies, +1 > (b) is pure-Python so it can run on Jython, +1 And on PyPy (which to me is more interesting than Jython) etc. > and (c) if that is not possible and we do need to use C, then that C code > should be understandable so that it can be debugged if necessary. > > I was suggesting to clean up lex.yy.c so that we can at least achieve (c). This does mean we essentially give up on ever regenerating the lex.yy.c file again - could that be a problem if Flex itself changes much? > The alternative is to start from the PLY-based parser and remove the > dependency on PLY. > > Best, > -Michiel. Peter From saketkc at gmail.com Sat Feb 16 06:48:43 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Sat, 16 Feb 2013 17:18:43 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: Oops. Apparently I had forgotten to 'git add' the _bwa.py file. Committed now: https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23 On 16 February 2013 12:38, Saket Choudhary wrote: > On 15 February 2013 02:08, Peter Cock wrote: >> On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary wrote: >>> In short , am I allowed to play with this extra parameter thing as per >>> the code standards of biopython ? >> >> If you can come up with a nice extension to the current interface >> for the application wrapper's __call__ method, which is backward >> compatible, then we could be convinced.
>> >> One idea would be stdout=True and stderr=True are treated as >> subprocess.PIPE (as now), and a false value would continue >> to mean don't capture the output (send it to /dev/null), but a >> (non-empty) string argument could be interpreted as a filename >> instead. You might be able to accept a handle, but I'm not sure >> if all Python handles would work or not here - it requires some >> careful cross platform testing. >> >> Peter > > > HI Everyone, > > I have pushed the wrapper to > https://github.com/saketkc/biopython/tree/bwa_wrapper > > Should I send a pull request ? I am in the middle of my University > mid-semester examinations and hence this is not completely tested. I > need to perform some more tests with more parameters after I am done > with my examinations the next week. > > > I would like to hear comments or have it code-reviewed, since this is > the first time I am contributing to biopython and I might have missed > out on some of the coding practices being followed. > > Thanks > > Saket From mjldehoon at yahoo.com Sat Feb 16 07:09:22 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 16 Feb 2013 04:09:22 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1361016562.57361.YahooMailClassic@web164001.mail.gq1.yahoo.com> --- On Sat, 2/16/13, Peter Cock wrote: > This does mean we essentially give up on ever regenerating > the lex.yy.c file every again - could that be a problem if Flex > itself changes much? The lex.yy.c file was generated by Flex, but otherwise it's independent of it. It doesn't #include Flex's header files, and we don't link it to the Flex libraries. So we can do with it whatever we want. We may find though that a stripped-down version of lex.yy.c will be rather trivial, and converting it to Python may be straightforward. Best, -Michiel. 
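To give a feel for how small a pure-Python tokenizer might be, here is a rough sketch. It is not the Biopython parser and not a translation of lex.yy.c; it assumes a simplified subset of CIF syntax (whitespace-separated values, quoted strings, `#` comments and semicolon-delimited multi-line text), and `tokenize_mmcif` is a hypothetical name.

```python
import re

# One match per token on a line. Alternatives are tried in order, and a bare
# token swallows a whole run of non-whitespace, so quotes and '#' only get
# special treatment at the start of a token.
_TOKEN = re.compile(r"""
    '(?P<sq>.*?)'(?=\s|$)     # single-quoted value; closing quote precedes whitespace
  | "(?P<dq>.*?)"(?=\s|$)     # double-quoted value, same rule
  | (?P<comment>\#.*)         # comment runs to the end of the line
  | (?P<bare>\S+)             # anything else up to the next whitespace
""", re.VERBOSE)


def tokenize_mmcif(lines):
    """Yield token strings from an iterable of text lines (simplified CIF)."""
    in_text = False   # True while inside a semicolon-delimited text block
    text = []
    for line in lines:
        line = line.rstrip("\r\n")
        if in_text:
            if line.startswith(";"):
                # A ';' in the first column closes the multi-line value.
                yield "\n".join(text)
                in_text = False
            else:
                text.append(line)
            continue
        if line.startswith(";"):
            # A ';' in the first column opens a multi-line value.
            in_text = True
            text = [line[1:]]
            continue
        for match in _TOKEN.finditer(line):
            if match.lastgroup == "comment":
                break
            yield match.group(match.lastgroup)
```

A real replacement would also have to distinguish token types (data block names, `loop_`, tags, values), which this sketch leaves to the caller.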
From tiagoantao at gmail.com Mon Feb 18 08:57:15 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 18 Feb 2013 13:57:15 +0000 Subject: [Biopython-dev] Support for BioSQL on Java/Jython Message-ID: Dear All, I have implemented a set of changes to allow for BioSQL support in Jython. Features:
1. Totally transparent in terms of API. Indeed, the existing tests on BioSQL work out of the box.
2. MySQL and PostgreSQL.
3. No sqlite3 support. This library (standard in CPython) does not exist in Jython.
You can find the changes here: https://github.com/tiagoantao/biopython/commits/master (top two commits) Comments appreciated. If there is no opposition, I will commit these soon (after incorporating feedback) to the main repo. -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Mon Feb 18 12:44:30 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 18 Feb 2013 17:44:30 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: > On 16 February 2013 12:38, Saket Choudhary wrote: >> HI Everyone, >> >> I have pushed the wrapper to >> https://github.com/saketkc/biopython/tree/bwa_wrapper >> >> Should I send a pull request ? I am in the middle of my University >> mid-semester examinations and hence this is not completely tested. I >> need to perform some more tests with more parameters after I am done >> with my examinations the next week. >> >> >> I would like to hear comments or have it code-reviewed, since this is >> the first time I am contributing to biopython and I might have missed >> out on some of the coding practices being followed. >> >> Thanks >> >> Saket On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary wrote: > Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed now : > > https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23 > This looks sensible.
I think if we are going to extend the __call__ interface to allow stdout to be a filename, then we should do the same for stderr as well. Also this needs to be explained in the docstring (and perhaps also the Tutorial somewhere). Separately some simple unit tests for the wrapper would be good too (which can be as much work as the original code itself), and would be beneficial for cross-platform testing. Thanks, Peter From tiagoantao at gmail.com Tue Feb 19 06:42:22 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 19 Feb 2013 11:42:22 +0000 Subject: [Biopython-dev] Jython (non-existing) docs Message-ID: Hi, I had a cursory look at the documentation for installing Biopython under Jython and there seems to be none. If it is OK, I would extend the documentation to cover Jython -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Tue Feb 19 07:01:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 19 Feb 2013 12:01:15 +0000 Subject: [Biopython-dev] Jython (non-existing) docs In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 11:42 AM, Tiago Antão wrote: > Hi, > > I had a cursory look at the documentation for installing Biopython under > Jython and there seems to be none. If it is OK, I would extend the > documentation to cover Jython I just use the usual mantra:
    jython setup.py build
    jython setup.py test
    jython setup.py install
Perhaps there are pitfalls I'm not aware of? (Updating Doc/install/Installation.tex is still a good idea though) Peter From tiagoantao at gmail.com Tue Feb 19 07:02:18 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 19 Feb 2013 12:02:18 +0000 Subject: [Biopython-dev] Jython (non-existing) docs In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock wrote: > Perhaps there are pitfalls I'm not aware of?
> > JDBC driver for the new BioSQL code ;) Tiago -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Tue Feb 19 07:07:39 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 19 Feb 2013 12:07:39 +0000 Subject: [Biopython-dev] Jython (non-existing) docs In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 12:02 PM, Tiago Antão wrote: > > On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock > wrote: >> >> Perhaps there are pitfalls I'm not aware of? >> > > JDBC driver for the new BioSQL code ;) > > Tiago Good answer :) Yes, advice on installing optional dependencies like that makes sense. Peter From saketkc at gmail.com Tue Feb 19 08:15:56 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Tue, 19 Feb 2013 18:45:56 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On 18 February 2013 23:14, Peter Cock wrote: > > On 16 February 2013 12:38, Saket Choudhary wrote: > >> HI Everyone, > >> > >> I have pushed the wrapper to > >> https://github.com/saketkc/biopython/tree/bwa_wrapper > >> > >> Should I send a pull request ? I am in the middle of my University > >> mid-semester examinations and hence this is not completely tested. I > >> need to perform some more tests with more parameters after I am done > >> with my examinations the next week. > >> > >> > >> I would like to hear comments or have it code-reviewed, since this is > >> the first time I am contributing to biopython and I might have missed > >> out on some of the coding practices being followed. > >> > >> Thanks > >> > >> Saket > > > On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary > wrote: > > Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed > now : > > > > > https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23 > > > > This looks sensible.
I think if we are going to extend the __call__ > interface > to allow stdout to be a filename, then we should do the same for stderr > as well. Also this needs to be explained in the docstring (and perhaps > also the Tutorial somewhere). > > Separately some simple unit tests for the wrapper would be good too > (which can be as much work as the original code itself), and would > be beneficial for cross-platform testing. > > Thanks, > > Peter > Thanks Peter. I will add that. Any pointers to what would be a good reference test_aba.py file in Tests/ directory for writing unit tests for this ? I have worked on BDD before but Unit Tests are new for me, so it may take some time.I plan to finish it the coming week once my university examinations are done Thanks Saket From p.j.a.cock at googlemail.com Tue Feb 19 09:25:40 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 19 Feb 2013 14:25:40 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 1:15 PM, Saket Choudhary wrote: > > Thanks Peter. > > I will add that. Any pointers to what would be a good reference test_aba.py > file in Tests/ directory for writing unit tests for this ? > > I have worked on BDD before but Unit Tests are new for me, so it may take > some time.I plan to finish it the coming week once my university > examinations are done > > Thanks > > Saket There's a chapter in the Tutorial about our test framework. In this case existing command line tool wrappers are the best reference, e.g. test_Emboss.py or test_Muscle.py Also if you want to use doctests and have them included in the test suite, add the module to the list in Tests/run_tests.py - however this does not handle optional dependencies (other than NumPy). Therefore all the application wrapper doctests to date have carefully avoided actually invoking the command line - and instead most print the string representation instead. 
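The doctest pattern described here might look something like the sketch below. `ToolCommandline`, the program name `mytool` and its options are all made up for illustration; this is not the Bio.Application API, just a minimal stand-in showing a doctest that prints the command line without ever running the tool.

```python
import doctest


class ToolCommandline(object):
    """Hypothetical minimal command line wrapper (illustration only).

    The example builds the command and prints it, but never runs it, so the
    doctest passes on machines where the tool is not installed:

    >>> cline = ToolCommandline("mytool", infile="reads.fastq", threads=4)
    >>> print(cline)
    mytool -infile reads.fastq -threads 4
    """

    # Options this made-up tool accepts, in the order they are written out.
    _allowed = ("infile", "threads")

    def __init__(self, cmd, **kwargs):
        for name in kwargs:
            if name not in self._allowed:
                # A misspelt argument in a docstring example fails here.
                raise ValueError("Option name %s was not found." % name)
        self.cmd = cmd
        self.options = kwargs

    def __str__(self):
        parts = [self.cmd]
        for name in self._allowed:
            if name in self.options:
                parts.append("-%s %s" % (name, self.options[name]))
        return " ".join(parts)


if __name__ == "__main__":
    doctest.testmod()
```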
This allows us to check that the example use cases run (and catches silly errors in the examples like a typo in an argument name). Thanks, Peter From p.j.a.cock at googlemail.com Sun Feb 24 07:42:47 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 24 Feb 2013 12:42:47 +0000 Subject: [Biopython-dev] [Biopython] blastdbcmd In-Reply-To: <5127B8D1.5090705@usp.br> References: <5127A44E.2030403@usp.br> <5127B8D1.5090705@usp.br> Message-ID: Great - let us know on the list if you have any questions. Peter On Fri, Feb 22, 2013 at 6:28 PM, Frederico Moraes Ferreira wrote: > Hi Peter, > Yes, I meant a Biopython Blast application for blastdbcmd. > Thanks for the link. > Best, > Fred > > On 22-02-2013 14:23, Peter Cock wrote: > >> On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira >> wrote: >>> >>> Hi there Biopythoneers, >>> As far as I know, there isn't a blastdbcmd submodule in Biopython. >>> So, >>> I've been writing the blast matched sequences ID's to a file, fetching >>> them >>> all with a subprocess and reading with SeqIO afterwards. In some cases, >>> however, I miss a blastdbcmd parser to make things easy. How are you guys >>> dealing with this? >>> Best, >>> Fred >> >> Are you talking about a command line wrapper for blastdbcmd, to go in >> Bio/Blast/Applications.py? That seems a good idea. >> >> Personally I find the blastdbcmd tool quite handicapped due to the >> introduction of generated sequence identifiers, and rarely use it: >> >> http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html >> >> Instead I would use Bio.SeqIO to index the FASTA file used for the >> database, and get the sequences that way. >> >> Peter >> > > -- > Dr. Frederico Moraes Ferreira > Universidade de São Paulo > Faculdade de Medicina > Instituto do Coração - Imunologia > Av. Dr.
Enéas de Carvalho Aguiar, 44 > 05403-900 São Paulo - SP > Brasil > From anaryin at gmail.com Tue Feb 26 11:14:52 2013 From: anaryin at gmail.com (João Rodrigues) Date: Tue, 26 Feb 2013 17:14:52 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO Message-ID: Hello all, I've modified slightly PDBIO to allow writing of any object of our PDB representation. Right now it accepts only Models or Structures (IIRC) and sometimes it's useful to have only a chain or a residue written. I've added a layer of code that builds the "missing" parts using StructureBuilder. I pushed it to a branch in my github account: https://github.com/JoaoRodrigues/biopython/tree/pdbio I've been using it for a while now so often I completely forgot about it.. Only noticed when I changed computers and the version there could not handle this. So I guess it should be solid enough. Cheers, João From eric.talevich at gmail.com Tue Feb 26 14:35:56 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 26 Feb 2013 14:35:56 -0500 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues wrote: > Hello all, > > I've modified slightly PDBIO to allow writing of any object of our PDB > representation. Right now it accepts only Models or Structures (IIRC) and > sometimes it's useful to have only a chain or a residue written. I've added > a layer of code that builds the "missing" parts using StructureBuilder. > > I pushed it to a branch in my github account: > > https://github.com/JoaoRodrigues/biopython/tree/pdbio > > I've been using it for a while now so often I completely forgot about it.. > Only noticed when I changed computers and the version there could not > handle this. So I guess it should be solid enough. > > Awesome. I support the idea. Could you do a pull request, so TravisCI runs the tests automatically?
-Eric From anaryin at gmail.com Tue Feb 26 14:39:08 2013 From: anaryin at gmail.com (João Rodrigues) Date: Tue, 26 Feb 2013 20:39:08 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: There's some discussion about some implementation details: https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4 What does everyone else think? Thanks for the input btw. Should I make a test too? I reckon it would be a good thing to add? 2013/2/26 Eric Talevich > On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues wrote: > >> Hello all, >> >> I've modified slightly PDBIO to allow writing of any object of our PDB >> representation. Right now it accepts only Models or Structures (IIRC) and >> sometimes it's useful to have only a chain or a residue written. I've >> added >> a layer of code that builds the "missing" parts using StructureBuilder. >> >> I pushed it to a branch in my github account: >> >> https://github.com/JoaoRodrigues/biopython/tree/pdbio >> >> I've been using it for a while now so often I completely forgot about it.. >> Only noticed when I changed computers and the version there could not >> handle this. So I guess it should be solid enough. >> >> > Awesome. I support the idea. Could you do a pull request, so TravisCI runs > the tests automatically? > > -Eric > From davidjosephcain at gmail.com Tue Feb 26 14:47:32 2013 From: davidjosephcain at gmail.com (David Cain) Date: Tue, 26 Feb 2013 14:47:32 -0500 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: I failed to mention this sooner, but I'm an enthusiastic proponent of what you've done. Your new set_structure() would be immensely helpful to me, as I've been using some workarounds to achieve the functionality you've implemented. Personally, I think a unit test would be really helpful in ensuring chain-less residues and the like will save appropriately.
David Cain +1 (339) 222 4452 On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues wrote: > There's some discussion about some implementation details: > > > https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4 > > What does everyone else think? > > Thanks for the input btw. Should I make a test too? I reckon it would be a > good thing to add? > > > 2013/2/26 Eric Talevich > > > On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues > wrote: > > > >> Hello all, > >> > >> I've modified slightly PDBIO to allow writing of any object of our PDB > >> representation. Right now it accepts only Models or Structures (IIRC) > and > >> sometimes it's useful to have only a chain or a residue written. I've > >> added > >> a layer of code that builds the "missing" parts using StructureBuilder. > >> > >> I pushed it to a branch in my github account: > >> > >> https://github.com/JoaoRodrigues/biopython/tree/pdbio > >> > >> I've been using it for a while now so often I completely forgot about > it.. > >> Only noticed when I changed computers and the version there could not > >> handle this. So I guess it should be solid enough. > >> > >> > > Awesome. I support the idea. Could you do a pull request, so TravisCI > runs > > the tests automatically? > > > > -Eric > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Tue Feb 26 16:45:00 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 26 Feb 2013 21:45:00 +0000 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues wrote: >> Should I make a test too? I reckon it would be a good thing to add? >> On Tue, Feb 26, 2013 at 7:47 PM, David Cain wrote: > ...
> > Personally, I think a unit test would be really helpful in ensuring > chain-less residues and the like will save appropriately. Absolutely, +1 on adding a test or two for this new functionality. And if there is anywhere in the Tutorial or docstrings which would benefit from mentioning this too, could you update that too please? Thanks, Peter From anaryin at gmail.com Wed Feb 27 04:25:26 2013 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 27 Feb 2013 10:25:26 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ). The whitespace issue is solved I think. What are the rules exactly? Sorry if I'm at a bit of a loss here.. I added tests for the save functions (a full structure and a single residue) as well as one for the chainless residue. I added the suggestion from David to keep the id in the parent if there is one. I reverted the commit and added the same (- the whitespace) and another with tests. If it looks ok, I'll make a pull request (if I can find the button, never did that..). Cheers, João 2013/2/26 Peter Cock > On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues wrote: > >> Should I make a test too? I reckon it would be a good thing to add? > >> > > On Tue, Feb 26, 2013 at 7:47 PM, David Cain > wrote: > > ... > > > > Personally, I think a unit test would be really helpful in ensuring > > chain-less residues and the like will save appropriately. > > Absolutely, +1 on adding a test or two for this new functionality. > > And if there is anywhere in the Tutorial or docstrings which would > benefit from mentioning this too, could you update that too please?
> > Thanks, > > Peter > From p.j.a.cock at googlemail.com Wed Feb 27 11:34:42 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Feb 2013 16:34:42 +0000 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues wrote: > I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ). > > The whitespace issue is solved I think. What are the rules exactly? Sorry if > I'm at a bit of a loss here.. PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines (Currently an aspiration for the Biopython code, rather than a strict requirement) > I added tests for the save functions (a full structure and a single residue) > as well as one for the chainless residue. I added the suggestion from David > to keep the id in the parent if there is one. > > I reverted the commit and added the same (- the whitespace) and another with > tests. If it looks ok, I'll make a pull request (if I can find the button, > never did that..). GitHub have made it quite easy, but the first time is always the hardest. Good luck, and if you get stuck we can try to help or just pull the commits in directly from your fork. Thanks, Peter From anaryin at gmail.com Wed Feb 27 11:41:45 2013 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 27 Feb 2013 17:41:45 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: Ok, done I guess: https://github.com/biopython/biopython/pull/165/files Thanks for all the input! 2013/2/27 Peter Cock > On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues wrote: > > I'll have a look at the tutorial later (I think it is in the Bio.PDB > FAQ). > > > > The whitespace issue is solved I think. What are the rules exactly? > Sorry if > > I'm at a bit of a loss here..
> > PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines > > (Currently an aspiration for the Biopython code, rather than a strict > requirement) > > > I added tests for the save functions (a full structure and a single > residue) > > as well as one for the chainless residue. I added the suggestion from > David > > to keep the id in the parent if there is one. > > > > I reverted the commit and added the same (- the whitespace) and another > with > > tests. If it looks ok, I'll make a pull request (if I can find the > button, > > never did that..). > > GitHub have made it quite easy, but the first time is always the hardest. > Good luck, and if you get stuck we can try to help or just pull the commits > in directly from your fork. > > Thanks, > > Peter > From p.j.a.cock at googlemail.com Wed Feb 27 17:32:35 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Feb 2013 22:32:35 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts In-Reply-To: References: Message-ID: The new bioinformatics mini-symposium this year makes SciPy 2013 especially interesting. Peter ---------- Forwarded message ---------- From: *Jonathan Rocher* Date: Wednesday, February 27, 2013 Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts To: SciPy Users List , numfocus at googlegroups.com, Discussion of Numerical Python [Apologies for cross-posts] Dear all, The annual SciPy Conference (Scientific Computing with Python) allows participants from academic, commercial, and governmental organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development. *The deadline for abstract submissions is March 20th, 2013. 
* Submissions are welcome that address general Scientific Computing with Python, one of the two special themes for this year's conference (machine learning & reproducible science), or the domain-specific mini-symposia held during the conference (meteorology, climatology, and atmospheric and oceanic science; astronomy and astrophysics; medical imaging; bio-informatics). Please submit your abstract at the SciPy 2013 website abstract submission form. Abstracts will be accepted for posters or presentations. Optional papers to be published in the conference proceedings will be requested following abstract submission. This year the proceedings will be made available prior to the conference to help attendees navigate the conference. We look forward to an exciting and interesting set of talks, posters, and discussions and hope to see you at the conference. The SciPy 2013 Program Committee Chairs Matt McCormick, Kitware, Inc. Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory From redmine at redmine.open-bio.org Wed Feb 27 20:53:22 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 28 Feb 2013 01:53:22 +0000 Subject: [Biopython-dev] [Biopython - Bug #3419] (New) Bio.SearchIO.FastaIO Message-ID: Issue #3419 has been reported by Jason Stajich. ---------------------------------------- Bug #3419: Bio.SearchIO.FastaIO https://redmine.open-bio.org/issues/3419 Author: Jason Stajich Status: New Priority: Low Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastxy and fastx/y reports.

    from Bio import SearchIO
    qresults = SearchIO.parse('test.FASTY.out','fasta-m10')
    for qresult in qresults:
        for hit in qresult:
            for hsp in hit.hsps:
                print qresult.id, " ", hit.id, " ", \
                    hsp.query_start, "..",hsp.query_end, " ", hsp.query_strand, " ", \
                    hsp.hit_start, "..",hsp.hit_end, " ", hsp.hit_strand

---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Feb 28 02:08:50 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 28 Feb 2013 07:08:50 +0000 Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO References: Message-ID: Issue #3419 has been updated by Wibowo Arindrarto. Hi Jason, Thanks for the report :). Do you have an example file handy which I can try to include in our test suite? The FASTA parser was not tested using [t]fast[y|x], so there may be some lines / cases which the parser couldn't handle. ---------------------------------------- Bug #3419: Bio.SearchIO.FastaIO https://redmine.open-bio.org/issues/3419 Author: Jason Stajich Status: New Priority: Low Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastxy and fastx/y reports. from Bio import SearchIO qresults = SearchIO.parse('test.FASTY.out','fasta-m10') for qresult in qresults: for hit in qresult: for hsp in hit.hsps: print qresult.id, " ", hit.id, " ", \ hsp.query_start, "..",hsp.query_end, " ", hsp.query_strand, " ", \ hsp.hit_start, "..",hsp.hit_end, " ", hsp.hit_strand -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Feb 28 02:38:20 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 28 Feb 2013 07:38:20 +0000 Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO References: Message-ID: Issue #3419 has been updated by Jason Stajich. File bll0026-vs-94.tfasty added Here is a -m 10 report. 
I made this local patch to get it to report the strands, but this is not quite right because you actually don't have a strand for the query which is the protein.

diff --git a/Bio/SearchIO/FastaIO.py b/Bio/SearchIO/FastaIO.py
index ca08797..794efb8 100644
--- a/Bio/SearchIO/FastaIO.py
+++ b/Bio/SearchIO/FastaIO.py
@@ -197,7 +197,7 @@ def _set_hsp_seqs(hsp, parsed, program):
         # set seq and alphabet
         setattr(hsp.fragment, seq_type, parsed[seq_type]['seq'])

-        if alphabet is not generic_protein:
+        if alphabet is not generic_protein or 'tfast' in program:
             # get strand from coordinate; start <= end is plus
             # start > end is minus
             if start <= end:

In BioPerl I solved this by writing explicit code for the TBLASTN/TFAST[XY] and BLASTX/FAST[XY] situations which then knew whether the query or subject was translated DNA with a strand or input DNA. ---------------------------------------- Bug #3419: Bio.SearchIO.FastaIO https://redmine.open-bio.org/issues/3419 Author: Jason Stajich Status: New Priority: Low Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastxy and fastx/y reports.

    from Bio import SearchIO
    qresults = SearchIO.parse('test.FASTY.out','fasta-m10')
    for qresult in qresults:
        for hit in qresult:
            for hsp in hit.hsps:
                print qresult.id, " ", hit.id, " ", \
                    hsp.query_start, "..",hsp.query_end, " ", hsp.query_strand, " ", \
                    hsp.hit_start, "..",hsp.hit_end, " ", hsp.hit_strand

-- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From chapmanb at 50mail.com Thu Feb 28 21:25:42 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 28 Feb 2013 21:25:42 -0500 Subject: [Biopython-dev] Coming soon: BOSC/Broad Hackathon, SciPy Bioinformatics, BOSC Codefest Message-ID: <87lia7ua8p.fsf@fastmail.fm> Hi all; There are some upcoming coding events and conferences of interest to open source biology programmers: - BOSC/Broad Interoperability Hackathon -- This is a two day coding session at the Broad Institute in Cambridge, MA on April 7-8 focused on improving tool interoperability. Sign up and details: http://j.mp/XJT6ew - SciPy 2013 -- The Scientific Python conference is June 26-27 in Austin and has a Bioinformatics mini-symposia this year. They're doing some great work like IPython, NumPy, SciPy and scikit-learn; and this is a nice opportunity to reach a new set of like minded programmers and expand the open source bioinformatics community. Bioinformatics mini-symposia: http://j.mp/Z4xxXB Abstract details: http://conference.scipy.org/scipy2013/about.php - Codefest at the Bioinformatics Open Source Conference -- This year BOSC is taking place in Berlin from July 19-20 and we'll have a two day coding session before the conference. This is the 4th year of Codefests and they've proven to be a productive and fun time to work collectively on open source projects. 
Sign up and details: http://www.open-bio.org/wiki/Codefest_2013
BOSC conference: http://www.open-bio.org/wiki/BOSC_2013

Here are the key dates for the events and abstracts:

March 20, 2013: SciPy abstracts due
April 7-8, 2013: BOSC/Broad Interoperability Hackathon, Cambridge, MA
April 12, 2013: BOSC abstracts due
June 24-29, 2013: SciPy in Austin, TX
July 17-18, 2013: Codefest 2013, Berlin
July 19-20, 2013: BOSC 2013, Berlin

Looking forward to seeing everyone this spring and summer for plenty of fun science and code,
Brad

From chapmanb at 50mail.com Thu Feb 28 21:36:34 2013
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Feb 2013 21:36:34 -0500
Subject: [Biopython-dev] [ANN] SciPy2013: Call for abstracts
In-Reply-To:
References:
Message-ID: <87ppzjsv65.fsf@fastmail.fm>

Peter;
Thanks for sending this out. I'm helping with the organization of the SciPy bioinformatics session thanks to Peter's recommendation and wrote up a little bit about the types of abstracts that would fit well with the overall theme of SciPy:

http://j.mp/Z4xxXB

This is a great chance to connect with another open source scientific community so definitely send in an abstract if this is of interest; the deadline is coming up next month: March 20th.

Austin also has awesome music and barbecue in addition to science and hacking so lots of reasons to attend,
Brad

> The new bioinformatics mini-symposium this year makes SciPy 2013
> especially interesting.
> > Peter > > ---------- Forwarded message ---------- > From: *Jonathan Rocher* > Date: Wednesday, February 27, 2013 > Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts > To: SciPy Users List , numfocus at googlegroups.com, > Discussion of Numerical Python > > > [Apologies for cross-posts] > > Dear all, > > The annual SciPy Conference (Scientific Computing with > Python) allows > participants from academic, commercial, and governmental organizations to > showcase their latest projects, learn from skilled users and developers, > and collaborate on code development. *The deadline for abstract submissions > is March 20th, 2013. * > > Submissions are welcome that address general Scientific Computing with > Python, one of the two special themes for this years conference (machine > learning & reproducible science), or the domain-specific > mini-symposiaheld > during the conference (Meteorology, climatology, and atmospheric and > oceanic science, Astronomy and astrophysics, Medical imaging, > Bio-informatics). > > Please submit your abstract at the SciPy 2013 website abstract submission > form . > Abstracts will be accepted for posters or presentations. Optional papers to > be published in the conference proceedings will be requested following > abstract submission. This year the proceedings will be made available prior > to the conference to help attendees navigate the conference. > > We look forward to an exciting and interesting set of talks, posters, and > discussions and hope to see you at the conference. > The SciPy 2013 Program Committee Chairs > > Matt McCormick, Kitware, Inc. 
> Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From mjldehoon at yahoo.com Fri Feb 1 05:35:18 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 31 Jan 2013 21:35:18 -0800 (PST) Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone In-Reply-To: Message-ID: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com> Hi Peter, Bow, > > I'm OK with using the setUp and tearDown arguments to > > doctest.DocTestSuite to do the directory magic, but > keeping the test files > > under Tests/. > > As a more elegant version of the Bio._utils.run_doctest() > function? Exactly. Bow, do you want to give this approach a try? (Assuming that there are no further objections from the other developers). Best, -Michiel. From w.arindrarto at gmail.com Fri Feb 1 10:29:59 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 1 Feb 2013 11:29:59 +0100 Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone In-Reply-To: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com> References: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: Hi Michiel, Peter, everyone, >> > I'm OK with using the setUp and tearDown arguments to >> > doctest.DocTestSuite to do the directory magic, but >> keeping the test files >> > under Tests/. >> >> As a more elegant version of the Bio._utils.run_doctest() >> function? > > Exactly. Bow, do you want to give this approach a try? (Assuming that there are no further objections from the other developers). 
Just to be clear, we are: * changing all module's doctest file path to use relative paths (with respect to the module's location), * replacing the run_doctest() import with a simpler doctest import and `doctest.testmod()` in each module having this doctest * resorting to setUp and tearDown in the DocTestSuite in `run_tests.py` so that each module / submodule can find their test files * and refactoring all string functions in Bio._utils to Bio.Phylo and Bio.SearchIO, so that we can remove Bio._utils, right? I'd be happy to give this a shot if everyone feels the same :). Regards, Bow From p.j.a.cock at googlemail.com Fri Feb 1 11:07:22 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 11:07:22 +0000 Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone In-Reply-To: References: <1359696918.16306.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 10:29 AM, Wibowo Arindrarto wrote: > Hi Michiel, Peter, everyone, > >>> > I'm OK with using the setUp and tearDown arguments to >>> > doctest.DocTestSuite to do the directory magic, but >>> > keeping the test files >>> > under Tests/. >>> >>> As a more elegant version of the Bio._utils.run_doctest() >>> function? >> >> Exactly. Bow, do you want to give this approach a try? >> (Assuming that there are no further objections from the other developers). 
>
> Just to be clear, we are:
>
> * changing all module's doctest file path to use relative paths (with
> respect to the module's location),
> * replacing the run_doctest() import with a simpler doctest import and
> `doctest.testmod()` in each module having this doctest
> * resorting to setUp and tearDown in the DocTestSuite in
> `run_tests.py` so that each module / submodule can find their test
> files

That wasn't my understanding - I thought we were just talking about making the Bio._utils.run_doctest() use setUp and tearDown to take care of the path changes (although I'm not sure if that will actually be any shorter - we'd find out).

> * and refactoring all string functions in Bio._utils to Bio.Phylo and
> Bio.SearchIO, so that we can remove Bio._utils,

I'm not particularly bothered either way on this. Having misc utilities like this under Bio.Phylo or Bio.SearchIO makes it clear where they are used, and makes it easier to compartmentalise functionality.

Regards,

Peter

From mjldehoon at yahoo.com Fri Feb 1 11:23:15 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 03:23:15 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To:
Message-ID: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Bow,

Yes, that is correct.
Responding to Peter's email: Peter, do you agree with this approach?

Best,
-Michiel.

--- On Fri, 2/1/13, Wibowo Arindrarto wrote:

> From: Wibowo Arindrarto
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon"
> Cc: "Peter Cock" , "BioPython-Dev Mailing List"
> Date: Friday, February 1, 2013, 5:29 AM
> Hi Michiel, Peter, everyone,
>
> >> > I'm OK with using the setUp and tearDown arguments to
> >> > doctest.DocTestSuite to do the directory magic, but
> >> keeping the test files
> >> > under Tests/.
> >>
> >> As a more elegant version of the Bio._utils.run_doctest()
> >> function?
> >
> > Exactly.
Bow, do you want to give this approach a try? > (Assuming that there are no further objections from the > other developers). > > Just to be clear, we are: > > * changing all module's doctest file path to use relative > paths (with > respect to the module's location), > * replacing the run_doctest() import with a simpler doctest > import and > `doctest.testmod()` in each module having this doctest > * resorting to setUp and tearDown in the DocTestSuite in > `run_tests.py` so that each module / submodule can find > their test > files > * and refactoring all string functions in Bio._utils to > Bio.Phylo and > Bio.SearchIO, so that we can remove Bio._utils, > > right? > > I'd be happy to give this a shot if everyone feels the same > :). > > Regards, > Bow > From p.j.a.cock at googlemail.com Fri Feb 1 11:51:16 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 11:51:16 +0000 Subject: [Biopython-dev] Trie with_prefix doesn't work as expected In-Reply-To: References: Message-ID: On Thu, Jan 31, 2013 at 11:38 AM, Peter Cock wrote: > > Thanks to Jeff Chang for a very speedy fix (sent as an attachment off list), > which I have applied to the repository: > https://github.com/biopython/biopython/commit/cd7cc7174fd4b0607381e9c58f6ae0d17cca8f74 > > I've also added a unit test based on Kevin's example: > https://github.com/biopython/biopython/commit/efc289c8fe2e78ad12481973e42554fa40f2ea0a > > Thank you for reporting this Kevin. > > Peter > > P.S. Nice to hear from you again Jeff :) > > I think your last commit was before we moved from CVS to git, please > let us know if you want commit access on github. 
Thanks again to Kevin for another test case, and to Jeff for another quick code fix where a trie key exceeded the MAX_KEY_LENGTH buffer:
https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

Peter

From redmine at redmine.open-bio.org Fri Feb 1 11:51:51 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Fri, 1 Feb 2013 11:51:51 +0000
Subject: [Biopython-dev] [Biopython - Bug #3395] Biopython trie implementation can't load large data sets
References:
Message-ID:

Issue #3395 has been updated by Peter Cock.

Kevin Wu reported a related issue, which we discussed with Jeff Chang (off list), where a key in the trie exceeded 1000 bytes (the original value of MAX_KEY_LENGTH). See:

http://lists.open-bio.org/pipermail/biopython-dev/2013-February/010284.html
https://github.com/biopython/biopython/commit/31909c8725d5cfbfba2096b7c15ef7afeaf20a5b

(Ideally we could give a specific ValueError exception here, but nevertheless the current print message is an improvement)

----------------------------------------
Bug #3395: Biopython trie implementation can't load large data sets
https://redmine.open-bio.org/issues/3395

Author: Michał Nowotka
Status: New
Priority: Normal
Assignee: Biopython Dev Mailing List
Category: Main Distribution
Target version:
URL:

Imagine I have Biopython trie:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'w')
tr = trie.trie()
#fill in the trie
trie.save(f, tr)  # save the trie instance, not the trie module

Now /tmp/trie.dat.gz is about 50MB. Let's try to read it:

from Bio import trie
import gzip

f = gzip.open('/tmp/trie.dat.gz', 'r')
tr = trie.load(f)

Unfortunately I'm getting a meaningless error saying:

"loading failed for some reason"

Any hints?

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From p.j.a.cock at googlemail.com Fri Feb 1 12:14:49 2013
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 1 Feb 2013 12:14:49 +0000
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
References: <1359717795.82942.YahooMailClassic@web164001.mail.gq1.yahoo.com>
Message-ID:

>Wibowo Arindrarto wrote:
>> Just to be clear, we are:
>>
>> * changing all module's doctest file path to use relative
>> paths (with
>> respect to the module's location),
>> * replacing the run_doctest() import with a simpler doctest
>> import and
>> `doctest.testmod()` in each module having this doctest
>> * resorting to setUp and tearDown in the DocTestSuite in
>> `run_tests.py` so that each module / submodule can find
>> their test
>> files
>> * and refactoring all string functions in Bio._utils to
>> Bio.Phylo and
>> Bio.SearchIO, so that we can remove Bio._utils,
>>
>> right?
>>
>> I'd be happy to give this a shot if everyone feels the same
>> :).
>>

On Fri, Feb 1, 2013 at 11:23 AM, Michiel de Hoon wrote:
> Hi Bow,
>
> Yes, that is correct.
> Responding to Peter's email: Peter, do you agree with this approach?
>
> Best,
> -Michiel.

No. I think we have misunderstood each other on the doctest discussion :(

If we keep the test files under Tests/ (and I think that is best) then for example look at this doctest in Bio/SeqRecord.py

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

That is currently written to assume it is run from the Tests/ folder. If we write this assuming it is in the Bio/ folder where the Python file SeqRecord.py lives, it becomes:

        >>> from Bio import SeqIO
        >>> record = SeqIO.read(open("../Tests/Fasta/sweetpea.nu"),"fasta")
        >>> len(record)
        309

I think a beginner would find that more confusing.
It is also longer and we already have trouble with some lines exceeding 80 chars. Ideally there would be a nice way for doctests to specify the folder, and then we could use a simple filename like "sweetpea.nu" with no directories at all. But I don't think that is possible without us making the testing infrastructure even more complicated. -- If we want to get rid of Bio._utils.run_doctest() (and the whole of the file Bio/_utils.py) then I would prefer reverting to the old situation prior to adding the Bio._utils.run_doctest() helper function. If the repetitive code snippets to run the doctests of a module are a problem it can be shortened to something less flexible, for example in Bio/SeqRecord.py could use something very short like this: if __name__ == "__main__": assert os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder" import doctest doctest.testmod(verbose=2) Or, as I suggested before, we can remove these development convenience hooks completely? -- On the subject of the string functions in Bio/_utils.py, I have no objection to moving them back under Bio.SearchIO and/or Bio.Phylo - which has advantages in terms of modularity (a good thing for preventing accidental side effects). Regards, Peter From mjldehoon at yahoo.com Fri Feb 1 13:54:46 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 1 Feb 2013 05:54:46 -0800 (PST) Subject: [Biopython-dev] Namespace for online resources? Message-ID: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Hi Lenna, > Regarding point (2), is your primary concern namespace clutter or > importing efficiency? Regarding point (2), my primary concern is that a Bio.WWW module would group together modules that don't have much in common with each other. I agree to your point that the category of internet access is more fundamental than the category of parsers. But still, which modules should then go into a Bio.WWW module? 
Any module whose sole purpose is to use the internet (that would exclude Bio.Entrez)? Any module whose main purpose is to use the internet? This would be unclear; for example, Bio.Entrez may or may not fall in that category, depending on how you use the module. Any module whose functionality includes internet access? Then if one day we add access to the JASPAR database over the internet to Bio.Motif, it would have to move to Bio.WWW.

Currently most modules are organized by theme (Bio.Seq, Bio.Motif, Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one module, one chapter in the documentation, one set of unit tests, one set of doctests, which I think is a huge advantage both in terms of clarity and in terms of user experience.

Best,
-Michiel.

--- On Wed, 1/30/13, Lenna Peterson wrote:

From: Lenna Peterson
Subject: Re: [Biopython-dev] Namespace for online resources?
To: "Michiel de Hoon"
Cc: "Biopython-Dev Mailing List"
Date: Wednesday, January 30, 2013, 12:10 PM

Michiel,

You raise an excellent point that separating the modules in this way will complicate doctests.

Regarding point (2), is your primary concern namespace clutter or importing efficiency?

I still maintain that the category of internet access is more fundamental than the category of parsers. For point (1), if every database is accessed using a WWW submodule, a user will know to look there. Obviously moving everything would be a lot of work...

Cheers,

Lenna

On Tue, Jan 29, 2013 at 9:00 PM, Michiel de Hoon wrote:

Bio.WWW was one of those modules that seem a good idea at first, but then failed to gain general acceptance. There are three problems with Bio.WWW:

1) From the module name, it's not clear what you would find in it. For example, if you want to access the Entrez database, would you first look in Bio.Entrez or in Bio.WWW? Similarly for TAIR: Would you look for it in Bio.TAIR, or in Bio.WWW?
2) The modules in Bio.WWW don't have much to do with each other, except that they access the internet. But any given user probably is mainly interested in Entrez, or ExPASy, or some other database, not in all of them at the same time. 3) The flip side of this is that a user accessing e.g. ExPASy would have to import both Bio.WWW and Bio.ExPASy to be able to use ExPASy. Doctests get more complicated also, as they would span more than one module. Here is an example from Bio.Entrez that accesses the database, and then parses the results: >>> from Bio import Entrez >>> Entrez.email = "Your.Name.Here at example.org" >>> handle = Entrez.einfo() # or esearch, efetch, ... >>> record = Entrez.read(handle) >>> handle.close() The ultimate question is whether we organize the code in Biopython by their functionality from a user perspective, or by the kind of things they do? Almost all of Biopython is organized according to the former. For example, we don't have a Bio.Parsers module for all the parsers; similarly, we don't have Bio.WWW for internet access. Best, -Michiel. --- On Tue, 1/29/13, Peter Cock wrote: > From: Peter Cock > Subject: Re: [Biopython-dev] Namespace for online resources? > To: "Wibowo Arindrarto" > Cc: "Biopython-Dev Mailing List" > Date: Tuesday, January 29, 2013, 4:11 PM > On Tue, Jan 29, 2013 at 9:03 PM, > Peter Cock > wrote: > > On Tue, Jan 29, 2013 at 7:52 PM, Wibowo Arindrarto > > > wrote: > >> Hi everyone, > >> > >> Why was Bio.WWW deprecated in the first place? 
> >> > > > > The flippant answer is everything under Bio.WWW was > moved > > or deprecated: > > http://lists.open-bio.org/pipermail/biopython-dev/2008-July/004059.html > > > > I'm trying to identify the discussions prior to that > covering the moves: > > > > Bio.WWW.ExPASy -> Bio.ExPASy > > Bio.WWW.InterPro -> Bio.InterPro > > Bio.WWW.NCBI -> Bio.Entrez > > Bio.WWW.SCOP -> Bio.SCOP > > Probably this thread, > http://lists.open-bio.org/pipermail/biopython-dev/2007-November/003241.html > > Also a bit more background on the NCBI Entrez side: > http://lists.open-bio.org/pipermail/biopython-dev/2008-February/003423.html > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Fri Feb 1 14:14:56 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 14:14:56 +0000 Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon wrote: > Hi Lenna, > >> Regarding point (2), is your primary concern namespace clutter or >> importing efficiency? > > Regarding point (2), my primary concern is that a Bio.WWW module would > group together modules that don't have much in common with each other. I > agree to your point that the category of internet access is more fundamental > than the category of parsers. But still, which modules should then go into a > Bio.WWW module? Any module whose sole purpose is to use the internet (that > would exclude Bio.Entrez)? Any module whose main purpose is to use the > internet? 
This would be unclear; for example, Bio.Entrez may or may not fall
> in that category, depending on how you use the module. Any module whose
> functionality includes internet access? Then if one day we add access to the
> JASPAR database over the internet to Bio.Motif, it would have to move to
> Bio.WWW.
>
> Currently most modules are organized by theme (Bio.Seq, Bio.Motif,
> Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one
> module, one chapter in the documentation, one set of unit tests, one set of
> doctests, which I think is a huge advantage both in terms of clarity and in
> terms of user experience.

Also with the theme approach, most (if not all) of the themes are likely to have some online resources (databases or remote APIs). On those grounds it makes sense to keep online motif functionality (like weblogo) under Bio.Motif, and so on.

People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin (which could be a big disruption with lots of code relocation)

People leaning against a Bio.WWW grouping: Michiel, Peter (me) (which would also be the status quo, so no disruption).

In the specific case of Kevin's TAIR code for fetching Arabidopsis sequences, Bio.TAIR (lower case?) is consistent with current usage. Somewhere under Bio.Seq* also seems sensible to me, as I wrote at the start of this thread.

Regards,

Peter

From mjldehoon at yahoo.com Fri Feb 1 14:12:38 2013
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 1 Feb 2013 06:12:38 -0800 (PST)
Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
In-Reply-To:
Message-ID: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com>

Hi Peter,

As we misunderstood each other, let me try once to make the case for putting test files in Bio/*. If I fail to convince you, let's either go back to the situation before Bio._utils, or remove the "if __name__ == '__main__':" stuff altogether.
First of all, if we use "if __name__ == '__main__':" to run the docstring tests, then those tests should pass if a user executes the script. Otherwise, we have installed some code that makes no sense outside of the distribution. This is also a problem with the os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder" solution, as after installation there is no Tests/ folder any more. Suppose we make a subdirectory Examples in each module that uses docstring tests which need some data files, and put the data files in the Examples subdirectory. The docstring tests are supposed to be simple (full testing is done by the unittests), so the example data files can be tiny. The docstring tests can then use >>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta") which is simple enough. The unit tests can switch to the appropriate directory when running the docstring tests. A user, finding the example in the docstring tests, can try out the example directly, since the data file is provided together with the relevant module. And since the data file is in the subdirectory Examples/, there is still some separation between the code and the data. Best, -Michiel. 
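To make the "switch to the appropriate directory" idea concrete, here is a minimal sketch of how the setUp and tearDown arguments to doctest.DocTestSuite could handle the directory change. This is not code from Biopython: the tiny module source, the Examples/tiny.txt data file, and the helper names build_suite and run_demo are all made up for illustration, and a temporary directory stands in for a real package directory.

```python
import doctest
import importlib.util
import os
import tempfile
import unittest

# Hypothetical stand-in for a module whose doctest opens a data file
# through a short relative path (names invented for this sketch).
MODULE_SOURCE = '''
def count_lines(path):
    """Count the lines in a text file.

    >>> count_lines("Examples/tiny.txt")
    2
    """
    with open(path) as handle:
        return sum(1 for _ in handle)
'''


def build_suite(base_dir, module):
    """Wrap the module's doctests so they run with base_dir as the cwd."""
    saved = {}

    def set_up(test):
        # Remember the caller's directory, then move to the directory
        # that the doctest paths are written relative to.
        saved["cwd"] = os.getcwd()
        os.chdir(base_dir)

    def tear_down(test):
        # Restore the caller's directory whether the doctest passed or not.
        os.chdir(saved["cwd"])

    return doctest.DocTestSuite(module, setUp=set_up, tearDown=tear_down)


def run_demo():
    """Build a throwaway module plus data file and run its doctests."""
    with tempfile.TemporaryDirectory() as base_dir:
        os.mkdir(os.path.join(base_dir, "Examples"))
        data_path = os.path.join(base_dir, "Examples", "tiny.txt")
        with open(data_path, "w") as handle:
            handle.write(">seq1\nACGT\n")

        module_path = os.path.join(base_dir, "demo_module.py")
        with open(module_path, "w") as handle:
            handle.write(MODULE_SOURCE)

        # Import the throwaway module from its file path.
        spec = importlib.util.spec_from_file_location("demo_module", module_path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

        start_dir = os.getcwd()
        result = unittest.TestResult()
        build_suite(base_dir, module).run(result)
        # Report the outcome and whether the working directory was restored.
        return result, os.getcwd() == start_dir


if __name__ == "__main__":
    outcome, restored = run_demo()
    print("tests run:", outcome.testsRun)
    print("all passed:", outcome.wasSuccessful())
    print("cwd restored:", restored)
```

With this shape the doctest itself keeps a short path such as "Examples/tiny.txt", and only the test runner needs to know which directory that path is relative to. Whether that directory is Tests/ or a per-module Examples/ folder is exactly the open question in this thread; the sketch works the same either way.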
--- On Fri, 2/1/13, Peter Cock wrote:

> From: Peter Cock
> Subject: Re: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone
> To: "Michiel de Hoon"
> Cc: "Wibowo Arindrarto" , "BioPython-Dev Mailing List"
> Date: Friday, February 1, 2013, 7:14 AM
> >Wibowo Arindrarto wrote:
> >> Just to be clear, we are:
> >>
> >> * changing all module's doctest file path to use relative
> >> paths (with
> >> respect to the module's location),
> >> * replacing the run_doctest() import with a simpler doctest
> >> import and
> >> `doctest.testmod()` in each module having this doctest
> >> * resorting to setUp and tearDown in the DocTestSuite in
> >> `run_tests.py` so that each module / submodule can find
> >> their test
> >> files
> >> * and refactoring all string functions in Bio._utils to
> >> Bio.Phylo and
> >> Bio.SearchIO, so that we can remove Bio._utils,
> >>
> >> right?
> >>
> >> I'd be happy to give this a shot if everyone feels the same
> >> :).
> >>
>
> On Fri, Feb 1, 2013 at 11:23 AM, Michiel de Hoon wrote:
> > Hi Bow,
> >
> > Yes, that is correct.
> > Responding to Peter's email: Peter, do you agree with this approach?
> >
> > Best,
> > -Michiel.
>
> No. I think we have misunderstood each other on the doctest
> discussion :(
>
> If we keep the test files under Tests/ (and I think that is
> best)
> then for example look at this doctest in Bio/SeqRecord.py
>
>         >>> from Bio import SeqIO
>         >>> record = SeqIO.read(open("Fasta/sweetpea.nu"),"fasta")
>         >>> len(record)
>         309
>
> That is currently written to assume it is run from the
> Tests/
> folder. If we write this assuming it is in the Bio/ folder
> where
> the Python file SeqRecord.py lives, it becomes:
>
>         >>> from Bio import SeqIO
>         >>> record = SeqIO.read(open("../Tests/Fasta/sweetpea.nu"),"fasta")
>         >>> len(record)
>         309
>
> I think a beginner would find that more confusing.
It is > also longer > and we already have trouble with some lines exceeding 80 > chars. > > Ideally there would be a nice way for doctests to specify > the folder, > and then we could use a simple filename like "sweetpea.nu" > with > no directories at all. But I don't think that is possible > without us > making the testing infrastructure even more complicated. > > -- > > If we want to get rid of Bio._utils.run_doctest() (and the > whole of > the file Bio/_utils.py) then I would prefer reverting to the > old situation > prior to adding the Bio._utils.run_doctest() helper > function. > > If the repetitive code snippets to run the doctests of a > module are a > problem it can be shortened to something less flexible, for > example > in Bio/SeqRecord.py could use something very short like > this: > > if __name__ == "__main__": > ? ? assert os.path.isfile("Fasta/sweetpea.nu"), > "Run from Tests/ folder" > ? ? import doctest > ? ? doctest.testmod(verbose=2) > > Or, as I suggested before, we can remove these development > convenience hooks completely? > > -- > > On the subject of the string functions in Bio/_utils.py, I > have no > objection to moving them back under Bio.SearchIO and/or > Bio.Phylo - which has advantages in terms of modularity (a > good thing for preventing accidental side effects). > > Regards, > > Peter > From p.j.a.cock at googlemail.com Fri Feb 1 14:32:46 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 14:32:46 +0000 Subject: [Biopython-dev] Deprecating Bio.ParserSupport, Bio.Blast.NCBIStandalone In-Reply-To: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com> References: <1359727958.56029.YahooMailClassic@web164001.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 2:12 PM, Michiel de Hoon wrote: > Hi Peter, > > As we misunderstood each other, let me try once to make the case for > putting test files in Bio/*. 
If I fail to convince you, let's either go back > to the situation before Bio._utils, or remove the "if __name__ == > '__main__':" stuff altogether. I'm not convinced about putting test files under Bio/* so lets revert the use of the helper function Bio._utils.run_doctest(), and if you wish proceed with removing Bio/_utils.py as well. Shall I go ahead and revert 8b59d89bb4e282192ddee751e24ceef4afa63528 then remove run_doctest and find_test_dir from Bio/_utils.py now? > First of all, if we use "if __name__ == '__main__':" to run the docstring > tests, then those tests should pass if a user executes the script. > Otherwise, we have installed some code that makes no sense outside of the > distribution. This is also a problem with the > os.path.isfile("Fasta/sweetpea.nu"), "Run from Tests/ folder" > solution, as after installation there is no Tests/ folder any more. That is a good point, this has always been a weakness of the __main__ hook to run the doctests. > Suppose we make a subdirectory Examples in each module that uses docstring > tests which need some data files, and put the data files in the Examples > subdirectory. The docstring tests are supposed to be simple (full testing is > done by the unittests), so the example data files can be tiny. > > The docstring tests can then use >>>> record = SeqIO.read(open("Examples/sweetpea.nu"),"fasta") > which is simple enough. > The unit tests can switch to the appropriate directory when running the > docstring tests. > A user, finding the example in the docstring tests, can try out the > example directly, since the data file is provided together with the relevant > module. > And since the data file is in the subdirectory Examples/, there is still > some separation between the code and the data. Did you envision installing the examples subdirectories next to the code under site-packages? 
Technically that is doable, but I'm not sure if that is considered good practice (does anyone know the relevant Debian policies for example - they're quite keen on this kind of thing?). I much prefer the simplicity of having all the test files in one place (under the Tests/ folder) especially as things like simple FASTA files get used in doctests and unittests for multiple different areas of Biopython. Regards, Peter From p.j.a.cock at googlemail.com Fri Feb 1 14:56:02 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 14:56:02 +0000 Subject: [Biopython-dev] Bio.Motif update In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com> References: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon wrote: > Hi Peter and all, > > --- On Tue, 1/29/13, Peter Cock wrote: >> We need to say something about this in the NEWS file too. > > Done. > >> I think it would make sense to add a PendingDeprecationWarning >> to Bio.Motif now. > > Done. Thanks. >> Also, if you feel the new Bio.motifs API isn't quite >> settled yet, adding the new BiopythonExperimentalWarning to >> that makes sense. > > I don't expect big changes in the API, so I think we can do without the > BiopythonExperimentalWarning. Also we should avoid the situation > that Bio.Motif gives a DeprecationWarning, and Bio.Motifs gives a > BiopythonExperimentalWarning. Agreed. >> (And once this is settled, I think we can schedule the >> release) > > We should also check whether we can remove deprecated modules, > or deprecate modules that are currently declared obsolete. Or has > somebody done that already? I went over the list in the DEPRECATED file last month, but a second check would be a good idea. 
Peter From mjldehoon at yahoo.com Fri Feb 1 14:53:06 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 1 Feb 2013 06:53:06 -0800 (PST) Subject: [Biopython-dev] Bio.Motif update In-Reply-To: Message-ID: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com> Hi Peter and all, --- On Tue, 1/29/13, Peter Cock wrote: > We need to say something about this in the NEWS file too. Done. > I think it would make sense to add a PendingDeprecationWarning > to Bio.Motif now. Done. > Also, if you feel the new Bio.motifs API isn't quite > settled yet, adding the new BiopythonExperimentalWarning to > that makes sense. I don't expect big changes in the API, so I think we can do without the BiopythonExperimentalWarning. Also we should avoid the situation that Bio.Motif gives a DeprecationWarning, and Bio.motifs gives a BiopythonExperimentalWarning. > (And once this is settled, I think we can schedule the > release) We should also check whether we can remove deprecated modules, or deprecate modules that are currently declared obsolete. Or has somebody done that already? Best, -Michiel From p.j.a.cock at googlemail.com Fri Feb 1 15:03:24 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 15:03:24 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? Message-ID: Hello all, I think we're overdue for a Biopython release now, and I would like to do this next week. There are still plenty more additions and enhancements waiting in the wings, but right now I just want any remaining bug fixes addressed. Are there any release-blocking issues? Thanks, Peter From w.arindrarto at gmail.com Fri Feb 1 15:29:09 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 1 Feb 2013 16:29:09 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: Hi Peter, > I think we're overdue for a Biopython release now, and I would > like to do this next week.
There are still plenty more additions > and enhancements waiting in the wings, but right now I just > want any remaining bug fixes addressed. > > Are there any release blocking issues? There's still one bug for Bio.SearchIO that I would prefer to be fixed (https://redmine.open-bio.org/issues/3400). Is it possible to wait a few more days (no later than next week I hope) to sort this bug out? Also, since this is our first release with the BiopythonExperimentalWarning, I was thinking maybe we can include some modules that have been in the waiting line. One that I can think of right now is Andrew's MafIO (re: the recent mention as well). Considering that some people have started using it, maybe we can release it under a BiopythonExperimentalWarning. And later down the line, perhaps we can include Brad's GTF/GFF parser (seeing that this is already included in the wiki, maybe it's a good time to start considering where to put it)? Brad, if you don't mind, perhaps we can start working on this as well. Regards, Bow From p.j.a.cock at googlemail.com Fri Feb 1 15:40:03 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 15:40:03 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: On Fri, Feb 1, 2013 at 3:29 PM, Wibowo Arindrarto wrote: > Hi Peter, > >> I think we're overdue for a Biopython release now, and I would >> like to do this next week. There are still plenty more additions >> and enhancements waiting in the wings, but right now I just >> want any remaining bug fixes addressed. >> >> Are there any release blocking issues? > > There's still one bug for Bio.SearchIO that I would prefer to be fixed > (https://redmine.open-bio.org/issues/3400). Is it possible to wait a > few more days (no later than next week I hope) to sort this bug out? 
A few days sure - but that is a small enough issue (and in a clearly marked 'here be dragons experimental code' section) that I don't think it should delay the whole release. > Also, since this is our first release with the > BiopythonExperimentalWarning, I was thinking maybe we can include some > modules that have been in the waiting line. One that I can think of > right now is Andrew's MafIO (re: the recent mention as well). > Considering that some people have started using it, maybe we can > release it under a BiopythonExperimentalWarning. > > And later down the line, perhaps we can include Brad's GTF/GFF parser > (seeing that this is already included in the wiki, maybe it's a good > time to start considering where to put it)? Brad, if you don't mind, > perhaps we can start working on this as well. Both examples of things I would like to do *after* shipping Biopython 1.61, which I feel is already overdue. Regards, Peter From mjldehoon at yahoo.com Fri Feb 1 15:39:15 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 1 Feb 2013 07:39:15 -0800 (PST) Subject: [Biopython-dev] Bio.Motif update In-Reply-To: Message-ID: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com> --- On Fri, 2/1/13, Peter Cock wrote: > I went over the list in the DEPRECATED file last month, but > a second check would be a good idea. 
The following were declared obsolete in Biopython 1.60, and can in principle be declared deprecated in Biopython 1.61: ---------- Bio/Blast/Applications.py: BlastallCommandline BlastpgpCommandline RpsBlastCommandline Bio/Blast/NCBIStandalone.py overall, and specifically: blastall blastpgp rpsblast Bio/ParserSupport.py overall Bio/PDB/AbstractPropertyMap.py: The has_key function in class AbstractPropertyMap Bio/PDB/FragmentMapper.py: The has_key function in class FragmentMapper Bio/UniGene/UniGene.py overall In BioSQL/BioSeqDatabase.py: class DBServer: remove_database class BioSeqDatabase: get_all_primary_ids get_Seq_by_primary_id ----------- These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61: Bio/Align/__init__.py: class MultipleSeqAlignment: get_column add_sequence Bio/Align/Generic.py: class Alignment overall get_all_seqs get_seq_by_num Bio/File.py: class StringHandle Bio/Graphics/GenomeDiagram/_AbstractDrawer.py: class AbstractDrawer: _set_xcentre, _set_ycentre Bio/Graphics/GenomeDiagram/_Graph.py: class GraphData: _set_centre Bio/ParserSupport.py: SGMLStrippingConsumer Bio/Seq.py: class Seq: .data property Bio/SeqIO/SffIO.py: _sff_read_roche_index_xml -------------------- The tostring() method of the class Seq in Bio/Seq.py: Can we declare this obsolete? -Michiel From w.arindrarto at gmail.com Fri Feb 1 15:47:14 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 1 Feb 2013 16:47:14 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: <510BE201.4090002@biotech.uni-tuebingen.de> References: <510BE201.4090002@biotech.uni-tuebingen.de> Message-ID: Hi Peter, Kai, >> There's still one bug for Bio.SearchIO that I would prefer to be >> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to >> wait a few more days (no later than next week I hope) to sort this >> bug out? 
> > Sorry for letting this slip for so long, but I never got around to > write an actual test case. > > Bow, did we agree to use optionalcascade for now and then maybe > refactor? I'm pretty confident the code works as-is, all the BioPython > issues I've been running into with my production site have been in the > GenBank/EMBL parsers. :) Yes, we did :). I meant to do the optionalcascade refactor so the code is more maintainable (and to prevent a corner case bug). But in general, the optionalcascade fix looks to be fine. And for code marked with the BiopythonExperimentalWarning, having a fix without test cases seems better than no fix at all. Peter, if you're fine with Kai's fix, I think we can mark this bug solved and go on with the release. I'll add the test cases and refactor the code later on. Regards, Bow From p.j.a.cock at googlemail.com Fri Feb 1 15:51:07 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 15:51:07 +0000 Subject: [Biopython-dev] Bio.Motif update In-Reply-To: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com> References: <1359733155.26451.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon wrote: > --- On Fri, 2/1/13, Peter Cock wrote: >> I went over the list in the DEPRECATED file last month, but >> a second check would be a good idea. > > The following were declared obsolete in Biopython 1.60, and can > in principle be declared deprecated in Biopython 1.61: > > ---------- > Bio/Blast/Applications.py: > BlastallCommandline > BlastpgpCommandline > RpsBlastCommandline My impression is there is still a sizeable group of people still using blastall and the rest of legacy BLAST as it is mature reliable code, while BLAST+ still has some rough edges. But as the NCBI themselves have now stopped updating legacy BLAST, perhaps the time has come. So if you want, deprecating the legacy BLAST wrappers seems OK. 
> Bio/Blast/NCBIStandalone.py overall, and specifically: > blastall > blastpgp > rpsblast Given the SearchIO use of the plain text BLAST parser, I think we agreed to leave that as is in the short term. The command line calling functions blastall, blastpgp & rpsblast the same applies as for BlastallCommandline, BlastpgpCommandline and RpsBlastCommandline (above). > Bio/ParserSupport.py overall Given the SearchIO use of the plain text BLAST parser which uses this, I think we agreed to leave that as is in the short term. > Bio/PDB/AbstractPropertyMap.py: > The has_key function in class AbstractPropertyMap > > Bio/PDB/FragmentMapper.py: > The has_key function in class FragmentMapper The Python dict lost the has_key function in Python 3, so it does make sense to proceed with those deprecations. > Bio/UniGene/UniGene.py overall > Yes, ready to deprecate. > In BioSQL/BioSeqDatabase.py: > class DBServer: > remove_database > class BioSeqDatabase: > get_all_primary_ids > get_Seq_by_primary_id Yes, ready to deprecate. Thanks, Peter From p.j.a.cock at googlemail.com Fri Feb 1 16:02:33 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 16:02:33 +0000 Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues? Message-ID: On Fri, Feb 1, 2013 at 3:40 PM, Kai Blin wrote: > > PS: I'd have replied on the bug tracker, but for some reason I can't > log in again, even after resetting the password. For some reason > redmine hates me. > Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/ (it was left as a read only legacy listing, but it broke last year when the old server started to die and isn't really worth fixing). This was moved over to RedMine, along with all the other OBF projects. This does have some git integration, but I'm not that taken with it - and it is yet another service for the OBF team to maintain. What do people think of moving over to using GitHub issues? 
This would link in very well with pull requests and makes linking to commits much simpler too. One potential issue is if and how we could have bug reports sent to the biopython-dev mailing list (something we touched on recently for pull requests). A full automated move could be possible (NumPy did this), but I think a gradual move would be fine - stop filing new issues on RedMine and use GitHub issues in future. There are only about 100 issues open at the moment anyway, and a manual migration would also be a good way to review some of the older tickets. Thoughts?, Peter From p.j.a.cock at googlemail.com Fri Feb 1 16:04:10 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 16:04:10 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: <510BE201.4090002@biotech.uni-tuebingen.de> Message-ID: On Fri, Feb 1, 2013 at 3:47 PM, Wibowo Arindrarto wrote: > Hi Peter, Kai, > > >>> There's still one bug for Bio.SearchIO that I would prefer to be >>> fixed (https://redmine.open-bio.org/issues/3400). Is it possible to >>> wait a few more days (no later than next week I hope) to sort this >>> bug out? >> >> Sorry for letting this slip for so long, but I never got around to >> write an actual test case. >> >> Bow, did we agree to use optionalcascade for now and then maybe >> refactor? I'm pretty confident the code works as-is, all the BioPython >> issues I've been running into with my production site have been in the >> GenBank/EMBL parsers. :) > > Yes, we did :). I meant to do the optionalcascade refactor so the code > is more maintainable (and to prevent a corner case bug). But in > general, the optionalcascade fix looks to be fine. And for code marked > with the BiopythonExperimentalWarning, having a fix without test cases > seems better than no fix at all. That sounds OK for now. > Peter, if you're fine with Kai's fix, I think we can mark this bug > solved and go on with the release. 
I'll add the test cases and > refactor the code later on. You mean this patch from https://redmine.open-bio.org/issues/3400 ?: https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch I can apply that if you want. Peter From redmine at redmine.open-bio.org Fri Feb 1 16:04:20 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Fri, 1 Feb 2013 16:04:20 +0000 Subject: [Biopython-dev] [Biopython - Bug #3405] (New) to_networkx converts weights as string Message-ID: Issue #3405 has been reported by Aleksey Kladov. ---------------------------------------- Bug #3405: to_networkx converts weights as string https://redmine.open-bio.org/issues/3405 Author: Aleksey Kladov Status: New Priority: Normal Assignee: Category: Target version: URL: in the file /Bio/Phylo/_utils.py in the method add_edge(graph, n1, n2) there is a line
 graph.add_edge(n1, n2, weight=str(n2.branch_length or 1.0)) 
It's strange, because if weights are strings, then you are unable to find shortest paths due to
TypeError: unsupported operand type(s) for +: 'int' and 'str'
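To illustrate the report above, here is a minimal sketch (plain Python, no networkx or Biopython needed) of why string weights break any routine that sums edge weights, and what a numeric weight might look like instead. The helper names edge_weight and path_length are hypothetical and are not part of Bio.Phylo; this is only one possible way to address the issue, not necessarily the fix that will be applied. Note one deliberate difference: the original expression `n2.branch_length or 1.0` also replaces a branch length of 0.0 with 1.0, whereas this sketch only falls back to the default when the length is missing.

```python
def edge_weight(branch_length, default=1.0):
    """Return a numeric edge weight (hypothetical helper, not Bio.Phylo API).

    Falls back to `default` only when the branch length is missing (None).
    """
    if branch_length is None:
        return default
    return float(branch_length)


def path_length(weights):
    """Sum edge weights, starting from 0 as a shortest-path routine would."""
    total = 0
    for weight in weights:
        total += weight
    return total


branch_lengths = [0.5, None, 2.0]

# String weights, as produced by weight=str(...), cannot be added to a number.
string_weights = [str(length or 1.0) for length in branch_lengths]
try:
    path_length(string_weights)
    string_sum_failed = False
except TypeError:
    string_sum_failed = True

# Numeric weights can be summed without trouble.
numeric_weights = [edge_weight(length) for length in branch_lengths]
total = path_length(numeric_weights)
```

In the reported line this would presumably amount to passing the number itself as the weight rather than wrapping it in str().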
---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From kai.blin at biotech.uni-tuebingen.de Fri Feb 1 15:40:49 2013 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 01 Feb 2013 16:40:49 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: <510BE201.4090002@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2013-02-01 16:29, Wibowo Arindrarto wrote: > There's still one bug for Bio.SearchIO that I would prefer to be > fixed (https://redmine.open-bio.org/issues/3400). Is it possible to > wait a few more days (no later than next week I hope) to sort this > bug out? Sorry for letting this slip for so long, but I never got around to write an actual test case. Bow, did we agree to use optionalcascade for now and then maybe refactor? I'm pretty confident the code works as-is, all the BioPython issues I've been running into with my production site have been in the GenBank/EMBL parsers. :) Cheers, Kai PS: I'd have replied on the bug tracker, but for some reason I can't log in again, even after resetting the password. For some reason redmine hates me. - -- Dipl.-Inform. 
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJRC+IBAAoJEKM5lwBiwTTPlisH/1QSF+4jIx2jKycRCys1NPMj 6YwFTdKoGmIDYjEB+qge5PKNIHplN3EsGz6l4bRYZiWbqTlyvb5IUPHgwxFRigXg VuSnR/k8faSLNuGJpoFezLmZ0yJoLslXztCUJ+HbWXB02K9uzYXovRg8AtfHlnOu Qd9aNbyX/nzFrsayllTvYy9ZxcQNCH5Lrgm+EWMkuBptcMdBLjqSGkov5iE2g1bV ItHacrQUPJXVIAMTXW9mSy3AXzTqjOjqfBwXsthLSyXHEv8ppcnIi4bmVX+XS//n 4vc+LdaxzgkENaw4P+60bikkFqey/GFoxaIzLACh4HFupRAjK+6NaUzGYPSEQXM= =efd0 -----END PGP SIGNATURE----- From p.j.a.cock at googlemail.com Fri Feb 1 16:25:56 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 16:25:56 +0000 Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues? In-Reply-To: References: Message-ID: On Fri, Feb 1, 2013 at 4:02 PM, Peter Cock wrote: > > What do people think of moving over to using GitHub issues? > This would link in very well with pull requests and makes linking > to commits much simpler too. One potential issue is if and how > we could have bug reports sent to the biopython-dev mailing list > (something we touched on recently for pull requests). > I've filed an issue for that (I couldn't find any open issue like it): https://github.com/gitlabhq/gitlabhq/issues/2884 Peter From kai.blin at biotech.uni-tuebingen.de Fri Feb 1 16:27:13 2013 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 01 Feb 2013 17:27:13 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: <510BEAEF.4070108@biotech.uni-tuebingen.de> References: <510BE201.4090002@biotech.uni-tuebingen.de> <510BEAEF.4070108@biotech.uni-tuebingen.de> Message-ID: <510BECE1.4020306@biotech.uni-tuebingen.de> On 2013-02-01 17:18, Kai Blin wrote: > That's not quite it. Let me update my bug3400 branch and submit a > pull request. Will be ready in a minute. https://github.com/biopython/biopython/pull/150 Cheers, Kai -- Dipl.-Inform. Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From kai.blin at biotech.uni-tuebingen.de Fri Feb 1 16:18:55 2013 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Fri, 01 Feb 2013 17:18:55 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: <510BE201.4090002@biotech.uni-tuebingen.de> Message-ID: <510BEAEF.4070108@biotech.uni-tuebingen.de> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 2013-02-01 17:04, Peter Cock wrote: Hi Peter, >> Peter, if you're fine with Kai's fix, I think we can mark this >> bug solved and go on with the release. I'll add the test cases >> and refactor the code later on. > > You mean this patch from https://redmine.open-bio.org/issues/3400 > ?: > https://redmine.open-bio.org/attachments/1754/0001-SearchIO-Add-optionalcascade-getter-setter-to-allow-.patch > > I can apply that if you want. That's not quite it. Let me update my bug3400 branch and submit a pull request. Will be ready in a minute. Cheers, Kai - -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJRC+rvAAoJEKM5lwBiwTTPYH4H+QGiY5cyN7tFjT2RZGN28Pp8 2t/RbW9bYakVqKHtZR6xXu4QF48jCmHkkER0cMvDuKcWrjko/xAWSGuNqWK59rHe b7t9CgGywYC9KdhPih+pG5HzKuc9ZP1c2unK/e+c+y8rrFZTUoB1e2AbGqzg163S qplu0RGv8kSOMXmGVFNj+iZ/AJnN735Tp5gfzFHfudS13kzfqW+Mq1+DlSG1GOwM Y99kc6Uc5WFHmHME4pDdlLBGyKVd+9LlQnTeApBjWnBDcRBMyXI0HIck6Bw64swH BvPIz2yq3PEnhvgI0v0A9lO1xR0Yj9wGQGr8XGPLq0UHh0W0O0P1I8YbMCVHkPg= =kCtp -----END PGP SIGNATURE----- From p.j.a.cock at googlemail.com Fri Feb 1 16:50:57 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 16:50:57 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock wrote: > Hello all, > > I think we're overdue for a Biopython release now, and I would > like to do this next week. There are still plenty more additions > and enhancements waiting in the wings, but right now I just > want any remaining bug fixes addressed. > > Are there any release blocking issues? > > Thanks, > > Peter I won't have time to look at it today, but the BLAST+ wrappers need updating for the BLAST 2.2.27+ release, e.g. new arg frame_shift_penalty (checked via test_NCBI_BLAST_tools.py). Any volunteers? This should be a small job... Peter From w.arindrarto at gmail.com Fri Feb 1 17:37:57 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 1 Feb 2013 18:37:57 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week?
In-Reply-To: References: Message-ID: Hi Peter, >> I think we're overdue for a Biopython release now, and I would >> like to do this next week. There are still plenty more additions >> and enhancements waiting in the wings, but right now I just >> want any remaining bug fixes addressed. >> >> Are there any release blocking issues? >> >> Thanks, >> >> Peter > > I won't have time to look at it today, but the BLAST+ wrappers > need updating for the BLAST 2.2.27+ release, e.g. new arg > frame_shift_penalty (checked via test_NCBI_BLAST_tools.py). > > Any volunteers? This should be a small job... I've submitted a pull request here: https://github.com/biopython/biopython/pull/151 From w.arindrarto at gmail.com Fri Feb 1 17:43:23 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 1 Feb 2013 18:43:23 +0100 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: > Hi Peter, > >>> I think we're overdue for a Biopython release now, and I would >>> like to do this next week. There are still plenty more additions >>> and enhancements waiting in the wings, but right now I just >>> want any remaining bug fixes addressed. >>> >>> Are there any release blocking issues? >>> >>> Thanks, >>> >>> Peter >> >> I won't have time to look at it today, but the BLAST+ wrappers >> need updating for the BLAST 2.2.27+ release, e.g. new arg >> frame_shift_penalty (checked via test_NCBI_BLAST_tools.py). >> >> Any volunteers? This should be a small job... > > I've submitted a pull request here: > https://github.com/biopython/biopython/pull/151 Whoops, sorry for sending an incomplete mail. I wanted to add that test_NCBI_BLAST_tools.py doesn't correctly detect my BLAST installation (even though I have it). I had to comment out the "Install BLAST+ ..." notice and the rpsblast test (for some reason it keeps saying I don't have rpsblast either, even though I do).
Anyway, these are not in the pull request, just something I did when writing this fix. Could you confirm that the fixes are OK? Hope that helps, Bow From w.arindrarto at gmail.com Fri Feb 1 17:48:09 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 1 Feb 2013 18:48:09 +0100 Subject: [Biopython-dev] Bugzilla --> RedMine --> GitHub issues? In-Reply-To: References: Message-ID: >> PS: I'd have replied on the bug tracker, but for some reason I can't >> log in again, even after resetting the password. For some reason >> redmine hates me. >> > > Biopython used to use Bugzilla, at http://bugzilla.open-bio.org/ > (it was left as a read only legacy listing, but it broke last year when > the old server started to die and isn't really worth fixing). > > This was moved over to RedMine, along with all the other OBF > projects. This does have some git integration, but I'm not that > taken with it - and it is yet another service for the OBF team > to maintain. > > What do people think of moving over to using GitHub issues? > This would link in very well with pull requests and makes linking > to commits much simpler too. One potential issue is if and how > we could have bug reports sent to the biopython-dev mailing list > (something we touched on recently for pull requests). > > A full automated move could be possible (NumPy did this), but I > think a gradual move would be fine - stop filing new issues on > RedMine and use GitHub issues in future. There are only about > 100 issues open at the moment anyway, and a manual migration > would also be a good way to review some of the older tickets. > > Thoughts?, Moving to GitHub sounds good to me. I'd prefer if we go over the issues manually (removing the obsolete ones and keeping the current ones). As for sending the bug reports to the mailing list, could we perhaps create our own custom hooks? e.g.
anytime a pull request is issued, an email would be sent (see https://github.com/github/github-services and http://developer.github.com/v3/repos/hooks/#create-a-hook) Regards, Bow From arklenna at gmail.com Fri Feb 1 19:05:18 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Fri, 1 Feb 2013 14:05:18 -0500 Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock wrote: > > People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin > (which could be a big disruption with lots of code relocation) > > People leaning against a Bio.WWW grouping: Michiel, Peter (me) > (which would also be the status quo, so no disruption). > > I concede that the potential benefit of refactoring to separate WWW is outweighed both by potential downsides and the disruption and effort involved. In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences, > Bio.TAIR (lower case?) is consistent with current usage. Somewhere under > Bio.Seq* also seems sensible to me, as I wrote at the start of this thread. > > Populating the top level namespace with a submodule for each web-only service has the risk of creating too many submodules. Bio.Seq* makes sense, because the TAIR code pulls data into a Seq. Web services that connect to a single biopython representation can be organized under that submodule. Web services that return multiple types of information (e.g. Entrez) are big enough to logically comprise their own submodule. Is my interpretation of the biopython classification scheme more or less correct? Cheers, Lenna From p.j.a.cock at googlemail.com Fri Feb 1 21:00:10 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 1 Feb 2013 21:00:10 +0000 Subject: [Biopython-dev] Namespace for online resources? 
In-Reply-To: References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson wrote: > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock > wrote: >> >> >> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin >> (which could be a big disruption with lots of code relocation) >> >> People leaning against a Bio.WWW grouping: Michiel, Peter (me) >> (which would also be the status quo, so no disruption). >> > > I concede that the potential benefit of refactoring to separate WWW is > outweighed both by potential downsides and the disruption and effort > involved. > >> In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences, >> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under >> Bio.Seq* also seems sensible to me, as I wrote at the start of this >> thread. >> > > Populating the top level namespace with a submodule for each web-only > service has the risk of creating too many submodules. Bio.Seq* makes sense, > because the TAIR code pulls data into a Seq. Web services that connect to a > single biopython representation can be organized under that submodule. Web > services that return multiple types of information (e.g. Entrez) are big > enough to logically comprise their own submodule. > > Is my interpretation of the biopython classification scheme more or less > correct? 
Yes that sounds about right :) Of course, the historical muddle of Bio.Seq* is something we've talked about addressing recently - see this thread from October, http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html Peter From natemsutton at yahoo.com Fri Feb 1 21:54:42 2013 From: natemsutton at yahoo.com (Nate Sutton) Date: Fri, 1 Feb 2013 13:54:42 -0800 (PST) Subject: [Biopython-dev] New BioPython member In-Reply-To: References: <1359494577.29159.YahooMailNeo@web122606.mail.ne1.yahoo.com> Message-ID: <1359755682.16563.YahooMailNeo@web122605.mail.ne1.yahoo.com> Thanks for the welcome! Also, I looked briefly through the code with the files you wrote about and I see the command line app wrapping components you described. I appreciate the advice about how to do the wrapper and I am glad to know of that pattern of command line app wrapping that is consistent with code in other places of BioPython. Thanks for the other advice including possibly asking for guidance. I'll just give it a shot and hopefully things go smoothly, but with it being my first BioPython coding I appreciate the support. Thanks, Nate ________________________________ From: Peter Cock To: Nate Sutton Cc: "biopython-dev at lists.open-bio.org" Sent: Wednesday, January 30, 2013 2:31 AM Subject: Re: [Biopython-dev] New BioPython member On Tue, Jan 29, 2013 at 9:22 PM, Nate Sutton wrote: > Dear all, > > I just recently joined the BioPython developers group and am > looking forward to contributing to BioPython! I have worked for a while > in programming, genetics, and biology and have > an M.S. in Biomedical Informatics. After > talking with some fellow contributors I have decided to try working on > https://redmine.open-bio.org/issues/3360 but I will also work on writing > some documentation on examples from the > cookbook, especially if I am stuck on the bug.
If anyone wants to work on > the same things, I'd be glad to hear that, I > may be slow on the work because I am still learning Python after coming > from > other languages. > > -Nate Hi Nate, and welcome. Eric is in charge of the Bio.Phylo module, but within that the command line application wrappers under Bio.Phylo.Applications follow a pattern used elsewhere in Biopython. To add a wrapper for fasttree http://www.microbesonline.org/fasttree/ have a look at the existing wrappers for PHYML and RAXML, defined in Bio/Phylo/Applications/_Phyml.py and Bio/Phylo/Applications/_Raxml.py (leading underscores mean private modules in Python), which are exposed to the user via Bio/Phylo/Applications/__init__.py. In this case, I'd suggest putting the new wrapper in a new file, Bio/Phylo/Applications/_fasttree.py. Other similar wrappers exist under Bio.Emboss, Bio.Align, etc. Don't be shy about asking for guidance on this, or git and GitHub. Ultimately, what I'm hoping you'll be able to do is take a fork (a personal copy of the repository) on GitHub, create a new fasttree branch, commit your enhancements, and make a pull request. If that's all too much for now, simply writing the new file and letting us do the git side would be fine. Regards, Peter From k.d.murray.91 at gmail.com Fri Feb 1 23:59:57 2013 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Sat, 2 Feb 2013 10:59:57 +1100 Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: Hi All, How about this: In the vein of Lenna's last email, we create a module WebSeq (or Seq.Web, or whatever), containing modules whose sole purpose is to get sequences (Seq/SeqRecord objects) from an internet database. This would, I think, provide a good balance between a messy top-level namespace full of modules like Bio.tair, and the absolutism of having anything vaguely web-related in a single WWW module.
It should also provide the unified theme per module which Michiel talks of, and unit/doctests should be fine, as no modules will be split (simply moved in their entirety from Bio.x to Bio.WebSeq.x). From a quick look, the only candidate (apart from TAIR) for a shift is TogoWS, and even then I'm not sure, as TogoWS isn't used just for Seqs (and does not return them). Regards Kevin Murray On 2 February 2013 08:00, Peter Cock wrote: > On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson wrote: > > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock > > wrote: > >> > >> > >> People leaning for a Bio.WWW grouping: Bow, Lenna, Kevin > >> (which could be a big disruption with lots of code relocation) > >> > >> People leaning against a Bio.WWW grouping: Michiel, Peter (me) > >> (which would also be the status quo, so no disruption). > >> > > > > I concede that the potential benefit of refactoring to separate WWW is > > outweighed both by potential downsides and the disruption and effort > > involved. > > > >> In the specific case of Kevin's TAIR code for fetch Arabidopsis > sequences, > >> Bio.TAIR (lower case?) is consistent with current usage. Somewhere under > >> Bio.Seq* also seems sensible to me, as I wrote at the start of this > >> thread. > >> > > > > Populating the top level namespace with a submodule for each web-only > > service has the risk of creating too many submodules. Bio.Seq* makes > sense, > > because the TAIR code pulls data into a Seq. Web services that connect > to a > > single biopython representation can be organized under that submodule. > Web > > services that return multiple types of information (e.g. Entrez) are big > > enough to logically comprise their own submodule. > > > > Is my interpretation of the biopython classification scheme more or less > > correct?
> > Yes that sounds about right :) > > Of course, the historical muddle of Bio.Seq* is something we've talked > about addressing recently - see this thread from October, > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mjldehoon at yahoo.com Sat Feb 2 01:36:03 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 1 Feb 2013 17:36:03 -0800 (PST) Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: Message-ID: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com> In principle I am OK with this, but is TAIR only used for sequences? Or is it possible / likely that in the future we may want to add other functionality to TAIR? Anyway, if TAIR is predominantly used for sequences, then Bio.Seq.Web is a good option I think. Best, -Michiel. --- On Fri, 2/1/13, Kevin Murray wrote: > From: Kevin Murray > Subject: Re: [Biopython-dev] Namespace for online resources? > To: "Peter Cock" > Cc: "Biopython-Dev Mailing List" > Date: Friday, February 1, 2013, 6:59 PM > Hi All, > > How about this: > In the vein of Lenna's last email, we create a module WebSeq > (or Seq.Web, > or whatever), containing modules whose sole purpose is to > get sequences > (Seq/SeqRecord objects) from an internet database. This > would i think > provide a good balance between a messy top-level domain full > of modules > like Bio.tair, and the absolutisim of having anything vaugly > web related in > a single WWW module. It should also provide the unified > theme per module > which Michiel talks of, and unit/doctests should be fine, as > no modules > will be split (simply moved in their entirety from Bio.x to > Bio.WebSeq.x). 
> > >From a quick look, the only candiate (apart from TAIR) > for a shift is > TogoWS, and even then I'm not sure, as TogoWS isn't used > just for Seq's > (and does not return them). > > Regards > Kevin Murray > > > On 2 February 2013 08:00, Peter Cock > wrote: > > > On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson > wrote: > > > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock > > > wrote: > > >> > > >> > > >> People leaning for a Bio.WWW grouping: Bow, > Lenna, Kevin > > >> (which could be a big disruption with lots of > code relocation) > > >> > > >> People leaning against a Bio.WWW grouping: > Michiel, Peter (me) > > >> (which would also be the status quo, so no > disruption). > > >> > > > > > > I concede that the potential benefit of > refactoring to separate WWW is > > > outweighed both by potential downsides and the > disruption and effort > > > involved. > > > > > >> In the specific case of Kevin's TAIR code for > fetch Arabidopsis > > sequences, > > >> Bio.TAIR (lower case?) is consistent with > current usage. Somewhere under > > >> Bio.Seq* also seems sensible to me, as I wrote > at the start of this > > >> thread. > > >> > > > > > > Populating the top level namespace with a > submodule for each web-only > > > service has the risk of creating too many > submodules. Bio.Seq* makes > > sense, > > > because the TAIR code pulls data into a Seq. Web > services that connect > > to a > > > single biopython representation can be organized > under that submodule. > > Web > > > services that return multiple types of information > (e.g. Entrez) are big > > > enough to logically comprise their own submodule. > > > > > > Is my interpretation of the biopython > classification scheme more or less > > > correct? 
> > > > Yes that sounds about right :) > > > > Of course, the historical muddle of Bio.Seq* is > something we've talked > > about addressing recently - see this thread from > October, > > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > > > > Peter > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From k.d.murray.91 at gmail.com Sat Feb 2 06:00:34 2013 From: k.d.murray.91 at gmail.com (Kevin Murray) Date: Sat, 2 Feb 2013 17:00:34 +1100 Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1359768963.25565.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: Michiel, TAIR (http://www.arabidopsis.org/) is primarily a sequence repository. I have no intention to extend it beyond that, and any other features would not be easily scriptable, or would be pointless to include in Biopython. Regards Kevin Murray On 2 February 2013 12:36, Michiel de Hoon wrote: > In principle I am OK with this, but is TAIR only used for sequences? Or is > it possible / likely that in the future we may want to add other > functionality to TAIR? Anyway, if TAIR is predominantly used for sequences, > then Bio.Seq.Web is a good option I think. > > Best, > -Michiel. > > --- On Fri, 2/1/13, Kevin Murray wrote: > > > From: Kevin Murray > > Subject: Re: [Biopython-dev] Namespace for online resources? 
> > To: "Peter Cock" > > Cc: "Biopython-Dev Mailing List" > > Date: Friday, February 1, 2013, 6:59 PM > > Hi All, > > > > How about this: > > In the vein of Lenna's last email, we create a module WebSeq > > (or Seq.Web, > > or whatever), containing modules whose sole purpose is to > > get sequences > > (Seq/SeqRecord objects) from an internet database. This > > would i think > > provide a good balance between a messy top-level domain full > > of modules > > like Bio.tair, and the absolutisim of having anything vaugly > > web related in > > a single WWW module. It should also provide the unified > > theme per module > > which Michiel talks of, and unit/doctests should be fine, as > > no modules > > will be split (simply moved in their entirety from Bio.x to > > Bio.WebSeq.x). > > > > >From a quick look, the only candiate (apart from TAIR) > > for a shift is > > TogoWS, and even then I'm not sure, as TogoWS isn't used > > just for Seq's > > (and does not return them). > > > > Regards > > Kevin Murray > > > > > > On 2 February 2013 08:00, Peter Cock > > wrote: > > > > > On Fri, Feb 1, 2013 at 7:05 PM, Lenna Peterson > > wrote: > > > > On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock < > p.j.a.cock at googlemail.com> > > > > wrote: > > > >> > > > >> > > > >> People leaning for a Bio.WWW grouping: Bow, > > Lenna, Kevin > > > >> (which could be a big disruption with lots of > > code relocation) > > > >> > > > >> People leaning against a Bio.WWW grouping: > > Michiel, Peter (me) > > > >> (which would also be the status quo, so no > > disruption). > > > >> > > > > > > > > I concede that the potential benefit of > > refactoring to separate WWW is > > > > outweighed both by potential downsides and the > > disruption and effort > > > > involved. > > > > > > > >> In the specific case of Kevin's TAIR code for > > fetch Arabidopsis > > > sequences, > > > >> Bio.TAIR (lower case?) is consistent with > > current usage. 
Somewhere under > > > >> Bio.Seq* also seems sensible to me, as I wrote > > at the start of this > > > >> thread. > > > >> > > > > > > > > Populating the top level namespace with a > > submodule for each web-only > > > > service has the risk of creating too many > > submodules. Bio.Seq* makes > > > sense, > > > > because the TAIR code pulls data into a Seq. Web > > services that connect > > > to a > > > > single biopython representation can be organized > > under that submodule. > > > Web > > > > services that return multiple types of information > > (e.g. Entrez) are big > > > > enough to logically comprise their own submodule. > > > > > > > > Is my interpretation of the biopython > > classification scheme more or less > > > > correct? > > > > > > Yes that sounds about right :) > > > > > > Of course, the historical muddle of Bio.Seq* is > > something we've talked > > > about addressing recently - see this thread from > > October, > > > > http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009999.html > > > > > > Peter > > > _______________________________________________ > > > Biopython-dev mailing list > > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > From eric.talevich at gmail.com Sat Feb 2 22:29:57 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 2 Feb 2013 17:29:57 -0500 Subject: [Biopython-dev] Namespace for online resources? In-Reply-To: References: <1359726886.99038.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 9:14 AM, Peter Cock wrote: > On Fri, Feb 1, 2013 at 1:54 PM, Michiel de Hoon > wrote: > > Hi Lenna, > > > >> Regarding point (2), is your primary concern namespace clutter or > >> importing efficiency? 
> > > > Regarding point (2), my primary concern is that a Bio.WWW module would > > group together modules that don't have much in common with each other. I > > agree to your point that the category of internet access is more > fundamental > > than the category of parsers. But still, which modules should then go > into a > > Bio.WWW module? Any module whose sole purpose is to use the internet > (that > > would exclude Bio.Entrez)? Any module whose main purpose is to use the > > internet? This would be unclear; for example, Bio.Entrez may or may not > fall > > in that category, depending on how you use the module. Any module whose > > functionality includes internet access? Then if one day we add access to > the > > JASPAR database over the internet to Bio.Motif, it would have to move to > > Bio.WWW. > > > > Currently most modules are organized by theme (Bio.Seq, Bio.Motif, > > Bio.Cluster, Bio.Phylo, Bio.Entrez, etc.). For each theme, we have one > > module, one chapter in the documentation, one test of unit tests, one > set of > > doctests, which I think is a huge advantage both in terms of clarity and > in > > terms of user experience. > > Also with the theme approach, most (if not all) the themes are likely to > have some online resources (databases or remote APIs). On those > grounds it makes sense to keep online motif functionality (like weblogo) > under Bio.Motif, and so on. > I agree. >From an engineering perspective, it's usually best to organize code around data types. (To be clear: think classes and structures, not ints and strings.) The SeqIO, AlignIO, SearchIO, Phylo, Motif, PDB, etc. modules each have a core data type that serves as the "theme" for the sub-package. Within the sub-package we can have modules for different file formats, data transformations/manipulations, web servers, and command-line program wrappers, and keep all the interdependencies within the same small region of the code base. 
Since most users will not read the documentation in its entirety (if at all), this also makes it easier to look up how to do things with the data type in question. The core data type for a WWW module would be a network handle, I suppose -- but that's already part of the Python standard library. I've suggested before that we can justify the current placement of sequence-related modules at the top level, rather than under a new "Seq" sub-package, by considering sequences to be the default/implicit data type. As we've covered, many online resources can serve up several different data types, although sequences are probably the most common. In terms of namespace clutter, perhaps I've gotten too used to R, but I don't think we've reached the point where the number of modules and functions visible from the top level harms the user experience. In the specific case of Kevin's TAIR code for fetch Arabidopsis sequences, > Bio.TAIR (lower case?) is consistent with current usage. Somewhere under > Bio.Seq* also seems sensible to me, as I wrote at the start of this thread. > Bio.TAIR or Bio.Seq.TAIR or perhaps Bio.Seq.WWW.TAIR seem sensible to me, too. No preference on casing. -Eric From p.j.a.cock at googlemail.com Mon Feb 4 12:01:49 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 12:01:49 +0000 Subject: [Biopython-dev] Deprecations for Biopython 1.61 release; Was: Bio.Motif update Message-ID: On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon wrote: > --- On Fri, 2/1/13, Peter Cock wrote: >> I went over the list in the DEPRECATED file last month, but >> a second check would be a good idea. 
> > The following were declared obsolete in Biopython 1.60, and can > in principle be declared deprecated in Biopython 1.61: > > ---------- > Bio/Blast/Applications.py: > BlastallCommandline > BlastpgpCommandline > RpsBlastCommandline > > Bio/Blast/NCBIStandalone.py overall, and specifically: > blastall > blastpgp > rpsblast > > Bio/ParserSupport.py overall > > Bio/PDB/AbstractPropertyMap.py: > The has_key function in class AbstractPropertyMap > > Bio/PDB/FragmentMapper.py: > The has_key function in class FragmentMapper > > Bio/UniGene/UniGene.py overall > > In BioSQL/BioSeqDatabase.py: > class DBServer: > remove_database > class BioSeqDatabase: > get_all_primary_ids > get_Seq_by_primary_id > > ----------- > > These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61: > > Bio/Align/__init__.py: > class MultipleSeqAlignment: > get_column > add_sequence > > Bio/Align/Generic.py: > class Alignment overall > get_all_seqs > get_seq_by_num > > Bio/File.py: > class StringHandle > > Bio/Graphics/GenomeDiagram/_AbstractDrawer.py: > class AbstractDrawer: > _set_xcentre, _set_ycentre > > Bio/Graphics/GenomeDiagram/_Graph.py: > class GraphData: > _set_centre > > Bio/ParserSupport.py: > SGMLStrippingConsumer > > Bio/Seq.py: > class Seq: > .data property > > Bio/SeqIO/SffIO.py: > _sff_read_roche_index_xml > > -------------------- > > The tostring() method of the class Seq in Bio/Seq.py: > Can we declare this obsolete? 
> > -Michiel Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done: https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1 Bio/File.py and Bio/ParserSupport.py bits done: https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a GenomeDiagram centre setters done: https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288 Peter From ben at bendmorris.com Mon Feb 4 15:17:36 2013 From: ben at bendmorris.com (Ben Morris) Date: Mon, 4 Feb 2013 10:17:36 -0500 Subject: [Biopython-dev] Support for NeXML and RDF trees in Bio.Phylo In-Reply-To: References: Message-ID: On Fri, Jan 18, 2013 at 8:20 PM, Eric Talevich wrote: > On Fri, Dec 28, 2012 at 10:50 AM, Ben Morris wrote: >> >> On Tue, Dec 25, 2012 at 2:18 AM, Eric Talevich >> wrote: >> > >> > On Mon, Dec 24, 2012 at 8:58 AM, Ben Morris wrote: >> >> >> >> Hi all, >> >> >> >> I've implemented support for two new phylogenetic tree formats: NeXML >> >> and >> >> RDF (conforming to the Comparative Data Analysis Ontology). >> >> >> >> I noticed that NeXML support was planned, but I didn't see anyone >> >> working >> >> on it on GitHub and the feature request hadn't been updated in about a >> >> year, so I went ahead and implemented a simple version. At first I >> >> tried >> >> the generateDS.py approach, but the generated writer doesn't give very >> >> much >> >> control over the output, so I ended up writing my own parser/writer >> >> using >> >> ElementTree. >> >> >> >> As for the RDF/CDAO format, AFAIK this is not a format that's supported >> >> by >> >> any other phylogenetic libraries, so I'm not sure how useful this is to >> >> everyone else. It provides a simple, standards-compliant format that >> >> can be >> >> imported to a triple store and supports annotation. We'll be using it >> >> at >> >> NESCent so I wanted to make it available to everyone else as well. The >> >> parser and writer require the Redlands Python bindings. 
>> >> >> >> The code is available in my fork of Biopython, >> >> >> >> https://github.com/bendmorris/biopython >> >> >> >> under branches "cdao" and "nexml." I'd love to get everyone's thoughts >> >> and >> >> see if these contributions would be a good fit for the Biopython >> >> project. >> > >> > >> > >> > Thanks for letting us know! I'll try it out soonish. Looking at the code >> > on your nexml branch, I have a few comments: >> > >> > - The parser uses ElementTree.parse rather than iterparse, so in its >> > current state it would not be able to parse massive files (those larger than >> > available RAM). Worth fixing eventually? >> >> Great point. I rewrote it to use iterparse instead. >> >> > - The parser creates Newick.Tree and Newick.Clade objects, which is >> > nearly correct in my opinion. I would suggest subclassing BaseTree.Tree and >> > BaseTree.Clade to create NeXML-specific Tree and Clade classes, even if you >> > don't have any additional attributes to attach to those classes at the >> > moment. (These would go in a new file NeXML.py, similar to PhyloXML.py and >> > PhyloXMLIO.py.) >> >> Went ahead and did this as well. > > > Thanks! Sorry for the pace of this, I'm in the midst of a dissertation. > > >> > - The 'confidence' or 'confidences' attribute isn't used (for e.g. >> > bootstrap support values). Does NeXML define it? >> >> Not that I'm aware of, but I'm not sure. I searched >> http://nexml.org/nexml/html/doc/schema-1/ and didn't find anything. >> I'm going to ask some people who know more about this than I do. > > > I would like for Bio.Phylo's I/O modules to be able to successfully > round-trip a file from Newick to phyloXML to NeXML and back to Newick > without losing support values. 
I found these two examples of how to add this > data to a NeXML document by referencing CDAO: > https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_using_the_.22meta.22_tag > https://www.nescent.org/wg_evoinfo/NeXML_Test_Files#Bootstraps_represented_without_new_tags_or_elements > > That's the standard way to store bootstrap supports in NeXML (Hilmar > confirms). How do your NeXML and CDAO modules interact, if at all? Would the > CDAO modules be useful to properly support NeXML metadata like > support/confidence values, or would it be simpler to just hard-code the few > tags we're specifically interested in? > > Relatedly, those look like good test files. I see you've started writing > NeXML unit tests already; if you would like help with any of this, just let > me know. > > -Eric No worries! I just returned from a NESCent-sponsored hackathon where we used BioPython as part of a Virtuoso-backed RDF treestore (https://github.com/phylotastic/rdf-treestore). Now that I'm back, I'll work on the bootstrap support values and annotations for NeXML as I have time. I think it's probably much easier to just hard-code specific tags for now. The CDAO module can convert the more readable CDAO prefix names to OBO numeric identifiers (e.g. cdao:has_Root -> obo:CDAO_0000148) but other than that I don't see a good way for them to interact. I gave a short demo of Bio.Phylo at the hackathon, and people were very impressed. We had some issues with Newick and Nexus parsing as well, so I'll open issues on the bug tracker. ~Ben From redmine at redmine.open-bio.org Mon Feb 4 15:20:38 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 4 Feb 2013 15:20:38 +0000 Subject: [Biopython-dev] [Biopython - Bug #3407] (New) Handling of bootstrap support values in Bio.Phylo Newick parser Message-ID: Issue #3407 has been reported by Ben Morris. 
---------------------------------------- Bug #3407: Handling of bootstrap support values in Bio.Phylo Newick parser https://redmine.open-bio.org/issues/3407 Author: Ben Morris Status: New Priority: Normal Assignee: Category: Target version: URL: This was reported to me by Arlin Stoltzfus (quote): "There is a description of Newick here: http://evolution.genetics.washington.edu/phylip/newicktree.html and a BNF here: http://evolution.genetics.washington.edu/phylip/newick_doc.html Note that this allows square-bracketed comments. Bootstrap values commonly are represented in 2 ways, one of which is wrong. The wrong way to represent bootstrap values is to present them as internal node labels. Labels for internal nodes are given as follows: ((( human: 0.1, chimp:0.1 ) primates: 0.2, (rat:0.1, mouse:0.1) rodents:0.2), cat:0.3 ) where "primates" and "rodents" are internal node labels. They go between the right paren and the (optional) colon and distance. If you put numbers in the label position, a graphic renderer may place them on the nodes, which is why some people represent bootstrap values this way. However, the preferred way to represent bootstrap values is to make them syntactic comments (enclosed in square brackets) placed after all other node information, i.e., after the optional colon & branch length. Both examples are shown here: ((raccoon:19.19959,bear:6.80041)50:0.84600,((sea_lion:11.99700, seal:12.00300)100:7.52973,((monkey:100.85930,cat:47.14069)80:20.59201, weasel:18.87953)75:2.09460)50:3.87382,dog:25.46154); or ((raccoon:19.19959,bear:6.80041):0.84600[50],((sea_lion:11.99700, seal:12.00300):7.52973[100],((monkey:100.85930,cat:47.14069):20.59201[80], weasel:18.87953):2.09460[75]):3.87382[50],dog:25.46154); I recommend that you only support the second version, and treat the first version as a case of internal node labels. Arlin ------- Arlin Stoltzfus (arlin at umd.edu) Fellow, IBBR; Adj. Assoc. 
Prof., UMCP; Research Biologist, NIST IBBR, 9600 Gudelsky Drive, Rockville, MD, 20850 tel: 240 314 6208; web: www.molevol.org" ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Mon Feb 4 15:26:31 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 4 Feb 2013 15:26:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3408] (New) Parsing of Nexus files generated by TreeBase fails (Bio.Phylo) Message-ID: Issue #3408 has been reported by Ben Morris. ---------------------------------------- Bug #3408: Parsing of Nexus files generated by TreeBase fails (Bio.Phylo) https://redmine.open-bio.org/issues/3408 Author: Ben Morris Status: New Priority: Normal Assignee: Category: Target version: URL: Steps to reproduce: Pick a tree on TreeBase (e.g. http://treebase.org/treebase-web/search/study/trees.html?id=12003 or http://treebase.org/treebase-web/search/study/trees.html?id=1029) and click on "download reconstructed NEXUS file." Attempt to parse the file using Bio.Phylo.read. Exception:
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 62, in read
    tree = tree_gen.next()
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/_io.py", line 50, in parse
    for tree in getattr(supported_formats[format], 'parse')(fp, **kwargs):
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NexusIO.py", line 39, in parse
    nex = Nexus.Nexus(handle)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 572, in __init__
    self.read(input)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 623, in read
    self._parse_nexus_block(title, contents)
  File "/usr/lib/python2.7/site-packages/Bio/Nexus/Nexus.py", line 664, in _parse_nexus_block
    getattr(self,'_'+line.command)(line.options)
AttributeError: 'Nexus' object has no attribute '_link'
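The last frame of the traceback shows the parser looking up a handler method named after the NEXUS command (here `_link`) and failing because no such method exists. A minimal sketch of that dispatch pattern, and of one tolerant alternative, might look like the following. `BlockReader` and its methods are hypothetical stand-ins, not the Bio.Nexus API, and the `link` options string is only an illustrative value:

```python
class BlockReader:
    """Toy stand-in for a NEXUS block reader (hypothetical, NOT Bio.Nexus.Nexus)."""

    def __init__(self):
        self.titles = []
        self.unknown = []  # (command, options) pairs with no handler

    def _title(self, options):
        # Handler for a command this toy reader understands.
        self.titles.append(options)

    def dispatch_strict(self, command, options):
        # Same shape as the failing line in the traceback: an unknown
        # command raises AttributeError from getattr().
        getattr(self, "_" + command)(options)

    def dispatch_tolerant(self, command, options):
        # Look the handler up with a default, and record unknown commands
        # instead of aborting the whole parse.
        handler = getattr(self, "_" + command, None)
        if handler is None:
            self.unknown.append((command, options))
            return
        handler(options)


reader = BlockReader()
reader.dispatch_tolerant("title", "Taxa")
reader.dispatch_tolerant("link", "taxa = Taxa")
print(reader.titles)
print(reader.unknown)
```

Whether silently skipping an unrecognised command is the right behaviour for the real parser is a separate design question; the sketch only shows where the AttributeError comes from.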
DendroPy is able to parse the same files. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Feb 4 16:49:07 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 16:49:07 +0000 Subject: [Biopython-dev] Deprecations for Biopython 1.61 release; Was: Bio.Motif update In-Reply-To: References: Message-ID: On Mon, Feb 4, 2013 at 12:01 PM, Peter Cock wrote: > On Fri, Feb 1, 2013 at 3:39 PM, Michiel de Hoon wrote: >> --- On Fri, 2/1/13, Peter Cock wrote: >>> I went over the list in the DEPRECATED file last month, but >>> a second check would be a good idea. >> >> The following were declared obsolete in Biopython 1.60, and can >> in principle be declared deprecated in Biopython 1.61: >> >> ---------- >> Bio/Blast/Applications.py: >> BlastallCommandline >> BlastpgpCommandline >> RpsBlastCommandline >> >> Bio/Blast/NCBIStandalone.py overall, and specifically: >> blastall >> blastpgp >> rpsblast >> >> Bio/ParserSupport.py overall >> >> Bio/PDB/AbstractPropertyMap.py: >> The has_key function in class AbstractPropertyMap >> >> Bio/PDB/FragmentMapper.py: >> The has_key function in class FragmentMapper >> >> Bio/UniGene/UniGene.py overall >> >> In BioSQL/BioSeqDatabase.py: >> class DBServer: >> remove_database >> class BioSeqDatabase: >> get_all_primary_ids >> get_Seq_by_primary_id >> >> ----------- >> >> These functions were deprecated in Biopython 1.59 or earlier, and could be removed for Biopython 1.61: >> >> Bio/Align/__init__.py: >> class MultipleSeqAlignment: >> get_column >> add_sequence >> >> Bio/Align/Generic.py: >> class Alignment overall >> get_all_seqs >> get_seq_by_num >> >> Bio/File.py: >> class StringHandle >> >> 
Bio/Graphics/GenomeDiagram/_AbstractDrawer.py: >> class AbstractDrawer: >> _set_xcentre, _set_ycentre >> >> Bio/Graphics/GenomeDiagram/_Graph.py: >> class GraphData: >> _set_centre >> >> Bio/ParserSupport.py: >> SGMLStrippingConsumer >> >> Bio/Seq.py: >> class Seq: >> .data property >> >> Bio/SeqIO/SffIO.py: >> _sff_read_roche_index_xml >> >> -------------------- >> >> The tostring() method of the class Seq in Bio/Seq.py: >> Can we declare this obsolete? >> >> -Michiel > > Bio/SeqIO/SffIO.py function _sff_read_roche_index_xml done: > https://github.com/biopython/biopython/commit/567464d9a5f8b87ec48e95bae127b86463bd4da1 > > Bio/File.py and Bio/ParserSupport.py bits done: > https://github.com/biopython/biopython/commit/63997ea0afa5f7f6cac5c1b036d56416b04edb2a > > GenomeDiagram centre setters done: > https://github.com/biopython/biopython/commit/2424c5ca36cdf4348b54bafdae444a91a6457288 Michiel already did most of the others, https://github.com/biopython/biopython/commit/1b2025bee868b0282b913690a999833d13598ea4 I've just removed the Seq object's deprecated data property: https://github.com/biopython/biopython/commit/e3cf12a1bf28c1cd52e4b5492fb1cd76731b486b For the Seq object's tostring() method, let's review Bow's pull request after this release? https://github.com/biopython/biopython/pull/137 Regards, Peter From p.j.a.cock at googlemail.com Mon Feb 4 17:26:44 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 17:26:44 +0000 Subject: [Biopython-dev] Bio.Motif update In-Reply-To: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com> References: <1359730386.17784.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 1, 2013 at 2:53 PM, Michiel de Hoon wrote: > Hi Peter and all, > > --- On Tue, 1/29/13, Peter Cock wrote: >> We need to say something about this in the NEWS file too. > > Done. > >> I think it would make sense to add a PendingDeprecationWarning >> to Bio.Motif now. > > Done. 
> >> Also, if you feel the new Bio.motifs API isn't quite >> settled yet, adding the new BiopythonExperimentalWarning to >> that makes sense. > > I don't expect big changes in the API, so I think we can do without the > BiopythonExperimentalWarning. Also we should avoid the situation > that Bio.Motif gives a DeprecationWarning, and Bio.Motifs gives a > BiopythonExperimentalWarning. > >> (And once this is settled, I think we can schedule the >> release) Hi Michiel, Rather than having two (very similar) chapters in the Tutorial for the old Bio.Motif and new Bio.motifs modules, I've downgraded the old chapter to just a section of the new chapter: https://github.com/biopython/biopython/commit/ee5cccf6bc661befc924cb7fc2a422c07f3eeee1 There is still a lot of redundant content - would you be able to shorten this? Or can we just cut it and refer anyone interested to the tutorial shipped with Biopython 1.60 instead? I think a summary of the differences be more useful, to help people convert from the old module to the new motifs module. Also, what is the point of the Bio.motifs.create function? Is there a reason not to initialise a Motif object directly? Thanks, Peter From p.j.a.cock at googlemail.com Mon Feb 4 17:57:42 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Feb 2013 17:57:42 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: On Fri, Feb 1, 2013 at 3:03 PM, Peter Cock wrote: > Hello all, > > I think we're overdue for a Biopython release now, and I would > like to do this next week. There are still plenty more additions > and enhancements waiting in the wings, but right now I just > want any remaining bug fixes addressed. > > Are there any release blocking issues? 
> > Thanks, > > Peter Hi all, I've posted the current tutorial as HTML and PDF online [*], http://biopython.org/DIST/docs/tutorial/Tutorial-dev.html http://biopython.org/DIST/docs/tutorial/Tutorial-dev.pdf It would be great to have you all re-read chapters you've contributed to or are familiar with - and fix or report any more typos etc. Note that some of the embedded examples in the LaTeX source are now tested via doctest using test_Tutorial.py, so if you do make some local edits run that before you commit them. Thanks, Peter [*] Those URLs used to be updated nightly, something I've not yet restored since the website was moved from the old OBF hardware to an Amazon cloud server. The simplest option here would be to install latex on the server... From redmine at redmine.open-bio.org Mon Feb 4 18:14:19 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 4 Feb 2013 18:14:19 +0000 Subject: [Biopython-dev] [Biopython - Bug #3409] (New) Newick parser fails to parse Greengenes tree (Bio.Phylo) Message-ID: Issue #3409 has been reported by Ben Morris. ---------------------------------------- Bug #3409: Newick parser fails to parse Greengenes tree (Bio.Phylo) https://redmine.open-bio.org/issues/3409 Author: Ben Morris Status: New Priority: Normal Assignee: Category: Target version: URL: The file is available here: http://www.evoio.org/wg/evoio/images/f/f9/Greengenes2011.txt (9.2 MB) The problem may be related to the use of single-quoted node labels which sometimes contain parentheses, e.g.
'p__Fusobacteria; c__Fusobacteria (class); o__Fusobacteriales; f__Fusobacteriaceae':0.11021
Exception:
  ...
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 110, in _parse_subtree
    clade.clades = [self._parse_subtree(st) for st in subtrees]
  File "/usr/lib/python2.7/site-packages/Bio/Phylo/NewickIO.py", line 87, in _parse_subtree
    raise NewickError("Parentheses do not match in (sub)tree: " + text)
Bio.Phylo.NewickIO.NewickError: Parentheses do not match in (sub)tree: 139839:0.04507):0.02429
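A minimal sketch of why quoted labels matter here, assuming the failure comes from treating parentheses inside a single-quoted label as tree structure (the report only says this "may be" the cause). This is not the Bio.Phylo parser; `TOKEN` and `tokenize` are hypothetical names used only for illustration. The idea is to match a quoted label as one token before looking for structural characters.

```python
import re

# Order matters: try a whole quoted label first ('' is an escaped quote
# inside a label), then a single structural character, then a run of
# anything else (an unquoted label or a branch length).
TOKEN = re.compile(r"'(?:[^']|'')*'|[(),:;]|[^(),:;']+")


def tokenize(newick):
    """Split a Newick string into tokens, keeping quoted labels intact."""
    return [tok for tok in TOKEN.findall(newick) if tok.strip()]


# A small tree using a label shaped like the one in the report.
tree = "(A:0.1,'p__Fusobacteria; c__Fusobacteria (class)':0.11021);"
tokens = tokenize(tree)

# Counting raw characters sees two opening parentheses, one of which
# belongs to the label; counting tokens sees only the structural one.
naive_open = tree.count("(")
structural_open = tokens.count("(")
```

With this approach the label `'p__Fusobacteria; c__Fusobacteria (class)'` comes back as a single token, so its parentheses and semicolon never reach the code that matches brackets.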
Other Newick parsers (ete and dendropy) are able to parse this file. From mjldehoon at yahoo.com Tue Feb 5 04:01:26 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 4 Feb 2013 20:01:26 -0800 (PST) Subject: [Biopython-dev] Bio.Motif update In-Reply-To: Message-ID: <1360036886.33220.YahooMailClassic@web164006.mail.gq1.yahoo.com> Hi Peter, --- On Mon, 2/4/13, Peter Cock wrote: > Rather than having two (very similar) chapters in the > Tutorial for the old Bio.Motif and new Bio.motifs modules, > I've downgraded the old chapter to just a section of > the new chapter: ... > There is still a lot of redundant content - would you be > able to shorten this? I think it's OK if it is redundant. In any case, the chapter on the older Bio.Motif will be removed in a few releases. > I think a summary of the differences would be more useful, > to help people convert from the old module to the new > motifs module. Maybe, but for me it doesn't have a high priority. It's easier to understand the new chapter on Bio.motifs. > Also, what is the point of the Bio.motifs.create function? > Is there a reason not to initialise a Motif object directly? There are two ways to initialize a Motif: either by specifying the alignment from which the motif is created, or directly from a position-weight matrix. This can be a bit confusing. To separate the two, the Bio.motifs.create function only initializes a Motif from an alignment; some of the motif parsers initialize a Motif from a position-weight matrix. Best, -Michiel. 
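The two initialization routes Michiel describes can be sketched in plain Python. This is only an illustration under stated assumptions: `SketchMotif`, `create_from_instances` and `create_from_matrix` are hypothetical names, not the Bio.motifs API, and a simple count matrix stands in for the position-weight matrix.

```python
from collections import Counter

ALPHABET = "ACGT"


class SketchMotif:
    """Hypothetical stand-in for a motif; not the Bio.motifs.Motif class."""

    def __init__(self, counts):
        # counts maps each letter to a list of per-position counts.
        self.counts = counts
        self.length = len(counts[ALPHABET[0]])

    def consensus(self):
        """Return the most frequent letter at each position."""
        return "".join(
            max(ALPHABET, key=lambda letter: self.counts[letter][pos])
            for pos in range(self.length)
        )


def create_from_instances(instances):
    """Route 1: build the motif by counting letters in aligned instances."""
    length = len(instances[0])
    if any(len(inst) != length for inst in instances):
        raise ValueError("all instances must have the same length")
    columns = [Counter(inst[pos] for inst in instances) for pos in range(length)]
    counts = {letter: [col[letter] for col in columns] for letter in ALPHABET}
    return SketchMotif(counts)


def create_from_matrix(counts):
    """Route 2: build the motif directly from a matrix, as a parser might."""
    return SketchMotif({letter: list(counts[letter]) for letter in ALPHABET})


instances = ["TACAA", "TACGC", "TACAC", "TACCC", "AACCC", "AATGC", "AATGC"]
from_alignment = create_from_instances(instances)
from_matrix = create_from_matrix(from_alignment.counts)
```

Keeping the two routes as separate functions means a caller never has to guess whether a constructor argument is a set of sequences or a matrix, which is the confusion the create function is meant to avoid.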
From p.j.a.cock at googlemail.com Tue Feb 5 12:32:47 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 12:32:47 +0000 Subject: [Biopython-dev] KEGG enhancements Message-ID: Hi all, We have a couple of new pull requests for KEGG enhancements, which we can look at after the imminent Biopython 1.61 release goes out this week. Kevin's working on the REST API, https://github.com/biopython/biopython/pull/152 Leighton's working on KGML and graphics, https://github.com/biopython/biopython/pull/153 There is a tiny bit of online access code in Leighton's code which can probably be changed to use Kevin's work - I've not had time to examine the overlap yet. Peter ---------- Forwarded message ---------- From: kevin Date: Mon, Feb 4, 2013 at 8:03 PM Subject: [biopython] Add KEGG API Querying Support (#152) To: biopython/biopython This adds support to query KEGG's REST API (http://www.kegg.jp/kegg/docs/keggapi.html) along with simple tests which ensure that the correct url is hit and documentation in the cookbook. This has been discussed on the mailing list in the following thread: http://lists.open-bio.org/pipermail/biopython-dev/2012-October/009981.html. ________________________________ You can merge this Pull Request by running git pull https://github.com/kevinwuhoo/biopython master Or view, comment on, or merge it at: https://github.com/biopython/biopython/pull/152 Commit Summary Added a KEGG API Wrapper Forgot copyright Added a general parser and a KEGG section in the tutorial. Updated querying code and corresponding tests. Updated documentation to reflect changes in KEGG module. 
File Changes M Bio/KEGG/__init__.py (196) M Doc/Tutorial.tex (88) M Tests/output/test_KEGG (41) M Tests/test_KEGG.py (159) Patch Links: https://github.com/biopython/biopython/pull/152.patch https://github.com/biopython/biopython/pull/152.diff From p.j.a.cock at googlemail.com Tue Feb 5 12:33:52 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 12:33:52 +0000 Subject: [Biopython-dev] KEGG enhancements In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock wrote: > Hi all, > > We have a couple of new pull requests for KEGG enhancements, > which we can look at after the imminent Biopython 1.61 release > goes out this week. > > Kevin's working on the REST API, > https://github.com/biopython/biopython/pull/152 > > Leighton's working on KGML and graphics, Sorry, the correct URL, https://github.com/biopython/biopython/pull/155 Details below, Peter ---------- Forwarded message ---------- From: Leighton Pritchard Date: Tue, Feb 5, 2013 at 12:28 PM Subject: [biopython] KGML files (#155) To: biopython/biopython As we discussed - not an ideal pull request (rebasing added the recent Biopython changes to the KEGG branch, rather than what was expected), but if it's workable, here's the code in a way that doesn't seem to break Biopython ;) L. 
________________________________ You can merge this Pull Request by running git pull https://github.com/widdowquinn/biopython kegg Or view, comment on, or merge it at: https://github.com/biopython/biopython/pull/155 Commit Summary First addition of KGML module (with tests) Moved Bio.KGML to Bio.KEGG.KGML and split KGML tests Modified comments to indicate TODO Removed accidentally-committed files Fix typo in error message Fix typo in blastall wrapper Add new Blast 2.2.27+ arguments to wrappers Ignore new blastx arguments if testing with old BLAST+ BLAST 2.2.27+ dropped -frame_shift_penalty argument Remove deprecated Bio.File.StringHandle and SGMLStripper Remove centre setters, add explicit deprecation warning to getters. Clarify docstrings of deprecated BLAST functions. Avoid ResourceWarning: unclosed file in these doctests Close handle in this doctest Remove the deprecated Seq object's data property Remove duplicated section labels in Tutorial (in repeated Motifs text) Downgrade Bio.Motif chapter to a section at the end of the Bio.motifs chapter Fix a typo Clarify docstring for obsolete Bio.Motif module Explain Bio.motifs replaces Bio.Motif in its docstring Update date in Tutorial Fix 2 typos. 
Add links to SearchIO tutorial files Update SearchIO tutorial language style Add links to SearchIO documentation pages Tutorial specific example files have previously gone under Doc/examples Update paths in tutorial after moving example files File Changes M Bio/Blast/Applications.py (36) M Bio/Blast/NCBIStandalone.py (21) M Bio/File.py (65) M Bio/Graphics/GenomeDiagram/_AbstractDrawer.py (30) M Bio/Graphics/GenomeDiagram/_Graph.py (14) A Bio/Graphics/KGML_vis.py (422) A Bio/KEGG/KGML/KGML_parser.py (184) A Bio/KEGG/KGML/KGML_pathway.py (766) A Bio/KEGG/KGML/KGML_scrape.py (109) A Bio/KEGG/KGML/__init__.py (15) M Bio/Motif/__init__.py (13) M Bio/ParserSupport.py (34) M Bio/Seq.py (33) M Bio/SeqIO/SffIO.py (1) M Bio/SeqRecord.py (6) M Bio/motifs/__init__.py (7) M DEPRECATED (8) M Doc/Tutorial.tex (164) A Doc/examples/my_blast.xml (0) A Doc/examples/my_blat.psl (0) A Tests/KEGG/ko01100.kgml (17805) A Tests/KEGG/ko01100.xml (25176) A Tests/KEGG/ko01100_mod_original.pdf (98) A Tests/KEGG/ko01100_original.pdf (98) A Tests/KEGG/ko01120.xml (11425) A Tests/KEGG/ko03070.kgml (249) A Tests/KEGG/ko03070.xml (413) A Tests/KEGG/ko03070_mod_original.pdf (113) A Tests/KEGG/ko03070_original.pdf (113) A Tests/KEGG/map01100.png (0) A Tests/KEGG/map03070.png (0) D Tests/Tutorial/README.txt (9) M Tests/test_File.py (13) A Tests/test_KGML_graphics.py (138) A Tests/test_KGML_nographics.py (99) A Tests/test_KGML_online.py (68) M Tests/test_NCBI_BLAST_tools.py (9) M Tests/test_ParserSupport.py (9) M setup.py (1) Patch Links: https://github.com/biopython/biopython/pull/155.patch https://github.com/biopython/biopython/pull/155.diff From p.j.a.cock at googlemail.com Tue Feb 5 12:36:55 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 12:36:55 +0000 Subject: [Biopython-dev] KEGG enhancements In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 12:33 PM, Peter Cock wrote: > On Tue, Feb 5, 2013 at 12:32 PM, Peter Cock wrote: >> Hi all, >> >> We have a couple of 
new pull requests for KEGG enhancements, >> which we can look at after the imminent Biopython 1.61 release >> goes out this week. >> >> Kevin's working on the REST API, >> https://github.com/biopython/biopython/pull/152 >> >> Leighton's working on KGML and graphics, > > Sorry, the correct URL, https://github.com/biopython/biopython/pull/155 > > Details below, See also Leighton's blog posts about this work (with pictures): http://armchairbiology.blogspot.co.uk/2013/01/keggwatch-part-i.html http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-ii.html http://armchairbiology.blogspot.co.uk/2013/02/keggwatch-part-iii.html Regards, Peter From p.j.a.cock at googlemail.com Tue Feb 5 13:55:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 13:55:20 +0000 Subject: [Biopython-dev] Doing the Biopython 1.61 release next week? In-Reply-To: References: Message-ID: Hi all, I'm going to try and do the release this afternoon, so no commits to the master branch until further notice please. Thanks, Peter From p.j.a.cock at googlemail.com Tue Feb 5 14:49:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 14:49:20 +0000 Subject: [Biopython-dev] Biopython 1.61 release Message-ID: On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock wrote: > Hi all, > > I'm going to try and do the release this afternoon, so > no commits to the master branch until further notice > please. > > Thanks, > > Peter The release is in progress... The Windows installers are on the website for some quick pre-announcement testing. If anyone spots an issue, please email me ASAP: http://biopython.org/DIST/ Last time we put 'beta' in the Python 3.2 installer to emphasise this was still not quite ready for prime time. Should we do that again? How comfortable are we all about encouraging more use under Python 3? 
Thanks, Peter From p.j.a.cock at googlemail.com Tue Feb 5 18:14:24 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 18:14:24 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 2:49 PM, Peter Cock wrote: > On Tue, Feb 5, 2013 at 1:55 PM, Peter Cock wrote: >> Hi all, >> >> I'm going to try and do the release this afternoon, so >> no commits to the master branch until further notice >> please. >> >> Thanks, >> >> Peter > > The release is in progress... > > The Windows installers are on the website for some quick > pre-announcement testing. If anyone spots an issue, please > email me ASAP: http://biopython.org/DIST/ > > Last time we put 'beta' in the Python 3.2 installer to emphasise > this was still not quite ready for prime time. Should we do that > again? How comfortable are we all about encouraging more > use under Python 3? I'm planning to do the same in terms of putting beta in the Windows installer for Python 3.2. After some trouble, I now have the epydoc API files updated (a manual refresh might be needed to see the changes): http://biopython.org/DIST/docs/api/ Bow - the `backtick` markup doesn't do anything in epydoc, but perhaps for the next release we can turn the SearchIO markup into reStructuredText instead? I think last time I didn't have the docutils dependency installed in order for epydoc to try and parse the reStructuredText (used in Bio.Phylo). Running epydoc also showed a few more epydoc formatting errors, fixed in git - I will now regenerate the installers, and tag this in git etc. 
Peter From w.arindrarto at gmail.com Tue Feb 5 18:22:46 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 5 Feb 2013 19:22:46 +0100 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: Hi Peter, > Bow - the `backtick` markup doesn't do anything in epydoc, but > perhaps for the next release we can turn the SearchIO markup > into restructuredtext instead? > > I think last time I didn't have the docutils dependency installed > in order for epydoc to try and parse the restructuredtext (used > in Bio.Phylo). Running epydoc also showed a few more epydoc > formatting errors, fixed in git - I will now regenerate the installers, > and tag this in git etc. Hmm... IIRC, I did write the entire SearchIO doc using reStructuredText markup; in hindsight probably not wise since we still rely on epydoc. Using reST for the next release sounds good. On a related note, do we have any solid plans to move away from epydoc (and to Sphinx?) for the next release? regards, Bow From p.j.a.cock at googlemail.com Tue Feb 5 18:30:29 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 18:30:29 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 6:22 PM, Wibowo Arindrarto wrote: > Hi Peter, > >> Bow - the `backtick` markup doesn't do anything in epydoc, but >> perhaps for the next release we can turn the SearchIO markup >> into restructuredtext instead? >> >> I think last time I didn't have the docutils dependency installed >> in order for epydoc to try and parse the restructuredtext (used >> in Bio.Phylo). Running epydoc also showed a few more epydoc >> formatting errors, fixed in git - I will now regenerate the installers, >> and tag this in git etc. > > Hmm... IIRC, I did write the entire SearchIO doc using reStructuredText > markup; in hindsight probably not wise since we still rely on epydoc. > Using reST for the next release sounds good. 
Using reStructuredText (like Eric did with Bio.Phylo) would have been (and is) fine, however you had __docformat__ = 'epytext en' in the file. > On a related note, do we have any solid plans to move away from epydoc > (and to Sphinx?) for the next release? Not yet - but moving all the docstrings to reStructuredText is a very good step towards that, and a chance to review/update all the plain text docstrings in particular to look nicer and be more consistent. Peter From p.j.a.cock at googlemail.com Tue Feb 5 18:57:58 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 18:57:58 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: Hi all, The Biopython 1.61 release files are live, http://biopython.org/DIST/ and this is tagged on GitHub now, i.e. this commit: https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5 I've not yet pushed this to PyPI, nor done the announcement. If anyone would like to write a draft based on the NEWS file and the previous announcements during the next hour or two, that would be great. Otherwise I'll do this after dinner... Thanks, Peter From p.j.a.cock at googlemail.com Tue Feb 5 21:30:45 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 21:30:45 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 6:57 PM, Peter Cock wrote: > Hi all, > > The Biopython 1.61 release files are live, http://biopython.org/DIST/ > and this is tagged on GitHub now, i.e. this commit: > https://github.com/biopython/biopython/commit/d372e59b3d9147cd9855feb6e3b90ff523f539b5 > > I've not yet pushed this to PyPI, nor done the announcement. > > If anyone would like to write a draft based on the NEWS file > and the previous announcements during the next hour or two, > that would be great. Otherwise I'll do this after dinner... 
> > Thanks, > > Peter Draft text below, based heavily on the NEWS file - any comments? I'll post the new Tutorial online now, and then update the Downloads page on the wiki before posting this. Peter -- Biopython 1.61 released Source distributions and Windows installers for Biopython 1.61 are now available from the downloads page on the Biopython website and from the Python Package Index (PyPI). The updated Biopython Tutorial and Cookbook is online (PDF). Platforms/Deployment We currently support Python 2.5, 2.6 and 2.7 and also test under Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C extensions). We are still encouraging early adopters to help test on these platforms, and have included a "beta" installer for Python 3.2 (and Python 3.3 too follow soon) under 32-bit Windows. Please note we are phasing out support for Python 2.5. We will continue support for at least one further release (Biopython 1.62). This could be extended given feedback from our users. Focusing on Python 2.6 and 2.7 only will make writing Python 3 compatible code easier. Features GenomeDiagram has three new sigils (shapes to illustrate features). OCTO shows an octagonal shape, like the existing BOX sigil but with the corners cut off. JAGGY shows a box with jagged edges at the start and end, intended for things like NNNNN regions in draft genomes. Finally BIGARROW is like the existing ARROW sigil but is drawn straddling the axis. This is useful for drawing vertically compact figures where you do not have overlapping genes. New module Bio.Graphics.ColorSpiral can generate colors along a spiral path through HSV color space. This can be used to make arbitrary "rainbow" scales, for example to color features or cross-links on a GenomeDiagram figure. The Bio.SeqIO module now supports reading sequences from PDB files in two different ways. The "pdb-atom" 
format determines the sequence as it appears in the structure based on the atom coordinate section of the file (via Bio.PDB, so NumPy is currently required for this). Alternatively, you can use the "pdb-seqres" format to read the complete protein sequence as it is listed in the PDB header, if available. The Bio.SeqUtils module now has a seq1 function to turn a sequence using three letter amino acid codes into one using the more common one letter codes. This acts as the inverse of the existing seq3 function. The multiple-sequence-alignment object used by Bio.AlignIO etc now supports an annotation dictionary. Additional support for per-column annotation is planned, with addition and splicing to work like that for the SeqRecord per-letter annotation. The Bio.Motif module has been updated and reorganized. To allow for a clean deprecation of the old code, the new motif code is stored in a new module Bio.motifs, and a PendingDeprecationWarning was added to Bio.Motif. Experimental Code - SearchIO This release also includes Bow's Google Summer of Code work writing a unified parsing framework for NCBI BLAST (assorted formats including tabular and XML), HMMER, BLAT, and other sequence searching tools. This is currently available with the new BiopythonExperimentalWarning to indicate that this is still somewhat experimental. We're bundling it with the main release to get more public feedback, but with the big warning that the API is likely to change. In fact, even the current name of Bio.SearchIO may change since unless you are familiar with BioPerl its purpose isn't immediately clear. 
Contributors Brandon Invergo Bryan Lunt (first contribution) Christian Brueffer (first contribution) David Cain Eric Talevich Grace Yeo (first contribution) Jeffrey Chang Jingping Li (first contribution) Kai Blin (first contribution) Leighton Pritchard Lenna Peterson Lucas Sinclair (first contribution) Michiel de Hoon Nick Semenkovich (first contribution) Peter Cock Robert Ernst (first contribution) Tiago Antao Wibowo "Bow" Arindrarto From p.j.a.cock at googlemail.com Tue Feb 5 21:42:06 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 21:42:06 +0000 Subject: [Biopython-dev] Biopython 1.61 release In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 9:34 PM, Lenna Peterson wrote: > Hi Peter, > > Looks great. Very small typo: in the last sentence of the paragraph about > platforms, "Python 3.3 too follow" should be "Python 3.3 to follow". Thanks Lenna :) I didn't make an installer for Python 3.3 this afternoon, but I will tomorrow having heard back from the NumPy 1.7 release manager that there shouldn't be any problems from compiling against their release candidate: http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065369.html On a related point, NumPy are looking at if they can include pre-compiled installers for 64bit Windows - once that happens (and it may have to wait until NumPy 1.8), we will need to look at this too: http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065339.html Peter From p.j.a.cock at googlemail.com Tue Feb 5 22:05:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 22:05:25 +0000 Subject: [Biopython-dev] Biopython 1.61 released Message-ID: Dear Biopythoneers, Source distributions and Windows installers for Biopython 1.61 are now available from the downloads page on the Biopython website and from the Python Package Index (PyPI). The updated Biopython Tutorial and Cookbook is online (PDF). 
Platforms/Deployment: We currently support Python 2.5, 2.6 and 2.7 and also test under Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C extensions). We are still encouraging early adopters to help test on these platforms, and have included a "beta" installer for Python 3.2 (and Python 3.3 to follow soon) under 32-bit Windows. Please note we are phasing out support for Python 2.5. We will continue support for at least one further release (Biopython 1.62). This could be extended given feedback from our users. Focusing on Python 2.6 and 2.7 only will make writing Python 3 compatible code easier. New Features: GenomeDiagram has three new sigils (shapes to illustrate features). OCTO shows an octagonal shape, like the existing BOX sigil but with the corners cut off. JAGGY shows a box with jagged edges at the start and end, intended for things like NNNNN regions in draft genomes. Finally BIGARROW is like the existing ARROW sigil but is drawn straddling the axis. This is useful for drawing vertically compact figures where you do not have overlapping genes. New module Bio.Graphics.ColorSpiral can generate colors along a spiral path through HSV color space. This can be used to make arbitrary "rainbow" scales, for example to color features or cross-links on a GenomeDiagram figure. The Bio.SeqIO module now supports reading sequences from PDB files in two different ways. The "pdb-atom" format determines the sequence as it appears in the structure based on the atom coordinate section of the file (via Bio.PDB, so NumPy is currently required for this). Alternatively, you can use the "pdb-seqres" format to read the complete protein sequence as it is listed in the PDB header, if available. The Bio.SeqUtils module now has a seq1 function to turn a sequence using three letter amino acid codes into one using the more common one letter codes. This acts as the inverse of the existing seq3 function. 
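To make the seq1/seq3 relationship concrete, here is a minimal plain-Python sketch of the conversion in both directions. It is not the Bio.SeqUtils implementation: `seq1_sketch`, `seq3_sketch` and the lookup table are hypothetical, cover only the 20 standard amino acids, and map anything unrecognised to a placeholder.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def seq3_sketch(seq, undef="Xaa"):
    """Turn a one-letter protein string into three-letter codes."""
    return "".join(ONE_TO_THREE.get(letter.upper(), undef) for letter in seq)


def seq1_sketch(seq, undef="X"):
    """Turn a string of three-letter codes into one-letter codes."""
    # Read the input three characters at a time, ignoring case.
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join(THREE_TO_ONE.get(t.capitalize(), undef) for t in triplets)


protein = "MAIVMGRWKGAR"
three_letter = seq3_sketch(protein)
round_trip = seq1_sketch(three_letter)
```

The round trip only returns the original string when every residue is in the table, which is why a real implementation also has to decide how to treat stop codons and ambiguous residues.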
The multiple-sequence-alignment object used by Bio.AlignIO etc now supports an annotation dictionary. Additional support for per-column annotation is planned, with addition and splicing to work like that for the SeqRecord per-letter annotation. The Bio.Motif module has been updated and reorganized. To allow for a clean deprecation of the old code, the new motif code is stored in a new module Bio.motifs, and a PendingDeprecationWarning was added to Bio.Motif. Experimental Code - SearchIO: This release also includes Bow's Google Summer of Code work writing a unified parsing framework for NCBI BLAST (assorted formats including tabular and XML), HMMER, BLAT, and other sequence searching tools. This is currently available with the new BiopythonExperimentalWarning to indicate that this is still somewhat experimental. We're bundling it with the main release to get more public feedback, but with the big warning that the API is likely to change. In fact, even the current name of Bio.SearchIO may change since unless you are familiar with BioPerl its purpose isn't immediately clear. Contributors: Brandon Invergo Bryan Lunt (first contribution) Christian Brueffer (first contribution) David Cain Eric Talevich Grace Yeo (first contribution) Jeffrey Chang Jingping Li (first contribution) Kai Blin (first contribution) Leighton Pritchard Lenna Peterson Lucas Sinclair (first contribution) Michiel de Hoon Nick Semenkovich (first contribution) Peter Cock Robert Ernst (first contribution) Tiago Antao Wibowo "Bow" Arindrarto Thank you all. Release announcement here (RSS feed available): http://news.open-bio.org/news/2013/02/biopython-1-61-released/ P.S. 
You can follow @Biopython on Twitter https://twitter.com/Biopython From p.j.a.cock at googlemail.com Tue Feb 5 22:38:32 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 5 Feb 2013 22:38:32 +0000 Subject: [Biopython-dev] More 'fun' with GenBank In-Reply-To: <50FD0F2B.1080606@biotech.uni-tuebingen.de> References: <50F57BC5.7020607@biotech.uni-tuebingen.de> <50F66496.8000109@biotech.uni-tuebingen.de> <50FD0F2B.1080606@biotech.uni-tuebingen.de> Message-ID: On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin wrote: > >> Kai - would you mind retesting with f_loc5 (the rebased branch)? > > The location of the feature that caused trouble for me still looks > correct. I'm currently running some more sequences, but I'm pretty > confident that the code will work just fine. The tests I added to the > genbank parser code for all the problem cases I had pass, after all. :) > >> Everyone - does it seem sensible to include this now, ready for the >> upcoming release (*)? Or perhaps just after the release? > > I'd prefer having this in the next release if possible, but of course > if the release after that is coming up within a reasonable time frame, > that would work as well. > > Cheers, > Kai Unless anyone objects, I will apply the (rebased) version of this f_loc4 / f_loc5 branch later this week (now that Biopython 1.61 is out). This replaces the SeqFeature use of sub_features with a new CompoundLocation, which I think is a far more natural way to handle join locations in EMBL/GenBank files. 
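As a rough illustration of what a compound location represents, here is a plain-Python sketch that turns a GenBank/EMBL style join location into a list of spans. It is not the code from the f_loc5 branch: `parse_location` is a hypothetical helper that handles only simple forward-strand spans and joins, and it converts the one-based inclusive coordinates of the file format to zero-based, end-exclusive pairs so that they behave like Python slices.

```python
import re

# A simple span such as "12..78" (no fuzzy ends, no strand operators).
_SPAN = re.compile(r"^(\d+)\.\.(\d+)$")


def parse_location(text):
    """Parse '12..78' or 'join(12..78,134..202)' into (start, end) pairs."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
    else:
        parts = [text]
    spans = []
    for part in parts:
        match = _SPAN.match(part.strip())
        if match is None:
            raise ValueError("Unsupported location: %r" % part)
        start, end = int(match.group(1)), int(match.group(2))
        # One-based inclusive in the file, zero-based half-open here.
        spans.append((start - 1, end))
    return spans


compound = parse_location("join(12..78,134..202)")
simple = parse_location("5..10")
```

A single list of spans attached to one feature carries the same information as a parent feature with one sub-feature per exon, which is the simplification the message describes.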
Also, it means we can offer parsing of GenBank/EMBL style location lines into (Compound)Location objects directly :) Regards, Peter From w.arindrarto at gmail.com Wed Feb 6 00:03:52 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 6 Feb 2013 01:03:52 +0100 Subject: [Biopython-dev] [Biopython] Biopython 1.61 released In-Reply-To: References: Message-ID: Hi Peter, > Dear Biopythoneers, > > Source distributions and Windows installers for Biopython 1.61 are now > available from the downloads page on the Biopython website and from > the Python Package Index (PyPI). > > The updated Biopython Tutorial and Cookbook is online (PDF). > > Platforms/Deployment: > > We currently support Python 2.5, 2.6 and 2.7 and also test under > Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython > 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C > extensions). We are still encouraging early adopters to help test on > these platforms, and have included a "beta" installer for Python 3.2 > (and Python 3.3 to follow soon) under 32-bit Windows. > > Please note we are phasing out support for Python 2.5. We will > continue support for at least one further release (Biopython 1.62). > This could be extended given feedback from our users. Focusing on > Python 2.6 and 2.7 only will make writing Python 3 compatible code > easier. > > New Features: > > GenomeDiagram has three new sigils (shapes to illustrate features). > OCTO shows an octagonal shape, like the existing BOX sigil but with > the corners cut off. JAGGY shows a box with jagged edges at the start > and end, intended for things like NNNNN regions in draft genomes. > Finally BIGARROW is like the existing ARROW sigil but is drawn > straddling the axis. This is useful for drawing vertically compact > figures where you do not have overlapping genes. > > New module Bio.Graphics.ColorSpiral can generate colors along a spiral > path through HSV color space. 
> > Contributors: > > Brandon Invergo > Bryan Lunt (first contribution) > Christian Brueffer (first contribution) > David Cain > Eric Talevich > Grace Yeo (first contribution) > Jeffrey Chang > Jingping Li (first contribution) > Kai Blin (first contribution) > Leighton Pritchard > Lenna Peterson > Lucas Sinclair (first contribution) > Michiel de Hoon > Nick Semenkovich (first contribution) > Peter Cock > Robert Ernst (first contribution) > Tiago Antao > Wibowo "Bow" Arindrarto > > Thank you all. > > Release announcement here (RSS feed available): > http://news.open-bio.org/news/2013/02/biopython-1-61-released/ > > P.S. You can follow @Biopython on Twitter > https://twitter.com/Biopython Thanks for doing the release! It feels exciting to see SearchIO code finally live in the distributions :). Hopefully this will result in more feedback (and then more improvements ~ likewise for the whole Biopython as well). Also, thank you as well to everyone who has criticized / commented / contributed code to the module :). cheers, Bow From mjldehoon at yahoo.com Wed Feb 6 01:03:30 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 5 Feb 2013 17:03:30 -0800 (PST) Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released In-Reply-To: Message-ID: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com> Thanks Peter! Great to see this new code out. Best, -Michiel. --- On Tue, 2/5/13, Peter Cock wrote: > From: Peter Cock > Subject: [Biopython-announce] Biopython 1.61 released > To: biopython-announce at lists.open-bio.org, "Biopython Mailing List" , "Biopython-Dev Mailing List" > Date: Tuesday, February 5, 2013, 5:05 PM > Dear Biopythoneers, > > Source distributions and Windows installers for Biopython > 1.61 are now > available from the downloads page on the Biopython website > and from > the Python Package Index (PyPI). > > The updated Biopython Tutorial and Cookbook is online > (PDF). 
> > Release announcement here (RSS feed available): > http://news.open-bio.org/news/2013/02/biopython-1-61-released/ > > P.S. You can follow @Biopython on Twitter > https://twitter.com/Biopython > > _______________________________________________ > Biopython-announce mailing list - Biopython-announce at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-announce > From mjldehoon at yahoo.com Wed Feb 6 01:07:53 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 5 Feb 2013 17:07:53 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) Message-ID: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com> With Biopython 1.61 now out, perhaps this is a good time to tackle Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like to replace this with a plain C module, or perhaps with a pure-Python parser. This issue was previously discussed here: http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html Or is anybody else already looking at this module? Best, -Michiel. From arklenna at gmail.com Wed Feb 6 01:31:16 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Tue, 5 Feb 2013 20:31:16 -0500 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com> References: <1360112873.79741.YahooMailClassic@web164005.mail.gq1.yahoo.com> Message-ID: Hi Michiel, I worked on that a bit early last year. See thread on this bug: https://redmine.open-bio.org/issues/2619 Namely, I determined that the flex headers aren't required to compile the flex-generated C, which is a great start. I also started work on a PLY-based pure Python reimplementation. Pull request here: https://github.com/biopython/biopython/pull/33 I haven't looked at this code in quite a long time! Let me know if you have any questions about what I did and I will do my best to remember...
Cheers, Lenna On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon wrote: > With Biopython 1.61 now out, perhaps this is a good time to tackle > Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like > to replace this with a plain C module, or perhaps with a pure-Python > parser. This issue was previously discussed here: > > http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html > > Or is anybody else already looking at this module? > > Best, > -Michiel. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From kieran.mace at gmail.com Wed Feb 6 02:05:19 2013 From: kieran.mace at gmail.com (Kieran Mace) Date: Tue, 5 Feb 2013 18:05:19 -0800 Subject: [Biopython-dev] [Biopython-announce] Biopython 1.61 released In-Reply-To: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com> References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: Hi. I'm wondering if the MafIO module is going to be included in this release? -Kieran On Feb 5, 2013, at 17:03, Michiel de Hoon wrote: > Thanks Peter! > Great to see this new code out. > > Best, > -Michiel. > > --- On Tue, 2/5/13, Peter Cock wrote: > >> From: Peter Cock >> Subject: [Biopython-announce] Biopython 1.61 released >> To: biopython-announce at lists.open-bio.org, "Biopython Mailing List" , "Biopython-Dev Mailing List" >> Date: Tuesday, February 5, 2013, 5:05 PM >> Dear Biopythoneers, >> >> Source distributions and Windows installers for Biopython >> 1.61 are now >> available from the downloads page on the Biopython website >> and from >> the Python Package Index (PyPI). >> >> The updated Biopython Tutorial and Cookbook is online >> (PDF). 
>> >> Release announcement here (RSS feed available): >> http://news.open-bio.org/news/2013/02/biopython-1-61-released/ >> >> P.S. You can follow @Biopython on Twitter >> https://twitter.com/Biopython >> >> _______________________________________________ >> Biopython-announce mailing list - Biopython-announce at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-announce > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Feb 6 08:37:05 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 6 Feb 2013 08:37:05 +0000 Subject: [Biopython-dev] Biopython 1.61 released In-Reply-To: References: <1360112610.67186.YahooMailClassic@web164006.mail.gq1.yahoo.com> Message-ID: On Wednesday, February 6, 2013, Kieran Mace wrote: > Hi. > > I'm wondering if the MafIO module is going to be included in this release? > > -Kieran I'm not promising but I would hope so. There is some work to be done first with locations and start/end information in the SeqRecord. See also the CompoundLocation discussion. Peter From mjldehoon at yahoo.com Wed Feb 6 08:36:26 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 6 Feb 2013 00:36:26 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi Lenna, Thanks for your reply. Are you planning to continue your work on the PLY-based mmCIF parser? Best, -Michiel --- On Tue, 2/5/13, Lenna Peterson wrote: From: Lenna Peterson Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) To: "Michiel de Hoon" Cc: "BioPython-Dev Mailing List" Date: Tuesday, February 5, 2013, 8:31 PM Hi Michiel, I worked on that a bit early last year. See thread on this bug:
https://redmine.open-bio.org/issues/2619 Namely, I determined that the flex headers aren't required to compile the flex-generated C, which is a great start. I also started work on a PLY-based pure Python reimplementation. Pull request here: https://github.com/biopython/biopython/pull/33 I haven't looked at this code in quite a long time! Let me know if you have any questions about what I did and I will do my best to remember... Cheers, Lenna On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon wrote: With Biopython 1.61 now out, perhaps this is a good time to tackle Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like to replace this with a plain C module, or perhaps with a pure-Python parser. This issue was previously discussed here: http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html Or is anybody else already looking at this module? Best, -Michiel. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From redmine at redmine.open-bio.org Wed Feb 6 21:39:04 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 6 Feb 2013 21:39:04 +0000 Subject: [Biopython-dev] [Biopython - Bug #3411] (New) Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) Message-ID: Issue #3411 has been reported by Tom McCoy. ---------------------------------------- Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) https://redmine.open-bio.org/issues/3411 Author: Tom McCoy Status: New Priority: Normal Assignee: Category: Target version: URL: "Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db.
There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*." -- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Feb 7 10:20:30 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 7 Feb 2013 10:20:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) References: Message-ID: Issue #3411 has been updated by Peter Cock. Assignee set to Biopython Dev Mailing List I don't recall that guideline being in the earlier requirements/documentation when Bio.Entrez was first written, but the fix proposed looks sensible. (Note - do we need to worry about the ids being a string or a list at that point, and therefore how to count the entries?) P.S. Resetting assignee to default of the dev mailing list. ---------------------------------------- Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) https://redmine.open-bio.org/issues/3411 Author: Tom McCoy Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: "Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. 
There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*." -- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Feb 7 11:33:25 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 7 Feb 2013 11:33:25 +0000 Subject: [Biopython-dev] Biopython 1.61 released In-Reply-To: References: Message-ID: On Tue, Feb 5, 2013 at 10:05 PM, Peter Cock wrote: > Dear Biopythoneers, > > Source distributions and Windows installers for Biopython 1.61 are now > available from the downloads page on the Biopython website and from > the Python Package Index (PyPI). > > The updated Biopython Tutorial and Cookbook is online (PDF). > > Platforms/Deployment: > > We currently support Python 2.5, 2.6 and 2.7 and also test under > Python 3.1, 3.2 and 3.3 (including modules using NumPy), and Jython > 2.5 and PyPy 1.9 (Jython and PyPy do not cover NumPy or our C > extensions). We are still encouraging early adopters to help test on > these platforms, and have included a "beta" installer for Python 3.2 > (and Python 3.3 to follow soon) under 32-bit Windows. For those of you wanting to try Biopython on Python 3.3 on Windows, there is now an installer for Biopython 1.61 built against NumPy 1.7.0rc2.
NumPy 1.7 is their first release to support Python 3.3, and the official release is expected to be near-identical to this second release candidate, see: http://mail.scipy.org/pipermail/numpy-discussion/2013-February/065384.html Regards, Peter From p.j.a.cock at googlemail.com Thu Feb 7 11:53:40 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 7 Feb 2013 11:53:40 +0000 Subject: [Biopython-dev] [Biopython] Fwd: Bug in bgzf module In-Reply-To: References: Message-ID: On Wed, Feb 6, 2013 at 10:35 PM, Petra Kubincová wrote: > Hi Peter, > > based on your unit test for tell method I've created this: > http://dl.dropbox.com/u/... > I hope it's at least partially usable. > > Regards, > Petra Thanks, I turned that into this commit: https://github.com/biopython/biopython/commit/194bda7cd4bc292b37fd219f1f95a19e1316ac5a That led me to notice a special case with offsets on a block boundary, see this fix and test: https://github.com/biopython/biopython/commit/fef7659dacaf93ddeb6270103d8ded6fb89414b7 Peter From p.j.a.cock at googlemail.com Thu Feb 7 13:30:31 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 7 Feb 2013 13:30:31 +0000 Subject: [Biopython-dev] More 'fun' with GenBank In-Reply-To: References: <50F57BC5.7020607@biotech.uni-tuebingen.de> <50F66496.8000109@biotech.uni-tuebingen.de> <50FD0F2B.1080606@biotech.uni-tuebingen.de> Message-ID: On Tue, Feb 5, 2013 at 10:38 PM, Peter Cock wrote: > On Mon, Jan 21, 2013 at 9:49 AM, Kai Blin > wrote: >> >>> Kai - would you mind retesting with f_loc5 (the rebased branch)? >> >> The location of the feature that caused trouble for me still looks >> correct. I'm currently running some more sequences, but I'm pretty >> confident that the code will work just fine. The tests I added to the >> genbank parser code for all the problem cases I had pass, after all. :) >> >>> Everyone - does it seem sensible to include this now, ready for the >>> upcoming release (*)? Or perhaps just after the release?
>> >> I'd prefer having this in the next release if possible, but of course >> if the release after that is coming up within a reasonable time frame, >> that would work as well. >> >> Cheers, >> Kai > > Unless anyone objects, I will apply the (rebased) version of this > f_loc4 / f_loc5 branch later this week (now that Biopython 1.61 > is out). > > This replaces the SeqFeature use of sub_features with a new > CompoundLocation which I think is a far more natural way to > handle join locations in EMBL/GenBank files. > > Also, it means we can offer parsing of GenBank/EMBL style > location lines into (Compound)Location objects directly :) > > Regards, > > Peter Applied to master, https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b Peter From kai.blin at biotech.uni-tuebingen.de Thu Feb 7 14:47:37 2013 From: kai.blin at biotech.uni-tuebingen.de (Kai Blin) Date: Thu, 07 Feb 2013 15:47:37 +0100 Subject: [Biopython-dev] More 'fun' with GenBank In-Reply-To: References: <50F57BC5.7020607@biotech.uni-tuebingen.de> <50F66496.8000109@biotech.uni-tuebingen.de> <50FD0F2B.1080606@biotech.uni-tuebingen.de> Message-ID: <5113BE89.3050303@biotech.uni-tuebingen.de> On 2013-02-07 14:30, Peter Cock wrote: Hi Peter, > Applied to master, > https://github.com/biopython/biopython/commit/e5ff9e48e315924d59348c013ab082d6f155d18b Thanks for that. Cheers, Kai -- Dipl.-Inform.
Kai Blin kai.blin at biotech.uni-tuebingen.de Institute for Microbiology and Infection Medicine Division of Microbiology/Biotechnology Eberhard-Karls-Universität Tübingen Auf der Morgenstelle 28 Phone : ++49 7071 29-78841 D-72076 Tübingen Fax : ++49 7071 29-5979 Germany Homepage: http://www.mikrobio.uni-tuebingen.de/ag_wohlleben From arklenna at gmail.com Thu Feb 7 18:21:37 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 7 Feb 2013 13:21:37 -0500 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi Michiel, If there are well-defined problems with the PLY parser, I can work on fixing them. I am not currently working with mmCIF so I am not in the best position to evaluate where and how the parser needs to be improved. I am working with X-ray PDB files and I am not sure if my collaborators are familiar with mmCIF. I have not dealt with NMR files of any type, either. Cheers, Lenna On Wed, Feb 6, 2013 at 3:36 AM, Michiel de Hoon wrote: > Hi Lenna, > > Thanks for your reply. > Are you planning to continue your work on the PLY-based mmCIF parser? > > Best, > -Michiel > > --- On *Tue, 2/5/13, Lenna Peterson * wrote: > > > From: Lenna Peterson > Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) > To: "Michiel de Hoon" > Cc: "BioPython-Dev Mailing List" > Date: Tuesday, February 5, 2013, 8:31 PM > > > Hi Michiel, > > I worked on that a bit early last year. See thread on this bug: > > https://redmine.open-bio.org/issues/2619 > > Namely, I determined that the flex headers aren't required to compile the > flex-generated C, which is a great start. > > I also started work on a PLY-based pure Python reimplementation. Pull > request here: > > https://github.com/biopython/biopython/pull/33 > > I haven't looked at this code in quite a long time!
Let me know if you > have any questions about what I did and I will do my best to remember... > > Cheers, > > Lenna > > > On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon > > wrote: > > With Biopython 1.61 now out, perhaps this is a good time to tackle > Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like > to replace this with a plain C module, or perhaps with a pure-Python > parser. This issue was previously discussed here: > > http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html > > Or is anybody else already looking at this module? > > Best, > -Michiel. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > From anaryin at gmail.com Thu Feb 7 18:25:37 2013 From: anaryin at gmail.com (João Rodrigues) Date: Thu, 7 Feb 2013 19:25:37 +0100 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: References: <1360139786.53882.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi, In our NMR lab I am pretty sure mmCIF files are not even known.. How widely used is the format in x-ray labs? I have never seen it outside this mailing list to be honest. Best, João From p.j.a.cock at googlemail.com Fri Feb 8 15:21:46 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 8 Feb 2013 15:21:46 +0000 Subject: [Biopython-dev] Fwd: [biopython] Newick parser (#156) In-Reply-To: References: Message-ID: Eric, Could you take a look at this please? Thanks, Peter ---------- Forwarded message ---------- From: Ben Morris Date: Fri, Feb 8, 2013 at 3:12 PM Subject: [biopython] Newick parser (#156) To: biopython/biopython In light of three issues with the Newick parser: https://redmine.open-bio.org/issues/3409 https://redmine.open-bio.org/issues/3386 https://redmine.open-bio.org/issues/3407 this is a rewrite of the parser from scratch.
It supports quoted node labels and can handle support values either as they were previously handled or from square-bracketed comments, as requested by Arlin. Additionally, it's consistently quite fast: [image: newick_parse_times] The unit tests still pass with these changes, and I'm now able to parse trees that previously raised exceptions. ------------------------------ You can merge this Pull Request by running git pull https://github.com/bendmorris/biopython newick Or view, comment on, or merge it at: https://github.com/biopython/biopython/pull/156 Commit Summary - A more efficient implementation of a Newick parser (linear time vs. quadratic) that makes only a single pass over the text and handles quoted labels correctly. - Implementing support values and fixing issue when external parentheses are missing. File Changes - *M* Bio/Phylo/NewickIO.py(198) Patch Links: - https://github.com/biopython/biopython/pull/156.patch - https://github.com/biopython/biopython/pull/156.diff From mjldehoon at yahoo.com Sat Feb 9 01:42:23 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 8 Feb 2013 17:42:23 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi Lenna, --- On Thu, 2/7/13, Lenna Peterson wrote: > If there are well-defined problems with the PLY parser, I can work on > fixing them. I am not currently working with mmCIF so I am not in the > best position to evaluate where and how the parser needs to be improved. I don't know of any problems with the PLY parser, but since it relies on PLY, it would add another dependency to Biopython. On the other hand, a pure-Python solution may be preferable, as it's easier to maintain and runs with Jython. The C implementation is considerably faster, but I doubt that it really matters since the Python (PLY) parser seems to be fast enough. 
I see three options then: 1) Remove the lex stuff from lex.yy.c, and optionally convert the remaining C code to Python. 2) Remove the PLY dependency from the PLY-based parser. 3) Write a new pure-Python parser from scratch. I'm guessing that 1) will be the most straightforward. Other opinions? Best, -Michiel. --- On Thu, 2/7/13, Lenna Peterson wrote: If there are well-defined problems with the PLY parser, I can work on fixing them. I am not currently working with mmCIF so I am not in the best position to evaluate where and how the parser needs to be improved. I am working with X-ray PDB files and I am not sure if my collaborators are familiar with mmCIF. I have not dealt with NMR files of any type, either. Cheers, Lenna On Wed, Feb 6, 2013 at 3:36 AM, Michiel de Hoon wrote: Hi Lenna, Thanks for your reply. Are you planning to continue your work on the PLY-based mmCIF parser? Best, -Michiel --- On Tue, 2/5/13, Lenna Peterson wrote: From: Lenna Peterson Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) To: "Michiel de Hoon" Cc: "BioPython-Dev Mailing List" Date: Tuesday, February 5, 2013, 8:31 PM Hi Michiel, I worked on that a bit early last year. See thread on this bug: https://redmine.open-bio.org/issues/2619 Namely, I determined that the flex headers aren't required to compile the flex-generated C, which is a great start. I also started work on a PLY-based pure Python reimplementation. Pull request here: https://github.com/biopython/biopython/pull/33 I haven't looked at this code in quite a long time! Let me know if you have any questions about what I did and I will do my best to remember... Cheers, Lenna On Tue, Feb 5, 2013 at 8:07 PM, Michiel de Hoon wrote: With Biopython 1.61 now out, perhaps this is a good time to tackle Bio.PDB.mmCIF? This module uses flex to generate the parser; I would like to replace this with a plain C module, or perhaps with a pure-Python parser.
This issue was previously discussed here: http://lists.open-bio.org/pipermail/biopython-dev/2008-October/004466.html Or is anybody else already looking at this module? Best, -Michiel. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From redmine at redmine.open-bio.org Sat Feb 9 08:22:31 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sat, 9 Feb 2013 08:22:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3411] Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) References: Message-ID: Issue #3411 has been updated by Michiel de Hoon. Fixed (using a slightly different code); see revision f1836165. ---------------------------------------- Bug #3411: Bio.Entrez.efetch does not respect the API docs / spec on HTTP verb use (GET vs. POST) https://redmine.open-bio.org/issues/3411 Author: Tom McCoy Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Target version: URL: "Either a single UID or a comma-delimited list of UIDs may be provided. All of the UIDs must be from the database specified by db. There is no set maximum for the number of UIDs that can be passed to EFetch, *but if more than about 200 UIDs are to be provided, the request should be made using the HTTP POST method*." -- http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Entrez.efetch uses this API endpoint via GET regardless of the number of UIDs supplied. The attached patch corrects this behavior. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Sat Feb 9 11:53:31 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 11:53:31 +0000 Subject: [Biopython-dev] Deprecating Bio.Index? 
Message-ID: Hello all, Does anyone still use Bio.Index? I don't think any of Biopython itself does nowadays, so perhaps we can deprecate this? https://github.com/biopython/biopython/blob/master/Bio/Index.py (We should of course ask on the main list first just in case) Regards, Peter From colin.aibn at gmail.com Sat Feb 9 13:06:13 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sat, 9 Feb 2013 23:06:13 +1000 Subject: [Biopython-dev] SearchIO HSP indexing Message-ID: Hi everyone, I have a question about the implementation of high-scoring segment pairs (HSPs) in SearchIO. I currently have an BLAST output file in XML format I am parsing and this is one of the hits (removed the alignment details to save space): 1 gnl|BL_ORD_ID|111 ref|NC_007779|:125695-127587 111 1893 1 3352.79 1815 0 1 1893 1 1893 1 1 1867 1867 0 2 399.997 216 2.88061e-111 331 881 22 581 1 1 452 452 19 565 Using Hsp1 as an example, the query and hit starts ("Hsp_query_to" and "Hsp_hit_to") are both 1 in the XML but when I access the Hsp objects from the BlastResult, both values are equal to 0: >>> blast_record[0][0].query_start 0 >>> blast_record[0][0].hit_start 0 However, when I access the end objects for the query and hit, the result isn't 1892 (zero based 1893) but 1893: >>> blast_record[0][0].query_end 1893 >>> blast_record[0][0].hit_end 1893 Is this correct? I find it a little confusing that one result is zero-based and the other one-based. Thanks Colin From p.j.a.cock at googlemail.com Sat Feb 9 13:16:43 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 13:16:43 +0000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer wrote: > Hi everyone, > I have a question about the implementation of > high-scoring segment pairs (HSPs) in SearchIO. 
I currently have an BLAST > output file in XML format I am parsing and this is one of the hits (removed > the alignment details to save space): > > > 1 > gnl|BL_ORD_ID|111 > ref|NC_007779|:125695-127587 > 111 > 1893 > > > 1 > 3352.79 > 1815 > 0 > 1 > 1893 > 1 > 1893 > 1 > 1 > 1867 > 1867 > 0 > > > 2 > 399.997 > 216 > 2.88061e-111 > 331 > 881 > 22 > 581 > 1 > 1 > 452 > 452 > 19 > 565 > > > Using Hsp1 as an example, the query and hit starts ("Hsp_query_to" and > "Hsp_hit_to") are both 1 in the XML but when I access the Hsp objects from > the BlastResult, both values are equal to 0: > >>>> blast_record[0][0].query_start > 0 >>>> blast_record[0][0].hit_start > 0 > > However, when I access the end objects for the query and hit, the result > isn't 1892 (zero based 1893) but 1893: > >>>> blast_record[0][0].query_end > 1893 >>>> blast_record[0][0].hit_end > 1893 > > Is this correct? I find it a little confusing that one result is zero-based > and the other one-based. > > Thanks > Colin Hi Colin, The SearchIO positions like elsewhere in Biopython should be using Python style counting. Looking at this one: 1 1893 That is like a GenBank/EMBL location 1..1893 which in Python string slicing is [0:1893], so the start has -1 but the end is unchanged. The nice thing is the length is 1893 and is given as the difference of the Python slicing style end and start. Perhaps we need to work on the help text? Any suggestions? Thanks, Peter From colin.aibn at gmail.com Sat Feb 9 13:54:42 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sat, 9 Feb 2013 23:54:42 +1000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: Hi Peter, Thanks for getting back to me so quickly. I'm curious about the benefits of having these values in Python string slicing format? I haven't come across this very often, I'm used to seeing values systematically zero or one-based. 
Would it be easier to keep the range variables hit_range and hit_range_all in slicing format and the start and end variables in sequence position format so that they represent the actual BLAST results? I had a look at some of the code and I can't see the slicing format mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end, query_start, and query_end so that if people are interested they can have a look at the files and see what they mean. Thanks Colin On Sat, Feb 9, 2013 at 11:16 PM, Peter Cock wrote: > On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer wrote: > > Hi everyone, > > I have a question about the implementation of > > high-scoring segment pairs (HSPs) in SearchIO. I currently have an BLAST > > output file in XML format I am parsing and this is one of the hits > (removed > > the alignment details to save space): > > > > > > 1 > > gnl|BL_ORD_ID|111 > > ref|NC_007779|:125695-127587 > > 111 > > 1893 > > > > > > 1 > > 3352.79 > > 1815 > > 0 > > 1 > > 1893 > > 1 > > 1893 > > 1 > > 1 > > 1867 > > 1867 > > 0 > > > > > > 2 > > 399.997 > > 216 > > 2.88061e-111 > > 331 > > 881 > > 22 > > 581 > > 1 > > 1 > > 452 > > 452 > > 19 > > 565 > > > > > > Using Hsp1 as an example, the query and hit starts ("Hsp_query_to" and > > "Hsp_hit_to") are both 1 in the XML but when I access the Hsp objects > from > > the BlastResult, both values are equal to 0: > > > >>>> blast_record[0][0].query_start > > 0 > >>>> blast_record[0][0].hit_start > > 0 > > > > However, when I access the end objects for the query and hit, the result > > isn't 1892 (zero based 1893) but 1893: > > > >>>> blast_record[0][0].query_end > > 1893 > >>>> blast_record[0][0].hit_end > > 1893 > > > > Is this correct? I find it a little confusing that one result is > zero-based > > and the other one-based. 
> > > > Thanks > > Colin > > Hi Colin, > > The SearchIO positions like elsewhere in Biopython should be > using Python style counting. Looking at this one: > > 1 > 1893 > > That is like a GenBank/EMBL location 1..1893 which in Python string > slicing is [0:1893], so the start has -1 but the end is unchanged. The > nice thing is the length is 1893 and is given as the difference of the > Python slicing style end and start. > > Perhaps we need to work on the help text? Any suggestions? > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Sat Feb 9 14:30:26 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 14:30:26 +0000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer wrote: > Hi Peter, > Thanks for getting back to me so quickly. > Thank you - the main reason for including SearchIO in Biopython 1.61 as 'experimental code' is to get wider testing and feedback (hopefully an approach that will work well and we can use this more in future for other new code). > I'm curious about the benefits of having these values in Python string > slicing format? I haven't come across this very often, I'm used to seeing > values systematically zero or one-based. Once you're used to Python slicing it becomes very natural. > Would it be easier to keep the range variables hit_range and hit_range_all > in slicing format and the start and end variables in sequence position > format so that they represent the actual BLAST results? One reason for this is to be consistent across all the formats supported in SearchIO, and since Biopython is a Python library following Python norms seems most natural. 
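The conversion discussed in this thread can be illustrated with a minimal sketch. This is not code from Bio.SearchIO; the helper name to_python_slice is hypothetical and only shows how a 1-based closed range, such as the 1..1893 reported in the BLAST XML, might map onto 0-based half-open Python slice coordinates:

```python
def to_python_slice(start_1based, end_1based):
    """Convert a 1-based, closed range (as written in BLAST XML or in a
    GenBank/EMBL location such as 1..1893) to 0-based, half-open
    coordinates suitable for Python slicing.

    Hypothetical helper for illustration; not part of Biopython.
    """
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    # Only the start shifts by one; the end value is unchanged.
    return start_1based - 1, end_1based


# The first HSP from the thread: 1..1893 in the file.
start, end = to_python_slice(1, 1893)

# With slice-style coordinates the length is simply end - start.
length = end - start
```

With this convention a sequence slice such as seq[start:end] covers exactly the aligned region, which is the property Peter points to when he notes that the length is the difference between end and start.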
> I had a look at some of the code and I can't see the slicing format > mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be > helpful to explain the values in Hsp.py as a ** mark on hsp_start, hsp_end, > query_start, and query_end so that if people are interested they can have a > look at the files and see what they mean. > > Thanks > Colin OK, so some clarification with examples in the docstrings is needed. How about the Tutorial chapter? Thanks, Peter From chapmanb at 50mail.com Sat Feb 9 14:43:26 2013 From: chapmanb at 50mail.com (Brad Chapman) Date: Sat, 09 Feb 2013 09:43:26 -0500 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: <87a9rdy2cx.fsf@fastmail.fm> Colin; >> I'm curious about the benefits of having these values in Python string >> slicing format? I haven't come across this very often, I'm used to seeing >> values systematically zero or one-based. To clarify further in addition to Peter's response, the 0-based half-open and 1-based closed systems are the two systems you're referring to. Python, and most programming languages, use the 0-based half-open indexing approach, which is what SearchIO is converting to. Aaron has a nice response on BioStars which explains the differences in more detail: http://www.biostars.org/p/6373/#6377 Brad From colin.aibn at gmail.com Sat Feb 9 15:18:33 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sun, 10 Feb 2013 01:18:33 +1000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: <87a9rdy2cx.fsf@fastmail.fm> References: <87a9rdy2cx.fsf@fastmail.fm> Message-ID: Interesting commentary from Edsger Dijkstra as well: http://www.cs.utexas.edu/~EWD/ewd08xx/EWD831.PDF If possible, I would definitely add some of these links to either the tutorial or the code. Colin On Sun, Feb 10, 2013 at 12:43 AM, Brad Chapman wrote: > > Colin; > > >> I'm curious about the benefits of having these values in Python string > >> slicing format?
I haven't come across this very often, I'm used to > seeing > >> values systematically zero or one-based. > > To clarify further in addition to Peter's response, the 0-based > half-open and 1-based closed systems are the two systems you're > referring to. Python, and most programming languages, use the 0-based > half open indexing approach which is what SearchIO is converting to. > Aaron has a nice response on BioStars while explains the differences in > more details: > > http://www.biostars.org/p/6373/#6377 > > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From Markus.Piotrowski at ruhr-uni-bochum.de Sat Feb 9 15:12:12 2013 From: Markus.Piotrowski at ruhr-uni-bochum.de (Markus Piotrowski) Date: 9 Feb 2013 16:12:12 +0100 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: Message-ID: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the xml result. So query_start and sbjct_start (BTW, not hit_start) return the values from and . Thus, my first guess would be that a search function that can return an entity 'query_start' will return the value that is written in the file. Markus Am 2013-02-09 15:30, schrieb Peter Cock: > On Sat, Feb 9, 2013 at 1:54 PM, Colin Archer > wrote: >> Hi Peter, >> Thanks for getting back to me so quickly. >> > > Thank you - the main reason for including SearchIO in Biopython 1.61 > as 'experimental code' is to get wider testing and feedback > (hopefully > an approach that will work well and we can use this more in future > for > other new code). > >> I'm curious about the benefits of having these values in Python >> string >> slicing format? I haven't come across this very often, I'm used to >> seeing >> values systematically zero or one-based. > > Once you're used to Python slicing it becomes very natural. 
> >> Would it be easier to keep the range variables hit_range and >> hit_range_all >> in slicing format and the start and end variables in sequence >> position >> format so that they represent the actual BLAST results? > > One reason for this is to be consistent across all the formats > supported > in SearchIO, and since Biopython is a Python library following Python > norms seems most natural. > >> I had a look at some of the code and I can't see the slicing format >> mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would >> probably be >> helpful to explain the values in Hsp.py as a ** mark on hsp_start, >> hsp_end, >> query_start, and query_end so that if people are interested they can >> have a >> look at the files and see what they mean. >> >> Thanks >> Colin > > OK, so some clarification with examples in the docstrings is needed. > How about the Tutorial chapter? > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From colin.aibn at gmail.com Sat Feb 9 15:19:26 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sun, 10 Feb 2013 01:19:26 +1000 Subject: [Biopython-dev] Fwd: SearchIO HSP indexing In-Reply-To: References: Message-ID: > Hi Peter, > > Thanks for getting back to me so quickly. > > > > Thank you - the main reason for including SearchIO in Biopython 1.61 > as 'experimental code' is to get wider testing and feedback (hopefully > an approach that will work well and we can use this more in future for > other new code). > > I've been using it for a couple months now and i definitely prefer it over the existing parser. > > I'm curious about the benefits of having these values in Python string > > slicing format? I haven't come across this very often, I'm used to seeing > > values systematically zero or one-based. > > Once you're used to Python slicing it becomes very natural. 
> > > Would it be easier to keep the range variables hit_range and hit_range_all > > in slicing format and the start and end variables in sequence position > > format so that they represent the actual BLAST results? > > One reason for this is to be consistent across all the formats supported > in SearchIO, and since Biopython is a Python library following Python > norms seems most natural. > > > I had a look at some of the code and I can't see the slicing format > > mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably > be > > helpful to explain the values in Hsp.py as a ** mark on hsp_start, > hsp_end, > > query_start, and query_end so that if people are interested they can > have a > > look at the files and see what they mean. > > > > Thanks > > Colin > > OK, so some clarification with examples in the docstrings is needed. > How about the Tutorial chapter? > > I would definitely add comments to the Hsp.py file and if there is a tutorial that people use, I would also update that as that would be the first place most people would look. I was wondering if there was any code in SearchIO to align high-scoring segment pairs against the same hit? I see the fragmentation code but that seems specific to BLAT results and when I look at the HSPFragments in the QueryResult object it does not seem to combine multiple HSPs against the same hit even if they are not overlapping. Thanks Colin From p.j.a.cock at googlemail.com Sat Feb 9 15:36:34 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 9 Feb 2013 15:36:34 +0000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: > Am 2013-02-09 15:30, schrieb Peter Cock: >> One reason for this is to be consistent across all the formats supported >> in SearchIO, and since Biopython is a Python library following Python >> norms seems most natural. 
On Sat, Feb 9, 2013 at 3:12 PM, Markus Piotrowski wrote: > Mmh, at least Bio.Blast.NCBIXML returns the exact values given in the xml > result. Yes, the old Bio.Blast parsers do not try to convert the co-ordinates. Given they were only handling BLAST output that was a justifiable option. With Bio.SearchIO we're not just modelling BLAST output though - it covers multiple formats with different conventions. Peter From w.arindrarto at gmail.com Sat Feb 9 16:56:46 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sat, 9 Feb 2013 17:56:46 +0100 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: Hi everyone, Colin, thanks for the feedback! Peter has explained the rationale behind the decision, so I would like to add that there is indeed an explanation of this behavior in the tutorial (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and the code (https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100). I do admit that the explanation in the code could be made clearer with some comments in hsp.py ~ which I can add :). As for your point about the alignment code: > I was wondering if there was any code in SearchIO to align high-scoring > segment pairs against the same hit? I see the fragmentation code but that > seems specific to BLAT results and when I look at the HSPFragments in the > QueryResult object it does not seem to combine multiple HSPs against the > same hit even if they are not overlapping. SearchIO relies on BLAST to do this ~ which has already grouped each HSP aligning to the same database sequence in one group (all of which is accessible through the Hit object). I've always assumed that if two HSPs came from the same database entry (Hit), they are grouped into one Hit by BLAST, regardless of whether they overlap or not. Have you seen any results from BLAST that show otherwise?
cheers, Bow From arklenna at gmail.com Sat Feb 9 17:14:01 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Sat, 9 Feb 2013 12:14:01 -0500 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1360374143.25311.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Fri, Feb 8, 2013 at 8:42 PM, Michiel de Hoon wrote: > Hi Lenna, > > > --- On *Thu, 2/7/13, Lenna Peterson * wrote: > > If there are well-defined problems with the PLY parser, I can work on > > fixing them. I am not currently working with mmCIF so I am not in the > > best position to evaluate where and how the parser needs to be improved. > > I don't know of any problems with the PLY parser, but since it relies on > PLY, it would add another dependency to Biopython. > > On the other hand, a pure-Python solution may be preferable, as it's > easier to maintain and runs with Jython. > As far as I can tell, PLY works with Jython, discussion on this thread: http://permalink.gmane.org/gmane.comp.python.ply/402 Not sure about pypy. One option would be to deploy the PLY parser for non-CPython platforms and tell them to manually install PLY if they want to use mmCIF. Not ideal, but is that preferred to an explicit dependency? > > I see three options then: > 1) Remove the lex stuff from lex.yy.c, and optionally convert the > remaining C code to Python. > As is, the C compiles cross platform with no dependencies. There is nothing but lex stuff in lex.yy.c - I'm not quite sure what you mean here. > 2) Remove the PLY dependency from the PLY-based parser. > 3) Write a new pure-Python parser from scratch. > > I'm not sure whether there is an appreciable difference between options 2 and 3. 
Cheers, Lenna From mjldehoon at yahoo.com Sun Feb 10 03:55:37 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 9 Feb 2013 19:55:37 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi Lenna, >--- On Sat, 2/9/13, Lenna Peterson wrote: > > 1) Remove the lex stuff from lex.yy.c, and optionally convert the remaining > > C code to Python. > As is, the C compiles cross platform with no dependencies. There is nothing > but lex stuff in lex.yy.c - I'm not quite sure what you mean here. Currently lex.yy.c contains lots of code that is generated automatically by lex but is not actually needed for the mmCIF parser. I was thinking to remove those parts, and to clean up the remainder so that the code is understandable (allowing us to fix any bugs, or to convert it to pure Python). Best, -Michiel From colin.aibn at gmail.com Sun Feb 10 07:28:36 2013 From: colin.aibn at gmail.com (Colin Archer) Date: Sun, 10 Feb 2013 17:28:36 +1000 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: On Sun, Feb 10, 2013 at 2:56 AM, Wibowo Arindrarto wrote: > Hi everyone, > > Colin, thanks for the feedback! Peter has explained the rationale > behind the decision, so I would like to add that there has been indeed > an explanation of this behavior in the tutorial > (http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc106) and > the code ( > https://github.com/biopython/biopython/blob/master/Bio/SearchIO/__init__.py#L100 > ). > I do admit that the explanation in the code could be made clearer with > some comments in hsp.py ~ which I can add :). > > As for your point about the alignment code: > > > I was wondering if there was any code in SearchIO to align high-scoring > > segment pairs against the same hit?
I see the fragmentation code but that > > seems specific to BLAT results and when I look at the HSPFragments in the > > QueryResult object it does not seem to combine multiple HSPs against the > > same hit even if they are not overlapping. > > SearchIO relies on BLAST to do this ~ which has already grouped each > HSP aligning to the same database sequence in one group (all of which > is accessible through the Hit object). I've always assumed that if two > HSPs came from the same database entry (Hit), they are grouped into > one Hit by BLAST, regardless of whether they overlap or not. Have you > seen any results from BLAST that shows otherwise? > > I have a couple of examples where BLAST doesn't combine the HSPs as you would expect. It seems to mainly occur because the HSP alignments overlap and to combine them would mean including more gaps in each hsp. For example, *ftsK* in *E. coli* (ftsK.blast) or *aceF* in *E. coli* (aceF.blast). In the second case, the first HSP spans the entire query and there are two additional HSPs that are overlapped by it. I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the HSPs somewhat when required but some people are hesitant to use their method in certain situations (e.g., with tblastn results that overestimate some of the metrics). They also implement additional functionality so that the user could do a complete smith-waterman alignment if they wanted to. Thanks Colin -------------- next part -------------- A non-text attachment was scrubbed... Name: aceF.blast Type: application/octet-stream Size: 12124 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: ftsK.blast Type: application/octet-stream Size: 18537 bytes Desc: not available URL: From w.arindrarto at gmail.com Sun Feb 10 15:31:51 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Sun, 10 Feb 2013 16:31:51 +0100 Subject: [Biopython-dev] SearchIO HSP indexing In-Reply-To: References: <739b610114b22975ac614055b5a018c7@mpx2.rz.ruhr-uni-bochum.de> Message-ID: Hi Colin, >> As for your point about the alignment code: >> >> > I was wondering if there was any code in SearchIO to align high-scoring >> > segment pairs against the same hit? I see the fragmentation code but >> > that >> > seems specific to BLAT results and when I look at the HSPFragments in >> > the >> > QueryResult object it does not seem to combine multiple HSPs against the >> > same hit even if they are not overlapping. >> >> SearchIO relies on BLAST to do this ~ which has already grouped each >> HSP aligning to the same database sequence in one group (all of which >> is accessible through the Hit object). I've always assumed that if two >> HSPs came from the same database entry (Hit), they are grouped into >> one Hit by BLAST, regardless of whether they overlap or not. Have you >> seen any results from BLAST that shows otherwise? >> > > I have a couple of examples where BLAST doesn't combine the HSPs as you > would expect. It seems to mainly occur because the HSP alignments overlap > and to combine them would mean including more gaps in each hsp. For example, > ftsK in E. coli (ftsK.blast) or aceF in E. coli (aceF.blast). In the second > case, the first HSP spans the entire query and there are two additional HSPs > that are overlapped by it. > > I know that BioPerl tries to align/tile (in Bio::Search::BlastUtils) the > HSPs somewhat when required but some people are hesitant to use their method > in certain situations (e.g., with tblastn results that overestimate some of > the metrics). 
They also implement additional functionality so that the user > could do a complete smith-waterman alignment if they wanted to. Thanks for including the files! At the moment, no, SearchIO doesn't have any code to 'assemble'/'tile' overlapping HSPs. The fragment bits you're seeing in the BLAT parser is simply the name we use to refer to noncontiguous blocks inside a reported HSP. We may be able to add some functions to return the intervals for such overlapping HSPs, given a Hit object. But I'm a bit hesitant to go further than that (i.e. to the point where we merge the statistics of the each HSP to assign to the assembled HSP). This is mostly because such assembly seems very specific to the program's statistics and format (BLAST's merge would be different from BLAT? and BLAST XML's merge may be different from tabular BLAST). If anything, perhaps these functions deserve their own space in SearchUtils (taking parallels from Bio.SeqIO and Bio.SeqUtils)? regards, Bow From redmine at redmine.open-bio.org Sun Feb 10 22:13:20 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 10 Feb 2013 22:13:20 +0000 Subject: [Biopython-dev] [Biopython - Bug #3412] (New) Bad URL in docs for module NCBIWWW.qblast Message-ID: Issue #3412 has been reported by Vincent Davis. ---------------------------------------- Bug #3412: Bad URL in docs for module NCBIWWW.qblast https://redmine.open-bio.org/issues/3412 Author: Vincent Davis Status: New Priority: Low Assignee: Biopython Dev Mailing List Category: Documentation Target version: URL: At the bottom of "help(help(NCBIWWW.qblast) is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html This link is not valid. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Sun Feb 10 22:40:21 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 10 Feb 2013 22:40:21 +0000 Subject: [Biopython-dev] [Biopython - Bug #3412] (Resolved) Bad URL in docs for module NCBIWWW.qblast References: Message-ID: Issue #3412 has been updated by Peter Cock. Status changed from New to Resolved % Done changed from 0 to 100 The NCBI seem to have broken that link, and if they did setup a redirect for a while it has stopped now. I'll use http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html instead I think, https://github.com/biopython/biopython/commit/ae84cc8cb828e868883c75a980fcd83585c338f8 Thanks!
---------------------------------------- Bug #3412: Bad URL in docs for module NCBIWWW.qblast https://redmine.open-bio.org/issues/3412 Author: Vincent Davis Status: Resolved Priority: Low Assignee: Biopython Dev Mailing List Category: Documentation Target version: URL: At the bottom of "help(help(NCBIWWW.qblast) is a link to http://www.ncbi.nlm.nih.gov/BLAST/blast_overview.html This link is not valid. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From eric.talevich at gmail.com Mon Feb 11 02:11:54 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 10 Feb 2013 21:11:54 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo Message-ID: Hi Ben, I've noticed a couple new characteristics of the Newick parser that I had questions about. 1. There is no longer a way to tell the parser to treat internal node labels as confidence values. Lots of files in the wild do record the support values here, including those generated by RAxML, PhyML, FastTree and MrBayes, so I'd like to restore this option, and perhaps make it the default. I think the condition is: if not (self.values_are_confidence or self.comments_are_confidence or current_clade.is_terminal()): # parse confidence from node label Is there an easy way to add this option to the parser? I'm trying to get this to work in the "else" clause in parse_tree, where unquoted node labels are handled. 2. Confidence values are required to be between 0.0 and 1.0. Also, support values recorded as integers are treated as percentages and divided by 100 automatically. The phyloXML spec doesn't have this range requirement. RAxML scales bootstraps to 100, but PhyML records the raw number of supporting bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap replicates). 
So, I'd prefer to leave the confidence values as they are, requiring only that they be numeric. Thoughts? Thanks, Eric From ben at bendmorris.com Mon Feb 11 02:39:24 2013 From: ben at bendmorris.com (Ben Morris) Date: Sun, 10 Feb 2013 21:39:24 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich wrote: > Hi Ben, > > I've noticed a couple new characteristics of the Newick parser that I had > questions about. > > 1. There is no longer a way to tell the parser to treat internal node labels > as confidence values. Lots of files in the wild do record the support values > here, including those generated by RAxML, PhyML, FastTree and MrBayes, so > I'd like to restore this option, and perhaps make it the default. I think > the condition is: > > if not (self.values_are_confidence or self.comments_are_confidence or > current_clade.is_terminal()): # parse confidence from node label > > Is there an easy way to add this option to the parser? I'm trying to get > this to work in the "else" clause in parse_tree, where unquoted node labels > are handled. > > > 2. Confidence values are required to be between 0.0 and 1.0. Also, support > values recorded as integers are treated as percentages and divided by 100 > automatically. The phyloXML spec doesn't have this range requirement. RAxML > scales bootstraps to 100, but PhyML records the raw number of supporting > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap > replicates). So, I'd prefer to leave the confidence values as they are, > requiring only that they be numeric. Thoughts? > > > Thanks, > Eric 1. One issue is that current_clade.is_terminal() will always be true at that point because current_clade's children haven't been parsed yet. Putting the check in the "process_clade" function (which is called when the closing paren is hit, and therefore all children should have been parsed) should fix this. 
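The rule being discussed for internal node labels could be sketched roughly as follows. This is a standalone illustration, not the Bio.Phylo parser: the function name split_node_label and its return convention are hypothetical, and only the flag names values_are_confidence and comments_are_confidence come from the thread. It assumes the check runs once the clade's children are known, so that the terminal/internal distinction is reliable:

```python
def split_node_label(label, is_terminal,
                     values_are_confidence=False,
                     comments_are_confidence=False):
    """Decide whether a Newick node label is a name or a support value.

    Hypothetical helper for illustration only. Returns a tuple
    (name, confidence) in which exactly one element may be None.
    """
    # Leaves keep their labels as names, and if confidences are already
    # taken from branch values or comments the label is left alone.
    if is_terminal or values_are_confidence or comments_are_confidence:
        return label, None
    try:
        # Keep the number as written: no rescaling to the 0.0-1.0 range,
        # since tools report supports on different scales.
        return None, float(label)
    except (TypeError, ValueError):
        # A non-numeric internal label is an ordinary clade name.
        return label, None
```

Calling it only after the closing parenthesis of a clade has been processed avoids the problem described above, where a clade looks terminal because its children have not been parsed yet.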
So, if values_are_confidence and comments_are_confidence are both false and a node label is numeric, it should be treated as confidence, and clade.name should be set to None - is that correct? 2. This should be as simple as removing current lines 123-127. ~Ben From eric.talevich at gmail.com Mon Feb 11 03:30:47 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 10 Feb 2013 22:30:47 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris wrote: > On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich > wrote: > > Hi Ben, > > > > I've noticed a couple new characteristics of the Newick parser that I had > > questions about. > > > > 1. There is no longer a way to tell the parser to treat internal node > labels > > as confidence values. Lots of files in the wild do record the support > values > > here, including those generated by RAxML, PhyML, FastTree and MrBayes, so > > I'd like to restore this option, and perhaps make it the default. I think > > the condition is: > > > > if not (self.values_are_confidence or self.comments_are_confidence or > > current_clade.is_terminal()): # parse confidence from node label > > > > Is there an easy way to add this option to the parser? I'm trying to get > > this to work in the "else" clause in parse_tree, where unquoted node > labels > > are handled. > > > > > > 2. Confidence values are required to be between 0.0 and 1.0. Also, > support > > values recorded as integers are treated as percentages and divided by 100 > > automatically. The phyloXML spec doesn't have this range requirement. > RAxML > > scales bootstraps to 100, but PhyML records the raw number of supporting > > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap > > replicates). So, I'd prefer to leave the confidence values as they are, > > requiring only that they be numeric. Thoughts? > > > > > > Thanks, > > Eric > > 1. 
One issue is that current_clade.is_terminal() will always be true > at that point because current_clade's children haven't been parsed > yet. Putting the check in the "process_clade" function (which is > called when the closing paren is hit, and therefore all children > should have been parsed) should fix this. > > So, if values_are_confidence and comments_are_confidence are both > false and a node label is numeric, it should be treated as confidence, > and clade.name should be set to None - is that correct? > > 2. This should be as simple as removing current lines 123-127. > > ~Ben > Thanks. Here's #2: https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a I agree with your assessment of #1, but haven't been able to get it working yet. I'm leaving Bug #3407 open for now: https://redmine.open-bio.org/issues/3407 From ben at bendmorris.com Mon Feb 11 04:04:45 2013 From: ben at bendmorris.com (Ben Morris) Date: Sun, 10 Feb 2013 23:04:45 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich wrote: > On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris wrote: >> >> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich >> wrote: >> > Hi Ben, >> > >> > I've noticed a couple new characteristics of the Newick parser that I >> > had >> > questions about. >> > >> > 1. There is no longer a way to tell the parser to treat internal node >> > labels >> > as confidence values. Lots of files in the wild do record the support >> > values >> > here, including those generated by RAxML, PhyML, FastTree and MrBayes, >> > so >> > I'd like to restore this option, and perhaps make it the default. I >> > think >> > the condition is: >> > >> > if not (self.values_are_confidence or self.comments_are_confidence or >> > current_clade.is_terminal()): # parse confidence from node label >> > >> > Is there an easy way to add this option to the parser? 
I'm trying to get >> > this to work in the "else" clause in parse_tree, where unquoted node >> > labels >> > are handled. >> > >> > >> > 2. Confidence values are required to be between 0.0 and 1.0. Also, >> > support >> > values recorded as integers are treated as percentages and divided by >> > 100 >> > automatically. The phyloXML spec doesn't have this range requirement. >> > RAxML >> > scales bootstraps to 100, but PhyML records the raw number of supporting >> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap >> > replicates). So, I'd prefer to leave the confidence values as they are, >> > requiring only that they be numeric. Thoughts? >> > >> > >> > Thanks, >> > Eric >> >> 1. One issue is that current_clade.is_terminal() will always be true >> at that point because current_clade's children haven't been parsed >> yet. Putting the check in the "process_clade" function (which is >> called when the closing paren is hit, and therefore all children >> should have been parsed) should fix this. >> >> So, if values_are_confidence and comments_are_confidence are both >> false and a node label is numeric, it should be treated as confidence, >> and clade.name should be set to None - is that correct? >> >> 2. This should be as simple as removing current lines 123-127. >> >> ~Ben > > > > Thanks. Here's #2: > https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a > > I agree with your assessment of #1, but haven't been able to get it working > yet. I'm leaving Bug #3407 open for now: > https://redmine.open-bio.org/issues/3407 > I think this should do it: https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63 I also updated the test case to make sure this is working correctly and changed the default value of comments_are_confidences from True to False. If that looks correct, feel free to pull. 
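[Editorial sketch, not part of the original message.] The rule agreed on in this thread can be illustrated roughly as follows: once all children of a clade have been parsed, a numeric label on an internal node is stored as the confidence, left unscaled, and the name is cleared. This is only a minimal sketch under stated assumptions; the Clade class and the label_to_confidence() helper below are hypothetical stand-ins, not the actual Bio.Phylo code or the contents of the linked commit.

```python
class Clade:
    """Minimal stand-in for a tree node (hypothetical, not Bio.Phylo's class)."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.confidence = None
        self.clades = clades or []

    def is_terminal(self):
        # A clade with no children is a leaf.
        return not self.clades


def label_to_confidence(clade, values_are_confidence=False,
                        comments_are_confidence=False):
    """Reinterpret a numeric internal-node label as a confidence value.

    Assumed to be called only after the clade's children are attached
    (for example when the closing parenthesis has been seen), so that
    is_terminal() gives a meaningful answer.
    """
    if values_are_confidence or comments_are_confidence or clade.is_terminal():
        return clade
    try:
        value = float(clade.name)
    except (TypeError, ValueError):
        # Missing or non-numeric label: keep it as the clade name.
        return clade
    clade.confidence = value  # kept as-is, no rescaling to the 0.0-1.0 range
    clade.name = None
    return clade
```

With this sketch, an internal node labelled "95" ends up with confidence 95.0 and no name, while a leaf labelled "7" keeps its name.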
~Ben From eric.talevich at gmail.com Mon Feb 11 04:20:20 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 10 Feb 2013 23:20:20 -0500 Subject: [Biopython-dev] New Newick parser in Bio.Phylo In-Reply-To: References: Message-ID: On Sun, Feb 10, 2013 at 11:04 PM, Ben Morris wrote: > On Sun, Feb 10, 2013 at 10:30 PM, Eric Talevich > wrote: > > On Sun, Feb 10, 2013 at 9:39 PM, Ben Morris wrote: > >> > >> On Sun, Feb 10, 2013 at 9:11 PM, Eric Talevich > > >> wrote: > >> > Hi Ben, > >> > > >> > I've noticed a couple new characteristics of the Newick parser that I > >> > had > >> > questions about. > >> > > >> > 1. There is no longer a way to tell the parser to treat internal node > >> > labels > >> > as confidence values. Lots of files in the wild do record the support > >> > values > >> > here, including those generated by RAxML, PhyML, FastTree and MrBayes, > >> > so > >> > I'd like to restore this option, and perhaps make it the default. I > >> > think > >> > the condition is: > >> > > >> > if not (self.values_are_confidence or self.comments_are_confidence or > >> > current_clade.is_terminal()): # parse confidence from node label > >> > > >> > Is there an easy way to add this option to the parser? I'm trying to > get > >> > this to work in the "else" clause in parse_tree, where unquoted node > >> > labels > >> > are handled. > >> > > >> > > >> > 2. Confidence values are required to be between 0.0 and 1.0. Also, > >> > support > >> > values recorded as integers are treated as percentages and divided by > >> > 100 > >> > automatically. The phyloXML spec doesn't have this range requirement. > >> > RAxML > >> > scales bootstraps to 100, but PhyML records the raw number of > supporting > >> > bootstrap runs (e.g. supports out of 1000 if there were 1000 bootstrap > >> > replicates). So, I'd prefer to leave the confidence values as they > are, > >> > requiring only that they be numeric. Thoughts? > >> > > >> > > >> > Thanks, > >> > Eric > >> > >> 1. 
One issue is that current_clade.is_terminal() will always be true > >> at that point because current_clade's children haven't been parsed > >> yet. Putting the check in the "process_clade" function (which is > >> called when the closing paren is hit, and therefore all children > >> should have been parsed) should fix this. > >> > >> So, if values_are_confidence and comments_are_confidence are both > >> false and a node label is numeric, it should be treated as confidence, > >> and clade.name should be set to None - is that correct? > >> > >> 2. This should be as simple as removing current lines 123-127. > >> > >> ~Ben > > > > > > > > Thanks. Here's #2: > > > https://github.com/biopython/biopython/commit/0aee549e72fe5dcf9bcea239d29780706500922a > > > > I agree with your assessment of #1, but haven't been able to get it > working > > yet. I'm leaving Bug #3407 open for now: > > https://redmine.open-bio.org/issues/3407 > > > > I think this should do it: > > > https://github.com/bendmorris/biopython/commit/b430f27ff908f07d8ab59bec48429947f0028d63 > > I also updated the test case to make sure this is working correctly > and changed the default value of comments_are_confidences from True to > False. > > If that looks correct, feel free to pull. > > ~Ben > Works for me, thanks! I cherry-picked it here: https://github.com/biopython/biopython/commit/f382f550f49f73301663ad949a6c1e40f5d71c0c From p.j.a.cock at googlemail.com Mon Feb 11 11:46:20 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 11 Feb 2013 11:46:20 +0000 Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support? In-Reply-To: References: Message-ID: On Mon, Jan 7, 2013 at 6:55 PM, Peter Cock wrote: > > My only significant concern is for Jython users, since this will also > mean dropping support for Jython 2.5 (which implements the > Python 2.5 language). The replacement Jython 2.7 is still only > at the alpha release stage. 
Good news for Jython fans: although originally expected last year, a beta of Jython 2.7 (which supports the same language features as CPython 2.7) has now been released: http://fwierzbicki.blogspot.co.uk/2013/02/jython-27-beta1-released.html Hopefully the Biopython unit tests will all be fine under this... and if so, that is good news for phasing out support of Python 2.5. Regards, Peter From tiagoantao at gmail.com Mon Feb 11 11:50:10 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 11 Feb 2013 11:50:10 +0000 Subject: [Biopython-dev] Dropping Python 2.5 and Jython 2.5 support? In-Reply-To: References: Message-ID: On Mon, Feb 11, 2013 at 11:46 AM, Peter Cock wrote: > Good news for Jython fans, although originally expected last year, > they have now released a beta of Jython 2.7 (which supports the > same language features as C Python 2.7): I am going to set up buildbot for this now. I will set up my slave first. If you have any slaves that you want to add this to, please tell me. Tiago -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From saketkc at gmail.com Tue Feb 12 09:51:54 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Tue, 12 Feb 2013 15:21:54 +0530 Subject: [Biopython-dev] BWA Wrapper Message-ID: Hi, I am writing a bwa wrapper for Biopython. I have in fact got the "index" option working. However, I have a concern. bwa has these subcommands: bwa index -a bwtsw database.fasta bwa aln database.fasta short_read.fastq > aln_sa.sai bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam bwa bwasw database.fasta long_read.fastq > aln.sam If you read the documentation here <http://bio-bwa.sourceforge.net/bwa.shtml>, you will see that "-r" is an option of the "aln" command as well as the "samse" command. In the former it is of type INT and in the latter of type STR. Now I am not sure how this can be taken care of in the wrapper, because I also plan to implement a checker_function. 
One way is to make a new class, lets say BwaAlignCommand which will take care of all options inside the "aln" command and separately implement another class say "BwaSamseCommand", and implement all the options of the "samse" command. But I am not sure if that is indeed the correct way of addressing the problem. Any pointers on this issue ? Thanks Saket Choudhary Undergraduate Student IIT Bombay,India From p.j.a.cock at googlemail.com Tue Feb 12 17:38:46 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 12 Feb 2013 17:38:46 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary wrote: > Hi, > > I am writing a bwa wrapper for bio-python. I have infact got the "index" > option working. However I have a concern: > > bwa has these options : > > bwa index -a bwtsw database.fasta > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam > > bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > aln.sam > > bwa bwasw database.fasta long_read.fastq > aln.sam > > > If you read the documentation here, > you will see that "-r" is an option with "aln" command as well as the > "samse" command. In the former it is of type INT and in the latter of type > STR. Now I am not sure how can this be taken care of in the wrapper, > because I also plan to implement a checker_function. One way is to make a > new class, lets say BwaAlignCommand which will take care of all options > inside the "aln" command and separately implement another class say > "BwaSamseCommand", and implement all the options of the "samse" command. > But I am not sure if that is indeed the correct way of addressing the > problem. > > > Any pointers on this issue ? I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and write a wrapper class for each of them. This would probably fit under the Bio.Sequencing.Applications namespace. 
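[Editorial sketch, not part of the original message.] The "one wrapper class per bwa subcommand" idea could look roughly like the following. It is a plain-Python illustration only: it does not use the real Bio.Application machinery, the class names and the option_types table are hypothetical, and the INT/STR types for "-r" are taken from the description earlier in this thread rather than checked against the bwa manual.

```python
class _BwaSubcommand:
    """Hypothetical base class building 'bwa <subcommand> [options] <args>'."""

    subcommand = None   # set by each subclass
    option_types = {}   # maps an option letter to the Python type it requires

    def __init__(self, *positional, **options):
        for flag, value in options.items():
            expected = self.option_types.get(flag)
            if expected is None:
                raise ValueError(
                    "unknown option -%s for bwa %s" % (flag, self.subcommand))
            if not isinstance(value, expected):
                raise TypeError(
                    "option -%s of bwa %s expects %s"
                    % (flag, self.subcommand, expected.__name__))
        self.positional = positional
        self.options = options

    def __str__(self):
        # Options first (sorted for a stable result), then positional arguments.
        parts = ["bwa", self.subcommand]
        for flag in sorted(self.options):
            parts += ["-" + flag, str(self.options[flag])]
        parts += list(self.positional)
        return " ".join(parts)


class BwaAlignCommand(_BwaSubcommand):
    subcommand = "aln"
    option_types = {"r": int}   # per the thread: -r is INT for "aln"


class BwaSamseCommand(_BwaSubcommand):
    subcommand = "samse"
    option_types = {"r": str}   # per the thread: -r is STR for "samse"
```

Because each subcommand carries its own option table, the same letter can be checked against a different type in each class, for example str(BwaAlignCommand("database.fasta", "short_read.fastq", r=3)) versus BwaSamseCommand(..., r="grp1").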
Peter From p.j.a.cock at googlemail.com Tue Feb 12 17:51:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 12 Feb 2013 17:51:15 +0000 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) Message-ID: Hello all, Google recently confirmed they will be running Google Summer of Code 2013, and we (Biopython and the other Bio* projects) would hope to be accepted again under the Open Bioinformatics Foundation as in previous years: http://lists.open-bio.org/pipermail/gsoc/2013/000196.html It would be great to start coming up with potential project ideas, both larger pieces of work suitable for GSoC but also smaller tasks for other project students, or 'low hanging fruit' for potential contributors to cut their teeth on. See also http://biopython.org/wiki/Active_projects and the ideas list there. Regards, Peter From w.arindrarto at gmail.com Tue Feb 12 18:29:02 2013 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 12 Feb 2013 19:29:02 +0100 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: References: Message-ID: Hi everyone, It's more or less a 'low hanging fruit', but I've been thinking perhaps it may be useful if we have our own interface to the HMMER3 online service? The corresponding SearchIO parsers may be written for this as well (they return different formats for which we haven't any parsers currently). And I think there are more things being worked on, not yet mentioned in the wiki: 1. Porting our docs to Sphinx[1] 2. Converting some/all of the print and compare tests to unit tests. For example, our Bio.Seq's tests are still print and compare tests. 
regards, Bow [1] See the original feature request here: https://redmine.open-bio.org/issues/3221 https://redmine.open-bio.org/issues/3220 https://redmine.open-bio.org/issues/3219 From eric.talevich at gmail.com Tue Feb 12 20:00:11 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 12 Feb 2013 15:00:11 -0500 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: References: Message-ID: On Tue, Feb 12, 2013 at 12:51 PM, Peter Cock wrote: > Hello all, > > Google recently confirmed they will be running Google Summer of Code 2013, > and we (Biopython and the other Bio* projects) would hope to be accepted > again > under the Open Bioinformatics Foundation as in previous years: > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html > > It would be great to start coming up with potential project ideas, both > larger > pieces of work suitable for GSoC but also smaller tasks for other project > students, or 'low hanging fruit' for potential contributors to cut > their teeth on. > One interesting GSoC project would be to implement support for phylogenetic placements. The programs pplacer and EPA (part of RAxML) can place sequence reads from metagenomic samples onto a reference phylogeny: http://matsen.fhcrc.org/pplacer/ http://sysbio.oxfordjournals.org/content/60/3/291 The output format of those programs has been standardized as something I suppose we could call the "jplace" format: http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0031009 http://arxiv.org/abs/1201.3397 It's based on JSON and Newick, with a small extension to Newick that shouldn't be too hard to support. The GSoC project would be to implement a parser for this and implement querying as well as integration with the rest of Bio.Phylo to some reasonable extent. I would be available to mentor this. In terms of low-hanging fruit, there are some small but important functions that could be added to Bio.Phylo. 
My top three: Robinson-Foulds distance, majority-rules consensus, draw an unrooted tree using Felsenstein's Equal Daylight algorithm (which starts by computing the layout for a radial tree). -Eric From saketkc at gmail.com Tue Feb 12 20:45:46 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Wed, 13 Feb 2013 02:15:46 +0530 Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: References: Message-ID: Hi, I was thinking of a Synteny viewer on the lines of GSV if it makes sense . Saket On 12 February 2013 23:21, Peter Cock wrote: > Hello all, > > Google recently confirmed they will be running Google Summer of Code 2013, > and we (Biopython and the other Bio* projects) would hope to be accepted > again > under the Open Bioinformatics Foundation as in previous years: > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html > > It would be great to start coming up with potential project ideas, both > larger > pieces of work suitable for GSoC but also smaller tasks for other project > students, or 'low hanging fruit' for potential contributors to cut > their teeth on. > > See also http://biopython.org/wiki/Active_projects and the ideas list > there. > > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From sefakilic at gmail.com Tue Feb 12 23:18:17 2013 From: sefakilic at gmail.com (=?UTF-8?B?U2VmYSBLxLFsxLHDpw==?=) Date: Tue, 12 Feb 2013 18:18:17 -0500 Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence In-Reply-To: References: Message-ID: Hi all, I am working on comparative genomics and I frequently use Motif module of Biopython. One of the most frequent operations that I do is to build a motif out of sites and search a sequence to find instances that are similar to the motif [Bio.Motif._Motif.search_instances()]. 
The problem is that the sequence in which instances are searched is huge. Mostly it is the genome sequence itself, with its reverse complement. For example, scanning the E. coli genome plus its reverse complement with a motif of length ~20 takes almost a minute on my machine. To make it faster, I implemented a C version of it and a Python interface so that you can call it from Python. It is pretty fast: it takes about 2.5 seconds. The current implementation can be found at: https://github.com/sefakilic/yassi If anyone is interested and it is appropriate, I would like to modify the current implementation and integrate it into Biopython. Thanks! Sefa Kilic From mjldehoon at yahoo.com Wed Feb 13 02:06:33 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 12 Feb 2013 18:06:33 -0800 (PST) Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence In-Reply-To: Message-ID: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com> Hi Sefa, Bio.Motif._Motif.search_instances() searches for exact instances of a motif, but it looks like your code searches for motif instances based on their PSSM score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or Bio/motifs/_pwm.c)? Best, -Michiel. --- On Tue, 2/12/13, Sefa Kılıç wrote: > From: Sefa Kılıç > Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence > To: biopython-dev at biopython.org > Date: Tuesday, February 12, 2013, 6:18 PM > Hi all, > > I am working on comparative genomics and I frequently use > Motif module of > Biopython. One of the most frequent operations that I do is > to build a > motif out of sites and search a sequence to find instances > that are similar > to the motif [Bio.Motif._Motif.search_instances()]. > > The problem is that the sequence that instances are searched > is huge. > Mostly it is the genome sequence itself, with its reverse > complement. 
For > example, scanning the E.coli genome + its reverse complement > with a motif > of length ~20 takes almost a minute in my machine. > > To make it faster, I implemented a C version of it and a > Python interface > so that you can call it from Python. It is pretty fast, it > takes about ~2.5 > seconds. > > Current implementation can be found at: > > https://github.com/sefakilic/yassi > > If anyone is interested and it is appropriate, I would like > to modify the > current implementation and integrate it into Biopython. > > Thanks! > > Sefa Kilic > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mjldehoon at yahoo.com Wed Feb 13 02:08:26 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 12 Feb 2013 18:08:26 -0800 (PST) Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) In-Reply-To: Message-ID: <1360721306.47860.YahooMailClassic@web164001.mail.gq1.yahoo.com> It would be great to have better support for microarray analysis in Biopython. Something like lumi/limma in R. Perhaps this is an option for the GSoC? Best, -Michiel. --- On Tue, 2/12/13, Peter Cock wrote: > From: Peter Cock > Subject: [Biopython-dev] Project ideas for GSoC (or other student projects) > To: "Biopython-Dev Mailing List" > Date: Tuesday, February 12, 2013, 12:51 PM > Hello all, > > Google recently confirmed they will be running Google Summer > of Code 2013, > and we (Biopython and the other Bio* projects) would hope to > be accepted again > under the Open Bioinformatics Foundation as in previous > years: > http://lists.open-bio.org/pipermail/gsoc/2013/000196.html > > It would be great to start coming up with potential project > ideas, both larger > pieces of work suitable for GSoC but also smaller tasks for > other project > students, or 'low hanging fruit' for potential contributors > to cut > their teeth on. 
> > See also http://biopython.org/wiki/Active_projects > and the ideas list there. > > Regards, > > Peter > From sefakilic at gmail.com Wed Feb 13 02:40:12 2013 From: sefakilic at gmail.com (Sefa Kılıç) Date: Tue, 12 Feb 2013 21:40:12 -0500 Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence In-Reply-To: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com> References: <1360721193.98373.YahooMailClassic@web164003.mail.gq1.yahoo.com> Message-ID: Hi Michiel, Thanks for the reply. It seems that _pwm.c does the same thing, as you said. I missed that part of the code. However, it does not seem to be mentioned in the tutorial, and it might be useful to mention it there. Anyway, re-implementing it was good practice. Thank you! Sefa Kilic On Tue, Feb 12, 2013 at 9:06 PM, Michiel de Hoon wrote: > Hi Sefa, > > Bio.Motif._Motif.search_instances() searches for exact instances of a > motif, but it looks like your code searches for motifs based on its PSSM > score. Then, isn't it the same as the current code in Bio/Motif/_pwm.c (or > Bio/motifs/_pwm.c)? > > Best, > -Michiel. > > --- On Tue, 2/12/13, Sefa Kılıç wrote: > > > From: Sefa Kılıç > > Subject: [Biopython-dev] Fwd: Fast instance search of motif in a sequence > > To: biopython-dev at biopython.org > > Date: Tuesday, February 12, 2013, 6:18 PM > > Hi all, > > > > I am working on comparative genomics and I frequently use > > Motif module of > > Biopython. One of the most frequent operations that I do is > > to build a > > motif out of sites and search a sequence to find instances > > that are similar > > to the motif [Bio.Motif._Motif.search_instances()]. > > > > The problem is that the sequence that instances are searched > > is huge. 
> > Mostly it is the genome sequence itself, with its reverse > > complement. For > > example, scanning the E.coli genome + its reverse complement > > with a motif > > of length ~20 takes almost a minute in my machine. > > > > To make it faster, I implemented a C version of it and a > > Python interface > > so that you can call it from Python. It is pretty fast, it > > takes about ~2.5 > > seconds. > > > > Current implementation can be found at: > > > > https://github.com/sefakilic/yassi > > > > If anyone is interested and it is appropriate, I would like > > to modify the > > current implementation and integrate it into Biopython. > > > > Thanks! > > > > Sefa Kilic > > > From saketkc at gmail.com Thu Feb 14 16:02:21 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Thu, 14 Feb 2013 21:32:21 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: There's one more issue that I have run into. Consider the following command, whose output is written by redirecting it to a file called aln_sa.sai: bwa aln database.fasta short_read.fastq > aln_sa.sai Now if we look at the __call__ method here, it takes a boolean as its input for stdout. So should I modify it so that it can take 'stdout' as an opened file instance, which I can pass when invoking my BwaAlnCommandLine object as follows: a=BwaAlnCommandLine() b=a(stdout=open("aln_sa.sai","wb")) On 12 February 2013 23:08, Peter Cock wrote: > On Tue, Feb 12, 2013 at 9:51 AM, Saket Choudhary > wrote: > > Hi, > > > > I am writing a bwa wrapper for bio-python. I have infact got the "index" > > option working. 
However I have a concern: > > > > bwa has these options : > > > > bwa index -a bwtsw database.fasta > > > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > > > bwa samse database.fasta aln_sa.sai short_read.fastq > aln.sam > > > > bwa sampe database.fasta aln_sa1.sai aln_sa2.sai read1.fq read2.fq > > aln.sam > > > > bwa bwasw database.fasta long_read.fastq > aln.sam > > > > > > If you read the documentation here< > http://bio-bwa.sourceforge.net/bwa.shtml>, > > you will see that "-r" is an option with "aln" command as well as the > > "samse" command. In the former it is of type INT and in the latter of > type > > STR. Now I am not sure how can this be taken care of in the wrapper, > > because I also plan to implement a checker_function. One way is to make > a > > new class, lets say BwaAlignCommand which will take care of all options > > inside the "aln" command and separately implement another class say > > "BwaSamseCommand", and implement all the options of the "samse" command. > > But I am not sure if that is indeed the correct way of addressing the > > problem. > > > > > > Any pointers on this issue ? > > I would treat "bwa samse", "bwa sampe", "bwa ..." as separate tools and > write a wrapper class for each of them. This would probably fit under the > Bio.Sequencing.Applications namespace. > > Peter > From p.j.a.cock at googlemail.com Thu Feb 14 16:19:59 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Feb 2013 16:19:59 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary wrote: > Theres one more issue that I have run into . Consider the following command > , the outout generated is written by piping it to a file called aln_sa.sai, > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > Now if we look into the _call method here , it takes as its inout a boolean > for stdout. 
So should I modify this so that it can take 'stdout' as on > opened file instance which I can invoke while unvoking my BwaAlnCommandLine > functions as follwos: > > a=BwaAlnCommandLine() > b=a(stdout=open("aln_sa.sai","wb")) Is that possible? For complex use of subprocess and pipes, we've previously recommended that the user handle this explicitly themselves: just use str() on the command line wrapper object to get 'bwa aln database.fasta short_read.fastq' in this case. There are some examples in the Tutorial with (multiple sequence) alignment tools. Peter From saketkc at gmail.com Thu Feb 14 17:04:04 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Thu, 14 Feb 2013 22:34:04 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: I was thinking of adding one more parameter to the __call__ function, let's say 'stdout_to_filepath'. If this is set, then I add one more if condition here to set the stdout as stdout_arg = open(stdout_to_filepath, "w") I tried it and it did work, but I am not sure whether this approach can be incorporated into the Biopython codebase. Thanks Saket On 14 February 2013 21:49, Peter Cock wrote: > On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary > wrote: > > Theres one more issue that I have run into . Consider the following > command > > , the outout generated is written by piping it to a file called > aln_sa.sai, > > > > bwa aln database.fasta short_read.fastq > aln_sa.sai > > > > Now if we look into the _call method here , it takes as its inout a > boolean > > for stdout. So should I modify this so that it can take 'stdout' as on > > opened file instance which I can invoke while unvoking my > BwaAlnCommandLine > > functions as follwos: > > > > a=BwaAlnCommandLine() > > b=a(stdout=open("aln_sa.sai","wb")) > > Is that possible? 
> > For complex use of subprocess and pipes, we've previously recommend > the user handle this explicitly themselves, just use str() on the command > line wrapper object to get 'bwa aln database.fasta short_read.fastq' in > this > case. There are some examples in the Tutorial with (multiple sequence) > alignment tools. > > Peter > From saketkc at gmail.com Thu Feb 14 18:52:31 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Fri, 15 Feb 2013 00:22:31 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: In short , am I allowed to play with this extra parameter thing as per the code standards of biopython ? On 14 February 2013 22:34, Saket Choudhary wrote: > I was thinking of adding one more parameter to the _call function, lets say > 'stdout_to_filepath'. > If this is set then I add one more if condition here to set the stdout as > > > stdout_arg = open(stdout_to_filepath, "w") > > I tried it and it did work, but I am not sure if it this standard can be > incorporated in the biopython codebase ? > > Thanks > > Saket > > > On 14 February 2013 21:49, Peter Cock wrote: >> >> On Thu, Feb 14, 2013 at 4:02 PM, Saket Choudhary >> wrote: >> > Theres one more issue that I have run into . Consider the following >> > command >> > , the outout generated is written by piping it to a file called >> > aln_sa.sai, >> > >> > bwa aln database.fasta short_read.fastq > aln_sa.sai >> > >> > Now if we look into the _call method here , it takes as its inout a >> > boolean >> > for stdout. So should I modify this so that it can take 'stdout' as on >> > opened file instance which I can invoke while unvoking my >> > BwaAlnCommandLine >> > functions as follwos: >> > >> > a=BwaAlnCommandLine() >> > b=a(stdout=open("aln_sa.sai","wb")) >> >> Is that possible? 
>> >> For complex use of subprocess and pipes, we've previously recommend >> the user handle this explicitly themselves, just use str() on the command >> line wrapper object to get 'bwa aln database.fasta short_read.fastq' in >> this >> case. There are some examples in the Tutorial with (multiple sequence) >> alignment tools. >> >> Peter > > From arklenna at gmail.com Thu Feb 14 19:43:18 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 14 Feb 2013 14:43:18 -0500 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: > > On 14 February 2013 22:34, Saket Choudhary wrote: > > I was thinking of adding one more parameter to the _call function, lets > say > > 'stdout_to_filepath'. > > If this is set then I add one more if condition here to set the stdout > as > > > > > > stdout_arg = open(stdout_to_filepath, "w") > > > > > What's wrong with accepting the stdout string that the current implementation provides and explicitly writing it to your file? Cheers, Lenna From arklenna at gmail.com Thu Feb 14 19:52:54 2013 From: arklenna at gmail.com (Lenna Peterson) Date: Thu, 14 Feb 2013 14:52:54 -0500 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1360468537.28338.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Sat, Feb 9, 2013 at 10:55 PM, Michiel de Hoon wrote: > > > Currently lex.yy.c contains lots of code that is generated automatically > by lex but is not actually needed for the mmCIF parser. I was thinking to > remove those parts, and to clean up the remainder so that the code is > understandable (allowing us to fix any bugs, or to convert it to pure > Python). > Whoops, failed to reply all. Sorry for the double email, Michiel. --- But generated C is by definition not understandable or debuggable. The only function of lex.yy.c is to tokenize the mmCIF input. 
All of the communication to Python is handled by MMCIFlexmodule.c, which is 70 lines and a header with 3 statements. In parallel to the PLY version, I rewrote the C to be object-oriented, which pushed it to 101 lines. Cheers, Lenna From p.j.a.cock at googlemail.com Thu Feb 14 20:33:37 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Feb 2013 20:33:37 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Thu, Feb 14, 2013 at 7:43 PM, Lenna Peterson wrote: > > What's wrong with accepting the stdout string that the current > implementation provides and explicitly writing it to your file? > That is only a good idea for short output, say up to a few kb. With bwa (and samtools etc), quite often the output defaults to (or only goes to) stdout - and can be very large. It can also be binary rather than text, which is an additional complication with Python 2 vs Python 3 (byte strings versus unicode strings). See http://bio-bwa.sourceforge.net/bwa.shtml Peter From p.j.a.cock at googlemail.com Thu Feb 14 20:38:59 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 14 Feb 2013 20:38:59 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary wrote: > In short , am I allowed to play with this extra parameter thing as per > the code standards of biopython ? If you can come up with a nice extension to the current interface for the application wrapper's __call__ method, which is backward compatible, then we could be convinced. One idea would be stdout=True and stderr=True are treated as subprocess.PIPE (as now), and a false value would continue to mean don't capture the output (send it to /dev/null), but a (non-empty) string argument could be interpreted as a filename instead. You might be able to accept a handle, but I'm not sure if all Python handles would work or not here - it requires some careful cross platform testing. 
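The stdout/stderr idea above can be sketched roughly as follows. This is only an illustration built on the standard subprocess module, not the Bio.Application implementation: the helper names run_command and _open_target are hypothetical, and how a real __call__ method would integrate this is left open.

```python
import os
import subprocess
import sys


def _open_target(value):
    """Map a stdout/stderr argument to (subprocess target, handle to close)."""
    if isinstance(value, str) and value:
        # A non-empty string is interpreted as a filename (binary mode,
        # so that binary output such as .sai files is written unchanged).
        handle = open(value, "wb")
        return handle, handle
    if value:
        # Any other true value: capture the stream through a pipe.
        return subprocess.PIPE, None
    # A false value: discard the output.
    handle = open(os.devnull, "wb")
    return handle, handle


def run_command(args, stdout=True, stderr=True):
    """Run args (a list of strings); return (return code, stdout, stderr)."""
    out_target, out_handle = _open_target(stdout)
    err_target, err_handle = _open_target(stderr)
    try:
        child = subprocess.Popen(args, stdout=out_target, stderr=err_target)
        # Streams that were not piped come back as None.
        out_data, err_data = child.communicate()
    finally:
        for handle in (out_handle, err_handle):
            if handle is not None:
                handle.close()
    return child.returncode, out_data, err_data
```

Under these assumptions, something like run_command(shlex.split(str(cline)), stdout="aln_sa.sai") would send large or binary output straight to disk instead of holding it in memory, while the True and False cases keep their existing meaning.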
Peter From mjldehoon at yahoo.com Sat Feb 16 02:46:00 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 15 Feb 2013 18:46:00 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com> Hi Lenna, Maybe we are confusing each other.. I am looking for a solution that (a) doesn't introduce new dependencies, (b) is pure-Python so it can run on Jython, and (c) if that is not possible and we do need to use C, then that C code should be understandable so that it can be debugged if necessary. I was suggesting to clean up lex.yy.c so that we can at least achieve (c). The alternative is to start from the PLY-based parser and remove the dependency on PLY. Best, -Michiel. --- On Thu, 2/14/13, Lenna Peterson wrote: > From: Lenna Peterson > Subject: Re: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) > To: "BioPython-Dev Mailing List" > Date: Thursday, February 14, 2013, 2:52 PM > On Sat, Feb 9, 2013 at 10:55 PM, > Michiel de Hoon wrote: > > > > > > > Currently lex.yy.c contains lots of code that is > generated automatically > > by lex but is not actually needed for the mmCIF parser. > I was thinking to > > remove those parts, and to clean up the remainder so > that the code is > > understandable (allowing us to fix any bugs, or to > convert it to pure > > Python). > > > > Whoops, failed to reply all. Sorry for the double email, > Michiel. > > --- > > But generated C is by definition not understandable or > debuggable. The only > function of lex.yy.c is to tokenize the mmCIF input. > > All of the communication to Python is handled by > MMCIFlexmodule.c, which is > 70 lines and a header with 3 statements. In parallel to the > PLY version, I > rewrote the C to be object-oriented, which pushed it to 101 > lines. 
> > Cheers, > > Lenna > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From saketkc at gmail.com Sat Feb 16 07:08:46 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Sat, 16 Feb 2013 12:38:46 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On 15 February 2013 02:08, Peter Cock wrote: > On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary wrote: >> In short , am I allowed to play with this extra parameter thing as per >> the code standards of biopython ? > > If you can come up with a nice extension to the current interface > for the application wrapper's __call__ method, which is backward > compatible, then we could be convinced. > > One idea would be stdout=True and stderr=True are treated as > subprocess.PIPE (as now), and a false value would continue > to mean don't capture the output (send it to /dev/null), but a > (non-empty) string argument could be interpreted as a filename > instead. You might be able to accept a handle, but I'm not sure > if all Python handles would work or not here - it requires some > careful cross platform testing. > > Peter HI Everyone, I have pushed the wrapper to https://github.com/saketkc/biopython/tree/bwa_wrapper Should I send a pull request ? I am in the middle of my University mid-semester examinations and hence this is not completely tested. I need to perform some more tests with more parameters after I am done with my examinations the next week. I would like to hear comments or have it code-reviewed, since this is the first time I am contributing to biopython and I might have missed out on some of the coding practices being followed. 
Thanks Saket From p.j.a.cock at googlemail.com Sat Feb 16 10:42:50 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 16 Feb 2013 10:42:50 +0000 Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com> References: <1360982760.4805.YahooMailClassic@web164004.mail.gq1.yahoo.com> Message-ID: On Sat, Feb 16, 2013 at 2:46 AM, Michiel de Hoon wrote: > Hi Lenna, > > Maybe we are confusing each other.. > I am looking for a solution that (a) doesn't introduce new dependencies, +1 > (b) is pure-Python so it can run on Jython, +1 And on PyPy (which to me is more interesting that Jython) etc. > and (c) if that is not possible and we do need to use C, then that C code > should be understandable so that it can be debugged if necessary. > > I was suggesting to clean up lex.yy.c so that we can at least achieve (c). This does mean we essentially give up on ever regenerating the lex.yy.c file every again - could that be a problem if Flex itself changes much? > The alternative is to start from the PLY-based parser and remove the > dependency on PLY. > > Best, > -Michiel. Peter From saketkc at gmail.com Sat Feb 16 11:48:43 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Sat, 16 Feb 2013 17:18:43 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed now : https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23 On 16 February 2013 12:38, Saket Choudhary wrote: > On 15 February 2013 02:08, Peter Cock wrote: >> On Thu, Feb 14, 2013 at 6:52 PM, Saket Choudhary wrote: >>> In short , am I allowed to play with this extra parameter thing as per >>> the code standards of biopython ? >> >> If you can come up with a nice extension to the current interface >> for the application wrapper's __call__ method, which is backward >> compatible, then we could be convinced. 
>> >> One idea would be stdout=True and stderr=True are treated as >> subprocess.PIPE (as now), and a false value would continue >> to mean don't capture the output (send it to /dev/null), but a >> (non-empty) string argument could be interpreted as a filename >> instead. You might be able to accept a handle, but I'm not sure >> if all Python handles would work or not here - it requires some >> careful cross platform testing. >> >> Peter > > > HI Everyone, > > I have pushed the wrapper to > https://github.com/saketkc/biopython/tree/bwa_wrapper > > Should I send a pull request ? I am in the middle of my University > mid-semester examinations and hence this is not completely tested. I > need to perform some more tests with more parameters after I am done > with my examinations the next week. > > > I would like to hear comments or have it code-reviewed, since this is > the first time I am contributing to biopython and I might have missed > out on some of the coding practices being followed. > > Thanks > > Saket From mjldehoon at yahoo.com Sat Feb 16 12:09:22 2013 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 16 Feb 2013 04:09:22 -0800 (PST) Subject: [Biopython-dev] flex, setup.py and Bio.PDB.mmCIF (Bug 2619) In-Reply-To: Message-ID: <1361016562.57361.YahooMailClassic@web164001.mail.gq1.yahoo.com> --- On Sat, 2/16/13, Peter Cock wrote: > This does mean we essentially give up on ever regenerating > the lex.yy.c file every again - could that be a problem if Flex > itself changes much? The lex.yy.c file was generated by Flex, but otherwise it's independent of it. It doesn't #include Flex's header files, and we don't link it to the Flex libraries. So we can do with it whatever we want. We may find though that a stripped-down version of lex.yy.c will be rather trivial, and converting it to Python may be straightforward. Best, -Michiel. 
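As a rough illustration of why a pure-Python replacement could stay short, here is a simplified tokenizer sketch. It is not the Bio.PDB code and not a translation of lex.yy.c: the function name tokenize_mmcif is hypothetical, and the rules it applies (whitespace-separated values, '#' comments, quoted strings, and semicolon-delimited multi-line text fields) are a simplified reading of the mmCIF syntax that ignores edge cases.

```python
def tokenize_mmcif(lines):
    """Yield tokens from an iterable of mmCIF text lines (simplified)."""
    in_text_field = False
    text_parts = []
    for line in lines:
        line = line.rstrip("\r\n")
        if in_text_field:
            if line.startswith(";"):
                # A line starting with ';' closes the multi-line text field.
                in_text_field = False
                yield "\n".join(text_parts)
            else:
                text_parts.append(line)
            continue
        if line.startswith(";"):
            # A line starting with ';' opens a multi-line text field.
            in_text_field = True
            text_parts = [line[1:]]
            continue
        pos, n = 0, len(line)
        while pos < n:
            char = line[pos]
            if char in " \t":
                pos += 1
            elif char == "#":
                # Comment: ignore the rest of the line.
                break
            elif char in "'\"":
                # Quoted value: the closing quote must be followed by
                # whitespace or the end of the line.
                end = pos + 1
                while end < n and not (
                    line[end] == char and (end + 1 == n or line[end + 1] in " \t")
                ):
                    end += 1
                yield line[pos + 1:end]
                pos = end + 1
            else:
                # Plain value: runs until the next whitespace character.
                end = pos
                while end < n and line[end] not in " \t":
                    end += 1
                yield line[pos:end]
                pos = end
```

A generator like this only produces the token stream; deciding which tokens are data names, loop_ keywords or values would still be the job of the Python code that currently sits on top of the C lexer.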
From tiagoantao at gmail.com Mon Feb 18 13:57:15 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Mon, 18 Feb 2013 13:57:15 +0000 Subject: [Biopython-dev] Support for BioSQL on Java/Jython Message-ID: Dear All, I have implemented a set of changes to allow for BioSQL support in Jython. Features: 1. Totally transparent in terms of API. Indeed the existing tests on BioSQL work out of the box. 2. MySQL and PostgreSQL. 3. No sqlite3 support. This library (standard in CPython) does not exist in Jython. You can find the changes here: https://github.com/tiagoantao/biopython/commits/master (top two commits) Comments appreciated. If there is no opposition, I will commit these soon (after incorporating feedback) to the main repo. -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Mon Feb 18 17:44:30 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 18 Feb 2013 17:44:30 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: > On 16 February 2013 12:38, Saket Choudhary wrote: >> HI Everyone, >> >> I have pushed the wrapper to >> https://github.com/saketkc/biopython/tree/bwa_wrapper >> >> Should I send a pull request ? I am in the middle of my University >> mid-semester examinations and hence this is not completely tested. I >> need to perform some more tests with more parameters after I am done >> with my examinations the next week. >> >> >> I would like to hear comments or have it code-reviewed, since this is >> the first time I am contributing to biopython and I might have missed >> out on some of the coding practices being followed. >> >> Thanks >> >> Saket On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary wrote: > Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed now : > > https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23 > This looks sensible.
I think if we are going to extend the __call__ interface to allow stdout to be a filename, then we should do the same for stderr as well. Also this needs to be explained in the docstring (and perhaps also the Tutorial somewhere). Separately some simple unit tests for the wrapper would be good too (which can be as much work as the original code itself), and would be beneficial for cross-platform testing. Thanks, Peter From tiagoantao at gmail.com Tue Feb 19 11:42:22 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 19 Feb 2013 11:42:22 +0000 Subject: [Biopython-dev] Jython (non-existing) docs Message-ID: Hi, I had a cursory look at the documentation for installing Biopython under Jython and there seems to be none. If it is OK, I would extend the documentation to cover Jython -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Tue Feb 19 12:01:15 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 19 Feb 2013 12:01:15 +0000 Subject: [Biopython-dev] Jython (non-existing) docs In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 11:42 AM, Tiago Antão wrote: > Hi, > > I had a cursory look at the documentation for installing Biopython under > Jython and there seems to be none. If it is OK, I would extend the > documentation to cover Jython I just use the usual mantra: jython setup.py build jython setup.py test jython setup.py install Perhaps there are pitfalls I'm not aware of? (Updating Doc/install/Installation.tex is still a good idea though) Peter From tiagoantao at gmail.com Tue Feb 19 12:02:18 2013 From: tiagoantao at gmail.com (Tiago Antão) Date: Tue, 19 Feb 2013 12:02:18 +0000 Subject: [Biopython-dev] Jython (non-existing) docs In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock wrote: > Perhaps there are pitfalls I'm not aware of?
> > JDBC driver for the new BioSQL code ;) Tiago -- "Liberty for wolves is death to the lambs" - Isaiah Berlin From p.j.a.cock at googlemail.com Tue Feb 19 12:07:39 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 19 Feb 2013 12:07:39 +0000 Subject: [Biopython-dev] Jython (non-existing) docs In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 12:02 PM, Tiago Antão wrote: > > On Tue, Feb 19, 2013 at 12:01 PM, Peter Cock > wrote: >> >> Perhaps there are pitfalls I'm not aware of? >> > > JDBC driver for the new BioSQL code ;) > > Tiago Good answer :) Yes, advice on installing optional dependencies like that makes sense. Peter From saketkc at gmail.com Tue Feb 19 13:15:56 2013 From: saketkc at gmail.com (Saket Choudhary) Date: Tue, 19 Feb 2013 18:45:56 +0530 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On 18 February 2013 23:14, Peter Cock wrote: > > On 16 February 2013 12:38, Saket Choudhary wrote: > >> HI Everyone, > >> > >> I have pushed the wrapper to > >> https://github.com/saketkc/biopython/tree/bwa_wrapper > >> > >> Should I send a pull request ? I am in the middle of my University > >> mid-semester examinations and hence this is not completely tested. I > >> need to perform some more tests with more parameters after I am done > >> with my examinations the next week. > >> > >> > >> I would like to hear comments or have it code-reviewed, since this is > >> the first time I am contributing to biopython and I might have missed > >> out on some of the coding practices being followed. > >> > >> Thanks > >> > >> Saket > > > On Sat, Feb 16, 2013 at 11:48 AM, Saket Choudhary > wrote: > > Oops. Apparently I had forgotten to 'git add' the _bwa.py . Committed > now : > > > > > https://github.com/saketkc/biopython/commit/062aabf8f31a522929957f4bcd3f7a932f3bdf23 > > > > This looks sensible.
I think if we are going to extend the __call__ > interface > to allow stdout to be a filename, then we should do the same for stderr > as well. Also this needs to be explained in the docstring (and perhaps > also the Tutorial somewhere). > > Separately some simple unit tests for the wrapper would be good too > (which can be as much work as the original code itself), and would > be beneficial for cross-platform testing. > > Thanks, > > Peter > Thanks Peter. I will add that. Any pointers to what would be a good reference test_aba.py file in Tests/ directory for writing unit tests for this ? I have worked on BDD before but Unit Tests are new for me, so it may take some time.I plan to finish it the coming week once my university examinations are done Thanks Saket From p.j.a.cock at googlemail.com Tue Feb 19 14:25:40 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 19 Feb 2013 14:25:40 +0000 Subject: [Biopython-dev] BWA Wrapper In-Reply-To: References: Message-ID: On Tue, Feb 19, 2013 at 1:15 PM, Saket Choudhary wrote: > > Thanks Peter. > > I will add that. Any pointers to what would be a good reference test_aba.py > file in Tests/ directory for writing unit tests for this ? > > I have worked on BDD before but Unit Tests are new for me, so it may take > some time.I plan to finish it the coming week once my university > examinations are done > > Thanks > > Saket There's a chapter in the Tutorial about our test framework. In this case existing command line tool wrappers are the best reference, e.g. test_Emboss.py or test_Muscle.py Also if you want to use doctests and have them included in the test suite, add the module to the list in Tests/run_tests.py - however this does not handle optional dependencies (other than NumPy). Therefore all the application wrapper doctests to date have carefully avoided actually invoking the command line - and instead most print the string representation instead. 
This allows us to check that the example use cases should run (and catches silly errors in the examples like a typo in an argument name). Thanks, Peter From p.j.a.cock at googlemail.com Sun Feb 24 12:42:47 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 24 Feb 2013 12:42:47 +0000 Subject: [Biopython-dev] [Biopython] blastdbcmd In-Reply-To: <5127B8D1.5090705@usp.br> References: <5127A44E.2030403@usp.br> <5127B8D1.5090705@usp.br> Message-ID: Great - let us know on the list if you have any questions. Peter On Fri, Feb 22, 2013 at 6:28 PM, Frederico Moraes Ferreira wrote: > Hi Peter, > Yes, I meant a Biopython Blast application for blastdbcmd. > Thanks for the link. > Best, > Fred > > On 22-02-2013 14:23, Peter Cock wrote: > >> On Fri, Feb 22, 2013 at 5:01 PM, Frederico Moraes Ferreira >> wrote: >>> >>> Hi there Biopythoneers, >>> As far as I know, there isn't a blastdbcmd submodule in Biopython. >>> So, >>> I've been writing the blast matched sequence IDs to a file, fetching >>> them >>> all with a subprocess and reading with SeqIO afterwards. In some cases, >>> however, I miss a blastdbcmd parser to make things easy. How are you guys >>> dealing with this? >>> Best, >>> Fred >> >> Are you talking about a command line wrapper for blastdbcmd, to go in >> Bio/Blast/Applications.py? That seems a good idea. >> >> Personally I find the blastdbcmd tool quite handicapped due to the >> introduction of generated sequence identifiers, and rarely use it: >> >> http://blastedbio.blogspot.co.uk/2012/10/my-ids-not-good-enough-for-ncbi-blast.html >> >> Instead I would use Bio.SeqIO to index the FASTA file used for the >> database, and get the sequences that way. >> >> Peter >> > > -- > Dr. Frederico Moraes Ferreira > Universidade de São Paulo > Faculdade de Medicina > Instituto do Coração - Imunologia > Av. Dr.
Enéas de Carvalho Aguiar, 44 > 05403-900 São Paulo - SP > Brasil > From anaryin at gmail.com Tue Feb 26 16:14:52 2013 From: anaryin at gmail.com (João Rodrigues) Date: Tue, 26 Feb 2013 17:14:52 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO Message-ID: Hello all, I've modified slightly PDBIO to allow writing of any object of our PDB representation. Right now it accepts only Models or Structures (IIRC) and sometimes it's useful to have only a chain or a residue written. I've added a layer of code that builds the "missing" parts using StructureBuilder. I pushed it to a branch in my github account: https://github.com/JoaoRodrigues/biopython/tree/pdbio I've been using it for a while now so often I completely forgot about it.. Only noticed when I changed computers and the version there could not handle this. So I guess it should be solid enough. Cheers, João From eric.talevich at gmail.com Tue Feb 26 19:35:56 2013 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 26 Feb 2013 14:35:56 -0500 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues wrote: > Hello all, > > I've modified slightly PDBIO to allow writing of any object of our PDB > representation. Right now it accepts only Models or Structures (IIRC) and > sometimes it's useful to have only a chain or a residue written. I've added > a layer of code that builds the "missing" parts using StructureBuilder. > > I pushed it to a branch in my github account: > > https://github.com/JoaoRodrigues/biopython/tree/pdbio > > I've been using it for a while now so often I completely forgot about it.. > Only noticed when I changed computers and the version there could not > handle this. So I guess it should be solid enough. > > Awesome. I support the idea. Could you do a pull request, so TravisCI runs the tests automatically?
-Eric From anaryin at gmail.com Tue Feb 26 19:39:08 2013 From: anaryin at gmail.com (João Rodrigues) Date: Tue, 26 Feb 2013 20:39:08 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: There's some discussion about some implementation details: https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4 What does everyone else think? Thanks for the input btw. Should I make a test too? I reckon it would be a good thing to add? 2013/2/26 Eric Talevich > On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues wrote: > >> Hello all, >> >> I've modified slightly PDBIO to allow writing of any object of our PDB >> representation. Right now it accepts only Models or Structures (IIRC) and >> sometimes it's useful to have only a chain or a residue written. I've >> added >> a layer of code that builds the "missing" parts using StructureBuilder. >> >> I pushed it to a branch in my github account: >> >> https://github.com/JoaoRodrigues/biopython/tree/pdbio >> >> I've been using it for a while now so often I completely forgot about it.. >> Only noticed when I changed computers and the version there could not >> handle this. So I guess it should be solid enough. >> >> > Awesome. I support the idea. Could you do a pull request, so TravisCI runs > the tests automatically? > > -Eric > From davidjosephcain at gmail.com Tue Feb 26 19:47:32 2013 From: davidjosephcain at gmail.com (David Cain) Date: Tue, 26 Feb 2013 14:47:32 -0500 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: I failed to mention this sooner, but I'm an enthusiastic proponent of what you've done. Your new set_structure() would be immensely helpful to me, as I've been using some workarounds to achieve the functionality you've implemented. Personally, I think a unit test would be really helpful in ensuring chain-less residues and the like will save appropriately.
David Cain +1 (339) 222 4452 On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues wrote: > There's some discussion about some implementation details: > > > https://github.com/JoaoRodrigues/biopython/commit/cd86f3c8f4216d59440f4eaf8ac3ba2ab05d8eb4 > > What does everyone else think? > > Thanks for the input btw. Should I make a test too? I reckon it would be a > good thing to add? > > > 2013/2/26 Eric Talevich > > > On Tue, Feb 26, 2013 at 11:14 AM, João Rodrigues >wrote: > > > >> Hello all, > >> > >> I've modified slightly PDBIO to allow writing of any object of our PDB > >> representation. Right now it accepts only Models or Structures (IIRC) > and > >> sometimes it's useful to have only a chain or a residue written. I've > >> added > >> a layer of code that builds the "missing" parts using StructureBuilder. > >> > >> I pushed it to a branch in my github account: > >> > >> https://github.com/JoaoRodrigues/biopython/tree/pdbio > >> > >> I've been using it for a while now so often I completely forgot about > it.. > >> Only noticed when I changed computers and the version there could not > >> handle this. So I guess it should be solid enough. > >> > >> > > Awesome. I support the idea. Could you do a pull request, so TravisCI > runs > > the tests automatically? > > > > -Eric > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Tue Feb 26 21:45:00 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 26 Feb 2013 21:45:00 +0000 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues wrote: >> Should I make a test too? I reckon it would be a good thing to add? >> On Tue, Feb 26, 2013 at 7:47 PM, David Cain wrote: > ...
> > Personally, I think a unit test would be really helpful in ensuring > chain-less residues and the like will save appropriately. Absolutely, +1 on adding a test or two for this new functionality. And if there is anywhere in the Tutorial or docstrings which would benefit from mentioning this too, could you update that too please? Thanks, Peter From anaryin at gmail.com Wed Feb 27 09:25:26 2013 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 27 Feb 2013 10:25:26 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ). The whitespace issue is solved I think. What are the rules exactly? Sorry if I'm at a bit of a loss here.. I added tests for the save functions (a full structure and a single residue) as well as one for the chainless residue. I added the suggestion from David to keep the id in the parent if there is one. I reverted the commit and added the same (- the whitespace) and another with tests. If it looks ok, I'll make a pull request (if I can find the button, never did that..). Cheers, João 2013/2/26 Peter Cock > On Tue, Feb 26, 2013 at 2:39 PM, João Rodrigues wrote: > >> Should I make a test too? I reckon it would be a good thing to add? > >> > > On Tue, Feb 26, 2013 at 7:47 PM, David Cain > wrote: > > ... > > > > Personally, I think a unit test would be really helpful in ensuring > > chain-less residues and the like will save appropriately. > > Absolutely, +1 on adding a test or two for this new functionality. > > And if there is anywhere in the Tutorial or docstrings which would > benefit from mentioning this too, could you update that too please?
> > Thanks, > > Peter > From p.j.a.cock at googlemail.com Wed Feb 27 16:34:42 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Feb 2013 16:34:42 +0000 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues wrote: > I'll have a look at the tutorial later (I think it is in the Bio.PDB FAQ). > > The whitespace issue is solved I think. What are the rules exactly? Sorry if > I'm at a bit of a loss here.. PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines (Currently an aspiration for the Biopython code, rather than a strict requirement) > I added tests for the save functions (a full structure and a single residue) > as well as one for the chainless residue. I added the suggestion from David > to keep the id in the parent if there is one. > > I reverted the commit and added the same (- the whitespace) and another with > tests. If it looks ok, I'll make a pull request (if I can find the button, > never did that..). GitHub have made it quite easy, but the first time is always the hardest. Good luck, and if you get stuck we can try to help or just pull the commits in directly from your fork. Thanks, Peter From anaryin at gmail.com Wed Feb 27 16:41:45 2013 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 27 Feb 2013 17:41:45 +0100 Subject: [Biopython-dev] Slight suggestion for PDBIO In-Reply-To: References: Message-ID: Ok, done I guess: https://github.com/biopython/biopython/pull/165/files Thanks for all the input! 2013/2/27 Peter Cock > On Wed, Feb 27, 2013 at 9:25 AM, João Rodrigues wrote: > > I'll have a look at the tutorial later (I think it is in the Bio.PDB > FAQ). > > > > The whitespace issue is solved I think. What are the rules exactly? > Sorry if > > I'm at a bit of a loss here..
> > PEP8, http://www.python.org/dev/peps/pep-0008/#blank-lines > > (Currently an aspiration for the Biopython code, rather than a strict > requirement) > > > I added tests for the save functions (a full structure and a single > residue) > > as well as one for the chainless residue. I added the suggestion from > David > > to keep the id in the parent if there is one. > > > > I reverted the commit and added the same (- the whitespace) and another > with > > tests. If it looks ok, I'll make a pull request (if I can find the > button, > > never did that..). > > GitHub have made it quite easy, but the first time is always the hardest. > Good luck, and if you get stuck we can try to help or just pull the commits > in directly from your fork. > > Thanks, > > Peter > From p.j.a.cock at googlemail.com Wed Feb 27 22:32:35 2013 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 27 Feb 2013 22:32:35 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts In-Reply-To: References: Message-ID: The new bioinformatics mini-symposium this year makes SciPy 2013 especially interesting. Peter ---------- Forwarded message ---------- From: *Jonathan Rocher* Date: Wednesday, February 27, 2013 Subject: [Numpy-discussion] [ANN] SciPy2013: Call for abstracts To: SciPy Users List , numfocus at googlegroups.com, Discussion of Numerical Python [Apologies for cross-posts] Dear all, The annual SciPy Conference (Scientific Computing with Python) allows participants from academic, commercial, and governmental organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development. *The deadline for abstract submissions is March 20th, 2013. 
* Submissions are welcome that address general Scientific Computing with Python, one of the two special themes for this year's conference (machine learning & reproducible science), or the domain-specific mini-symposia held during the conference (Meteorology, climatology, and atmospheric and oceanic science, Astronomy and astrophysics, Medical imaging, Bio-informatics). Please submit your abstract at the SciPy 2013 website abstract submission form. Abstracts will be accepted for posters or presentations. Optional papers to be published in the conference proceedings will be requested following abstract submission. This year the proceedings will be made available prior to the conference to help attendees navigate the conference. We look forward to an exciting and interesting set of talks, posters, and discussions and hope to see you at the conference. The SciPy 2013 Program Committee Chairs Matt McCormick, Kitware, Inc. Katy Huff, University of Wisconsin-Madison and Argonne National Laboratory From redmine at redmine.open-bio.org Thu Feb 28 01:53:22 2013 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 28 Feb 2013 01:53:22 +0000 Subject: [Biopython-dev] [Biopython - Bug #3419] (New) Bio.SearchIO.FastaIO Message-ID: Issue #3419 has been reported by Jason Stajich. ---------------------------------------- Bug #3419: Bio.SearchIO.FastaIO https://redmine.open-bio.org/issues/3419 Author: Jason Stajich Status: New Priority: Low Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: The strand of the translated sequence (query or subject depending on the analysis) is lost for tfastxy and fastx/y reports.
from Bio import SearchIO

qresults = SearchIO.parse('test.FASTY.out', 'fasta-m10')
for qresult in qresults:
    for hit in qresult:
        for hsp in hit.hsps:
            print qresult.id, " ", hit.id, " ", \
                hsp.query_start, "..", hsp.query_end, " ", hsp.query_strand, " ", \
                hsp.hit_start, "..", hsp.hit_end, " ", hsp.hit_strand

----------------------------------------
You have received this notification because this email was added to the New Issue Alert plugin

--
You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org

From redmine at redmine.open-bio.org Thu Feb 28 07:08:50 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 07:08:50 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO
References: 
Message-ID: 

Issue #3419 has been updated by Wibowo Arindrarto.

Hi Jason,

Thanks for the report :). Do you have an example file handy which I can try to include in our test suite? The FASTA parser was not tested using [t]fast[y|x], so there may be some lines / cases which the parser couldn't handle.

From redmine at redmine.open-bio.org Thu Feb 28 07:38:20 2013
From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org)
Date: Thu, 28 Feb 2013 07:38:20 +0000
Subject: [Biopython-dev] [Biopython - Bug #3419] Bio.SearchIO.FastaIO
References: 
Message-ID: 

Issue #3419 has been updated by Jason Stajich.

File bll0026-vs-94.tfasty added

Here is a -m 10 report.
I made this local patch to get it to report the strands, but this is not quite right because you actually don't have a strand for the query, which is the protein.

diff --git a/Bio/SearchIO/FastaIO.py b/Bio/SearchIO/FastaIO.py
index ca08797..794efb8 100644
--- a/Bio/SearchIO/FastaIO.py
+++ b/Bio/SearchIO/FastaIO.py
@@ -197,7 +197,7 @@ def _set_hsp_seqs(hsp, parsed, program):
         # set seq and alphabet
         setattr(hsp.fragment, seq_type, parsed[seq_type]['seq'])
 
-        if alphabet is not generic_protein:
+        if alphabet is not generic_protein or 'tfast' in program:
             # get strand from coordinate; start <= end is plus
             # start > end is minus
             if start <= end:

In BioPerl I solved this by writing explicit code for the TBLASTN/TFAST[XY] and BLASTX/FAST[XY] situations, which then knew whether the query or subject was translated DNA with a strand or input DNA.
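The explicit per-program handling described above could be sketched roughly as follows. This is only an illustration, not code from Bio.SearchIO: the function names are hypothetical, the program string is assumed to contain "tfastx"/"tfasty" or "fastx"/"fasty" (as the 'tfast' test in the patch suggests), and using 0 as the strand of a protein sequence is an assumption as well. It only covers translated searches; plain DNA or protein comparisons would still go through the existing alphabet check.

```python
# Hypothetical helpers for illustration; none of these exist in Bio.SearchIO.

def translated_side(program):
    """Return 'hit', 'query' or None: the side that holds translated DNA."""
    name = program.lower()
    # Test 'tfast' first, because 'tfastx' also contains 'fastx'.
    if "tfast" in name:
        return "hit"      # protein query against a translated DNA library
    if "fastx" in name or "fasty" in name:
        return "query"    # translated DNA query against a protein library
    return None           # not recognised as a translated search


def strand_from_coords(start, end):
    """Plus strand (1) if start <= end, minus strand (-1) otherwise."""
    return 1 if start <= end else -1


def translated_search_strands(program, query_coords, hit_coords, protein_strand=0):
    """Return (query_strand, hit_strand) for a translated search.

    Only the translated side gets a strand from its coordinates; the
    protein side gets protein_strand (0 here, which is an assumption).
    """
    side = translated_side(program)
    query_strand = strand_from_coords(*query_coords) if side == "query" else protein_strand
    hit_strand = strand_from_coords(*hit_coords) if side == "hit" else protein_strand
    return query_strand, hit_strand


# tfasty: protein query, DNA hit aligned on the minus strand
print(translated_search_strands("tfasty", (1, 120), (900, 541)))  # (0, -1)
# fastx: DNA query on the plus strand, protein hit
print(translated_search_strands("fastx", (30, 389), (1, 120)))    # (1, 0)
```

Compared with the one-line patch, this keeps the protein side without a coordinate-derived strand, which is the part the patch itself is described as getting "not quite right".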