From p.j.a.cock at googlemail.com Thu Apr 1 05:45:29 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 1 Apr 2010 10:45:29 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: 2010/3/23 Peter Cock : > Hi Tiago, > > This is looking much better after your fixes last night - just one left: > > $ pylint --disable-msg-cat=CRW --include-ids=y > --disable-msg=E1101,E1103,E0102 -r n Bio.PopGen > No config file found, using default configuration > ************* Module Bio.PopGen.GenePop.Controller > E0602: 41:_read_allele_freq_table: Undefined variable 'self' > > Note if I turn off those particular error messages which in other > situations I had tentatively tagged as false positives, there could > be a few more issues: > > $ pylint --disable-msg-cat=CRW --include-ids=y -r n Bio.PopGen > ... 
Again, looking much better after yesterday's work: $ pylint --disable-msg-cat=CRW --include-ids=y -r n Bio.PopGen No config file found, using default configuration ************* Module Bio.PopGen.GenePop.EasyController E1101: 43:EasyController.test_hw_pop: Instance of 'GenePopController' has no 'test_pop_hz_prob' member ************* Module Bio.PopGen.GenePop.FileParser E1101:197:FileRecord.remove_population: Instance of 'FileRecord' has no 'populations' member E1101:206:FileRecord.remove_locus_by_position: Instance of 'FileRecord' has no 'populations' member The EasyController issue looks fairly simple: there are three test methods defined in Bio.PopGen.GenePop.Controller * test_pop_hz_deficiency * test_pop_hz_excess * test_pop_hw_prob However, in Bio.PopGen.GenePop.EasyController the method test_hw_pop tries to call these three methods of the controller: * test_pop_hz_deficiency * test_pop_hz_excess * test_pop_hz_prob It looks like an hw/hz typo - but as you use hw in other contexts, I am not 100% sure about this diagnosis. The second set of errors is in Bio.PopGen.GenePop.FileParser, where it does look like self.populations is never defined. So again, this looks like pylint has found a real issue. Regards, Peter From tiagoantao at gmail.com Thu Apr 1 13:56:27 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 1 Apr 2010 18:56:27 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: > ************* Module Bio.PopGen.GenePop.FileParser > E1101:197:FileRecord.remove_population: Instance of 'FileRecord' has > no 'populations' member > E1101:206:FileRecord.remove_locus_by_position: Instance of > 'FileRecord' has no 'populations' member Oh gosh, this one I forgot to implement in the new parser. And it is going to be needed in some of the applications using this code. 
On 1.56. Tiago From tiagoantao at gmail.com Thu Apr 1 13:58:48 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 1 Apr 2010 18:58:48 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: 2010/4/1 Tiago Antão : > Oh gosh, this one I forgot to implement in the new parser. And it is > going to be needed in some of the applications using this code. On > 1.56. Sorry for the commit log on github. I actually put one that was sensible, but it ended up with a merge message... -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Fri Apr 2 07:09:22 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 2 Apr 2010 12:09:22 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: 2010/4/1 Tiago Antão : > 2010/4/1 Tiago Antão : >> Oh gosh, this one I forgot to implement in the new parser. And it is >> going to be needed in some of the applications using this code. On >> 1.56. > > Sorry for the commit log on github. I actually put one that was > sensible, but it ended up with a merge message... > It's fine - there is a comment on the commit before the merge. 
It looks like there is one final grumble from pylint: $ pylint --disable-msg-cat=CRW --include-ids=y -r n Bio.PopGen No config file found, using default configuration ************* Module Bio.PopGen.GenePop.EasyController E1120: 43:EasyController.test_hw_pop: No value passed for parameter 'ext' in function call Peter From bugzilla-daemon at portal.open-bio.org Fri Apr 2 08:38:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 2 Apr 2010 08:38:42 -0400 Subject: [Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry? In-Reply-To: Message-ID: <201004021238.o32CcgqI022975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3000 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-02 08:38 EST ------- (In reply to comment #3) > Created an attachment (id=1436) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1436&action=view) [details] > Adds a get_raw method to the dictionaries returned by Bio.SeqIO.index() > > Outline implementation of an alternative proposal, allowing access to the > raw text for each record via the Bio.SeqIO.index() dictionary like objects. > See discussion here: > http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007301.html Following a positive discussion on the mailing list, I have just checked in an updated patch including FASTQ file support, unit tests and documentation. Right now the only indexed file format not supported by the get_raw method is SFF... which could be done with a little more work. 
Although this does not implement the original request ("Could SeqIO.parse() store the whole, unparsed multiline entry?"), it does allow the original use case to be solved neatly with Bio.SeqIO - so I'm marking this bug as fixed. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Fri Apr 2 09:05:48 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 2 Apr 2010 09:05:48 -0400 Subject: [Biopython-dev] BOSC and OpenBio solution challenge reminder -- April 15th Message-ID: <20100402130548.GG36623@sobchak.mgh.harvard.edu> Hello all; A friendly reminder that the deadline for the Bioinformatics Open Source Conference (BOSC) is coming up on April 15th: http://www.open-bio.org/wiki/BOSC_2010 This is a great opportunity to discuss code and biology with fellow developers. One session which I'd like to emphasize is the OpenBio Solution Challenge, a section of talks that describes how to solve practical problems in bioinformatics using a variety of approaches: http://www.open-bio.org/wiki/SolutionChallenge Any toolkit developers who are interested in giving a talk are encouraged to submit an abstract for the challenge. We have some initial project ideas on the page and welcome your feedback for other useful workflows that would emphasize the advantages of using open source toolkits to solve biological problems. 
Please copy messages to the OpenBio mailing list as a central point for discussion and questions: http://lists.open-bio.org/mailman/listinfo/open-bio-l Looking forward to seeing everyone in July, Brad BOSC contact and dates: Date: July 9-10, 2010 Location: Boston, Massachusetts, USA BOSC 2010 web site: http://www.open-bio.org/wiki/BOSC_2010 Abstract submission via Open Conference System site: http://events.open-bio.org/BOSC2010/openconf.php E-mail: bosc at open-bio.org Bosc-announce list: http://lists.open-bio.org/mailman/listinfo/bosc-announce Important Dates April 15: Abstract deadline May 5: Notification of accepted abstracts May 28: Early Registration Discount Cut-off date July 8-9: Codefest 2010 July 9-10: BOSC 2010 August 15: Manuscript deadline for BOSC 2010 Proceedings published in BMC Bioinformatics From bugzilla-daemon at portal.open-bio.org Fri Apr 2 09:32:36 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 2 Apr 2010 09:32:36 -0400 Subject: [Biopython-dev] [Bug 3026] Bio.SeqIO.InsdcIO._split_multi_line(): Your description cannot be broken into nice lines! In-Reply-To: Message-ID: <201004021332.o32DWad3025175@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3026 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-02 09:32 EST ------- I've updated the code to give a slightly more helpful error message (it now includes the problem text). I think the proper fix might be to try to split long words (like URLs) on hyphens or slashes if they can't otherwise fit in the allowed space. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
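The fix suggested in comment #3 of Bug 3026 (splitting long words such as URLs on hyphens or slashes when they cannot otherwise fit) might look roughly like the sketch below. This is only an illustration of the idea, not the code in Bio.SeqIO.InsdcIO._split_multi_line(); the function name wrap_with_soft_breaks and its max_len parameter are hypothetical.

```python
import re


def wrap_with_soft_breaks(text, max_len):
    """Wrap text to max_len columns (hypothetical helper, not Biopython code).

    Words are normally kept whole. A word longer than max_len is cut into
    pieces that each end just after a hyphen or slash, so the punctuation
    stays at the end of the line it was broken on.
    """
    lines = []
    current = ""
    for word in text.split():
        if len(word) <= max_len:
            pieces = [word]
        else:
            # e.g. "a-b/c" -> ["a-", "b/", "c"]
            pieces = re.findall(r"[^-/]*[-/]|[^-/]+", word)
        for index, piece in enumerate(pieces):
            if len(piece) > max_len:
                # No hyphen or slash close enough: report the problem text.
                raise ValueError("Cannot fit %r into %i columns"
                                 % (piece, max_len))
            # Only the first piece of a word is separated by a space.
            sep = " " if (index == 0 and current) else ""
            if len(current) + len(sep) + len(piece) <= max_len:
                current += sep + piece
            else:
                lines.append(current)
                current = piece
    if current:
        lines.append(current)
    return lines


for line in wrap_with_soft_breaks(
        "see http://example.org/some-long/path now", 12):
    print(line)
```

A word with no hyphen or slash within reach still raises an error, which matches the current behaviour of reporting the problem text rather than silently breaking a word at an arbitrary position.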
From biopython at maubp.freeserve.co.uk Fri Apr 2 12:24:06 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 2 Apr 2010 17:24:06 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta Message-ID: Hi all, I'm going to try and put out a Biopython 1.54 beta release now, so could people please not check in anything to the trunk. Hopefully we can do the release proper in a week or two... Peter From p.j.a.cock at googlemail.com Fri Apr 2 13:34:00 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 2 Apr 2010 18:34:00 +0100 Subject: [Biopython-dev] Biopython 1.54 beta released Message-ID: Dear all, A beta release for Biopython 1.54 is now available for download and testing, as announced here: http://news.open-bio.org/news/2009/06/biopython-154-beta-released/ Note that I haven't done a fully detailed release announcement, we'll leave that for the official release. Source distributions and Windows installers are available from the downloads page on the Biopython website. http://biopython.org/wiki/Download We are interested in getting feedback on the beta release as a whole, but especially on the new features - including the updated multiple sequence alignment object (which is what you'll now get when parsing alignments with Bio.AlignIO), the new Bio.Phylo module, and the Bio.SeqIO support for Standard Flowgram Format (SFF) files. (At least) 10 people contributed to this release (so far), which includes 4 new people: Anne Pajon (first contribution) Brad Chapman Christian Zmasek Eric Talevich Jose Blanca (first contribution) Kevin Jacobs (first contribution) Leighton Pritchard Michiel de Hoon Peter Cock Thomas Holder (first contribution) On behalf of the Biopython team, thank you for any feedback, bug reports, and contributions. Peter P.S. You may wish to subscribe to our news feed. 
For RSS links etc, see: http://biopython.org/wiki/News Biopython news is also on twitter: http://twitter.com/biopython From p.j.a.cock at googlemail.com Fri Apr 2 13:39:08 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 2 Apr 2010 18:39:08 +0100 Subject: [Biopython-dev] Biopython 1.54 beta released In-Reply-To: References: Message-ID: > Dear all, > > A beta release for Biopython 1.54 is now available for download > and testing, as announced here: > > http://news.open-bio.org/news/2009/06/biopython-154-beta-released/ > > Note that I haven't done a fully detailed release announcement, > we'll leave that for the official release. That URL should have been: http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/ Sorry for the extra email, Peter From biopython at maubp.freeserve.co.uk Fri Apr 2 14:49:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 2 Apr 2010 19:49:44 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta In-Reply-To: References: Message-ID: On Fri, Apr 2, 2010 at 5:24 PM, Peter wrote: > Hi all, > > I'm going to try and put out a Biopython 1.54 beta release now, > so could people please not check in anything to the trunk. > > Hopefully we can do the release proper in a week or two... > > Peter OK, so the beta is out there. Maybe this wasn't really needed, but I wanted to be a little cautious regarding the alignment changes (which *might* break something). We'll need to address any issues reported from that but in the meantime we can do more work on the documentation. In particular, I'd like to have a chapter on Bio.Phylo (which will likely be based heavily on Eric's wiki page). Eric - do you know any LaTeX? If not, don't worry too much. We can guide you through installing pdflatex and hevea for producing PDF and HTML, and then adding things to the tutorial should be fairly easy. 
Or, the simpler option (for you) would be to hand over plain text and a kind volunteer (maybe me) can handle the LaTeX markup. David - would you be able to write us a proper announcement for the final Biopython 1.54 release? We'll want to highlight the fact that Eric's Bio.Phylo module was from the GSoC 2009 (and link to this year's projects). Thanks all, and to those enjoying an Easter Break, have a nice holiday - none of this is urgent ;) Peter P.S. Note to self or anyone interested: Why did the source code tar ball and zip file jump in size by about 2MB? Was it just the accumulation of more code and more tests - or did I mess up? From bugzilla-daemon at portal.open-bio.org Sat Apr 3 01:58:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 3 Apr 2010 01:58:24 -0400 Subject: [Biopython-dev] [Bug 3026] Bio.SeqIO.InsdcIO._split_multi_line(): Your description cannot be broken into nice lines! In-Reply-To: Message-ID: <201004030558.o335wOXo020189@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3026 ------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2010-04-03 01:58 EST ------- I do not know what I would like to happen here in addition to the improved error message. Probably not get an error at all and have biopython able to cope with these cases as well. I have just asked asimpson at ludwig.org.br whether a fix of the data in dbEST would be feasible. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Sat Apr 3 09:00:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 3 Apr 2010 14:00:53 +0100 Subject: [Biopython-dev] epydoc formatting in Bio.Phylo Message-ID: Hi Eric, One of the tasks in building a release which I am only doing now is updating the API docs: http://biopython.org/wiki/Building_a_release Using epydoc can raise warnings about errors in the code (usually things like broken imports) which means it doubles as a code check. The good news is I don't see any such issues (beyond existing name shadowing which we are stuck with). However, this has flagged a few things in Bio.Phylo, most of which look like documentation formatting issues since you have explicitly stated you are using the epydoc markup with: __docformat__ = "epytext en" +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/BaseTree.py, line 671, in | Bio.Phylo.BaseTree.Subtree | Warning: @param for unknown parameter "label" | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/BaseTree.py, line 507, in | Bio.Phylo.BaseTree.TreeMixin.prune | Warning: Line 513: Possible mal-formatted field item. | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/Newick.py, line 255, in | Bio.Phylo.Newick._TreeShim.root_with_outgroup | Warning: Line 258: Improper paragraph indentation. 
| +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/PhyloXML.py, line 117, in | Bio.Phylo.PhyloXML.Phylogeny | Warning: @param for unknown parameter "clade" | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/_utils.py, line 224, in | Bio.Phylo._utils.draw_ascii | Warning: Lines 228, 229, 230, 231, 232, 233, 234: Improper paragraph indentation. | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/_utils.py, line 132, in | Bio.Phylo._utils.draw_graphviz | Warning: Lines 159, 164, 176, 179: Fields must be the final elements in an epytext string. | Warning: Line 179: Improper paragraph indentation. | Another to-do item before Biopython 1.54 final. Peter From eric.talevich at gmail.com Sat Apr 3 09:21:22 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 3 Apr 2010 09:21:22 -0400 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta In-Reply-To: References: Message-ID: Hi Peter, On Fri, Apr 2, 2010 at 2:49 PM, Peter wrote: > Eric - do you know any LaTeX? If not, don't worry too > much. We can guide you through installing pdflatex > and hevea for producing PDF and HTML, and then > adding things to the tutorial should be fairly easy. > Or, the simpler option (for you) would be to hand > over plain text and a kind volunteer (maybe me) > can handle the LaTeX markup. > Yes, I can do LaTeX. Would it be better to rework the wiki page (or a section of it) into something chapter-like first, or just add a draft of the new chapter into the main tutorial document right away? Also, did you have a list of specific topics/subsections I should cover? P.S. 
Note to self or anyone interested: Why did > the source code tar ball and zip file jump in size > by about 2MB? Was it just the accumulation of > more code and more tests - or did I mess up? > The example phyloXML files are kind of hefty, especially ncbi_taxonomy_mollusca.xml. If size increase is a problem, I can remove that file from the unit tests without substantial harm. -Eric From biopython at maubp.freeserve.co.uk Sat Apr 3 09:37:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 3 Apr 2010 14:37:57 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta In-Reply-To: References: Message-ID: On Sat, Apr 3, 2010 at 2:21 PM, Eric Talevich wrote: > Hi Peter, > > On Fri, Apr 2, 2010 at 2:49 PM, Peter wrote: > >> Eric - do you know any LaTeX? If not, don't worry too >> much. We can guide you through installing pdflatex >> and hevea for producing PDF and HTML, and then >> adding things to the tutorial should be fairly easy. >> Or, the simpler option (for you) would be to hand >> over plain text and a kind volunteer (maybe me) >> can handle the LaTeX markup. > > Yes, I can do LaTeX. Would it be better to rework the > wiki page (or a section of it) into something chapter-like > first, or just add a draft of the new chapter into the main > tutorial document right away? Up to you. Maybe start with polishing the wiki? The only tricky bit will be images (HTML vs PDF layout), but there are examples you can copy (e.g. the Graphics chapter). > Also, did you have a list of specific topics/subsections > I should cover? Well, the basics of reading, writing and converting trees from different formats. Then something on using the tree objects... I was going to suggest re-rooting a tree but as per the earlier thread, it's a bit more complicated than I had expected. How about taking a tree, and coloring specific clades (to save the colors in XML output and/or a graphical output)? >> P.S. 
Note to self or anyone interested: Why did >> the source code tar ball and zip file jump in size >> by about 2MB? Was it just the accumulation of >> more code and more tests - or did I mess up? > > The example phyloXML files are kind of hefty, > especially ncbi_taxonomy_mollusca.xml. If size > increase is a problem, I can remove that file > from the unit tests without substantial harm. That is just a 66K zip file though, so it isn't that. Peter From eric.talevich at gmail.com Sun Apr 4 10:12:19 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 4 Apr 2010 10:12:19 -0400 Subject: [Biopython-dev] epydoc formatting in Bio.Phylo In-Reply-To: References: Message-ID: On Sat, Apr 3, 2010 at 9:00 AM, Peter wrote: > Hi Eric, > > One of the tasks in building a release which I am only doing now is > updating the API docs: > http://biopython.org/wiki/Building_a_release > > Using epydoc can raise warnings about errors in the code (usually > things like broken imports) which means it doubles as a code check. > The good news is I don't see any such issues (beyond existing name > shadowing which we are stuck with). > > However, this has flagged a few things in Bio.Phylo, most of which > look like documentation formatting issues since you have explicitly > stated you are using the epydoc markup [...] > OK, I fixed the docstrings for epydoc (and a few other things) and pushed to GitHub. It should be all right now. Thanks, Eric From eric.talevich at gmail.com Sun Apr 4 10:50:21 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 4 Apr 2010 10:50:21 -0400 Subject: [Biopython-dev] String representation of trees in Bio.Phylo Message-ID: Hi all, The new phylogenetics module Bio.Phylo supports a few new ways of displaying trees. I'm trying to decide which of these should be used as the informal string representation for whole trees, i.e. what happens when you type "print tree" for some newly parsed tree object. A Tree consists of some global information (e.g. 
rooted or not) plus nested lists of Subtrees, which Clade objects in PhyloXML inherit from. Currently, the Subtree __str__ method is treated as a label for a clade -- it's the clade's name, if available; in the absence of any other identifier it prints out the class name. Similarly, str(Tree) just prints out the tree's 'name' attribute, or "Tree"; this probably isn't what the user expects, though. Here are the options. To start the example, here's a tree parsed from phyloXML and displayed as a Newick tree: >>> from Bio import Phylo >>> tree = Phylo.parse('ex/phyloxml_examples.xml', 'phyloxml').next() >>> print tree.format('newick') ((A:0.10200,B:0.23000)0.00000:0.06000,C:0.40000)0.00:0.00000; The pretty_print function, with the show_all option, uses 'repr' recursively to display the tree's nodes. I think this is probably the best choice for Tree.__str__, but it can be a bit cluttered if a lot of information is attached to each node/subtree/clade. >>> Phylo.pretty_print(tree, show_all=True) Phylogeny(rooted='True', description='phyloXML allows to use either a "branch_length" attribute...', name='example from Prof. Joe Felsenstein's book "Inferring Phyl...') Clade() Clade(branch_length='0.06') Clade(branch_length='0.102', name='A') Clade(branch_length='0.23', name='B') Clade(branch_length='0.4', name='C') By default, pretty_print uses 'str' instead of 'repr', showing only class names and string representations (labels) to reduce the clutter: >>> Phylo.pretty_print(tree) Phylogeny: example from Prof. Joe Felsenstein's ... Clade: Clade Clade: Clade Clade: A Clade: B Clade: C Is this useful to anyone? If not, then I could drop this part of the pretty_print function entirely. As an alternative, we could print the tree as ASCII art, as some other toolkits do. 
However, this function is very limited -- it doesn't print internal node labels, and trees of more than a couple hundred nodes will look strange, since the drawing is compressed into a fixed number of character columns (default 80). >>> Phylo.draw_ascii(tree) __________________ A __________| _| |___________________________________________ B | |___________________________________________________________________________ C For reference, here's the raw phyloXML: >>> Phylo.write(tree, sys.stdout, 'phyloxml', indent=True) example from Prof. Joe Felsenstein's book "Inferring Phylogenies" phyloXML allows to use either a "branch_length" attribute or element to indicate branch lengths. 0.06 A 0.102 B 0.23 C 0.4 What do you think? Thanks, Eric From chapmanb at 50mail.com Sun Apr 4 13:19:03 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 4 Apr 2010 13:19:03 -0400 Subject: [Biopython-dev] String representation of trees in Bio.Phylo In-Reply-To: References: Message-ID: <20100404171903.GF19540@kunkel> Hi Eric; > The new phylogenetics module Bio.Phylo supports a few new ways of displaying > trees. I'm trying to decide which of these should be used as the informal > string representation for whole trees, i.e. what happens when you type > "print tree" for some newly parsed tree object. [...] > The pretty_print function, with the show_all option, uses 'repr' recursively > to display the tree's nodes. I think this is probably the best choice for > Tree.__str__, but it can be a bit cluttered if a lot of information is > attached to each node/subtree/clade. > > >>> Phylo.pretty_print(tree, show_all=True) > Phylogeny(rooted='True', description='phyloXML allows to use either a > "branch_length" attribute...', name='example from Prof. Joe Felsenstein's > book "Inferring Phyl...') > Clade() > Clade(branch_length='0.06') > Clade(branch_length='0.102', name='A') > Clade(branch_length='0.23', name='B') > Clade(branch_length='0.4', name='C') I like this one. 
Agreed that it could get ugly, but I think it shows the structure and associated information well. > As an alternative, we could print the tree as ASCII art, as some other > toolkits do. However, this function is very limited -- it doesn't print > internal node labels, and trees of more than a couple hundred nodes will > look strange, since the drawing is compressed into a fixed number of > character columns (default 80). > > >>> Phylo.draw_ascii(tree) > __________________ A > __________| > _| |___________________________________________ B > | > |___________________________________________________________________________ C This is a good idea. I think this is more useful than the current pretty_print without show_all for getting a quick overview of the tree. Brad
From bugzilla-daemon at portal.open-bio.org Mon Apr 5 20:49:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Apr 2010 20:49:40 -0400 Subject: [Biopython-dev] [Bug 3042] New: test_Mafft_tool fails Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3042 Summary: test_Mafft_tool fails Product: Biopython Version: 1.54b Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp This is the error message I get: ====================================================================== FAIL: Simple round-trip through app with infile. 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_Mafft_tool.py", line 56, in test_Mafft_simple self.assert_("STEP 2 / 2 d" in stderr_string) AssertionError ====================================================================== FAIL: Round-trip with complex command line. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Mafft_tool.py", line 126, in test_Mafft_with_complex_command_line self.assertEqual(return_code, 0) AssertionError: 1 != 0 This is with MAFFT version 5.732 (2005/09/14). The output it generates starts with: $ mafft Fasta/f002 blosum 62 ppenalty = -1530 poffset = -123 generating 200PAM scoring matrix for nucleotides ... done scoremtx = -1 Gap Penalty = -1.53, +0.00, -0.12 Making a distance matrix .. 1 / 3nknown character n done. Constructing dendrogram ... 0 / 3 done. Progressive alignment ... STEP 2 /2 done. Whereas the bug may disappear with newer versions of mafft, most Biopython users will not use mafft, and we should not require to have the latest version of mafft installed to avoid test errors. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
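The first failure in Bug 3042 comes from an exact substring check: the test looks for "STEP 2 / 2 d" in stderr, while MAFFT 5.732 prints "STEP 2 /2 done." with different spacing. One way to make such a check less sensitive to the MAFFT version is a whitespace-tolerant regular expression, sketched below. The helper name saw_final_step and its n_steps parameter are hypothetical and not part of test_Mafft_tool.py, and this does nothing about the second failure (the non-zero return code).

```python
import re


def saw_final_step(stderr_string, n_steps=2):
    """Return True if stderr reports the last progressive alignment step.

    Hypothetical helper: accepts both "STEP 2 / 2 d" and the older
    "STEP 2 /2 done." spacing quoted in the bug report.
    """
    pattern = r"STEP\s+%i\s*/\s*%i\s+d" % (n_steps, n_steps)
    return re.search(pattern, stderr_string) is not None


old_style = "Progressive alignment ... STEP 2 /2 done."
new_style = "STEP 2 / 2 d"
print(saw_final_step(old_style))
print(saw_final_step(new_style))
```

In a unittest-based test this could replace the substring assertion with something like self.assertTrue(saw_final_step(stderr_string)).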
From bugzilla-daemon at portal.open-bio.org Mon Apr 5 20:55:31 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Apr 2010 20:55:31 -0400 Subject: [Biopython-dev] [Bug 3043] New: test_NCBI_BLAST_tools fails Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3043 Summary: test_NCBI_BLAST_tools fails Product: Biopython Version: 1.54b Platform: PC OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp This is the error I get: ====================================================================== FAIL: Check all blastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 121, in test_blastn self.check("blastn", Applications.NcbiblastnCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all blastp arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 117, in test_blastp self.check("blastp", Applications.NcbiblastpCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all blastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 113, in test_blastx self.check("blastx", 
Applications.NcbiblastxCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all psiblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 133, in test_psiblast self.check("psiblast", Applications.NcbipsiblastCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all rpsblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 137, in test_rpsblast self.check("rpsblast", Applications.NcbirpsblastCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all rpstblastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 141, in test_rpstblastn self.check("rpstblastn", Applications.NcbirpstblastnCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all tblastn arguments are supported 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 129, in test_tblastn self.check("tblastn", Applications.NcbitblastnCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all tblastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 125, in test_tblastx self.check("tblastx", Applications.NcbitblastxCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose but actually these seem to be extra options rather than missing options: $ blastn -h USAGE blastn [-h] [-help] [-import_search_strategy filename] [-export_search_strategy filename] [-task task_name] [-db database_name] [-dbsize num_letters] [-gilist filename] [-negative_gilist filename] [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm] [-subject subject_input_file] [-subject_loc range] [-query input_file] [-out output_file] [-evalue evalue] [-word_size int_value] [-gapopen open_penalty] [-gapextend extend_penalty] [-perc_identity float_value] [-xdrop_ungap float_value] [-xdrop_gap float_value] [-xdrop_gap_final float_value] [-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value] [-template_type type] [-template_length int_value] [-dust DUST_options] [-filtering_db filtering_database] [-window_masker_taxid window_masker_taxid] [-window_masker_db window_masker_db] [-soft_masking soft_masking] [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value] 
[-best_hit_score_edge float_value] [-window_size int_value]
[-use_index boolean] [-index_name string] [-lcase_masking]
[-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format]
[-show_gis] [-num_descriptions int_value] [-num_alignments int_value]
[-html] [-max_target_seqs num_sequences] [-num_threads int_value]
[-remote] [-verbose] [-remote_verbose] [-use_test_remote_service]
[-version]

DESCRIPTION
Nucleotide-Nucleotide BLAST 2.2.22+

Use '-help' to print detailed descriptions of command line arguments

In any case, probably there will be slight differences in the options used by different versions of Blast, and this shouldn't cause tests to fail.

From winda002 at student.otago.ac.nz Mon Apr 5 21:51:05 2010
From: winda002 at student.otago.ac.nz (David Winter)
Date: Tue, 06 Apr 2010 13:51:05 +1200
Subject: [Biopython-dev] Draft Announcement for Biopython 1.54
Message-ID: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz>

Hi all,

Here's a draft announcement for the next release, very happy to take corrections and suggestions on how to change it. I'll put a marked up version of this on the OBF server soon.

Cheers,
David

--

Biopython 1.54 Released

The Biopython team is proud to announce Biopython 1.54, a new stable release of the Biopython library. Biopython 1.54 comes four months after our last release and brings new features, tweaks to some established functions and the usual collection of bug fixes.

This is the first stable release to feature the new Bio.Phylo module which can be used to read, write and take data from phylogenetic trees in Newick, Nexus and PhyloXML formats. The module is the result of Eric Talevich's Google Summer of Code project which was supported by The National Evolutionary Synthesis Center (NESCent).
Biopython now supports the reading, writing and indexing of Standard Flowgram Format (SFF) files produced in 454 sequencing. Jose Blanca (the brains behind the widely used sff_extract tool) has extended Bio.SeqIO to handle these files, making it possible to convert between SFF, FASTQ, FASTA and QUAL formats (as trimmed or untrimmed reads).

As well as adding features, the new release tweaks and extends some of the core modules:

* Both Bio.SeqIO and Bio.AlignIO will accept filenames as well as handles, as detailed here
* The multiple sequence alignment object that underlies Bio.AlignIO has been improved.
* Bio.SeqIO can read and write EMBL nucleotide files.
* The dictionary-like object returned by Bio.SeqIO.index() has a new method "get_raw" that gets unparsed data from a file as a string.
* Bio.Entrez includes some more DTD files, in particular eLink_090910.dtd, needed for our NCBI Entrez Utilities XML parser.

Binaries and source files for Biopython 1.54 are available from the downloads page. The documentation has been updated to include the changes made since our last release.

A big thanks to everyone who tested our beta release or submitted bugs since Biopython 1.53. And an especially big thanks to everyone who contributed to this release, including four first-time contributors:

* Anne Pajon (first contribution)
* Brad Chapman
* Christian Zmasek
* Eric Talevich
* Jose Blanca (first contribution)
* Kevin Jacobs (first contribution)
* Leighton Pritchard
* Michiel de Hoon
* Peter Cock
* Thomas Holder (first contribution)

From winda002 at student.otago.ac.nz Mon Apr 5 21:54:38 2010
From: winda002 at student.otago.ac.nz (David Winter)
Date: Tue, 06 Apr 2010 13:54:38 +1200
Subject: [Biopython-dev] Biopython devs at iEvoBio?
Message-ID: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz>

Hi again guys,

I was wondering if anyone else is planning to go to iEvoBio (http://ievobio.org/) in Portland in June.
The meeting is planned to be a phyloinformatics counterpart to BOSC and is going to be run alongside the big Evolution Meetings. It might be a good venue to show Erick and Nick's GSoC projects from last year. Obviously, if Eric or Nick are planning to be at the meeting then they should present their work, but if they aren't going to be there I'd be happy to present a short demo on some of the things those libraries can do and how they might be brought together with other Biopython tools to build some useful workflows. ( it might start to make up for how slack of I've been in this news contributor role!) At the moment I really just need to know if better qualified people will be there and, if not, if people think a demo is a good idea (the software demonstration sessions don't need an abstract anytime soon) Cheers, david From eric.talevich at gmail.com Tue Apr 6 00:07:44 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 6 Apr 2010 00:07:44 -0400 Subject: [Biopython-dev] Biopython devs at iEvoBio? In-Reply-To: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> Message-ID: Hi David, I'm planning to go to BOSC this summer, and I'm not sure if I'll be able to go to iEvoBio in addition to that. But I'd certainly appreciate it if you could demo the new hotness in Biopython 1.54. I'll let you know if the situation changes (e.g. if BOSC rejects my abstract). Cheers, Eric On Mon, Apr 5, 2010 at 9:54 PM, David Winter wrote: > Hi again guys, > > I was wondering if anyone else is planning to go to iEvoBio > (http://ievobio.org/) in Portland in June. The meeting is planned to be a > phyloinformatics counterpart to BOSC and is going to be run alongside the > big > Evolution Meetings. > > It might be a good venue to show Erick and Nick's GSoC projects from last > year. 
Obviously, if Eric or Nick are planning to be at the meeting then > they > should present their work, but if they aren't going to be there I'd be > happy to > present a short demo on some of the things those libraries can do and how > they might be brought together with other Biopython tools to build some > useful workflows. ( it might start to make up for how slack of I've been in > this news contributor role!) > > At the moment I really just need to know if better qualified people will > be > there and, if not, if people think a demo is a good idea (the software > demonstration sessions don't need an abstract anytime soon) > > Cheers, > david > > > From bugzilla-daemon at portal.open-bio.org Tue Apr 6 02:14:32 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 02:14:32 -0400 Subject: [Biopython-dev] [Bug 3042] test_Mafft_tool fails In-Reply-To: Message-ID: <201004060614.o366EWsk016896@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3042 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 02:14 EST ------- (In reply to comment #0) > This is with MAFFT version 5.732 (2005/09/14). The output it generates starts > with: > ... > Whereas the bug may disappear with newer versions of mafft, most Biopython > users will not use mafft, and we should not require to have the latest version > of mafft installed to avoid test errors. I think you are right this is due to your version of MAFFT. The lattest version is MAFFT 6.717, the first public 6.x release was back in 2007. MAFFT 5.732 from late 2005 is really *very* old, right at the bottom of the release history page: http://mafft.cbrc.jp/alignment/software/changelog.html Probably the best solution here is to detect the version number (perhaps by the date?), and skip the tests if it is too old (like test_Emboss.py does now). 
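A minimal sketch of the version check suggested above, assuming the tool's banner contains a string like the "MAFFT version 5.732 (2005/09/14)" quoted in the report. The helper names and the (6, 0) cut-off are hypothetical, and how the banner is captured from the mafft binary is deliberately left out:

```python
import re

# Hypothetical pattern, modelled on the banner text quoted in this bug report.
_VERSION_RE = re.compile(r"MAFFT\s+(?:version\s+)?v?(\d+)\.(\d+)", re.IGNORECASE)


def parse_mafft_version(banner):
    """Return (major, minor) parsed from a banner string, or None if absent."""
    match = _VERSION_RE.search(banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_too_old(banner, minimum=(6, 0)):
    """True if the banner reports a version below `minimum`, or no version."""
    version = parse_mafft_version(banner)
    return version is None or version < minimum


print(is_too_old("MAFFT version 5.732 (2005/09/14)"))  # True
```

A test module could call something like is_too_old() once at import time and skip itself when it returns True, rather than failing on output differences in old releases. Treating an unparseable banner as "too old" is a design choice made here, not something the thread specifies.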
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 02:22:08 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 02:22:08 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004060622.o366M8iU017197@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 02:22 EST ------- (In reply to comment #0) > This is the error I get: > > ... > AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, > ... > > but actually these seem to be extra options rather than missing options: > > $ blastn -h > USAGE > blastn [-h] [-help] [-import_search_strategy filename] > [-export_search_strategy filename] [-task task_name] [-db database_name] > [-dbsize num_letters] [-gilist filename] [-negative_gilist filename] > [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm] > [-subject subject_input_file] [-subject_loc range] [-query input_file] > [-out output_file] [-evalue evalue] [-word_size int_value] > [-gapopen open_penalty] [-gapextend extend_penalty] > [-perc_identity float_value] [-xdrop_ungap float_value] > [-xdrop_gap float_value] [-xdrop_gap_final float_value] > [-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy] > [-min_raw_gapped_score int_value] [-template_type type] > [-template_length int_value] [-dust DUST_options] > [-filtering_db filtering_database] > [-window_masker_taxid window_masker_taxid] > [-window_masker_db window_masker_db] [-soft_masking soft_masking] > [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value] > [-best_hit_score_edge float_value] [-window_size int_value] > [-use_index boolean] 
[-index_name string] [-lcase_masking]
> [-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format]
> [-show_gis] [-num_descriptions int_value] [-num_alignments int_value]
> [-html] [-max_target_seqs num_sequences] [-num_threads int_value]
> [-remote] [-verbose] [-remote_verbose] [-use_test_remote_service]
> [-version]
>
> DESCRIPTION
> Nucleotide-Nucleotide BLAST 2.2.22+
>
> Use '-help' to print detailed descriptions of command line arguments
>
> In any case, probably there will be slight differences in the options used by
> different versions of Blast, and this shouldn't cause tests to fail.

I thought I was using 2.2.22+ on my dev machine - I'll check.

Assuming you have installed the latest BLAST+ and our wrappers are missing some recently added switches, the test is functioning as designed. I really did WANT it to fail in this situation, to alert us to the fact that the wrappers need updating. By design the test should pass fine on older BLAST+ releases with fewer options.

OK, maybe it could be a warning, but if we let this pass silently we risk the wrappers getting out of date without anyone noticing, and then missing options being added ad hoc as and when people need them, i.e. what seems to have happened with other wrappers in the past.

Peter
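The check described in this comment amounts to a one-way set difference. The sketch below is only an illustration of that idea, not the code in test_NCBI_BLAST_tools.py; the function names and the shortened usage string are made up for the example:

```python
import re


def usage_switches(usage_text):
    """Collect switch names that appear as [-name ...] in a usage message."""
    return set(re.findall(r"\[(-\w+)", usage_text))


def missing_from_wrapper(usage_text, wrapper_switches):
    """Switches the tool advertises that the wrapper does not know about."""
    return sorted(usage_switches(usage_text) - set(wrapper_switches))


# Shortened, illustrative usage line and wrapper switch list.
usage = "blastn [-h] [-help] [-query input_file] [-remote] [-verbose] [-remote_verbose]"
wrapper = ["-h", "-help", "-query", "-remote", "-out"]

print(missing_from_wrapper(usage, wrapper))
# ['-remote_verbose', '-verbose']
```

Because only the difference "tool minus wrapper" is taken, a switch the wrapper knows but the installed tool lacks (here "-out") is ignored, which is why an older release with fewer options would still pass.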
From biopython at maubp.freeserve.co.uk Tue Apr 6 02:29:24 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 6 Apr 2010 07:29:24 +0100
Subject: [Biopython-dev] Draft Announcement for Biopython 1.54
In-Reply-To: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz>
References: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz>
Message-ID:

On Tue, Apr 6, 2010 at 2:51 AM, David Winter wrote:
> Hi all,
>
> Here's a draft announcement for the next release, very happy to take
> corrections and suggestions on how to change it. I'll put a marked up
> version of this on the OBF server soon.
>
> Cheers,
> David

Lovely :)

I spotted one small thing to correct:

> Biopython now supports the reading, writing and indexing of Standard
> Flowgram Format (SFF) files produced in 454 sequencing. Jose Blanca
> (the brains behind the widely used sff_extract tool) has extended Bio.SeqIO
> to handle these files, making it possible to convert between SFF, FASTQ,
> FASTA and QUAL formats (as trimmed or untrimmed reads).

The new SFF support was based on code donated by Jose Blanca, but he didn't actually do the SeqIO integration (or the indexing) - that was me.

Also we can only convert from SFF to any of FASTQ, FASTA and QUAL formats. Going to SFF isn't possible because it requires the flow space data from the instrument, which isn't present in those formats.
Thanks David, Peter From bugzilla-daemon at portal.open-bio.org Tue Apr 6 04:14:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 04:14:03 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004060814.o368E3Gr021721@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2010-04-06 04:14 EST ------- (In reply to comment #1) > > Assuming you have installed the latest BLAST+ and our wrappers are missing some > recently added switches, the test is functioning as designed. I really did > WANT it to fail in this situation, to alert us to the fact the wrappers need > updating. Uhm, how does a test failing on some user's machine alert us to update the wrappers? Especially with new users, it's more likely that they will conclude that Biopython is buggy, and stop using it. It's better to execute such a test only if a user or developer specifically asks for it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 04:45:44 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 04:45:44 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004060845.o368jiB5023048@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 04:45 EST ------- (In reply to comment #2) > > Uhm, how does a test failing on some user's machine alert us to update the > wrappers? Especially with new users, it's more likely that they will conclude > that Biopython is buggy, and stop using it. 
It's better to execute such a test
> only if a user or developer specifically asks for it.
>

I was expecting that if a user running a new BLAST+ runs the unit test and hits this issue, they'd report the issue to us.

Looking at this afresh, the assert wasn't really a good long-term solution (but it was very helpful in checking the wrappers had full coverage). To address this issue in the short term, I've made this abort the test with a missing external dependency error (so that run_tests.py will skip it) with what I hope is a clear and not too scary error message.

In this particular case, those extra arguments *may* only be there because your copy of BLAST 2.2.22+ has been compiled in debug mode. They are not present on my install of BLAST 2.2.22+.

We don't have a "full" test suite, do we? That would be useful, and we could do things like require larger test files to be present (which we can track in the repository but not ship) or generated, or assume a particular BLAST database will be installed.

Can we close this bug?

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 09:39:08 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 09:39:08 -0400
Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails
In-Reply-To:
Message-ID: <201004061339.o36Dd8vP032063@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3043

------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2010-04-06 09:39 EST -------
(In reply to comment #3)
> To address this issue in the short
> term, I've made this abort the test with a missing external dependency error
> (so that run_tests.py will skip it) with what I hope is a clear and not too
> scary error message.
>

Sorry but I don't think this test is useful.
If the test succeeds, all we know is that the user's Blast has the same options as the developer's Blast. But it doesn't actually test Bio.Blast.Applications. For many users, the test will generate a MissingDependencyError; as we've seen, even for the same version Blast may have different options. But the Blast dependency is not actually missing, and most like Bio.Blast.Applications works correctly even if some options were added to Blast. > We don't have "full" test suite do we? That would be useful, and we could do > things like require larger test files to be present (which we can track in the > repository but not ship) or generated, or assume a particular BLAST database > will be installed. > That would be useful. We could have a biopython/Developer directory in the repository with all the tests we want to run before making a release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From cy at cymon.org Tue Apr 6 10:39:46 2010 From: cy at cymon.org (Cymon Cox) Date: Tue, 6 Apr 2010 15:39:46 +0100 Subject: [Biopython-dev] [Bug 3042] test_Mafft_tool fails In-Reply-To: <201004060614.o366EWsk016896@portal.open-bio.org> References: <201004060614.o366EWsk016896@portal.open-bio.org> Message-ID: On 6 April 2010 07:14, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=3042 > > > > > > ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 02:14 EST ------- > (In reply to comment #0) > > This is with MAFFT version 5.732 (2005/09/14). The output it generates > starts > > with: > > ... > > Whereas the bug may disappear with newer versions of mafft, most > Biopython > > users will not use mafft, and we should not require to have the latest > version > > of mafft installed to avoid test errors. > > I think you are right this is due to your version of MAFFT. 
The lattest > version is MAFFT 6.717, the first public 6.x release was back in 2007. > MAFFT 5.732 from late 2005 is really *very* old, right at the bottom of > the release history page: > http://mafft.cbrc.jp/alignment/software/changelog.html > > Probably the best solution here is to detect the version number (perhaps by > the date?), and skip the tests if it is too old (like test_Emboss.py does > now). > For the alignment tool interfaces we could only test against the versions that the wrappers were written against (Mafft was 6.626b for instance), and skip all other versions - but that may be a bit drastic. Perhaps detecting the version and issuing a warning such as "This test may have failed because you are using an older/newer version of X", if necessary, is more appropriate. I'll look again at newer versions of these alignment tools (when I get a chance...). Cheers, C. -- From bugzilla-daemon at portal.open-bio.org Tue Apr 6 15:29:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 15:29:17 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004061929.o36JTHtb009859@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 15:29 EST ------- (In reply to comment #4) > (In reply to comment #3) > > To address this issue in the short > > term, I've made this abort the test with a missing external dependency error > > (so that run_tests.py will skip it) with what I hope is a clear and not too > > scary error message. > > > Sorry but I don't think this test is useful. If the test succeeds, all we know > is that the user's Blast has the same options as the developer's Blast. But it > doesn't actually test Bio.Blast.Applications. 
For many users, the test will > generate a MissingDependencyError; as we've seen, even for the same version > Blast may have different options. But the Blast dependency is not actually > missing, and most like Bio.Blast.Applications works correctly even if some > options were added to Blast. Do you agree this check would be useful as part of a "developer's extended test suite"? The idea being that it will hopefully catch when the NCBI adds or removes a BLAST+ switch. We would then update the wrapper and/or white list the change in the test. > > We don't have "full" test suite do we? That would be useful, and we could > > do things like require larger test files to be present (which we can track > > in the repository but not ship) or generated, or assume a particular BLAST > > database will be installed. > > That would be useful. We could have a biopython/Developer directory in the > repository with all the tests we want to run before making a release. I had been thinking something like Tests/dev_test_XXX.py and Tests/dev/XXX for any files required - but your suggestion of a new top level directory would make the manifest and setup.py work easier. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
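One way the "white list" idea from this comment could look, as a rough sketch only: the whitelist contents are the three switches reported in this bug, while the function name and the strict flag (standing in for a developer-only test run) are assumptions made for illustration.

```python
import warnings

# Switches seen in some BLAST+ builds that the wrappers deliberately ignore.
# The set contents come from this bug report; the name is hypothetical.
KNOWN_EXTRAS = {"-remote_verbose", "-use_test_remote_service", "-verbose"}


def check_switches(tool_switches, wrapper_switches, strict=False):
    """Report tool switches that are neither wrapped nor whitelisted.

    With strict=True (e.g. a developer's extended test run) an
    AssertionError is raised; otherwise only a warning is issued.
    """
    unexpected = sorted(set(tool_switches) - set(wrapper_switches) - KNOWN_EXTRAS)
    if unexpected:
        message = "Wrapper is missing: " + ", ".join(unexpected)
        if strict:
            raise AssertionError(message)
        warnings.warn(message)
    return unexpected
```

With this split, an ordinary user run would at most see a warning, while a release check could pass strict=True and fail loudly when the NCBI adds a switch.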
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 16:48:18 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 16:48:18 -0400
Subject: [Biopython-dev] [Bug 3044] New: PhyloXMLIO, assigning node_id causes failures on write after re-reading
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3044

Summary: PhyloXMLIO, assigning node_id causes failures on write after re-reading
Product: Biopython
Version: Not Applicable
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P3
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: joelb at lanl.gov

Hi, everybody. Thanks for the prompt attention to my previous bugs as I work my way through a PhyloXML project. I updated my Biopython from git and Archaeopteryx from cvs on Friday and now several things work differently, mostly for the better.

I have problems with writing any phyloXML tree which was read with node_ids defined. Writing the tree the first time doesn't fail, and the tree can be subsequently read, but a failure occurs on the second write.
Again, using an example file:

>>> tree = Phylo.read('bcl_2.xml', 'phyloxml')
>>> tree.clade[0].node_id = Phylo.PhyloXML.Id('node000')
>>> Phylo.write(tree,'test1.xml','phyloxml')
1
>>> tree1 = Phylo.read('test1.xml','phyloxml')
>>> Phylo.write(tree1,'test2.xml','phyloxml')
Traceback (innermost last):
  File "", line 1, in
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/_io.py", line 82, in write
    n = getattr(supported_formats[format], 'write')(trees, file, **kwargs)
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 142, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 684, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 705, in phyloxml
    elem.append(self.phylogeny(tree))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 656, in wrapped
    elem.append(getattr(self, subn)(getattr(obj, subn)))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 661, in wrapped
    elem.append(getattr(self, method)(item))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 656, in wrapped
    elem.append(getattr(self, subn)(getattr(obj, subn)))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 651, in wrapped
    elem = ElementTree.Element(tag, _clean_attrib(obj, attribs))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 643, in _clean_attrib
    val = getattr(obj, key)
AttributeError: 'str' object has no attribute 'provider'

If I specify a provider using

    tree.clade[0].node_id = Phylo.PhyloXML.Id('node000',provider='LANL')

I get the same error.

I realize it's fairly pointless specifying a node_id if one wants to intermix with Java because forester pays no attention to node_id and assigns its own. I think this is a bug in the Java implementation, according to the XML schema.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 17:28:27 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 17:28:27 -0400 Subject: [Biopython-dev] [Bug 3045] New: TreeMixin, please define enumerator and other convenience methods Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3045 Summary: TreeMixin, please define enumerator and other convenience methods Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: joelb at lanl.gov Hi again, I frequently find the need to go back and forth between tree objects and sequences defined over either the internal or the terminal nodes. Ideally these should be done in concise list comprehensions for performance and readability reasons. These list comprehensions necessarily mix indices into arrays and objects from generators, and the enumerate() pattern is the most convenient because of this mix. I suspect that many others have the same needs. The usage patterns for, say, setting a phyloXML property from an array prop_arr should look something like: [node.set_property(prop_arr[i], *prop_params, **prop_keywords) for i, node in tree.enumerate_internals()] The three issues that frustrate such concision are (1) internal nodes, terminal nodes, and all nodes are not currently on an equal footing with respect to methods, (2) there are no enumerator methods, and (3) the get/set methods for phyloXML are very awkward at the moment. I deal with (3) in the next feature request. Here I give some convenience methods that I wish were defined in TreeMixin. I have tested them as standalone methods. 
I hope you'll see fit to include them at some point.

def count_internals(self):
    """Counts the number of non-terminal (internal) nodes within this tree."""
    return [i for i,e in enumerate_internals(self)][-1] + 1

def enumerate_internals(self):
    """Returns an enumerator of non-terminal clades"""
    return enumerate(self.find_clades(terminal=False))

def enumerate_terminals(self):
    """Returns an enumerator of terminal clades"""
    return enumerate(self.find_clades(terminal=True))

def enumerate_all(self):
    """Returns an enumerator on all clades"""
    return enumerate(self.find_clades())

Less critical but still useful are the following two methods (and one private utility) that I find useful for operations on trees:

def is_semipreterminal(self):
    """True if any direct descendent is terminal."""
    if self.root.is_terminal():
        return False
    for clade in self.clades:
        if clade.is_terminal():
            return True
    return False

def terminal_neighbor_dists(self):
    """Return a list of distances between adjacent terminals"""
    return [self.distance(*i) for i in
            _generate_pairs(self.find_clades(terminal=True))]

def _generate_pairs(self):
    import itertools
    pairs = itertools.tee(self)
    pairs[1].next()
    return itertools.izip(pairs[0], pairs[1])
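The adjacent-pairs helper in this report relies on the iterator `.next()` method and `itertools.izip`, both of which exist only on Python 2. For readers on Python 3, the same technique might be sketched as follows; the public name generate_pairs is chosen here for illustration and is not part of Bio.Phylo:

```python
import itertools


def generate_pairs(iterable):
    """Yield (first, second), (second, third), ... from any iterable."""
    left, right = itertools.tee(iterable)
    next(right, None)  # advance the second copy by one item
    return zip(left, right)


print(list(generate_pairs(["A", "B", "C", "D"])))
# [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

Passing a default to next() means an empty input simply yields no pairs instead of raising StopIteration.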
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 17:46:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 17:46:17 -0400 Subject: [Biopython-dev] [Bug 3046] New: PhyloXML, please define get/set methods Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3046 Summary: PhyloXML, please define get/set methods Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: joelb at lanl.gov It would be nice if there were get/set properties for phyloXML objects that were easier and more concise to use. Right now, to set, say, a phyloXML property, one has to read the code to learn the names and arguments of the Property class and also to learn that properties are added by appending to a list. Besides the matter of convenience, there is also a question about how the properties and taxonomies objects behave. I will take the matter up with the phyloXML mailing list, but I believe that these objects should be dictionary-like rather than list-like. That is, duplicate ref values should not be allowed because the question of how to handle duplicates would have to get pushed down to the user level and will be inconsistent. The following convenience methods make a start at these problems, but don't fully solve them because the current PhyloXML code would have to be reworked to deliver dictionaries of dictionaries. 
However, it's better than nothing:

def set_property(self, *propArgs, **propkwArgs):
    for property in self.properties:
        if property.ref == propArgs[1]:
            property = PhyloXML.Property(propArgs)
            return
    self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs))

def get_property(self, key):
    for property in self.properties:
        if property.ref == key:
            return property.value
    raise KeyError

def set_ID(self, *idArgs, **idkwArgs):
    self.node_id = PhyloXML.Id(*idArgs, **idkwArgs)

def add_taxonomy(self, *taxArgs, **taxkwArgs):
    self.taxonomies.append(PhyloXML.Taxonomy(*taxArgs, **taxkwArgs))

def set_color(node, red, green, blue):
    node.color = PhyloXML.BranchColor(red, green, blue)

def get_taxonomy(self, rank):
    for taxonomy in self.taxonomies:
        if taxonomy.rank == rank:
            return taxonomy.scientific_name
    raise KeyError

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:13:23 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 18:13:23 -0400
Subject: [Biopython-dev] [Bug 3047] New: PhyloXML, behavior on setting color and width doesn't match docstring or spec
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3047

           Summary: PhyloXML, behavior on setting color and width doesn't
                    match docstring or spec
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: joelb at lanl.gov

From the Clade docstring:

"Both 'color' and 'width' elements apply for the whole clade unless
overwritten in-sub clades."

This information follows the PhyloXML doc.
However, that's not the way the code works:

>>> tree = Phylo.read('Bacteria403.phyloxml', 'phyloxml')
>>> tree.clade[0].color
BranchColor(blue='0', green='76', red='41')
>>> tree = Phylo.read('bcl_2.xml', 'phyloxml')
>>> tree.clade[0].color
>>> tree.clade[0].color = Phylo.PhyloXML.BranchColor(255,0,255)
>>> tree.clade[0].color
BranchColor(blue='255', green='0', red='255')
>>> tree.clade[0][0].color
>>> tree.clade[0].width = 3
>>> tree.clade[0][0].width
>>>

Personally I'd prefer changing the docstring. The Java code doesn't implement
the spec either, and it's actually more complicated for the user to deal with
side-effects of setting the entire clade at once than it is to iterate over
the clade.

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:22:25 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 18:22:25 -0400
Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods
In-Reply-To:
Message-ID: <201004062222.o36MMPlk014586@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3046

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:22 EST -------
I'm very tempted to mark this as "won't fixed", this is Python not Java
(grin) and get/set functions are ugly.

The actual functionality you are looking for might be expressed using explicit
Python properties though (which would show up using dir(tree) etc). I'd need
to see some examples to comment on the specifics.
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:25:41 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:25:41 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004062225.o36MPfp3014644@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:25 EST ------- Interesting. I don't really see the need for most of these given how routine use of the enumerate function is elsewhere in Python. So -1 on the enumerate methods. I'm not fond of the name of the existing method get_terminals (which currently returns a list). My feeling is that using just terminals seems nicer (as a property, so no order argument - if you need that use the find method). Is there any advantage to returning a list vs an iterator? Everything is all in memory anyway, right? Given a terminals property (be it a read only list or an iterator), one might go further and add a sister property for the internal nodes (non-terminal nodes). What are your thoughts Eric? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:29:50 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:29:50 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062229.o36MToTH014782@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:29 EST ------- (In reply to comment #0) > > ... 
> >>> tree.clade[0].color = Phylo.PhyloXML.BranchColor(255,0,255)
> >>> tree.clade[0].color
> BranchColor(blue='255', green='0', red='255')

Maybe I should file this as a separate bug, but it looks like BranchColor
needs an explicit __repr__ to ensure the arguments are listed as in the
__init__ definition (which is the conventional red, green, blue order -
rather than alphabetical).

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:32:31 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 18:32:31 -0400
Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec
In-Reply-To:
Message-ID: <201004062232.o36MWV73014870@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3047

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:32 EST -------
(In reply to comment #0)
> From the Clade docstring:
>
> "Both 'color' and 'width' elements apply for the whole clade unless
> overwritten in-sub clades.

What is the bug? The docstring doesn't say the property will be explicitly
cascaded down to the sub-clades, which seems to be your interpretation.
Isn't that implicit when interpreting (drawing) the tree?

Peter
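The __repr__ suggestion in comment #1 above could be sketched roughly as follows. The BranchColor class here is a hypothetical stand-in for illustration, not the PhyloXML implementation; the only point it shows is a __repr__ that lists its arguments in __init__ order.

```python
class BranchColor:
    """Hypothetical stand-in for PhyloXML.BranchColor, for illustration only."""

    def __init__(self, red, green, blue):
        self.red = red
        self.green = green
        self.blue = blue

    def __repr__(self):
        # List the arguments in the same order as __init__ so the output
        # can be read (and re-typed) as a constructor call.
        return "%s(red=%r, green=%r, blue=%r)" % (
            self.__class__.__name__, self.red, self.green, self.blue)


color = BranchColor(255, 0, 255)
text = repr(color)
```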
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:35:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:35:17 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004062235.o36MZHiB014980@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #2 from joelb at lanl.gov 2010-04-06 18:35 EST ------- (In reply to comment #1) > I'm very tempted to mark this as "won't fixed", this is Python not Java > (grin) and get/set functions are ugly. > > The actual functionality you are looking for might be expressed using explicit > Python properties though (which would show up using dir(tree) etc). I'd need > to see some examples to comment on the specifics. > Hi, Peter, Actually, I was thinking that the PhyloXML interface is *too* Java-esque. The functionality I'm trying to get was summarized in the previous feature request, namely a concise list comprehension such as: [node.set_property(prop_arr[i], *prop_params, **prop_keywords) for i, node in tree.enumerate_internals()] Obviously this could be done without explicit get/sets as [node.__setattr__('property', PhyloXML.Property(prop_arr[i], *prop_params, **prop_keywords)) for i, node in tree.enumerate_internals()] if property was actually settable, although that's ugly too. Unfortunately you can't set 'property', you can only append to the properties list, and I don't see any clean way of doing that through __setattr__ By the way, the taxonomies list totally doesn't work in the Java code; it only sees the last taxonomy that you added. I'm working with upstream on this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
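Since properties can only be appended to the properties list, the assignment described above can also be written as a plain loop that pairs nodes with array values. This is only a sketch: Node and Property are hypothetical stand-ins, and the ref string is made up for the example.

```python
class Property:
    """Hypothetical stand-in for a phyloXML property (a value plus a ref key)."""

    def __init__(self, value, ref):
        self.value = value
        self.ref = ref


class Node:
    """Hypothetical stand-in for a clade that keeps its properties in a list."""

    def __init__(self):
        self.properties = []


def attach_values(nodes, values, ref):
    """Append one Property per node, pairing nodes with values in order."""
    for node, value in zip(nodes, values):
        node.properties.append(Property(value, ref))


nodes = [Node(), Node(), Node()]
attach_values(nodes, [0.1, 0.2, 0.3], ref="example:score")
```

Using zip() avoids indexing into the array by position, so no enumerate-style helper is needed for this case.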
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:40:54 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:40:54 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062240.o36MesZY015074@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #3 from joelb at lanl.gov 2010-04-06 18:40 EST ------- (In reply to comment #2) > (In reply to comment #0) > > From the Clade docstring: > > > > "Both 'color' and 'width' elements apply for the whole clade unless > > overwritten in-sub clades. > > What is the bug? The docstring doesn't say the property will be explicitly > cascaded down to the sub-clades, which seems to be your interpretation. > Isn't that implicit when interpreting (drawing) the tree? > > Peter > I suppose one could maintain that it's the responsibility of user code to enforce the behavior specified in the docstring, although I think that's a recipe for incompatibilities. However, neither of the two available user codes I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually implement it. It's better to just change the docstring, I think. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
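One way to read the docstring is that the inherited value is resolved when the tree is interpreted rather than stored on each sub-clade. A minimal sketch under that assumption follows; Clade is a hypothetical stand-in, and effective_colors is a made-up helper name, not part of Bio.Phylo.

```python
class Clade:
    """Hypothetical stand-in for a clade with an optional color."""

    def __init__(self, name, color=None, clades=()):
        self.name = name
        self.color = color
        self.clades = list(clades)


def effective_colors(clade, inherited=None):
    """Yield (name, color), taking unset colors from the nearest ancestor."""
    color = clade.color if clade.color is not None else inherited
    yield clade.name, color
    for child in clade.clades:
        yield from effective_colors(child, color)


tree = Clade("root", color=(255, 0, 255), clades=[
    Clade("A"),
    Clade("B", color=(0, 0, 255), clades=[Clade("B1")]),
])
resolved = dict(effective_colors(tree))
```

The stored attribute on "A" stays None; only the resolved mapping carries the parent's color, which matches the session shown in the original report.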
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:45:58 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:45:58 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062245.o36Mjwgm015132@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:45 EST ------- (In reply to comment #3) > > I suppose one could maintain that it's the responsibility of user code to > enforce the behavior specified in the docstring, although I think that's a > recipe for incompatibilities. However, neither of the two available user > codes I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually > implement it. It's better to just change the docstring, I think. > The current behaviour seems very natural to me. Are you familiar with CSS? I think there are strong similarities - unless explicitly overridden a node implicitly inherits the color/width of its parent. What would you suggest for the docstring? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 18:56:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:56:13 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004062256.o36MuD8i015303@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:56 EST ------- (In reply to comment #2) > > Hi, Peter, > > Actually, I was thinking that the PhyloXML interface is *too* Java-esque. 
> The functionality I'm trying to get was summarized in the previous feature > request, namely a concise list comprehension such as: > > [node.set_property(prop_arr[i], *prop_params, **prop_keywords) > for i, node in tree.enumerate_internals()] I don't understand what you are trying to do in this example, but a method called set_property seems wrong - are you trying to do something using property attributes for new-style Python classes? Also why are you using a list comprehension if you care about the side effects (creating a property)? Why not just use a for loop? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 19:02:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:02:45 -0400 Subject: [Biopython-dev] [Bug 3044] PhyloXMLIO, assigning node_id causes failures on write after re-reading In-Reply-To: Message-ID: <201004062302.o36N2jl1015451@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3044 ------- Comment #1 from eric.talevich at gmail.com 2010-04-06 19:02 EST ------- Hi Joel, Thanks for testing! It's great to get this stuff ironed out before the first stable release. (In reply to comment #0) > I updated my biopython from git and archaeoptryx from cvs on Friday and now > several things work differently, mostly for the better. Heads up: I pushed another change to GitHub yesterday that *might* have broken your code. Would you mind pulling another update and seeing if everything still works? (The main effect is that "tree.get_path(code='OCTVU')" will work now.) > I have problems with writing any phyloXML tree which was read with node_id's > defined. Writing the tree the first time doesn't fail, and the tree can be > subsequently read, but a failure occurs on the second write. 
OK, I'll check this out soon. This may be due to a shim I added to make
PhyloXML.Id objects behave like a primitive type in some cases, for
compatibility with non-PhyloXML trees.

Best,
Eric

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 19:09:34 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 19:09:34 -0400
Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods
In-Reply-To:
Message-ID: <201004062309.o36N9YFO015608@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3046

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |biopython-bugzilla at maubp.freeserve.co.uk

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 19:09 EST -------
Taking a specific example, you suggested adding this helper function:

def set_color(node, red, green, blue):
    node.color = PhyloXML.BranchColor(red, green, blue)

I might advocate adding a color property to the tree/node class, with a
set method which accepts either a PhyloXML.BranchColor instance or perhaps
for convenience an RGB tuple. Something like this:

def _set_color(self, color):
    if isinstance(color, PhyloXML.BranchColor):
        self._color = color
    elif len(color)==3:
        self.color = PhyloXML.BranchColor(red=color[0], green=color[1], blue=color[2])
    else:
        raise ValueError("Bad color")

def _get_color(self):
    return self._color

color = property(_get_color, _set_color, doc="Node color")

(It would be nice to make the color object similar to the ReportLab and
GenomeDiagram conventions used elsewhere in Biopython).
The point of that would be you would then use it like this:

for node in tree.find(terminal=False):
    node.color = PhyloXML.BranchColor(255, 0, 0)
for node in tree.find(terminal=True):
    node.color = PhyloXML.BranchColor(0, 0, 255)

if you explicitly wanted to make all the internal nodes red and all the
terminal nodes blue. Or, as discussed on Bug 3047 do this implicitly:

tree.color = (255, 0, 0) #implicitly applies to children
for node in tree.find(terminal=True):
    node.color = PhyloXML.BranchColor(0, 0, 255)

Eric - how would this example be done with the current code base?

Peter

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 19:20:13 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 19:20:13 -0400
Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods
In-Reply-To:
Message-ID: <201004062320.o36NKDTt015849@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3046

------- Comment #5 from joelb at lanl.gov 2010-04-06 19:20 EST -------
(In reply to comment #3)
>
> I don't understand what you are trying to do in this example, but a method
> called set_property seems wrong - are you trying to do something using
> property attributes for new-style Python classes?
>
> Also why are you using a list comprehension if you care about the side
> effects (creating a property)? Why not just use a for loop?
>
> Peter
>

There have long been built-in get/sets in python via __setattr__() and
__getattribute__(). That's where the code I sent should live. Putting code
to get and especially to set in those methods means that a user doesn't have
to look up whatever classes were defined for attributes (e.g.
like finding that 'color' is called 'BranchColor') and doesn't need to know that taxonomies and properties are only set through appending through lists. The reason why to use a list comprehension rather than a for loop is performance and readability. Small functions that work over a single item of a sequence are vectorizable, either as list comprehensions or through numpy.vectorize. See Ziade's "Expert Python Programming" p. 34. I have code examples where the difference is a factor of 300 in speed. I'm including an example code that I wrote. Feel free, Eric, to use it on the PhyloXML page if you wish. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 19:22:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:22:20 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062322.o36NMK4m015894@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #5 from joelb at lanl.gov 2010-04-06 19:22 EST ------- Created an attachment (id=1475) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1475&action=view) phyloXML user code showing how to colorize a tree -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 19:23:12 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:23:12 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062323.o36NNCgC015925@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #6 from joelb at lanl.gov 2010-04-06 19:23 EST ------- Created an attachment (id=1476) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1476&action=view) archaeoptryx screen dump showing colorized tree -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 19:34:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:34:43 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004062334.o36NYhru016144@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #2 from eric.talevich at gmail.com 2010-04-06 19:34 EST ------- (In reply to comment #1) > Interesting. I don't really see the need for most of these given how routine > use of the enumerate function is elsewhere in Python. So -1 on the enumerate > methods. I'll make sure all of the use cases can be handled with a simple list comprehension, at least. > > I'm not fond of the name of the existing method get_terminals (which currently > returns a list). My feeling is that using just terminals seems nicer (as a > property, so no order argument - if you need that use the find method). Is > there any advantage to returning a list vs an iterator? Everything is all in > memory anyway, right? 
I took the method names from Bio.Nexus.Trees wherever it seemed reasonable -- one day I'd like Bio.Phylo to be a drop-in replacement for that module (as much as possible). Otherwise I'd be fine with a method called terminals(). The tree object doesn't keep a list of terminal nodes under the hood, so to get the terminal nodes it does a full search of the tree, with run time linear to the number of nodes in the tree. I feel uneasy about properties that don't run in O(1) time. The find* methods return iterators, and the get* methods return lists. I found that the results of get* usually needed to be converted to a list immediately, for indexing or length-checking, and aren't liable to be unexpectedly large -- smaller than the whole tree, anyway. Plus, get_terminals() is really just a shortcut for list(tree.find_clades(terminal=True)), for those who prefer to dive into the module or save some typing. > Given a terminals property (be it a read only list or an iterator), one might > go further and add a sister property for the internal nodes (non-terminal > nodes). Apparently there's some demand for it. It would be the same as list(tree.find_clades(terminal=False)), and forcing users to learn how find_* methods work after they're hooked on get_terminals() has some appeal, but I suppose we should just pick a name and add it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From matzke at berkeley.edu Tue Apr 6 20:46:17 2010 From: matzke at berkeley.edu (Nick Matzke) Date: Tue, 06 Apr 2010 17:46:17 -0700 Subject: [Biopython-dev] Biopython devs at iEvoBio? In-Reply-To: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> Message-ID: <4BBBD5D9.60504@berkeley.edu> I'll be at the Evo meeting and the iEvoBio meeting. 
I'd be happy to give a talk as long as it's short -- I have to prioritize my research talk at the main meeting! Cheers! Nick David Winter wrote: > Hi again guys, > > I was wondering if anyone else is planning to go to iEvoBio > (http://ievobio.org/) in Portland in June. The meeting is planned to be a > phyloinformatics counterpart to BOSC and is going to be run alongside > the big > Evolution Meetings. > > It might be a good venue to show Erick and Nick's GSoC projects from last > year. Obviously, if Eric or Nick are planning to be at the meeting then > they > should present their work, but if they aren't going to be there I'd be > happy to > present a short demo on some of the things those libraries can do and how > they might be brought together with other Biopython tools to build some > useful workflows. ( it might start to make up for how slack of I've been in > this news contributor role!) > > At the moment I really just need to know if better qualified people > will be > there and, if not, if people think a demo is a good idea (the software > demonstration sessions don't need an abstract anytime soon) > > Cheers, > david > > > -- ==================================================== Nicholas J. Matzke Ph.D. Student, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200A Principles of Phylogenetics: Systematics http://ib.berkeley.edu/courses/ib200a/index.shtml Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. 
fax: 510-643-6264 Cell phone: 510-301-0179 Email: matzke at berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 ----------------------------------------------------- "[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together." Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ==================================================== From bugzilla-daemon at portal.open-bio.org Tue Apr 6 23:32:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 23:32:30 -0400 Subject: [Biopython-dev] [Bug 3044] PhyloXMLIO, assigning node_id causes failures on write after re-reading In-Reply-To: Message-ID: <201004070332.o373WU1b024754@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3044 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from eric.talevich at gmail.com 2010-04-06 23:32 EST ------- Fixed in GitHub now: http://github.com/biopython/biopython/commit/218a2e6759a901766125a99370593097f36b1bad -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Apr 7 01:05:02 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 01:05:02 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004070505.o37552DX028541@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #6 from eric.talevich at gmail.com 2010-04-07 01:05 EST ------- (In reply to comment #0) > It would be nice if there were get/set properties for phyloXML objects that > were easier and more concise to use. Right now, to set, say, a phyloXML > property, one has to read the code to learn the names and arguments of the > Property class and also to learn that properties are added by appending to a > list. Yes, it's easier to tweak the class definitions if there's not much syntactic sugar to get in the way. This is still pretty new code ;) but of course I'm open to suggestions. > Besides the matter of convenience, there is also a question about how the > properties and taxonomies objects behave. I will take the matter up with the > phyloXML mailing list, but I believe that these objects should be > dictionary-like rather than list-like. That is, duplicate ref values should > not be allowed because the question of how to handle duplicates would have to > get pushed down to the user level and will be inconsistent. The Events class (clade.events attribute) mimics a dictionary. Have you used that yet? About clade.properties: If ordering of properties doesn't matter, 'ref' is guaranteed to be unique at a node, and it seems to be the right way to index the other associated data, then I can make clade.properties act like a dictionary. Can we confirm all of these? And for the implementation, can you provide a sketch of what you'd like the final structure to look like, and maybe a contrived doctest-like code example showing what you'd like to be able to do? 
In many cases, the phyloXML spec doesn't currently promise enough to make nice
shortcuts work without the possibility of breaking in the future. For example,
check out this new demo with *two* bootstrap values for every clade:
http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml

I was tempted to make confidences act like a dictionary indexed by support
type, but clearly now that wouldn't have worked. A list of Confidence objects
lets us stay faithful to the raw XML representation.

> def set_property(self, *propArgs, **propkwArgs):
>     for property in self.properties:
>         if property.ref == propArgs[1]:
>             property = PhyloXML.Property(propArgs)
>             return
>     self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs))
>
> def get_property(self, key):
>     for property in self.properties:
>         if property.ref == key:
>             return property.value
>     raise KeyError

It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo"
methods where a property would be overly magical, and something noteworthy is
going on internally. Alignment objects have "add_sequence", and Phylogeny
objects have "get_alignment". Would you use a Phylogeny method called
add_alignment, taking something like a Phylip character matrix?

We can figure out a sugared interface for clade.properties once we know which
of the requirements stated above will actually be guaranteed.

> def set_ID(self, *idArgs, **idkwArgs):
>     self.node_id = PhyloXML.Id(*idArgs, **idkwArgs)

If you do "from Bio.Phylo import PhyloXML as PX" it really doesn't save any
typing, and the **kwargs magic is even less suitable for introspection. It's
not possible to take advantage of all the PhyloXML annotations available
without learning about the annotation classes the
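A side note on the quoted set_property: assigning to the loop variable only rebinds a local name, so an existing entry in self.properties is left unchanged. Below is a hedged sketch of a replace-or-append version keyed on ref. Clade and Property are hypothetical stand-ins with simplified signatures, and the sketch assumes ref is unique per node, which is one of the open questions raised above.

```python
class Property:
    """Hypothetical stand-in for a phyloXML property (a value plus a ref key)."""

    def __init__(self, value, ref):
        self.value = value
        self.ref = ref


class Clade:
    """Hypothetical stand-in for a clade that stores properties in a list."""

    def __init__(self):
        self.properties = []

    def set_property(self, value, ref):
        """Replace the property with this ref if present, else append it."""
        new_prop = Property(value, ref)
        for index, prop in enumerate(self.properties):
            if prop.ref == ref:
                self.properties[index] = new_prop  # assign into the list itself
                return
        self.properties.append(new_prop)

    def get_property(self, ref):
        """Return the value stored under ref, or raise KeyError."""
        for prop in self.properties:
            if prop.ref == ref:
                return prop.value
        raise KeyError(ref)


clade = Clade()
clade.set_property(1.5, "example:depth")
clade.set_property(2.5, "example:depth")  # replaces rather than duplicates
clade.set_property("x", "example:label")
```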
From eric.talevich at gmail.com Wed Apr 7 01:19:55 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 01:19:55 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods Message-ID: Hi Joel, (In reply to comment #0) On Tue, Apr 6, 2010 at 5:46 PM, wrote: > > http://bugzilla.open-bio.org/show_bug.cgi?id=3046 > It would be nice if there were get/set properties for phyloXML objects that > were easier and more concise to use. Right now, to set, say, a phyloXML > property, one has to read the code to learn the names and arguments of the > Property class and also to learn that properties are added by appending to a > list. Yes, it's easier to tweak the class definitions if there's not much syntactic sugar to get in the way. This is still pretty new code ;) but of course I'm open to suggestions. > Besides the matter of convenience, there is also a question about how the > properties and taxonomies objects behave. I will take the matter up with the > phyloXML mailing list, but I believe that these objects should be > dictionary-like rather than list-like. That is, duplicate ref values should > not be allowed because the question of how to handle duplicates would have to > get pushed down to the user level and will be inconsistent. The Events class (clade.events attribute) mimics a dictionary. Have you used that yet? About clade.properties: If ordering of properties doesn't matter, 'ref' is guaranteed to be unique at a node, and it seems to be the right way to index the other associated data, then I can make clade.properties act like a dictionary. Can we confirm all of these? And for the implementation, can you provide a sketch of what you'd like the final structure to look like, and maybe a contrived doctest-like code example showing what you'd like to be able to do? In many cases, the phyloXML spec doesn't currently promise enough to make nice shortcuts work without the possibility of breaking in the future.
For example, check out this new demo with *two* bootstrap values for every clade: http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml I was tempted to make confidences act like a dictionary indexed by support type, but clearly now that wouldn't have worked. A list of Confidence objects lets us stay faithful to the raw XML representation. > def set_property(self, *propArgs, **propkwArgs): >     for property in self.properties: >         if property.ref == propArgs[1]: >             property = PhyloXML.Property(propArgs) >             return >     self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs)) > > def get_property(self, key): >     for property in self.properties: >         if property.ref == key: >             return property.value >     raise KeyError It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo" methods where a property would be overly magical, and something noteworthy is going on internally. Alignment objects have "add_sequence", and Phylogeny objects have "get_alignment". Would you use a Phylogeny method called add_alignment, taking something like a Phylip character matrix? We can figure out a sugared interface for clade.properties once we know which of the requirements stated above will actually be guaranteed. > def set_ID(self, *idArgs, **idkwArgs): >     self.node_id = PhyloXML.Id(*idArgs, **idkwArgs) If you do "from Bio.Phylo import PhyloXML as PX" it doesn't really save much typing, and the **kwargs magic is even less suitable for introspection. It's not possible to take advantage of all the PhyloXML annotations available without learning about the annotation classes themselves. How about this: I'll write some decent documentation on the Biopython wiki's PhyloXML page and the official Biopython tutorial/cookbook. > def add_taxonomy(self, *taxArgs, **taxkwArgs): >     self.taxonomies.append(PhyloXML.Taxonomy(*taxArgs, **taxkwArgs)) > > def get_taxonomy(self, rank): >
for taxonomy in self.taxonomies: >         if taxonomy.rank == rank: >             return taxonomy.scientific_name >     raise KeyError Unfortunately, none of the Taxonomy attributes are required in the phyloXML spec, so there's nothing we can rely on for easier indexing. But, if the phyloXML files you create yourself are well-behaved then you're free to make your own wrappers over the current low-level functionality. Clade.taxonomies will always be plural and iterable. > def set_color(node, red, green, blue): >     node.color = PhyloXML.BranchColor(red, green, blue) Redundancy makes code harder to maintain -- I'd like to keep it clean at least for the very first release. The BranchColor class actually has much cooler functionality than this; try "node.color = PX.BranchColor.from_name('red')" for example. We can try adding sugar on top of this, but whatever we add, we'll need to maintain in Biopython for quite some time. Thanks again for all the testing and feedback! Best, Eric From winda002 at student.otago.ac.nz Wed Apr 7 01:31:11 2010 From: winda002 at student.otago.ac.nz (david winter) Date: Wed, 07 Apr 2010 17:31:11 +1200 Subject: [Biopython-dev] Biopython devs at iEvoBio? In-Reply-To: <4BBBD5D9.60504@berkeley.edu> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> <4BBBD5D9.60504@berkeley.edu> Message-ID: <4BBC189F.1030401@student.otago.ac.nz> Ok, So Nick will be there, Eric hopes not to be ;) I sent an email to the organizing committee about the different talk categories. Based on their reply, and the fact that both Nick and I are going to be focused on our talks at the Evolution Meetings (ie, flying halfway around the world to present a 12min talk) it seems the best way to go would be to have a lightning talk on each of the GSoC projects. Eric, presuming you go to BOSC and not iEvoBio I'll get in touch with you at some stage with an outline of a talk and you can help me whip it into shape.
Cheers, David On 4/7/2010 12:46 PM, Nick Matzke wrote: > I'll be at the Evo meeting and the iEvoBio meeting. I'd be happy to > give a talk as long as it's short -- I have to prioritize my research > talk at the main meeting! > > Cheers! > Nick > > > David Winter wrote: >> Hi again guys, >> >> I was wondering if anyone else is planning to go to iEvoBio >> (http://ievobio.org/) in Portland in June.. From bugzilla-daemon at portal.open-bio.org Wed Apr 7 01:38:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 01:38:49 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004070538.o375cn5n029876@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #7 from eric.talevich at gmail.com 2010-04-07 01:38 EST ------- (In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #0) > > > From the Clade docstring: > > > > > > "Both 'color' and 'width' elements apply for the whole clade unless > > > overwritten in-sub clades. > > > > What is the bug? The docstring doesn't say the property will be explicitly > > cascaded down to the sub-clades, which seems to be your interpretation. > > Isn't that implicit when interpreting (drawing) the tree? > > > > Peter > > > > I suppose one could maintain that it's the responsibility of user code to > enforce the behavior specified in the docstring, although I think that's a > recipe for incompatibilities. However, neither of the two available user codes > I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually > implement it. It's better to just change the docstring, I think. > That's my interpretation of it -- the color attribute is meant to be handled that way by whatever code uses it for drawing. 
Which means two things should happen before closing this bug: - Change the docstring to indicate *user code* is supposed to handle colors and widths in a cascading fashion - Fix draw_graphviz (actually to_networkx) to cascade colors down branches, like the Archaeopteryx GUI does By the way, thanks for phyloXMLtools.py -- I'll take a closer look when I have some time. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 7 02:44:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 02:44:24 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004070644.o376iOvi000489@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-07 02:44 EST ------- Hi Eric, (In reply to comment #2) > > I'm not fond of the name of the existing method get_terminals (which currently > > returns a list). My feeling is that using just terminals seems nicer (as a > > property, so no order argument - if you need that use the find method). Is > > there any advantage to returning a list vs an iterator? Everything is all in > > memory anyway, right? > > I took the method names from Bio.Nexus.Trees wherever it seemed reasonable -- > one day I'd like Bio.Phylo to be a drop-in replacement for that module (as much > as possible). Otherwise I'd be fine with a method called terminals(). OK, that is a reasonable argument in favour. > The tree object doesn't keep a list of terminal nodes under the hood, so to get > the terminal nodes it does a full search of the tree, with run time linear to > the number of nodes in the tree. I feel uneasy about properties that don't run > in O(1) time. 
OK, so a property does seem unwise. > The find* methods return iterators, and the get* methods return lists. I found > that the results of get* usually needed to be converted to a list immediately, > for indexing or length-checking, and aren't liable to be unexpectedly large -- > smaller than the whole tree, anyway. Plus, get_terminals() is really just a > shortcut for list(tree.find_clades(terminal=True)), for those who prefer to > dive into the module or save some typing. If there is a good reason, that seems fine. > > Given a terminals property (be it a read only list or an iterator), one might > > go further and add a sister property for the internal nodes (non-terminal > > nodes). > > Apparently there's some demand for it. It would be the same as > list(tree.find_clades(terminal=False)), and forcing users to learn how find_* > methods work after they're hooked on get_terminals() has some appeal, but I > suppose we should just pick a name and add it. Maybe get_internals() would match, or get_non_terminals() might be clearer. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From winda002 at student.otago.ac.nz Wed Apr 7 01:16:25 2010 From: winda002 at student.otago.ac.nz (david winter) Date: Wed, 07 Apr 2010 17:16:25 +1200 Subject: [Biopython-dev] Draft Announcement for Biopython 1.54 In-Reply-To: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz> References: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz> Message-ID: <4BBC1529.3030709@student.otago.ac.nz> On 4/6/2010 1:51 PM, David Winter wrote: > Hi all, > > Here's a draft announcement for the next release ... Ok, changes from Eric (no axillary 'k') and Peter have been made and a version of the announcement with link and the like is waiting on the OBF server. Still easy to make changes if you've spotted something wrong/missing. 
David From bugzilla-daemon at portal.open-bio.org Wed Apr 7 03:00:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 03:00:03 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004070700.o37703t6001249@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-07 03:00 EST ------- (In reply to comment #5 by Joel) > Created an attachment (id=1475) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1475&action=view) [details] > phyloXML user code showing how to colorize a tree > Thanks for the example - it looks quite complicated. You have lots of functions taking "self" as the first argument. Are they intended to be methods of the tree/clade objects? Otherwise using an argument name like "tree" or "node" could be clearer. Why do you use leading underscores on so many variables whose scope is limited to single functions (in particular function hl_color_on_function)? They are private by their scope. (In reply to comment #7 by Eric) > (In reply to comment #3) > > (In reply to comment #2) > > > (In reply to comment #0) > > > > From the Clade docstring: > > > > > > > > "Both 'color' and 'width' elements apply for the whole clade unless > > > > overwritten in-sub clades. > > > > > > What is the bug? The docstring doesn't say the property will be explicitly > > > cascaded down to the sub-clades, which seems to be your interpretation. > > > Isn't that implicit when interpreting (drawing) the tree? > > > > > > Peter > > > > > > > I suppose one could maintain that it's the responsibility of user code to > > enforce the behavior specified in the docstring, although I think that's a > > recipe for incompatibilities.
However, neither of the two available user codes > > I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually > > implement it. It's better to just change the docstring, I think. > > > > > That's my interpretation of it -- the color attribute is meant to be handled > that way by whatever code uses it for drawing. > > Which means two things should happen before closing this bug: > > - Change the docstring to indicate *user code* is supposed to handle colors > and widths in a cascading fashion > > - Fix draw_graphviz (actually to_networkx) to cascade colors down branches, > like the Archaeopteryx GUI does So the Bio.Phylo drawing code (using NetworkX) doesn't cascade the colors/widths yet? We should probably check the behaviour of other phyloXML GUI tools for consistency... and possibly file a bug in Archaeopteryx. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Apr 7 03:01:14 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 08:01:14 +0100 Subject: [Biopython-dev] Draft Announcement for Biopython 1.54 In-Reply-To: <4BBC1529.3030709@student.otago.ac.nz> References: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz> <4BBC1529.3030709@student.otago.ac.nz> Message-ID: On Wed, Apr 7, 2010 at 6:16 AM, david winter wrote: > > Ok, changes from Eric (no axillary 'k') and Peter have been made and a > version of the announcement with link and the like is waiting on the OBF > server. Still easy to make changes if you've spotted something > wrong/missing. > > David > Great work, thanks.
Peter From biopython at maubp.freeserve.co.uk Wed Apr 7 03:36:20 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 08:36:20 +0100 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo Message-ID: Hi Eric, Following discussion on Bug 3046 and 3047, I wrote the following example using the current API to try to set all the branches to red, except the branches of the terminal nodes which I set to blue: from Bio import Phylo tree = Phylo.read("apaf.xml", "phyloxml") #This implicitly applies to all the children: tree.properties.append(Phylo.PhyloXML.BranchColor(255,0,0)) #Now set the terminal nodes to blue: for node in tree.find(terminal=True): node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) Phylo.write(tree, "colored.xml", "phyloxml") It fails in the call to write - what am I doing wrong?: Traceback (most recent call last): File "", line 1, in File "/Users/pjcock/repositories/biopython/build/lib.macosx-10.6-universal-2.6/Bio/Phylo/_io.py", line 82, in write n = getattr(supported_formats[format], 'write')(trees, file, **kwargs) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 142, in write return Writer(obj).write(file, encoding=encoding, indent=indent) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 684, in __init__ self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 705, in phyloxml elem.append(self.phylogeny(tree)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 661, in wrapped elem.append(getattr(self, method)(item)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 651, in wrapped elem = ElementTree.Element(tag, _clean_attrib(obj, attribs)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 643, in _clean_attrib val = getattr(obj, key) AttributeError: 'BranchColor' object has no attribute 'ref' Thanks, Peter From rodrigo_faccioli at 
uol.com.br Wed Apr 7 08:50:21 2010 From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli) Date: Wed, 7 Apr 2010 09:50:21 -0300 Subject: [Biopython-dev] PDB-Tidy Project - Google Summer Project Proposed Message-ID: Hi, I'm a Ph.D student at University of Sao Paulo (USP), Brazil. I've worked with BioPython mainly its Bio.PDB module since last year. I would like to participate at Google Summer Code through PDB-Tidy: command-line tools for manipulating PDB files project. In this way, I've talked with Eric Talevich who helped to write my project proposed (link below). http://dl.dropbox.com/u/4270818/Google_Summer_Code_Proposed.pdf In that document, I show more details what I've made with my Bio.PDB extension and how I might to contribute for PDB-Tidy project. But, in general lines, Prometheus is a web-site which allows the studies about protein-protein complexes based on electrostatics properties of proteins. When access the Prometheus Web-site, please use the information below: username: gsoc password: g0sum&10 I'll try to register as student. Before, I appreciate any comments about my proposed. 
Thanks in advance, -- Rodrigo Antonio Faccioli Ph.D Student in Electrical Engineering University of Sao Paulo - USP Engineering School of Sao Carlos - EESC Department of Electrical Engineering - SEL Intelligent System in Structure Bioinformatics http://laips.sel.eesc.usp.br Phone: 55 (16) 3373-9366 Ext 229 Curriculum Lattes - http://lattes.cnpq.br/1025157978990218 From eric.talevich at gmail.com Wed Apr 7 08:57:29 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 08:57:29 -0400 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 3:36 AM, Peter wrote: > Hi Eric, > > Following discussion on Bug 3046 and 3047, I wrote the following > example using the current API to try to set all the branches to red, > except the branches of the terminal nodes which I set to blue: My approach would be: from Bio import Phylo from Bio.Phylo import PhyloXML as PX tree = Phylo.read("apaf.xml", "phyloxml") for clade in tree.find_clades(): if clade.is_terminal(): clade.color = PX.BranchColor.from_name('blue') else: clade.color = PX.BranchColor.from_name('red') Strictly according to the phyloXML spec, with colors cascading down branches, this should display the same way (but doesn't in Phylo.draw_graphviz): for child in tree.root.clades: child.color = PX.BranchColor.from_name('red') for term in tree.get_terminals() child.color = PX.BranchColor.from_name('blue') I haven't confirmed that Archaeopteryx follows the spec here, but that's how the GUI behaves when colorizing branches, so I assume it does. > from Bio import Phylo > tree = Phylo.read("apaf.xml", "phyloxml") > #This implicitly applies to all the children: > tree.properties.append(Phylo.PhyloXML.BranchColor(255,0,0)) > #Now set the terminal nodes to blue: > for node in tree.find(terminal=True): >
node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) > Phylo.write(tree, "colored.xml", "phyloxml") > > It fails in the call to write - what am I doing wrong?: The clade.properties attribute isn't a container for Python properties, it's a phyloXML-specific thing: http://www.phyloxml.org/documentation/version_1.10/phyloxml.xsd.html#h158033242 Cheers, Eric From biopython at maubp.freeserve.co.uk Wed Apr 7 09:30:12 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 14:30:12 +0100 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 1:57 PM, Eric Talevich wrote: > On Wed, Apr 7, 2010 at 3:36 AM, Peter wrote: >> Hi Eric, >> >> Following discussion on Bug 3046 and 3047, I wrote the following >> example using the current API to try to set all the branches to red, >> except the branches of the terminal nodes which I set to blue: > > My approach would be: > > from Bio import Phylo > from Bio.Phylo import PhyloXML as PX > > tree = Phylo.read("apaf.xml", "phyloxml") > for clade in tree.find_clades(): >    if clade.is_terminal(): >        clade.color = PX.BranchColor.from_name('blue') >    else: >        clade.color = PX.BranchColor.from_name('red') Very helpful. > Strictly according to the phyloXML spec, with colors cascading down > branches, this should display the same way (but doesn't in > Phylo.draw_graphviz): So for Bug 3047 you'll check and fix Phylo.draw_graphviz to do that (assuming the Archaeopteryx tool also does this)? > for child in tree.root.clades: >    child.color = PX.BranchColor.from_name('red') > for term in tree.get_terminals() >    child.color = PX.BranchColor.from_name('blue') I'm assuming you made a typo with the variable names (term vs child). Why not just apply the red to the root node itself?
This seems to work: from Bio import Phylo #This implicitly applies to all the children: tree.root.color = Phylo.PhyloXML.BranchColor(255,0,0) #Now set the terminal nodes to blue: for clade in tree.find_clades(terminal=True): clade.color = Phylo.PhyloXML.BranchColor(0,0,255) Phylo.write(tree, "colored.xml", "phyloxml") This is based on my original example but now using find_clades which is more specific than find_all as I now see, and also I hadn't appreciated the difference between tree and tree.root (a Tree object and a Clade object - in other libraries a tree is also a clade). As an aside, I don't like the find method - it seems dangerous in the case where find_all returns multiple hits. I can see it could be useful *if* it returns a single hit, None for no hits, or an exception for multiple hits. > I haven't confirmed that Archaeopteryx follows the spec here, but > that's how the GUI behaves when colorizing branches, so I assume it > does. > > >> from Bio import Phylo >> tree = Phylo.read("apaf.xml", "phyloxml") >> #This implicitly applies to all the children: >> tree.properties.append(Phylo.PhyloXML.BranchColor(255,0,0)) >> #Now set the terminal nodes to blue: >> for node in tree.find(terminal=True): >>    node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) >> Phylo.write(tree, "colored.xml", "phyloxml") >> >> It fails in the call to write - what am I doing wrong?: > > The clade.properties attribute isn't a container for Python > properties, it's a phyloXML-specific thing: > http://www.phyloxml.org/documentation/version_1.10/phyloxml.xsd.html#h158033242 But it is still a list of objects, so I would expect to be able to add (suitable) things to it. If you regard this as an implementation detail, then maybe rename the list to _properties instead?
Regards, Peter From bugzilla-daemon at portal.open-bio.org Wed Apr 7 09:50:21 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 09:50:21 -0400 Subject: [Biopython-dev] [Bug 3048] New: Bio.Blast.Applications.NcbitblastxCommandline Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3048 Summary: Bio.Blast.Applications.NcbitblastxCommandline Product: Biopython Version: 1.53 Platform: All OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: gebauer-jung at ice.mpg.de NcbitblastxCommandline._validate() uses a non-existing attribute in_pssm (if query is set), raises an error and hence the commandline cannot be used as suggested in the tutorial. The module Bio.Blast.Applications.py is identical to the biopython 1.54b and current github versions. from Bio.Blast.Applications import NcbitblastxCommandline >>> cline = NcbitblastxCommandline() >>> cline.query = 'xxx' >>> print cline Traceback (most recent call last): File "", line 1, in File "....../Bio/Application/__init__.py", line 256, in __str__ self._validate() File "....../biopython-1.53/build/lib.linux-i686-2.5/Bio/Blast/Applications.py", line 929, in _validate if self.query and self.in_pssm: AttributeError: 'NcbitblastxCommandline' object has no attribute 'in_pssm' -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Apr 7 10:09:34 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 10:09:34 -0400 Subject: [Biopython-dev] [Bug 3048] Bio.Blast.Applications.NcbitblastxCommandline In-Reply-To: Message-ID: <201004071409.o37E9YtO017358@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3048 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-07 10:09 EST ------- Confirmed. That validation code would have made sense for tblastn, but not for tblastx. Fixed: http://github.com/biopython/biopython/commit/284216406ecdc77062f3b9cd93bd648e08541b22 Thank you! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Wed Apr 7 11:55:01 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 11:55:01 -0400 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 9:30 AM, Peter wrote: > Why not just apply the red to the root node itself? This seems to > work: > > from Bio import Phylo > #This implicitly applies to all the children: > tree.root.color = Phylo.PhyloXML.BranchColor(255,0,0) > #Now set the terminal nodes to blue: > for clade in tree.find_clades(terminal=True): >   clade.color = Phylo.PhyloXML.BranchColor(0,0,255) > Phylo.write(tree, "colored.xml", "phyloxml") You're right, that's better.
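[Editorial sketch] The cascading rule being relied on here (a color set on a clade applies to its whole subtree unless a sub-clade overrides it) can be illustrated as follows. This is only a minimal sketch under stated assumptions: Clade is a hypothetical stand-in class, not the Bio.Phylo one, colors are plain RGB tuples, and effective_colors is an invented helper name; it is not how to_networkx or Archaeopteryx is actually implemented.

```python
# Hypothetical stand-in for a tree node; NOT the Bio.Phylo Clade class.
class Clade:
    def __init__(self, name, color=None, clades=()):
        self.name = name
        self.color = color          # None means "no explicit color set here"
        self.clades = list(clades)  # direct sub-clades


def effective_colors(clade, inherited=None):
    """Yield (clade, color) pairs in preorder.

    A clade without an explicit color takes the color of its nearest
    ancestor that has one, which is the cascading behaviour discussed
    in the thread.
    """
    color = clade.color if clade.color is not None else inherited
    yield clade, color
    for child in clade.clades:
        for pair in effective_colors(child, color):
            yield pair


RED, BLUE = (255, 0, 0), (0, 0, 255)

# Red on the root only; one terminal overrides it with blue.
tree = Clade("root", color=RED, clades=[
    Clade("A"),
    Clade("inner", clades=[Clade("B", color=BLUE), Clade("C")]),
])

resolved = dict((c.name, col) for c, col in effective_colors(tree))
for name in ("root", "A", "inner", "B", "C"):
    print(name, resolved[name])
```

A drawing routine that resolves colors this way would render the "red on the root, blue on the terminals" file the same as one where every clade carries an explicit color.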
> > This is based on my original example but now using find_clades > which is more specific than find_all as I now see, and also I > hadn't appreciated the difference between tree and tree.root > (a Tree object and a Clade object - in other libraries a tree > is also a clade). The object hierarchy is a fairly literal translation of the PhyloXML spec. That's why I used TreeMixin to plaster over the differences -- most of the methods that operate on a whole tree or subclade make sense either way, and we can still separate global from local information somewhat. > As an aside, I don't like the find method - it seems dangerous > in the case where find_all returns multiple hits. I can see it > could be useful *if* it returns a single hit, None for no hits, or > an exception for multiple hits. I use find_all and find_clades (mostly find_clades) in loops, and find in if statements. I'm open to renaming any of the TreeMixin methods -- e.g. find_all --> find_elements find --> find_any, or get_any, or just any find_clades --> find_all, or find, or stay the same Would those names be more intuitive? >>> for node in tree.find(terminal=True): >>>    node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) >>> Phylo.write(tree, "colored.xml", "phyloxml") >>> >>> It fails in the call to write - what am I doing wrong?: >> >> The clade.properties attribute isn't a container for Python >> properties, it's a phyloXML-specific thing: >> http://www.phyloxml.org/documentation/version_1.10/phyloxml.xsd.html#h158033242 > > But it is still a list of objects, so I would expect to be able to add > (suitable) > things to it. If you regard this as an implementation detail, then maybe > rename the list to _properties instead? You can append to it, but the thing you append needs to be a PhyloXML.Property object, or else the serializer barfs when it can't find the expected attributes.
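[Editorial sketch] The failure mode described here (the writer dying deep inside serialization with an AttributeError on 'ref') could be surfaced earlier by checking for the expected attributes up front. The following is a minimal sketch, not Bio.Phylo code: check_required_attributes, Property and BranchColor below are hypothetical stand-ins, and the required set (ref, applies_to, datatype) is an assumption about what the phyloXML schema demands of a property element.

```python
# Assumed set of attributes a phyloXML <property> must carry.
REQUIRED_PROPERTY_ATTRIBUTES = ("ref", "applies_to", "datatype")


def check_required_attributes(obj, required=REQUIRED_PROPERTY_ATTRIBUTES):
    """Return obj unchanged, or raise ValueError naming what is missing.

    This checks attribute existence (duck typing) rather than the
    object's type.
    """
    missing = [name for name in required if getattr(obj, name, None) is None]
    if missing:
        raise ValueError("%s is missing required attribute(s): %s"
                         % (type(obj).__name__, ", ".join(missing)))
    return obj


class Property:
    """Hypothetical stand-in for PhyloXML.Property that fails early."""

    def __init__(self, value, ref=None, applies_to=None, datatype=None):
        self.value = value
        self.ref = ref
        self.applies_to = applies_to
        self.datatype = datatype
        check_required_attributes(self)


class BranchColor:
    """Hypothetical stand-in for PhyloXML.BranchColor (has no 'ref')."""

    def __init__(self, red, green, blue):
        self.red, self.green, self.blue = red, green, blue


prop = Property("1200", ref="NOAA:depth", applies_to="clade",
                datatype="xsd:integer")

try:
    # The mistake from the original example: a color in a properties list.
    check_required_attributes(BranchColor(0, 0, 255))
except ValueError as err:
    print(err)
```

The same check could be run when an item is appended or just before writing; either way the error names the offending object instead of appearing as a traceback from the serializer.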
Mitigation: Since the phyloXML spec requires some attributes in Property instances, and the serializer assumes they'll be satisfied, Property.__init__ should do some additional checks too and fail early if necessary. Adding type checks all over Bio.Phylo seems un-Pythonic, but checking attribute existence should be easy. From eric.talevich at gmail.com Wed Apr 7 14:18:10 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 14:18:10 -0400 Subject: [Biopython-dev] PhyloXML.BranchColor methods Message-ID: Hi all, There's been some discussion in Bugzilla about using the PhyloXML.BranchColor class to assign colors to Bio.Phylo tree branches. http://bugzilla.open-bio.org/show_bug.cgi?id=3047 Currently, one sets the color for a clade by assigning a BranchColor instance to the clade's color attribute: from Bio import Phylo from Bio.Phylo import PhyloXML as PX tree = Phylo.read(..., 'phyloxml') critters = tree.find(name='Rattus') critters.color = PX.BranchColor(0, 128, 0) # or, using HTML/matplotlib color names: critters.color = PX.BranchColor.from_name('green') The BranchColor class has these methods: from_name -- a class method that looks up RGB values in the hard-coded dictionary BranchColor.color_names to_hex -- a 24-bit hex string, e.g. '#00A000', suitable for HTML/CSS and matplotlib to_rgb -- a tuple of float RGB values, scaled 0 to 1.0 (See http://github.com/biopython/biopython/blob/master/Bio/Phylo/PhyloXML.py) Here are some proposals. Please let me know which of these you like or hate. 1. Add a function Bio.Phylo.PhyloXML.color(...) 
which behaves like the clade.color property Peter suggested earlier: def color(thing): if isinstance(thing, BranchColor): return thing elif isinstance(thing, basestring): if thing in BranchColor.color_names: return BranchColor.from_name(thing) elif len(thing) == 7 and thing[0] == '#': # CSS/HTML/matplotlib-style hex string return BranchColor.from_hex_string(thing) raise ValueError("Fail!") elif hasattr(thing, '__iter__') and len(thing) == 3: # RGB triple -- an abstract base class would be nice here # (or, take *args instead of thing) return BranchColor(*thing) raise ValueError("Fail!") Then the last line of the above example would be: critters.color = PX.color('green') 2. Add a class method from_hex_string for constructing BranchColor objects from a hex string like '#FF00AA' This complements the to_hex function (to be renamed to_hex_string, unless someone has a better name for it). The color function given above assumes this method exists. 3. Drop the to_rgb method; it's confusing and floating-point conversions lead to bugs. 4. New __repr__ and __str__ methods: >>> critters.color BranchColor(red=0, green=128, blue=0) >>> print critters.color (0, 128, 0) I don't think any of the other PhyloXML classes warrant a similar treatment -- except possibly PhyloXML.Sequence, which can be built from a SeqRecord using the from_seqrecord class method. Any other suggestions along these lines?
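[Editorial sketch] Proposals 1, 2 and 4 above can be tried out together in a self-contained form. This is a sketch under stated assumptions, not the Bio.Phylo implementation: BranchColor here is a cut-down stand-in with a three-entry color_names table, from_hex_string and to_hex_string are the proposed (not yet existing) names, and str replaces basestring so the code runs on current Python.

```python
class BranchColor:
    """Cut-down stand-in for PhyloXML.BranchColor (illustration only)."""

    # Tiny sample of the name table; 'green' matches the example above.
    color_names = {"red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255)}

    def __init__(self, red, green, blue):
        for value in (red, green, blue):
            if not (isinstance(value, int) and 0 <= value <= 255):
                raise ValueError("Color values must be integers 0-255, got %r"
                                 % (value,))
        self.red, self.green, self.blue = red, green, blue

    @classmethod
    def from_name(cls, name):
        return cls(*cls.color_names[name])

    @classmethod
    def from_hex_string(cls, hexstr):
        """Proposal 2: build a color from a string like '#FF00AA'."""
        if len(hexstr) != 7 or hexstr[0] != "#":
            raise ValueError("Expected a string like '#FF00AA', got %r" % hexstr)
        return cls(int(hexstr[1:3], 16), int(hexstr[3:5], 16), int(hexstr[5:7], 16))

    def to_hex_string(self):
        return "#%02X%02X%02X" % (self.red, self.green, self.blue)

    def __repr__(self):
        """Proposal 4: a repr that shows how to rebuild the object."""
        return "BranchColor(red=%d, green=%d, blue=%d)" % (
            self.red, self.green, self.blue)

    def __str__(self):
        return "(%d, %d, %d)" % (self.red, self.green, self.blue)


def color(thing):
    """Proposal 1: coerce a BranchColor, name, hex string or RGB triple."""
    if isinstance(thing, BranchColor):
        return thing
    if isinstance(thing, str):
        if thing in BranchColor.color_names:
            return BranchColor.from_name(thing)
        if len(thing) == 7 and thing.startswith("#"):
            return BranchColor.from_hex_string(thing)
        raise ValueError("Unrecognized color string: %r" % thing)
    try:
        red, green, blue = thing
    except (TypeError, ValueError):
        raise ValueError("Expected an RGB triple, got %r" % (thing,))
    return BranchColor(red, green, blue)


print(repr(color("green")))
print(color((0, 128, 0)))
print(color("#FF00AA").to_hex_string())
```

Unpacking the triple instead of testing for __iter__ and len() is a design choice of this sketch: it accepts any three-item iterable and gives one clear ValueError for everything else.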
Thanks,
Eric

From bugzilla-daemon at portal.open-bio.org Wed Apr 7 16:15:27 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 7 Apr 2010 16:15:27 -0400
Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods
In-Reply-To:
Message-ID: <201004072015.o37KFRPR028122@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3045

------- Comment #4 from eric.talevich at gmail.com 2010-04-07 16:15 EST -------
(In reply to comment #0)
>
> The usage patterns for, say, setting a phyloXML property from an array prop_arr
> should look something like:
>
> [node.set_property(prop_arr[i], *prop_params, **prop_keywords)
>  for i, node in tree.enumerate_internals()]

How about:

    [node.properties.append(PX.Property(prop_arr[i], *prop_args, **prop_kwargs[i]))
     for i, node in enumerate(tree.find_clades(terminal=False))]

Is there something different about this form versus your example above that hurts performance?

> The three issues that frustrate such concision are
> (1) internal nodes, terminal nodes, and all nodes are not currently
> on an equal footing with respect to methods

For your usage it might be faster to use the generators:

    find_clades(terminal=False), find_clades(terminal=True), find_clades()

I'm considering renaming 'find_clades' to 'find', and 'find' to 'find_any' -- would the shorter name make your code a little cleaner? We could also have 'get_nonterminals' and 'get_all_clades' -- I'm not so sure that the last one is useful enough to justify cluttering the API further; what do you think? (I actually balked at adding get_terminals() originally, since it's so simple.)

> (2) there are no enumerator methods

Doesn't the enumerate() function work just as well, or even better, with the functional/array-oriented programming style you're using? The find_* methods return lazily-evaluated iterables to enable this kind of usage in a memory-efficient way.
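As a rough illustration of that point, here is a sketch using a toy tree. Node and find_nodes are hypothetical names standing in for a Bio.Phylo clade and find_clades, so the example runs without Biopython:

```python
class Node(object):
    """Hypothetical minimal tree node, standing in for a Bio.Phylo clade."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def is_terminal(self):
        return not self.children

    def find_nodes(self, terminal=None):
        """Lazily yield nodes in preorder, optionally filtered by
        terminal status (None means no filtering)."""
        if terminal is None or terminal == self.is_terminal():
            yield self
        for child in self.children:
            for node in child.find_nodes(terminal):
                yield node


tree = Node('root', [Node('A'), Node('inner', [Node('B'), Node('C')])])

# enumerate() pairs each lazily produced node with its index, so a value
# array can be applied without building an intermediate list of nodes.
values = [10, 20]
labels = {}
for i, node in enumerate(tree.find_nodes(terminal=False)):
    labels[node.name] = values[i]
```

The generator only walks as far as the loop consumes it, which is where the memory saving comes from.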
> Here I give some convenience methods that I wish were defined in TreeMixin. I
> have tested them as standalone methods. I hope you'll see fit to include them
> at some point.
>
> def count_internals(self):
>     """Counts the number of non-terminal (internal) nodes within this tree."""
>     return [i for i,e in enumerate_internals(self)][-1] + 1

I can add a convenience function that would help:

    def iterlen(items):
        # Start at -1 so that an empty iterable gives a length of 0
        count = -1
        for i, x in enumerate(items):
            count = i
        return count + 1

Then count_internals(tree) is the same as:

    iterlen(tree.find_clades(terminal=False))

Or, if we add get_nonterminals() it's easy:

    len(tree.get_nonterminals())

> def enumerate_internals(self):
>     """Returns an enumerator of non-terminal clades"""
>     return enumerate(self.find_clades(terminal=False))
>
> def enumerate_terminals(self):
>     """Returns an enumerator of terminal clades"""
>     return enumerate(self.find_clades(terminal=True))
>
> def enumerate_all(self):
>     """Returns an enumerator on all clades"""
>     return enumerate(self.find_clades())

I can see why these are handy in your own code, because you're using them a lot, but I don't think they introduce enough new functionality to justify adding more methods to TreeMixin.

> Less critical but still useful are the following two methods (and one private
> utility) that I find useful for operations on trees:
>
> def is_semipreterminal(self):
>     """True if any direct descendent is terminal."""
>     if self.root.is_terminal():
>         return False
>     for clade in self.clades:
>         if clade.is_terminal():
>             return True
>     return False

Is semipreterminal a standard name for nodes like this?
In Python 2.5 and later, you could also do: any(clade.is_terminal() for clade in self) > def terminal_neighbor_dists(self): > """Return a list of distances between adjacent terminals""" > return [self.distance(*i) for i in > _generate_pairs(self.find_clades(terminal=True))] > > def _generate_pairs(self): > import itertools > pairs = itertools.tee(self) > pairs[1].next() > return itertools.izip(pairs[0], pairs[1]) Interesting. Getting philosophical -- I don't intend for TreeMixin to have a built-in method for every possible use case, but one of my goals in Bio.Phylo is to provide all the low-level functionality necessary so that when you do have to write your own function to do something special, it doesn't take much new code. So, I'm quite pleased that you were able to implement this functionality for yourself in just 4 lines of (non-scaffolding) code. The biggest weakness in Bio.Phylo from my viewpoint is that most of the TreeMixin methods do some portion of a full-tree search every time they are called -- there's no internal lookup table. So to make more efficient algorithms possible, I added some methods that do as much as possible in one shot. Example: rather than a distance_to(node) method, we have TreeMixin.depths() which returns a dictionary of all nodes mapped to their respective total branch lengths from the root. What other whole-tree operations along this philosophy would you like to see implemented? Some ideas: - heights() -- like depths(), but mapping each node to the distance to the nearest (or farthest?) terminal - names() -- map each clade name to the clade instance. Clades with no name won't be in the dictionary. Each of these could take a 'target' specification like get_path does, so you can restrict the result to a specific set of clades (e.g. terminals). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Apr 7 16:29:47 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 7 Apr 2010 21:29:47 +0100
Subject: [Biopython-dev] PhyloXML.BranchColor methods
In-Reply-To:
References:
Message-ID:

On Wed, Apr 7, 2010 at 7:18 PM, Eric Talevich wrote:
> Hi all,
>
> There's been some discussion in Bugzilla about using the
> PhyloXML.BranchColor class to assign colors to Bio.Phylo tree
> branches.
> http://bugzilla.open-bio.org/show_bug.cgi?id=3047
>
> Currently, one sets the color for a clade by assigning a BranchColor
> instance to the clade's color attribute:
>
> from Bio import Phylo
> from Bio.Phylo import PhyloXML as PX
>
> tree = Phylo.read(..., 'phyloxml')
> critters = tree.find(name='Rattus')
> critters.color = PX.BranchColor(0, 128, 0)
> # or, using HTML/matplotlib color names:
> critters.color = PX.BranchColor.from_name('green')
>
> The BranchColor class has these methods:
>
>  from_name -- a class method that looks up RGB values in the
> hard-coded dictionary BranchColor.color_names

We could probably move the lookup table under Bio.Data, where it might also be useful for Bio.Graphics. I assume you are using standard HTML/CSS color names?

>  to_hex -- a 24-bit hex string, e.g. '#00A000', suitable for HTML/CSS
> and matplotlib
>  to_rgb -- a tuple of float RGB values, scaled 0 to 1.0
>
> (See http://github.com/biopython/biopython/blob/master/Bio/Phylo/PhyloXML.py)
>
>
> Here are some proposals. Please let me know which of these you like or hate.
>
> 1. Add a function Bio.Phylo.PhyloXML.color(...) which behaves like the
> clade.color property Peter suggested earlier:

I was suggesting adding a property to the clade (which could for example map color names or RGB triples to the BranchColor objects automatically).
It would still be:

    from Bio import Phylo
    from Bio.Phylo import PhyloXML as PX
    tree = Phylo.read(..., 'phyloxml')
    critters = tree.find(name='Rattus')
    critters.color = PX.BranchColor(0, 128, 0)

BUT, you could choose to allow:

    critters.color = (0, 128, 0)

Or a named color,

    critters.color = "green"

Or a hex string.

    critters.color = "#008000"

and have the property set method convert these into the same result, BranchColor(0, 128, 0).

> 2. Add a class method from_hex_string for constructing BranchColor
> objects from a hex string like '#FF00AA'
>
> This complements the to_hex function (to be renamed to_hex_string,
> unless someone has a better name for it). The color function given
> above assumes this method exists.

Hmm, to_hex seems OK to me.

> 3. Drop the to_rgb method; it's confusing and floating-point
> conversions lead to bugs.

I had assumed to_rgb would give a tuple of ints in the range 0 to 255 (following HTML/CSS color conventions). That would avoid the rounding issue.

> 4. New __repr__ and __str__ methods:
>
> >>> critters.color
> BranchColor(red=0, green=128, blue=0)
> >>> print critters.color
> (0, 128, 0)

Personally I would like an HTML-style output: a hash, then a six-character hex number. Anyone planning to look at the XML should know these from HTML and CSS. However, I recognise this isn't universally understood.

> I'm don't think any of the other PhyloXML classes warrant a similar
> treatment -- except possibly PhyloXML.Sequence, which can be built
> from at SeqRecord using the from_seqrecord class method. Any other
> suggestions along these lines?

Colors are special enough to warrant special attention. Possibly width would too.
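The property-setter behaviour described earlier in this message could be sketched roughly as below. Color and Clade are hypothetical stand-ins (not the actual PhyloXML classes), the color table is cut down to three entries, and Python 3's str is used where the 2010 code would test basestring:

```python
class Color(object):
    """Hypothetical stand-in for PhyloXML.BranchColor (illustration only)."""

    # Tiny illustrative table; the real class has a much longer one.
    color_names = {'red': (255, 0, 0),
                   'green': (0, 128, 0),
                   'blue': (0, 0, 255)}

    def __init__(self, red, green, blue):
        self.red, self.green, self.blue = red, green, blue

    def __eq__(self, other):
        return (isinstance(other, Color) and
                (self.red, self.green, self.blue) ==
                (other.red, other.green, other.blue))


class Clade(object):
    """Hypothetical clade whose color property converts on assignment."""

    def __init__(self):
        self._color = None

    def _get_color(self):
        return self._color

    def _set_color(self, value):
        if value is None or isinstance(value, Color):
            self._color = value
        elif isinstance(value, str):
            if value in Color.color_names:
                # Named color, e.g. "green"
                self._color = Color(*Color.color_names[value])
            elif len(value) == 7 and value.startswith('#'):
                # Hex string, e.g. "#008000"
                self._color = Color(int(value[1:3], 16),
                                    int(value[3:5], 16),
                                    int(value[5:7], 16))
            else:
                raise ValueError("invalid color string: %r" % value)
        elif isinstance(value, (tuple, list)) and len(value) == 3:
            # RGB triple, e.g. (0, 128, 0)
            self._color = Color(*value)
        else:
            raise ValueError("cannot interpret %r as a color" % (value,))

    color = property(_get_color, _set_color)
```

With this in place, assigning (0, 128, 0), "green" or "#008000" all leave the same Color(0, 128, 0) on the clade, and anything else raises ValueError.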
Peter From eric.talevich at gmail.com Wed Apr 7 17:06:11 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 17:06:11 -0400 Subject: [Biopython-dev] PhyloXML.BranchColor methods In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 4:29 PM, Peter wrote: > On Wed, Apr 7, 2010 at 7:18 PM, Eric Talevich > wrote: > > Hi all, > > > > There's been some discussion in Bugzilla about using the > > PhyloXML.BranchColor class to assign colors to Bio.Phylo tree > > branches. > > http://bugzilla.open-bio.org/show_bug.cgi?id=3047 > > > > Currently, one sets the color for a clade by assigning a BranchColor > > instance to the clade's color attribute: > > > > from Bio import Phylo > > from Bio.Phylo import PhyloXML as PX > > > > tree = Phylo.read(..., 'phyloxml') > > critters = tree.find(name='Rattus') > > critters.color = PX.BranchColor(0, 128, 0) > > # or, using HTML/matplotlib color names: > > critters.color = PX.BranchColor.from_name('green') > > > > > The BranchColor class has these methods: > > > > from_name -- a class method that looks up RGB values in the > > hard-coded dictionary BranchColor.color_names > > We could probably move the lookup table under Bio.Data, > where it might also be useful for Bio.Graphics. I assume you > are using standard HTML/CSS color names? > OK, that's cool too. I took the color names and values from the HTML standard and W3Schools: http://w3schools.com/html/html_colornames.asp I checked the more exotic names with matplotlib and gcolor2 -- so any name from this list will also work in matplotlib, and consequently, draw_graphviz. > (See > http://github.com/biopython/biopython/blob/master/Bio/Phylo/PhyloXML.py) > > > > Here are some proposals. Please let me know which of these you like or > hate. > > > > 1. Add a function Bio.Phylo.PhyloXML.color(...) 
which behaves like the > > clade.color property Peter suggested earlier: > > I was suggesting adding a property to the clade (which could for > example map color names or RBG triples to the BranchColor > objects automatically). It would still be: > > from Bio import Phylo > from Bio.Phylo import PhyloXML as PX > tree = Phylo.read(..., 'phyloxml') > critters = tree.find(name='Rattus') > critters.color = PX.BranchColor(0, 128, 0) > > BUT, you could choose to allow: > > critters.color = (0, 128, 0) > > Or a named color, > > critters.color = "green" > > Or a hex string. > > critters.color = "#008000" > > and have the property set method convert these into > the same result, BranchColor(0, 128, 0). > It's pretty magical, but the convenience of "critters.color = 'green'" wins. I'll implement the property to accept a BranchColor, RGB triple, color name, or hex string, and raise a ValueError otherwise. > > 2. Add a class method from_hex_string for constructing BranchColor > > objects from a hex string like '#FF00AA' > > > > This complements the to_hex function (to be renamed to_hex_string, > > unless someone has a better name for it). The color function given > > above assumes this method exists. > > Hmm, to_hex seems OK to me. > My only concern: the builtin hex() returns a string formatted a little differently. Matching that format would be useless here, but I was worried about people being confused. But if you're OK with to/from_hex, then I am too. > 3. Drop the to_rgb method; it's confusing and floating-point > > conversions lead to bugs. > > I had assumed to_rgb would give a tuple of ints in the range > 0 to 255 (following HTML/CSS color conventions). That would > avoid the rounding issue. > Strawmen: - Should to_rgb be renamed to_tuple, then? - if we defined BranchColor.__iter__ as "return (self.red, self.green, self.blue)", then "tuple(clade.color)" would work - if we defined BranchColor.__hex__, then similarly, "hex(clade.color)" would work - ... 
but those magic methods would hurt discoverability > 4. New __repr__ and __str__ methods: > > > >>>> critters.color > > BranchColor(red=0, green=128, blue=0) > >>>> print critters.color > > (0, 128, 0) > > Personally I would like an HTML style output, hash then a six character > hex number. Anyone planning to look at the XML should know these > from HTML and CSS. However, I recognise this isn't universally > understood. > I'm OK with that too. We could also do a reverse lookup in the color_names table and return the color name instead if there's a match. That would cover most users -- if you know RGB values, you can probably handle hex, and if you just use color names instead then you'll get color names back. > > I'm don't think any of the other PhyloXML classes warrant a similar > > treatment -- except possibly PhyloXML.Sequence, which can be built > > from at SeqRecord using the from_seqrecord class method. Any other > > suggestions along these lines? > > Colors are special enough to warrant special attention. Possibly also > width would to. > Fortunately, width is just a float -- no PhyloXML-specific classes to deal with. Cheers, Eric From biopython at maubp.freeserve.co.uk Wed Apr 7 17:31:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 22:31:42 +0100 Subject: [Biopython-dev] PhyloXML.BranchColor methods In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 10:06 PM, Eric Talevich wrote: > On Wed, Apr 7, 2010 at 4:29 PM, Peter wrote: >> >> I was suggesting adding a property to the clade (which could for >> example map color names or RBG triples to the BranchColor >> objects automatically). 
>> It would still be:
>>
>> from Bio import Phylo
>> from Bio.Phylo import PhyloXML as PX
>> tree = Phylo.read(..., 'phyloxml')
>> critters = tree.find(name='Rattus')
>> critters.color = PX.BranchColor(0, 128, 0)
>>
>> BUT, you could choose to allow:
>>
>> critters.color = (0, 128, 0)
>>
>> Or a named color,
>>
>> critters.color = "green"
>>
>> Or a hex string.
>>
>> critters.color = "#008000"
>>
>> and have the property set method convert these into
>> the same result, BranchColor(0, 128, 0).
>>
>
> It's pretty magical, but the convenience of "critters.color = 'green'" wins.
> I'll implement the property to accept a BranchColor, RGB triple, color name,
> or hex string, and raise a ValueError otherwise.

Yeah - it feels right to me ;)

>> > 2. Add a class method from_hex_string for constructing BranchColor
>> > objects from a hex string like '#FF00AA'
>> >
>> > This complements the to_hex function (to be renamed to_hex_string,
>> > unless someone has a better name for it). The color function given
>> > above assumes this method exists.
>>
>> Hmm, to_hex seems OK to me.
>>
>
> My only concern: the builtin hex() returns a string formatted a little
> differently. Matching that format would be useless here, but I was worried
> about people being confused. But if you're OK with to/from_hex, then I am
> too.

See below.

>> > 3. Drop the to_rgb method; it's confusing and floating-point
>> > conversions lead to bugs.
>>
>> I had assumed to_rgb would give a tuple of ints in the range
>> 0 to 255 (following HTML/CSS color conventions). That would
>> avoid the rounding issue.
>>
>
> Strawmen:
> - Should to_rgb be renamed to_tuple, then?

There are other things a color tuple could be, although RGB is the most common (see also CMYK, HSV, ...).

> - if we defined BranchColor.__iter__ as "return (self.red, self.green,
> self.blue)", then "tuple(clade.color)" would work
> - if we defined BranchColor.__hex__, then similarly, "hex(clade.color)"
> would work
> - ...
but those magic methods would hurt discoverability Huh - that is neat, but it hadn't occurred to me to think about supporting those. It isn't helpful to support hex(...) since that usually returns a string which starts "0x..." which isn't what we want for HTML, CSS or XML. Maybe instead of to_hex() we should have to_css_color() or something like that? Peter From bugzilla-daemon at portal.open-bio.org Wed Apr 7 20:49:57 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 20:49:57 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004080049.o380nvUd001788@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #9 from eric.talevich at gmail.com 2010-04-07 20:49 EST ------- (In reply to comment #8 by Peter) > (In reply to comment #7 by Eric) > > That's my interpretation of it -- the color attribute is meant to be handled > > that way by whatever code uses it for drawing. > > > > Which means two things should happen before closing this bug: > > > > - Change the docstring to indicate *user code* is supposed to handle colors > > and widths in a cascading fashion > > > > - Fix draw_graphviz (actually to_networkx) to cascade colors down branches, > > like the Archaeopteryx GUI does > > So the Bio.Phylo drawing code (using NetworkX) doesn't cascade the > colors/widths yet? 
It does now: http://github.com/biopython/biopython/commit/1ea0c921c9bea6acb2b6b41566383fc54ed4862f (and preceding commits -- sorry about the weight/width mixup) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Apr 8 06:55:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 8 Apr 2010 11:55:29 +0100 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 Message-ID: Hi Eric, I noticed that test_PhyloXML.py is failing on Python 2.4, it should be skipped since I don't have ElementTree installed. Have you got access to a Python 2.4 installation to look at this? Thanks Peter ====================================================================== ERROR: test_PhyloXML ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 267, in runTest suite = unittest.TestLoader().loadTestsFromName(name) File "c:\python24\lib\unittest.py", line 524, in loadTestsFromName module = __import__('.'.join(parts_copy)) File "test_PhyloXML.py", line 15, in ? from Bio.Phylo import PhyloXML as PX, PhyloXMLIO File "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\__init_ _.py", line 12, in ? from Bio.Phylo._io import parse, read, write, convert File "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\_io.py" , line 15, in ? import PhyloXMLIO File "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\PhyloXM LIO.py", line 23, in ? 
from Bio.Phylo import PhyloXML as PX ImportError: cannot import name PhyloXML From eric.talevich at gmail.com Thu Apr 8 09:05:13 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 8 Apr 2010 09:05:13 -0400 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 In-Reply-To: References: Message-ID: On Thu, Apr 8, 2010 at 6:55 AM, Peter wrote: > Hi Eric, > > I noticed that test_PhyloXML.py is failing on Python 2.4, it should > be skipped since I don't have ElementTree installed. Have you got > access to a Python 2.4 installation to look at this? > The traceback says the PhyloXML module is missing, but PhyloXMLIO and the rest of Bio.Phylo are there. Is that normal? I would expect that PhyloXML would still be installed with Biopython on Py2.4, but when the test runs it would trigger a MissingExternalDependency error for ElementTree when importing PhyloXMLIO, and run_tests.py would then skip it. I don't have Py2.4 on this machine but I can track down a copy. (Annoyingly, Ubuntu seems to have dropped Pythons 2.4 and 2.5 from the official repos in Lucid Lynx.) -Eric > ====================================================================== > ERROR: test_PhyloXML > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "run_tests.py", line 267, in runTest > suite = unittest.TestLoader().loadTestsFromName(name) > File "c:\python24\lib\unittest.py", line 524, in loadTestsFromName > module = __import__('.'.join(parts_copy)) > File "test_PhyloXML.py", line 15, in ? > from Bio.Phylo import PhyloXML as PX, PhyloXMLIO > File > "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\__init_ > _.py", line 12, in ? > from Bio.Phylo._io import parse, read, write, convert > File > "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\_io.py" > , line 15, in ? > import PhyloXMLIO > File > "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\PhyloXM > LIO.py", line 23, in ? 
> from Bio.Phylo import PhyloXML as PX > ImportError: cannot import name PhyloXML > From biopython at maubp.freeserve.co.uk Thu Apr 8 09:23:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 8 Apr 2010 14:23:48 +0100 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 In-Reply-To: References: Message-ID: On Thu, Apr 8, 2010 at 2:05 PM, Eric Talevich wrote: > On Thu, Apr 8, 2010 at 6:55 AM, Peter > wrote: >> >> Hi Eric, >> >> I noticed that test_PhyloXML.py is failing on Python 2.4, it should >> be skipped since I don't have ElementTree installed. Have you got >> access to a Python 2.4 installation to look at this? > > The traceback says the PhyloXML module is missing, but PhyloXMLIO and the > rest of Bio.Phylo are there. Is that normal? I would expect that PhyloXML > would still be installed with Biopython on Py2.4, but when the test runs it > would trigger a MissingExternalDependency error for ElementTree when > importing PhyloXMLIO, and run_tests.py would then skip it. Yes, and I don't understand why that doesn't happen in the test suite: C:\>c:\python24\python Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from Bio.Phylo import PhyloXML as PX Traceback (most recent call last): File "", line 1, in ? File "c:\python24\lib\site-packages\Bio\Phylo\__init__.py", line 12, in ? from Bio.Phylo._io import parse, read, write, convert File "c:\python24\Lib\site-packages\Bio\Phylo\_io.py", line 15, in ? import PhyloXMLIO File "c:\python24\Lib\site-packages\Bio\Phylo\PhyloXMLIO.py", line 42, in ? raise MissingExternalDependencyError( Bio.MissingExternalDependencyError: No ElementTree module was found. Use Python 2.5+, lxml or elementtree if you want to use Bio.PhyloXML. >>> Odd. 
Peter From bugzilla-daemon at portal.open-bio.org Thu Apr 8 17:56:41 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 17:56:41 -0400 Subject: [Biopython-dev] [Bug 3054] New: Add upper and lower methods to the SeqRecord Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3054 Summary: Add upper and lower methods to the SeqRecord Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Unlike some other potential string or Seq like methods the SeqRecord lacks, I don't see any problems with annotation with adding upper and lower methods. See also discussion on Bug 2351. The upper and lower methods are useful, e.g. making a mixed case FASTQ file into upper case: from Bio import SeqIO records = (rec.upper() for rec in SeqIO.parse("mixed.fastq", "fastq")) SeqIO.write(records, "upper.fastq", "fastq") Patch to follow. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Apr 8 17:57:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 17:57:24 -0400 Subject: [Biopython-dev] [Bug 3054] Add upper and lower methods to the SeqRecord In-Reply-To: Message-ID: <201004082157.o38LvOpZ011476@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3054 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 17:57 EST ------- Created an attachment (id=1477) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1477&action=view) Adds upper and lower methods to the SeqRecord -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 8 18:02:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:02:30 -0400 Subject: [Biopython-dev] [Bug 2822] Bio.Application.AbstractCommandline - properties and kwargs In-Reply-To: Message-ID: <201004082202.o38M2UjF011682@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2822 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:02 EST ------- Should have marked this as fixed a while ago... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Apr 8 18:04:04 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:04:04 -0400 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <201004082204.o38M44WB011716@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |WONTFIX ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:04 EST ------- Marking this as WONTFIX, since the problem output does appear to be a bug in the old "legacy" NCBI BLAST tool (and is fixed in the new NCBI BLAST+ tool). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 8 18:10:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:10:37 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004082210.o38MAbo7011867@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #6 from eric.talevich at gmail.com 2010-04-07 01:05 EST ------- (In reply to comment #0) > It would be nice if there were get/set properties for phyloXML objects that > were easier and more concise to use. Right now, to set, say, a phyloXML > property, one has to read the code to learn the names and arguments of the > Property class and also to learn that properties are added by appending to a > list. Yes, it's easier to tweak the class definitions if there's not much syntactic sugar to get in the way. 
This is still pretty new code ;) but of course I'm open to suggestions. > Besides the matter of convenience, there is also a question about how the > properties and taxonomies objects behave. I will take the matter up with the > phyloXML mailing list, but I believe that these objects should be > dictionary-like rather than list-like. That is, duplicate ref values should > not be allowed because the question of how to handle duplicates would have to > get pushed down to the user level and will be inconsistent. The Events class (clade.events attribute) mimics a dictionary. Have you used that yet? About clade.properties: If ordering of properties doesn't matter, 'ref' is guaranteed to be unique at a node, and it seems to be the right way to index the other associated data, then I can make clade.properties act like a dictionary. Can we confirm all of these? And for the implementation, can you provide a sketch of what you'd like the final structure to look like, and maybe a contrived doctest-like code example showing what you'd like to be able to do? In many cases, the phyloXML spec doesn't currently promise enough to make nice shortcuts work without the possibility of breaking in the future. For example, check out this new demo with *two* bootstrap values for every clade: http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml I was tempted to make confidences act like a dictionary indexed by support type, but clearly now that wouldn't have worked. A list of Confidence objects lets us stay faithful to the raw XML representation. 
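A small sketch of why the dictionary shortcut would lose data. Confidence here is a hypothetical namedtuple stand-in for PhyloXML.Confidence, the support values are made up, and it assumes both supports carry the same type string:

```python
from collections import namedtuple

# Hypothetical stand-in for PhyloXML.Confidence: a support value plus its type.
Confidence = namedtuple('Confidence', ['value', 'type'])

# Two supports of the same type on one clade (illustrative numbers only).
confidences = [Confidence(89.0, 'bootstrap'), Confidence(71.0, 'bootstrap')]

# Indexing by type keeps only the last value seen for each type...
by_type = dict((c.type, c.value) for c in confidences)

# ...while the list keeps both, in document order.
bootstraps = [c.value for c in confidences if c.type == 'bootstrap']
```

Here by_type ends up with a single 'bootstrap' entry, whereas bootstraps still holds both values.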
> def set_property(self, *propArgs, **propkwArgs): > for property in self.properties: > if property.ref == propArgs[1]: > property = PhyloXML.Property(propArgs) > return > self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs)) > > def get_property(self, key): > for property in self.properties: > if property.ref == key: > return property.value > raise KeyError It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo" methods where a property would be overly magical, and something noteworthy is going on internally. Alignment objects have "add_sequence", and Phylogeny objects have "get_alignment". Would you use a Phylogeny method called add_alignment, taking something like a Phylip character matrix? We can figure out a sugared interface for clade.properties once we know how which of the requirements stated above will actually be guaranteed. > def set_ID(self, *idArgs, **idkwArgs): > self.node_id = PhyloXML.Id(*idArgs, **idkwArgs) If you do "from Bio.Phylo import PhyloXML as PX" it really doesn't save any typing, and the **kwargs magic is even less suitable for introspection. It's not possible to take advantage of all the PhyloXML annotations available without learning about the annotation classes the ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:10 EST ------- (In reply to comment #6) > > It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo" > methods where a property would be overly magical, and something noteworthy is > going on internally. Alignment objects have "add_sequence", and Phylogeny > objects have "get_alignment". Would you use a Phylogeny method called > add_alignment, taking something like a Phylip character matrix? > Note that while the "old" Alignment object has an add_sequence method, it is now tagged as obsolete with the "new" Alignment object in Biopython 1.54 (instead you append a SeqRecord). 
Regarding PhyloXML, would it fit to rename "get_alignment" as "to_alignment"? That is a fairly common naming convention. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 8 18:14:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:14:24 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004082214.o38MEOvt011951@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:14 EST ------- (In reply to comment #6) > In many cases, the phyloXML spec doesn't currently promise enough to make nice > shortcuts work without the possibility of breaking in the future. For example, > check out this new demo with *two* bootstrap values for every clade: > http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml I've actually done something like that a few years back, using bootstrap values from two different tree building tools (NJ and ML I think). I had to do this by loading two Newick files of the same tree with different bootstraps - quite messy! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
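For reference, the dictionary-like lookup by 'ref' discussed in comment #6 could look roughly like the sketch below. It is only an illustration, not the Bio.Phylo implementation: Property and Clade here are minimal stand-in classes, the (ref, value) argument order is an assumption, and it presumes that 'ref' is unique within a clade, which the thread has not yet confirmed. Note that replacing an existing entry has to assign back into the list; rebinding the loop variable would leave the list unchanged.

```python
class Property(object):
    """Hypothetical stand-in for PhyloXML.Property (only ref and value)."""

    def __init__(self, ref, value):
        self.ref = ref
        self.value = value


class Clade(object):
    """Hypothetical stand-in for a clade holding a list of Property objects."""

    def __init__(self):
        self.properties = []

    def get_property(self, ref):
        # Linear search; raises KeyError like a dictionary would.
        for prop in self.properties:
            if prop.ref == ref:
                return prop.value
        raise KeyError(ref)

    def set_property(self, ref, value):
        # Replace in place if the ref already exists, otherwise append.
        for index, prop in enumerate(self.properties):
            if prop.ref == ref:
                self.properties[index] = Property(ref, value)
                return
        self.properties.append(Property(ref, value))
```

If duplicate refs turn out to be legal in phyloXML, a plain list (as used for confidences) would stay closer to the raw XML and this shortcut would not apply.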
From eric.talevich at gmail.com Fri Apr 9 02:09:14 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 9 Apr 2010 02:09:14 -0400 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 In-Reply-To: References: Message-ID: On Thu, Apr 8, 2010 at 9:23 AM, Peter wrote: > On Thu, Apr 8, 2010 at 2:05 PM, Eric Talevich > wrote: > > On Thu, Apr 8, 2010 at 6:55 AM, Peter > > wrote: > >> > >> Hi Eric, > >> > >> I noticed that test_PhyloXML.py is failing on Python 2.4, it should > >> be skipped since I don't have ElementTree installed. Have you got > >> access to a Python 2.4 installation to look at this? > > > > The traceback says the PhyloXML module is missing, but PhyloXMLIO and the > > rest of Bio.Phylo are there. Is that normal? I would expect that PhyloXML > > would still be installed with Biopython on Py2.4, but when the test runs > it > > would trigger a MissingExternalDependency error for ElementTree when > > importing PhyloXMLIO, and run_tests.py would then skip it. > > Yes, and I don't understand why that doesn't happen in the test suite: > > C:\>c:\python24\python > Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on > win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> from Bio.Phylo import PhyloXML as PX > Traceback (most recent call last): > File "", line 1, in ? > File "c:\python24\lib\site-packages\Bio\Phylo\__init__.py", line 12, in ? > from Bio.Phylo._io import parse, read, write, convert > File "c:\python24\Lib\site-packages\Bio\Phylo\_io.py", line 15, in ? > import PhyloXMLIO > File "c:\python24\Lib\site-packages\Bio\Phylo\PhyloXMLIO.py", line 42, in > ? > raise MissingExternalDependencyError( > Bio.MissingExternalDependencyError: No ElementTree module was found. Use > Python > 2.5+, lxml or elementtree if you want to use Bio.PhyloXML. > >>> > > Odd. 
> > Peter > Well, it's fixed in GitHub now: http://github.com/biopython/biopython/commit/7bd18aaf9582dd7d9193cc39d7faf6d51e3e4161 It seems like imports are being cached in some way so that an import that failed once is not tried again. In any case, it still raises an ImportError which we can catch and turn into another MissingExternalDependencyError. -Eric From eric.talevich at gmail.com Fri Apr 9 02:19:17 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 9 Apr 2010 02:19:17 -0400 Subject: [Biopython-dev] PhyloXML.BranchColor methods In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 5:31 PM, Peter wrote: > On Wed, Apr 7, 2010 at 10:06 PM, Eric Talevich > wrote: > > On Wed, Apr 7, 2010 at 4:29 PM, Peter >wrote: > >> > >> I was suggesting adding a property to the clade (which could for > >> example map color names or RBG triples to the BranchColor > >> objects automatically). It would still be: > >> > >> from Bio import Phylo > >> from Bio.Phylo import PhyloXML as PX > >> tree = Phylo.read(..., 'phyloxml') > >> critters = tree.find(name='Rattus') > >> critters.color = PX.BranchColor(0, 128, 0) > >> > >> BUT, you could choose to allow: > >> > >> critters.color = (0, 128, 0) > >> > >> Or a named color, > >> > >> critters.color = "green" > >> > >> Or a hex string. > >> > >> critters.color = "#008000" > >> > >> and have the property set method convert these into > >> the same result, BranchColor(0, 128, 0). > >> > > > > It's pretty magical, but the convenience of "critters.color = 'green'" > wins. > > I'll implement the property to accept a BranchColor, RGB triple, color > name, > > or hex string, and raise a ValueError otherwise. > > Yeah - it feels right to me ;) > I implemented this property; it's in GitHub now. >> Hmm, to_hex seems OK to me. > >> > > > > My only concern: the builtin hex() returns a string formatted a little > > differently. Matching that format would be useless here, but I was > worried > > about people being confused. 
But if you're OK with to/from_hex, then I am > > too. > I left these as from_hex and to_hex. The docstrings are clear enough about what the methods do, I think. > > 3. Drop the to_rgb method; it's confusing and floating-point > >> > conversions lead to bugs. > >> > >> I had assumed to_rgb would give a tuple of ints in the range > >> 0 to 255 (following HTML/CSS color conventions). That would > >> avoid the rounding issue. > At some point I changed to_rgb to return a tuple as you'd expect, without rescaling. It's basically the constructor in reverse now. Cheers, Eric From eric.talevich at gmail.com Fri Apr 9 02:39:27 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 9 Apr 2010 02:39:27 -0400 Subject: [Biopython-dev] String representation of trees in Bio.Phylo In-Reply-To: <20100404171903.GF19540@kunkel> References: <20100404171903.GF19540@kunkel> Message-ID: Hi Brad et al., I guess I can turn this into a specific proposal now, and if no one objects, just do it: On Sun, Apr 4, 2010 at 1:19 PM, Brad Chapman wrote: > Hi Eric; > > > The new phylogenetics module Bio.Phylo supports a few new ways of > displaying > > trees. I'm trying to decide which of these should be used as the informal > > string representation for whole trees, i.e. what happens when you type > > "print tree" for some newly parsed tree object. > [...[ > > The pretty_print function, with the show_all option, uses 'repr' > recursively > > to display the tree's nodes. I think this is probably the best choice for > > Tree.__str__, but it can be a bit cluttered if a lot of information is > > attached to each node/subtree/clade. > > > > >>> Phylo.pretty_print(tree, show_all=True) > > Phylogeny(rooted='True', description='phyloXML allows to use either a > > "branch_length" attribute...', name='example from Prof. 
Joe Felsenstein's > > book "Inferring Phyl...') > > Clade() > > Clade(branch_length='0.06') > > Clade(branch_length='0.102', name='A') > > Clade(branch_length='0.23', name='B') > > Clade(branch_length='0.4', name='C') > > I like this one. Agreed that it could get ugly, but I think it shows > the structure and associated information well. > Action items: 1. Move the pretty_print(show_all=True) code into Tree.__str__, leaving __repr__ as it is, since pretty_print relies on it. 2. Remove the pretty_print function from Bio.Phylo._utils, dropping the show_all=False functionality altogether since Tree.__str__ is more informative and draw_ascii is prettier. Wrinkle: Making Subtrees work the same way would break a few things -- I've been treating str(clade) as something that generates a short, useful label for the node, like clade.name if it's available. For example, draw_ascii uses it this to get taxon labels. This means "print tree" shows the whole recursive object tree, while "print tree.root" shows a label for the node, which is just the class name ("Clade") if no name is set. Are we OK with this? Thanks, Eric > As an alternative, we could print the tree as ASCII art, as some other > > toolkits do. However, this function is very limited -- it doesn't print > > internal node labels, and trees of more than a couple hundred nodes will > > look strange, since the drawing is compressed into a fixed number of > > character columns (default 80). > > > > >>> Phylo.draw_ascii(tree) > > __________________ A > > __________| > > _| |___________________________________________ B > > | > > > |___________________________________________________________________________ > C > > This is a good idea. I think this is more useful than the current > pretty_print without show_all for getting a quick overview of the > tree. 
> > Brad >
From bugzilla-daemon at portal.open-bio.org Fri Apr 9 13:18:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Apr 2010 13:18:49 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004091718.o39HInW2015975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #9 from eric.talevich at gmail.com 2010-04-09 13:18 EST ------- (In reply to comment #7, Peter) > Regarding PhyloXML, would it fit to rename "get_alignment" as "to_alignment"? > That is a fairly common naming convention.
That's done in GitHub now: http://github.com/biopython/biopython/commit/22cd408c4433434472d12a5959ecc9d347c03660 (In reply to comment #6, myself) > > Would you use a Phylogeny method called > > add_alignment, taking something like a Phylip character matrix? I still think an "add_alignment" method on Phylogeny would be useful, but we can start with a cookbook on the wiki until we're confident in the right way to do it. (In reply to comment #5, Joel) > There have long been built-in get/sets in python via __setattr__() and > __getattribute__(). That's where the code I sent should live. Putting code > to get and especially to set in those methods means that a user doesn't have > to look up whatever classes were defined for attributes (e.g. like finding > that 'color' is called 'BranchColor') and doesn't need to know that > taxonomies and properties are only set through appending through lists. On the mailing list we decided that branch color was a special case for allowing a shortcut, because RGB triples and 24-bit hex strings are well-established ways to represent color codes. Branch width is already just a float, so no problem there. The rest of the attributes are special PhyloXML classes, and I think the right solution there is for me to write documentation and for users to read it. But the behavior of taxonomies, properties and the other sometimes-plural attributes should be fixed: they already support singular getters for one-element lists, so there should be corresponding setters that (a) put one element in the list if it's empty (b) replace the element in the list if there's only one (c) raise an exception if there are already multiple elements in the list I'm leaving this bug open for that. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
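The three setter rules (a)-(c) above can be sketched with a Python property. This is a hedged illustration rather than the code that went into Bio.Phylo: the Clade class is a stand-in, and the getter's behaviour (None for an empty list, an error for several elements) and the choice of AttributeError are assumptions.

```python
class Clade(object):
    """Stand-in for a PhyloXML clade; only the 'taxonomies' list is modelled."""

    def __init__(self):
        self.taxonomies = []

    def _get_taxonomy(self):
        # Singular getter: only meaningful with at most one element.
        if not self.taxonomies:
            return None
        if len(self.taxonomies) > 1:
            raise AttributeError("more than one taxonomy; use Clade.taxonomies")
        return self.taxonomies[0]

    def _set_taxonomy(self, value):
        # (c) several elements already present: refuse to guess which to replace.
        if len(self.taxonomies) > 1:
            raise AttributeError("more than one taxonomy; use Clade.taxonomies")
        # (a) empty list: insert the element; (b) one element: replace it.
        self.taxonomies[:] = [value]

    taxonomy = property(_get_taxonomy, _set_taxonomy)
```

Assigning through the slice keeps the same list object, so any other reference to clade.taxonomies sees the change.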
From bugzilla-daemon at portal.open-bio.org Sat Apr 10 00:10:39 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Apr 2010 00:10:39 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004100410.o3A4AdKV032271@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #5 from eric.talevich at gmail.com 2010-04-10 00:10 EST ------- (In reply to comment #4, myself) > (In reply to comment #0, Joel) > > (1) internal nodes, terminal nodes, and all nodes are not currently > > on an equal footing with respect to methods > > We could also have 'get_nonterminal' and 'get_all_clades' -- I'm not so sure > that the last one is useful enough to justify cluttering the API further; what > do you think? (I actually balked at add get_terminals() originally, since it's > so simple.) I added get_nonterminals() to TreeMixin: http://github.com/biopython/biopython/commit/de024f7d700a8ce83a64bc9f8cfd6273cefe95bc Do we need a get_all_clades method? Is that a good name? > > Here I give some convenience methods that I wish were defined in > > TreeMixin. I have tested them as standalone methods. I hope you'll > > see fit to include them at some point. > > > > def count_internals(self): > > """Counts the number of non-terminal (internal) nodes within this tree.""" > > return [i for i,e in enumerate_internals(self)][-1] + 1 > > I can add a convenience function that would help: > > def iterlen(items): > for i, x in enumerate(items): > count = i > return count + 1 > > Then count_internals(tree) is the same as: > iterlen(tree.find_clades(terminal=False)) > > Or, if we add get_nonterminals() it's easy: > len(tree.get_nonterminals()) Both of these can be done now, but len(tree.get_nonterminals()) is easiest. 
iterlen() is hidden in _sugar.py for now: http://github.com/biopython/biopython/commit/c8ce7f7b0314b54084b62759b1f82488374cae28 > > Less critical but still useful are the following two methods (and one private > > utility) that I find useful for operations on trees: > > > > def is_semipreterminal(self): > > """True if any direct descendent is terminal.""" > > if self.root.is_terminal(): > > return False > > for clade in self.clades: > > if clade.is_terminal(): > > return True > > return False > > Is semipreterminal a standard name for nodes like this? > > In Python 2.5 and later, you could also do: > any(clade.is_terminal() for clade in self) > > > > def terminal_neighbor_dists(self): > > """Return a list of distances between adjacent terminals""" > > return [self.distance(*i) for i in > > _generate_pairs(self.find_clades(terminal=True))] > > > > def _generate_pairs(self): > > import itertools > > pairs = itertools.tee(self) > > pairs[1].next() > > return itertools.izip(pairs[0], pairs[1]) I'll add these to the wiki as cookbook entries. One more thing -- should we rename the find_all and find_clades methods? I'm leaving this bug open as a reminder to decide that (and the get_all_clades question above) before the 1.54 release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
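The counting and pairing helpers quoted above can be written so that they also cope with empty input. The sketch below is illustrative and is not the code in _sugar.py; generate_pairs is a hypothetical public name for the quoted _generate_pairs, and it uses next() and zip() rather than the Python 2 only .next() and itertools.izip. As quoted, iterlen would fail on an empty iterable because 'count' is never assigned.

```python
import itertools


def iterlen(items):
    """Count the elements of any iterable, consuming it; empty input gives 0."""
    return sum(1 for _ in items)


def generate_pairs(items):
    """Yield adjacent pairs (a, b), (b, c), ... from an iterable."""
    first, second = itertools.tee(items)
    # Advance the second copy by one; the default avoids StopIteration on empty input.
    next(second, None)
    return zip(first, second)
```

With these, counting internal nodes would be iterlen(tree.find_clades(terminal=False)), and the adjacent-terminal distances would iterate over generate_pairs(tree.find_clades(terminal=True)), assuming the tree methods behave as described in this thread.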
From bugzilla-daemon at portal.open-bio.org Sat Apr 10 00:13:47 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Apr 2010 00:13:47 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004100413.o3A4Dlg7032456@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from eric.talevich at gmail.com 2010-04-10 00:13 EST ------- (In reply to comment #9) > [T]he behavior of taxonomies, properties and the other sometimes-plural > attributes should be fixed: they already support singular getters for > one-element lists, so there should be corresponding setters that > (a) put one element in the list if it's empty > (b) replace the element in the list if there's only one > (c) raise an exception if there are already multiple elements in the list > > I'm leaving this bug open for that. > Done, and then some: http://github.com/biopython/biopython/commit/a1d4a1be469c6d06fcb093073dff0679b7ec5257 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sat Apr 10 16:33:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Apr 2010 21:33:57 +0100 Subject: [Biopython-dev] [Biopython] Bio.Application now subprocess? In-Reply-To: <1101855478758905131@unknownmsgid> References: <1101855478758905131@unknownmsgid> Message-ID: On Sat, Apr 10, 2010 at 8:27 PM, Vincent Davis wrote: > > So that was/is my plan to use it to writes command lone tools for the > affymetrix apt dev commandline app. unless this is redundant in a way > I am not aware of. 
> Thanks Ah - right, now this makes sense. Are you on the dev mailing list (CC'd)? That would be a better place to ask. I'd start by looking at Bio.Align.Applications (less subclasses there) as a model. Peter From vincent at vincentdavis.net Sun Apr 11 01:01:49 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 10 Apr 2010 23:01:49 -0600 Subject: [Biopython-dev] _Switch and _Option questions Message-ID: I am just getting started with class AbstractCommandline(object): The first set of questions is more about questions I had after reading the documentation. First questions/comment mostly about documentation: It appears that not all attributes need to be set in _Options and _Switch, for example _Switch: o is_set -- if the parameter has been set, I don't see and example of this being specified in a _Switch self.parameter statement. I see that it defaults to False, Is there a case that this is set in a self.parameters statement? I think I understand this. It is just documented as though it should be specified. If I don't need a checker_function in _Options then I must use an None, I guess as a place holder? Not clear if equate should be used as "equate" or 0,1, by this I mean it is not documented well same goes for is_required Looks like Value is similar to is_set in that it is not ever specified in a self.Parameters statement I don't think the example using Bio.Emboss.Applications import WaterCommandline includes the use of a _switch --------- Second set of questions For both of the next questions I am mostly asking if the feature/functionality is part of the class AbstractCommandline(object): If two _Switches are mutually exclusive in there use is there a way to make sure that they not both specified? Example anywhere? Basically same question for _Option, How do I refer to the value of another _option. 
Thanks *Vincent Davis 720-301-3003 * vincent at vincentdavis.net From biopython at maubp.freeserve.co.uk Sun Apr 11 06:11:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Apr 2010 11:11:37 +0100 Subject: [Biopython-dev] _Switch and _Option questions In-Reply-To: References: Message-ID: On Sun, Apr 11, 2010 at 6:01 AM, Vincent Davis wrote: > I am just getting started with class AbstractCommandline(object): The first > set of questions is more about questions I had after reading the > documentation. Keep in mind that the documentation there is more aimed at the end user, rather than a developer writing a new command line wrapper. There are also some "historical" bits which we are phasing out still (like the deprecated ApplicationResult class) which add confusion. > First questions/comment mostly about documentation: > It appears that not all attributes need to be set in _Options and _Switch, > for example > _Switch: o is_set -- if the parameter has been set, I don't see and > example of this being specified in a _Switch self.parameter statement. I > see that it defaults to False, Is there a case that this is set in a > self.parameters statement? I think I understand this. It is just documented > as though it should be specified. Switches are either true or false, meaning either appended to the command line string or not. This boolean is held in the is_set parameter. They don't take values (see _Option for that). > If I don't need a checker_function in _Options then I must use an None, I > guess as a place holder? I think so from memory. > Not clear if equate should be used as "equate" or 0,1, by this I mean it is > not documented well > same goes for is_required They should be interpreted as booleans, so for new code using True and False is clearer, but 1 and 0 are also fine (and used in a lot of the older code).
> Looks like Value is similar to is_set in that it is not ever specified in a > self.Parameters statement No, the user specifies the value if they want to use that option. > I don't think the example using Bio.Emboss.Applications import > WaterCommandline includes the use of a _switch They do - all the EMBOSS wrappers have common switches like auto, stdout etc defined in the base class. > --------- > Second set of questions > For both of the next questions I am mostly asking if the > feature/functionality is part of the class AbstractCommandline(object): > > If two _Switches are mutually exclusive in there use is there a way to make > sure that they not both specified? Example anywhere? This isn't supported in Bio.Application explicitly, but can be done as in Bio.Blast.Applications (see the _validate methods). Do you really need to do this? You could just leave it to the user. > Basically same question for _Option, How do I refer to the value of another > _option. Just like an end user would, via the property it defines. See the Bio.Blast.Applications examples. Peter From eric.talevich at gmail.com Mon Apr 12 11:33:37 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 12 Apr 2010 11:33:37 -0400 Subject: [Biopython-dev] Another contributor for v1.54 Message-ID: Hello, I remembered one more contributor who I think should be mentioned with the Biopython 1.54 release: Diana Jaunzeikare, who worked on phyloXML support in BioRuby last summer parallel to my project, wrote a test file called made_up.xml which we're using in the Bio.Phylo test suite. http://github.com/biopython/biopython/commit/1a40c886757a7266ac8a0a74a31ca19e30f5bf5b (I checked with her and she's happy to be listed.)
Thanks, Eric From biopython at maubp.freeserve.co.uk Mon Apr 12 11:37:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Apr 2010 16:37:02 +0100 Subject: [Biopython-dev] Another contributor for v1.54 In-Reply-To: References: Message-ID: On Mon, Apr 12, 2010 at 4:33 PM, Eric Talevich wrote: > Hello, > > I remembered one more contributor who I think should be mentioned with > the Biopython 1.54 release: Diana Jaunzeikare, who worked on phyloXML > support in BioRuby last summer parallel to my project, wrote a test > file called made_up.xml which we're using in the Bio.Phylo test suite. > http://github.com/biopython/biopython/commit/1a40c886757a7266ac8a0a74a31ca19e30f5bf5b > > (I checked with her and she's happy to be listed.) > > Thanks, > Eric Sure - a good test case is certainly a contribution worth crediting. Please add her to the NEWS file retrospectively, and the CONTRIB file. Peter From bugzilla-daemon at portal.open-bio.org Tue Apr 13 10:59:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 13 Apr 2010 10:59:20 -0400 Subject: [Biopython-dev] [Bug 3057] New: Incremental parsing in Bio.Emboss.PrimerSearch Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3057 Summary: Incremental parsing in Bio.Emboss.PrimerSearch Product: Biopython Version: 1.54b Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk The Bio.Emboss.PrimerSearch module has a single function "read" which loads and parses an entire output file from the EMBOSS tool primersearch into memory at once, returning what is essentially a dictionary keyed by primer name, with as values lists of amplimer information objects. 
Even though this still seems to work with "large" output files for thousands of primer pairs, I think it would be useful to provide an iterator function "parse" returning the amplimers for each primer. The current "read" function could be retained for backward compatibility. The parsing code itself could be extended to extract information like the forward and reverse primer sequences, where the hit (location and strand) and with how many mismatches. This information is currently all held in a long string. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From updates at feedmyinbox.com Wed Apr 14 02:12:50 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 14 Apr 2010 02:12:50 -0400 Subject: [Biopython-dev] 4/14 BioStar - Biopython Questions Message-ID: ================================================== 1. extracting a subset of sequences from a FASTQ file (BioPython speed) ================================================== April 13, 2010 at 8:09 AM Initially my problem was to extract all entries from a FASTQ file with names not present in a FASTA file. Using biopython I wrote: from Bio.SeqIO.QualityIO import FastqGeneralIterator corrected_fn = "my_input_fasta.fas" uncorrected_fn = "my_input_fastq.ftq" output_fn = "differences_fastq.ftq" corrected_names = [] for line in open(corrected_fn): if line[0] == ">": read_name = line.split()[0][1:] corrected_names.append(read_name) input_fastq_fn = uncorrected_fn corrected_names.sort() handle = open(output_fn, "w") for title, seq, qual in FastqGeneralIterator(open(input_fastq_fn)) : if title not in corrected_names: handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual)) handle.close() Problem is, it is very slow. 
On 2Ghz workstation starting from a local disc it can take two days per pair of files: 4870868 seqs in FASTQ 4299464 seqs in FASTA Removing title from corrected_names speeds up things a bit (this version I used for running). Am I doing something obviously silly or simply FastqGeneralIterator is not a best construct to use here? While I like Python best, I am open to answers in Perl/Ruby. Slicing and dicing FASTQ files based on lists seems to be fairly common task. Edit: Python 2.6.4, biopython 1.53, Linux Fedora 8. Edit 2: corrected one line of code, see comment to giovanni code snippet taken from: http://news.open-bio.org/news/2009/09/biopython-fast-fastq/ http://biostar.stackexchange.com/questions/671/extracting-a-subset-of-sequences-from-a-fastq-file-biopython-speed -------------------------------------------------- =========================================================== Source: http://biostar.stackexchange.com/questions/tagged/biopython This email was sent to biopython-dev at lists.open-bio.org. Account Login: https://www.feedmyinbox.com/members/login/ Don't want to receive this feed any longer? Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/311791/6ca55937c6ac7ef56420a858404addee7b17d3e7/ ----------------------------------------------------------- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From p.j.a.cock at googlemail.com Wed Apr 14 10:36:31 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 14 Apr 2010 15:36:31 +0100 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? Message-ID: Hi team, Would any of you be interested in presenting a talk or tutorial on Biopython at the SciPy 2010 conference in Austin, Texas? 
http://conference.scipy.org/scipy2010/index.html This is quite close before BOSC/ISMB 2010 kicks off in Boston (I'm wondering if I can attend both from the UK - it would be a busy 2 and a half week trip!): http://www.open-bio.org/wiki/BOSC_2010 Peter ---------- Forwarded message ---------- From: Glen Otero Date: Wed, Apr 14, 2010 at 2:04 PM Subject: Re: [bip] SciPy 2010 To: Peter Cock Hi Peter- It would be great if someone from BioPython could come and present. People are suggesting tutorial topics and voting on them here: http://conference.scipy.org/scipy2010/tutorialsUV.html. Please submit BioPython as a tutorial topic if you get the chance. If a tutorial is selected, the presenter will receive $1000-$1500 that they can put towards travel and registration. Hope to see the BioPython project represented at SciPy this year! Best, Glen On Apr 14, 2010, at 2:26 AM, Peter Cock wrote: Hi Glen, SciPy 2010 sounds great - we might be able to find someone from Biopython to come and present, maybe even offer a tutorial. Would this be suitable? I'm tempted to volunteer myself but would need funding to attend (from the UK). http://biopython.org/ Peter On Sun, Apr 11, 2010 at 5:48 AM, Glen Otero wrote: Hello folks- SciPy 2010 is rapidly approaching and will be held in Austin, TX this year (http://conference.scipy.org/scipy2010/index.html). I'm chairing the bioinformatics/biomedical track (http://conference.scipy.org/scipy2010/papers.html) and welcome any presentation suggestions from list members. Hope to see you there! Thanks! Glen _______________________________________________ biology-in-python mailing list - bip at lists.idyll.org. See http://bio.scipy.org/ for our Wiki. From matzke at berkeley.edu Thu Apr 15 04:35:59 2010 From: matzke at berkeley.edu (Nick Matzke) Date: Thu, 15 Apr 2010 01:35:59 -0700 Subject: [Biopython-dev] Biopython devs at iEvoBio?
In-Reply-To: <4BBC189F.1030401@student.otago.ac.nz> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> <4BBBD5D9.60504@berkeley.edu> <4BBC189F.1030401@student.otago.ac.nz> Message-ID: <4BC6CFEF.40604@berkeley.edu> Hi all, Thanks for the invite David, I just registered for a lightning talk at the iEvoBio conference site: ========== Lightning Talk: Biopython (bio) Geography Module Nicholas J. Matzke matzke at berkeley.edu Department of Integrative Biology University of California, Berkeley Abstract: For Google Summer of Code 2009/NESCENT Phyloinformatics Summer of Code 2009, I built a Geography module for Biopython. The purpose of the module is to search, download, and process biogeographical data from GBIF, much as Biopython currently accesses Genbank. Application of the tool to a historical biogeography study on bivalves will be illustrated. ========== See everyone there! PS I have family in Portland and used to work there so if anyone needs bar suggestions I might be able to help... Cheers, Nick david winter wrote: > Ok, > > So Nick will be there, Eric hopes not to be ;) > > I sent an email to the organizsng committee about different the talk > categories. Based on their reply, and the fact that both Nick and I are > going to be focused on our talks at the Evolution Meetings (ie, flying > halfway around the world to present a 12min talk) it seems the best way > to go would be have a lightening talk on each of the GSoC projects. > > Eric, presuming you go to BOSC and not iEvoBio I'll get in touch with > you at some stage with an outline of a talk and you can help me whip it > into shape. > > Cheers, > David > > On 4/7/2010 12:46 PM, Nick Matzke wrote: >> I'll be at the Evo meeting and the iEvoBio meeting. I'd be happy to >> give a talk as long as it's short -- I have to prioritize my research >> talk at the main meeting! >> >> Cheers! 
>> Nick >> >> >> David Winter wrote: >>> Hi again guys, >>> >>> I was wondering if anyone else is planning to go to iEvoBio >>> (http://ievobio.org/) in Portland in June.. > > -- ==================================================== Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200A Principles of Phylogenetics: Systematics http://ib.berkeley.edu/courses/ib200a/index.shtml Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: matzke at berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 ----------------------------------------------------- "[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together." Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ==================================================== From chapmanb at 50mail.com Thu Apr 15 09:43:01 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 15 Apr 2010 09:43:01 -0400 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? 
In-Reply-To: References: Message-ID: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Peter; > Would any of you be interested in presenting a talk or tutorial on > Biopython at the SciPy 2010 conference in Austin, Texas? > http://conference.scipy.org/scipy2010/index.html > > This is quite close before BOSC/ISMB 2010 kicks off in Boston > (I'm wondering if I can attend both from the UK - it would be a > busy 2 and a half week trip!): > http://www.open-bio.org/wiki/BOSC_2010 I wanted to go to SciPy this year but the timing is terrible for me with respect to BOSC. It would be really nice to have a representative there if you or anyone else is keen. Connecting with that community would be really useful since their interests definitely align; check out the top tutorial ideas: - Distributed and multi-core computing - Large data set handling - Building web-based tools Now I'm especially sad I can't make it. Hope the timing and location works for someone, Brad > ---------- Forwarded message ---------- > From: Glen Otero > Date: Wed, Apr 14, 2010 at 2:04 PM > Subject: Re: [bip] SciPy 2010 > To: Peter Cock > > > Hi Peter- > It would be great if someone from BioPython could come and present. > People are suggesting tutorial topics and voting on them > here: http://conference.scipy.org/scipy2010/tutorialsUV.html. Please > submit BioPython as a tutorial topic if you get the chance. If a > tutorial is selected, the presenter will receive $1000-$1500 that they > can put towards travel and registration. > Hope to see the BioPython project represented at SciPy this year! > Best, > Glen > On Apr 14, 2010, at 2:26 AM, Peter Cock wrote: > > Hi Glen, > > SciPy 2010 sounds great - we might be able to find someone from Biopython > to come and present, maybe even offer a tutorial. Would this be suitable? I'm > tempted to volunteer myself but would need funding to attend (from the UK).
> http://biopython.org/ > > Peter > > On Sun, Apr 11, 2010 at 5:48 AM, Glen Otero wrote: > > Hello folks- > > SciPy 2010 is rapidly approaching and will be held in Austin, TX this > year (http://conference.scipy.org/scipy2010/index.html). I'm chairing > the bioinformatics/biomedical track > (http://conference.scipy.org/scipy2010/papers.html) and welcome any > presentation suggestions from list members. > > Hope to see you there! > > Thanks! > > Glen > > _______________________________________________ > > biology-in-python mailing list - bip at lists.idyll.org. > > See http://bio.scipy.org/ for our Wiki. > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Thu Apr 15 11:03:02 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 15 Apr 2010 16:03:02 +0100 Subject: [Biopython-dev] Draft abstract for BOSC 2010 Biopython Project Update Message-ID: Hi all, I should have circulated this earlier, but here is a draft abstract for a "Biopython Project Update" talk at BOSC 2010, to be submitted *today*. http://www.open-bio.org/wiki/BOSC_2010 I'm hoping to attend BOSC again this year and give the talk, but haven't sorted out the finances - Brad has offered to present if I can't go, hence the talk author list. If anyone else wants to help with slides etc (or as a standby speaker) please let me know. This is based on the abstract from last year, included in this PDF: http://www.open-bio.org/w/images/c/c7/BOSC2009_program_20090601.pdf In the PDF version of the abstract I've made the logo smaller this time ;) Comments welcome, Thanks, Peter -- Biopython Project Update Peter Cock, Brad Chapman In this talk we present the current status of the Biopython project (www.biopython.org), described in a application note published last year (Cock et al., 2009). 
Biopython celebrated its 10th birthday last year, and has now been cited or referred to in over 150 scientific publications (a list is included on our website). At the end of 2009, following an extended evaluation period, Biopython successfully migrated from using CVS for source code control to using git, hosted on github.com. This has helped our existing developers to develop and test new features on publicly viewable branches before they are merged, and has also encouraged new contributors to work on additions or improvements. Currently about fifty people have their own Biopython repository on GitHub. In summer 2009 we had two Google Summer of Code (GSoC) project students working on phylogenetic code for Biopython in conjunction with the National Evolutionary Synthesis Center (NESCent). Eric Talevich's work on phylogenetic trees including phyloXML support (Han and Zmasek, 2009) was merged and included with Biopython 1.54, and he continues to be actively involved with Biopython. We hope to include Nick Matzke's module for biogeographical data from the Global Biodiversity Information Facility (GBIF) later this year. For summer 2010 we have Biopython related GSoC projects submitted via both NESCent and the Open Bioinformatics Foundation (OBF), and hope to have students working on Biopython once again. Since BOSC 2009, Biopython has seen four releases. Biopython 1.51 (August 2009) was an important milestone in dropping support for Python 2.3 and our legacy parsing infrastructure (Martel/Mindy), but was most noteworthy for FASTQ support (Cock et al., 2010). Biopython 1.52 (September 2009) introduced indexing of most sequence file formats for random access, and made interconverting sequence and alignment files easier. Biopython 1.53 (December 2009) included wrappers for the new NCBI BLAST+ command line tools, and much improved support for running under Jython.
Our latest release is Biopython 1.54 (April/May 2010), new features include Bio.Phylo for phylogenetic trees (GSoC project), and support for Standard Flowgram Format (SFF) files used for 454 Life Sciences (Roche) sequencing. Biopython is free open source software available from www.biopython.org under the Biopython License Agreement (an MIT style license, http://www.biopython.org/DIST/LICENSE). References Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163 Han, M.V. and Zmasek, C.M. (2009) phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics 10:356. doi:10.1186/1471-2105-10-356 Cock, P.J.A., Fields, C.J., Goto N., Heuer, M.L., and Rice, P.M. (2010) The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 38(6) 1767-71. doi:10.1093/nar/gkp1137 From p.j.a.cock at googlemail.com Fri Apr 16 09:55:35 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Apr 2010 14:55:35 +0100 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? In-Reply-To: <20100415134301.GM54921@sobchak.mgh.harvard.edu> References: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Message-ID: On Thu, Apr 15, 2010 at 2:43 PM, Brad Chapman wrote: > > I wanted to go to SciPy this year but the timing is terrible for me with > respect to BOSC. It would be really nice to have a representative there > if you or anyone else is keen. Connecting with that community would > be really useful since their interests definitely align; check out > the top tutorial ideas: > > - Distributed and multi-core computing > - Large data set handling > - Building web-based tools > > Now I'm especially sad I can't make it. 
Hope the timing and location > works for someone, > Brad > I've put up Biopython as a tutorial topic suggestion which people can vote on, and will look further at the logistics of attending. http://conference.scipy.org/scipy2010/tutorialsUV.html Also note the call for papers deadline is April 25 for the specialist tracks (we'd fall under Biomedical/bioinformatics). Peter From kanzure at gmail.com Fri Apr 16 10:42:09 2010 From: kanzure at gmail.com (Bryan Bishop) Date: Fri, 16 Apr 2010 09:42:09 -0500 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? In-Reply-To: References: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Message-ID: On Fri, Apr 16, 2010 at 8:55 AM, Peter Cock wrote: > I've put up Biopython as a tutorial topic suggestion which people can > vote on, and will look further at the logistics of attending. > http://conference.scipy.org/scipy2010/tutorialsUV.html I live in Austin, Texas and am presenting a few different python projects (like sympy, pythonOCC and okfn/datapkg). Many wonderful projects wouldn't otherwise have a presence at scipy2010.. and they certainly should! So, someone has to do it. - Bryan http://heybryan.org/ 1 512 203 0507 From p.j.a.cock at googlemail.com Fri Apr 16 10:51:58 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Apr 2010 15:51:58 +0100 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? In-Reply-To: References: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Message-ID: On Fri, Apr 16, 2010 at 3:42 PM, Bryan Bishop wrote: > On Fri, Apr 16, 2010 at 8:55 AM, Peter Cock wrote: >> I've put up Biopython as a tutorial topic suggestion which people can >> vote on, and will look further at the logistics of attending. >> http://conference.scipy.org/scipy2010/tutorialsUV.html > > I live in Austin, Texas and am presenting a few different python > projects (like sympy, pythonOCC and okfn/datapkg). Many wonderful > projects wouldn't otherwise have a presence at scipy2010..
and they > certainly should! So, someone has to do it. > > - Bryan Excellent, and very public spirited of you :) Peter From bugzilla-daemon at portal.open-bio.org Fri Apr 16 17:13:19 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 16 Apr 2010 17:13:19 -0400 Subject: [Biopython-dev] [Bug 2951] PDBParser assigns model 0 to first model no matter what... In-Reply-To: Message-ID: <201004162113.o3GLDJW0005115@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2951 kamil at kamilkisiel.net changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kamil at kamilkisiel.net ------- Comment #3 from kamil at kamilkisiel.net 2010-04-16 17:13 EST ------- I don't really see the utility of having the id field be anything else other than the actual model ID as reported in the PDB file. Typically looping in Python isn't done based on using sequence indices. It's fine if the models are in indices 0 to n-1 in the child_list member of a structure, but I think their ID member should still reflect the actual model identifier. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Apr 16 17:28:28 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 16 Apr 2010 17:28:28 -0400 Subject: [Biopython-dev] [Bug 2950] Bio.PDBIO.save writes MODEL records without model id In-Reply-To: Message-ID: <201004162128.o3GLSS01005451@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2950 kamil at kamilkisiel.net changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kamil at kamilkisiel.net -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From updates at feedmyinbox.com Sat Apr 17 02:12:26 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sat, 17 Apr 2010 02:12:26 -0400 Subject: [Biopython-dev] 4/17 BioStar - Biopython Questions Message-ID: <085750679879c59b64a2f0e534328b64@74.63.51.88> ================================================== 1. Does Biopython parse blast -m8 or -m9 (aka blasttable)? ================================================== April 16, 2010 at 10:54 AM I am getting the following error when I try -m9 File "parseBlast.biopython.py", line 5, in ? blast_record = blast_parser.parse(result_handle) File "/src/biopython-1.52/build/lib.linux-x86_64-2.4/Bio/Blast/NCBIStandalone.py", line 763, in parse self._scanner.feed(handle, self._consumer) File "/src/biopython-1.52/build/lib.linux-x86_64-2.4/Bio/Blast/NCBIStandalone.py", line 96, in feed self._scan_header(uhandle, consumer) File "/src/biopython-1.52/build/lib.linux-x86_64-2.4/Bio/Blast/NCBIStandalone.py", line 213, in _scan_header raise ValueError("Invalid header?") ValueError: Invalid header? 
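The traceback above appears to come from the plain-text report parser, which is looking for a report header that tabular output does not have. Tabular output is simple enough to read directly, so here is a minimal sketch of doing that without Biopython. It assumes the twelve standard tab-separated columns of legacy BLAST -m8 output, and that -m9 differs only by adding comment lines starting with '#'. The function name parse_blast_tabular and the field labels are made up for this illustration; they are not a Biopython API.

```python
import io

# Illustrative labels for the twelve standard tabular columns (an assumption
# of this sketch, not names defined by BLAST or Biopython).
FIELDS = ["query", "subject", "pct_identity", "aln_length", "mismatches",
          "gap_opens", "q_start", "q_end", "s_start", "s_end",
          "evalue", "bit_score"]
INT_FIELDS = ("aln_length", "mismatches", "gap_opens",
              "q_start", "q_end", "s_start", "s_end")
FLOAT_FIELDS = ("pct_identity", "evalue", "bit_score")


def parse_blast_tabular(handle):
    """Yield one dict per hit line, skipping blank and '#' comment lines."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue  # -m9 style comment or empty line
        parts = line.split("\t")
        if len(parts) != len(FIELDS):
            raise ValueError("Expected %i columns, found %i"
                             % (len(FIELDS), len(parts)))
        hit = dict(zip(FIELDS, parts))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        yield hit


# Made-up example data, one comment line plus one hit line.
example = ("# Fields: Query id, Subject id, ...\n"
           "q1\ts1\t98.50\t200\t3\t0\t1\t200\t5\t204\t1e-100\t350\n")
hits = list(parse_blast_tabular(io.StringIO(example)))
```

Each hit is a plain dict, so filtering by e-value or bit score is an ordinary list comprehension over the result.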
http://biostar.stackexchange.com/questions/760/does-biopython-parse-blast-m8-or-m9-aka-blasttable -------------------------------------------------- =========================================================== Source: http://biostar.stackexchange.com/questions/tagged/biopython This email was sent to biopython-dev at lists.open-bio.org. Account Login: https://www.feedmyinbox.com/members/login/ Don't want to receive this feed any longer? Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/311791/6ca55937c6ac7ef56420a858404addee7b17d3e7/ ----------------------------------------------------------- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From eric.talevich at gmail.com Sat Apr 17 09:35:57 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 17 Apr 2010 09:35:57 -0400 Subject: [Biopython-dev] Bio.Phylo: the home stretch Message-ID: Hi all, There are two more decisions in Bio.Phylo that I'd like to settle on before the release of Biopython 1.54. They're holding open Bug 3045: http://bugzilla.open-bio.org/show_bug.cgi?id=3045 1. *Do we need a get_all_clades() method on trees and clades?* Bio.Nexus has get_terminals(); I added the same to Bio.Phylo early on, and then get_nonterminals() to satisfy some demand for the opposite method:

    def get_terminals(self, order='preorder'):
        """Get a list of all of this tree's terminal (leaf) nodes."""
        return list(self.find_clades(terminal=True, order=order))

    def get_nonterminals(self, order='preorder'):
        """Get a list of all of this tree's nonterminal (internal) nodes."""
        return list(self.find_clades(terminal=False, order=order))

They're both trivial, but the idea is to make the module easy to jump into without reading the docs first. (find_clades() is a generator function that several other functions use internally; to do useful things in Bio.Phylo you still need to learn how to use it eventually.)
So (a) do we need yet another sugar function that retrieves all tree nodes, both internal and external? (b) if so, what should it be called? The implementation would be: list(self.find_clades(order=order)) Also accomplished as: tree.get_terminals() + tree.get_nonterminals() 2. *Rename find_clades() to find(), or something else?* I've previously renamed: find() => find_any() -- given the same parameters as find_clades(), return the first match found, or else None (useful in an if statement) find_all() => find_elements() -- phyloXML trees have some complex objects as tree attributes, containing other objects. This function searches for those directly, and for trees without such attributes (e.g. all Newick trees), this happens to be the same as find_clades() So: find_clades() can search inside complex objects attached to trees, but yields the corresponding clade object rather than the non-clade element itself. This lets you search clades by e.g. clade.taxonomy.scientific_name, or clade.sequence.type. It should be the first "find_*" function users reach for. Should we give it a shorter name to encourage that, and shorten the code that uses it? 
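To make the two questions concrete, here is a toy sketch of the pattern under discussion: one generator doing the traversal, with thin list-returning wrappers on top. The Clade class below is a stand-in written for this illustration and is not the Bio.Phylo implementation; it only supports the terminal and order arguments, and its find_any is a simplified guess at the behaviour described above (first match, or else None).

```python
class Clade(object):
    """Toy tree node for illustration only (not Bio.Phylo's class)."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def is_terminal(self):
        return not self.clades

    def find_clades(self, terminal=None, order="preorder"):
        """Generate this clade and its descendants.

        terminal=True yields leaves only, terminal=False yields internal
        nodes only, and terminal=None yields every node.
        """
        if order not in ("preorder", "postorder"):
            raise ValueError("Unsupported order: %r" % order)
        matches = terminal is None or terminal == self.is_terminal()
        if order == "preorder" and matches:
            yield self
        for child in self.clades:
            for found in child.find_clades(terminal, order):
                yield found
        if order == "postorder" and matches:
            yield self

    def find_any(self, **kwargs):
        """Return the first match, or None (handy in an if statement)."""
        return next(self.find_clades(**kwargs), None)

    def get_terminals(self, order="preorder"):
        return list(self.find_clades(terminal=True, order=order))

    def get_nonterminals(self, order="preorder"):
        return list(self.find_clades(terminal=False, order=order))


# The tree ((A,B)ab,C)root
root = Clade("root", [Clade("ab", [Clade("A"), Clade("B")]), Clade("C")])
all_names = [c.name for c in root.find_clades()]
combined = [c.name for c in root.get_terminals() + root.get_nonterminals()]
```

In this toy, list(root.find_clades()) and get_terminals() + get_nonterminals() contain the same nodes but in a different order, which may be worth bearing in mind when deciding whether a separate "all clades" helper is needed.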
Here's a first crack at documentation: http://github.com/etal/biopython/commit/8056a198804a08e3e03ac943c45744ad020dd53f Thanks, Eric From bugzilla-daemon at portal.open-bio.org Mon Apr 19 12:52:52 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 Apr 2010 12:52:52 -0400 Subject: [Biopython-dev] [Bug 3059] New: PDBContructionException should be PDBConstructionException Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3059 Summary: PDBContructionException should be PDBConstructionException Product: Biopython Version: 1.54b Platform: PC URL: http://github.com/biopython/biopython/commit/dead6ab7704abc760d3bd13f09f8036d75e7516b OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: kamil at kamilkisiel.net I noticed that part of this code was fixed, but obviously nobody verified the fix at all because there is an obvious typo in the name of the Exception type. The name is "PDBConstructionException" (note the s, which is missing in the code...) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Apr 19 18:28:16 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 Apr 2010 18:28:16 -0400 Subject: [Biopython-dev] [Bug 3059] PDBContructionException should be PDBConstructionException In-Reply-To: Message-ID: <201004192228.o3JMSGe2005614@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3059 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-19 18:28 EST ------- Good point - I thought I'd rerun pylint and it was happy. Odd. I've fixed it properly and added a unit test for this now as well: http://github.com/biopython/biopython/commit/ed22f3ac17d910cf1956c2be1a9aec9f6e3125a4 Thanks. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 21 12:07:51 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 Apr 2010 12:07:51 -0400 Subject: [Biopython-dev] [Bug 3060] New: Add ungap method to the SeqRecord? Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3060 Summary: Add ungap method to the SeqRecord? Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Biopython 1.53 added an ungap method to the Seq object. This is a possible enhancement request to add a matching ungap method to the SeqRecord object, where the per-letter-annotation and features should be adjusted to match. 
My motivating example is to take an ACE file loaded with SeqIO, remove the gaps, and output the contigs as FASTQ or QUAL files. This requires the per-letter-annotation to be sliced to match the ungapped sequence. Likewise any features fully contained within ungapped regions should be retained and their co-ordinates shifted. I'm not sure if we should do anything about features spanning a gap - the simple option which I have implemented is they are lost. This is done via the existing SeqRecord slicing and addition code. Patch to follow... See also Bug 3054 for adding upper and lower methods to the SeqRecord, and the broader discussion on Bug 2351 about strings, Seq and SeqRecord objects. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 21 12:09:01 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 Apr 2010 12:09:01 -0400 Subject: [Biopython-dev] [Bug 3060] Add ungap method to the SeqRecord? In-Reply-To: Message-ID: <201004211609.o3LG91oZ025848@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3060 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-21 12:09 EST ------- Created an attachment (id=1482) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1482&action=view) Patch to Bio/SeqRecord.py to add ungap method This includes a basic doctest, and some debug checks (assert statements) which could be removed after more testing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
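As a rough illustration of the behaviour described in Bug 3060, here is a sketch using plain strings and lists rather than SeqRecord objects. It is not the attached patch (which is described as reusing SeqRecord slicing and addition); the function names ungap_with_annotation and shift_feature are hypothetical, and '-' as the gap character is an assumption of the example.

```python
def ungap_with_annotation(seq, qualities, gap="-"):
    """Drop gap positions from a sequence and its per-letter values."""
    if len(seq) != len(qualities):
        raise ValueError("Sequence and annotation lengths differ")
    kept = [(base, q) for base, q in zip(seq, qualities) if base != gap]
    new_seq = "".join(base for base, _ in kept)
    new_qualities = [q for _, q in kept]
    return new_seq, new_qualities


def shift_feature(start, end, seq, gap="-"):
    """Map a feature [start, end) on the gapped sequence to ungapped coords.

    Returns None if the feature contains a gap, mirroring the simple
    option in the report where such features are lost.
    """
    if gap in seq[start:end]:
        return None
    offset = seq[:start].count(gap)  # gaps before the feature shift it left
    return start - offset, end - offset


# Made-up example data.
gapped = "AC--GT-A"
quals = [30, 31, 0, 0, 32, 33, 0, 34]
ungapped, new_quals = ungap_with_annotation(gapped, quals)
kept_feature = shift_feature(4, 6, gapped)   # covers "GT", no gap inside
lost_feature = shift_feature(1, 5, gapped)   # covers "C--G", spans a gap
```

The per-letter values stay aligned with the letters because both are filtered by the same positions, which is the property needed before writing FASTQ or QUAL output.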
From bugzilla-daemon at portal.open-bio.org Thu Apr 22 09:56:48 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 09:56:48 -0400 Subject: [Biopython-dev] [Bug 3062] New: GenBank/EMBL parser breaks when features have no qualifiers Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3062 Summary: GenBank/EMBL parser breaks when features have no qualifiers Product: Biopython Version: 1.54b Platform: All OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu I am trying to use the EMBL parser to parse the IMGT/LIGM flatfile. Whenever there is a feature, the parser checks whether there are qualifiers in the feature with an assert statement, and does not allow features with no qualifiers. However, the EMBL specification does not require features to have qualifiers, and the IMGT flatfile is full of entries that have features with no qualifiers (only coordinates). The assertion error is tracked to an assert statement in Scanner.py at line 269. It appears that the assumption in the code is that there is an unquoted continuation of a feature qualifier, rather than a feature with no qualifiers. I am using biopython 1.51 that I built from source using python 2.5 (from an EPD install 4.3.0). I am on a Mac running OS X 10.5.8 (Leopard). Peter mentioned that the problem in the code is still present in the 1.54b release, and also in the repository. To reproduce the problem, the parser broke on the following record (the traceback is below as well): ID A03907 IMGT/LIGM annotation : keyword level; unassigned DNA; HUM; 412 BP. XX AC A03907; XX DT 11-MAR-1998 (Rel. 8, arrived in LIGM-DB ) DT 10-JUN-2008 (Rel. 200824-2, Last updated, Version 3) XX DE H.sapiens antibody D1.3 variable region protein ; DE unassigned DNA; rearranged configuration; Ig-Heavy; regular; group IGHV. 
XX KW antigen receptor; Immunoglobulin superfamily (IgSF); KW Immunoglobulin (IG); IG-Heavy; variable; diversity; joining; KW rearranged. XX OS Homo sapiens (human) OC cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; OC Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; OC Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; OC Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Primates; OC Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; OC Homo/Pan/Gorilla group; Homo. XX RN [1] RP 1-412 RA ; RT "Recombinant antibodies and methods for their production."; RL Patent number EP0239400-A/10, 30-SEP-1987. RL MEDICAL RESEARCH COUNCIL. XX DR EMBL; A03907. XX FH Key Location/Qualifiers (from EMBL) FH FT source 1..412 FT /organism="Homo sapiens" FT /mol_type="unassigned DNA" FT /db_xref="taxon:9606" FT V_region 8..>412 FT /note="antibody D1.3 V region" FT sig_peptide 8..64 FT CDS 8..>412 FT /product="antibody D1.3 V region (VDJ)" FT /protein_id="CAA00308.1" FT /translation="MAVLALLFCLVTFPSCILSQVQLKESGPGLVAPSQSLSITCTVSG FT FSLTGYGVNWVRQPPGKGLEWLGMIWGDGNTDYNSALKSRLSISKDNSKSQVFLKMNSL FT HTDDTARYYCARERDYRLDYWGQGTTLTVSS" FT D_segment 356..371 FT J_segment 372..>412 FT /note="J(H)2 region" XX SQ Sequence 412 BP; 105 A; 109 C; 104 G; 94 T; 0 other; tcagagcatg gctgtcctgg cattactctt ctgcctggta acattcccaa gctgtatcct 60 ttcccaggtg cagctgaagg agtcaggacc tggcctggtg gcgccctcac agagcctgtc 120 catcacatgc accgtctcag ggttctcatt aaccggctat ggtgtaaact gggttcgcca 180 gcctccagga aagggtctgg agtggctggg aatgatttgg ggtgatggaa acacagacta 240 taattcagct ctcaaatcca gactgagcat cagcaaggac aactccaaga gccaagtttt 300 cttaaaaatg aacagtctgc acactgatga cacagccagg tactactgtg ccagagagag 360 agattatagg cttgactact ggggccaagg caccactctc acagtctcct ca 412 // And the traceback was: ERROR: An unexpected error occurred while tokenizing input The following traceback may be corrupted or invalid The error message is: ('EOF in multi-line statement', 
(311, 0)) --------------------------------------------------------------------------- AssertionError Traceback (most recent call last) /Volumes/External/home/laserson/research/church/vdj-ome/ref-data/IMGT/ in () /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse_records(self, handle, do_features) 418 #This is a generator function 419 while True : --> 420 record = self.parse(handle, do_features) 421 if record is None : break 422 assert record.id is not None /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse(self, handle, do_features) 401 feature_cleaner = FeatureValueCleaner()) 402 --> 403 if self.feed(handle, consumer, do_features) : 404 return consumer.data 405 else : /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in feed(self, handle, consumer, do_features) 373 #Features (common to both EMBL and GenBank): 374 if do_features : --> 375 self._feed_feature_table(consumer, self.parse_features(skip=False)) 376 else : 377 self.parse_features(skip=True) # ignore the data /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse_features(self, skip) 170 feature_lines.append(line[self.FEATURE_QUALIFIER_INDENT:].rstrip()) 171 line = self.handle.readline() --> 172 features.append(self.parse_feature(feature_key, feature_lines)) 173 self.line = line 174 return features /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse_feature(self, feature_key, lines) 267 else : 268 #Unquoted continuation --> 269 assert len(qualifiers) > 0 270 assert key==qualifiers[-1][0] 271 #if debug : print "Unquoted Cont %s:%s" % (key, line) AssertionError: Thanks! 
Uri -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 22 10:00:12 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 10:00:12 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004221400.o3ME0C4b008129@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #1 from laserson at mit.edu 2010-04-22 10:00 EST ------- Created an attachment (id=1483) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1483&action=view) IMGT record that breaks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 22 10:00:53 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 10:00:53 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004221400.o3ME0r8H008158@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #2 from laserson at mit.edu 2010-04-22 10:00 EST ------- I added a text file with the IMGT record that breaks, as pasting it into the description messed it up. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
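To illustrate the case in Bug 3062, here is a toy sketch of reading EMBL style FT lines where a feature may have a location but no qualifiers. It is not the Scanner.py code and not a proposed fix; it assumes the usual EMBL layout (feature key starting in column 6, location and qualifiers starting in column 22), joins continuation lines without inserting spaces, and strips quotes naively. The names parse_ft_lines and ft are made up for the example.

```python
def parse_ft_lines(lines):
    """Return a list of dicts with 'key', 'location' and 'qualifiers'.

    A non-blank key column always starts a new feature, so a feature
    followed directly by another feature simply keeps an empty
    qualifier list.
    """
    features = []
    for line in lines:
        if not line.startswith("FT"):
            continue
        key = line[5:21].strip()
        rest = line[21:].strip()
        if key:
            features.append({"key": key, "location": rest, "qualifiers": []})
        elif not features:
            raise ValueError("Continuation line before any feature key")
        elif rest.startswith("/"):
            name, _, value = rest[1:].partition("=")
            features[-1]["qualifiers"].append([name, value.strip('"')])
        elif features[-1]["qualifiers"]:
            # Continuation of the previous qualifier's value.
            features[-1]["qualifiers"][-1][1] += rest.strip('"')
        else:
            # No qualifiers yet, so treat this as a multi-line location.
            features[-1]["location"] += rest
    return features


def ft(key, text):
    """Helper for building example lines with the assumed column layout."""
    return "FT   %-16s%s" % (key, text)


# A cut-down version of the features in the record above.
example = [
    ft("V_region", "8..>412"),
    ft("", '/note="antibody D1.3 V region"'),
    ft("sig_peptide", "8..64"),
    ft("CDS", "8..>412"),
    ft("", '/protein_id="CAA00308.1"'),
    ft("D_segment", "356..371"),
]
parsed = parse_ft_lines(example)
```

Here sig_peptide and D_segment come back with empty qualifier lists rather than triggering an error, which is the behaviour the report asks for.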
From bugzilla-daemon at portal.open-bio.org Thu Apr 22 10:05:07 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 10:05:07 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004221405.o3ME572A008381@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 laserson at mit.edu changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |major OS/Version|Mac OS |All -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 23 07:16:48 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 Apr 2010 07:16:48 -0400 Subject: [Biopython-dev] [Bug 3009] Check the FASTA m10 alignment parser works with FASTA36 In-Reply-To: Message-ID: <201004231116.o3NBGmC1016610@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3009 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-23 07:16 EST ------- There seems to be a bug in FASTA 36.1.2 m10 output where the pg_optcut line lacks its leading semi-colon: The following example command lines should illustrate the problem, using the following input fasta files from the NCBI, both are relatively small with three and 180 sequences each: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O157H7/NC_002127.faa ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Klebsiella_pneumoniae_MGH_78578/NC_009649.faa $ ~/Downloads/Software/fasta-36.2.1/bin/fasta36 -Q -H -E 1 -m 10 NC_002127.faa NC_009649.faa > stdout.txt $ more stdout.txt ... [cut] ... 
; pg_name_alg: FASTA ; pg_ver_rel: 3.7 Mar 2010 ; pg_matrix: BL50 (15:-5) ; pg_open-ext: -10 -2 ; pg_ktup: 2 ; pg_join: 42 pg_optcut: 30 ; mp_extrap: 60000 500 ; mp_stats: (shuffled [500]) Expectation_n fit: rho(ln(x))= 5.1864+/-0.0116; mu= 5.3472+/- 0.598 mean_var=54.6263+/-12.288, 0's: 0 Z-trim: 0 B-trim: 0 in 0/33 Lambda= 0.173529 ; mp_KS: -0.0000 (N=0) at 20 ; mp_Algorithm: FASTA (3.7 Mar 2010) [optimized] ... [cut] This breaks the Bio.AlignIO parser. Manually editing the file to insert the semi colons seems to fix things. I have reported this issue on the FASTA mailing list today: https://list.mail.virginia.edu/mailman/listinfo/fasta_list -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From elipapa at mit.edu Sun Apr 25 19:09:51 2010 From: elipapa at mit.edu (Eli Papa) Date: Mon, 26 Apr 2010 00:09:51 +0100 Subject: [Biopython-dev] GFF parser bug? Message-ID: Hello, While trying to use the GFF parser I ran into a value error. I think it's probably due to one of the GFF3 fields in my file not being specified as 'key=value', but just as 'value'. 
Hope this helps, eli In [1]: from BCBio.GFF import GFFExaminer In [2]: import pprint In [3]: in_file = "V1.UC-9.scaftig.more500.gff" In [4]: examiner = GFFExaminer() In [5]: in_handle = open(in_file) In [6]: pprint.pprint(examiner.parent_child_map(in_handle)) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /data/elipapa/gutmetahit/SingleSample_GenePrediction/ in () /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in _file_or_handle_inside(*args, **kwargs) 705 in_handle = open(in_file) 706 args = (args[0], in_handle) + args[2:] --> 707 out = fn(*args, **kwargs) 708 if need_close: 709 in_handle.close() /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in parent_child_map(self, gff_handle) 789 if line.strip(): 790 line_type, line_info = _gff_line_map(line, --> 791 self._get_local_params())[0] 792 if (line_type == 'parent' or (line_type == 'child' and 793 line_info['id'])): /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in _gff_line_map(line, params) 158 # collect all of the base qualifiers for this item 159 if len(parts) > 8: --> 160 quals, is_gff2 = _split_keyvals(gff_parts[8]) 161 else: 162 quals, is_gff2 = dict(), False /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in _split_keyvals(keyval_str) 84 pieces.append(p.strip().split(" ")) 85 key_vals = [(p[0], " ".join(p[1:])) for p in pieces] ---> 86 for key, val in key_vals: 87 # remove quotes in GFF2 files 88 if (len(val) > 0 and val[0] == '"' and val[-1] == '"'): ValueError: need more than 1 value to unpack ******* The gff file is as follows: ##gff-version 3 ##sequence-region scaffold4215_3 1 6526 scaffold4215_3 glimmer gene 3 62 . - . ID=GL0000006;Name=GL0000006;Lack 3'-end; scaffold4215_3 glimmer mRNA 3 62 . - . 
ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end;
scaffold4215_3 glimmer CDS 3 62 2.84 - 0 Parent=GL0000006;Lack 3'-end;
scaffold4215_3 glimmer gene 124 1983 . - . ID=GL0000007;Name=GL0000007;Complete;
[...]

From biopython at maubp.freeserve.co.uk Mon Apr 26 05:43:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 26 Apr 2010 10:43:53 +0100
Subject: [Biopython-dev] GFF parser bug?
In-Reply-To:
References:
Message-ID:

On Mon, Apr 26, 2010 at 12:09 AM, Eli Papa wrote:
> Hello,
>
> While trying to use the GFF parser I ran into a value error.
>
> I think it's probably due to one of the GFF3 fields in my file not being
> specified as 'key=value', but just as 'value'.
>
> Hope this helps,
> eli
>
> ...
> The gff file is as follows:
>
> ##gff-version 3
> ##sequence-region scaffold4215_3 1 6526
> scaffold4215_3  glimmer gene    3       62      .       -       .
>  ID=GL0000006;Name=GL0000006;Lack 3'-end;
> scaffold4215_3  glimmer mRNA    3       62      .       -       .
>  ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end;
> scaffold4215_3  glimmer CDS     3       62      2.84    -       0
>  Parent=GL0000006;Lack 3'-end;
> scaffold4215_3  glimmer gene    124     1983    .       -       .
>  ID=GL0000007;Name=GL0000007;Complete;
> [...]

Hi Eli,

Where did this GFF3 file come from? The final column looks invalid
to me (it should be a list of key=value; statements). The specification
seems quite clear on this:
http://www.sequenceontology.org/gff3.shtml

Regards,

Peter

From biopython at maubp.freeserve.co.uk Mon Apr 26 06:59:25 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 26 Apr 2010 11:59:25 +0100
Subject: [Biopython-dev] Bio.Phylo: the home stretch
In-Reply-To:
References:
Message-ID:

On Sat, Apr 17, 2010 at 2:35 PM, Eric Talevich wrote:
> Hi all,
>
> There are two more decisions in Bio.Phylo that I'd like to settle on before
> the release of Biopython 1.54.
They're holding open Bug 3045:
> http://bugzilla.open-bio.org/show_bug.cgi?id=3045

Sorry I didn't get round to this last week.

> 1. *Do we need a get_all_clades() method on trees and clades?*
>
> Bio.Nexus has get_terminals(); I added the same to Bio.Phylo early on, and
> then get_nonterminals() to satisfy some demand for the opposite method:
>
>    def get_terminals(self, order='preorder'):
>        """Get a list of all of this tree's terminal (leaf) nodes."""
>        return list(self.find_clades(terminal=True, order=order))
>
>    def get_nonterminals(self, order='preorder'):
>        """Get a list of all of this tree's nonterminal (internal) nodes."""
>        return list(self.find_clades(terminal=False, order=order))
>
> They're both trivial, but the idea is to make the module easy to jump into
> without reading the docs first. (find_clades() is a generator function that
> several other functions use internally; to do useful things in Bio.Phylo you
> still need to learn how to use it eventually.)
>
> So (a) do we need yet another sugar function that retrieves all tree nodes,
> both internal and external? (b) if so, what should it be called?
>
> The implementation would be:    list(self.find_clades(order=order))
> Also accomplished as:    tree.get_terminals() + tree.get_nonterminals()

I'd say no, we don't need it. You can always add it later, but removing
something from the API is complicated with deprecations etc.

> 2. *Rename find_clades() to find(), or something else?*
>
> I've previously renamed:
>
> find() => find_any()
> -- given the same parameters as find_clades(), return the first match found,
> or else None (useful in an if statement)
>
> find_all() => find_elements()
> -- phyloXML trees have some complex objects as tree attributes, containing
> other objects. This function searches for those directly, and for trees
> without such attributes (e.g.
all Newick trees), this happens to be the same > as find_clades() > > So: find_clades() can search inside complex objects attached to trees, but > yields the corresponding clade object rather than the non-clade element > itself. This lets you search clades by e.g. clade.taxonomy.scientific_name, > or clade.sequence.type. It should be the first "find_*" function users reach > for. Should we give it a shorter name to encourage that, and shorten the > code that uses it? Hmm. I think find_clades() is sensible. > Here's a first crack at documentation: > http://github.com/etal/biopython/commit/8056a198804a08e3e03ac943c45744ad020dd53f There is a very short tree example in the Alignment chapter section on Clustalw using Bio.Nexus.Trees - we should just replace that with "See Chapter X" on loading and manipulating trees. Peter From chapmanb at 50mail.com Mon Apr 26 07:56:01 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 26 Apr 2010 07:56:01 -0400 Subject: [Biopython-dev] GFF parser bug? In-Reply-To: References: Message-ID: <20100426115601.GE58289@sobchak.mgh.harvard.edu> Eli; > While trying to use the GFF parser I ran into a value error. > > I think it's probably due to one of the GFF3 fields in my file not being > specified as 'key=value', but just as 'value'. Thanks for the report. Oh boy, that's a pretty bad file. In addition to the lack of a value you brought up, there is also a Parent/Child reference problem. The second line in the GFF you sent contains two issues: - A duplicate ID value for GL0000006. ID values are supposed to be unique in a file. - The Parent=GL0000006 should be a reference to the initial gene with that ID, but is also refers to itself. > scaffold4215_3 glimmer gene 3 62 . - . ID=GL0000006;Name=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer mRNA 3 62 . - . ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer CDS 3 62 2.84 - 0 Parent=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer gene 124 1983 . - . 
ID=GL0000007;Name=GL0000007;Complete; As Peter mentioned it would be useful to also file a bug with the writers of the software that are producing this. Bringing it in line with the spec will allow it to be more widely handled by other GFF parsers. You can get a fixed version of the GFF parser that gracefully handles these issues at: http://github.com/chapmanb/bcbb/tree/master/gff/ or apply the changes to GFFParser directly: http://github.com/chapmanb/bcbb/commit/c530dc1b7d1d6b8b4df211849f969adf4df80a67 Thanks much for the report. Let us know if you have any other issues, Brad From bugzilla-daemon at portal.open-bio.org Mon Apr 26 09:10:32 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 09:10:32 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004261310.o3QDAWxg018128@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from eric.talevich at gmail.com 2010-04-26 09:10 EST ------- I think we've taken care of everything we planned to for this bug -- added get_nonterminals(), decided against get_all_clades(), and resolved to convert Joel's examples to cookbook entries at some point (not blocking the 1.54 release). Discussion: http://lists.open-bio.org/pipermail/biopython-dev/2010-April/007654.html So, I'm marking this fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From elipapa at mit.edu Mon Apr 26 13:37:11 2010 From: elipapa at mit.edu (Eli Papa) Date: Mon, 26 Apr 2010 18:37:11 +0100 Subject: [Biopython-dev] GFF parser bug? 
In-Reply-To: <20100426115601.GE58289@sobchak.mgh.harvard.edu>
References: <20100426115601.GE58289@sobchak.mgh.harvard.edu>
Message-ID:

Hi Brad,

Thanks for the quick reply! Hopefully, I'll be able to reciprocate in the future.

The fix appears to work flawlessly so far, but I'll let you know if it gives me other problems.

Unfortunately I have no control over the GFF (it was released to the public as part of a published study). It's unfortunately not clear from the methods section whether they have employed Glimmer, MetaGene or some custom script to put the file together. When I have some extra time, I'll certainly test which of these programs is the culprit and let the author know about the non-standard output format.

cheers,
eli

On Mon, Apr 26, 2010 at 12:56 PM, Brad Chapman wrote:
> Eli;
>
>> While trying to use the GFF parser I ran into a value error.
>>
>> I think it's probably due to one of the GFF3 fields in my file not being
>> specified as 'key=value', but just as 'value'.
>
> Thanks for the report. Oh boy, that's a pretty bad file. In addition
> to the lack of a value you brought up, there is also a Parent/Child
> reference problem. The second line in the GFF you sent contains two
> issues:
>
> - A duplicate ID value for GL0000006. ID values are supposed to be
>   unique in a file.
> - The Parent=GL0000006 should be a reference to the initial
>   gene with that ID, but is also refers to itself.
>
>> scaffold4215_3  glimmer gene    3       62      .       -       . ID=GL0000006;Name=GL0000006;Lack 3'-end;
>> scaffold4215_3  glimmer mRNA    3       62      .       -       . ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end;
>> scaffold4215_3  glimmer CDS     3       62      2.84    -       0 Parent=GL0000006;Lack 3'-end;
>> scaffold4215_3  glimmer gene    124     1983    .       -       . ID=GL0000007;Name=GL0000007;Complete;
>
> As Peter mentioned it would be useful to also file a bug with the
> writers of the software that are producing this.
Bringing it in line > with the spec will allow it to be more widely handled by other GFF > parsers. > > You can get a fixed version of the GFF parser that gracefully > handles these issues at: > > http://github.com/chapmanb/bcbb/tree/master/gff/ > > or apply the changes to GFFParser directly: > > http://github.com/chapmanb/bcbb/commit/c530dc1b7d1d6b8b4df211849f969adf4df80a67 > > Thanks much for the report. Let us know if you have any other > issues, > Brad > From bugzilla-daemon at portal.open-bio.org Mon Apr 26 13:52:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 13:52:43 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004261752.o3QHqhgZ027348@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-26 13:52 EST ------- I've just tried the file on attachment 1843 on Mac and Linux and it parses fine (using the latest Biopython code). However, I was sure I was able to reproduce this earlier (on Linux), but I forget now where I got the example file from (this was before Uri uploaded this attachment). I've been using this (and variants of this): from Bio import SeqIO record = SeqIO.read(open("A03907.embl"), "embl") In any case, the assert check looks sensible - the method parse_feature should be given a single feature, so any error is happening further up - probably in the parse_features method. I'm confused right now. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From p.j.a.cock at googlemail.com Mon Apr 26 18:30:54 2010
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 26 Apr 2010 23:30:54 +0100
Subject: [Biopython-dev] Google Summer of Code - accepted students
In-Reply-To: <4BD60D63.1040400@cornell.edu>
References: <4BD60D63.1040400@cornell.edu>
Message-ID:

---------- Forwarded message ----------
From: Robert Buels
Date: Mon, Apr 26, 2010 at 11:02 PM
Subject: Google Summer of Code - accepted students
To: rmb32 at cornell.edu

Hi all,

I'm pleased to announce the acceptance of OBF's 2010 Google Summer of Code students, listed in alphabetical order with their project titles and primary mentors:

Mark Chapman (PM Andreas Prlic) - Improvements to BioJava including Implementation of Multiple Sequence Alignment Algorithms

Jianjiong Gao (PM Peter Rose) - BioJava Packages for Identification, Classification, and Visualization of Posttranslational Modification of Proteins

Kazuhiro Hayashi (PM Naohisa Goto) - Ruby 1.9.2 support of BioRuby

Sara Rayburn (PM Christian Zmasek) - Implementing Speciation & Duplication Inference Algorithm for Binary and Non-binary Species Tree

Joao Pedro Garcia Lopes Maia Rodrigues (PM Eric Talevich) - Extending Bio.PDB: broadening the usefulness of BioPython's Structural Biology module

Jun Yin (PM Chris Fields) - BioPerl Alignment Subsystem Refactoring

Congratulations to our accepted students!

All told, we had 52 applications submitted for the 6 slots (5 originally assigned, plus 1 extra) allotted to us by Google. Proposals were extremely competitive: 6 out of 52 translates to an 11.5% acceptance rate. We received a lot of really excellent proposals, the decisions were not easy. Thanks very much to all the students who applied, we very much appreciate your hard work.

Here's to a great 2010 Summer of Code, I'm sure these students will do some wonderful work.
Rob Buels
OBF GSoC 2010 Administrator

From bugzilla-daemon at portal.open-bio.org Mon Apr 26 19:44:15 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 26 Apr 2010 19:44:15 -0400
Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers
In-Reply-To:
Message-ID: <201004262344.o3QNiFCr003594@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3062

------- Comment #4 from laserson at mit.edu 2010-04-26 19:44 EST -------
I did something stupid, and uploaded the wrong IMGT record. I will upload the actual offending record. However, after stepping through the code with pdb, it appears that the problem with the offending record is that the feature qualifiers are indented too far, so that the whitespace is not fully stripped off.

Has it ever been considered to parse the features by breaking the line with split(), instead of hardcoding the number of columns? While the official EMBL specification may hardcode the size of the fields, the parser may be more robust to such errors. (Though I understand the desire to conform exactly to EMBL standards). Either way, I will notify the curators of the IMGT database.

(And see the attached file with the offending record.)

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Apr 26 19:47:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 19:47:37 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004262347.o3QNlb34003630@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 laserson at mit.edu changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1483 is|0 |1 obsolete| | ------- Comment #5 from laserson at mit.edu 2010-04-26 19:47 EST ------- Created an attachment (id=1489) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1489&action=view) IMGT record that actually breaks. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Apr 26 20:14:59 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 20:14:59 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004270014.o3R0Exew004936@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #6 from laserson at mit.edu 2010-04-26 20:14 EST ------- Alternatively, an additional lstrip() call for each line in lines in parse_feature() would probably also solve the problem. What are reasons not to do this? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
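The lstrip() idea from comment #6 can be sketched in a few lines. This is only an illustration under stated assumptions, not the Biopython code: QUALIFIER_INDENT (21 columns) and qualifier_text() are hypothetical stand-ins for the scanner's fixed qualifier indent and its slicing step, and the two input lines are made up. It presumes, as the traceback suggests, that a line is only treated as a new qualifier when the sliced text starts with "/".

```python
# Assumed column at which qualifier text starts in an EMBL feature table.
# Hypothetical stand-in for the scanner's fixed indent, not taken from the thread.
QUALIFIER_INDENT = 21


def qualifier_text(line, strip_leading=False):
    """Cut the fixed-width prefix from one feature-table line."""
    text = line[QUALIFIER_INDENT:].rstrip()
    if strip_leading:
        # The extra lstrip() proposed in comment #6.
        text = text.lstrip()
    return text


# Made-up lines: one in-spec, one indented four columns too far.
in_spec = "FT" + " " * 19 + '/gene="example"'
over_indented = "FT" + " " * 23 + '/gene="example"'

for line in (in_spec, over_indented):
    plain = qualifier_text(line)
    fixed = qualifier_text(line, strip_leading=True)
    # Without lstrip() the over-indented line keeps leading spaces,
    # so it no longer starts with "/" and looks like a continuation line.
    print(repr(plain), plain.startswith("/"), repr(fixed), fixed.startswith("/"))
```

With the leading whitespace removed, both lines yield the same qualifier text, which is why the change is cheap for this particular kind of deviation.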
From bugzilla-daemon at portal.open-bio.org Tue Apr 27 05:43:14 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 27 Apr 2010 05:43:14 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks on over-indented features In-Reply-To: Message-ID: <201004270943.o3R9hEab020932@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED Summary|GenBank/EMBL parser breaks |GenBank/EMBL parser breaks |when features have no |on over-indented features |qualifiers | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-27 05:43 EST ------- (In reply to comment #4) > I did something stupid, and uploaded the wrong IMGT record. I will upload the > actual offending record. However, after stepping through the code with pdb, > it appears that the problem with the offending record is that the feature > qualifiers are indented too far, so that the whitespace is not fully stripped > off. Thanks for checking and working out what was wrong. Yes, this file does indeed break. > Has it ever been considered to parse the features by breaking the line with > split(), instead of hardcoding the number of columns? While the official EMBL > specification may hardcode the size of the fields, the parse may be more > robust to such errors. (Though I understand the desire to conform exactly to > EMBL standards). Eitherway, I will notify the curators of the IMGT database. Please do contact the IMGT curators. (In reply to comment #6) > Alternatively, an additional lstrip() call for each line in lines in > parse_feature() would probably also solve the problem. What are reasons not > to do this? Trying to parse out-of-spec files is a potential nightmare. 
We do try and be tolerant of "quirks" in official NCBI or EMBL files (which are occasionally technically invalid), as long as such corrections look easy and unambiguous. In this particular case, we can cope with the extra indentation as you suggest by stripping any leading white space. Fixed in the repository: http://github.com/biopython/biopython/commit/73caa4072898e7d5a71d38138c9e053066f11b24 Thank you Uri, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 11:32:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 11:32:20 -0400 Subject: [Biopython-dev] [Bug 3066] New: Iterating/looping over colums/rows of a MultipleSeqAlignment Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3066 Summary: Iterating/looping over colums/rows of a MultipleSeqAlignment Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk The new MultipleSeqAlignment object (like the old Alignment object it replaces) stores the rows of the alignment as SeqRecord objects. This means column based access is slow. It can often be useful to be able to iterate over the columns, and a dedicated method to do this should be faster than repeatedly accessing columns by index (either via slicing with __getitem__ or the old get_column method). A related question here is should the columns be returned as strings or as Seq objects? Possible implementation to follow as a patch... 
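As a rough illustration of the column iteration described in Bug 3066, here is a minimal sketch that uses plain strings in place of SeqRecord rows. The name iter_columns() is hypothetical and this is not the attached patch; it only shows the row-to-column transposition with the built-in zip().

```python
def iter_columns(rows):
    """Yield each alignment column as a string, given equal-length row strings."""
    if len(set(len(row) for row in rows)) > 1:
        raise ValueError("all rows must have the same length")
    # zip(*rows) walks the rows in parallel, one column at a time.
    for column in zip(*rows):
        yield "".join(column)


# Made-up three-row alignment.
rows = ["ACGT", "A-GT", "ACGA"]
for column in iter_columns(rows):
    print(column)
```

Returning strings keeps the sketch simple; whether columns should be strings or Seq objects is the open question raised in the report.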
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Apr 28 11:33:06 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Apr 2010 11:33:06 -0400
Subject: [Biopython-dev] [Bug 3066] Iterating/looping over colums/rows of a MultipleSeqAlignment
In-Reply-To:
Message-ID: <201004281533.o3SFX6r5007784@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3066

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-28 11:33 EST -------
Created an attachment (id=1490)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1490&action=view)
Patch to Bio/Align/__init__.py

Possible solution using iterators returning strings.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Wed Apr 28 14:29:56 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 28 Apr 2010 14:29:56 -0400
Subject: [Biopython-dev] [Bug 3066] Iterating/looping over colums/rows of a MultipleSeqAlignment
In-Reply-To:
Message-ID: <201004281829.o3SITu0x014523@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3066

------- Comment #2 from eric.talevich at gmail.com 2010-04-28 14:29 EST -------
I don't mind having plain strings returned; Bio.Seq works well enough with them for me.

Two things:

1. Is this implementation fast? It basically transposes the alignment as a list-of-lists, right? So:

    return zip(*self)

or:

    from itertools import izip
    return (''.join(col) for col in izip(*self))

2.
On the topic of efficiency -- have you encountered a situation where having an alignment as a NumPy character array would have helped? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 15:32:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 15:32:43 -0400 Subject: [Biopython-dev] [Bug 3067] New: SPARK parser errors should be sent to stderr Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3067 Summary: SPARK parser errors should be sent to stderr Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu The SPARK code currently sends parsing errors to stdout. This makes it difficult to sort out legitimate output from error output. Attached is a patch that corrects this. It changes two output lines to send to sys.stderr. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 15:33:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 15:33:37 -0400 Subject: [Biopython-dev] [Bug 3067] SPARK parser errors should be sent to stderr In-Reply-To: Message-ID: <201004281933.o3SJXbkO016573@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3067 ------- Comment #1 from laserson at mit.edu 2010-04-28 15:33 EST ------- Created an attachment (id=1491) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1491&action=view) Patch to output error messages to stderr. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 17:30:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 17:30:43 -0400 Subject: [Biopython-dev] [Bug 3069] New: More robust feature parser for GenBank/EMBL records Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3069 Summary: More robust feature parser for GenBank/EMBL records Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu We recently made a modification to allow for over-indented features to be processed correctly (to handle IMGT records). However, this only works if the feature keys are within the INSDC established guidelines, which is not always the case with IMGT. This specifically causes a problem for the location lines of features. I will shortly upload a patch which corrects this problem, by processing only the first line of a feature using split(), rather than the hardcoded distances. Are there any objections to this? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
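The split()-based handling of a feature's first line, as proposed in Bug 3069, might look roughly like the sketch below. It is not the attached patch: split_feature_header() and fixed_columns() are hypothetical helpers, the column positions 5 and 21 are assumed EMBL-style values, and the long feature key is invented for illustration. Note that the approach relies on at least one whitespace character between key and location.

```python
def fixed_columns(line, key_start=5, location_start=21):
    """Fixed-width split: key and location are cut at assumed column positions."""
    return line[key_start:location_start].strip(), line[location_start:].strip()


def split_feature_header(line, prefix="FT"):
    """Whitespace split: return (key, location) from the first line of a feature."""
    if line.startswith(prefix):
        line = line[len(prefix):]
    # Split once: the key, then everything after the first run of whitespace.
    parts = line.split(None, 1)
    if len(parts) != 2:
        raise ValueError("expected a feature key and a location: %r" % line)
    key, location = parts
    return key, location.strip()


# Made-up lines: an in-spec key and an invented over-long key.
in_spec = "FT   CDS             1..100"
long_key = "FT   V-D-J-REGION-EXAMPLE-KEY 5..250"

for line in (in_spec, long_key):
    print(fixed_columns(line), split_feature_header(line))
```

For the in-spec line both helpers agree; for the over-long key the fixed-width version cuts the key in half and pushes the remainder into the location, which matches the symptom described in the report.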
From bugzilla-daemon at portal.open-bio.org Wed Apr 28 17:34:16 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 17:34:16 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004282134.o3SLYGri019342@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #1 from laserson at mit.edu 2010-04-28 17:34 EST ------- Created an attachment (id=1492) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1492&action=view) Generalize the processing of location lines in a feature table. There are two methods in this patch. The main one (uncommented) is a two-line method that will work perfectly, I believe. The second method is commented out, and is an alternative one-line method to do it as well. However, it will replace all whitespace with single spaces, which has potential to change the content seen by the parser, though this is unlikely. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 19:25:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 19:25:37 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004282325.o3SNPb8L022634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-28 19:25 EST ------- Hi Uri Could you attach a example input file showing the kind of invalid records you want to parse? 
Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 19:58:38 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 19:58:38 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004282358.o3SNwc72023624@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #3 from laserson at mit.edu 2010-04-28 19:58 EST ------- Created an attachment (id=1493) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1493&action=view) IMGT record that fails with current repository version. The long feature key gets chopped up and messes up the location. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 20:35:11 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 20:35:11 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004290035.o3T0ZBM8024421@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #4 from laserson at mit.edu 2010-04-28 20:35 EST ------- Actually, the record I attached fails, but it's not the worst-case scenario. Using the extended feature-key length, there are some keys that actually make it to the border of the qualifiers, so that they are contiguous. This means that the indentation must be hardcoded for IMGT just like anything else. 
In order to solve this problem once and for all, is the best approach to subclass the InsdcScanner and put in values that make sense for IMGT? If so, then there is one more problem that needs to be addressed. About 80% of the records in IMGT conform to the EMBL format correctly, while about 20% have this over-indentation problem. Would it make more sense to go through the entire IMGT database and change each record to have the increased indentation? Then the subclassed Scanner would have no problem. The alternative is that for each record, the amount of indentation should be "discovered" and changed appropriately for each record. The parsing would then proceed as it currently does. Uri This leaves two options: 1) Go through each record in IMGT and enforce the longer indentation for each such record. (This shouldn't be too difficult). 2) Su -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 04:00:51 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 04:00:51 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004290800.o3T80pdl002051@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-29 04:00 EST ------- Rather than formalising this as a sub-format or EMBL variant I would rather encourage the IMGT to fix their files to follow the EMBL standard. Correcting the indentation shouldn't be too hard - the potential problem will be shortening overly long feature keys that can't fit into the EMBL allocated field. It would also be interesting to see how BioPerl etc handle these out-of-spec files.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 06:33:26 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 06:33:26 -0400 Subject: [Biopython-dev] [Bug 3067] SPARK parser errors should be sent to stderr In-Reply-To: Message-ID: <201004291033.o3TAXQZI007883@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3067 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-29 06:33 EST ------- Patch cherry-picked from github, thanks. Note that I plan to replace the spark-based location parsing with something faster using regular expressions (see Bug 2738), at which point we can deprecate and then drop our copy of spark. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 07:02:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 07:02:42 -0400 Subject: [Biopython-dev] [Bug 3066] Iterating/looping over colums/rows of a MultipleSeqAlignment In-Reply-To: Message-ID: <201004291102.o3TB2gN2009204@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3066 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-29 07:02 EST ------- (In reply to comment #2) > Two things: > > 1. Is this implementation fast? It basically transposes the alignment as a > list-of-lists, right?
So: > > return zip(*self) > > or: > > from itertools import izip > return (''.join(col) for col in izip(*self)) I haven't done any profiling yet - using itertools would be worth trying. > 2. On the topic of efficiency -- have you encountered a situation where > having an alignment as a NumPy character array would have helped? Not personally, but these iterators should facilitate creating a NumPy character array from our alignment object. I was also pondering adding an explicit "as_array" or "to_array" method which would require NumPy at runtime. However, I would rather keep the core of Biopython without any NumPy dependency. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 19:16:59 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 19:16:59 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004292316.o3TNGxie030251@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #6 from laserson at mit.edu 2010-04-29 19:16 EST ------- Generally I agree with you. However, based on my knowledge of the people at IMGT, this is highly unlikely. From their perspective, they invested a very large amount of time into their ontology/database structure, and I don't think they'll really be prepared to shorten their feature keys to be in compliance with EMBL. I will try to cook up a parser for IMGT that integrates into biopython (but I can't guarantee success, as I'm not extremely familiar with the internals). I'll keep you posted. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
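As an aside on the Bug 3066 exchange above about iterating over alignment columns: the quoted zip(*self) idea can be tried on plain strings as in the sketch below. The function name iter_columns is hypothetical and the list of strings merely stands in for the rows of an alignment object. Note that itertools.izip exists only in Python 2; in Python 3 the built-in zip is already lazy.

```python
def iter_columns(rows):
    """Yield each column of equal-length rows as a string.

    Illustrative stand-in only: 'rows' is a plain list of strings here,
    not a Biopython alignment object.
    """
    # zip(*rows) transposes the rows; each tuple it yields is one column.
    return ("".join(column) for column in zip(*rows))


rows = ["ACGT", "A-GT", "ACGA"]
columns = list(iter_columns(rows))
# columns is now ['AAA', 'C-C', 'GGG', 'TTA']
```

Whether this is faster than an index-based loop would need profiling, as the reply above says.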
From bugzilla-daemon at portal.open-bio.org Fri Apr 30 00:52:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 00:52:45 -0400 Subject: [Biopython-dev] [Bug 3071] New: EMBL parser does not parse RP lines correctly. Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3071 Summary: EMBL parser does not parse RP lines correctly. Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: major Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu The EMBL parser makes an incorrect assert statement at line 679 of Bio/GenBank/Scanner.py: elif line_type == 'RP': # Reformat reference numbers for the GenBank based consumer # e.g. '1-4639675' becomes '(bases 1 to 4639675)' assert data.count("-")==1 consumer.reference_bases("(bases " + data.replace("-", " to ") + ")") The EMBL specification states that there can be multiple ranges in this line: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_10_3 This breaks at least one record in IMGT (which will be attached shortly). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 00:53:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 00:53:42 -0400 Subject: [Biopython-dev] [Bug 3071] EMBL parser does not parse RP lines correctly. 
In-Reply-To: Message-ID: <201004300453.o3U4rgpG005244@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3071 ------- Comment #1 from laserson at mit.edu 2010-04-30 00:53 EST ------- Created an attachment (id=1496) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1496&action=view) IMGT/EMBL record that breaks because of RP parsing error -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 03:46:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 03:46:49 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004300746.o3U7kngI009202@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-30 03:46 EST ------- (In reply to comment #6) > Generally I agree with you. However, based on my knowledge of the people at > IMGT, this is highly unlikely. From their perspective, they invested a very > large amount of time into their ontology/database structure, and I don't think > they'll really be prepared to shorten their feature keys to be in compliance > with EMBL. You're in a much better position to assess this - but could you ask them about this anyway? They may at least clarify how they bend the EMBL specification. Do they have a preferred file format (e.g. XML)? > I will try to cook up a parser for IMGT that integrates into biopython (but I > can't guarantee success, as I'm not extremely familiar with the internals). > I'll keep you posted.
How I would try this would be to write a new scanner subclassing the EMBL scanner in Bio/GenBank/Scanner.py (which probably only needs to override the feature parsing), and then new functions in Bio/SeqIO/InsdcIO.py to call it (matching the GenBank and EMBL functions), and define a new format name (maybe "embl-imgt") in the dictionary in Bio/SeqIO/__init__.py and done. However, if the only out-of-specification thing in the IMGT EMBL files is the feature indentation and long feature keys, maybe your original request to make the EMBL parser more tolerant is the best route. Thinking ahead, would you also want to be able to write out IMGT variant EMBL files? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 03:46:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 03:46:49 -0400 Subject: [Biopython-dev] [Bug 3071] EMBL parser does not parse RP lines correctly. In-Reply-To: Message-ID: <201004300746.o3U7kn2V009203@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3071 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-30 03:46 EST ------- Good point, thanks for posting an example too. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
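To make the Bug 3071 report concrete: the assert in the reported snippet assumes exactly one "-" in the RP data, so any RP line listing several comma-separated ranges fails. A more tolerant conversion might look like the sketch below. The function name format_reference_bases and the "; " separator between ranges are assumptions made for illustration, not necessarily what the parser should emit.

```python
def format_reference_bases(data):
    """Turn EMBL RP data such as '1-4639675' into '(bases 1 to 4639675)'.

    Illustrative sketch only. Several comma-separated ranges are accepted,
    and joining them with '; ' is an assumption made for this example.
    """
    ranges = []
    for part in data.split(","):
        part = part.strip()
        if not part:
            # Tolerate a trailing comma or doubled separators.
            continue
        start, sep, end = part.partition("-")
        start, end = start.strip(), end.strip()
        if not sep or not start.isdigit() or not end.isdigit():
            raise ValueError("Unexpected RP range: %r" % part)
        ranges.append("%s to %s" % (start, end))
    if not ranges:
        raise ValueError("No ranges found in RP data: %r" % data)
    return "(bases " + "; ".join(ranges) + ")"


single = format_reference_bases("1-4639675")
multiple = format_reference_bases("160-550, 904-1055")
```

The resulting string could then be handed to consumer.reference_bases() in place of the single-range expression quoted in the report.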
From bugzilla-daemon at portal.open-bio.org Fri Apr 30 11:18:00 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 11:18:00 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004301518.o3UFI0UI022780@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #8 from laserson at mit.edu 2010-04-30 11:17 EST ------- (In reply to comment #7) > You're in a much better position to assess this - but could you ask them about > this anyway? They may at least clarify how they bend the EMBL specification. I am waiting to hear from them regarding all the changes compared with the EMBL spec. But I am not confident they are even sure. Part of the problem is the database was started over 20 years ago, so some older records may not have been updated properly. > Do they have a preferred file format (e.g. XML)? They only have a text file in their "EMBL" format. See here for all their download options: http://imgt.cines.fr/textes/IMGTdownloads.html > How I would try this would be to write a new scanner subclassing the EMBL > scanner in Bio/GenBank/Scanner.py (which probably only needs to override the > feature parsing), and then new functions in Bio/SeqIO/InsdcIO.py to call it > (matching the GenBank and EMBL functions), and define a new format name > (maybe "embl-imgt") in the dictionary in Bio/SeqIO/__init__.py and done. Done. I will upload the patch shortly. The code only reads the IMGT info. It does not write it. I can work on that as well, if you think it's prudent that every readable format should also be writable. > However, if the only out-of-specification thing in the IMGT EMBL files is the > feature indentation and long feature keys, maybe your original request to make > the EMBL parser more tolerant is the best route. I think it will actually be a headache to do so.
Unless you want to rewrite the EMBL parser the way that I wrote the IMGT parser. The only thing that needed changing was handling the header lines. Once it finds an FH line, it uses the position of the "Location..." string to determine how indented the qualifiers are. > Thinking ahead would you also want to be able to write out IMGT variant EMBL > files? > I personally don't need this functionality, but I am willing to write it to complement the IMGT parser that I wrote. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 11:23:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 11:23:45 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004301523.o3UFNjcn022994@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 laserson at mit.edu changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1492 is|0 |1 obsolete| | ------- Comment #9 from laserson at mit.edu 2010-04-30 11:23 EST ------- Created an attachment (id=1497) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1497&action=view) New IMGT parser object This patch changes four files. Most of it is in Bio/GenBank/Scanner.py, and then there are a few extra additions to integrate it into SeqIO (for parse() and index()). I still have not been able to run through the whole IMGT database with this parser but this is because of actual errors in the IMGT records (which I will report back to the IMGT curators), or because of other bugs that I have discovered in the EMBL parser (from which the IMGT parser is derived; e.g., Bug #3071). 
However, it does breeze through most of the IMGT records without a problem, and handles both the EMBL-indented, and the IMGT-over-indented records. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From p.j.a.cock at googlemail.com Thu Apr 1 09:45:29 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 1 Apr 2010 10:45:29 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: 2010/3/23 Peter Cock : > Hi Taigo, > > This is looking much better after your fixes last night - just one left: > > $ pylint --disable-msg-cat=CRW --include-ids=y > --disable-msg=E1101,E1103,E0102 -r n Bio.PopGen > No config file found, using default configuration > ************* Module Bio.PopGen.GenePop.Controller > E0602: 41:_read_allele_freq_table: Undefined variable 'self' > > Note if I turn off those particular error messages which in other > situations I had tentatively tagged as false positives, there could > be a few more issues: > > $ pylint --disable-msg-cat=CRW --include-ids=y -r n Bio.PopGen > ... 
Again, looking much better after yesterday's work: $ pylint --disable-msg-cat=CRW --include-ids=y -r n Bio.PopGen No config file found, using default configuration ************* Module Bio.PopGen.GenePop.EasyController E1101: 43:EasyController.test_hw_pop: Instance of 'GenePopController' has no 'test_pop_hz_prob' member ************* Module Bio.PopGen.GenePop.FileParser E1101:197:FileRecord.remove_population: Instance of 'FileRecord' has no 'populations' member E1101:206:FileRecord.remove_locus_by_position: Instance of 'FileRecord' has no 'populations' member The EasyController issue looks fairly simple, there are three test methods defined in Bio.PopGen.EasyPop.Controller * test_pop_hz_deficiency * test_pop_hz_excess * test_pop_hw_prob However, in Bio.PopGen.EasyPop.EasyController the method test_hw_pop tries to call these three methods of the controller: * test_pop_hz_deficiency * test_pop_hz_excess * test_pop_hz_prob It looks like an hw/hz typo - but as you use hw in other contexts, I am not 100% sure about this diagnosis. The second set of errors are in Bio.PopGen.GenePop.FileParser which is does look like self.populations is never defined. So again, this looks like pylint has found a real issue. Regards, Peter From tiagoantao at gmail.com Thu Apr 1 17:56:27 2010 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Thu, 1 Apr 2010 18:56:27 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: > ************* Module Bio.PopGen.GenePop.FileParser > E1101:197:FileRecord.remove_population: Instance of 'FileRecord' has > no 'populations' member > E1101:206:FileRecord.remove_locus_by_position: Instance of > 'FileRecord' has no 'populations' member Oh gosh, this one I forgot to implement in the new parser. And it is going to be needed in some of the applications using this code. 
On 1.56. Tiago From tiagoantao at gmail.com Thu Apr 1 17:58:48 2010 From: tiagoantao at gmail.com (Tiago Antão) Date: Thu, 1 Apr 2010 18:58:48 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: 2010/4/1 Tiago Antão : > Oh gosh, this one I forgot to implement in the new parser. And it is > going to be needed in some of the applications using this code. On > 1.56. Sorry for the commit log on github. I actually put one that was sensible, but it ended up with a merge message... -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Fri Apr 2 11:09:22 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 2 Apr 2010 12:09:22 +0100 Subject: [Biopython-dev] pylint, was: Changes to the main repo In-Reply-To: References: <320fb6e01003220219u5f2020e1v6826a4e331ceb96d@mail.gmail.com> <320fb6e01003230450h502adce0p27080d3a00ddda23@mail.gmail.com> Message-ID: 2010/4/1 Tiago Antão : > 2010/4/1 Tiago Antão : >> Oh gosh, this one I forgot to implement in the new parser. And it is >> going to be needed in some of the applications using this code. On >> 1.56. > > Sorry for the commit log on github. I actually put one that was > sensible, but it ended up with a merge message... > It's fine - there is a comment on the commit before the merge.
It looks like there is one final grumble from pylint: $ pylint --disable-msg-cat=CRW --include-ids=y -r n Bio.PopGen No config file found, using default configuration ************* Module Bio.PopGen.GenePop.EasyController E1120: 43:EasyController.test_hw_pop: No value passed for parameter 'ext' in function call Peter From bugzilla-daemon at portal.open-bio.org Fri Apr 2 12:38:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 2 Apr 2010 08:38:42 -0400 Subject: [Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry? In-Reply-To: Message-ID: <201004021238.o32CcgqI022975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3000 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-02 08:38 EST ------- (In reply to comment #3) > Created an attachment (id=1436) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1436&action=view) [details] > Adds a get_raw method to the dictionaries returned by Bio.SeqIO.index() > > Outline implementation of an alternative proposal, allowing access to the > raw text for each record via the Bio.SeqIO.index() dictionary like objects. > See discussion here: > http://lists.open-bio.org/pipermail/biopython-dev/2010-February/007301.html Following a positive discussion on the mailing list, I have just checked in an updated patch including FASTQ file support, unit tests and documentation. Right now the only indexed file format not supported by the get_raw method is SFF... which could be done with a little more work.
Although this does not implement the original request ("Could SeqIO.parse() store the whole, unparsed multiline entry?"), it does allow the original use case to be solved neatly with Bio.SeqIO - so I'm marking this bug as fixed. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Fri Apr 2 13:05:48 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 2 Apr 2010 09:05:48 -0400 Subject: [Biopython-dev] BOSC and OpenBio solution challenge reminder -- April 15th Message-ID: <20100402130548.GG36623@sobchak.mgh.harvard.edu> Hello all; A friendly reminder that the deadline for the Bioinformatics Open Source Conference (BOSC) is coming up on April 15th: http://www.open-bio.org/wiki/BOSC_2010 This is a great opportunity to discuss code and biology with fellow developers. One session which I'd like to emphasize is the OpenBio Solution Challenge, a section of talks that describes how to solve practical problems in bioinformatics using a variety of approaches: http://www.open-bio.org/wiki/SolutionChallenge Any toolkit developers who are interested in giving a talk are encouraged to submit an abstract for the challenge. We have some initial project ideas on the page and welcome your feedback for other useful workflows that would emphasize the advantages of using open source toolkits to solve biological problems. 
Please copy messages to the OpenBio mailing list as a central point for discussion and questions: http://lists.open-bio.org/mailman/listinfo/open-bio-l Looking forward to seeing everyone in July, Brad BOSC contact and dates: Date: July 9-10, 2010 Location: Boston, Massachusetts, USA BOSC 2010 web site: http://www.open-bio.org/wiki/BOSC_2010 Abstract submission via Open Conference System site: http://events.open-bio.org/BOSC2010/openconf.php E-mail: bosc at open-bio.org Bosc-announce list: http://lists.open-bio.org/mailman/listinfo/bosc-announce Important Dates April 15: Abstract deadline May 5: Notification of accepted abstracts May 28: Early Registration Discount Cut-off date July 8-9: Codefest 2010 July 9-10: BOSC 2010 August 15: Manuscript deadline for BOSC 2010 Proceedings published in BMC Bioinformatics From bugzilla-daemon at portal.open-bio.org Fri Apr 2 13:32:36 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 2 Apr 2010 09:32:36 -0400 Subject: [Biopython-dev] [Bug 3026] Bio.SeqIO.InsdcIO._split_multi_line(): Your description cannot be broken into nice lines! In-Reply-To: Message-ID: <201004021332.o32DWad3025175@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3026 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-02 09:32 EST ------- I've updated the code to give a slightly more helpful error message (it now includes the problem text). I think the proper fix might be to try and split long words (like URLs) on hyphens or slashes if they can't otherwise fit in the allowed space. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
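The fix floated in the Bug 3026 comment above (split an over-long word such as a URL on hyphens or slashes when it cannot fit in the allowed width) could be sketched as follows. The helper split_long_word is hypothetical and is not the code in Bio.SeqIO.InsdcIO; the hard break used when no separator is available is also just an assumption for this example.

```python
def split_long_word(word, max_len):
    """Break word into chunks of at most max_len characters.

    Illustrative sketch only: breaks are placed just after the last '-' or
    '/' that fits; if there is none, the word is cut at max_len.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    chunks = []
    while len(word) > max_len:
        # Index of the last separator within the first max_len characters.
        cut = max(word.rfind("-", 0, max_len), word.rfind("/", 0, max_len))
        if cut < 0:
            cut = max_len   # no separator available: hard break
        else:
            cut += 1        # keep the separator at the end of the chunk
        chunks.append(word[:cut])
        word = word[cut:]
    if word:
        chunks.append(word)
    return chunks


pieces = split_long_word("http://example.org/some-long/path", 12)
# pieces is now ['http://', 'example.org/', 'some-long/', 'path']
```

Joining the pieces reproduces the original word, so the break positions are only a layout decision; whether a reader of the written file would rejoin them without inserting a space is a separate question.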
From biopython at maubp.freeserve.co.uk Fri Apr 2 16:24:06 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 2 Apr 2010 17:24:06 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta Message-ID: Hi all, I'm going to try and put out a Biopython 1.54 beta release now, so could people please not check in anything to the trunk. Hopefully we can do the release proper in a week or two... Peter From p.j.a.cock at googlemail.com Fri Apr 2 17:34:00 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 2 Apr 2010 18:34:00 +0100 Subject: [Biopython-dev] Biopython 1.54 beta released Message-ID: Dear all, A beta release for Biopython 1.54 is now available for download and testing, as announced here: http://news.open-bio.org/news/2009/06/biopython-154-beta-released/ Note that I haven't done a fully detailed release announcement, we'll leave that for the official release. Source distributions and Windows installers are available from the downloads page on the Biopython website. http://biopython.org/wiki/Download We are interested in getting feedback on the beta release as a whole, but especially on the new features - including the updated multiple sequence alignment object (which is what you'll now get when parsing alignments with Bio.AlignIO), the new Bio.Phylo module, and the Bio.SeqIO support for Standard Flowgram Format (SFF) files. (At least) 10 people contributed to this release (so far), which includes 4 new people: Anne Pajon (first contribution) Brad Chapman Christian Zmasek Eric Talevich Jose Blanca (first contribution) Kevin Jacobs (first contribution) Leighton Pritchard Michiel de Hoon Peter Cock Thomas Holder (first contribution) On behalf of the Biopython team, thank you for any feedback, bug reports, and contributions. Peter P.S. You may wish to subscribe to our news feed.
For RSS links etc, see: http://biopython.org/wiki/News Biopython news is also on twitter: http://twitter.com/biopython From p.j.a.cock at googlemail.com Fri Apr 2 17:39:08 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 2 Apr 2010 18:39:08 +0100 Subject: [Biopython-dev] Biopython 1.54 beta released In-Reply-To: References: Message-ID: > Dear all, > > A beta release for Biopython 1.54 is now available for download > and testing, as announced here: > > http://news.open-bio.org/news/2009/06/biopython-154-beta-released/ > > Noted that I haven't done a fully detailed release announcement, > we'll leave that for the official release. That URL should have been: http://news.open-bio.org/news/2010/04/biopython-1-54-beta-released/ Sorry for the extra email, Peter From biopython at maubp.freeserve.co.uk Fri Apr 2 18:49:44 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 2 Apr 2010 19:49:44 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta In-Reply-To: References: Message-ID: On Fri, Apr 2, 2010 at 5:24 PM, Peter wrote: > Hi all, > > I'm going try and put out a Biopython 1.54 beta release now, > so could people please not check in anything to the trunk. > > Hopefully we can do the release proper in a week or two... > > Peter OK, so the beta is out there. Maybe this wasn't really needed, but I wanted to be a little cautious regarding the alignment changes (which *might* break something). We'll need to address any issues reported from that but in the meantime we can do more work on the documentation. In particular, I'd like to have a chapter on Bio.Phylo (which will likely be based heavily on Eric's wiki page). Eric - do you know any LaTeX? If not, don't worry too much. We can guide you though installing pdflatex and hevea for producing PDF and HTML, and then adding things to the tutorial should be fairly easy. 
Or, the simpler option (for you) would be to hand over plain text and a kind volunteer (maybe me) can handle the LaTeX markup. David - would you be able to write us a proper announcement for the final Biopython 1.54 release? We'll want to highlight the fact that Eric's Bio.Phylo module was from the GSoC 2009 (and link to this year's projects). Thanks all, and to those enjoying an Easter Break, have a nice holiday - none of this is urgent ;) Peter P.S. Note to self or anyone interested: Why did the source code tar ball and zip file jump in size by about 2MB? Was it just the accumulation of more code and more tests - or did I mess up? From bugzilla-daemon at portal.open-bio.org Sat Apr 3 05:58:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 3 Apr 2010 01:58:24 -0400 Subject: [Biopython-dev] [Bug 3026] Bio.SeqIO.InsdcIO._split_multi_line(): Your description cannot be broken into nice lines! In-Reply-To: Message-ID: <201004030558.o335wOXo020189@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3026 ------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz 2010-04-03 01:58 EST ------- I do not know what I would like to happen here in addition to the improved error message. Probably not get an error at all and have biopython able to cope with these cases as well. I have just asked asimpson at ludwig.org.br whether a fix of the data in dbEST would be feasible. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Sat Apr 3 13:00:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 3 Apr 2010 14:00:53 +0100 Subject: [Biopython-dev] epydoc formatting in Bio.Phylo Message-ID: Hi Eric, One of the tasks in building a release which I am only doing now is updating the API docs: http://biopython.org/wiki/Building_a_release Using epydoc can raise warnings about errors in the code (usually things like broken imports) which means it doubles as a code check. The code news is I don't see any such issues (beyond existing name shadowing which we are stuck with). However, this has flagged a few things in Bio.Phylo, most of which look like documentation formatting issues since you have explicitly stated you are using the epydoc markup with: __docformat__ = "epytext en" +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/BaseTree.py, line 671, in | Bio.Phylo.BaseTree.Subtree | Warning: @param for unknown parameter "label" | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/BaseTree.py, line 507, in | Bio.Phylo.BaseTree.TreeMixin.prune | Warning: Line 513: Possible mal-formatted field item. | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/Newick.py, line 255, in | Bio.Phylo.Newick._TreeShim.root_with_outgroup | Warning: Line 258: Improper paragraph indentation. 
| +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/PhyloXML.py, line 117, in | Bio.Phylo.PhyloXML.Phylogeny | Warning: @param for unknown parameter "clade" | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/_utils.py, line 224, in | Bio.Phylo._utils.draw_ascii | Warning: Lines 228, 229, 230, 231, 232, 233, 234: Improper paragraph indentation. | +------------------------------------------------------------------------------------------------------ | File /usr/local/lib/python2.6/dist-packages/Bio/Phylo/_utils.py, line 132, in | Bio.Phylo._utils.draw_graphviz | Warning: Lines 159, 164, 176, 179: Fields must be the final elements in an epytext string. | Warning: Line 179: Improper paragraph indentation. | Another to-do item before Biopython 1.54 final. Peter From eric.talevich at gmail.com Sat Apr 3 13:21:22 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 3 Apr 2010 09:21:22 -0400 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta In-Reply-To: References: Message-ID: Hi Peter, On Fri, Apr 2, 2010 at 2:49 PM, Peter wrote: > Eric - do you know any LaTeX? If not, don't worry too > much. We can guide you though installing pdflatex > and hevea for producing PDF and HTML, and then > adding things to the tutorial should be fairly easy. > Or, the simpler option (for you) would be to hand > over plain text and a kind volunteer (maybe me) > can handle the LaTeX markup. > Yes, I can do LaTeX. Would it be better to rework the wiki page (or a section of it) into something chapter-like first, or just add a draft of the new chapter into the main tutorial document right away? Also, did you have a list of specific topics/subsections I should cover? P.S. 
Note to self or anyone interested: Why did > the source code tar ball and zip file jump in size > by about 2MB? Was it just the accumulation of > more code and more tests - or did I mess up? > The example phyloXML files are kind of hefty, especially ncbi_taxonomy_mollusca.xml. If size increase is a problem, I can remove that file from the unit tests without substantial harm. -Eric From biopython at maubp.freeserve.co.uk Sat Apr 3 13:37:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 3 Apr 2010 14:37:57 +0100 Subject: [Biopython-dev] Trunk freeze for Biopython 1.54 beta In-Reply-To: References: Message-ID: On Sat, Apr 3, 2010 at 2:21 PM, Eric Talevich wrote: > Hi Peter, > > On Fri, Apr 2, 2010 at 2:49 PM, Peter wrote: > >> Eric - do you know any LaTeX? If not, don't worry too >> much. We can guide you though installing pdflatex >> and hevea for producing PDF and HTML, and then >> adding things to the tutorial should be fairly easy. >> Or, the simpler option (for you) would be to hand >> over plain text and a kind volunteer (maybe me) >> can handle the LaTeX markup. > > Yes, I can do LaTeX. Would it be better to rework the > wiki page (or a section of it) into something chapter-like > first, or just add a draft of the new chapter into the main > tutorial document right away? Up to you. Maybe start with polishing the wiki? The only tricky bit will be images (HTML vs PDF layout), but there are examples you can copy (e.g. the Graphics chapter). > Also, did you have a list of specific topics/subsections > I should cover? Well, the basics of reading, writing and converting trees from different formats. Then something on using the tree objects... I was going to suggest re-rooting a tree but as per the earlier thread, its a bit more complicated than I had expected. How about taking a tree, and coloring specific clades (to save the colors in XML output and/or a graphical output)? >> P.S. 
Note to self or anyone interested: Why did >> the source code tar ball and zip file jump in size >> by about 2MB? Was it just the accumulation of >> more code and more tests - or did I mess up? > > The example phyloXML files are kind of hefty, > especially ncbi_taxonomy_mollusca.xml. If size > increase is a problem, I can remove that file > from the unit tests without substantial harm. That is just a 66K zip file though, so it isn't that. Peter From eric.talevich at gmail.com Sun Apr 4 14:12:19 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 4 Apr 2010 10:12:19 -0400 Subject: [Biopython-dev] epydoc formatting in Bio.Phylo In-Reply-To: References: Message-ID: On Sat, Apr 3, 2010 at 9:00 AM, Peter wrote: > Hi Eric, > > One of the tasks in building a release which I am only doing now is > updating the API docs: > http://biopython.org/wiki/Building_a_release > > Using epydoc can raise warnings about errors in the code (usually > things like broken imports) which means it doubles as a code check. > The code news is I don't see any such issues (beyond existing name > shadowing which we are stuck with). > > However, this has flagged a few things in Bio.Phylo, most of which > look like documentation formatting issues since you have explicitly > stated you are using the epydoc markup [...] > OK, I fixed the docstrings for epydoc (and a few other things) and pushed to GitHub. It should be all right now. Thanks, Eric From eric.talevich at gmail.com Sun Apr 4 14:50:21 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 4 Apr 2010 10:50:21 -0400 Subject: [Biopython-dev] String representation of trees in Bio.Phylo Message-ID: Hi all, The new phylogenetics module Bio.Phylo supports a few new ways of displaying trees. I'm trying to decide which of these should be used as the informal string representation for whole trees, i.e. what happens when you type "print tree" for some newly parsed tree object. A Tree consists of some global information (e.g. 
rooted or not) plus nested lists of Subtrees, which Clade objects in PhyloXML inherit from. Currently, the Subtree __str__ method is treated as a label for a clade -- it's the clade's name, if available; in the absence of any other identifier it prints out the class name. Similarly, str(Tree) just prints out the tree's 'name' attribute, or "Tree"; this probably isn't what the user expects, though. Here are the options. To start the example, here's a tree parsed from phyloXML and displayed as a Newick tree: >>> from Bio import Phylo >>> tree = Phylo.parse('ex/phyloxml_examples.xml', 'phyloxml').next() >>> print tree.format('newick') ((A:0.10200,B:0.23000)0.00000:0.06000,C:0.40000)0.00:0.00000; The pretty_print function, with the show_all option, uses 'repr' recursively to display the tree's nodes. I think this is probably the best choice for Tree.__str__, but it can be a bit cluttered if a lot of information is attached to each node/subtree/clade. >>> Phylo.pretty_print(tree, show_all=True) Phylogeny(rooted='True', description='phyloXML allows to use either a "branch_length" attribute...', name='example from Prof. Joe Felsenstein's book "Inferring Phyl...') Clade() Clade(branch_length='0.06') Clade(branch_length='0.102', name='A') Clade(branch_length='0.23', name='B') Clade(branch_length='0.4', name='C') By default, pretty_print uses 'str' instead of 'repr', showing only class names and string representations (labels) to reduce the clutter: >>> Phylo.pretty_print(tree) Phylogeny: example from Prof. Joe Felsenstein's ... Clade: Clade Clade: Clade Clade: A Clade: B Clade: C Is this useful to anyone? If not, then I could drop this part of the pretty_print function entirely. As an alternative, we could print the tree as ASCII art, as some other toolkits do. 
However, this function is very limited -- it doesn't print internal node labels, and trees of more than a couple hundred nodes will look strange, since the drawing is compressed into a fixed number of character columns (default 80). >>> Phylo.draw_ascii(tree) __________________ A __________| _| |___________________________________________ B | |___________________________________________________________________________ C For reference, here's the raw phyloXML: >>> Phylo.write(tree, sys.stdout, 'phyloxml', indent=True) example from Prof. Joe Felsenstein's book "Inferring Phylogenies" phyloXML allows to use either a "branch_length" attribute or element to indicate branch lengths. 0.06 A 0.102 B 0.23 C 0.4 What do you think? Thanks, Eric From chapmanb at 50mail.com Sun Apr 4 17:19:03 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 4 Apr 2010 13:19:03 -0400 Subject: [Biopython-dev] String representation of trees in Bio.Phylo In-Reply-To: References: Message-ID: <20100404171903.GF19540@kunkel> Hi Eric; > The new phylogenetics module Bio.Phylo supports a few new ways of displaying > trees. I'm trying to decide which of these should be used as the informal > string representation for whole trees, i.e. what happens when you type > "print tree" for some newly parsed tree object. [...[ > The pretty_print function, with the show_all option, uses 'repr' recursively > to display the tree's nodes. I think this is probably the best choice for > Tree.__str__, but it can be a bit cluttered if a lot of information is > attached to each node/subtree/clade. > > >>> Phylo.pretty_print(tree, show_all=True) > Phylogeny(rooted='True', description='phyloXML allows to use either a > "branch_length" attribute...', name='example from Prof. Joe Felsenstein's > book "Inferring Phyl...') > Clade() > Clade(branch_length='0.06') > Clade(branch_length='0.102', name='A') > Clade(branch_length='0.23', name='B') > Clade(branch_length='0.4', name='C') I like this one. 
Agreed that it could get ugly, but I think it shows the structure and associated information well. > As an alternative, we could print the tree as ASCII art, as some other > toolkits do. However, this function is very limited -- it doesn't print > internal node labels, and trees of more than a couple hundred nodes will > look strange, since the drawing is compressed into a fixed number of > character columns (default 80). > > >>> Phylo.draw_ascii(tree) > __________________ A > __________| > _| |___________________________________________ B > | > |___________________________________________________________________________ C This is a good idea. I think this is more useful than the current pretty_print without show_all for getting a quick overview of the tree. Brad
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 00:49:40 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Apr 2010 20:49:40 -0400 Subject: [Biopython-dev] [Bug 3042] New: test_Mafft_tool fails Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3042 Summary: test_Mafft_tool fails Product: Biopython Version: 1.54b Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp This is the error message I get: ====================================================================== FAIL: Simple round-trip through app with infile.
---------------------------------------------------------------------- Traceback (most recent call last): File "test_Mafft_tool.py", line 56, in test_Mafft_simple self.assert_("STEP 2 / 2 d" in stderr_string) AssertionError ====================================================================== FAIL: Round-trip with complex command line. ---------------------------------------------------------------------- Traceback (most recent call last): File "test_Mafft_tool.py", line 126, in test_Mafft_with_complex_command_line self.assertEqual(return_code, 0) AssertionError: 1 != 0 This is with MAFFT version 5.732 (2005/09/14). The output it generates starts with: $ mafft Fasta/f002 blosum 62 ppenalty = -1530 poffset = -123 generating 200PAM scoring matrix for nucleotides ... done scoremtx = -1 Gap Penalty = -1.53, +0.00, -0.12 Making a distance matrix .. 1 / 3nknown character n done. Constructing dendrogram ... 0 / 3 done. Progressive alignment ... STEP 2 /2 done. Whereas the bug may disappear with newer versions of mafft, most Biopython users will not use mafft, and we should not require to have the latest version of mafft installed to avoid test errors. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
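One way the progress-line check might be made less sensitive to the MAFFT version is sketched below. It only addresses the first failure (the "STEP 2 / 2 d" substring test), not the non-zero return code from the complex command line. The helper name has_final_step is hypothetical and is not part of test_Mafft_tool.py; the sketch simply accepts both the "STEP 2 / 2 d" spelling the test expects and the older "STEP 2 /2 done." spelling quoted in the report above.

```python
import re

# Hypothetical helper, not part of Biopython's test suite.
# Matches progress lines such as "STEP 2 / 2 d" and "STEP 2 /2 done.",
# allowing any amount of whitespace around the slash.
_STEP_RE = re.compile(r"STEP\s+(\d+)\s*/\s*(\d+)\s+d")


def has_final_step(stderr_text):
    """Return True if stderr reports a final 'STEP n / n d...' progress line."""
    for match in _STEP_RE.finditer(stderr_text):
        # The alignment is finished when the step counter equals the total.
        if match.group(1) == match.group(2):
            return True
    return False
```

In the test this would stand in for the literal substring check, e.g. self.assert_(has_final_step(stderr_string)), assuming stderr_string holds the captured MAFFT stderr as in the traceback above.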
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 00:55:31 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 5 Apr 2010 20:55:31 -0400 Subject: [Biopython-dev] [Bug 3043] New: test_NCBI_BLAST_tools fails Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3043 Summary: test_NCBI_BLAST_tools fails Product: Biopython Version: 1.54b Platform: PC OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Unit Tests AssignedTo: biopython-dev at biopython.org ReportedBy: mdehoon at ims.u-tokyo.ac.jp This is the error I get: ====================================================================== FAIL: Check all blastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 121, in test_blastn self.check("blastn", Applications.NcbiblastnCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all blastp arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 117, in test_blastp self.check("blastp", Applications.NcbiblastpCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all blastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 113, in test_blastx self.check("blastx", 
Applications.NcbiblastxCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all psiblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 133, in test_psiblast self.check("psiblast", Applications.NcbipsiblastCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all rpsblast arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 137, in test_rpsblast self.check("rpsblast", Applications.NcbirpsblastCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all rpstblastn arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 141, in test_rpstblastn self.check("rpstblastn", Applications.NcbirpstblastnCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all tblastn arguments are supported 
---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 129, in test_tblastn self.check("tblastn", Applications.NcbitblastnCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose ====================================================================== FAIL: Check all tblastx arguments are supported ---------------------------------------------------------------------- Traceback (most recent call last): File "test_NCBI_BLAST_tools.py", line 125, in test_tblastx self.check("tblastx", Applications.NcbitblastxCommandline) File "test_NCBI_BLAST_tools.py", line 109, in check "Wrapper is missing: " + ", ".join(sorted(missing))) AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, - verbose but actually these seem to be extra options rather than missing options: $ blastn -h USAGE blastn [-h] [-help] [-import_search_strategy filename] [-export_search_strategy filename] [-task task_name] [-db database_name] [-dbsize num_letters] [-gilist filename] [-negative_gilist filename] [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm] [-subject subject_input_file] [-subject_loc range] [-query input_file] [-out output_file] [-evalue evalue] [-word_size int_value] [-gapopen open_penalty] [-gapextend extend_penalty] [-perc_identity float_value] [-xdrop_ungap float_value] [-xdrop_gap float_value] [-xdrop_gap_final float_value] [-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy] [-min_raw_gapped_score int_value] [-template_type type] [-template_length int_value] [-dust DUST_options] [-filtering_db filtering_database] [-window_masker_taxid window_masker_taxid] [-window_masker_db window_masker_db] [-soft_masking soft_masking] [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value] 
[-best_hit_score_edge float_value] [-window_size int_value] [-use_index boolean] [-index_name string] [-lcase_masking] [-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format] [-show_gis] [-num_descriptions int_value] [-num_alignments int_value] [-html] [-max_target_seqs num_sequences] [-num_threads int_value] [-remote] [-verbose] [-remote_verbose] [-use_test_remote_service] [-version] DESCRIPTION Nucleotide-Nucleotide BLAST 2.2.22+ Use '-help' to print detailed descriptions of command line arguments In any case, probably there will be slight differences in the options used by different versions of Blast, and this shouldn't cause tests to fail. From winda002 at student.otago.ac.nz Tue Apr 6 01:51:05 2010 From: winda002 at student.otago.ac.nz (David Winter) Date: Tue, 06 Apr 2010 13:51:05 +1200 Subject: [Biopython-dev] Draft Announcement for Biopython 1.54 Message-ID: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz> Hi all, Here's a draft announcement for the next release, very happy to take corrections and suggestions on how to change it. I'll put a marked up version of this on the OBF server soon. Cheers, David -- Biopython 1.54 Released The Biopython team is proud to announce Biopython 1.54, a new stable release of the Biopython library. Biopython 1.54 comes four months after our last release and brings new features, tweaks to some established functions and the usual collection of bug fixes. This is the first stable release to feature the new Bio.Phylo module which can be used to read, write and take data from phylogenetic trees in Newick, Nexus and PhyloXML formats. The module is the result of Eric Talevich's Google Summer of Code project which was supported by The National Evolutionary Synthesis Center (NESCent).
Biopython now supports the reading, writing and indexing of Standard Flowgram Format (SFF) files produced in 454 sequencing. Jose Blanca (the brains behind the widely used sff_extract tool) has extended Bio.SeqIO to handle these files, making it possible to convert between SFF, FASTQ, FASTA and QUAL formats (as trimmed or untrimmed reads). As well as adding features, the new release tweaks and extends some of the core modules: * Both Bio.SeqIO and Bio.AlignIO will accept filenames as well as handles, as detailed here. * The multiple sequence alignment object that underlies Bio.AlignIO has been improved. * Bio.SeqIO can read and write EMBL nucleotide files. * The dictionary-like object returned by Bio.SeqIO.index() has a new method "get_raw" that gets unparsed data from a file as a string. * Bio.Entrez includes some more DTD files, in particular eLink_090910.dtd, needed for our NCBI Entrez Utilities XML parser. Binaries and source files for Biopython 1.54 are available from the downloads page. The documentation has been updated to include the changes made since our last release. A big thanks to everyone who tested our beta release or submitted bugs since Biopython 1.53. And an especially big thanks to everyone who contributed to this release, including four first time contributors: * Anne Pajon (first contribution) * Brad Chapman * Christian Zmasek * Eric Talevich * Jose Blanca (first contribution) * Kevin Jacobs (first contribution) * Leighton Pritchard * Michiel de Hoon * Peter Cock * Thomas Holder (first contribution) From winda002 at student.otago.ac.nz Tue Apr 6 01:54:38 2010 From: winda002 at student.otago.ac.nz (David Winter) Date: Tue, 06 Apr 2010 13:54:38 +1200 Subject: [Biopython-dev] Biopython devs at iEvoBio? Message-ID: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> Hi again guys, I was wondering if anyone else is planning to go to iEvoBio (http://ievobio.org/) in Portland in June.
The meeting is planned to be a phyloinformatics counterpart to BOSC and is going to be run alongside the big Evolution Meetings. It might be a good venue to show Eric and Nick's GSoC projects from last year. Obviously, if Eric or Nick are planning to be at the meeting then they should present their work, but if they aren't going to be there I'd be happy to present a short demo on some of the things those libraries can do and how they might be brought together with other Biopython tools to build some useful workflows. (It might start to make up for how slack I've been in this news contributor role!) At the moment I really just need to know if better qualified people will be there and, if not, if people think a demo is a good idea (the software demonstration sessions don't need an abstract anytime soon). Cheers, david From eric.talevich at gmail.com Tue Apr 6 04:07:44 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 6 Apr 2010 00:07:44 -0400 Subject: [Biopython-dev] Biopython devs at iEvoBio? In-Reply-To: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> Message-ID: Hi David, I'm planning to go to BOSC this summer, and I'm not sure if I'll be able to go to iEvoBio in addition to that. But I'd certainly appreciate it if you could demo the new hotness in Biopython 1.54. I'll let you know if the situation changes (e.g. if BOSC rejects my abstract). Cheers, Eric On Mon, Apr 5, 2010 at 9:54 PM, David Winter wrote: > Hi again guys, > > I was wondering if anyone else is planning to go to iEvoBio > (http://ievobio.org/) in Portland in June. The meeting is planned to be a > phyloinformatics counterpart to BOSC and is going to be run alongside the > big > Evolution Meetings. > > It might be a good venue to show Eric and Nick's GSoC projects from last > year.
Obviously, if Eric or Nick are planning to be at the meeting then > they > should present their work, but if they aren't going to be there I'd be > happy to > present a short demo on some of the things those libraries can do and how > they might be brought together with other Biopython tools to build some > useful workflows. (It might start to make up for how slack I've been in > this news contributor role!) > > At the moment I really just need to know if better qualified people will > be > there and, if not, if people think a demo is a good idea (the software > demonstration sessions don't need an abstract anytime soon) > > Cheers, > david > > > From bugzilla-daemon at portal.open-bio.org Tue Apr 6 06:14:32 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 02:14:32 -0400 Subject: [Biopython-dev] [Bug 3042] test_Mafft_tool fails In-Reply-To: Message-ID: <201004060614.o366EWsk016896@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3042 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 02:14 EST ------- (In reply to comment #0) > This is with MAFFT version 5.732 (2005/09/14). The output it generates starts > with: > ... > Whereas the bug may disappear with newer versions of mafft, most Biopython > users will not use mafft, and we should not require to have the latest version > of mafft installed to avoid test errors. I think you are right that this is due to your version of MAFFT. The latest version is MAFFT 6.717; the first public 6.x release was back in 2007. MAFFT 5.732 from late 2005 is really *very* old, right at the bottom of the release history page: http://mafft.cbrc.jp/alignment/software/changelog.html Probably the best solution here is to detect the version number (perhaps by the date?), and skip the tests if it is too old (like test_Emboss.py does now).
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 06:22:08 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 02:22:08 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004060622.o366M8iU017197@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 02:22 EST ------- (In reply to comment #0) > This is the error I get: > > ... > AssertionError: Wrapper is missing: -remote_verbose, -use_test_remote_service, > ... > > but actually these seem to be extra options rather than missing options: > > $ blastn -h > USAGE > blastn [-h] [-help] [-import_search_strategy filename] > [-export_search_strategy filename] [-task task_name] [-db database_name] > [-dbsize num_letters] [-gilist filename] [-negative_gilist filename] > [-entrez_query entrez_query] [-db_soft_mask filtering_algorithm] > [-subject subject_input_file] [-subject_loc range] [-query input_file] > [-out output_file] [-evalue evalue] [-word_size int_value] > [-gapopen open_penalty] [-gapextend extend_penalty] > [-perc_identity float_value] [-xdrop_ungap float_value] > [-xdrop_gap float_value] [-xdrop_gap_final float_value] > [-searchsp int_value] [-penalty penalty] [-reward reward] [-no_greedy] > [-min_raw_gapped_score int_value] [-template_type type] > [-template_length int_value] [-dust DUST_options] > [-filtering_db filtering_database] > [-window_masker_taxid window_masker_taxid] > [-window_masker_db window_masker_db] [-soft_masking soft_masking] > [-ungapped] [-culling_limit int_value] [-best_hit_overhang float_value] > [-best_hit_score_edge float_value] [-window_size int_value] > [-use_index boolean] 
[-index_name string] [-lcase_masking] > [-query_loc range] [-strand strand] [-parse_deflines] [-outfmt format] > [-show_gis] [-num_descriptions int_value] [-num_alignments int_value] > [-html] [-max_target_seqs num_sequences] [-num_threads int_value] > [-remote] [-verbose] [-remote_verbose] [-use_test_remote_service] > [-version] > > DESCRIPTION > Nucleotide-Nucleotide BLAST 2.2.22+ > > Use '-help' to print detailed descriptions of command line arguments > > > In any case, probably there will be slight differences in the options used by > different versions of Blast, and this shouldn't cause tests to fail. I thought I was using 2.2.22+ on my dev machine - I'll check. Assuming you have installed the latest BLAST+ and our wrappers are missing some recently added switches, the test is functioning as designed. I really did WANT it to fail in this situation, to alert us to the fact the wrappers need updating. By design the test should pass fine on older BLAST+ releases with fewer options. OK, maybe it could be a warning, but if we let this pass silently we risk the wrappers getting out of date without anyone noticing, and then missing options being added ad-hoc as and when people need them, i.e. what seems to have happened with other wrappers in the past. Peter
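The "maybe it could be a warning" idea above could look roughly like the following. This is only a minimal sketch and not the code in test_NCBI_BLAST_tools.py: the function names options_in_usage and check_wrapper, and the strict flag, are hypothetical. It pulls the option names out of a BLAST+ style usage string (like the blastn -h output quoted above) and reports any that the wrapper does not know about, either as a warning or, for developers who ask for it, as a failure. Options the wrapper has but the installed tool lacks are ignored, so older BLAST+ releases still pass.

```python
import re
import warnings


def options_in_usage(usage_text):
    """Extract option names such as '-evalue' from a '[-option value]' usage string."""
    return set(re.findall(r"\[(-\w+)", usage_text))


def check_wrapper(usage_text, wrapper_options, strict=False):
    """Report options the tool advertises but the wrapper lacks.

    With strict=False (the default) a warning is issued; with strict=True
    an AssertionError is raised, as a developer-only check might want.
    Returns the sorted list of missing option names.
    """
    missing = sorted(options_in_usage(usage_text) - set(wrapper_options))
    if missing:
        message = "Wrapper is missing: " + ", ".join(missing)
        if strict:
            raise AssertionError(message)
        warnings.warn(message)
    return missing
```

With a split like this, a normal test run could call check_wrapper(usage, known_options) and merely warn, while a release checklist run could pass strict=True to get the hard failure described above.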
From biopython at maubp.freeserve.co.uk Tue Apr 6 06:29:24 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 6 Apr 2010 07:29:24 +0100
Subject: [Biopython-dev] Draft Announcement for Biopython 1.54
In-Reply-To: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz>
References: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz>
Message-ID:

On Tue, Apr 6, 2010 at 2:51 AM, David Winter wrote:
> Hi all,
>
> Here's a draft announcement for the next release, very happy to take
> corrections and suggestions on how to change it. I'll put a marked up
> version of this on the OBF server soon.
>
> Cheers,
> David

Lovely :)

I spotted one small thing to correct:

> Biopython now supports the reading, writing and indexing of Standard
> Flowgram Format (SFF) files produced in 454 sequencing. Jose Blanca
> (the brains behind the widely used sff_extract tool) has extended Bio.SeqIO
> to handle these files, making it possible to convert between SFF, FASTQ,
> FASTA and QUAL formats (as trimmed or untrimmed reads).

The new SFF support was based on code donated by Jose Blanca, but he didn't
actually do the SeqIO integration (or the indexing) - that was me.

Also, we can only convert from SFF to any of FASTQ, FASTA and QUAL formats.
Going to SFF isn't possible because it requires the flow space data from
the instrument, which isn't present.
Thanks David, Peter From bugzilla-daemon at portal.open-bio.org Tue Apr 6 08:14:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 04:14:03 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004060814.o368E3Gr021721@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2010-04-06 04:14 EST ------- (In reply to comment #1) > > Assuming you have installed the latest BLAST+ and our wrappers are missing some > recently added switches, the test is functioning as designed. I really did > WANT it to fail in this situation, to alert us to the fact the wrappers need > updating. Uhm, how does a test failing on some user's machine alert us to update the wrappers? Especially with new users, it's more likely that they will conclude that Biopython is buggy, and stop using it. It's better to execute such a test only if a user or developer specifically asks for it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 08:45:44 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 04:45:44 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004060845.o368jiB5023048@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 04:45 EST ------- (In reply to comment #2) > > Uhm, how does a test failing on some user's machine alert us to update the > wrappers? Especially with new users, it's more likely that they will conclude > that Biopython is buggy, and stop using it. 
It's better to execute such a test
> only if a user or developer specifically asks for it.
>

I was expecting that if a user running a new BLAST+ runs the unit test and
hits this issue, they'd report the issue to us.

Looking at this afresh, the assert wasn't really a good long term solution
(but it was very helpful in checking the wrappers had full coverage). To
address this issue in the short term, I've made this abort the test with a
missing external dependency error (so that run_tests.py will skip it) with
what I hope is a clear and not too scary error message.

In this particular case, those extra arguments *may* only be there because
your copy of BLAST 2.2.22+ has been compiled in debug mode. They are not
present on my install of BLAST 2.2.22+.

We don't have a "full" test suite, do we? That would be useful, and we could
do things like require larger test files to be present (which we can track in
the repository but not ship) or generated, or assume a particular BLAST
database will be installed.

Can we close this bug?

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 13:39:08 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 09:39:08 -0400
Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails
In-Reply-To:
Message-ID: <201004061339.o36Dd8vP032063@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3043

------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2010-04-06 09:39 EST -------
(In reply to comment #3)
> To address this issue in the short
> term, I've made this abort the test with a missing external dependency error
> (so that run_tests.py will skip it) with what I hope is a clear and not too
> scary error message.
>

Sorry but I don't think this test is useful.
If the test succeeds, all we know is that the user's Blast has the same options as the developer's Blast. But it doesn't actually test Bio.Blast.Applications. For many users, the test will generate a MissingDependencyError; as we've seen, even for the same version Blast may have different options. But the Blast dependency is not actually missing, and most like Bio.Blast.Applications works correctly even if some options were added to Blast. > We don't have "full" test suite do we? That would be useful, and we could do > things like require larger test files to be present (which we can track in the > repository but not ship) or generated, or assume a particular BLAST database > will be installed. > That would be useful. We could have a biopython/Developer directory in the repository with all the tests we want to run before making a release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From cy at cymon.org Tue Apr 6 14:39:46 2010 From: cy at cymon.org (Cymon Cox) Date: Tue, 6 Apr 2010 15:39:46 +0100 Subject: [Biopython-dev] [Bug 3042] test_Mafft_tool fails In-Reply-To: <201004060614.o366EWsk016896@portal.open-bio.org> References: <201004060614.o366EWsk016896@portal.open-bio.org> Message-ID: On 6 April 2010 07:14, wrote: > http://bugzilla.open-bio.org/show_bug.cgi?id=3042 > > > > > > ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 02:14 EST ------- > (In reply to comment #0) > > This is with MAFFT version 5.732 (2005/09/14). The output it generates > starts > > with: > > ... > > Whereas the bug may disappear with newer versions of mafft, most > Biopython > > users will not use mafft, and we should not require to have the latest > version > > of mafft installed to avoid test errors. > > I think you are right this is due to your version of MAFFT. 
The lattest > version is MAFFT 6.717, the first public 6.x release was back in 2007. > MAFFT 5.732 from late 2005 is really *very* old, right at the bottom of > the release history page: > http://mafft.cbrc.jp/alignment/software/changelog.html > > Probably the best solution here is to detect the version number (perhaps by > the date?), and skip the tests if it is too old (like test_Emboss.py does > now). > For the alignment tool interfaces we could only test against the versions that the wrappers were written against (Mafft was 6.626b for instance), and skip all other versions - but that may be a bit drastic. Perhaps detecting the version and issuing a warning such as "This test may have failed because you are using an older/newer version of X", if necessary, is more appropriate. I'll look again at newer versions of these alignment tools (when I get a chance...). Cheers, C. -- From bugzilla-daemon at portal.open-bio.org Tue Apr 6 19:29:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 15:29:17 -0400 Subject: [Biopython-dev] [Bug 3043] test_NCBI_BLAST_tools fails In-Reply-To: Message-ID: <201004061929.o36JTHtb009859@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3043 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 15:29 EST ------- (In reply to comment #4) > (In reply to comment #3) > > To address this issue in the short > > term, I've made this abort the test with a missing external dependency error > > (so that run_tests.py will skip it) with what I hope is a clear and not too > > scary error message. > > > Sorry but I don't think this test is useful. If the test succeeds, all we know > is that the user's Blast has the same options as the developer's Blast. But it > doesn't actually test Bio.Blast.Applications. 
For many users, the test will > generate a MissingDependencyError; as we've seen, even for the same version > Blast may have different options. But the Blast dependency is not actually > missing, and most like Bio.Blast.Applications works correctly even if some > options were added to Blast. Do you agree this check would be useful as part of a "developer's extended test suite"? The idea being that it will hopefully catch when the NCBI adds or removes a BLAST+ switch. We would then update the wrapper and/or white list the change in the test. > > We don't have "full" test suite do we? That would be useful, and we could > > do things like require larger test files to be present (which we can track > > in the repository but not ship) or generated, or assume a particular BLAST > > database will be installed. > > That would be useful. We could have a biopython/Developer directory in the > repository with all the tests we want to run before making a release. I had been thinking something like Tests/dev_test_XXX.py and Tests/dev/XXX for any files required - but your suggestion of a new top level directory would make the manifest and setup.py work easier. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
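The check under discussion, flagging switches that the command line tool reports but the wrapper does not know about, could be run as a warning in a developer test instead of a hard failure. The following is only a minimal sketch of that idea: the function names, the regular expression and the whitelist argument are assumptions made for illustration and are not taken from Bio.Blast.Applications or test_NCBI_BLAST_tools.

```python
import re
import warnings


def find_unwrapped_options(usage_text, wrapped, whitelist=()):
    """Return options named in usage_text that are neither wrapped nor whitelisted."""
    # In the usage text each option appears as "[-name ...]"; collect the names.
    reported = set(re.findall(r"\[(-\w+)", usage_text))
    return sorted(reported - set(wrapped) - set(whitelist))


def warn_if_outdated(usage_text, wrapped, whitelist=()):
    """Warn (rather than fail) when the tool reports options the wrapper lacks."""
    missing = find_unwrapped_options(usage_text, wrapped, whitelist)
    if missing:
        warnings.warn("Wrapper may be out of date, unwrapped options: "
                      + ", ".join(missing))
    return missing


# Abbreviated usage text, modelled on the blastn output quoted in this thread.
usage = """USAGE
  blastn [-h] [-help] [-db database_name] [-remote] [-verbose]
    [-remote_verbose] [-use_test_remote_service] [-version]
"""
# Hypothetical list of switches a wrapper already supports.
wrapped = ["-h", "-help", "-db", "-remote", "-version"]
missing = find_unwrapped_options(usage, wrapped, whitelist=["-verbose"])
```

An extended developer test suite could call warn_if_outdated() and treat the result as information, leaving the normal user-facing tests unaffected by differences between BLAST builds.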
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 20:48:18 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 16:48:18 -0400 Subject: [Biopython-dev] [Bug 3044] New: PhyloXMLIO, assigning node_id causes failures on write after re-reading Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3044 Summary: PhyloXMLIO, assigning node_id causes failures on write after re-reading Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P3 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: joelb at lanl.gov Hi, everybody. Thanks for the prompt attention to my previous bugs as I work my way through a PhyloXML project. I updated my biopython from git and archaeoptryx from cvs on Friday and now several things work differently, mostly for the better. I have problems with writing any phyloXML tree which was read with node_id's defined. Writing the tree the first time doesn't fail, and the tree can be subsequently read, but a failure occurs on the second write. 
Again, using an example file:

>>> tree = Phylo.read('bcl_2.xml', 'phyloxml')
>>> tree.clade[0].node_id = Phylo.PhyloXML.Id('node000')
>>> Phylo.write(tree,'test1.xml','phyloxml')
1
>>> tree1 = Phylo.read('test1.xml','phyloxml')
>>> Phylo.write(tree1,'test2.xml','phyloxml')
Traceback (innermost last):
  File "", line 1, in
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/_io.py", line 82, in write
    n = getattr(supported_formats[format], 'write')(trees, file, **kwargs)
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 142, in write
    return Writer(obj).write(file, encoding=encoding, indent=indent)
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 684, in __init__
    self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 705, in phyloxml
    elem.append(self.phylogeny(tree))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 656, in wrapped
    elem.append(getattr(self, subn)(getattr(obj, subn)))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 661, in wrapped
    elem.append(getattr(self, method)(item))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 656, in wrapped
    elem.append(getattr(self, subn)(getattr(obj, subn)))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 651, in wrapped
    elem = ElementTree.Element(tag, _clean_attrib(obj, attribs))
  File "/usr/lib64/python2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 643, in _clean_attrib
    val = getattr(obj, key)
AttributeError: 'str' object has no attribute 'provider'

If I specify a provider using

tree.clade[0].node_id = Phylo.PhyloXML.Id('node000',provider='LANL')

I get the same error.

I realize it's fairly pointless specifying a node_id if one wants to intermix
with Java because forester pays no attention to node_id and assigns its own.
I think this is a bug in the Java implementation, according to the XML schema.
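The last frame of the traceback shows a plain string reaching a helper that reads attributes with getattr(). A library-free sketch of that failure mode, and of one defensive way around it, might look like the following. The Id class, clean_attrib() and as_id() here are hypothetical stand-ins written for illustration; they are not the PhyloXMLIO code, and the sketch makes no claim about where the string actually comes from.

```python
class Id(object):
    """Hypothetical stand-in for an identifier element with an optional provider."""

    def __init__(self, value, provider=None):
        self.value = value
        self.provider = provider


def clean_attrib(obj, attrs):
    """Collect the named attributes of obj as strings, skipping unset ones."""
    out = {}
    for key in attrs:
        val = getattr(obj, key)  # raises AttributeError if obj is a plain str
        if val is not None:
            out[key] = str(val)
    return out


def as_id(obj):
    """Wrap a bare value in an Id so attribute lookups are always safe."""
    return obj if isinstance(obj, Id) else Id(str(obj))


# A proper Id object works; a bare string reproduces the reported error.
ok = clean_attrib(Id("node000", provider="LANL"), ["provider"])
try:
    clean_attrib("node000", ["provider"])
    failed = False
except AttributeError:
    failed = True
# Normalising first avoids the error, at the cost of an empty provider.
safe = clean_attrib(as_id("node000"), ["provider"])
```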
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 21:28:27 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 17:28:27 -0400 Subject: [Biopython-dev] [Bug 3045] New: TreeMixin, please define enumerator and other convenience methods Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3045 Summary: TreeMixin, please define enumerator and other convenience methods Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: joelb at lanl.gov Hi again, I frequently find the need to go back and forth between tree objects and sequences defined over either the internal or the terminal nodes. Ideally these should be done in concise list comprehensions for performance and readability reasons. These list comprehensions necessarily mix indices into arrays and objects from generators, and the enumerate() pattern is the most convenient because of this mix. I suspect that many others have the same needs. The usage patterns for, say, setting a phyloXML property from an array prop_arr should look something like: [node.set_property(prop_arr[i], *prop_params, **prop_keywords) for i, node in tree.enumerate_internals()] The three issues that frustrate such concision are (1) internal nodes, terminal nodes, and all nodes are not currently on an equal footing with respect to methods, (2) there are no enumerator methods, and (3) the get/set methods for phyloXML are very awkward at the moment. I deal with (3) in the next feature request. Here I give some convenience methods that I wish were defined in TreeMixin. I have tested them as standalone methods. 
I hope you'll see fit to include them at some point.

def count_internals(self):
    """Counts the number of non-terminal (internal) nodes within this tree."""
    return [i for i,e in enumerate_internals(self)][-1] + 1

def enumerate_internals(self):
    """Returns an enumerator of non-terminal clades"""
    return enumerate(self.find_clades(terminal=False))

def enumerate_terminals(self):
    """Returns an enumerator of terminal clades"""
    return enumerate(self.find_clades(terminal=True))

def enumerate_all(self):
    """Returns an enumerator on all clades"""
    return enumerate(self.find_clades())

Less critical but still useful are the following two methods (and one private
utility) that I find useful for operations on trees:

def is_semipreterminal(self):
    """True if any direct descendent is terminal."""
    if self.root.is_terminal():
        return False
    for clade in self.clades:
        if clade.is_terminal():
            return True
    return False

def terminal_neighbor_dists(self):
    """Return a list of distances between adjacent terminals"""
    return [self.distance(*i) for i in
            _generate_pairs(self.find_clades(terminal=True))]

def _generate_pairs(self):
    import itertools
    pairs = itertools.tee(self)
    pairs[1].next()
    return itertools.izip(pairs[0], pairs[1])

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
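The counting and adjacent-pairs helpers above can also be written against any iterable, without building an intermediate list and without failing on empty input. This is a minimal sketch under that assumption; pairwise() and count_items() are names chosen here for illustration, not part of Bio.Phylo.

```python
import itertools


def pairwise(iterable):
    """Return an iterable of (item, next_item) for each pair of adjacent items."""
    first, second = itertools.tee(iterable)
    next(second, None)  # advance the second copy by one; safe on empty input
    return zip(first, second)


def count_items(iterable):
    """Count the items of any iterable, returning 0 when it is empty."""
    return sum(1 for _ in iterable)


pairs = list(pairwise("ABCD"))
n_items = count_items(x for x in range(5))
```

Indexing the last element of a list, as count_internals does, raises IndexError when there are no internal nodes; summing over the generator returns 0 in that case.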
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 21:46:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 17:46:17 -0400 Subject: [Biopython-dev] [Bug 3046] New: PhyloXML, please define get/set methods Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3046 Summary: PhyloXML, please define get/set methods Product: Biopython Version: Not Applicable Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P4 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: joelb at lanl.gov It would be nice if there were get/set properties for phyloXML objects that were easier and more concise to use. Right now, to set, say, a phyloXML property, one has to read the code to learn the names and arguments of the Property class and also to learn that properties are added by appending to a list. Besides the matter of convenience, there is also a question about how the properties and taxonomies objects behave. I will take the matter up with the phyloXML mailing list, but I believe that these objects should be dictionary-like rather than list-like. That is, duplicate ref values should not be allowed because the question of how to handle duplicates would have to get pushed down to the user level and will be inconsistent. The following convenience methods make a start at these problems, but don't fully solve them because the current PhyloXML code would have to be reworked to deliver dictionaries of dictionaries. 
However, it's better than nothing:

def set_property(self, *propArgs, **propkwArgs):
    for i, prop in enumerate(self.properties):
        if prop.ref == propArgs[1]:
            # Replace the existing entry in the list; rebinding the loop
            # variable alone would leave self.properties unchanged.
            self.properties[i] = PhyloXML.Property(*propArgs, **propkwArgs)
            return
    self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs))

def get_property(self, key):
    for property in self.properties:
        if property.ref == key:
            return property.value
    raise KeyError

def set_ID(self, *idArgs, **idkwArgs):
    self.node_id = PhyloXML.Id(*idArgs, **idkwArgs)

def add_taxonomy(self, *taxArgs, **taxkwArgs):
    self.taxonomies.append(PhyloXML.Taxonomy(*taxArgs, **taxkwArgs))

def set_color(node, red, green, blue):
    node.color = PhyloXML.BranchColor(red, green, blue)

def get_taxonomy(self, rank):
    for taxonomy in self.taxonomies:
        if taxonomy.rank == rank:
            return taxonomy.scientific_name
    raise KeyError

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:13:23 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 18:13:23 -0400
Subject: [Biopython-dev] [Bug 3047] New: PhyloXML, behavior on setting color and width doesn't match docstring or spec
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3047

           Summary: PhyloXML, behavior on setting color and width doesn't
                    match docstring or spec
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P4
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: joelb at lanl.gov

From the Clade docstring:

"Both 'color' and 'width' elements apply for the whole clade unless
overwritten in-sub clades."

This information follows the PhyloXML doc.
However, that's not the way the code works:

>>> tree = Phylo.read('Bacteria403.phyloxml', 'phyloxml')
>>> tree.clade[0].color
BranchColor(blue='0', green='76', red='41')
>>> tree = Phylo.read('bcl_2.xml', 'phyloxml')
>>> tree.clade[0].color
>>> tree.clade[0].color = Phylo.PhyloXML.BranchColor(255,0,255)
>>> tree.clade[0].color
BranchColor(blue='255', green='0', red='255')
>>> tree.clade[0][0].color
>>> tree.clade[0].width = 3
>>> tree.clade[0][0].width
>>>

Personally I'd prefer changing the docstring. The Java code doesn't implement
the spec either, and it's actually more complicated for the user to deal with
the side-effects of setting the entire clade at once than it is to iterate
over the clade.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:22:25 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 6 Apr 2010 18:22:25 -0400
Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods
In-Reply-To:
Message-ID: <201004062222.o36MMPlk014586@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3046

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:22 EST -------
I'm very tempted to mark this as "won't fix", this is Python not Java
(grin) and get/set functions are ugly.

The actual functionality you are looking for might be expressed using explicit
Python properties though (which would show up using dir(tree) etc). I'd need
to see some examples to comment on the specifics.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:25:41 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:25:41 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004062225.o36MPfp3014644@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:25 EST ------- Interesting. I don't really see the need for most of these given how routine use of the enumerate function is elsewhere in Python. So -1 on the enumerate methods. I'm not fond of the name of the existing method get_terminals (which currently returns a list). My feeling is that using just terminals seems nicer (as a property, so no order argument - if you need that use the find method). Is there any advantage to returning a list vs an iterator? Everything is all in memory anyway, right? Given a terminals property (be it a read only list or an iterator), one might go further and add a sister property for the internal nodes (non-terminal nodes). What are your thoughts Eric? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:29:50 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:29:50 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062229.o36MToTH014782@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:29 EST ------- (In reply to comment #0) > > ... 
> >>> tree.clade[0].color = Phylo.PhyloXML.BranchColor(255,0,255) > >>> tree.clade[0].color > BranchColor(blue='255', green='0', red='255') Maybe I should file this as a separate bug, but it looks like BranchColor needs an explicit __repr__ to ensure the arguments are listed as in the __init__ defintion (which is the conventional red, green, blue order - rather than alphabetical). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:32:31 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:32:31 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062232.o36MWV73014870@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:32 EST ------- (In reply to comment #0) > From the Clade docstring: > > "Both 'color' and 'width' elements apply for the whole clade unless > overwritten in-sub clades. What is the bug? The docstring doesn't say the property will be explicitly cascaded down to the sub-clades, which seems to be your interpretation. Isn't that implicit when interpreting (drawing) the tree? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:35:17 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:35:17 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004062235.o36MZHiB014980@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #2 from joelb at lanl.gov 2010-04-06 18:35 EST ------- (In reply to comment #1) > I'm very tempted to mark this as "won't fixed", this is Python not Java > (grin) and get/set functions are ugly. > > The actual functionality you are looking for might be expressed using explicit > Python properties though (which would show up using dir(tree) etc). I'd need > to see some examples to comment on the specifics. > Hi, Peter, Actually, I was thinking that the PhyloXML interface is *too* Java-esque. The functionality I'm trying to get was summarized in the previous feature request, namely a concise list comprehension such as: [node.set_property(prop_arr[i], *prop_params, **prop_keywords) for i, node in tree.enumerate_internals()] Obviously this could be done without explicit get/sets as [node.__setattr__('property', PhyloXML.Property(prop_arr[i], *prop_params, **prop_keywords)) for i, node in tree.enumerate_internals()] if property was actually settable, although that's ugly too. Unfortunately you can't set 'property', you can only append to the properties list, and I don't see any clean way of doing that through __setattr__ By the way, the taxonomies list totally doesn't work in the Java code; it only sees the last taxonomy that you added. I'm working with upstream on this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:40:54 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:40:54 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062240.o36MesZY015074@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #3 from joelb at lanl.gov 2010-04-06 18:40 EST ------- (In reply to comment #2) > (In reply to comment #0) > > From the Clade docstring: > > > > "Both 'color' and 'width' elements apply for the whole clade unless > > overwritten in-sub clades. > > What is the bug? The docstring doesn't say the property will be explicitly > cascaded down to the sub-clades, which seems to be your interpretation. > Isn't that implicit when interpreting (drawing) the tree? > > Peter > I suppose one could maintain that it's the responsibility of user code to enforce the behavior specified in the docstring, although I think that's a recipe for incompatibilities. However, neither of the two available user codes I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually implement it. It's better to just change the docstring, I think. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
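The two readings of the docstring debated in this thread differ in where inheritance happens: stored on every sub-clade when the value is set, or resolved only when the tree is interpreted. A minimal sketch of the second reading follows, assuming a hypothetical Node class with color and children attributes (not the Bio.Phylo Clade class); an unset color falls back to the nearest ancestor that has one.

```python
class Node(object):
    """Hypothetical stand-in for a clade; not the Bio.Phylo class."""

    def __init__(self, color=None, children=()):
        self.color = color
        self.children = list(children)


def effective_colors(node, inherited=None):
    """Yield (node, color) pairs, resolving unset colors from ancestors."""
    # An explicit color on this node overrides whatever was inherited.
    color = node.color if node.color is not None else inherited
    yield node, color
    for child in node.children:
        for pair in effective_colors(child, color):
            yield pair


leaf_a = Node()                     # no color of its own
leaf_b = Node(color=(0, 0, 255))    # explicitly overridden
root = Node(color=(255, 0, 255), children=[leaf_a, leaf_b])
resolved = dict((id(n), c) for n, c in effective_colors(root))
```

Here leaf_a.color stays None, matching the interactive session quoted in the bug report, while drawing code that calls effective_colors() would still see the parent's color for it.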
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:45:58 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:45:58 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062245.o36Mjwgm015132@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:45 EST ------- (In reply to comment #3) > > I suppose one could maintain that it's the responsibility of user code to > enforce the behavior specified in the docstring, although I think that's a > recipe for incompatibilities. However, neither of the two available user > codes I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually > implement it. It's better to just change the docstring, I think. > The current behaviour seems very natural to me. Are you familiar with CSS? I think there are strong similarities - unless explicitly overridden a node implicitly inherits the color/width of its parent. What would you suggest for the docstring? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 22:56:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 18:56:13 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004062256.o36MuD8i015303@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 18:56 EST ------- (In reply to comment #2) > > Hi, Peter, > > Actually, I was thinking that the PhyloXML interface is *too* Java-esque. 
> The functionality I'm trying to get was summarized in the previous feature > request, namely a concise list comprehension such as: > > [node.set_property(prop_arr[i], *prop_params, **prop_keywords) > for i, node in tree.enumerate_internals()] I don't understand what you are trying to do in this example, but a method called set_property seems wrong - are you trying to do something using property attributes for new-style Python classes? Also why are you using a list comprehension if you care about the side effects (creating a property)? Why not just use a for loop? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 23:02:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:02:45 -0400 Subject: [Biopython-dev] [Bug 3044] PhyloXMLIO, assigning node_id causes failures on write after re-reading In-Reply-To: Message-ID: <201004062302.o36N2jl1015451@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3044 ------- Comment #1 from eric.talevich at gmail.com 2010-04-06 19:02 EST ------- Hi Joel, Thanks for testing! It's great to get this stuff ironed out before the first stable release. (In reply to comment #0) > I updated my biopython from git and archaeoptryx from cvs on Friday and now > several things work differently, mostly for the better. Heads up: I pushed another change to GitHub yesterday that *might* have broken your code. Would you mind pulling another update and seeing if everything still works? (The main effect is that "tree.get_path(code='OCTVU')" will work now.) > I have problems with writing any phyloXML tree which was read with node_id's > defined. Writing the tree the first time doesn't fail, and the tree can be > subsequently read, but a failure occurs on the second write. 
OK, I'll check this out soon. This may be due to a shim I added to make PhyloXML.Id objects behave like a primitive type in some cases, for compatibility with non-PhyloXML trees. Best, Eric -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 23:09:34 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:09:34 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004062309.o36N9YFO015608@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |biopython-bugzilla at maubp.freeserve.co.uk ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-06 19:09 EST ------- Taking a specific example, you suggested adding this helper function: def set_color(node, red, green, blue): node.color = PhyloXML.BranchColor(red, green, blue) I might advocate adding a color property to the tree/node class, with a set method which accepts either a PhyloXML.BranchColor instance or perhaps for convenience an RGB tuple. Something like this: def _set_color(self, color): if isinstance(color, PhyloXML.BranchColor): self._color = color elif len(color) == 3: self._color = PhyloXML.BranchColor(red=color[0], green=color[1], blue=color[2]) else: raise ValueError("Bad color") def _get_color(self): return self._color color = property(_get_color, _set_color, doc="Node color") (It would be nice to make the color object similar to the ReportLab and GenomeDiagram conventions used elsewhere in Biopython).
The point of that would be you would then use it like this: for node in tree.find(terminal=False): node.color = PhyloXML.BranchColor(255, 0, 0) for node in tree.find(terminal=True): node.color = PhyloXML.BranchColor(0, 0, 255) if you explicitly wanted to make all the internal nodes red and all the terminal nodes blue. Or, as discussed on Bug 3047 do this implicitly: tree.color = (255, 0, 0) #implicitly applies to children for node in tree.find(terminal=True): node.color = PhyloXML.BranchColor(0, 0, 255) Eric - how would this example be done with the current code base? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 23:20:13 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:20:13 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004062320.o36NKDTt015849@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #5 from joelb at lanl.gov 2010-04-06 19:20 EST ------- (In reply to comment #3) > > I don't understand what you are trying to do in this example, but a method > called set_property seems wrong - are you trying to do something using > property attributes for new-style Python classes? > > Also why are you using a list comprehension if you care about the side > effects (creating a property)? Why not just use a for loop? > > Peter > There have long been built-in get/sets in python via __setattr__() and __getattribute__(). That's where the code I sent should live. Putting code to get and especially to set in those methods means that a user doesn't have to look up whatever classes were defined for attributes (e.g. 
like finding that 'color' is called 'BranchColor') and doesn't need to know that taxonomies and properties are only set through appending through lists. The reason why to use a list comprehension rather than a for loop is performance and readability. Small functions that work over a single item of a sequence are vectorizable, either as list comprehensions or through numpy.vectorize. See Ziade's "Expert Python Programming" p. 34. I have code examples where the difference is a factor of 300 in speed. I'm including an example code that I wrote. Feel free, Eric, to use it on the PhyloXML page if you wish. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 23:22:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:22:20 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062322.o36NMK4m015894@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #5 from joelb at lanl.gov 2010-04-06 19:22 EST ------- Created an attachment (id=1475) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1475&action=view) phyloXML user code showing how to colorize a tree -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
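The color property proposed in comment #4 on Bug 3046 could look roughly like the following in current Python. This is only a sketch under stated assumptions: BranchColor and Node below are hypothetical stand-ins, not the actual Bio.Phylo.PhyloXML classes, and the 0-255 range check is an added assumption rather than something stated in the thread.

```python
class BranchColor:
    """Hypothetical stand-in for PhyloXML.BranchColor (not the real class)."""

    def __init__(self, red, green, blue):
        for value in (red, green, blue):
            # Assumed constraint: each channel is an integer from 0 to 255.
            if not 0 <= value <= 255:
                raise ValueError("Color channels must be in 0..255")
        self.red, self.green, self.blue = red, green, blue


class Node:
    """Hypothetical tree node exposing 'color' as a property."""

    def __init__(self):
        self._color = None

    @property
    def color(self):
        """Node color (a BranchColor, or None if unset)."""
        return self._color

    @color.setter
    def color(self, value):
        if value is None or isinstance(value, BranchColor):
            self._color = value
        elif isinstance(value, (tuple, list)) and len(value) == 3:
            # Convenience: accept an (R, G, B) sequence.
            self._color = BranchColor(*value)
        else:
            raise ValueError("Bad color")


# A plain for loop is used because the assignment is a side effect;
# a list comprehension would only build a throwaway list of None values.
nodes = [Node() for _ in range(3)]
for node in nodes:
    node.color = (255, 0, 0)
```

With this shape, callers never need to know the name of the color class, which addresses part of the convenience request without adding separate set_* helper functions.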
From bugzilla-daemon at portal.open-bio.org Tue Apr 6 23:23:12 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:23:12 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004062323.o36NNCgC015925@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #6 from joelb at lanl.gov 2010-04-06 19:23 EST ------- Created an attachment (id=1476) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1476&action=view) archaeoptryx screen dump showing colorized tree -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 6 23:34:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 19:34:43 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004062334.o36NYhru016144@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #2 from eric.talevich at gmail.com 2010-04-06 19:34 EST ------- (In reply to comment #1) > Interesting. I don't really see the need for most of these given how routine > use of the enumerate function is elsewhere in Python. So -1 on the enumerate > methods. I'll make sure all of the use cases can be handled with a simple list comprehension, at least. > > I'm not fond of the name of the existing method get_terminals (which currently > returns a list). My feeling is that using just terminals seems nicer (as a > property, so no order argument - if you need that use the find method). Is > there any advantage to returning a list vs an iterator? Everything is all in > memory anyway, right? 
I took the method names from Bio.Nexus.Trees wherever it seemed reasonable -- one day I'd like Bio.Phylo to be a drop-in replacement for that module (as much as possible). Otherwise I'd be fine with a method called terminals(). The tree object doesn't keep a list of terminal nodes under the hood, so to get the terminal nodes it does a full search of the tree, with run time linear to the number of nodes in the tree. I feel uneasy about properties that don't run in O(1) time. The find* methods return iterators, and the get* methods return lists. I found that the results of get* usually needed to be converted to a list immediately, for indexing or length-checking, and aren't liable to be unexpectedly large -- smaller than the whole tree, anyway. Plus, get_terminals() is really just a shortcut for list(tree.find_clades(terminal=True)), for those who prefer to dive into the module or save some typing. > Given a terminals property (be it a read only list or an iterator), one might > go further and add a sister property for the internal nodes (non-terminal > nodes). Apparently there's some demand for it. It would be the same as list(tree.find_clades(terminal=False)), and forcing users to learn how find_* methods work after they're hooked on get_terminals() has some appeal, but I suppose we should just pick a name and add it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From matzke at berkeley.edu Wed Apr 7 00:46:17 2010 From: matzke at berkeley.edu (Nick Matzke) Date: Tue, 06 Apr 2010 17:46:17 -0700 Subject: [Biopython-dev] Biopython devs at iEvoBio? In-Reply-To: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> Message-ID: <4BBBD5D9.60504@berkeley.edu> I'll be at the Evo meeting and the iEvoBio meeting. 
I'd be happy to give a talk as long as it's short -- I have to prioritize my research talk at the main meeting! Cheers! Nick David Winter wrote: > Hi again guys, > > I was wondering if anyone else is planning to go to iEvoBio > (http://ievobio.org/) in Portland in June. The meeting is planned to be a > phyloinformatics counterpart to BOSC and is going to be run alongside > the big > Evolution Meetings. > > It might be a good venue to show Erick and Nick's GSoC projects from last > year. Obviously, if Eric or Nick are planning to be at the meeting then > they > should present their work, but if they aren't going to be there I'd be > happy to > present a short demo on some of the things those libraries can do and how > they might be brought together with other Biopython tools to build some > useful workflows. ( it might start to make up for how slack of I've been in > this news contributor role!) > > At the moment I really just need to know if better qualified people > will be > there and, if not, if people think a demo is a good idea (the software > demonstration sessions don't need an abstract anytime soon) > > Cheers, > david > > > -- ==================================================== Nicholas J. Matzke Ph.D. Student, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200A Principles of Phylogenetics: Systematics http://ib.berkeley.edu/courses/ib200a/index.shtml Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. 
fax: 510-643-6264 Cell phone: 510-301-0179 Email: matzke at berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 ----------------------------------------------------- "[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together." Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ==================================================== From bugzilla-daemon at portal.open-bio.org Wed Apr 7 03:32:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 6 Apr 2010 23:32:30 -0400 Subject: [Biopython-dev] [Bug 3044] PhyloXMLIO, assigning node_id causes failures on write after re-reading In-Reply-To: Message-ID: <201004070332.o373WU1b024754@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3044 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from eric.talevich at gmail.com 2010-04-06 23:32 EST ------- Fixed in GitHub now: http://github.com/biopython/biopython/commit/218a2e6759a901766125a99370593097f36b1bad -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Apr 7 05:05:02 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 01:05:02 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004070505.o37552DX028541@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #6 from eric.talevich at gmail.com 2010-04-07 01:05 EST ------- (In reply to comment #0) > It would be nice if there were get/set properties for phyloXML objects that > were easier and more concise to use. Right now, to set, say, a phyloXML > property, one has to read the code to learn the names and arguments of the > Property class and also to learn that properties are added by appending to a > list. Yes, it's easier to tweak the class definitions if there's not much syntactic sugar to get in the way. This is still pretty new code ;) but of course I'm open to suggestions. > Besides the matter of convenience, there is also a question about how the > properties and taxonomies objects behave. I will take the matter up with the > phyloXML mailing list, but I believe that these objects should be > dictionary-like rather than list-like. That is, duplicate ref values should > not be allowed because the question of how to handle duplicates would have to > get pushed down to the user level and will be inconsistent. The Events class (clade.events attribute) mimics a dictionary. Have you used that yet? About clade.properties: If ordering of properties doesn't matter, 'ref' is guaranteed to be unique at a node, and it seems to be the right way to index the other associated data, then I can make clade.properties act like a dictionary. Can we confirm all of these? And for the implementation, can you provide a sketch of what you'd like the final structure to look like, and maybe a contrived doctest-like code example showing what you'd like to be able to do? 
In many cases, the phyloXML spec doesn't currently promise enough to make nice shortcuts work without the possibility of breaking in the future. For example, check out this new demo with *two* bootstrap values for every clade: http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml I was tempted to make confidences act like a dictionary indexed by support type, but clearly now that wouldn't have worked. A list of Confidence objects lets us stay faithful to the raw XML representation. > def set_property(self, *propArgs, **propkwArgs): > for property in self.properties: > if property.ref == propArgs[1]: > property = PhyloXML.Property(propArgs) > return > self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs)) > > def get_property(self, key): > for property in self.properties: > if property.ref == key: > return property.value > raise KeyError It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo" methods where a property would be overly magical, and something noteworthy is going on internally. Alignment objects have "add_sequence", and Phylogeny objects have "get_alignment". Would you use a Phylogeny method called add_alignment, taking something like a Phylip character matrix? We can figure out a sugared interface for clade.properties once we know how which of the requirements stated above will actually be guaranteed. > def set_ID(self, *idArgs, **idkwArgs): > self.node_id = PhyloXML.Id(*idArgs, **idkwArgs) If you do "from Bio.Phylo import PhyloXML as PX" it really doesn't save any typing, and the **kwargs magic is even less suitable for introspection. It's not possible to take advantage of all the PhyloXML annotations available without learning about the annotation classes the -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
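A minimal sketch of the kind of lookup-by-ref access discussed in this comment, assuming that 'ref' is unique at a node (the thread notes this is not yet confirmed by the phyloXML spec). Property and Clade here are hypothetical stand-ins, not the Bio.Phylo classes, and the constructor arguments are illustrative only. The underlying storage stays a list, so the raw XML ordering is preserved.

```python
class Property:
    """Hypothetical stand-in for a phyloXML property annotation."""

    def __init__(self, value, ref, applies_to="clade", datatype="xsd:string"):
        self.value = value
        self.ref = ref
        self.applies_to = applies_to
        self.datatype = datatype


class Clade:
    """Hypothetical clade holding a plain list of Property objects."""

    def __init__(self):
        self.properties = []

    def get_property(self, ref):
        """Return the value of the first property whose ref matches."""
        for prop in self.properties:
            if prop.ref == ref:
                return prop.value
        raise KeyError(ref)

    def set_property(self, value, ref, **kwargs):
        """Replace the property with this ref, or append a new one."""
        new_prop = Property(value, ref, **kwargs)
        for index, prop in enumerate(self.properties):
            if prop.ref == ref:
                # Assign into the list; rebinding the loop variable
                # would leave the stored object unchanged.
                self.properties[index] = new_prop
                return
        self.properties.append(new_prop)


clade = Clade()
clade.set_property("1200", "NOAA:depth")
clade.set_property("1500", "NOAA:depth")  # replaces, does not duplicate
depth = clade.get_property("NOAA:depth")
```

If the spec turns out to allow duplicate ref values at a node, a wrapper like this would silently hide all but the first match, which is the compatibility risk raised above.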
From eric.talevich at gmail.com Wed Apr 7 05:19:55 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 01:19:55 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods Message-ID: Hi Joel, (In reply to comment #0) On Tue, Apr 6, 2010 at 5:46 PM, wrote: > > http://bugzilla.open-bio.org/show_bug.cgi?id=3046 > It would be nice if there were get/set properties for phyloXML objects that > were easier and more concise to use. Right now, to set, say, a phyloXML > property, one has to read the code to learn the names and arguments of the > Property class and also to learn that properties are added by appending to a > list. Yes, it's easier to tweak the class definitions if there's not much syntactic sugar to get in the way. This is still pretty new code ;) but of course I'm open to suggestions. > Besides the matter of convenience, there is also a question about how the > properties and taxonomies objects behave. I will take the matter up with the > phyloXML mailing list, but I believe that these objects should be > dictionary-like rather than list-like. That is, duplicate ref values should > not be allowed because the question of how to handle duplicates would have to > get pushed down to the user level and will be inconsistent. The Events class (clade.events attribute) mimics a dictionary. Have you used that yet? About clade.properties: If ordering of properties doesn't matter, 'ref' is guaranteed to be unique at a node, and it seems to be the right way to index the other associated data, then I can make clade.properties act like a dictionary. Can we confirm all of these? And for the implementation, can you provide a sketch of what you'd like the final structure to look like, and maybe a contrived doctest-like code example showing what you'd like to be able to do? In many cases, the phyloXML spec doesn't currently promise enough to make nice shortcuts work without the possibility of breaking in the future.
For example, check out this new demo with *two* bootstrap values for every clade: http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml I was tempted to make confidences act like a dictionary indexed by support type, but clearly now that wouldn't have worked. A list of Confidence objects lets us stay faithful to the raw XML representation. > def set_property(self, *propArgs, **propkwArgs): > for property in self.properties: > if property.ref == propArgs[1]: > property = PhyloXML.Property(propArgs) > return > self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs)) > > def get_property(self, key): > for property in self.properties: > if property.ref == key: > return property.value > raise KeyError It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo" methods where a property would be overly magical, and something noteworthy is going on internally. Alignment objects have "add_sequence", and Phylogeny objects have "get_alignment". Would you use a Phylogeny method called add_alignment, taking something like a Phylip character matrix? We can figure out a sugared interface for clade.properties once we know which of the requirements stated above will actually be guaranteed. > def set_ID(self, *idArgs, **idkwArgs): > self.node_id = PhyloXML.Id(*idArgs, **idkwArgs) If you do "from Bio.Phylo import PhyloXML as PX" it doesn't really save much typing, and the **kwargs magic is even less suitable for introspection. It's not possible to take advantage of all the PhyloXML annotations available without learning about the annotation classes themselves. How about this: I'll write some decent documentation on the Biopython wiki's PhyloXML page and the official Biopython tutorial/cookbook. > def add_taxonomy(self, *taxArgs, **taxkwArgs): > self.taxonomies.append(PhyloXML.Taxonomy(*taxArgs, **taxkwArgs)) > > def get_taxonomy(self, rank): >
for taxonomy in self.taxonomies: > if taxonomy.rank == rank: > return taxonomy.scientific_name > raise KeyError Unfortunately, none of the Taxonomy attributes are required in the phyloXML spec, so there's nothing we can rely on for easier indexing. But, if the phyloXML files you create yourself are well-behaved then you're free to make your own wrappers over the current low-level functionality. Clade.taxonomies will always be plural and iterable. > def set_color(node, red, green, blue): > node.color = PhyloXML.BranchColor(red, green, blue) Redundancy makes code harder to maintain -- I'd like to keep it clean at least for the very first release. The BranchColor class actually has much cooler functionality than this; try "node.color = PX.BranchColor.from_name('red')" for example. We can try adding sugar on top of this, but whatever we add, we'll need to maintain in Biopython for quite some time. Thanks again for all the testing and feedback! Best, Eric From winda002 at student.otago.ac.nz Wed Apr 7 05:31:11 2010 From: winda002 at student.otago.ac.nz (david winter) Date: Wed, 07 Apr 2010 17:31:11 +1200 Subject: [Biopython-dev] Biopython devs at iEvoBio? In-Reply-To: <4BBBD5D9.60504@berkeley.edu> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> <4BBBD5D9.60504@berkeley.edu> Message-ID: <4BBC189F.1030401@student.otago.ac.nz> Ok, So Nick will be there, Eric hopes not to be ;) I sent an email to the organizing committee about the different talk categories. Based on their reply, and the fact that both Nick and I are going to be focused on our talks at the Evolution Meetings (i.e., flying halfway around the world to present a 12min talk) it seems the best way to go would be to have a lightning talk on each of the GSoC projects. Eric, presuming you go to BOSC and not iEvoBio I'll get in touch with you at some stage with an outline of a talk and you can help me whip it into shape.
Cheers, David On 4/7/2010 12:46 PM, Nick Matzke wrote: > I'll be at the Evo meeting and the iEvoBio meeting. I'd be happy to > give a talk as long as it's short -- I have to prioritize my research > talk at the main meeting! > > Cheers! > Nick > > > David Winter wrote: >> Hi again guys, >> >> I was wondering if anyone else is planning to go to iEvoBio >> (http://ievobio.org/) in Portland in June.. From bugzilla-daemon at portal.open-bio.org Wed Apr 7 05:38:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 01:38:49 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004070538.o375cn5n029876@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #7 from eric.talevich at gmail.com 2010-04-07 01:38 EST ------- (In reply to comment #3) > (In reply to comment #2) > > (In reply to comment #0) > > > From the Clade docstring: > > > > > > "Both 'color' and 'width' elements apply for the whole clade unless > > > overwritten in-sub clades. > > > > What is the bug? The docstring doesn't say the property will be explicitly > > cascaded down to the sub-clades, which seems to be your interpretation. > > Isn't that implicit when interpreting (drawing) the tree? > > > > Peter > > > > I suppose one could maintain that it's the responsibility of user code to > enforce the behavior specified in the docstring, although I think that's a > recipe for incompatibilities. However, neither of the two available user codes > I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually > implement it. It's better to just change the docstring, I think. > That's my interpretation of it -- the color attribute is meant to be handled that way by whatever code uses it for drawing. 
Which means two things should happen before closing this bug: - Change the docstring to indicate *user code* is supposed to handle colors and widths in a cascading fashion - Fix draw_graphviz (actually to_networkx) to cascade colors down branches, like the Archaeopteryx GUI does By the way, thanks for phyloXMLtools.py -- I'll take a closer look when I have some time. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 7 06:44:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 02:44:24 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004070644.o376iOvi000489@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-07 02:44 EST ------- Hi Eric, (In reply to comment #2) > > I'm not fond of the name of the existing method get_terminals (which currently > > returns a list). My feeling is that using just terminals seems nicer (as a > > property, so no order argument - if you need that use the find method). Is > > there any advantage to returning a list vs an iterator? Everything is all in > > memory anyway, right? > > I took the method names from Bio.Nexus.Trees wherever it seemed reasonable -- > one day I'd like Bio.Phylo to be a drop-in replacement for that module (as much > as possible). Otherwise I'd be fine with a method called terminals(). OK, that is a reasonable argument in favour. > The tree object doesn't keep a list of terminal nodes under the hood, so to get > the terminal nodes it does a full search of the tree, with run time linear to > the number of nodes in the tree. I feel uneasy about properties that don't run > in O(1) time. 
OK, so a property does seem unwise. > The find* methods return iterators, and the get* methods return lists. I found > that the results of get* usually needed to be converted to a list immediately, > for indexing or length-checking, and aren't liable to be unexpectedly large -- > smaller than the whole tree, anyway. Plus, get_terminals() is really just a > shortcut for list(tree.find_clades(terminal=True)), for those who prefer to > dive into the module or save some typing. If there is a good reason, that seems fine. > > Given a terminals property (be it a read only list or an iterator), one might > > go further and add a sister property for the internal nodes (non-terminal > > nodes). > > Apparently there's some demand for it. It would be the same as > list(tree.find_clades(terminal=False)), and forcing users to learn how find_* > methods work after they're hooked on get_terminals() has some appeal, but I > suppose we should just pick a name and add it. Maybe get_internals() would match, or get_non_terminals() might be clearer. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From winda002 at student.otago.ac.nz Wed Apr 7 05:16:25 2010 From: winda002 at student.otago.ac.nz (david winter) Date: Wed, 07 Apr 2010 17:16:25 +1200 Subject: [Biopython-dev] Draft Announcement for Biopython 1.54 In-Reply-To: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz> References: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz> Message-ID: <4BBC1529.3030709@student.otago.ac.nz> On 4/6/2010 1:51 PM, David Winter wrote: > Hi all, > > Here's a draft announcement for the next release ... Ok, changes from Eric (no axillary 'k') and Peter have been made and a version of the announcement with link and the like is waiting on the OBF server. Still easy to make changes if you've spotted something wrong/missing. 
David From bugzilla-daemon at portal.open-bio.org Wed Apr 7 07:00:03 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 03:00:03 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004070700.o37703t6001249@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-07 03:00 EST ------- (In reply to comment #5 by Joel) > Created an attachment (id=1475) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1475&action=view) [details] > phyloXML user code showing how to colorize a tree > Thanks for the example - it looks quite complicated. You have lots of functions taking "self" as the first argument. Are the intended to be methods of the tree/clade objects? Otherwise using an argument name like "tree" or "node" could be clearer. Why do you use leading underscores on some many variables whose scope is limited to single functions (in particular function hl_color_on_function)? They are private by their scope. (In reply to comment #7 by Eric) > (In reply to comment #3) > > (In reply to comment #2) > > > (In reply to comment #0) > > > > From the Clade docstring: > > > > > > > > "Both 'color' and 'width' elements apply for the whole clade unless > > > > overwritten in-sub clades. > > > > > > What is the bug? The docstring doesn't say the property will be explicitly > > > cascaded down to the sub-clades, which seems to be your interpretation. > > > Isn't that implicit when interpreting (drawing) the tree? > > > > > > Peter > > > > > > > I suppose one could maintain that it's the responsibility of user code to > > enforce the behavior specified in the docstring, although I think that's a > > recipe for incompatibilities. 
However, neither of the two available user codes > > I'm aware of (archaeoptryx or biopython's Phylo.draw_graphviz) actually > > implement it. It's better to just change the docstring, I think. > > > > > That's my interpretation of it -- the color attribute is meant to be handled > that way by whatever code uses it for drawing. > > Which means two things should happen before closing this bug: > > - Change the docstring to indicate *user code* is supposed to handle colors > and widths in a cascading fashion > > - Fix draw_graphviz (actually to_networkx) to cascade colors down branches, > like the Archaeopteryx GUI does So the Bio.Phylo drawing code (using NetworkX) doesn't cascade the colors/widths yet? We should probably check the behavour of other phyloXML GUI tools for consistency... and possibly file a bug in Archaeopteryx. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Apr 7 07:01:14 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 08:01:14 +0100 Subject: [Biopython-dev] Draft Announcement for Biopython 1.54 In-Reply-To: <4BBC1529.3030709@student.otago.ac.nz> References: <20100406135105.61102b7osdvos03t@www.studentmail.otago.ac.nz> <4BBC1529.3030709@student.otago.ac.nz> Message-ID: On Wed, Apr 7, 2010 at 6:16 AM, david winter wrote: > > Ok, changes from Eric (no axillary 'k') and Peter have been made and a > version of the announcement with link and the like is waiting on the OBF > server. Still easy to make changes if you've spotted something > wrong/missing. > > David > Great work, thanks. 
Peter From biopython at maubp.freeserve.co.uk Wed Apr 7 07:36:20 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 08:36:20 +0100 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo Message-ID: Hi Eric, Following discussion on Bug 3046 and 3047, I wrote the following example using the current API to try to set all the branches to red, except the branches of the terminal nodes which I set to blue: from Bio import Phylo tree = Phylo.read("apaf.xml", "phyloxml") #This implicitly applies to all the children: tree.properties.append(Phylo.PhyloXML.BranchColor(255,0,0)) #Now set the terminal nodes to blue: for node in tree.find(terminal=True): node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) Phylo.write(tree, "colored.xml", "phyloxml") It fails in the call to write - what am I doing wrong?: Traceback (most recent call last): File "", line 1, in File "/Users/pjcock/repositories/biopython/build/lib.macosx-10.6-universal-2.6/Bio/Phylo/_io.py", line 82, in write n = getattr(supported_formats[format], 'write')(trees, file, **kwargs) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 142, in write return Writer(obj).write(file, encoding=encoding, indent=indent) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 684, in __init__ self._tree = ElementTree.ElementTree(self.phyloxml(phyloxml)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 705, in phyloxml elem.append(self.phylogeny(tree)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 661, in wrapped elem.append(getattr(self, method)(item)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 651, in wrapped elem = ElementTree.Element(tag, _clean_attrib(obj, attribs)) File "/Library/Python/2.6/site-packages/Bio/Phylo/PhyloXMLIO.py", line 643, in _clean_attrib val = getattr(obj, key) AttributeError: 'BranchColor' object has no attribute 'ref' Thanks, Peter From rodrigo_faccioli at 
uol.com.br Wed Apr 7 12:50:21 2010 From: rodrigo_faccioli at uol.com.br (Rodrigo Faccioli) Date: Wed, 7 Apr 2010 09:50:21 -0300 Subject: [Biopython-dev] PDB-Tidy Project - Google Summer Project Proposed Message-ID: Hi, I'm a Ph.D student at University of Sao Paulo (USP), Brazil. I've worked with Biopython, mainly its Bio.PDB module, since last year. I would like to participate in Google Summer of Code through the "PDB-Tidy: command-line tools for manipulating PDB files" project. To that end, I've talked with Eric Talevich, who helped me write my project proposal (link below). http://dl.dropbox.com/u/4270818/Google_Summer_Code_Proposed.pdf In that document, I give more details on what I've done with my Bio.PDB extension and how I might contribute to the PDB-Tidy project. In general terms, Prometheus is a website which allows the study of protein-protein complexes based on the electrostatic properties of proteins. When accessing the Prometheus website, please use the information below: username: gsoc password: g0sum&10 I'll try to register as a student. Before then, I would appreciate any comments on my proposal.
Thanks in advance, -- Rodrigo Antonio Faccioli Ph.D Student in Electrical Engineering University of Sao Paulo - USP Engineering School of Sao Carlos - EESC Department of Electrical Engineering - SEL Intelligent System in Structure Bioinformatics http://laips.sel.eesc.usp.br Phone: 55 (16) 3373-9366 Ext 229 Curriculum Lattes - http://lattes.cnpq.br/1025157978990218 From eric.talevich at gmail.com Wed Apr 7 12:57:29 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 08:57:29 -0400 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 3:36 AM, Peter wrote: > Hi Eric, > > Following discussion on Bug 3046 and 3047, I wrote the following > example using the current API to try to set all the branches to red, > except the branches of the terminal nodes which I set to blue: My approach would be:

from Bio import Phylo
from Bio.Phylo import PhyloXML as PX

tree = Phylo.read("apaf.xml", "phyloxml")
for clade in tree.find_clades():
    if clade.is_terminal():
        clade.color = PX.BranchColor.from_name('blue')
    else:
        clade.color = PX.BranchColor.from_name('red')

Strictly according to the phyloXML spec, with colors cascading down branches, this should display the same way (but doesn't in Phylo.draw_graphviz):

for child in tree.root.clades:
    child.color = PX.BranchColor.from_name('red')
for term in tree.get_terminals():
    child.color = PX.BranchColor.from_name('blue')

I haven't confirmed that Archaeopteryx follows the spec here, but that's how the GUI behaves when colorizing branches, so I assume it does. > from Bio import Phylo > tree = Phylo.read("apaf.xml", "phyloxml") > #This implicitly applies to all the children: > tree.properties.append(Phylo.PhyloXML.BranchColor(255,0,0)) > #Now set the terminal nodes to blue: > for node in tree.find(terminal=True): > node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) > Phylo.write(tree, "colored.xml", "phyloxml") > > It fails in the call to write - what am I doing wrong?: The clade.properties attribute isn't a container for Python properties, it's a phyloXML-specific thing: http://www.phyloxml.org/documentation/version_1.10/phyloxml.xsd.html#h158033242 Cheers, Eric From biopython at maubp.freeserve.co.uk Wed Apr 7 13:30:12 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 14:30:12 +0100 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 1:57 PM, Eric Talevich wrote: > On Wed, Apr 7, 2010 at 3:36 AM, Peter wrote: >> Hi Eric, >> >> Following discussion on Bug 3046 and 3047, I wrote the following >> example using the current API to try to set all the branches to red, >> except the branches of the terminal nodes which I set to blue: > > My approach would be: > > from Bio import Phylo > from Bio.Phylo import PhyloXML as PX > > tree = Phylo.read("apaf.xml", "phyloxml") > for clade in tree.find_clades(): > if clade.is_terminal(): > clade.color = PX.BranchColor.from_name('blue') > else: > clade.color = PX.BranchColor.from_name('red') Very helpful. > Strictly according to the phyloXML spec, with colors cascading down > branches, this should display the same way (but doesn't in > Phylo.draw_graphviz): So for Bug 3047 you'll check and fix Phylo.draw_graphviz to do that (assuming the Archaeopteryx tool also does this)? > for child in tree.root.clades: > child.color = PX.BranchColor.from_name('red') > for term in tree.get_terminals(): > child.color = PX.BranchColor.from_name('blue') I'm assuming you made a typo with the variable names (term vs child). Why not just apply the red to the root node itself?
This seems to work:

from Bio import Phylo
#This implicitly applies to all the children:
tree.root.color = Phylo.PhyloXML.BranchColor(255,0,0)
#Now set the terminal nodes to blue:
for clade in tree.find_clades(terminal=True):
    clade.color = Phylo.PhyloXML.BranchColor(0,0,255)
Phylo.write(tree, "colored.xml", "phyloxml")

This is based on my original example but now using find_clades which is more specific than find_all as I now see, and also I hadn't appreciated the difference between tree and tree.root (a Tree object and a Clade object - in other libraries a tree is also a clade). As an aside, I don't like the find method - it seems dangerous in the case where find_all returns multiple hits. I can see it could be useful *if* it returns a single hit, None for no hits, or an exception for multiple hits. > I haven't confirmed that Archaeopteryx follows the spec here, but > that's how the GUI behaves when colorizing branches, so I assume it > does. > > >> from Bio import Phylo >> tree = Phylo.read("apaf.xml", "phyloxml") >> #This implicitly applies to all the children: >> tree.properties.append(Phylo.PhyloXML.BranchColor(255,0,0)) >> #Now set the terminal nodes to blue: >> for node in tree.find(terminal=True): >> node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) >> Phylo.write(tree, "colored.xml", "phyloxml") >> >> It fails in the call to write - what am I doing wrong?: > > The clade.properties attribute isn't a container for Python > properties, it's a phyloXML-specific thing: > http://www.phyloxml.org/documentation/version_1.10/phyloxml.xsd.html#h158033242 But it is still a list of objects, so I would expect to be able to add (suitable) things to it. If you regard this as an implementation detail, then maybe rename the list to _properties instead?
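For what it's worth, the stricter lookup described in the aside above (a single hit, None for no hits, an exception for multiple hits) could be sketched roughly as follows. The name find_unique is hypothetical and not part of Bio.Phylo; the helper only assumes it is handed an iterable of matches, such as whatever a search like tree.find_clades(...) yields.

```python
def find_unique(matches):
    """Return the only item in matches, or None if there are no items.

    Hypothetical helper for illustration; not part of Bio.Phylo.
    Raises ValueError as soon as a second match is seen.
    """
    found = None
    count = 0
    for item in matches:
        count += 1
        if count > 1:
            raise ValueError("expected at most one match, found several")
        found = item
    return found
```

Because it stops at the second match, it never walks the rest of a lazily evaluated search.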
Regards, Peter From bugzilla-daemon at portal.open-bio.org Wed Apr 7 13:50:21 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 09:50:21 -0400 Subject: [Biopython-dev] [Bug 3048] New: Bio.Blast.Applications.NcbitblastxCommandline Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3048 Summary: Bio.Blast.Applications.NcbitblastxCommandline Product: Biopython Version: 1.53 Platform: All OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: gebauer-jung at ice.mpg.de NcbitblastxCommandline._validate() uses a non-existing attribute in_pssm (if query is set), raises an error and hence the commandline cannot be used as suggested in the tutorial. The module Bio.Blast.Applications.py is identical to the biopython 1.54b and current github versions. from Bio.Blast.Applications import NcbitblastxCommandline >>> cline = NcbitblastxCommandline() >>> cline.query = 'xxx' >>> print cline Traceback (most recent call last): File "", line 1, in File "....../Bio/Application/__init__.py", line 256, in __str__ self._validate() File "....../biopython-1.53/build/lib.linux-i686-2.5/Bio/Blast/Applications.py", line 929, in _validate if self.query and self.in_pssm: AttributeError: 'NcbitblastxCommandline' object has no attribute 'in_pssm' -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Apr 7 14:09:34 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 10:09:34 -0400 Subject: [Biopython-dev] [Bug 3048] Bio.Blast.Applications.NcbitblastxCommandline In-Reply-To: Message-ID: <201004071409.o37E9YtO017358@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3048 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-07 10:09 EST ------- Confirmed. That validation code would have made sense for tblastn, but not for tblastx. Fixed: http://github.com/biopython/biopython/commit/284216406ecdc77062f3b9cd93bd648e08541b22 Thank you! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Wed Apr 7 15:55:01 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 11:55:01 -0400 Subject: [Biopython-dev] Setting branch colors in Bio.Phylo In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 9:30 AM, Peter wrote: > Why not just apply the red to the root node itself? This seems to > work: > > from Bio import Phylo > #This implicitly applies to all the children: > tree.root.color = Phylo.PhyloXML.BranchColor(255,0,0) > #Now set the terminal nodes to blue: > for clade in tree.find_clades(terminal=True): > ? clade.color = Phylo.PhyloXML.BranchColor(0,0,255) > Phylo.write(tree, "colored.xml", "phyloxml") You're right, that's better. 
> > This is based on my original example but now using find_clades > which is more specific than find_all as I now see, and also I > hadn't appreciated the difference between tree and tree.root > (a Tree object and a Clade object - in other libraries a tree > is also a clade). The object hierarchy is a fairly literal translation of the PhyloXML spec. That's why I used TreeMixin to plaster over the differences -- most of the methods that operate on a whole tree or subclade make sense either way, and we can still separate global from local information somewhat. > As an aside, I don't like the find method - it seems dangerous > in the case where find_all returns multiple hits. I can see it > could be useful *if* it returns a single hit, None for no hits, or > an exception for multiple hits. I use find_all and find_clades (mostly find_clades) in loops, and find in if statements. I'm open to renaming any of the TreeMixin methods -- e.g.

find_all --> find_elements
find --> find_any, or get_any, or just any
find_clades --> find_all, or find, or stay the same

Would those names be more intuitive? >>> for node in tree.find(terminal=True): >>> node.properties.append(Phylo.PhyloXML.BranchColor(0,0,255)) >>> Phylo.write(tree, "colored.xml", "phyloxml") >>> >>> It fails in the call to write - what am I doing wrong?: >> >> The clade.properties attribute isn't a container for Python >> properties, it's a phyloXML-specific thing: >> http://www.phyloxml.org/documentation/version_1.10/phyloxml.xsd.html#h158033242 > > But it is still a list of objects, so I would expect to be able to add > (suitable) > things to it. If you regard this as an implementation detail, then maybe > rename the list to _properties instead? You can append to it, but the thing you append needs to be a PhyloXML.Property object, or else the serializer barfs when it can't find the expected attributes.
Mitigation: Since the phyloXML spec requires some attributes in Property instances, and the serializer assumes they'll be satisfied, Property.__init__ should do some additional checks too and fail early if necessary. Adding type checks all over Bio.Phylo seems un-Pythonic, but checking attribute existence should be easy. From eric.talevich at gmail.com Wed Apr 7 18:18:10 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 14:18:10 -0400 Subject: [Biopython-dev] PhyloXML.BranchColor methods Message-ID: Hi all, There's been some discussion in Bugzilla about using the PhyloXML.BranchColor class to assign colors to Bio.Phylo tree branches. http://bugzilla.open-bio.org/show_bug.cgi?id=3047 Currently, one sets the color for a clade by assigning a BranchColor instance to the clade's color attribute: from Bio import Phylo from Bio.Phylo import PhyloXML as PX tree = Phylo.read(..., 'phyloxml') critters = tree.find(name='Rattus') critters.color = PX.BranchColor(0, 128, 0) # or, using HTML/matplotlib color names: critters.color = PX.BranchColor.from_name('green') The BranchColor class has these methods: from_name -- a class method that looks up RGB values in the hard-coded dictionary BranchColor.color_names to_hex -- a 24-bit hex string, e.g. '#00A000', suitable for HTML/CSS and matplotlib to_rgb -- a tuple of float RGB values, scaled 0 to 1.0 (See http://github.com/biopython/biopython/blob/master/Bio/Phylo/PhyloXML.py) Here are some proposals. Please let me know which of these you like or hate. 1. Add a function Bio.Phylo.PhyloXML.color(...) 
which behaves like the clade.color property Peter suggested earlier:

def color(thing):
    if isinstance(thing, BranchColor):
        return thing
    elif isinstance(thing, basestring):
        if thing in BranchColor.color_names:
            return BranchColor.from_name(thing)
        elif len(thing) == 7 and thing[0] == '#':
            # CSS/HTML/matplotlib-style hex string
            return BranchColor.from_hex_string(thing)
        raise ValueError("Fail!")
    elif hasattr(thing, '__iter__') and len(thing) == 3:
        # RGB triple -- an abstract base class would be nice here
        # (or, take *args instead of thing)
        return BranchColor(*thing)
    raise ValueError("Fail!")

Then the last line of the above example would be:

critters.color = PX.color('green')

2. Add a class method from_hex_string for constructing BranchColor objects from a hex string like '#FF00AA'

This complements the to_hex function (to be renamed to_hex_string, unless someone has a better name for it). The color function given above assumes this method exists.

3. Drop the to_rgb method; it's confusing and floating-point conversions lead to bugs.

4. New __repr__ and __str__ methods:

>>> critters.color
BranchColor(red=0, green=128, blue=0)
>>> print critters.color
(0, 128, 0)

I don't think any of the other PhyloXML classes warrant a similar treatment -- except possibly PhyloXML.Sequence, which can be built from a SeqRecord using the from_seqrecord class method. Any other suggestions along these lines?
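To make proposal 2 concrete, here is a minimal standalone sketch of the two conversions, assuming integer components in the range 0 to 255. The function names rgb_to_hex and hex_to_rgb are placeholders for illustration only; they are not the BranchColor methods themselves, and the real class methods may end up looking different.

```python
def rgb_to_hex(red, green, blue):
    """Format three integers in 0..255 as a '#RRGGBB' string."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("color components must be in 0..255")
    return "#%02X%02X%02X" % (red, green, blue)


def hex_to_rgb(hexstr):
    """Parse a '#RRGGBB' string into a tuple of three integers."""
    if len(hexstr) != 7 or hexstr[0] != "#":
        raise ValueError("expected a string like '#FF00AA'")
    # int(..., 16) raises ValueError itself for non-hex digits
    return tuple(int(hexstr[i:i + 2], 16) for i in (1, 3, 5))
```

Working in integers throughout sidesteps the floating-point rounding concern raised in proposal 3, and a value survives a round trip through both functions unchanged.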
Thanks, Eric From bugzilla-daemon at portal.open-bio.org Wed Apr 7 20:15:27 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 16:15:27 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004072015.o37KFRPR028122@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #4 from eric.talevich at gmail.com 2010-04-07 16:15 EST ------- (In reply to comment #0) > > The usage patterns for, say, setting a phyloXML property from an array prop_arr > should look something like: > > [node.set_property(prop_arr[i], *prop_params, **prop_keywords) > for i, node in tree.enumerate_internals()] How about: [node.properties.append(PX.Property(prop_arr[i], *prop_args, **prop_kwargs[i])) for i, node in enumerate(tree.find_clades(terminal=False))] Is there something different about this form versus your example above that hurts performance? > The three issues that frustrate such concision are > (1) internal nodes, terminal nodes, and all nodes are not currently > on an equal footing with respect to methods For your usage it might be faster to use the generators: find_clades(terminal=False), find_clades(terminal=True), find_clades() I'm considering renaming 'find_clades' to 'find', and 'find' to 'find_any' -- would the shorter name make your code a little cleaner? We could also have 'get_nonterminal' and 'get_all_clades' -- I'm not so sure that the last one is useful enough to justify cluttering the API further; what do you think? (I actually balked at adding get_terminals() originally, since it's so simple.) > (2) there are no enumerator methods Doesn't the enumerate() function work just as well, or even better, with the functional/array-oriented programming style you're using? The find_* methods return lazily-evaluated iterables to enable this kind of usage in a memory-efficient way.
> Here I give some convenience methods that I wish were defined in TreeMixin. I > have tested them as standalone methods. I hope you'll see fit to include them > at some point. > > def count_internals(self): > """Counts the number of non-terminal (internal) nodes within this tree.""" > return [i for i,e in enumerate_internals(self)][-1] + 1 I can add a convenience function that would help: def iterlen(items): for i, x in enumerate(items): count = i return count + 1 Then count_internals(tree) is the same as: iterlen(tree.find_clades(terminal=False)) Or, if we add get_nonterminals() it's easy: len(tree.get_nonterminals()) > def enumerate_internals(self): > """Returns an enumerator of non-terminal clades""" > return enumerate(self.find_clades(terminal=False)) > > def enumerate_terminals(self): > """Returns an enumerator of terminal clades""" > return enumerate(self.find_clades(terminal=True)) > > def enumerate_all(self): > """Returns an enumerator on all clades""" > return enumerate(self.find_clades()) I can see why these are handy in your own code, because you're using them a lot, but I don't think they introduce enough new functionality to justify adding more methods to TreeMixin. > Less critical but still useful are the following two methods (and one private > utility) that I find useful for operations on trees: > > def is_semipreterminal(self): > """True if any direct descendent is terminal.""" > if self.root.is_terminal(): > return False > for clade in self.clades: > if clade.is_terminal(): > return True > return False Is semipreterminal a standard name for nodes like this? 
In Python 2.5 and later, you could also do: any(clade.is_terminal() for clade in self) > def terminal_neighbor_dists(self): > """Return a list of distances between adjacent terminals""" > return [self.distance(*i) for i in > _generate_pairs(self.find_clades(terminal=True))] > > def _generate_pairs(self): > import itertools > pairs = itertools.tee(self) > pairs[1].next() > return itertools.izip(pairs[0], pairs[1]) Interesting. Getting philosophical -- I don't intend for TreeMixin to have a built-in method for every possible use case, but one of my goals in Bio.Phylo is to provide all the low-level functionality necessary so that when you do have to write your own function to do something special, it doesn't take much new code. So, I'm quite pleased that you were able to implement this functionality for yourself in just 4 lines of (non-scaffolding) code. The biggest weakness in Bio.Phylo from my viewpoint is that most of the TreeMixin methods do some portion of a full-tree search every time they are called -- there's no internal lookup table. So to make more efficient algorithms possible, I added some methods that do as much as possible in one shot. Example: rather than a distance_to(node) method, we have TreeMixin.depths() which returns a dictionary of all nodes mapped to their respective total branch lengths from the root. What other whole-tree operations along this philosophy would you like to see implemented? Some ideas: - heights() -- like depths(), but mapping each node to the distance to the nearest (or farthest?) terminal - names() -- map each clade name to the clade instance. Clades with no name won't be in the dictionary. Each of these could take a 'target' specification like get_path does, so you can restrict the result to a specific set of clades (e.g. terminals). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
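As a footnote to the Bug 3045 comment above, the two small helpers it discusses can be sketched without any Bio.Phylo dependency. Note that the iterlen shown in the comment leaves count unassigned when the iterable is empty; the variant below returns 0 in that case. The name adjacent_pairs is a placeholder, and both functions are only an illustration of the idea, assuming they are fed any iterable (for example the lazy result of a clade search).

```python
import itertools


def iterlen(items):
    """Count the items in any iterable without building a list."""
    return sum(1 for _ in items)


def adjacent_pairs(items):
    """Pair each item with its successor: (a, b), (b, c), ..."""
    first, second = itertools.tee(items)
    next(second, None)  # advance the second copy by one; safe when empty
    return zip(first, second)
```

Both consume their input once, so they work on generators as well as lists.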
From biopython at maubp.freeserve.co.uk Wed Apr 7 20:29:47 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 21:29:47 +0100 Subject: [Biopython-dev] PhyloXML.BranchColor methods In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 7:18 PM, Eric Talevich wrote: > Hi all, > > There's been some discussion in Bugzilla about using the > PhyloXML.BranchColor class to assign colors to Bio.Phylo tree > branches. > http://bugzilla.open-bio.org/show_bug.cgi?id=3047 > > Currently, one sets the color for a clade by assigning a BranchColor > instance to the clade's color attribute: > > from Bio import Phylo > from Bio.Phylo import PhyloXML as PX > > tree = Phylo.read(..., 'phyloxml') > critters = tree.find(name='Rattus') > critters.color = PX.BranchColor(0, 128, 0) > # or, using HTML/matplotlib color names: > critters.color = PX.BranchColor.from_name('green') > > The BranchColor class has these methods: > > from_name -- a class method that looks up RGB values in the > hard-coded dictionary BranchColor.color_names We could probably move the lookup table under Bio.Data, where it might also be useful for Bio.Graphics. I assume you are using standard HTML/CSS color names? > to_hex -- a 24-bit hex string, e.g. '#00A000', suitable for HTML/CSS > and matplotlib > to_rgb -- a tuple of float RGB values, scaled 0 to 1.0 > > (See http://github.com/biopython/biopython/blob/master/Bio/Phylo/PhyloXML.py) > > > Here are some proposals. Please let me know which of these you like or hate. > > 1. Add a function Bio.Phylo.PhyloXML.color(...) which behaves like the > clade.color property Peter suggested earlier: I was suggesting adding a property to the clade (which could for example map color names or RGB triples to the BranchColor objects automatically).
It would still be:

from Bio import Phylo
from Bio.Phylo import PhyloXML as PX
tree = Phylo.read(..., 'phyloxml')
critters = tree.find(name='Rattus')
critters.color = PX.BranchColor(0, 128, 0)

BUT, you could choose to allow:

critters.color = (0, 128, 0)

Or a named color,

critters.color = "green"

Or a hex string.

critters.color = "#008000"

and have the property set method convert these into the same result, BranchColor(0, 128, 0). > 2. Add a class method from_hex_string for constructing BranchColor > objects from a hex string like '#FF00AA' > > This complements the to_hex function (to be renamed to_hex_string, > unless someone has a better name for it). The color function given > above assumes this method exists. Hmm, to_hex seems OK to me. > 3. Drop the to_rgb method; it's confusing and floating-point > conversions lead to bugs. I had assumed to_rgb would give a tuple of ints in the range 0 to 255 (following HTML/CSS color conventions). That would avoid the rounding issue. > 4. New __repr__ and __str__ methods: > >>>> critters.color > BranchColor(red=0, green=128, blue=0) >>>> print critters.color > (0, 128, 0) Personally I would like an HTML style output, hash then a six character hex number. Anyone planning to look at the XML should know these from HTML and CSS. However, I recognise this isn't universally understood. > I don't think any of the other PhyloXML classes warrant a similar > treatment -- except possibly PhyloXML.Sequence, which can be built > from a SeqRecord using the from_seqrecord class method. Any other > suggestions along these lines? Colors are special enough to warrant special attention. Possibly width would too.
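A rough sketch of the property idea above, for illustration only. The BranchColor and Clade classes here are simplified stand-ins written for this example (they are not the Bio.Phylo classes), the three-entry color table is an assumption, and the decorator syntax needs Python 2.6 or later.

```python
class BranchColor(object):
    """Simplified stand-in for PhyloXML.BranchColor (illustration only)."""

    # Tiny assumed lookup table; the real class has a much longer one.
    color_names = {"red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255)}

    def __init__(self, red, green, blue):
        self.red, self.green, self.blue = red, green, blue

    def __repr__(self):
        return "BranchColor(red=%d, green=%d, blue=%d)" % (
            self.red, self.green, self.blue)


class Clade(object):
    """Stand-in clade whose color property normalises what it is given."""

    def __init__(self):
        self._color = None

    @property
    def color(self):
        return self._color

    @color.setter
    def color(self, value):
        if value is None or isinstance(value, BranchColor):
            self._color = value
        elif isinstance(value, str):
            if value in BranchColor.color_names:
                # Named color, e.g. "green"
                self._color = BranchColor(*BranchColor.color_names[value])
            elif len(value) == 7 and value[0] == "#":
                # Hex string, e.g. "#008000"
                rgb = [int(value[i:i + 2], 16) for i in (1, 3, 5)]
                self._color = BranchColor(*rgb)
            else:
                raise ValueError("unrecognised color string: %r" % (value,))
        elif isinstance(value, (tuple, list)) and len(value) == 3:
            # RGB triple, e.g. (0, 128, 0)
            self._color = BranchColor(*value)
        else:
            raise ValueError("cannot interpret %r as a color" % (value,))
```

With this in place, assigning "green", "#008000" or (0, 128, 0) to the color attribute all end up storing BranchColor(red=0, green=128, blue=0), and anything else raises ValueError at assignment time rather than later in the serializer.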
Peter From eric.talevich at gmail.com Wed Apr 7 21:06:11 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 7 Apr 2010 17:06:11 -0400 Subject: [Biopython-dev] PhyloXML.BranchColor methods In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 4:29 PM, Peter wrote: > On Wed, Apr 7, 2010 at 7:18 PM, Eric Talevich > wrote: > > Hi all, > > > > There's been some discussion in Bugzilla about using the > > PhyloXML.BranchColor class to assign colors to Bio.Phylo tree > > branches. > > http://bugzilla.open-bio.org/show_bug.cgi?id=3047 > > > > Currently, one sets the color for a clade by assigning a BranchColor > > instance to the clade's color attribute: > > > > from Bio import Phylo > > from Bio.Phylo import PhyloXML as PX > > > > tree = Phylo.read(..., 'phyloxml') > > critters = tree.find(name='Rattus') > > critters.color = PX.BranchColor(0, 128, 0) > > # or, using HTML/matplotlib color names: > > critters.color = PX.BranchColor.from_name('green') > > > > > The BranchColor class has these methods: > > > > from_name -- a class method that looks up RGB values in the > > hard-coded dictionary BranchColor.color_names > > We could probably move the lookup table under Bio.Data, > where it might also be useful for Bio.Graphics. I assume you > are using standard HTML/CSS color names? > OK, that's cool too. I took the color names and values from the HTML standard and W3Schools: http://w3schools.com/html/html_colornames.asp I checked the more exotic names with matplotlib and gcolor2 -- so any name from this list will also work in matplotlib, and consequently, draw_graphviz. > (See > http://github.com/biopython/biopython/blob/master/Bio/Phylo/PhyloXML.py) > > > > Here are some proposals. Please let me know which of these you like or > hate. > > > > 1. Add a function Bio.Phylo.PhyloXML.color(...) 
which behaves like the > > clade.color property Peter suggested earlier: > > I was suggesting adding a property to the clade (which could for > example map color names or RBG triples to the BranchColor > objects automatically). It would still be: > > from Bio import Phylo > from Bio.Phylo import PhyloXML as PX > tree = Phylo.read(..., 'phyloxml') > critters = tree.find(name='Rattus') > critters.color = PX.BranchColor(0, 128, 0) > > BUT, you could choose to allow: > > critters.color = (0, 128, 0) > > Or a named color, > > critters.color = "green" > > Or a hex string. > > critters.color = "#008000" > > and have the property set method convert these into > the same result, BranchColor(0, 128, 0). > It's pretty magical, but the convenience of "critters.color = 'green'" wins. I'll implement the property to accept a BranchColor, RGB triple, color name, or hex string, and raise a ValueError otherwise. > > 2. Add a class method from_hex_string for constructing BranchColor > > objects from a hex string like '#FF00AA' > > > > This complements the to_hex function (to be renamed to_hex_string, > > unless someone has a better name for it). The color function given > > above assumes this method exists. > > Hmm, to_hex seems OK to me. > My only concern: the builtin hex() returns a string formatted a little differently. Matching that format would be useless here, but I was worried about people being confused. But if you're OK with to/from_hex, then I am too. > 3. Drop the to_rgb method; it's confusing and floating-point > > conversions lead to bugs. > > I had assumed to_rgb would give a tuple of ints in the range > 0 to 255 (following HTML/CSS color conventions). That would > avoid the rounding issue. > Strawmen: - Should to_rgb be renamed to_tuple, then? - if we defined BranchColor.__iter__ as "return (self.red, self.green, self.blue)", then "tuple(clade.color)" would work - if we defined BranchColor.__hex__, then similarly, "hex(clade.color)" would work - ... 
but those magic methods would hurt discoverability > 4. New __repr__ and __str__ methods: > > > >>>> critters.color > > BranchColor(red=0, green=128, blue=0) > >>>> print critters.color > > (0, 128, 0) > > Personally I would like an HTML style output, hash then a six character > hex number. Anyone planning to look at the XML should know these > from HTML and CSS. However, I recognise this isn't universally > understood. > I'm OK with that too. We could also do a reverse lookup in the color_names table and return the color name instead if there's a match. That would cover most users -- if you know RGB values, you can probably handle hex, and if you just use color names instead then you'll get color names back. > > I'm don't think any of the other PhyloXML classes warrant a similar > > treatment -- except possibly PhyloXML.Sequence, which can be built > > from at SeqRecord using the from_seqrecord class method. Any other > > suggestions along these lines? > > Colors are special enough to warrant special attention. Possibly also > width would to. > Fortunately, width is just a float -- no PhyloXML-specific classes to deal with. Cheers, Eric From biopython at maubp.freeserve.co.uk Wed Apr 7 21:31:42 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 7 Apr 2010 22:31:42 +0100 Subject: [Biopython-dev] PhyloXML.BranchColor methods In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 10:06 PM, Eric Talevich wrote: > On Wed, Apr 7, 2010 at 4:29 PM, Peter wrote: >> >> I was suggesting adding a property to the clade (which could for >> example map color names or RBG triples to the BranchColor >> objects automatically). 
It would still be: >> >> from Bio import Phylo >> from Bio.Phylo import PhyloXML as PX >> tree = Phylo.read(..., 'phyloxml') >> critters = tree.find(name='Rattus') >> critters.color = PX.BranchColor(0, 128, 0) >> >> BUT, you could choose to allow: >> >> critters.color = (0, 128, 0) >> >> Or a named color, >> >> critters.color = "green" >> >> Or a hex string. >> >> critters.color = "#008000" >> >> and have the property set method convert these into >> the same result, BranchColor(0, 128, 0). >> > > It's pretty magical, but the convenience of "critters.color = 'green'" wins. > I'll implement the property to accept a BranchColor, RGB triple, color name, > or hex string, and raise a ValueError otherwise. Yeah - it feels right to me ;) >> ?> 2. Add a class method from_hex_string for constructing BranchColor >> > objects from a hex string like '#FF00AA' >> > >> > This complements the to_hex function (to be renamed to_hex_string, >> > unless someone has a better name for it). The color function given >> > above assumes this method exists. >> >> Hmm, to_hex seems OK to me. >> > > My only concern: the builtin hex() returns a string formatted a little > differently. Matching that format would be useless here, but I was worried > about people being confused. But if you're OK with to/from_hex, then I am > too. See below. > ?> 3. Drop the to_rgb method; it's confusing and floating-point >> > conversions lead to bugs. >> >> I had assumed to_rgb would give a tuple of ints in the range >> 0 to 255 (following HTML/CSS color conventions). That would >> avoid the rounding issue. >> > > Strawmen: > - Should to_rgb be renamed to_tuple, then? There are other things a color tuple could be, although RGB is the most common (see also CMYK, HSV, ...). > - if we defined BranchColor.__iter__ as "return (self.red, self.green, > self.blue)", then "tuple(clade.color)" would work > - if we defined BranchColor.__hex__, then similarly, "hex(clade.color)" > would work > - ... 
but those magic methods would hurt discoverability Huh - that is neat, but it hadn't occurred to me to think about supporting those. It isn't helpful to support hex(...) since that usually returns a string which starts "0x..." which isn't what we want for HTML, CSS or XML. Maybe instead of to_hex() we should have to_css_color() or something like that? Peter From bugzilla-daemon at portal.open-bio.org Thu Apr 8 00:49:57 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 7 Apr 2010 20:49:57 -0400 Subject: [Biopython-dev] [Bug 3047] PhyloXML, behavior on setting color and width doesn't match docstring or spec In-Reply-To: Message-ID: <201004080049.o380nvUd001788@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3047 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #9 from eric.talevich at gmail.com 2010-04-07 20:49 EST ------- (In reply to comment #8 by Peter) > (In reply to comment #7 by Eric) > > That's my interpretation of it -- the color attribute is meant to be handled > > that way by whatever code uses it for drawing. > > > > Which means two things should happen before closing this bug: > > > > - Change the docstring to indicate *user code* is supposed to handle colors > > and widths in a cascading fashion > > > > - Fix draw_graphviz (actually to_networkx) to cascade colors down branches, > > like the Archaeopteryx GUI does > > So the Bio.Phylo drawing code (using NetworkX) doesn't cascade the > colors/widths yet? 
It does now: http://github.com/biopython/biopython/commit/1ea0c921c9bea6acb2b6b41566383fc54ed4862f (and preceding commits -- sorry about the weight/width mixup) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Apr 8 10:55:29 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 8 Apr 2010 11:55:29 +0100 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 Message-ID: Hi Eric, I noticed that test_PhyloXML.py is failing on Python 2.4, it should be skipped since I don't have ElementTree installed. Have you got access to a Python 2.4 installation to look at this? Thanks Peter ====================================================================== ERROR: test_PhyloXML ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 267, in runTest suite = unittest.TestLoader().loadTestsFromName(name) File "c:\python24\lib\unittest.py", line 524, in loadTestsFromName module = __import__('.'.join(parts_copy)) File "test_PhyloXML.py", line 15, in ? from Bio.Phylo import PhyloXML as PX, PhyloXMLIO File "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\__init_ _.py", line 12, in ? from Bio.Phylo._io import parse, read, write, convert File "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\_io.py" , line 15, in ? import PhyloXMLIO File "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\PhyloXM LIO.py", line 23, in ? 
from Bio.Phylo import PhyloXML as PX ImportError: cannot import name PhyloXML From eric.talevich at gmail.com Thu Apr 8 13:05:13 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 8 Apr 2010 09:05:13 -0400 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 In-Reply-To: References: Message-ID: On Thu, Apr 8, 2010 at 6:55 AM, Peter wrote: > Hi Eric, > > I noticed that test_PhyloXML.py is failing on Python 2.4, it should > be skipped since I don't have ElementTree installed. Have you got > access to a Python 2.4 installation to look at this? > The traceback says the PhyloXML module is missing, but PhyloXMLIO and the rest of Bio.Phylo are there. Is that normal? I would expect that PhyloXML would still be installed with Biopython on Py2.4, but when the test runs it would trigger a MissingExternalDependency error for ElementTree when importing PhyloXMLIO, and run_tests.py would then skip it. I don't have Py2.4 on this machine but I can track down a copy. (Annoyingly, Ubuntu seems to have dropped Pythons 2.4 and 2.5 from the official repos in Lucid Lynx.) -Eric > ====================================================================== > ERROR: test_PhyloXML > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "run_tests.py", line 267, in runTest > suite = unittest.TestLoader().loadTestsFromName(name) > File "c:\python24\lib\unittest.py", line 524, in loadTestsFromName > module = __import__('.'.join(parts_copy)) > File "test_PhyloXML.py", line 15, in ? > from Bio.Phylo import PhyloXML as PX, PhyloXMLIO > File > "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\__init_ > _.py", line 12, in ? > from Bio.Phylo._io import parse, read, write, convert > File > "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\_io.py" > , line 15, in ? > import PhyloXMLIO > File > "C:\repositories\biopython_official\build\lib.win32-2.4\Bio\Phylo\PhyloXM > LIO.py", line 23, in ? 
> from Bio.Phylo import PhyloXML as PX > ImportError: cannot import name PhyloXML > From biopython at maubp.freeserve.co.uk Thu Apr 8 13:23:48 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 8 Apr 2010 14:23:48 +0100 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 In-Reply-To: References: Message-ID: On Thu, Apr 8, 2010 at 2:05 PM, Eric Talevich wrote: > On Thu, Apr 8, 2010 at 6:55 AM, Peter > wrote: >> >> Hi Eric, >> >> I noticed that test_PhyloXML.py is failing on Python 2.4, it should >> be skipped since I don't have ElementTree installed. Have you got >> access to a Python 2.4 installation to look at this? > > The traceback says the PhyloXML module is missing, but PhyloXMLIO and the > rest of Bio.Phylo are there. Is that normal? I would expect that PhyloXML > would still be installed with Biopython on Py2.4, but when the test runs it > would trigger a MissingExternalDependency error for ElementTree when > importing PhyloXMLIO, and run_tests.py would then skip it. Yes, and I don't understand why that doesn't happen in the test suite: C:\>c:\python24\python Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> from Bio.Phylo import PhyloXML as PX Traceback (most recent call last): File "", line 1, in ? File "c:\python24\lib\site-packages\Bio\Phylo\__init__.py", line 12, in ? from Bio.Phylo._io import parse, read, write, convert File "c:\python24\Lib\site-packages\Bio\Phylo\_io.py", line 15, in ? import PhyloXMLIO File "c:\python24\Lib\site-packages\Bio\Phylo\PhyloXMLIO.py", line 42, in ? raise MissingExternalDependencyError( Bio.MissingExternalDependencyError: No ElementTree module was found. Use Python 2.5+, lxml or elementtree if you want to use Bio.PhyloXML. >>> Odd. 
Peter From bugzilla-daemon at portal.open-bio.org Thu Apr 8 21:56:41 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 17:56:41 -0400 Subject: [Biopython-dev] [Bug 3054] New: Add upper and lower methods to the SeqRecord Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3054 Summary: Add upper and lower methods to the SeqRecord Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Unlike some other potential string or Seq like methods the SeqRecord lacks, I don't see any problems with annotation with adding upper and lower methods. See also discussion on Bug 2351. The upper and lower methods are useful, e.g. making a mixed case FASTQ file into upper case: from Bio import SeqIO records = (rec.upper() for rec in SeqIO.parse("mixed.fastq", "fastq")) SeqIO.write(records, "upper.fastq", "fastq") Patch to follow. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
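A rough sketch of the reasoning in Bug 3054, for readers following along: changing case never changes the sequence length, so per-letter annotation such as FASTQ qualities still lines up afterwards. The MiniRecord class and its attribute names below are hypothetical stand-ins for illustration only; this is not the SeqRecord class and not the attached patch.

```python
class MiniRecord(object):
    """Hypothetical stand-in for a sequence record (not Bio.SeqRecord)."""

    def __init__(self, seq, id="", letter_annotations=None):
        self.seq = seq
        self.id = id
        # Copy the annotation so records do not share one dictionary.
        self.letter_annotations = dict(letter_annotations or {})
        for key, values in self.letter_annotations.items():
            if len(values) != len(seq):
                raise ValueError("annotation %r does not match sequence length" % key)

    def upper(self):
        """Return a new record with an upper case sequence, annotation kept."""
        return MiniRecord(self.seq.upper(), self.id, self.letter_annotations)

    def lower(self):
        """Return a new record with a lower case sequence, annotation kept."""
        return MiniRecord(self.seq.lower(), self.id, self.letter_annotations)


# Made-up read with mixed case and one quality score per letter.
rec = MiniRecord("acgTNn", "read1", {"phred_quality": [30, 31, 32, 33, 2, 2]})
up = rec.upper()
```

The original record is left untouched, which is what makes the generator expression in the report above safe to use while streaming through a file.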
From bugzilla-daemon at portal.open-bio.org Thu Apr 8 21:57:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 17:57:24 -0400 Subject: [Biopython-dev] [Bug 3054] Add upper and lower methods to the SeqRecord In-Reply-To: Message-ID: <201004082157.o38LvOpZ011476@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3054 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 17:57 EST ------- Created an attachment (id=1477) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1477&action=view) Adds upper and lower methods to the SeqRecord -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 8 22:02:30 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:02:30 -0400 Subject: [Biopython-dev] [Bug 2822] Bio.Application.AbstractCommandline - properties and kwargs In-Reply-To: Message-ID: <201004082202.o38M2UjF011682@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2822 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:02 EST ------- Should have marked this as fixed a while ago... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Apr 8 22:04:04 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:04:04 -0400 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <201004082204.o38M44WB011716@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|REOPENED |RESOLVED Resolution| |WONTFIX ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:04 EST ------- Marking this as WONTFIX, since the problem output does appear to be a bug in the old "legacy" NCBI BLAST tool (and is fixed in the new NCBI BLAST+ tool). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 8 22:10:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:10:37 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004082210.o38MAbo7011867@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #6 from eric.talevich at gmail.com 2010-04-07 01:05 EST ------- (In reply to comment #0) > It would be nice if there were get/set properties for phyloXML objects that > were easier and more concise to use. Right now, to set, say, a phyloXML > property, one has to read the code to learn the names and arguments of the > Property class and also to learn that properties are added by appending to a > list. Yes, it's easier to tweak the class definitions if there's not much syntactic sugar to get in the way. 
This is still pretty new code ;) but of course I'm open to suggestions. > Besides the matter of convenience, there is also a question about how the > properties and taxonomies objects behave. I will take the matter up with the > phyloXML mailing list, but I believe that these objects should be > dictionary-like rather than list-like. That is, duplicate ref values should > not be allowed because the question of how to handle duplicates would have to > get pushed down to the user level and will be inconsistent. The Events class (clade.events attribute) mimics a dictionary. Have you used that yet? About clade.properties: If ordering of properties doesn't matter, 'ref' is guaranteed to be unique at a node, and it seems to be the right way to index the other associated data, then I can make clade.properties act like a dictionary. Can we confirm all of these? And for the implementation, can you provide a sketch of what you'd like the final structure to look like, and maybe a contrived doctest-like code example showing what you'd like to be able to do? In many cases, the phyloXML spec doesn't currently promise enough to make nice shortcuts work without the possibility of breaking in the future. For example, check out this new demo with *two* bootstrap values for every clade: http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml I was tempted to make confidences act like a dictionary indexed by support type, but clearly now that wouldn't have worked. A list of Confidence objects lets us stay faithful to the raw XML representation. 
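To make the point about multiple supports concrete, here is a small sketch. The Confidence namedtuple is a hypothetical stand-in for the PhyloXML class and the values are made up. Indexing by support type silently keeps only one of two bootstrap values, while a plain list keeps both in document order.

```python
from collections import namedtuple

# Hypothetical stand-in for a PhyloXML confidence element.
Confidence = namedtuple("Confidence", ["value", "type"])

# Two bootstrap supports on the same clade, as in the demo file mentioned above.
confidences = [Confidence(89.0, "bootstrap"), Confidence(71.0, "bootstrap")]

# Dictionary keyed by support type: the second entry overwrites the first.
by_type = dict((c.type, c) for c in confidences)

# List-based access keeps every value and preserves order.
bootstraps = [c.value for c in confidences if c.type == "bootstrap"]
```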
> def set_property(self, *propArgs, **propkwArgs): > for property in self.properties: > if property.ref == propArgs[1]: > property = PhyloXML.Property(propArgs) > return > self.properties.append(PhyloXML.Property(*propArgs, **propkwArgs)) > > def get_property(self, key): > for property in self.properties: > if property.ref == key: > return property.value > raise KeyError It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo" methods where a property would be overly magical, and something noteworthy is going on internally. Alignment objects have "add_sequence", and Phylogeny objects have "get_alignment". Would you use a Phylogeny method called add_alignment, taking something like a Phylip character matrix? We can figure out a sugared interface for clade.properties once we know how which of the requirements stated above will actually be guaranteed. > def set_ID(self, *idArgs, **idkwArgs): > self.node_id = PhyloXML.Id(*idArgs, **idkwArgs) If you do "from Bio.Phylo import PhyloXML as PX" it really doesn't save any typing, and the **kwargs magic is even less suitable for introspection. It's not possible to take advantage of all the PhyloXML annotations available without learning about the annotation classes the ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:10 EST ------- (In reply to comment #6) > > It's possible that Bio.Phylo will pick up the convention of "add_foo/get_foo" > methods where a property would be overly magical, and something noteworthy is > going on internally. Alignment objects have "add_sequence", and Phylogeny > objects have "get_alignment". Would you use a Phylogeny method called > add_alignment, taking something like a Phylip character matrix? > Note that while the "old" Alignment object has an add_sequence method, it is now tagged as obsolete with the "new" Alignment object in Biopython 1.54 (instead you append a SeqRecord). 
Regarding PhyloXML, would it fit to rename "get_alignment" as "to_alignment"? That is a fairly common naming convention. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 8 22:14:24 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 8 Apr 2010 18:14:24 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004082214.o38MEOvt011951@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-08 18:14 EST ------- (In reply to comment #6) > In many cases, the phyloXML spec doesn't currently promise enough to make nice > shortcuts work without the possibility of breaking in the future. For example, > check out this new demo with *two* bootstrap values for every clade: > http://www.phylosoft.org/archaeopteryx/examples/data/multiple_supports.xml I've actually done something like that a few years back, using bootstrap values from two different tree building tools (NJ and ML I think). I had to do this by loading two Newick files of the same tree with different bootstraps - quite messy! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From eric.talevich at gmail.com Fri Apr 9 06:09:14 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 9 Apr 2010 02:09:14 -0400 Subject: [Biopython-dev] test_PhyloXML.py error on Python 2.4 In-Reply-To: References: Message-ID: On Thu, Apr 8, 2010 at 9:23 AM, Peter wrote: > On Thu, Apr 8, 2010 at 2:05 PM, Eric Talevich > wrote: > > On Thu, Apr 8, 2010 at 6:55 AM, Peter > > wrote: > >> > >> Hi Eric, > >> > >> I noticed that test_PhyloXML.py is failing on Python 2.4, it should > >> be skipped since I don't have ElementTree installed. Have you got > >> access to a Python 2.4 installation to look at this? > > > > The traceback says the PhyloXML module is missing, but PhyloXMLIO and the > > rest of Bio.Phylo are there. Is that normal? I would expect that PhyloXML > > would still be installed with Biopython on Py2.4, but when the test runs > it > > would trigger a MissingExternalDependency error for ElementTree when > > importing PhyloXMLIO, and run_tests.py would then skip it. > > Yes, and I don't understand why that doesn't happen in the test suite: > > C:\>c:\python24\python > Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on > win32 > Type "help", "copyright", "credits" or "license" for more information. > >>> from Bio.Phylo import PhyloXML as PX > Traceback (most recent call last): > File "", line 1, in ? > File "c:\python24\lib\site-packages\Bio\Phylo\__init__.py", line 12, in ? > from Bio.Phylo._io import parse, read, write, convert > File "c:\python24\Lib\site-packages\Bio\Phylo\_io.py", line 15, in ? > import PhyloXMLIO > File "c:\python24\Lib\site-packages\Bio\Phylo\PhyloXMLIO.py", line 42, in > ? > raise MissingExternalDependencyError( > Bio.MissingExternalDependencyError: No ElementTree module was found. Use > Python > 2.5+, lxml or elementtree if you want to use Bio.PhyloXML. > >>> > > Odd. 
> > Peter > Well, it's fixed in GitHub now: http://github.com/biopython/biopython/commit/7bd18aaf9582dd7d9193cc39d7faf6d51e3e4161 It seems like imports are being cached in some way so that an import that failed once is not tried again. In any case, it still raises an ImportError which we can catch and turn into another MissingExternalDependencyError. -Eric From eric.talevich at gmail.com Fri Apr 9 06:19:17 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 9 Apr 2010 02:19:17 -0400 Subject: [Biopython-dev] PhyloXML.BranchColor methods In-Reply-To: References: Message-ID: On Wed, Apr 7, 2010 at 5:31 PM, Peter wrote: > On Wed, Apr 7, 2010 at 10:06 PM, Eric Talevich > wrote: > > On Wed, Apr 7, 2010 at 4:29 PM, Peter >wrote: > >> > >> I was suggesting adding a property to the clade (which could for > >> example map color names or RBG triples to the BranchColor > >> objects automatically). It would still be: > >> > >> from Bio import Phylo > >> from Bio.Phylo import PhyloXML as PX > >> tree = Phylo.read(..., 'phyloxml') > >> critters = tree.find(name='Rattus') > >> critters.color = PX.BranchColor(0, 128, 0) > >> > >> BUT, you could choose to allow: > >> > >> critters.color = (0, 128, 0) > >> > >> Or a named color, > >> > >> critters.color = "green" > >> > >> Or a hex string. > >> > >> critters.color = "#008000" > >> > >> and have the property set method convert these into > >> the same result, BranchColor(0, 128, 0). > >> > > > > It's pretty magical, but the convenience of "critters.color = 'green'" > wins. > > I'll implement the property to accept a BranchColor, RGB triple, color > name, > > or hex string, and raise a ValueError otherwise. > > Yeah - it feels right to me ;) > I implemented this property; it's in GitHub now. >> Hmm, to_hex seems OK to me. > >> > > > > My only concern: the builtin hex() returns a string formatted a little > > differently. Matching that format would be useless here, but I was > worried > > about people being confused. 
But if you're OK with to/from_hex, then I am > > too. > I left these as from_hex and to_hex. The docstrings are clear enough about what the methods do, I think. > > 3. Drop the to_rgb method; it's confusing and floating-point > >> > conversions lead to bugs. > >> > >> I had assumed to_rgb would give a tuple of ints in the range > >> 0 to 255 (following HTML/CSS color conventions). That would > >> avoid the rounding issue. > At some point I changed to_rgb to return a tuple as you'd expect, without rescaling. It's basically the constructor in reverse now. Cheers, Eric From eric.talevich at gmail.com Fri Apr 9 06:39:27 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 9 Apr 2010 02:39:27 -0400 Subject: [Biopython-dev] String representation of trees in Bio.Phylo In-Reply-To: <20100404171903.GF19540@kunkel> References: <20100404171903.GF19540@kunkel> Message-ID: Hi Brad et al., I guess I can turn this into a specific proposal now, and if no one objects, just do it: On Sun, Apr 4, 2010 at 1:19 PM, Brad Chapman wrote: > Hi Eric; > > > The new phylogenetics module Bio.Phylo supports a few new ways of > displaying > > trees. I'm trying to decide which of these should be used as the informal > > string representation for whole trees, i.e. what happens when you type > > "print tree" for some newly parsed tree object. > [...[ > > The pretty_print function, with the show_all option, uses 'repr' > recursively > > to display the tree's nodes. I think this is probably the best choice for > > Tree.__str__, but it can be a bit cluttered if a lot of information is > > attached to each node/subtree/clade. > > > > >>> Phylo.pretty_print(tree, show_all=True) > > Phylogeny(rooted='True', description='phyloXML allows to use either a > > "branch_length" attribute...', name='example from Prof. 
Joe Felsenstein's > > book "Inferring Phyl...') > > Clade() > > Clade(branch_length='0.06') > > Clade(branch_length='0.102', name='A') > > Clade(branch_length='0.23', name='B') > > Clade(branch_length='0.4', name='C') > > I like this one. Agreed that it could get ugly, but I think it shows > the structure and associated information well. > Action items: 1. Move the pretty_print(show_all=True) code into Tree.__str__, leaving __repr__ as it is, since pretty_print relies on it. 2. Remove the pretty_print function from Bio.Phylo._utils, dropping the show_all=False functionality altogether since Tree.__str__ is more informative and draw_ascii is prettier. Wrinkle: Making Subtrees work the same way would break a few things -- I've been treating str(clade) as something that generates a short, useful label for the node, like clade.name if it's available. For example, draw_ascii uses this to get taxon labels. This means "print tree" shows the whole recursive object tree, while "print tree.root" shows a label for the node, which is just the class name ("Clade") if no name is set. Are we OK with this? Thanks, Eric > As an alternative, we could print the tree as ASCII art, as some other > > toolkits do. However, this function is very limited -- it doesn't print > > internal node labels, and trees of more than a couple hundred nodes will > > look strange, since the drawing is compressed into a fixed number of > > character columns (default 80). > > > > >>> Phylo.draw_ascii(tree) > > __________________ A > > __________| > > _| |___________________________________________ B > > | > > > |___________________________________________________________________________ > C > > This is a good idea. I think this is more useful than the current > pretty_print without show_all for getting a quick overview of the > tree.
> > Brad > From bugzilla-daemon at portal.open-bio.org Fri Apr 9 17:18:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 9 Apr 2010 13:18:49 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004091718.o39HInW2015975@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 ------- Comment #9 from eric.talevich at gmail.com 2010-04-09 13:18 EST ------- (In reply to comment #7, Peter) > Regarding PhyloXML, would it fit to rename "get_alignment" as "to_alignment"? > That is a fairly common naming convention.
That's done in GitHub now: http://github.com/biopython/biopython/commit/22cd408c4433434472d12a5959ecc9d347c03660 (In reply to comment #6, myself) > > Would you use a Phylogeny method called > > add_alignment, taking something like a Phylip character matrix? I still think an "add_alignment" method on Phylogeny would be useful, but we can start with a cookbook on the wiki until we're confident in the right way to do it. (In reply to comment #5, Joel) > There have long been built-in get/sets in python via __setattr__() and > __getattribute__(). That's where the code I sent should live. Putting code > to get and especially to set in those methods means that a user doesn't have > to look up whatever classes were defined for attributes (e.g. like finding > that 'color' is called 'BranchColor') and doesn't need to know that > taxonomies and properties are only set through appending through lists. On the mailing list we decided that branch color was a special case for allowing a shortcut, because RGB triples and 24-bit hex strings are well-established ways to represent color codes. Branch width is already just a float, so no problem there. The rest of the attributes are special PhyloXML classes, and I think the right solution there is for me to write documentation and for users to read it. But the behavior of taxonomies, properties and the other sometimes-plural attributes should be fixed: they already support singular getters for one-element lists, so there should be corresponding setters that (a) put one element in the list if it's empty (b) replace the element in the list if there's only one (c) raise an exception if there are already multiple elements in the list I'm leaving this bug open for that. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
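The setter rules (a) to (c) listed in comment #9 above could be sketched roughly as follows. MiniClade is a hypothetical stand-in, not the Bio.Phylo class; the choice of ValueError and the getter's behaviour for empty or multi-element lists are assumptions, since the comment only says "raise an exception".

```python
class MiniClade(object):
    """Hypothetical stand-in for a clade with a plural 'taxonomies' list."""

    def __init__(self):
        self.taxonomies = []

    def _get_taxonomy(self):
        # Assumed getter behaviour: None when empty, error when ambiguous.
        if len(self.taxonomies) == 0:
            return None
        if len(self.taxonomies) > 1:
            raise ValueError("more than one taxonomy; use the taxonomies list")
        return self.taxonomies[0]

    def _set_taxonomy(self, value):
        if len(self.taxonomies) == 0:
            # (a) put one element in the list if it is empty
            self.taxonomies.append(value)
        elif len(self.taxonomies) == 1:
            # (b) replace the element if there is only one
            self.taxonomies[0] = value
        else:
            # (c) refuse to guess which of several elements to replace
            raise ValueError("more than one taxonomy; use the taxonomies list")

    taxonomy = property(_get_taxonomy, _set_taxonomy)
```

With this shape the plural list stays the single source of truth, and the singular attribute is only a convenience for the common one-element case.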
From bugzilla-daemon at portal.open-bio.org Sat Apr 10 04:10:39 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Apr 2010 00:10:39 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004100410.o3A4AdKV032271@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 ------- Comment #5 from eric.talevich at gmail.com 2010-04-10 00:10 EST ------- (In reply to comment #4, myself) > (In reply to comment #0, Joel) > > (1) internal nodes, terminal nodes, and all nodes are not currently > > on an equal footing with respect to methods > > We could also have 'get_nonterminal' and 'get_all_clades' -- I'm not so sure > that the last one is useful enough to justify cluttering the API further; what > do you think? (I actually balked at add get_terminals() originally, since it's > so simple.) I added get_nonterminals() to TreeMixin: http://github.com/biopython/biopython/commit/de024f7d700a8ce83a64bc9f8cfd6273cefe95bc Do we need a get_all_clades method? Is that a good name? > > Here I give some convenience methods that I wish were defined in > > TreeMixin. I have tested them as standalone methods. I hope you'll > > see fit to include them at some point. > > > > def count_internals(self): > > """Counts the number of non-terminal (internal) nodes within this tree.""" > > return [i for i,e in enumerate_internals(self)][-1] + 1 > > I can add a convenience function that would help: > > def iterlen(items): > for i, x in enumerate(items): > count = i > return count + 1 > > Then count_internals(tree) is the same as: > iterlen(tree.find_clades(terminal=False)) > > Or, if we add get_nonterminals() it's easy: > len(tree.get_nonterminals()) Both of these can be done now, but len(tree.get_nonterminals()) is easiest. 
iterlen() is hidden in _sugar.py for now: http://github.com/biopython/biopython/commit/c8ce7f7b0314b54084b62759b1f82488374cae28 > > Less critical but still useful are the following two methods (and one private > > utility) that I find useful for operations on trees: > > > > def is_semipreterminal(self): > > """True if any direct descendent is terminal.""" > > if self.root.is_terminal(): > > return False > > for clade in self.clades: > > if clade.is_terminal(): > > return True > > return False > > Is semipreterminal a standard name for nodes like this? > > In Python 2.5 and later, you could also do: > any(clade.is_terminal() for clade in self) > > > > def terminal_neighbor_dists(self): > > """Return a list of distances between adjacent terminals""" > > return [self.distance(*i) for i in > > _generate_pairs(self.find_clades(terminal=True))] > > > > def _generate_pairs(self): > > import itertools > > pairs = itertools.tee(self) > > pairs[1].next() > > return itertools.izip(pairs[0], pairs[1]) I'll add these to the wiki as cookbook entries. One more thing -- should we rename the find_all and find_clades methods? I'm leaving this bug open as a reminder to decide that (and the get_all_clades question above) before the 1.54 release. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
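For reference, the helpers discussed in this comment can be sketched as standalone functions on plain iterables. This is a hedged sketch, not the code in `_sugar.py` or on the wiki: it is written for current Python (so `next()` and `zip` replace the `.next()` and `itertools.izip` used in the thread), `iterlen` here returns 0 for an empty iterable, and `has_terminal_child` is a hypothetical name.

```python
import itertools


def iterlen(items):
    """Count the items in any iterable, including generators."""
    count = -1  # stays -1 if the loop body never runs
    for count, _ in enumerate(items):
        pass
    return count + 1


def generate_pairs(items):
    """Return an iterator of overlapping pairs: (a, b), (b, c), (c, d), ..."""
    first, second = itertools.tee(items)
    next(second, None)  # advance the second copy by one item
    return zip(first, second)


def has_terminal_child(children, is_terminal):
    """True if any direct child satisfies the is_terminal predicate."""
    return any(is_terminal(child) for child in children)
```

Because all three only consume an iterable once, they can be fed directly from a generator such as the result of `find_clades()`.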
From bugzilla-daemon at portal.open-bio.org Sat Apr 10 04:13:47 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 10 Apr 2010 00:13:47 -0400 Subject: [Biopython-dev] [Bug 3046] PhyloXML, please define get/set methods In-Reply-To: Message-ID: <201004100413.o3A4Dlg7032456@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3046 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from eric.talevich at gmail.com 2010-04-10 00:13 EST ------- (In reply to comment #9) > [T]he behavior of taxonomies, properties and the other sometimes-plural > attributes should be fixed: they already support singular getters for > one-element lists, so there should be corresponding setters that > (a) put one element in the list if it's empty > (b) replace the element in the list if there's only one > (c) raise an exception if there are already multiple elements in the list > > I'm leaving this bug open for that. > Done, and then some: http://github.com/biopython/biopython/commit/a1d4a1be469c6d06fcb093073dff0679b7ec5257 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sat Apr 10 20:33:57 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 10 Apr 2010 21:33:57 +0100 Subject: [Biopython-dev] [Biopython] Bio.Application now subprocess? In-Reply-To: <1101855478758905131@unknownmsgid> References: <1101855478758905131@unknownmsgid> Message-ID: On Sat, Apr 10, 2010 at 8:27 PM, Vincent Davis wrote: > > So that was/is my plan to use it to writes command lone tools for the > affymetrix apt dev commandline app. unless this is redundant in a way > I am not aware of. 
> Thanks Ah - right, now this makes sense. Are you on the dev mailing list (CC'd)? That would be a better place to ask. I'd start by looking at Bio.Align.Applications (less subclasses there) as a model. Peter From vincent at vincentdavis.net Sun Apr 11 05:01:49 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Sat, 10 Apr 2010 23:01:49 -0600 Subject: [Biopython-dev] _Switch and _Option questions Message-ID: I am just getting started with class AbstractCommandline(object): The first set of questions is more about questions I had after reading the documentation. First questions/comment mostly about documentation: It appears that not all attributes need to be set in _Options and _Switch, for example _Switch: o is_set -- if the parameter has been set, I don't see and example of this being specified in a _Switch self.parameter statement. I see that it defaults to False, Is there a case that this is set in a self.parameters statement? I think I understand this. It is just documented as though it should be specified. If I don't need a checker_function in _Options then I must use an None, I guess as a place holder? Not clear if equate should be used as "equate" or 0,1, by this I mean it is not documented well same goes for is_required Looks like Value is similar to is_set in that it is not ever specified in a self.Parameters statement I don't think the example using Bio.Emboss.Applications import WaterCommandline includes the use of a _switch --------- Second set of questions For both of the next questions I am mostly asking if the feature/functionality is part of the class AbstractCommandline(object): If two _Switches are mutually exclusive in there use is there a way to make sure that they not both specified? Example anywhere? Basically same question for _Option, How do I refer to the value of another _option. 
Thanks *Vincent Davis 720-301-3003 * vincent at vincentdavis.net From biopython at maubp.freeserve.co.uk Sun Apr 11 10:11:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 11 Apr 2010 11:11:37 +0100 Subject: [Biopython-dev] _Switch and _Option questions In-Reply-To: References: Message-ID: On Sun, Apr 11, 2010 at 6:01 AM, Vincent Davis wrote: > I am just getting started with class AbstractCommandline(object): The first > set of questions is more about questions I had after reading the > documentation. Keep in mind that the documentation there is more aimed at the end user, rather than a developer writing a new command line wrapper. There are also some "historical" bits which we are still phasing out (like the deprecated ApplicationResult class) which add confusion. > First questions/comment mostly about documentation: > It appears that not all attributes need to be set in _Options and _Switch, > for example > _Switch: o is_set -- if the parameter has been set, I don't see and > example of this being specified in a _Switch self.parameter statement. I > see that it defaults to False, Is there a case that this is set in a > self.parameters statement? I think I understand this. It is just documented > as though it should be specified. Switches are either true or false, meaning either appended to the command line string or not. This boolean is held in the is_set parameter. They don't take values (see _Option for that). > If I don't need a checker_function in _Options then I must use an None, I > guess as a place holder? I think so from memory. > Not clear if equate should be used as "equate" or 0,1, by this I mean it is > not documented well > same goes for is_required They should be interpreted as booleans, so for new code using True and False is clearer, but 1 and 0 are also fine (and used in a lot of the older code).
> Looks like Value is similar to is_set in that it is not ever specified in a > self.Parameters statement No, the user specifies the value if they want to use that option. > I don't think the example using Bio.Emboss.Applications import > WaterCommandline includes the use of a _switch They do - all the EMBOSS wrapps have common switches like auto, stdout etc defined in the base class. > --------- > Second set of questions > For both of the next questions I am mostly asking if the > feature/functionality is part of the class AbstractCommandline(object): > > If two _Switches are mutually exclusive in there use is there a way to make > sure that they not both specified? Example anywhere? This isn't supported in Bio.Application explicitly, but can be done as in Bio.Blast.Applications (see the _validate methods). Do you really need to do this? You could just leave it to the user. > Basically same question for _Option, How do I refer to the value of another > _option. Just like an end user would, via the property it defines. See the Bio.Blast.Applications examples. Peter From eric.talevich at gmail.com Mon Apr 12 15:33:37 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 12 Apr 2010 11:33:37 -0400 Subject: [Biopython-dev] Another contributor for v1.54 Message-ID: Hello, I remembered one more contributor who I think should be mentioned with the Biopython 1.54 release: Diana Jaunzeikare, who worked on phyloXML support in BioRuby last summer parallel to my project, wrote a test file called made_up.xml which we're using in the Bio.Phylo test suite. http://github.com/biopython/biopython/commit/1a40c886757a7266ac8a0a74a31ca19e30f5bf5b (I checked with her and she's happy to be listed.) 
Thanks, Eric From biopython at maubp.freeserve.co.uk Mon Apr 12 15:37:02 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 12 Apr 2010 16:37:02 +0100 Subject: [Biopython-dev] Another contributor for v1.54 In-Reply-To: References: Message-ID: On Mon, Apr 12, 2010 at 4:33 PM, Eric Talevich wrote: > Hello, > > I remembered one more contributor who I think should be mentioned with > the Biopython 1.54 release: Diana Jaunzeikare, who worked on phyloXML > support in BioRuby last summer parallel to my project, wrote a test > file called made_up.xml which we're using in the Bio.Phylo test suite. > http://github.com/biopython/biopython/commit/1a40c886757a7266ac8a0a74a31ca19e30f5bf5b > > (I checked with her and she's happy to be listed.) > > Thanks, > Eric Sure - a good test case is certainly a contribution worth crediting. Please add her to the NEWS file retrospectively, and the CONTRIB file. Peter From bugzilla-daemon at portal.open-bio.org Tue Apr 13 14:59:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 13 Apr 2010 10:59:20 -0400 Subject: [Biopython-dev] [Bug 3057] New: Incremental parsing in Bio.Emboss.PrimerSearch Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3057 Summary: Incremental parsing in Bio.Emboss.PrimerSearch Product: Biopython Version: 1.54b Platform: PC OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk The Bio.Emboss.PrimerSearch module has a single function "read" which loads and parses an entire output file from the EMBOSS tool primersearch into memory at once, returning what is essentially a dictionary keyed by primer name, with as values lists of amplimer information objects. 
Even though this still seems to work with "large" output files for thousands of primer pairs, I think it would be useful to provide an iterator function "parse" returning the amplimers for each primer. The current "read" function could be retained for backward compatibility. The parsing code itself could be extended to extract information like the forward and reverse primer sequences, where they hit (location and strand), and with how many mismatches. This information is currently all held in a long string. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From updates at feedmyinbox.com Wed Apr 14 06:12:50 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 14 Apr 2010 02:12:50 -0400 Subject: [Biopython-dev] 4/14 BioStar - Biopython Questions Message-ID: ================================================== 1. extracting a subset of sequences from a FASTQ file (BioPython speed) ================================================== April 13, 2010 at 8:09 AM Initially my problem was to extract all entries from a FASTQ file with names not present in a FASTA file. Using biopython I wrote:

    from Bio.SeqIO.QualityIO import FastqGeneralIterator

    corrected_fn = "my_input_fasta.fas"
    uncorrected_fn = "my_input_fastq.ftq"
    output_fn = "differences_fastq.ftq"

    corrected_names = []
    for line in open(corrected_fn):
        if line[0] == ">":
            read_name = line.split()[0][1:]
            corrected_names.append(read_name)

    input_fastq_fn = uncorrected_fn
    corrected_names.sort()
    handle = open(output_fn, "w")
    for title, seq, qual in FastqGeneralIterator(open(input_fastq_fn)) :
        if title not in corrected_names:
            handle.write("@%s\n%s\n+\n%s\n" % (title, seq, qual))
    handle.close()

Problem is, it is very slow.
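A likely contributor to the slowness is the `title not in corrected_names` test: membership in a Python list is a linear scan, so every FASTQ record is compared against up to the whole list of names, whereas a set lookup is a hash lookup. The sketch below is one possible rearrangement under assumptions, not a tested replacement for the script above: the function names are hypothetical, the names are assumed to fit in memory, records are `(title, seq, qual)` tuples of the kind `FastqGeneralIterator` yields, and comparing on the first word of the title is an assumption about how the two files name their reads.

```python
def read_fasta_names(lines):
    """Collect the first word of every FASTA header line into a set."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split(None, 1)
            if parts:  # skip a bare ">" line with no name
                names.add(parts[0])
    return names


def filter_fastq(records, exclude_names):
    """Yield FASTQ text for records whose read name is not in exclude_names."""
    for title, seq, qual in records:
        parts = title.split(None, 1)
        name = parts[0] if parts else ""
        if name not in exclude_names:  # set membership: hash lookup, not a scan
            yield "@%s\n%s\n+\n%s\n" % (title, seq, qual)
```

In the script above, `read_fasta_names(open(corrected_fn))` would replace the list-building loop, and each string produced by `filter_fastq(FastqGeneralIterator(open(input_fastq_fn)), names)` would be written to the output handle.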
On 2Ghz workstation starting from a local disc it can take two days per pair of files: 4870868 seqs in FASTQ 4299464 seqs in FASTA Removing title from corrected_names speeds up things a bit (this version I used for running). Am I doing something obviously silly or simply FastqGeneralIterator is not a best construct to use here? While I like Python best, I am open to answers in Perl/Ruby. Slicing and dicing FASTQ files based on lists seems to be fairly common task. Edit: Python 2.6.4, biopython 1.53, Linux Fedora 8. Edit 2: corrected one line of code, see comment to giovanni code snippet taken from: http://news.open-bio.org/news/2009/09/biopython-fast-fastq/ http://biostar.stackexchange.com/questions/671/extracting-a-subset-of-sequences-from-a-fastq-file-biopython-speed -------------------------------------------------- =========================================================== Source: http://biostar.stackexchange.com/questions/tagged/biopython This email was sent to biopython-dev at lists.open-bio.org. Account Login: https://www.feedmyinbox.com/members/login/ Don't want to receive this feed any longer? Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/311791/6ca55937c6ac7ef56420a858404addee7b17d3e7/ ----------------------------------------------------------- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From p.j.a.cock at googlemail.com Wed Apr 14 14:36:31 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 14 Apr 2010 15:36:31 +0100 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? Message-ID: Hi team, Would any of you be interested in presenting a talk or tutorial on Biopython at the SciPy 2010 conference in Austin, Texas? 
http://conference.scipy.org/scipy2010/index.html This is shortly before BOSC/ISMB 2010 kicks off in Boston (I'm wondering if I can attend both from the UK - it would be a busy 2 and a half week trip!): http://www.open-bio.org/wiki/BOSC_2010 Peter ---------- Forwarded message ---------- From: Glen Otero Date: Wed, Apr 14, 2010 at 2:04 PM Subject: Re: [bip] SciPy 2010 To: Peter Cock Hi Peter- It would be great if someone from BioPython could come and present. People are suggesting tutorial topics and voting on them here: http://conference.scipy.org/scipy2010/tutorialsUV.html. Please submit BioPython as a tutorial topic if you get the chance. If a tutorial is selected, the presenter will receive $1000-$1500 that they can put towards travel and registration. Hope to see the BioPython project represented at SciPy this year! Best, Glen On Apr 14, 2010, at 2:26 AM, Peter Cock wrote: Hi Glen, SciPy 2010 sounds great - we might be able to find someone from Biopython to come and present, maybe even offer a tutorial. Would this be suitable? I'm tempted to volunteer myself but would need funding to attend (from the UK). http://biopython.org/ Peter On Sun, Apr 11, 2010 at 5:48 AM, Glen Otero wrote: Hello folks- SciPy 2010 is rapidly approaching and will be held in Austin, TX this year (http://conference.scipy.org/scipy2010/index.html). I'm chairing the bioinformatics/biomedical track (http://conference.scipy.org/scipy2010/papers.html) and welcome any presentation suggestions from list members. Hope to see you there! Thanks! Glen _______________________________________________ biology-in-python mailing list - bip at lists.idyll.org. See http://bio.scipy.org/ for our Wiki. From matzke at berkeley.edu Thu Apr 15 08:35:59 2010 From: matzke at berkeley.edu (Nick Matzke) Date: Thu, 15 Apr 2010 01:35:59 -0700 Subject: [Biopython-dev] Biopython devs at iEvoBio?
In-Reply-To: <4BBC189F.1030401@student.otago.ac.nz> References: <20100406135438.95193kse6z41o5su@www.studentmail.otago.ac.nz> <4BBBD5D9.60504@berkeley.edu> <4BBC189F.1030401@student.otago.ac.nz> Message-ID: <4BC6CFEF.40604@berkeley.edu> Hi all, Thanks for the invite David, I just registered for a lightning talk at the iEvoBio conference site: ========== Lightning Talk: Biopython (bio) Geography Module Nicholas J. Matzke matzke at berkeley.edu Department of Integrative Biology University of California, Berkeley Abstract: For Google Summer of Code 2009/NESCENT Phyloinformatics Summer of Code 2009, I built a Geography module for Biopython. The purpose of the module is to search, download, and process biogeographical data from GBIF, much as Biopython currently accesses Genbank. Application of the tool to a historical biogeography study on bivalves will be illustrated. ========== See everyone there! PS I have family in Portland and used to work there so if anyone needs bar suggestions I might be able to help... Cheers, Nick david winter wrote: > Ok, > > So Nick will be there, Eric hopes not to be ;) > > I sent an email to the organizing committee about the different talk > categories. Based on their reply, and the fact that both Nick and I are > going to be focused on our talks at the Evolution Meetings (ie, flying > halfway around the world to present a 12min talk) it seems the best way > to go would be to have a lightning talk on each of the GSoC projects. > > Eric, presuming you go to BOSC and not iEvoBio I'll get in touch with > you at some stage with an outline of a talk and you can help me whip it > into shape. > > Cheers, > David > > On 4/7/2010 12:46 PM, Nick Matzke wrote: >> I'll be at the Evo meeting and the iEvoBio meeting. I'd be happy to >> give a talk as long as it's short -- I have to prioritize my research >> talk at the main meeting! >> >> Cheers!
>> Nick >> >> >> David Winter wrote: >>> Hi again guys, >>> >>> I was wondering if anyone else is planning to go to iEvoBio >>> (http://ievobio.org/) in Portland in June.. > > -- ==================================================== Nicholas J. Matzke Ph.D. Candidate, Graduate Student Researcher Huelsenbeck Lab Center for Theoretical Evolutionary Genomics 4151 VLSB (Valley Life Sciences Building) Department of Integrative Biology University of California, Berkeley Graduate Student Instructor, IB200A Principles of Phylogenetics: Systematics http://ib.berkeley.edu/courses/ib200a/index.shtml Lab websites: http://ib.berkeley.edu/people/lab_detail.php?lab=54 http://fisher.berkeley.edu/cteg/hlab.html Dept. personal page: http://ib.berkeley.edu/people/students/person_detail.php?person=370 Lab personal page: http://fisher.berkeley.edu/cteg/members/matzke.html Lab phone: 510-643-6299 Dept. fax: 510-643-6264 Cell phone: 510-301-0179 Email: matzke at berkeley.edu Mailing address: Department of Integrative Biology 3060 VLSB #3140 Berkeley, CA 94720-3140 ----------------------------------------------------- "[W]hen people thought the earth was flat, they were wrong. When people thought the earth was spherical, they were wrong. But if you think that thinking the earth is spherical is just as wrong as thinking the earth is flat, then your view is wronger than both of them put together." Isaac Asimov (1989). "The Relativity of Wrong." The Skeptical Inquirer, 14(1), 35-44. Fall 1989. http://chem.tufts.edu/AnswersInScience/RelativityofWrong.htm ==================================================== From chapmanb at 50mail.com Thu Apr 15 13:43:01 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 15 Apr 2010 09:43:01 -0400 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? 
In-Reply-To: References: Message-ID: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Peter; > Would any of you be interested in presenting a talk or tutorial on > Biopython at the SciPy 2010 conference in Austin, Texas? > http://conference.scipy.org/scipy2010/index.html > > This is quite close before BOSC/ISMB 2010 kicks off in Boston > (I'm wondering if I can attend both from the UK - it would be a > busy 2 and a half week trip!): > http://www.open-bio.org/wiki/BOSC_2010 I wanted to go to SciPy this year but the timing is terrible for me with respect to BOSC. It would be really nice to have a representative there if you or anyone else is keen. Connecting with that community would be really useful since their interests definitely align; check out the top tutorial ideas: - Distributed and multi-core computing - Large data set handling - Building web-based tools Now I'm especially sad I can't make it. Hope the timing and location works for someone, Brad > ---------- Forwarded message ---------- > From: Glen Otero > Date: Wed, Apr 14, 2010 at 2:04 PM > Subject: Re: [bip] SciPy 2010 > To: Peter Cock > > > Hi Peter- > It would be great if someone from BioPython could come and present. > People are suggesting tutorial topics and voting on them > here: http://conference.scipy.org/scipy2010/tutorialsUV.html. Please > submit BioPython as a tutorial topic if you get the chance. If a > tutorial is selected, the presenter will receive $1000-$1500 that they > can put towards travel and registration. > Hope to see the BioPython project represented at SciPy this year! > Best, > Glen > On Apr 14, 2010, at 2:26 AM, Peter Cock wrote: > > Hi Glen, > > SciPy 2010 sounds great - we might be able to find someone from Biopython > to come and present, maybe even offer a tutorial. Would this be suitable? I'm > tempted to volunteer myself but would need funding to attend (from the UK).
> http://biopython.org/ > > Peter > > On Sun, Apr 11, 2010 at 5:48 AM, Glen Otero wrote: > > Hello folks- > > SciPy 2010 is rapidly approaching and will be held in Austin, TX this > year (http://conference.scipy.org/scipy2010/index.html). I'm chairing > the bioinformatics/biomedical track > (http://conference.scipy.org/scipy2010/papers.html) and welcome any > presentation suggestions from list members. > > Hope to see you there! > > Thanks! > > Glen > > _______________________________________________ > > biology-in-python mailing list - bip at lists.idyll.org. > > See http://bio.scipy.org/ for our Wiki. > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Thu Apr 15 15:03:02 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 15 Apr 2010 16:03:02 +0100 Subject: [Biopython-dev] Draft abstract for BOSC 2010 Biopython Project Update Message-ID: Hi all, I should have circulated this earlier, but here is a draft abstract for a "Biopython Project Update" talk at BOSC 2010, to be submitted *today*. http://www.open-bio.org/wiki/BOSC_2010 I'm hoping to attend BOSC again this year and give the talk, but haven't sorted out the finances - Brad has offered to present if I can't go, hence the talk author list. If anyone else wants to help with slides etc (or as a standby speaker) please let me know. This is based on the abstract from last year, included in this PDF: http://www.open-bio.org/w/images/c/c7/BOSC2009_program_20090601.pdf In the PDF version of the abstract I've made the logo smaller this time ;) Comments welcome, Thanks, Peter -- Biopython Project Update Peter Cock, Brad Chapman In this talk we present the current status of the Biopython project (www.biopython.org), described in a application note published last year (Cock et al., 2009). 
Biopython celebrated its 10th birthday last year, and has now been cited or referred to in over 150 scientific publications (a list is included on our website). At the end of 2009, following an extended evaluation period, Biopython successfully migrated from using CVS for source code control to using git, hosted on github.com. This has helped our existing developers to work on and test new features on publicly viewable branches before they are merged, and has also encouraged new contributors to work on additions or improvements. Currently about fifty people have their own Biopython repository on GitHub. In summer 2009 we had two Google Summer of Code (GSoC) project students working on phylogenetic code for Biopython in conjunction with the National Evolutionary Synthesis Center (NESCent). Eric Talevich's work on phylogenetic trees including phyloXML support (Han and Zmasek, 2009) was merged and included with Biopython 1.54, and he continues to be actively involved with Biopython. We hope to include Nick Matzke's module for biogeographical data from the Global Biodiversity Information Facility (GBIF) later this year. For summer 2010 we have Biopython-related GSoC projects submitted via both NESCent and the Open Bioinformatics Foundation (OBF), and hope to have students working on Biopython once again. Since BOSC 2009, Biopython has seen four releases. Biopython 1.51 (August 2009) was an important milestone in dropping support for Python 2.3 and our legacy parsing infrastructure (Martel/Mindy), but was most noteworthy for FASTQ support (Cock et al., 2010). Biopython 1.52 (September 2009) introduced indexing of most sequence file formats for random access, and made interconverting sequence and alignment files easier. Biopython 1.53 (December 2009) included wrappers for the new NCBI BLAST+ command line tools, and much improved support for running under Jython.
Our latest release is Biopython 1.54 (April/May 2010), new features include Bio.Phylo for phylogenetic trees (GSoC project), and support for Standard Flowgram Format (SFF) files used for 454 Life Sciences (Roche) sequencing. Biopython is free open source software available from www.biopython.org under the Biopython License Agreement (an MIT style license, http://www.biopython.org/DIST/LICENSE). References Cock, P.J.A., Antao, T., Chang, J.T., Chapman, B.A., Cox, C.J., Dalke, A., Friedberg, I., Hamelryck, T., Kauff, F., Wilczynski, B., de Hoon, M.J. (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25(11) 1422-3. doi:10.1093/bioinformatics/btp163 Han, M.V. and Zmasek, C.M. (2009) phyloXML: XML for evolutionary biology and comparative genomics. BMC Bioinformatics 10:356. doi:10.1186/1471-2105-10-356 Cock, P.J.A., Fields, C.J., Goto N., Heuer, M.L., and Rice, P.M. (2010) The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Res 38(6) 1767-71. doi:10.1093/nar/gkp1137 From p.j.a.cock at googlemail.com Fri Apr 16 13:55:35 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Apr 2010 14:55:35 +0100 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? In-Reply-To: <20100415134301.GM54921@sobchak.mgh.harvard.edu> References: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Message-ID: On Thu, Apr 15, 2010 at 2:43 PM, Brad Chapman wrote: > > I wanted to go to SciPy this year but the timing is terrible for me with > respect to BOSC. It would be really nice to have a representative there > if you or anyone else is keen. Connecting with that community would > be really useful since their interests definitely align; check out > the top tutorial ideas: > > - Distributed and multi-core computing > - Large data set handling > - Building web-based tools > > Now I'm especially sad I can't make it. 
Hope the timing and location > works for someone, > Brad > I've put up Biopython as a tutorial topic suggestion which people can vote on, and will look further at the logistics of attending. http://conference.scipy.org/scipy2010/tutorialsUV.html Also note the call for papers deadline is April 25 for the specialist tracks (we'd fall under Biomedical/bioinformatics). Peter From kanzure at gmail.com Fri Apr 16 14:42:09 2010 From: kanzure at gmail.com (Bryan Bishop) Date: Fri, 16 Apr 2010 09:42:09 -0500 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? In-Reply-To: References: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Message-ID: On Fri, Apr 16, 2010 at 8:55 AM, Peter Cock wrote: > I've put up Biopython as a tutorial topic suggestion which people can > vote on, and will look further at the logistics of attending. > http://conference.scipy.org/scipy2010/tutorialsUV.html I live in Austin, Texas and am presenting a few different python projects (like sympy, pythonOCC and okfn/datapkg). Many wonderful projects wouldn't otherwise have a presence at scipy2010.. and they certainly should! So, someone has to do it. - Bryan http://heybryan.org/ 1 512 203 0507 From p.j.a.cock at googlemail.com Fri Apr 16 14:51:58 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 16 Apr 2010 15:51:58 +0100 Subject: [Biopython-dev] Biopython at the SciPy 2010 conference in Texas? In-Reply-To: References: <20100415134301.GM54921@sobchak.mgh.harvard.edu> Message-ID: On Fri, Apr 16, 2010 at 3:42 PM, Bryan Bishop wrote: > On Fri, Apr 16, 2010 at 8:55 AM, Peter Cock wrote: >> I've put up Biopython as a tutorial topic suggestion which people can >> vote on, and will look further at the logistics of attending. >> http://conference.scipy.org/scipy2010/tutorialsUV.html > > I live in Austin, Texas and am presenting a few different python > projects (like sympy, pythonOCC and okfn/datapkg). Many wonderful > projects wouldn't otherwise have a presence at scipy2010.. and they
and they > certainly should! So, someone has to do it. > > - Bryan Excellent, and very public spirited of you :) Peter From bugzilla-daemon at portal.open-bio.org Fri Apr 16 21:13:19 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 16 Apr 2010 17:13:19 -0400 Subject: [Biopython-dev] [Bug 2951] PDBParser assigns model 0 to first model no matter what... In-Reply-To: Message-ID: <201004162113.o3GLDJW0005115@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2951 kamil at kamilkisiel.net changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kamil at kamilkisiel.net ------- Comment #3 from kamil at kamilkisiel.net 2010-04-16 17:13 EST ------- I don't really see the utility of having the id field be anything else other than the actual model ID as reported in the PDB file. Typically looping in Python isn't done based on using sequence indices. It's fine if the models are in indices 0 to n-1 in the child_list member of a structure, but I think their ID member should still reflect the actual model identifier. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Apr 16 21:28:28 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 16 Apr 2010 17:28:28 -0400 Subject: [Biopython-dev] [Bug 2950] Bio.PDBIO.save writes MODEL records without model id In-Reply-To: Message-ID: <201004162128.o3GLSS01005451@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2950 kamil at kamilkisiel.net changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |kamil at kamilkisiel.net From updates at feedmyinbox.com Sat Apr 17 06:12:26 2010 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sat, 17 Apr 2010 02:12:26 -0400 Subject: [Biopython-dev] 4/17 BioStar - Biopython Questions Message-ID: <085750679879c59b64a2f0e534328b64@74.63.51.88> ================================================== 1. Does Biopython parse blast -m8 or -m9 (aka blasttable)? ================================================== April 16, 2010 at 10:54 AM I am getting the following error when I try -m9 File "parseBlast.biopython.py", line 5, in ? blast_record = blast_parser.parse(result_handle) File "/src/biopython-1.52/build/lib.linux-x86_64-2.4/Bio/Blast/NCBIStandalone.py", line 763, in parse self._scanner.feed(handle, self._consumer) File "/src/biopython-1.52/build/lib.linux-x86_64-2.4/Bio/Blast/NCBIStandalone.py", line 96, in feed self._scan_header(uhandle, consumer) File "/src/biopython-1.52/build/lib.linux-x86_64-2.4/Bio/Blast/NCBIStandalone.py", line 213, in _scan_header raise ValueError("Invalid header?") ValueError: Invalid header?
http://biostar.stackexchange.com/questions/760/does-biopython-parse-blast-m8-or-m9-aka-blasttable -------------------------------------------------- =========================================================== Source: http://biostar.stackexchange.com/questions/tagged/biopython This email was sent to biopython-dev at lists.open-bio.org. Account Login: https://www.feedmyinbox.com/members/login/ Don't want to receive this feed any longer? Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/311791/6ca55937c6ac7ef56420a858404addee7b17d3e7/ ----------------------------------------------------------- This email was carefully delivered by FeedMyInbox.com. 230 Franklin Road Suite 814 Franklin, TN 37064 From eric.talevich at gmail.com Sat Apr 17 13:35:57 2010 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 17 Apr 2010 09:35:57 -0400 Subject: [Biopython-dev] Bio.Phylo: the home stretch Message-ID: Hi all, There are two more decisions in Bio.Phylo that I'd like to settle on before the release of Biopython 1.54. They're holding open Bug 3045: http://bugzilla.open-bio.org/show_bug.cgi?id=3045 1. *Do we need a get_all_clades() method on trees and clades?* Bio.Nexus has get_terminals(); I added the same to Bio.Phylo early on, and then get_nonterminals() to satisfy some demand for the opposite method: def get_terminals(self, order='preorder'): """Get a list of all of this tree's terminal (leaf) nodes.""" return list(self.find_clades(terminal=True, order=order)) def get_nonterminals(self, order='preorder'): """Get a list of all of this tree's nonterminal (internal) nodes.""" return list(self.find_clades(terminal=False, order=order)) They're both trivial, but the idea is to make the module easy to jump into without reading the docs first. (find_clades() is a generator function that several other functions use internally; to do useful things in Bio.Phylo you still need to learn how to use it eventually.) 
So (a) do we need yet another sugar function that retrieves all tree nodes, both internal and external? (b) if so, what should it be called? The implementation would be: list(self.find_clades(order=order)) Also accomplished as: tree.get_terminals() + tree.get_nonterminals() 2. *Rename find_clades() to find(), or something else?* I've previously renamed: find() => find_any() -- given the same parameters as find_clades(), return the first match found, or else None (useful in an if statement) find_all() => find_elements() -- phyloXML trees have some complex objects as tree attributes, containing other objects. This function searches for those directly, and for trees without such attributes (e.g. all Newick trees), this happens to be the same as find_clades() So: find_clades() can search inside complex objects attached to trees, but yields the corresponding clade object rather than the non-clade element itself. This lets you search clades by e.g. clade.taxonomy.scientific_name, or clade.sequence.type. It should be the first "find_*" function users reach for. Should we give it a shorter name to encourage that, and shorten the code that uses it? 
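For readers who want to see how the pieces above fit together, here is a minimal sketch using a toy Clade class. It is a stand-in written for this illustration and is not the Bio.Phylo implementation; only the method names and the terminal/order parameters are taken from the message above.

```python
class Clade:
    """Toy tree node; a stand-in for illustration, not Bio.Phylo's class."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def is_terminal(self):
        return not self.clades

    def find_clades(self, terminal=None, order="preorder"):
        """Lazily yield matching clades; terminal=None means no filter."""
        if order not in ("preorder", "postorder"):
            raise ValueError("unsupported order: %r" % order)
        matches = terminal is None or terminal == self.is_terminal()
        if order == "preorder" and matches:
            yield self
        for child in self.clades:
            for found in child.find_clades(terminal, order):
                yield found
        if order == "postorder" and matches:
            yield self

    def get_terminals(self, order="preorder"):
        return list(self.find_clades(terminal=True, order=order))

    def get_nonterminals(self, order="preorder"):
        return list(self.find_clades(terminal=False, order=order))


tree = Clade("root", [Clade("AB", [Clade("A"), Clade("B")]), Clade("C")])
all_names = [c.name for c in tree.find_clades()]
leaf_names = [c.name for c in tree.get_terminals()]
inner_names = [c.name for c in tree.get_nonterminals()]
```

In this toy version, get_terminals() + get_nonterminals() returns the same nodes as list(find_clades()), but not in the same order, which may matter when deciding whether a third convenience method is worth having.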
Here's a first crack at documentation: http://github.com/etal/biopython/commit/8056a198804a08e3e03ac943c45744ad020dd53f Thanks, Eric From bugzilla-daemon at portal.open-bio.org Mon Apr 19 16:52:52 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 Apr 2010 12:52:52 -0400 Subject: [Biopython-dev] [Bug 3059] New: PDBContructionException should be PDBConstructionException Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3059 Summary: PDBContructionException should be PDBConstructionException Product: Biopython Version: 1.54b Platform: PC URL: http://github.com/biopython/biopython/commit/dead6ab7704abc760d3bd13f09f8036d75e7516b OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: kamil at kamilkisiel.net I noticed that part of this code was fixed, but obviously nobody verified the fix at all because there is an obvious typo in the name of the Exception type. The name is "PDBConstructionException" (note the s, which is missing in the code...) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Apr 19 22:28:16 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 19 Apr 2010 18:28:16 -0400 Subject: [Biopython-dev] [Bug 3059] PDBContructionException should be PDBConstructionException In-Reply-To: Message-ID: <201004192228.o3JMSGe2005614@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3059 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-19 18:28 EST ------- Good point - I thought I'd rerun pylint and it was happy. Odd. I've fixed it properly and added a unit test for this now as well: http://github.com/biopython/biopython/commit/ed22f3ac17d910cf1956c2be1a9aec9f6e3125a4 Thanks. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 21 16:07:51 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 Apr 2010 12:07:51 -0400 Subject: [Biopython-dev] [Bug 3060] New: Add ungap method to the SeqRecord? Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3060 Summary: Add ungap method to the SeqRecord? Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Biopython 1.53 added an ungap method to the Seq object. This is a possible enhancement request to add a matching ungap method to the SeqRecord object, where the per-letter-annotation and features should be adjusted to match. 
My motivating example is to take an ACE file loaded with SeqIO, remove the gaps, and output the contigs as FASTQ or QUAL files. This requires the per-letter-annotation to be sliced to match the ungapped sequence. Likewise any features fully contained within ungapped regions should be retained and their co-ordinates shifted. I'm not sure if we should do anything about features spanning a gap - the simple option which I have implemented is they are lost. This is done via the existing SeqRecord slicing and addition code. Patch to follow... See also Bug 3054 for adding upper and lower methods to the SeqRecord, and the broader discussion on Bug 2351 about strings, Seq and SeqRecord objects. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 21 16:09:01 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 21 Apr 2010 12:09:01 -0400 Subject: [Biopython-dev] [Bug 3060] Add ungap method to the SeqRecord? In-Reply-To: Message-ID: <201004211609.o3LG91oZ025848@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3060 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-21 12:09 EST ------- Created an attachment (id=1482) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1482&action=view) Patch to Bio/SeqRecord.py to add ungap method This includes a basic doctest, and some debug checks (assert statements) which could be removed after more testing. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
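The idea described in this bug can be sketched without Biopython. The following is not the attached patch: ungap and shift_feature are hypothetical helpers that work on a plain string and a list of per-letter values, and the half-open [start, end) coordinates are an assumption made for this illustration.

```python
def ungap(seq, qualities, gap="-"):
    """Drop gap characters and the per-letter values at the same positions."""
    if len(seq) != len(qualities):
        raise ValueError("sequence and per-letter annotation lengths differ")
    kept = [(base, q) for base, q in zip(seq, qualities) if base != gap]
    new_seq = "".join(base for base, _ in kept)
    new_quals = [q for _, q in kept]
    return new_seq, new_quals


def shift_feature(seq, start, end, gap="-"):
    """Map a half-open [start, end) feature onto ungapped coordinates.

    Returns None when the feature spans a gap, mirroring the "simple
    option" of dropping such features.
    """
    if gap in seq[start:end]:
        return None
    offset = seq[:start].count(gap)  # gaps removed before the feature
    return start - offset, end - offset


gapped = "AC--GT-A"
quals = [30, 31, 0, 0, 32, 33, 0, 34]
new_seq, new_quals = ungap(gapped, quals)
kept_feature = shift_feature(gapped, 4, 6)   # covers "GT"
lost_feature = shift_feature(gapped, 1, 5)   # covers "C--G"
```

With this input the sequence and qualities stay the same length after ungapping, the feature over "GT" moves left by the two gaps before it, and the feature that crosses the gap is dropped.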
From bugzilla-daemon at portal.open-bio.org Thu Apr 22 13:56:48 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 09:56:48 -0400 Subject: [Biopython-dev] [Bug 3062] New: GenBank/EMBL parser breaks when features have no qualifiers Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3062 Summary: GenBank/EMBL parser breaks when features have no qualifiers Product: Biopython Version: 1.54b Platform: All OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu I am trying to use the EMBL parser to parse the IMGT/LIGM flatfile. Whenever there is a feature, the parser checks whether there are qualifiers in the feature with an assert statement, and does not allow features with no qualifiers. However, the EMBL specification does not require features to have qualifiers, and the IMGT flatfile is full of entries that have features with no qualifiers (only coordinates). The assertion error is tracked to an assert statement in Scanner.py at line 269. It appears that the assumption in the code is that there is an unquoted continuation of a feature qualifier, rather than a feature with no qualifiers. I am using biopython 1.51 that I built from source using python 2.5 (from an EPD install 4.3.0). I am on a Mac running OS X 10.5.8 (Leopard). Peter mentioned that the problem in the code is still present in the 1.54b release, and also in the repository. To reproduce the problem, the parser broke on the following record (the traceback is below as well): ID A03907 IMGT/LIGM annotation : keyword level; unassigned DNA; HUM; 412 BP. XX AC A03907; XX DT 11-MAR-1998 (Rel. 8, arrived in LIGM-DB ) DT 10-JUN-2008 (Rel. 200824-2, Last updated, Version 3) XX DE H.sapiens antibody D1.3 variable region protein ; DE unassigned DNA; rearranged configuration; Ig-Heavy; regular; group IGHV. 
XX KW antigen receptor; Immunoglobulin superfamily (IgSF); KW Immunoglobulin (IG); IG-Heavy; variable; diversity; joining; KW rearranged. XX OS Homo sapiens (human) OC cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; OC Bilateria; Coelomata; Deuterostomia; Chordata; Craniata; Vertebrata; OC Gnathostomata; Teleostomi; Euteleostomi; Sarcopterygii; Tetrapoda; OC Amniota; Mammalia; Theria; Eutheria; Euarchontoglires; Primates; OC Haplorrhini; Simiiformes; Catarrhini; Hominoidea; Hominidae; OC Homo/Pan/Gorilla group; Homo. XX RN [1] RP 1-412 RA ; RT "Recombinant antibodies and methods for their production."; RL Patent number EP0239400-A/10, 30-SEP-1987. RL MEDICAL RESEARCH COUNCIL. XX DR EMBL; A03907. XX FH Key Location/Qualifiers (from EMBL) FH FT source 1..412 FT /organism="Homo sapiens" FT /mol_type="unassigned DNA" FT /db_xref="taxon:9606" FT V_region 8..>412 FT /note="antibody D1.3 V region" FT sig_peptide 8..64 FT CDS 8..>412 FT /product="antibody D1.3 V region (VDJ)" FT /protein_id="CAA00308.1" FT /translation="MAVLALLFCLVTFPSCILSQVQLKESGPGLVAPSQSLSITCTVSG FT FSLTGYGVNWVRQPPGKGLEWLGMIWGDGNTDYNSALKSRLSISKDNSKSQVFLKMNSL FT HTDDTARYYCARERDYRLDYWGQGTTLTVSS" FT D_segment 356..371 FT J_segment 372..>412 FT /note="J(H)2 region" XX SQ Sequence 412 BP; 105 A; 109 C; 104 G; 94 T; 0 other; tcagagcatg gctgtcctgg cattactctt ctgcctggta acattcccaa gctgtatcct 60 ttcccaggtg cagctgaagg agtcaggacc tggcctggtg gcgccctcac agagcctgtc 120 catcacatgc accgtctcag ggttctcatt aaccggctat ggtgtaaact gggttcgcca 180 gcctccagga aagggtctgg agtggctggg aatgatttgg ggtgatggaa acacagacta 240 taattcagct ctcaaatcca gactgagcat cagcaaggac aactccaaga gccaagtttt 300 cttaaaaatg aacagtctgc acactgatga cacagccagg tactactgtg ccagagagag 360 agattatagg cttgactact ggggccaagg caccactctc acagtctcct ca 412 // And the traceback was: ERROR: An unexpected error occurred while tokenizing input The following traceback may be corrupted or invalid The error message is: ('EOF in multi-line statement', 
(311, 0)) --------------------------------------------------------------------------- AssertionError Traceback (most recent call last) /Volumes/External/home/laserson/research/church/vdj-ome/ref-data/IMGT/ in () /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse_records(self, handle, do_features) 418 #This is a generator function 419 while True : --> 420 record = self.parse(handle, do_features) 421 if record is None : break 422 assert record.id is not None /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse(self, handle, do_features) 401 feature_cleaner = FeatureValueCleaner()) 402 --> 403 if self.feed(handle, consumer, do_features) : 404 return consumer.data 405 else : /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in feed(self, handle, consumer, do_features) 373 #Features (common to both EMBL and GenBank): 374 if do_features : --> 375 self._feed_feature_table(consumer, self.parse_features(skip=False)) 376 else : 377 self.parse_features(skip=True) # ignore the data /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse_features(self, skip) 170 feature_lines.append(line[self.FEATURE_QUALIFIER_INDENT:].rstrip()) 171 line = self.handle.readline() --> 172 features.append(self.parse_feature(feature_key, feature_lines)) 173 self.line = line 174 return features /Library/Frameworks/Python.framework/Versions/4.3.0/lib/python2.5/site-packages/Bio/GenBank/Scanner.pyc in parse_feature(self, feature_key, lines) 267 else : 268 #Unquoted continuation --> 269 assert len(qualifiers) > 0 270 assert key==qualifiers[-1][0] 271 #if debug : print "Unquoted Cont %s:%s" % (key, line) AssertionError: Thanks! 
Uri -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 22 14:00:12 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 10:00:12 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004221400.o3ME0C4b008129@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #1 from laserson at mit.edu 2010-04-22 10:00 EST ------- Created an attachment (id=1483) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1483&action=view) IMGT record that breaks -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 22 14:00:53 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 10:00:53 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004221400.o3ME0r8H008158@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #2 from laserson at mit.edu 2010-04-22 10:00 EST ------- I added a text file with the IMGT record that breaks, as pasting it into the description messed it up. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Apr 22 14:05:07 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 22 Apr 2010 10:05:07 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004221405.o3ME572A008381@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 laserson at mit.edu changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |major OS/Version|Mac OS |All -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 23 11:16:48 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 23 Apr 2010 07:16:48 -0400 Subject: [Biopython-dev] [Bug 3009] Check the FASTA m10 alignment parser works with FASTA36 In-Reply-To: Message-ID: <201004231116.o3NBGmC1016610@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3009 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-23 07:16 EST ------- There seems to be a bug in FASTA 36.1.2 m10 output where the pg_optcut line lacks its leading semi-colon: The following example command lines should illustrate the problem, using the following input fasta files from the NCBI, both are relatively small with three and 180 sequences each: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Escherichia_coli_O157H7/NC_002127.faa ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Klebsiella_pneumoniae_MGH_78578/NC_009649.faa $ ~/Downloads/Software/fasta-36.2.1/bin/fasta36 -Q -H -E 1 -m 10 NC_002127.faa NC_009649.faa > stdout.txt $ more stdout.txt ... [cut] ... 
; pg_name_alg: FASTA ; pg_ver_rel: 3.7 Mar 2010 ; pg_matrix: BL50 (15:-5) ; pg_open-ext: -10 -2 ; pg_ktup: 2 ; pg_join: 42 pg_optcut: 30 ; mp_extrap: 60000 500 ; mp_stats: (shuffled [500]) Expectation_n fit: rho(ln(x))= 5.1864+/-0.0116; mu= 5.3472+/- 0.598 mean_var=54.6263+/-12.288, 0's: 0 Z-trim: 0 B-trim: 0 in 0/33 Lambda= 0.173529 ; mp_KS: -0.0000 (N=0) at 20 ; mp_Algorithm: FASTA (3.7 Mar 2010) [optimized] ... [cut] This breaks the Bio.AlignIO parser. Manually editing the file to insert the semi colons seems to fix things. I have reported this issue on the FASTA mailing list today: https://list.mail.virginia.edu/mailman/listinfo/fasta_list -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From elipapa at mit.edu Sun Apr 25 23:09:51 2010 From: elipapa at mit.edu (Eli Papa) Date: Mon, 26 Apr 2010 00:09:51 +0100 Subject: [Biopython-dev] GFF parser bug? Message-ID: Hello, While trying to use the GFF parser I ran into a value error. I think it's probably due to one of the GFF3 fields in my file not being specified as 'key=value', but just as 'value'. 
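A minimal sketch of that situation, assuming the ninth GFF3 column is split on semicolons and each piece is expected to look like key=value. The function split_attributes is hypothetical and is not the BCBio.GFF code; it only shows one way a bare value could be set aside instead of raising an error.

```python
def split_attributes(column9):
    """Split a GFF3 attribute column into (attributes, stray_values).

    Hypothetical helper: pieces without '=' are collected separately
    rather than being unpacked as key/value pairs.
    """
    attributes = {}
    stray = []
    for piece in column9.strip().split(";"):
        piece = piece.strip()
        if not piece:
            continue  # tolerate a trailing semicolon
        if "=" in piece:
            key, value = piece.split("=", 1)
            attributes.setdefault(key, []).append(value)
        else:
            stray.append(piece)  # bare value with no key
    return attributes, stray


attrs, stray = split_attributes("ID=GL0000006;Name=GL0000006;Lack 3'-end;")
```

Whether a stray value should be kept, warned about, or rejected is a design choice for the parser; the sketch simply keeps it visible to the caller.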
Hope this helps, eli In [1]: from BCBio.GFF import GFFExaminer In [2]: import pprint In [3]: in_file = "V1.UC-9.scaftig.more500.gff" In [4]: examiner = GFFExaminer() In [5]: in_handle = open(in_file) In [6]: pprint.pprint(examiner.parent_child_map(in_handle)) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /data/elipapa/gutmetahit/SingleSample_GenePrediction/ in () /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in _file_or_handle_inside(*args, **kwargs) 705 in_handle = open(in_file) 706 args = (args[0], in_handle) + args[2:] --> 707 out = fn(*args, **kwargs) 708 if need_close: 709 in_handle.close() /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in parent_child_map(self, gff_handle) 789 if line.strip(): 790 line_type, line_info = _gff_line_map(line, --> 791 self._get_local_params())[0] 792 if (line_type == 'parent' or (line_type == 'child' and 793 line_info['id'])): /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in _gff_line_map(line, params) 158 # collect all of the base qualifiers for this item 159 if len(parts) > 8: --> 160 quals, is_gff2 = _split_keyvals(gff_parts[8]) 161 else: 162 quals, is_gff2 = dict(), False /home/elipapa/lib/python/bcbio-0.1-py2.4.egg/BCBio/GFF/GFFParser.py in _split_keyvals(keyval_str) 84 pieces.append(p.strip().split(" ")) 85 key_vals = [(p[0], " ".join(p[1:])) for p in pieces] ---> 86 for key, val in key_vals: 87 # remove quotes in GFF2 files 88 if (len(val) > 0 and val[0] == '"' and val[-1] == '"'): ValueError: need more than 1 value to unpack ******* The gff file is as follows: ##gff-version 3 ##sequence-region scaffold4215_3 1 6526 scaffold4215_3 glimmer gene 3 62 . - . ID=GL0000006;Name=GL0000006;Lack 3'-end; scaffold4215_3 glimmer mRNA 3 62 . - . 
ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end; scaffold4215_3 glimmer CDS 3 62 2.84 - 0 Parent=GL0000006;Lack 3'-end; scaffold4215_3 glimmer gene 124 1983 . - . ID=GL0000007;Name=GL0000007;Complete; [...] From biopython at maubp.freeserve.co.uk Mon Apr 26 09:43:53 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 26 Apr 2010 10:43:53 +0100 Subject: [Biopython-dev] GFF parser bug? In-Reply-To: References: Message-ID: On Mon, Apr 26, 2010 at 12:09 AM, Eli Papa wrote: > Hello, > > While trying to use the GFF parser I ran into a value error. > > I think it's probably due to one of the GFF3 fields in my file not being > specified as 'key=value', but just as 'value'. > > Hope this helps, > eli > > ... > The gff file is as follows: > > ##gff-version 3 > ##sequence-region scaffold4215_3 1 6526 > scaffold4215_3 glimmer gene 3 62 . - . > ID=GL0000006;Name=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer mRNA 3 62 . - . > ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer CDS 3 62 2.84 - 0 > Parent=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer gene 124 1983 . - . > ID=GL0000007;Name=GL0000007;Complete; > [...] Hi Eli, Where did this GFF3 file come from? The final column looks invalid to me (it should be a list of key=value; statements). The specification seems quite clear on this: http://www.sequenceontology.org/gff3.shtml Regards, Peter From biopython at maubp.freeserve.co.uk Mon Apr 26 10:59:25 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 26 Apr 2010 11:59:25 +0100 Subject: [Biopython-dev] Bio.Phylo: the home stretch In-Reply-To: References: Message-ID: On Sat, Apr 17, 2010 at 2:35 PM, Eric Talevich wrote: > Hi all, > > There are two more decisions in Bio.Phylo that I'd like to settle on before > the release of Biopython 1.54.
They're holding open Bug 3045: > http://bugzilla.open-bio.org/show_bug.cgi?id=3045 Sorry I didn't get round to this last week. > 1. *Do we need a get_all_clades() method on trees and clades?* > > Bio.Nexus has get_terminals(); I added the same to Bio.Phylo early on, and > then get_nonterminals() to satisfy some demand for the opposite method: > > def get_terminals(self, order='preorder'): > """Get a list of all of this tree's terminal (leaf) nodes.""" > return list(self.find_clades(terminal=True, order=order)) > > def get_nonterminals(self, order='preorder'): > """Get a list of all of this tree's nonterminal (internal) nodes.""" > return list(self.find_clades(terminal=False, order=order)) > > They're both trivial, but the idea is to make the module easy to jump into > without reading the docs first. (find_clades() is a generator function that > several other functions use internally; to do useful things in Bio.Phylo you > still need to learn how to use it eventually.) > > So (a) do we need yet another sugar function that retrieves all tree nodes, > both internal and external? (b) if so, what should it be called? > > The implementation would be: list(self.find_clades(order=order)) > Also accomplished as: tree.get_terminals() + tree.get_nonterminals() I'd say no, we don't need it. You can always add it later, but removing something from the API is complicated with deprecations etc. > 2. *Rename find_clades() to find(), or something else?* > > I've previously renamed: > > find() => find_any() > -- given the same parameters as find_clades(), return the first match found, > or else None (useful in an if statement) > > find_all() => find_elements() > -- phyloXML trees have some complex objects as tree attributes, containing > other objects. This function searches for those directly, and for trees > without such attributes (e.g.
all Newick trees), this happens to be the same > as find_clades() > > So: find_clades() can search inside complex objects attached to trees, but > yields the corresponding clade object rather than the non-clade element > itself. This lets you search clades by e.g. clade.taxonomy.scientific_name, > or clade.sequence.type. It should be the first "find_*" function users reach > for. Should we give it a shorter name to encourage that, and shorten the > code that uses it? Hmm. I think find_clades() is sensible. > Here's a first crack at documentation: > http://github.com/etal/biopython/commit/8056a198804a08e3e03ac943c45744ad020dd53f There is a very short tree example in the Alignment chapter section on Clustalw using Bio.Nexus.Trees - we should just replace that with "See Chapter X" on loading and manipulating trees. Peter From chapmanb at 50mail.com Mon Apr 26 11:56:01 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 26 Apr 2010 07:56:01 -0400 Subject: [Biopython-dev] GFF parser bug? In-Reply-To: References: Message-ID: <20100426115601.GE58289@sobchak.mgh.harvard.edu> Eli; > While trying to use the GFF parser I ran into a value error. > > I think it's probably due to one of the GFF3 fields in my file not being > specified as 'key=value', but just as 'value'. Thanks for the report. Oh boy, that's a pretty bad file. In addition to the lack of a value you brought up, there is also a Parent/Child reference problem. The second line in the GFF you sent contains two issues: - A duplicate ID value for GL0000006. ID values are supposed to be unique in a file. - The Parent=GL0000006 should be a reference to the initial gene with that ID, but it also refers to itself. > scaffold4215_3 glimmer gene 3 62 . - . ID=GL0000006;Name=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer mRNA 3 62 . - . ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer CDS 3 62 2.84 - 0 Parent=GL0000006;Lack 3'-end; > scaffold4215_3 glimmer gene 124 1983 . - .
ID=GL0000007;Name=GL0000007;Complete; As Peter mentioned it would be useful to also file a bug with the writers of the software that are producing this. Bringing it in line with the spec will allow it to be more widely handled by other GFF parsers. You can get a fixed version of the GFF parser that gracefully handles these issues at: http://github.com/chapmanb/bcbb/tree/master/gff/ or apply the changes to GFFParser directly: http://github.com/chapmanb/bcbb/commit/c530dc1b7d1d6b8b4df211849f969adf4df80a67 Thanks much for the report. Let us know if you have any other issues, Brad From bugzilla-daemon at portal.open-bio.org Mon Apr 26 13:10:32 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 09:10:32 -0400 Subject: [Biopython-dev] [Bug 3045] TreeMixin, please define enumerator and other convenience methods In-Reply-To: Message-ID: <201004261310.o3QDAWxg018128@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3045 eric.talevich at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from eric.talevich at gmail.com 2010-04-26 09:10 EST ------- I think we've taken care of everything we planned to for this bug -- added get_nonterminals(), decided against get_all_clades(), and resolved to convert Joel's examples to cookbook entries at some point (not blocking the 1.54 release). Discussion: http://lists.open-bio.org/pipermail/biopython-dev/2010-April/007654.html So, I'm marking this fixed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From elipapa at mit.edu Mon Apr 26 17:37:11 2010 From: elipapa at mit.edu (Eli Papa) Date: Mon, 26 Apr 2010 18:37:11 +0100 Subject: [Biopython-dev] GFF parser bug? 
In-Reply-To: <20100426115601.GE58289@sobchak.mgh.harvard.edu> References: <20100426115601.GE58289@sobchak.mgh.harvard.edu> Message-ID: Hi Brad, Thanks for the quick reply! Hopefully, I'll be able to reciprocate in the future. The fix appears to work flawlessly so far, but I'll let you know if it gives me other problems. Unfortunately I have no control over the GFF (it was released to the public as part of a published study). It is also not clear from the methods section whether they employed Glimmer, MetaGene or some custom script to put the file together. When I have some extra time, I'll certainly test which of these programs is the culprit and let the author know about the non-standard output format. cheers, eli On Mon, Apr 26, 2010 at 12:56 PM, Brad Chapman wrote: > Eli; > >> While trying to use the GFF parser I ran into a value error. >> >> I think it's probably due to one of the GFF3 fields in my file not being >> specified as 'key=value', but just as 'value'. > > Thanks for the report. Oh boy, that's a pretty bad file. In addition > to the lack of a value you brought up, there is also a Parent/Child > reference problem. The second line in the GFF you sent contains two > issues: > > - A duplicate ID value for GL0000006. ID values are supposed to be > unique in a file. > - The Parent=GL0000006 should be a reference to the initial > gene with that ID, but it also refers to itself. > >> scaffold4215_3 glimmer gene 3 62 . - . ID=GL0000006;Name=GL0000006;Lack 3'-end; >> scaffold4215_3 glimmer mRNA 3 62 . - . ID=GL0000006;Name=GL0000006;Parent=GL0000006;Lack 3'-end; >> scaffold4215_3 glimmer CDS 3 62 2.84 - 0 Parent=GL0000006;Lack 3'-end; >> scaffold4215_3 glimmer gene 124 1983 . - . ID=GL0000007;Name=GL0000007;Complete; > > As Peter mentioned it would be useful to also file a bug with the > writers of the software that are producing this.
Bringing it in line > with the spec will allow it to be more widely handled by other GFF > parsers. > > You can get a fixed version of the GFF parser that gracefully > handles these issues at: > > http://github.com/chapmanb/bcbb/tree/master/gff/ > > or apply the changes to GFFParser directly: > > http://github.com/chapmanb/bcbb/commit/c530dc1b7d1d6b8b4df211849f969adf4df80a67 > > Thanks much for the report. Let us know if you have any other > issues, > Brad > From bugzilla-daemon at portal.open-bio.org Mon Apr 26 17:52:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 13:52:43 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004261752.o3QHqhgZ027348@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-26 13:52 EST ------- I've just tried the file in attachment 1483 on Mac and Linux and it parses fine (using the latest Biopython code). However, I was sure I was able to reproduce this earlier (on Linux), but I forget now where I got the example file from (this was before Uri uploaded this attachment). I've been using this (and variants of this):

    from Bio import SeqIO
    record = SeqIO.read(open("A03907.embl"), "embl")

In any case, the assert check looks sensible - the method parse_feature should be given a single feature, so any error is happening further up - probably in the parse_features method. I'm confused right now. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From p.j.a.cock at googlemail.com Mon Apr 26 22:30:54 2010 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 26 Apr 2010 23:30:54 +0100 Subject: [Biopython-dev] Google Summer of Code - accepted students In-Reply-To: <4BD60D63.1040400@cornell.edu> References: <4BD60D63.1040400@cornell.edu> Message-ID: ---------- Forwarded message ---------- From: Robert Buels Date: Mon, Apr 26, 2010 at 11:02 PM Subject: Google Summer of Code - accepted students To: rmb32 at cornell.edu Hi all, I'm pleased to announce the acceptance of OBF's 2010 Google Summer of Code students, listed in alphabetical order with their project titles and primary mentors: Mark Chapman (PM Andreas Prlic) - Improvements to BioJava including Implementation of Multiple Sequence Alignment Algorithms Jianjiong Gao (PM Peter Rose) - BioJava Packages for Identification, Classification, and Visualization of Posttranslational Modification of Proteins Kazuhiro Hayashi (PM Naohisa Goto) - Ruby 1.9.2 support of BioRuby Sara Rayburn (PM Christian Zmasek) - Implementing Speciation & Duplication Inference Algorithm for Binary and Non-binary Species Tree Joao Pedro Garcia Lopes Maia Rodrigues (PM Eric Talevich) - Extending Bio.PDB: broadening the usefulness of BioPython's Structural Biology module Jun Yin (PM Chris Fields) - BioPerl Alignment Subsystem Refactoring Congratulations to our accepted students! All told, we had 52 applications submitted for the 6 slots (5 originally assigned, plus 1 extra) allotted to us by Google. Proposals were extremely competitive: 6 out of 52 translates to an 11.5% acceptance rate. We received a lot of really excellent proposals; the decisions were not easy. Thanks very much to all the students who applied, we very much appreciate your hard work. Here's to a great 2010 Summer of Code, I'm sure these students will do some wonderful work.
Rob Buels OBF GSoC 2010 Administrator From bugzilla-daemon at portal.open-bio.org Mon Apr 26 23:44:15 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 19:44:15 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004262344.o3QNiFCr003594@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #4 from laserson at mit.edu 2010-04-26 19:44 EST ------- I did something stupid, and uploaded the wrong IMGT record. I will upload the actual offending record. However, after stepping through the code with pdb, it appears that the problem with the offending record is that the feature qualifiers are indented too far, so that the whitespace is not fully stripped off. Has it ever been considered to parse the features by breaking the line with split(), instead of hardcoding the number of columns? While the official EMBL specification may hardcode the size of the fields, the parser may be more robust to such errors that way. (Though I understand the desire to conform exactly to EMBL standards.) Either way, I will notify the curators of the IMGT database. (And see the attached file with the offending record.) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Apr 26 23:47:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 19:47:37 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004262347.o3QNlb34003630@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 laserson at mit.edu changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1483 is|0 |1 obsolete| | ------- Comment #5 from laserson at mit.edu 2010-04-26 19:47 EST ------- Created an attachment (id=1489) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1489&action=view) IMGT record that actually breaks. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Apr 27 00:14:59 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 26 Apr 2010 20:14:59 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks when features have no qualifiers In-Reply-To: Message-ID: <201004270014.o3R0Exew004936@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 ------- Comment #6 from laserson at mit.edu 2010-04-26 20:14 EST ------- Alternatively, an additional lstrip() call for each line in lines in parse_feature() would probably also solve the problem. What are reasons not to do this? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
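To make the over-indentation problem from comments #4 and #6 concrete, here is a minimal sketch in plain Python. It is not the code in Bio/GenBank/Scanner.py: the helper names are made up for illustration, and the offset of 21 characters before the qualifier text is an assumption about the standard EMBL layout.

```python
# Illustrative sketch only; these helpers are hypothetical and are not
# taken from Biopython's parse_feature().
QUALIFIER_OFFSET = 21  # assumed 0-based column where qualifier text starts


def strip_fixed(line, offset=QUALIFIER_OFFSET):
    """Fixed-width approach: drop exactly `offset` leading characters."""
    return line[offset:].rstrip()


def strip_tolerant(line, offset=QUALIFIER_OFFSET):
    """Same slice, then remove whatever leading whitespace is left over."""
    return line[offset:].strip()


# A qualifier line at the assumed standard indentation, and one that is
# indented four characters too far.
standard = "FT" + " " * 19 + '/organism="Homo sapiens"'
over_indented = "FT" + " " * 23 + '/organism="Homo sapiens"'

print(repr(strip_fixed(standard)))
print(repr(strip_fixed(over_indented)))     # still starts with spaces
print(repr(strip_tolerant(over_indented)))  # leading spaces removed
```

With the fixed slice alone, the over-indented line keeps its extra leading spaces, so a later check that the qualifier starts with "/" would fail. The extra strip is cheap, but it only helps while the feature key itself still fits in its field.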
From bugzilla-daemon at portal.open-bio.org Tue Apr 27 09:43:14 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 27 Apr 2010 05:43:14 -0400 Subject: [Biopython-dev] [Bug 3062] GenBank/EMBL parser breaks on over-indented features In-Reply-To: Message-ID: <201004270943.o3R9hEab020932@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3062 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED Summary|GenBank/EMBL parser breaks |GenBank/EMBL parser breaks |when features have no |on over-indented features |qualifiers | ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-27 05:43 EST ------- (In reply to comment #4) > I did something stupid, and uploaded the wrong IMGT record. I will upload the > actual offending record. However, after stepping through the code with pdb, > it appears that the problem with the offending record is that the feature > qualifiers are indented too far, so that the whitespace is not fully stripped > off. Thanks for checking and working out what was wrong. Yes, this file does indeed break. > Has it ever been considered to parse the features by breaking the line with > split(), instead of hardcoding the number of columns? While the official EMBL > specification may hardcode the size of the fields, the parse may be more > robust to such errors. (Though I understand the desire to conform exactly to > EMBL standards). Eitherway, I will notify the curators of the IMGT database. Please do contact the IMGT curators. (In reply to comment #6) > Alternatively, an additional lstrip() call for each line in lines in > parse_feature() would probably also solve the problem. What are reasons not > to do this? Trying to parse out-of-spec files is a potential nightmare. 
We do try and be tolerant of "quirks" in official NCBI or EMBL files (which are occasionally technically invalid), as long as such corrections look easy and unambiguous. In this particular case, we can cope with the extra indentation as you suggest by stripping any leading white space. Fixed in the repository: http://github.com/biopython/biopython/commit/73caa4072898e7d5a71d38138c9e053066f11b24 Thank you Uri, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 15:32:20 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 11:32:20 -0400 Subject: [Biopython-dev] [Bug 3066] New: Iterating/looping over colums/rows of a MultipleSeqAlignment Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3066 Summary: Iterating/looping over colums/rows of a MultipleSeqAlignment Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk The new MultipleSeqAlignment object (like the old Alignment object it replaces) stores the rows of the alignment as SeqRecord objects. This means column based access is slow. It can often be useful to be able to iterate over the columns, and a dedicated method to do this should be faster than repeatedly accessing columns by index (either via slicing with __getitem__ or the old get_column method). A related question here is should the columns be returned as strings or as Seq objects? Possible implementation to follow as a patch... 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 15:33:06 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 11:33:06 -0400 Subject: [Biopython-dev] [Bug 3066] Iterating/looping over colums/rows of a MultipleSeqAlignment In-Reply-To: Message-ID: <201004281533.o3SFX6r5007784@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3066 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-28 11:33 EST ------- Created an attachment (id=1490) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1490&action=view) PAtch to Bio/Align/__init__.py Possible solution using iterators returning strings. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 18:29:56 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 14:29:56 -0400 Subject: [Biopython-dev] [Bug 3066] Iterating/looping over colums/rows of a MultipleSeqAlignment In-Reply-To: Message-ID: <201004281829.o3SITu0x014523@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3066 ------- Comment #2 from eric.talevich at gmail.com 2010-04-28 14:29 EST ------- I don't mind having plain strings returned; Bio.Seq works well enough with them for me. Two things: 1. Is this implementation fast? It basically transposes the alignment as a list-of-lists, right? So: return zip(*self) or: from itertools import izip return (''.join(col) for col in izip(*self)) 2. 
On the topic of efficiency -- have you encountered a situation where having an alignment as a NumPy character array would have helped? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 19:32:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 15:32:43 -0400 Subject: [Biopython-dev] [Bug 3067] New: SPARK parser errors should be sent to stderr Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3067 Summary: SPARK parser errors should be sent to stderr Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu The SPARK code currently sends parsing errors to stdout. This makes it difficult to sort out legitimate output from error output. Attached is a patch that corrects this. It changes two output lines to send to sys.stderr. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 19:33:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 15:33:37 -0400 Subject: [Biopython-dev] [Bug 3067] SPARK parser errors should be sent to stderr In-Reply-To: Message-ID: <201004281933.o3SJXbkO016573@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3067 ------- Comment #1 from laserson at mit.edu 2010-04-28 15:33 EST ------- Created an attachment (id=1491) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1491&action=view) Patch to output error messages to stderr. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 21:30:43 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 17:30:43 -0400 Subject: [Biopython-dev] [Bug 3069] New: More robust feature parser for GenBank/EMBL records Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3069 Summary: More robust feature parser for GenBank/EMBL records Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu We recently made a modification to allow for over-indented features to be processed correctly (to handle IMGT records). However, this only works if the feature keys are within the INSDC established guidelines, which is not always the case with IMGT. This specifically causes a problem for the location lines of features. I will shortly upload a patch which corrects this problem, by processing only the first line of a feature using split(), rather than the hardcoded distances. Are there any objections to this? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
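As a rough illustration of the split()-based idea proposed in Bug 3069, the sketch below splits the first line of a feature on whitespace instead of slicing at fixed columns. The function name and the long example key are hypothetical, and this is not the attached patch; it only handles EMBL-style lines that begin with the "FT" line code.

```python
# Hypothetical helper, not taken from the patch attached to Bug 3069.
def split_feature_header(line):
    """Split the first line of an EMBL feature into (key, location)."""
    # Drop the two-letter "FT" line code if present.
    body = line[2:] if line.startswith("FT") else line
    # Split once on any run of whitespace: key first, location after.
    key, location = body.split(None, 1)
    return key, location.strip()


# A line in the standard layout, and one with a made-up key that is
# too long for the fixed-width key field.
print(split_feature_header("FT   source          1..100"))
print(split_feature_header("FT   AN-OVERLONG-FEATURE-KEY 1..500"))
```

Note that this still needs at least one space between the key and the location. If a key runs right up against the location, split() returns a single token and the tuple unpacking raises ValueError.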
From bugzilla-daemon at portal.open-bio.org Wed Apr 28 21:34:16 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 17:34:16 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004282134.o3SLYGri019342@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #1 from laserson at mit.edu 2010-04-28 17:34 EST ------- Created an attachment (id=1492) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1492&action=view) Generalize the processing of location lines in a feature table. There are two methods in this patch. The main one (uncommented) is a two-line method that will work perfectly, I believe. The second method is commented out, and is an alternative one-line method to do it as well. However, it will replace all whitespace with single spaces, which has the potential to change the content seen by the parser, though this is unlikely. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 23:25:37 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 19:25:37 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004282325.o3SNPb8L022634@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-28 19:25 EST ------- Hi Uri, Could you attach an example input file showing the kind of invalid records you want to parse?
Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Apr 28 23:58:38 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 19:58:38 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004282358.o3SNwc72023624@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #3 from laserson at mit.edu 2010-04-28 19:58 EST ------- Created an attachment (id=1493) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1493&action=view) IMGT record that fails with current repository version. The long feature key gets chopped up and messes up the location. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 00:35:11 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 28 Apr 2010 20:35:11 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004290035.o3T0ZBM8024421@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #4 from laserson at mit.edu 2010-04-28 20:35 EST ------- Actually, the record I attached fails, but it's not the worst-case scenario. Using the extended feature-key length, there are some keys that actually make it to the border of the qualifiers, so that they are contiguous. This means that the indentation must be hardcoded for IMGT just like anything else. 
In order to solve this problem once and for all, is the best approach to subclass the InsdcScanner and put in values that make sense for IMGT? If so, then there is one more problem that needs to be addressed. About 80% of the records in IMGT conform to the EMBL format correctly, while about 20% have this over-indentation problem. Would it make more sense to go through the entire IMGT database and change each record to have the increased indentation? Then the subclassed Scanner would have no problem. The alternative is that the amount of indentation should be "discovered" and changed appropriately for each record. The parsing would then proceed as it currently does. Uri This leaves two options: 1) Go through each record in IMGT and enforce the longer indentation for each such record. (This shouldn't be too difficult). 2) Su -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 08:00:51 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 04:00:51 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004290800.o3T80pdl002051@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-29 04:00 EST ------- Rather than formalising this as a sub-format or EMBL variant I would rather encourage the IMGT to fix their files to follow the EMBL standard. Correcting the indentation shouldn't be too hard - the potential problem will be contracting overly long feature keys that can't fit into the EMBL allocated field. It would also be interesting to see how BioPerl etc. handle these out-of-spec files.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 10:33:26 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 06:33:26 -0400 Subject: [Biopython-dev] [Bug 3067] SPARK parser errors should be sent to stderr In-Reply-To: Message-ID: <201004291033.o3TAXQZI007883@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3067 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-29 06:33 EST ------- Patch cherry-picked from github, thanks. Note that I plan to replace the spark-based location parsing with something faster using regular expressions (see Bug 2738), at which point we can deprecate and then drop our copy of spark. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 11:02:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 07:02:42 -0400 Subject: [Biopython-dev] [Bug 3066] Iterating/looping over colums/rows of a MultipleSeqAlignment In-Reply-To: Message-ID: <201004291102.o3TB2gN2009204@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3066 ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-29 07:02 EST ------- (In reply to comment #2) > Two things: > > 1. Is this implementation fast? It basically transposes the alignment as a > list-of-lists, right?
So: > > return zip(*self) > > or: > > from itertools import izip > return (''.join(col) for col in izip(*self)) I haven't done any profiling yet - using itertools would be worth trying. > 2. On the topic of efficiency -- have you encountered a situation where > having an alignment as a NumPy character array would have helped? Not personally, but these iterators should facilitate creating a NumPy character array from our alignment object. I was also pondering adding an explicit "as_array" or "to_array" method which would require NumPy at runtime. However, I would rather keep the core of Biopython without any NumPy dependency. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Apr 29 23:16:59 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 29 Apr 2010 19:16:59 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004292316.o3TNGxie030251@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #6 from laserson at mit.edu 2010-04-29 19:16 EST ------- Generally I agree with you. However, based on my knowledge of the people at IMGT, this is highly unlikely. From their perspective, they invested a very large amount of time into their ontology/database structure, and I don't think they'll really be prepared to shorten their feature keys to be in compliance with EMBL. I will try to cook up a parser for IMGT that integrates into biopython (but I can't guarantee success, as I'm not extremely familiar with the internals). I'll keep you posted. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
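Going back to the column-iteration question in the Bug 3066 exchange above, the transposition suggested there can be sketched as follows. This is only an illustration on plain strings, which stand in for the row sequences of an alignment; the function name is made up, and it is not the patch attached to the bug. On Python 3 the built-in zip is already lazy, so itertools.izip (which exists only on Python 2) is not needed.

```python
# Illustration only: rows are plain equal-length strings standing in for
# the sequences of an alignment; iter_columns is a hypothetical name.
def iter_columns(rows):
    """Yield each alignment column as a string, one column at a time."""
    for column in zip(*rows):
        yield "".join(column)


rows = ["ACGT", "A-GT", "ACGA"]
for col in iter_columns(rows):
    print(col)
```

The generator builds one column string at a time, although zip(*rows) still needs all rows available up front. Whether this beats repeated column indexing on real alignments would need the profiling mentioned in comment #3.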
From bugzilla-daemon at portal.open-bio.org Fri Apr 30 04:52:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 00:52:45 -0400 Subject: [Biopython-dev] [Bug 3071] New: EMBL parser does not parse RP lines correctly. Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=3071 Summary: EMBL parser does not parse RP lines correctly. Product: Biopython Version: 1.54b Platform: All OS/Version: All Status: NEW Severity: major Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: laserson at mit.edu CC: laserson at mit.edu The EMBL parser makes an incorrect assert statement at line 679 of Bio/GenBank/Scanner.py: elif line_type == 'RP': # Reformat reference numbers for the GenBank based consumer # e.g. '1-4639675' becomes '(bases 1 to 4639675)' assert data.count("-")==1 consumer.reference_bases("(bases " + data.replace("-", " to ") + ")") The EMBL specification states that there can be multiple ranges in this line: http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_10_3 This breaks at least one record in IMGT (which will be attached shortly). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 04:53:42 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 00:53:42 -0400 Subject: [Biopython-dev] [Bug 3071] EMBL parser does not parse RP lines correctly. 
In-Reply-To: Message-ID: <201004300453.o3U4rgpG005244@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3071 ------- Comment #1 from laserson at mit.edu 2010-04-30 00:53 EST ------- Created an attachment (id=1496) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1496&action=view) IMGT/EMBL record that breaks because of RP parsing error -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 07:46:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 03:46:49 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004300746.o3U7kngI009202@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-30 03:46 EST ------- (In reply to comment #6) > Generally I agree with you. However, based on my knowledge of the people at > IMGT, this is highly unlikely. From their perspective, they invested a very > large amount of time into their ontology/database structure, and I don't think > they'll really be prepared to shorten their feature keys to be in compliance > with EMBL. You're in a much better position to assess this - but could you ask them about this anyway? They may at least clarify how they bend the EMBL specification. Do they have a preferred file format (e.g. XML)? > I will try to cook up a parser for IMGT that integrates into biopython (but I > can't guarantee success, as I'm not extremely familiar with the internals). > I'll keep you posted.
How I would try this would be to write a new scanner subclassing the EMBL scanner in Bio/GenBank/Scanner.py (which probably only needs to override the feature parsing), and then new functions in Bio/SeqIO/InsdcIO.py to call it (matching the GenBank and EMBL functions), and define a new format name (maybe "embl-imgt") in the dictionary in Bio/SeqIO/__init__.py and done. However, if the only out-of-specification thing in the IMGT EMBL files is the feature indentation and long feature keys, maybe your original request to make the EMBL parser more tolerant is the best route. Thinking ahead, would you also want to be able to write out IMGT variant EMBL files? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 07:46:49 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 03:46:49 -0400 Subject: [Biopython-dev] [Bug 3071] EMBL parser does not parse RP lines correctly. In-Reply-To: Message-ID: <201004300746.o3U7kn2V009203@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3071 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2010-04-30 03:46 EST ------- Good point, thanks for posting an example too.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
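For the RP issue reported in Bug 3071, one possible way to drop the single-range assert is sketched below. It assumes the ranges on an RP line are separated by commas, and the "; " separator in the output is just one choice for illustration. The function name is hypothetical and this is not necessarily the fix that will go into Scanner.py.

```python
# Hypothetical sketch; assumes RP ranges are comma-separated.
def format_reference_bases(data):
    """Turn the data part of an RP line into GenBank-style text.

    '1-4639675'         -> '(bases 1 to 4639675)'
    '160-550, 904-1055' -> '(bases 160 to 550; 904 to 1055)'
    """
    ranges = [part.strip() for part in data.split(",") if part.strip()]
    return "(bases " + "; ".join(r.replace("-", " to ") for r in ranges) + ")"


print(format_reference_bases("1-4639675"))
print(format_reference_bases("160-550, 904-1055"))
```

The single-range case gives the same string as the existing code, so records that already parse should be unaffected.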
From bugzilla-daemon at portal.open-bio.org Fri Apr 30 15:18:00 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 11:18:00 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004301518.o3UFI0UI022780@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 ------- Comment #8 from laserson at mit.edu 2010-04-30 11:17 EST ------- (In reply to comment #7) > You're in a much better position to assess this - but could you ask them about > this anyway? They may at least clarify how they bend the EMBL specification. I am waiting to hear from them regarding all the changes compared with the EMBL spec. But I am not confident they are even sure. Part of the problem is the database was started over 20 years ago, so some older records may not have been updated properly. > Do they have a preferred file format (e.g. XML)? They only have a text file in their "EMBL" format. See here for all their download options: http://imgt.cines.fr/textes/IMGTdownloads.html > How I would try this would be to write a new scanner subclassing the EMBL > scanner in Bio/GenBank/Scanner.py (which probably only needs to override the > feature parsing), and then new functions in Bio/SeqIO/InsdcIO.py to call it > (matching the GenBank and EMBL functions), and define a new format name > (maybe "embl-imgt") in the dictionary in Bio/SeqIO/__init__.py and done. Done. I will upload the patch shortly. The code only reads the IMGT info. It does not write it. I can work on that as well, if you think it's prudent that every readable format should also be writable. > However, if the only out-of-specification thing in the IMGT EMBL files is the > feature indentation and long feature keys, maybe your original request to make > the EMBL parser more tolerant is the best route. I think it will actually be a headache to do so.
Unless you want to rewrite the EMBL parser the way that I wrote the IMGT parser. The only thing that needed changing was handling the header lines. Once it finds an FH line, it uses the position of the "Location..." string to determine how indented the qualifiers are. > Thinking ahead would you also want to be able to write out IMGT variant EMBL > files? > I personally don't need this functionality, but I am willing to write it to complement the IMGT parser that I wrote. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Apr 30 15:23:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 30 Apr 2010 11:23:45 -0400 Subject: [Biopython-dev] [Bug 3069] More robust feature parser for GenBank/EMBL records In-Reply-To: Message-ID: <201004301523.o3UFNjcn022994@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3069 laserson at mit.edu changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1492 is|0 |1 obsolete| | ------- Comment #9 from laserson at mit.edu 2010-04-30 11:23 EST ------- Created an attachment (id=1497) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1497&action=view) New IMGT parser object This patch changes four files. Most of it is in Bio/GenBank/Scanner.py, and then there are a few extra additions to integrate it into SeqIO (for parse() and index()). I still have not been able to run through the whole IMGT database with this parser but this is because of actual errors in the IMGT records (which I will report back to the IMGT curators), or because of other bugs that I have discovered in the EMBL parser (from which the IMGT parser is derived; e.g., Bug #3071). 
However, it does breeze through most of the IMGT records without a problem, and handles both the EMBL-indented, and the IMGT-over-indented records. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
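The indentation discovery described in comment #8 (use the position of the "Location" string on the FH line) might look roughly like the sketch below. It is not the code in attachment 1497: the function name is invented, and the fallback of 21 characters is an assumption about the standard EMBL layout.

```python
# Hypothetical sketch of the idea from comment #8, not the attached patch.
DEFAULT_OFFSET = 21  # assumed standard EMBL column for locations/qualifiers


def qualifier_offset(header_lines, default=DEFAULT_OFFSET):
    """Return the column where 'Location' starts on the first FH line that
    names it, or `default` if no such line is found."""
    for line in header_lines:
        if line.startswith("FH"):
            pos = line.find("Location")
            if pos != -1:
                return pos
    return default


# Header lines built with explicit padding so the offsets are unambiguous.
embl_header = "FH   Key" + " " * 13 + "Location/Qualifiers"
imgt_header = "FH   Key" + " " * 17 + "Location/Qualifiers"

print(qualifier_offset([embl_header, "FH"]))
print(qualifier_offset([imgt_header, "FH"]))
print(qualifier_offset(["FH"]))
```

A scanner could then slice each FT line at the returned offset, so the same code path would serve both the EMBL-indented and the over-indented records.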