From chapmanb at 50mail.com Thu Jun 2 15:49:00 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 2 Jun 2011 15:49:00 -0400
Subject: [Biopython-dev] Early registration for BOSC ends tomorrow, Friday June 3
Message-ID: <20110602194900.GG21074@sobchak>

If you haven't already registered for BOSC, now is your chance--after
June 3, prices will go up!

Registration for BOSC is through the ISMB main conference website:
http://www.iscb.org/ismbeccb2011-registration#sigs . Since BOSC is a
two-day SIG, the price is 2x the one-day SIG price listed on the ISMB
website. You can register for BOSC without registering for the main
ISMB conference, if you want.

The preliminary BOSC schedule (subject to change) is now up at
http://www.open-bio.org/wiki/BOSC_2011_Schedule (more details will be
added soon). There is also a two-day Codefest preceding BOSC; please
add yourself to the list of attendees if you are interested:
http://www.open-bio.org/wiki/Codefest_2011

The BOSC talks have already been chosen, but we have spaces for
last-minute posters. If you'd like your poster abstract to appear in
the BOSC program, you should submit it now--see
http://www.open-bio.org/wiki/BOSC_2011#Abstract_Submission_Information

We hope to see you at BOSC!

Nomi Harris
Co-Chair, BOSC 2011

From eric.talevich at gmail.com Sun Jun 5 12:09:59 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 5 Jun 2011 12:09:59 -0400
Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7)
In-Reply-To:
References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com>
<7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com>
Message-ID:

On Fri, May 27, 2011 at 9:48 AM, Erick Matsen wrote:

> Hello everyone--
>
>
> Hope you don't mind my chiming into this discussion.
>
> > Good to know. The format for confidences is also hard-coded ("%1.2f"), do
> > you suppose that should be given the same treatment?
>
> I think this would be entirely appropriate. There are some cases (eg
> bootstrap) where the confidence is actually a count, and being able to
> express it as such might be convenient.
>
> I have one related point to discuss if you don't mind. In
>
>
> https://github.com/biopython/biopython/blob/master/Bio/Phylo/NewickIO.py#L246
>
> trees without confidence values get written out as trees with confidence
> values of zero. These are of course two different things.
>
> I realize that if we want to write out a tree without confidence values
> we can specify branchlengths_only, but it would seem to me that the most
> natural behavior would be to just write out confidence values when they
> are specified.
>

Hi Erick & folks,

This commit should fix both those issues:
https://github.com/biopython/biopython/commit/4ce56619cb13e27659927707e2979807d37b26b0

There's an issue with naming -- I called the argument "format_support"
because all the other arguments refer to confidence as "support", since
they were all copied from Bio.Nexus. The Bio.Phylo Clade attribute this
affects is called "confidence", though. It's confusing either way.
Thoughts & suggestions?

-E

From eric.talevich at gmail.com Sun Jun 5 14:53:58 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Sun, 5 Jun 2011 14:53:58 -0400
Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7)
In-Reply-To:
References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com>
<7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com>
Message-ID:

On Sun, Jun 5, 2011 at 12:09 PM, Eric Talevich wrote:

> This commit should fix both those issues:
>
> https://github.com/biopython/biopython/commit/4ce56619cb13e27659927707e2979807d37b26b0
>
> There's an issue with naming -- I called the argument "format_support"
> because all the other arguments refer to confidence as "support", since they
> were all copied from Bio.Nexus. The Bio.Phylo Clade attribute this affects
> is called "confidence", though. It's confusing either way. Thoughts &
> suggestions?
>

To follow up on that last bit, would anyone be opposed to
deprecating/renaming the other arguments in NewickIO functions to change
all references from "support" to "confidence"? The keyword arguments are
a little esoteric, I think; can we try a 1-release (or even 0-release)
deprecation cycle here?

-Eric

From p.j.a.cock at googlemail.com Sun Jun 5 16:11:26 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 5 Jun 2011 21:11:26 +0100
Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7)
In-Reply-To:
References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com>
<7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com>
Message-ID:

On Sun, Jun 5, 2011 at 7:53 PM, Eric Talevich wrote:
>>
>> There's an issue with naming -- I called the argument "format_support"
>> because all the other arguments refer to confidence as "support", since they
>> were all copied from Bio.Nexus. The Bio.Phylo Clade attribute this affects
>> is called "confidence", though. It's confusing either way. Thoughts &
>> suggestions?
>
> To follow up on that last bit, would anyone be opposed to
> deprecating/renaming the other arguments in NewickIO functions to
> change all references from "support" to "confidence"? The keyword
> arguments are a little esoteric, I think; can we try a 1-release (or
> even 0-release) deprecation cycle here?

Do you think you could have both arg names supported in 1.58 (with
a deprecation warning if the old names are used)?

If so, and you post this to the main list and get no objections, I'm
open to adding a deprecation warning in 1.58 (which could be relatively
soon, before BOSC/ISMB would be my guess) and dropping the old names
in 1.59.

Peter

From updates at feedmyinbox.com Sun Jun 5 15:27:28 2011
From: updates at feedmyinbox.com (Feed My Inbox)
Date: Sun, 5 Jun 2011 15:27:28 -0400
Subject: [Biopython-dev] 6/5 biopython Questions - BioStar
Message-ID: <0de76f2a5cecc44812c6261fb9e96b92@74.63.51.88>

// What is biopython and bioeclipse used for?
// June 4, 2011 at 12:10 PM

http://biostar.stackexchange.com/questions/8844/what-is-biopython-and-bioeclipse-used-for

Just would like to know why do things like biopython and bioeclipse
exist and in which bioinformatics context are they used

--

Website: http://biostar.stackexchange.com/questions/tagged/biopython

From p.j.a.cock at googlemail.com Sun Jun 5 17:51:26 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sun, 5 Jun 2011 22:51:26 +0100
Subject: [Biopython-dev] float('inf') or float('-inf')
Message-ID:

Hi all,

As explained in PEP 754, prior to Python 2.6 float('inf'), float('-inf')
and also float('nan') were passed to the underlying C library, which
may or may not return the IEEE special floating point value for
infinity, minus infinity or nan. See:
http://www.python.org/dev/peps/pep-0754/

This is the root cause of this unit test failure on Windows Python 2.5,
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/224/steps/shell/logs/stdio

======================================================================
ERROR: Test a simple model with 2 states and 2 symbols.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\repositories\BuildBot\win25\build\Tests\test_HMMGeneral.py", line 260, in test_simple_hmm
    viterbi = model.viterbi(observed_emissions, NumberAlphabet)
  File "c:\repositories\BuildBot\win25\build\build\lib.win32-2.5\Bio\HMM\MarkovModel.py", line 499, in viterbi
    log_initial = self._log_transform(self.initial_prob)
  File "c:\repositories\BuildBot\win25\build\build\lib.win32-2.5\Bio\HMM\MarkovModel.py", line 598, in _log_transform
    neg_inf = float("-inf")
ValueError: invalid literal for float(): -inf

and similarly on Windows Python 2.4,
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.4/builds/226/steps/shell/logs/stdio

This test failure is a result of me committing Walter Gillett's fix to
Bio/HMM/MarkovModel.py last week:
https://github.com/biopython/biopython/commit/152f469179d4a142858a04c02169f8d1fc5f8c83

We would have spotted this earlier, but the Windows buildslave machine
has been in use by a new staff member (a temporary arrangement). I guess
we need more volunteer buildslave machines...

Based on the PEP 754 example, 1E400 may work instead:

    try:
        neg_inf = float("-inf")
    except ValueError:
        # Older Windows C libraries reject "-inf"; 1E400 overflows to infinity.
        neg_inf = -1E400

I'll try to test this on the machine later in the week when I get a
chance. If anyone else has Python 2.4 or 2.5 on Windows and wants to
look at this now, please go ahead.

Regards,

Peter

From b.invergo at gmail.com Fri Jun 10 06:53:12 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 10 Jun 2011 12:53:12 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <20110228163521.GF9652@sobchak.mgh.harvard.edu>
References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
Message-ID:

Hi everyone,

It's been quite a while since I've updated you with my PAML progress.
My side projects had to take a back seat to my PhD research for a
while, so I couldn't work on it. Anyway, I finally got back to it and
implemented some much-needed restructuring as suggested.

I've implemented a generalized PAML class which the others inherit
from to reduce duplicated code. I've also created files containing
helper functions explicitly for parsing the result files. So, the
Codeml.read() function now does all of its parsing through functions
held in the _parse_codeml.py file. This was done for both clarity and
cleanliness. I've taken the suggestion to split the parsing task into
several functions so I hope it's all a bit more readable now. I
certainly think it is; I was hesitant at first but now that it's done
I see how much better it is. I also caught some really poor parts of
the code which I was able to fix. Codeml parsing remains a bit
complicated compared to Baseml and Yn00 but I think that's just the
nature of the beast.

So, if anyone has a moment, could you do a quick code review? Barring
any major changes, I'll send a message to the users list to see if
people would be willing to take it for a test drive.

https://github.com/brandoninvergo/biopython/tree/paml-branch/Bio/Phylo/PAML

There is one other problem. As you may recall, I decided to
reimplement the Chi2 program from the PAML package to provide a
convenient means to do likelihood ratio testing without having to load
another library (scipy, rpy2). The original was written in C but had
limited command-line options so I couldn't just write an interface to
it. Re-writing the code in Python seemed to work fine, as far as
getting the correct results/output.
However, I later found that doing tests with large degrees of freedom
(one codeml model comparison requires 41 df) takes an exorbitant
amount of time compared to the C code. So, I see three options: dig
into the code to try to find ways to optimize it, look into something
like Weave for compiling the C code into a Python module, or just
remove Chi2 for now and wait for the PAML author to release a version
that takes command line arguments (which he claims is coming in the
next version). Any thoughts on this matter?

Cheers,
Brandon

On Mon, Feb 28, 2011 at 5:35 PM, Brad Chapman wrote:
> Brandon;
>
> [pypaml branch: https://github.com/brandoninvergo/biopython/tree/paml-branch]
>
> [base class]
>> This is a really good idea and I'm a bit disappointed that I didn't
>> see it myself! Indeed, most of the functionality is just copied/pasted
>> between the classes, with only some variation in the
>> read/write_ctl_file functions for codeml and baseml. So, writing a
>> base class would really simplify things. I do have one question,
>> though, since this is my first time organizing my code in a
>> large-scale Python project. Where would be the best place to implement
>> this base paml class? In __init__.py or in its own paml.py file? I
>> know the end result would be the same but I figure I should start
>> learning some of these best practices.
>
> It's always easier to get perspective on code when you haven't been
> directly in the middle of it. Even if you don't have someone to do
> code reviews, stepping away from a project and coming back later
> will often lead to a bunch of insights.
>
> For the base class, I would follow Eric and Peter's example and use
> files in the same directory with an underscore: something like _shared.py
> or _base.py.
>
> [read functions]
>> This mess is precisely why I had to include so many different
>> output files for the unittesting (codeml is the main culprit; baseml
>> is moderately bad; yn00 isn't a problem)
>
> I definitely feel your pain on this. This is exactly why your work
> doing this is appreciated; you'll save someone a lot of headache
> later on.
>
>> So, because I would potentially end up scanning almost the entire file
>> just to figure out what's going on, I think just parsing-as-you-go,
>> using elif statements to short-circuit and skip further evaluations of
>> a line after a match has been found, would be the better option.
>> Perhaps the files aren't long enough to be able to make an appeal for
>> computational efficiency but at the same time, I hesitate to read
>> through the file multiple times unnecessarily. I agree, though, that
>> this makes the read() function quite long. For that, though, I tried
>> to provide descriptive comments before each parsing case, describing
>> exactly what the next block of code is meant to parse and also
>> including a specific example line which should be parsed by it.
>
> The issue really is that deeply nested code is hard to read,
> long functions are hard to read, and when you combine them together
> it just makes it very difficult for others to follow your logic.
>
> I don't think you necessarily have to make multiple passes to parse it
> in a more structured way, but what you would want to focus on is making
> the flow through the function simpler. The way I would normally attack
> this is to break components into smaller more re-usable functions.
> Here's a concrete example from the start of the codeml parser:
>
> https://github.com/brandoninvergo/biopython/blob/paml-branch/Bio/Phylo/PAML/codeml.py
>
> siteclass_re = re.match("Site-class models:\s*(.*)", line)
> if siteclass_re is not None:
>     siteclass_model = siteclass_re.group(1)
>     if siteclass_model == "":
>         multi_models = True
>         continue
>     results["site-class model"] = siteclass_model
>     if siteclass_model == "NearlyNeutral":
>         current_model = 1
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "PositiveSelection":
>         current_model = 2
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "discrete (4 categories)":
>         current_model = 3
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "beta (4 categories)":
>         current_model = 7
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "beta&w>1 (5 categories)":
>         current_model = 8
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>
> You could refactor this something along the lines of:
>
> class _CodemlParser:
>     def __init__(self):
>         # "NSsites" must exist before models are added to it
>         self.results = {"NSsites": {}}
>         self.flags = dict(multi_models = False)
>
>     def read(self, results_handle):
>         for line in results_handle:
>             siteclass_re = re.match("Site-class models:\s*(.*)", line)
>             if siteclass_re is not None:
>                 self._siteclass_parse(siteclass_re)
>
>     def _add_siteclass_model(self, siteclass_model):
>         self.results["site-class model"] = siteclass_model
>         name_to_num = {"NearlyNeutral": 1,
>                        "PositiveSelection": 2,
>                        "discrete (4 categories)": 3,
>                        "beta (4 categories)": 7,
>                        "beta&w>1 (5 categories)": 8}
>         current_model = name_to_num[siteclass_model]
>         self.results["NSsites"][current_model] = {"description":siteclass_model}
>         if 0 in self.results["NSsites"]:
>             del self.results["NSsites"][0]
>
>     def _siteclass_parse(self, siteclass_re):
>         siteclass_model = siteclass_re.group(1)
>         if siteclass_model == "":
>             self.flags["multi_models"] = True
>         else:
>             self._add_siteclass_model(siteclass_model)
>
> You are not changing the parsing strategy, but now you've got
> individual functions handling each of the steps so it's clear that
> the _siteclass_parse either sets multi_models or adds details about
> the single model. Then you can dig into the _add_siteclass_model
> function to see what it is doing. To the reader, each individual
> unit can be read and understood separately.
>
> This type of refactoring work is useful generally. I have to do it all
> the time in my work and discover new tricks and approaches. Hope this
> is helpful and thanks again for all the work on this,
> Brad
>

From p.j.a.cock at googlemail.com Fri Jun 10 06:58:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Jun 2011 11:58:37 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
Message-ID:

On Fri, Jun 10, 2011 at 11:53 AM, Brandon Invergo wrote:
>
> There is one other problem. As you may recall, I decided to
> reimplement the Chi2 program from the PAML package to provide a
> convenient means to do likelihood ratio testing without having to load
> another library (scipy, rpy2). The original was written in C but had
> limited command-line options so I couldn't just write an interface to
> it. Re-writing the code in Python seemed to work fine, as far as
> getting the correct results/output. However, I later found that doing
> tests with large degrees of freedom (one codeml model comparison
> requires 41 df) takes an exorbitant amount of time compared to the C
> code. So, I see three options: dig into the code to try to find ways
> to optimize it, look into something like Weave for compiling the C
> code into a Python module, or just remove Chi2 for now and wait for
> him to release a version that takes command line arguments (which he
> claims is coming in the next version). Any thoughts on this matter?

Adding C code has drawbacks as it has to compile on multiple
platforms, and cannot be used in Jython (and likely not in IronPython
or PyPy - I've not kept up to date with their progress). Also, it doesn't
seem a good use of time to reimplement something the next version
of PAML will include.

I would wait for the next version of Chi2.

Peter

From p.j.a.cock at googlemail.com Fri Jun 10 07:17:11 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Jun 2011 12:17:11 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu>

Message-ID:

On Fri, Jun 10, 2011 at 12:09 PM, Brandon Invergo wrote:
> On Fri, Jun 10, 2011 at 12:58 PM, Peter Cock wrote:
>> Adding C code has drawbacks as it has to compile on multiple
>> platforms, and cannot be used in Jython (and likely not in IronPython
>> or PyPy - I've not kept up to date with their progress). Also, it doesn't
>> seem a good use of time to reimplement something the next version
>> of PAML will include.
>>
>> I would wait for the next version of Chi2.
>
> Ok that makes sense. I'll remove chi2.py from the module (though if
> people still want it, it'll still be over at my pypaml.googlecode.com
> page). I'll keep my eye on his page for new versions.

OK

> I also just now decided that I'm going to change all of the default
> control file option values. At the moment, they're set to the values
> found in example files when you download the PAML package. However
> these don't necessarily reflect the most common values. I just see it
> being a source of confusion and/or problems in the future (ie people
> not realizing that an option is set to a value that will negatively
> affect their results). Better to set all the values to None and let
> the user decide what are the best values.

That sounds best.

Peter

From eric.talevich at gmail.com Fri Jun 10 11:59:44 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 10 Jun 2011 11:59:44 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
Message-ID:

On Fri, Jun 10, 2011 at 6:53 AM, Brandon Invergo wrote:

> There is one other problem. As you may recall, I decided to
> reimplement the Chi2 program from the PAML package to provide a
> convenient means to do likelihood ratio testing without having to load
> another library (scipy, rpy2). The original was written in C but had
> limited command-line options so I couldn't just write an interface to
> it. Re-writing the code in Python seemed to work fine, as far as
> getting the correct results/output. However, I later found that doing
> tests with large degrees of freedom (one codeml model comparison
> requires 41 df) takes an exorbitant amount of time compared to the C
> code. So, I see three options: dig into the code to try to find ways
> to optimize it, look into something like Weave for compiling the C
> code into a Python module, or just remove Chi2 for now and wait for
> him to release a version that takes command line arguments (which he
> claims is coming in the next version). Any thoughts on this matter?
>

If you've already ported the code to pure Python or Python+Numpy/Scipy,
do you think it would make sense to provide this function under
Bio.Phylo._utils instead of in your PAML module? Then users would be
able to do a likelihood ratio test on trees without having the PAML
binaries installed.

The pure-Python version would still be handy for smaller degrees of
freedom, and if someone happens to be using PyPy it would probably be
wicked fast. The best solution is probably Numpy, rather than Scipy,
since other parts of Biopython already use Numpy as an optional
dependency.

(Right now, Bio.Phylo runs on Python 3, Jython, and PyPy, so adding and
supporting a hand-written C extension on all of these platforms is
probably not worth the trouble.)
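
For readers following the thread, here is a rough, pure-Python sketch of
the kind of likelihood ratio test being discussed. It is not the PAML
Chi2 code and not an existing Biopython function: the names chi2_sf and
lrt_pvalue are made up for illustration, and the series expansion used
is just one standard way to get the chi-square tail probability. It
assumes math.lgamma is available (Python 2.7 or later) and is only meant
for moderate test statistics.

```python
import math


def chi2_sf(x, df):
    """Upper tail P(X >= x) for a chi-square variable with df degrees of freedom.

    Hypothetical helper for illustration; not PAML's Chi2 or a Biopython API.
    """
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a = df / 2.0
    z = x / 2.0
    # Series for the regularized lower incomplete gamma function:
    #   P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a * (a+1) * ... * (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    log_p = a * math.log(z) - z - math.lgamma(a) + math.log(total)
    # Computing 1 - P loses precision for very small p-values and the sum
    # can overflow for very large statistics; acceptable for a sketch.
    return min(1.0, max(0.0, 1.0 - math.exp(log_p)))


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Likelihood ratio test: compare 2 * (lnL_alt - lnL_null) to chi-square(df)."""
    statistic = 2.0 * (lnl_alt - lnl_null)
    return chi2_sf(statistic, df)
```

A call such as lrt_pvalue(-100.0, -98.0, 2) gives the p-value for a test
statistic of 4 with 2 degrees of freedom. The loop length grows with the
size of the statistic, so this stays cheap at 41 df, though it says
nothing about why the ported Chi2 code was slow.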

Thanks,
Eric

From b.invergo at gmail.com Fri Jun 10 12:11:19 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 10 Jun 2011 18:11:19 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
Message-ID:

On Fri, Jun 10, 2011 at 5:59 PM, Eric Talevich wrote:
> If you've already ported the code to pure Python or Python+Numpy/Scipy, do
> you think it would make sense to provide this function under
> Bio.Phylo._utils instead of in your PAML module? Then users would be able to
> do a likelihood ratio test on trees without having the PAML binaries
> installed.

That would be fine by me. I guess there's not much sense in getting
rid of it entirely since it's already been written.

> The pure-Python version would still be handy for smaller degrees of freedom,
> and if someone happens to be using PyPy it would probably be wicked fast.
> The best solution is probably Numpy, rather than Scipy, since other parts of
> Biopython already use Numpy as an optional dependency.

I don't think Numpy has a Chi^2 cumulative distribution function; you
can only draw random numbers from the distribution. Scipy has that,
but as a user, I would be annoyed to have to install a huge package
just to perform likelihood ratio tests. I like the idea of having it
built into biopython, since I think it's a fairly common procedure
given all the maximum likelihood techniques in biology. That's just my
2 cents though...

Cheers,
-Brandon

From chapmanb at 50mail.com Sat Jun 11 11:59:00 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sat, 11 Jun 2011 11:59:00 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To:
References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
Message-ID: <20110611155900.GB2831@kunkel>

Brandon;

> It's been quite a while since I've updated you with my PAML progress.
> My side projects had to take a back seat to my PhD research for a
> while, so I couldn't work on it. Anyway, I finally got back to it and
> implemented some much-needed restructuring as suggested.

Thanks very much for taking this on. The restructuring looks fantastic.

> I've taken the suggestion to split the parsing task into
> several functions so I hope it's all a bit more readable now. I
> certainly think it is; I was hesitant at first but now that it's done
> I see how much better it is.

Really glad the comments were helpful. It really is the hardest thing
in programming to mess with a bunch of working code for the sake of
trying to refactor it, and you've done excellent work.

I only have one more small suggestion. A number of the functions
take a results dictionary and then modify it directly, taking
advantage of the fact that it's the same object. For instance,
'parse_parameters' in _parse_baseml.py looks like:

    results["parameters"] = {}
    parse_parameter_list(lines, results, num_params)
    parse_kappas(lines, results)
    parse_rates(lines, results)
    parse_freqs(lines, results)

A nice way to do this is to pass in and return the modified
dictionary, so it is clear what is happening in the function.
Ideally, this would look like:

    parameters = {}
    parameters = parse_parameter_list(lines, parameters, num_params)
    parameters = parse_kappas(lines, parameters)
    parameters = parse_rates(lines, parameters)
    parameters = parse_freqs(lines, parameters)
    results["parameters"] = parameters

For someone reading the code this makes it more explicit that each
of those functions modifies the 'parameters' dictionary. Otherwise
the side effects that change the results or parameters dictionary
could be missed.

For the Chi2 question, I'm 100% agreed with Peter and Eric.
The pure python version could be useful, but no sense re-writing a C
version if an external one exists in Scipy. PyCogent also has some
functionality here as well:

http://pycogent.sourceforge.net/cookbook/standard_statistical_analyses.html#chi-square

Thanks again for all your work,
Brad

From b.invergo at gmail.com Sun Jun 12 08:28:31 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Sun, 12 Jun 2011 14:28:31 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <20110611155900.GB2831@kunkel>
References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
<20110611155900.GB2831@kunkel>
Message-ID:

> I only have one more small suggestion. A number of the functions
> take a results dictionary and then modify it directly, taking
> advantage of the fact that it's the same object. For instance,
> 'parse_parameters' in _parse_baseml.py looks like:
>
> results["parameters"] = {}
> parse_parameter_list(lines, results, num_params)
> parse_kappas(lines, results)
> parse_rates(lines, results)
> parse_freqs(lines, results)
>
> A nice way to do this is to pass in and return the modified
> dictionary, so it is clear what is happening in the function.
> Ideally, this would look like:
>
> parameters = {}
> parameters = parse_parameter_list(lines, parameters, num_params)
> parameters = parse_kappas(lines, parameters)
> parameters = parse_rates(lines, parameters)
> parameters = parse_freqs(lines, parameters)
> results["parameters"] = parameters
>
> For someone reading the code this makes it more explicit that each
> of those functions modifies the 'parameters' dictionary. Otherwise
> the side effects that change the results or parameters dictionary
> could be missed.

Done!
Funnily enough, it was originally that way but then I remembered that
Python passes arguments by reference, so I changed it to not return
the dict every time. I thought I was being clever and Pythonic but I
agree that this way is more readable/maintainable.

> For the Chi2 question, I'm 100% agreed with Peter and Eric. The pure
> python version could be useful, but no sense re-writing a C version
> if an external one exists in Scipy. PyCogent also has some
> functionality here as well:
>
> http://pycogent.sourceforge.net/cookbook/standard_statistical_analyses.html#chi-square

Ok, the pure Python version is back in. Once PAML is officially part
of Biopython, I can write some documentation for the wiki and provide
a warning about the high df values...
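
The pass-and-return point above can be illustrated with a small sketch.
The function names and the kappa value below are invented for the
example and are not taken from _parse_baseml.py. It shows that a dict
passed to a function is the same object the caller holds (so in-place
changes are visible), that returning it changes nothing functionally but
makes the data flow explicit, and that rebinding the local name inside a
function does not affect the caller.

```python
def add_kappa_in_place(results):
    """Mutates the caller's dict as a side effect and returns nothing."""
    results["kappa"] = 2.0  # made-up value for illustration


def add_kappa(parameters):
    """Same mutation, but returns the dict so the call site shows the data flow."""
    parameters["kappa"] = 2.0
    return parameters


def rebind_only(parameters):
    """Rebinding the local name creates a new dict; the caller's dict is untouched."""
    parameters = {"kappa": 99.0}


implicit = {}
add_kappa_in_place(implicit)      # easy to overlook that 'implicit' changed

explicit = {}
explicit = add_kappa(explicit)    # the assignment signals the update

untouched = {}
rebind_only(untouched)            # still empty afterwards
```

Both styles end with the same dictionary contents; the difference is
only in how obvious the modification is to someone reading the calling
code.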

Cheers,
Brandon

From chapmanb at 50mail.com Tue Jun 14 08:17:54 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 14 Jun 2011 08:17:54 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To:
References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
<20110611155900.GB2831@kunkel>
Message-ID: <20110614121754.GF2552@kunkel>

Brandon;

[pass and return parameters]
> Done!
> Funnily enough, it was originally that way but then I remembered that
> Python passes arguments by reference, so I changed it to not return
> the dict every time. I thought I was being clever and Pythonic but I
> agree that this way is more readable/maintainable.

That looks fabulous; thanks much. We can invoke the 'Explicit is
better than implicit' clause of the Zen of Python for this one.

> Ok, the pure Python version is back in. Once PAML is officially part
> of Biopython, I can write some documentation for the wiki and provide
> a warning about the high df values...

Great, thanks for this as well. From my point of view, the only
thing you need is some documentation to finish it off. It would
definitely be worthwhile to send it to the main list to see if
others have feedback. I'm happy to work on merging it in if everyone
else is agreed.

Thanks again,
Brad

From b.invergo at gmail.com Tue Jun 14 09:23:12 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Tue, 14 Jun 2011 15:23:12 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <20110614121754.GF2552@kunkel>
References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu>
<20110228163521.GF9652@sobchak.mgh.harvard.edu>
<20110611155900.GB2831@kunkel>
<20110614121754.GF2552@kunkel>
Message-ID:

> Great, thanks for this as well. From my point of view, the only
> thing you need is some documentation to finish it off. It would
> definitely be worthwhile to send it to the main list to see if
> others have feedback. I'm happy to work on merging it in if everyone
> else is agreed.

Ok I've just sent the email to the main list.

I can write up some documentation this week. What is the official
procedure for adding documentation to the wiki, if any? Or can I just
create an account and start writing?

Also, just to double-check, are my docstrings all sufficient or should
I expand those?

Lastly, I've been having trouble trying to merge the upstream repo
with my master branch using

$ git pull upstream master

(I have set the upstream to the biopython repo as described in the wiki)

The connection routinely times out. This is on my lab computer,
though, which is behind a proxy that always causes troubles. I'll try
again from my home computer this evening. Just thought I'd mention it
for now...

Cheers,
Brandon

From chapmanb at 50mail.com Wed Jun 15 07:54:25 2011
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 15 Jun 2011 07:54:25 -0400
Subject: [Biopython-dev] pypaml
In-Reply-To:
References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> Message-ID: <20110615115425.GB22528@sobchak> Brandon; > Ok I've just sent the email to the main list. Awesome, thanks for this. Hope this convinces some other folks to take a look. > I can write up some documentation this week. What is the official > procedure for adding documentation to the wiki, if any? Or can I just > create an account and start writing? Create an account and start writing. Nothing official except that documentation is good. > Also, just to double-check, are my docstrings all sufficient or should > I expand those? Your code comments looked great to me. The end user documentation seems to be the main thing at this point: describing how someone can pick up and get started with the code. Thanks again for all the work, Brad From chapmanb at 50mail.com Wed Jun 15 09:03:54 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 15 Jun 2011 09:03:54 -0400 Subject: [Biopython-dev] [Fwd: Bug in Biopython Phyloxml reader] Message-ID: <20110615130354.GD22528@sobchak> Nicolas; Thanks for reporting the problem. The Biopython mailing lists (http://biopython.org/wiki/Mailing_lists) are the right place to report these types of issues. Hopefully Eric or someone else on the list will be able to help. Thanks again, Brad ----- Forwarded message from Nicolas Rochette ----- Date: Wed, 15 Jun 2011 14:25:09 +0200 Sir, I apologize for contacting you directly, but I could not find the right place for this report. Could you please forward it ? The bug is about 0-length terminal branches being given a length of 1 ; please find an example below. 
Regards, Nicolas Rochette PhD student Laboratory for Biometry and Evolutive Biology Lyon, France // echo '(A:2,(B:0,C:3):0,D:5);' > foo.newick java -cp forester.jar org.forester.application.phyloxml_converter -f=nn -i foo.newick foo.phyloxml python -c 'from Bio import Phylo; Phylo.write(Phylo.read("foo.phyloxml","phyloxml"), "bar.newick", "newick")' cat bar.newick (A:2.00000,(B:1.00000,C:3.00000)0.00000:0.00000,D:5.00000)0.00:0.00000; ----- End forwarded message ----- From eric.talevich at gmail.com Wed Jun 15 10:20:41 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 15 Jun 2011 10:20:41 -0400 Subject: [Biopython-dev] [Fwd: Bug in Biopython Phyloxml reader] In-Reply-To: <20110615130354.GD22528@sobchak> References: <20110615130354.GD22528@sobchak> Message-ID: Hi Nicolas and Brad, Thanks for reporting and forwarding this. The 0 -> 1 terminal branch lengths are another surprising default that crept in during the port of NewickIO; I don't see a problem with changing it to keep 0-length branches at the tips. The phyloXML parser and writer shouldn't be introducing this bug; it should just be occurring when writing in Newick or Nexus formats. I'll keep you posted on the fix, probably this weekend. If you'd like to try it yourself, the code to edit is in Bio/Phylo/NewickIO.py. Best, Eric On Wed, Jun 15, 2011 at 9:03 AM, Brad Chapman wrote: > Nicolas; > Thanks for reporting the problem. The Biopython mailing lists > (http://biopython.org/wiki/Mailing_lists) are the right place to > report these types of issues. > > Hopefully Eric or someone else on the list will be able to help. > Thanks again, > Brad > > ----- Forwarded message from Nicolas Rochette < > nicolas.rochette at univ-lyon1.fr> ----- > > Date: Wed, 15 Jun 2011 14:25:09 +0200 > > Sir, > > I apologize for contacting you directly, but I could not find the > right place for this report. Could you please forward it ? 
> > The bug is about 0-length terminal branches being given a length of 1 > ; please find an example below. > > Regards, > > Nicolas Rochette > PhD student > Laboratory for Biometry and Evolutive Biology > Lyon, France > > // > > echo '(A:2,(B:0,C:3):0,D:5);' > foo.newick > java -cp forester.jar org.forester.application.phyloxml_converter > -f=nn -i foo.newick foo.phyloxml > python -c 'from Bio import Phylo; > Phylo.write(Phylo.read("foo.phyloxml","phyloxml"), "bar.newick", > "newick")' > cat bar.newick > > (A:2.00000,(B:1.00000,C:3.00000)0.00000:0.00000,D:5.00000)0.00:0.00000; > > > ----- End forwarded message ----- > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From eric.talevich at gmail.com Wed Jun 15 22:29:18 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 15 Jun 2011 22:29:18 -0400 Subject: [Biopython-dev] [Fwd: Bug in Biopython Phyloxml reader] In-Reply-To: References: <20110615130354.GD22528@sobchak> Message-ID: Folks, I pushed a very small fix: https://github.com/biopython/biopython/commit/db9d876ba2199efcce067241f5554ef701cb70e3 It appears I misunderstood the Nexus.Trees code while I was porting NewickIO, so there was no good reason for this behavior to be the default. Anyway, it behaves as expected now. Nicolas: Bio.Phylo includes a function called 'convert' which you may also find useful. >>> Phylo.convert('foo.xml', 'phyloxml', 'bar.nwk', 'newick') Cheers, Eric On Wed, Jun 15, 2011 at 10:20 AM, Eric Talevich wrote: > Hi Nicolas and Brad, > > Thanks for reporting and forwarding this. The 0 -> 1 terminal branch > lengths are another surprising default that crept in during the port of > NewickIO; I don't see a problem with changing it to keep 0-length branches > at the tips. The phyloXML parser and writer shouldn't be introducing this > bug; it should just be occurring when writing in Newick or Nexus formats. 
> > I'll keep you posted on the fix, probably this weekend. If you'd like to > try it yourself, the code to edit is in Bio/Phylo/NewickIO.py. > > Best, > Eric > > > On Wed, Jun 15, 2011 at 9:03 AM, Brad Chapman wrote: > >> Nicolas; >> Thanks for reporting the problem. The Biopython mailing lists >> (http://biopython.org/wiki/Mailing_lists) are the right place to >> report these types of issues. >> >> Hopefully Eric or someone else on the list will be able to help. >> Thanks again, >> Brad >> >> ----- Forwarded message from Nicolas Rochette < >> nicolas.rochette at univ-lyon1.fr> ----- >> >> Date: Wed, 15 Jun 2011 14:25:09 +0200 >> >> Sir, >> >> I apologize for contacting you directly, but I could not find the >> right place for this report. Could you please forward it ? >> >> The bug is about 0-length terminal branches being given a length of 1 >> ; please find an example below. >> >> Regards, >> >> Nicolas Rochette >> PhD student >> Laboratory for Biometry and Evolutive Biology >> Lyon, France >> >> // >> >> echo '(A:2,(B:0,C:3):0,D:5);' > foo.newick >> java -cp forester.jar org.forester.application.phyloxml_converter >> -f=nn -i foo.newick foo.phyloxml >> python -c 'from Bio import Phylo; >> Phylo.write(Phylo.read("foo.phyloxml","phyloxml"), "bar.newick", >> "newick")' >> cat bar.newick >> >> (A:2.00000,(B:1.00000,C:3.00000)0.00000:0.00000,D:5.00000)0.00:0.00000; >> >> >> ----- End forwarded message ----- >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > From updates at feedmyinbox.com Thu Jun 16 07:12:51 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 16 Jun 2011 07:12:51 -0400 Subject: [Biopython-dev] 6/16 biopython Questions - BioStar Message-ID: <5a4c5e8b0ca0dec3061963e9b2bd7aa2@74.63.51.88> // Reduce BLAST XML size? 
// June 15, 2011 at 10:59 PM http://biostar.stackexchange.com/questions/9246/reduce-blast-xml-size Hi, I have a really large BLAST XML file - something like 30gb in size. I'd like to reduce it so I can run through it quicker with Biopython. Is there a way to reduce the file by keeping something like the top 25 hits based on bitscore for each query. Preferably I'd like to do with Python/Biopython. Thanks // running tests/tools in MTAP platform // June 14, 2011 at 12:46 PM http://biostar.stackexchange.com/questions/9155/running-tests-tools-in-mtap-platform My goal is to run different tools and compare the results in MTAP. Tools include SLIMFINDER, D-STAR, MEME, DILIMOT - input known linear motif sequences and get the results. How do i run these tests in MTAP? What files do i need to edit? What are the command line options that will do this? // What is biopython and bioeclipse used for? // June 4, 2011 at 12:10 PM http://biostar.stackexchange.com/questions/8844/what-is-biopython-and-bioeclipse-used-for Just would like to know why do things like biopython and bioeclipse exist and in which bioinformatics context are they used -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Thu Jun 16 07:12:51 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 16 Jun 2011 07:12:51 -0400 Subject: [Biopython-dev] 6/16 newest questions tagged biopython - Stack Overflow Message-ID: <1e867a9860819ff5bf648e82dcf1d07b@74.63.51.88> // Convert GenBank Flatfiles to FASTA // June 13, 2011 at 5:55 PM http://stackoverflow.com/questions/6336853/convert-genbank-flatfiles-to-fasta I need to parse a preliminary GenBank Flatfile. The sequence hasn't been published yet, so I can't look it up by accession and download a FASTA file. I'm new to Bioinformatics, so could someone show me where I could find a BioPerl or BioPython script to do this myself? Thanks! // have trouble in installing biopython package // June 7, 2011 at 3:44 PM http://stackoverflow.com/questions/6270730/have-trouble-in-installing-biopython-package im using mac 10.6.7, and xcode 4 with gcc 4.2 installed. but when i was installing biopython with: python setup.py install on the command, it gives out error on gcc: 10-54-41-155-wireless1x:biopython-1.57 xueran2010$ python setup.py install running install running build running build_py running build_ext building 'Bio.cpairwise2' extension gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -IBio -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c Bio/cpairwise2module.c -o build/temp.macosx-10.6-universal-2.6/Bio/cpairwise2module.o /usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed Installed assemblers are: /usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64 /usr/bin/../libexec/gcc/darwin/i386/as for architecture i386 Bio/cpairwise2module.c:639: fatal error: error writing to -: Broken pipe compilation 
terminated. lipo: can't open input file: /var/folders/ir/ir6RCJTKGB4QU5sVdTXwt++++TI/-Tmp-//cccUvTiF.out (No such file or directory) error: command 'gcc-4.2' failed with exit status 1 -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From b.invergo at gmail.com Thu Jun 16 11:34:00 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 16 Jun 2011 17:34:00 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: <20110615115425.GB22528@sobchak> References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: Ok, the documentation is finished: http://biopython.org/wiki/PAML Cheers, Brandon On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman wrote: > Brandon; > >> Ok I've just sent the email to the main list. > > Awesome, thanks for this. Hope this convinces some other folks to > take a look. > >> I can write up some documentation this week. What is the official >> procedure for adding documentation to the wiki, if any? Or can I just >> create an account and start writing? > > Create an account and start writing. Nothing official except that > documentation is good. > >> Also, just to double-check, are my docstrings all sufficient or should >> I expand those? > > Your code comments looked great to me. The end user documentation > seems to be the main thing at this point: describing how someone can > pick up and get started with the code. > > Thanks again for all the work, > Brad > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Thu Jun 16 14:44:48 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 16 Jun 2011 19:44:48 +0100 Subject: [Biopython-dev] float('inf') or float('-inf') In-Reply-To: References: Message-ID: On Sun, Jun 5, 2011 at 10:51 PM, Peter Cock wrote: > Hi all, > > As explained in PEP 754, prior to Python 2.6 float('inf'), float('-inf') > and also float('nan') were passed to the underlying C library, which > may or may not return the IEEE special floating point value for > infinity, minus infinity or nan. 
See: > http://www.python.org/dev/peps/pep-0754/ > > This is the root cause of this unit test failure on Windows Python 2.5, > http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/224/steps/shell/logs/stdio > ... > > Based on the PEP 754 example, 1E400 may work instead: > > try: > neg_inf = float("-inf") > except ValueError: > neg_inf = -1E400 > > I'll try to test this on the machine later in the week when I get > a chance. If anyone else has Python 2.4 or 2.5 on Windows > and wants to look at this now, please go ahead. Committed now, after testing locally: https://github.com/biopython/biopython/commit/60f28efdfe19fdcb2bb3ca4dc2ca6a222be97394 Peter From p.j.a.cock at googlemail.com Thu Jun 16 15:15:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 16 Jun 2011 20:15:30 +0100 Subject: [Biopython-dev] Failure in test_Phylo; `format_branchlength` parameter to NewickIO writers. Message-ID: Hi all, Looking at the Python 2.5 on Windows buildbot, there is a recent failure: http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.5 ====================================================================== FAIL: Custom format string for Newick branch length serialization. 
---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBot\win25\build\Tests\test_Phylo.py", line 52, in test_format_branch_length self.assertEqual(mem_file.getvalue().strip(), 'A:1e-01;') AssertionError: 'A:1e-001;' != 'A:1e-01;' ---------------------------------------------------------------------- Last known good build was 2c30a0cb7e32fe4a4e2e5bc5573778ba07fa1b83 http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.5/builds/223 Bug present in 3f911d5d0683437f26d6c40d109506f8bdcd1298 http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.5/builds/224 Also in latest, 60f28efdfe19fdcb2bb3ca4dc2ca6a222be97394 http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.5/builds/225 It looks like something from the changes on 2011-05-26 is to blame, and a platform specific floating point assumption in the test. Peter From eric.talevich at gmail.com Fri Jun 17 00:18:03 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 17 Jun 2011 00:18:03 -0400 Subject: [Biopython-dev] Failure in test_Phylo; `format_branchlength` parameter to NewickIO writers. In-Reply-To: References: Message-ID: On Thu, Jun 16, 2011 at 3:15 PM, Peter Cock wrote: > Hi all, > > Looking at the Python 2.5 on Windows buildbot, there is a recent failure: > http://testing.open-bio.org:8010/builders/Windows%20XP%20-%20Python%202.5 > > ====================================================================== > FAIL: Custom format string for Newick branch length serialization. > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "c:\repositories\BuildBot\win25\build\Tests\test_Phylo.py", > line 52, in test_format_branch_length > self.assertEqual(mem_file.getvalue().strip(), 'A:1e-01;') > AssertionError: 'A:1e-001;' != 'A:1e-01;' > > ---------------------------------------------------------------------- > > Urgh. 
This is only occurring on Windows in Py2.5, not on 2.6? I suppose these platform-specific differences have been ironed out over time in Python. To make the test work on Win+Py2.5, which of these do you prefer? (a) change the test to assert_(memfile.getvalue().lstrip().startswith('A:1e-')) (b) Test for the Python version, and if less than 2.6, do the same test as (a), but also include an angry note about Windows compatibility in earlier Python versions. Thanks for spotting this, Eric From p.j.a.cock at googlemail.com Fri Jun 17 01:22:02 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Jun 2011 06:22:02 +0100 Subject: [Biopython-dev] Failure in test_Phylo; `format_branchlength` parameter to NewickIO writers. In-Reply-To: References: Message-ID: On Fri, Jun 17, 2011 at 5:18 AM, Eric Talevich wrote: > On Thu, Jun 16, 2011 at 3:15 PM, Peter Cock wrote: >> >> FAIL: Custom format string for Newick branch length serialization. >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "c:\repositories\BuildBot\win25\build\Tests\test_Phylo.py", >> line 52, in test_format_branch_length >> self.assertEqual(mem_file.getvalue().strip(), 'A:1e-01;') >> AssertionError: 'A:1e-001;' != 'A:1e-01;' >> >> ---------------------------------------------------------------------- >> > > Urgh. This is only occurring on Windows in Py2.5, not on 2.6? I suppose > these platform-specific differences have been ironed out over time in > Python. Indeed. Probably something at the C API level from the OS. > To make the test work on Win+Py2.5, which of these do you prefer? > > (a) change the test to > assert_(memfile.getvalue().lstrip().startswith('A:1e-')) Well, use the self.assertTrue(...) method rather than assert_? > (b) Test for the Python version, and if less than 2.6, do the same test as > (a), but also include an angry note about Windows compatibility in earlier > Python versions. 
That might be fragile depending on how the Python was compiled - not everyone uses the supplied binaries from Python.org after all. How about using this, with a comment to explain the variation: self.assertTrue(memfile.getvalue().lstrip() in ['A:1e-001;', 'A:1e-01;']) Note we can't use self.assertIn(a, b) since that requires Python 2.7 > > Thanks for spotting this, > That's what the buildbot is for :) Peter From eric.talevich at gmail.com Sat Jun 18 15:07:29 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Jun 2011 15:07:29 -0400 Subject: [Biopython-dev] Failure in test_Phylo; `format_branchlength` parameter to NewickIO writers. In-Reply-To: References: Message-ID: On Fri, Jun 17, 2011 at 1:22 AM, Peter Cock wrote: > On Fri, Jun 17, 2011 at 5:18 AM, Eric Talevich wrote: > > On Thu, Jun 16, 2011 at 3:15 PM, Peter Cock wrote: > >> > >> FAIL: Custom format string for Newick branch length serialization. > >> ---------------------------------------------------------------------- > >> Traceback (most recent call last): > >> File "c:\repositories\BuildBot\win25\build\Tests\test_Phylo.py", > >> line 52, in test_format_branch_length > >> self.assertEqual(mem_file.getvalue().strip(), 'A:1e-01;') > >> AssertionError: 'A:1e-001;' != 'A:1e-01;' > >> > >> ---------------------------------------------------------------------- > >> > [...] > > How about using this, with a comment to explain the variation: > > self.assertTrue(memfile.getvalue().lstrip() in ['A:1e-001;', 'A:1e-01;']) > > Note we can't use self.assertIn(a, b) since that requires Python 2.7 > > Committed: https://github.com/biopython/biopython/commit/eebf86f863f1d5bfc9a09ab53abf101a4cc8dc79 -Eric From eric.talevich at gmail.com Sat Jun 18 15:13:34 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Jun 2011 15:13:34 -0400 Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7) In-Reply-To: References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com> <7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com> Message-ID: On Sun, Jun 5, 2011 at 4:11 PM, Peter Cock wrote: > On Sun, Jun 5, 2011 at 7:53 PM, Eric Talevich wrote: > >> > >> There's an issue with naming -- I called the argument "format_support" > >> because all the other arguments refer to confidence as "support", since > they > >> were all copied from Bio.Nexus. The Bio.Phylo Clade attribute this > affects > >> is called "confidence", though. It's confusing either way. Thoughts & > >> suggestions? > > > > To follow up on that last bit, would anyone be opposed to > > deprecating/renaming the other arguments in NewickIO functions to > > change all references from "support" to "confidence"? The keyword > > arguments are a little esoteric, I think; can we try a 1-release (or > > even 0-release) deprecation cycle here? > > Do you think you could have both arg names supported in 1.58 > (with a deprecation warning if the old names are used)? > > If so, and you post this to the main list and get no objections, > I'm open to adding a deprecation warning in 1.58 (which could > be relatively soon, before BOSC/ISMB would be my guess) and > dropping the old names in 1.59. > > I posted this to the main list about a week ago, and haven't heard anything since then, so I committed the change: https://github.com/biopython/biopython/commit/375fdb70bf0e28f9cbb50bb78fd20480d0ccc5ba The new keyword argument names replace the old names in the argument, and then the old ones reappear at the end of the argument list, with default values "None". If existing scripts use positional arguments without the keyword, it will still work fine without triggering a warning; if they use the old name it triggers a warning. 
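For anyone skimming the thread, the argument-renaming pattern described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the commit: the function `format_clade` is hypothetical, and `format_confidence` is assumed here as the new spelling of the old `format_support` keyword.

```python
import warnings


def format_clade(name, confidence, format_confidence="%1.2f",
                 format_support=None):
    """Hypothetical helper illustrating the rename; not the NewickIO code.

    The new keyword takes the old keyword's position, and the deprecated
    name moves to the end of the signature with a default of None, so
    positional callers are unaffected.
    """
    if format_support is not None:
        # The old keyword was passed explicitly: warn, then honour it.
        warnings.warn(
            "'format_support' is deprecated; use 'format_confidence'",
            DeprecationWarning,
            stacklevel=2,
        )
        format_confidence = format_support
    return name + ":" + (format_confidence % confidence)


# Positional use keeps working silently; the old keyword triggers a warning.
print(format_clade("A", 0.5, "%1.3f"))
print(format_clade("A", 7, format_support="%d"))
```

Putting the deprecated names last means nothing shifts for callers who pass arguments by position, which is why only explicit use of the old keyword produces a warning.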
-Eric From p.j.a.cock at googlemail.com Sat Jun 18 17:16:24 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Jun 2011 22:16:24 +0100 Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7) In-Reply-To: References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com> <7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com> Message-ID: On Sat, Jun 18, 2011 at 8:13 PM, Eric Talevich wrote: > > I posted this to the main list about a week ago, and haven't heard anything > since then, so I committed the change: > https://github.com/biopython/biopython/commit/375fdb70bf0e28f9cbb50bb78fd20480d0ccc5ba > > The new keyword argument names replace the old names in the argument, and > then the old ones reappear at the end of the argument list, with default > values "None". If existing scripts use positional arguments without the > keyword, it will still work fine without triggering a warning; if they use > the old name it triggers a warning. > > -Eric > Looks good. Peter From redmine at redmine.open-bio.org Sat Jun 18 23:47:24 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 19 Jun 2011 03:47:24 +0000 Subject: [Biopython-dev] [Biopython - Feature #3216] Bio.Phylo.Applications support for PhyML References: Message-ID: Issue #3216 has been updated by Eric Talevich. File _Phyml.py added Assignee changed from Eric Talevich to Biopython Dev Mailing List Priority changed from Low to Normal I took a crack at this today. To test it, put this file into Bio/Phylo/Applications, add a blank __init__.py, and run it like this:

>>> from Bio import Phylo
>>> from Bio.Phylo.Applications._Phyml import PhymlCommandline
>>> cmd = PhymlCommandline(input='Tests/Phylip/random.phy')
>>> cmd()
>>> tree = Phylo.read('Tests/Phylip/random.phy_phyml_tree.txt', 'newick')
>>> Phylo.draw_ascii(tree)
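
A small sketch of locating the tree file that PhyML writes, assuming (as with PhyML 3.0) that the output is named after the input alignment with `_phyml_tree.txt` appended. The helper name `phyml_tree_path` is hypothetical and other PhyML versions may use a different suffix.

```python
def phyml_tree_path(alignment_path, suffix="_phyml_tree.txt"):
    """Return the path where PhyML is assumed to write its tree.

    Assumption: PhyML appends a fixed suffix to the input file name;
    the default matches the name mentioned in this report.
    """
    return alignment_path + suffix


tree_file = phyml_tree_path("Tests/Phylip/random.phy")
print(tree_file)
# The resulting path could then be passed to Phylo.read(tree_file, 'newick').
```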

It prints a lot of junk to stdout. The output tree takes the name _phyml_tree.txt, and I don't see the option for changing the output file name. There are some more comments sprinkled through the source, and I'm still not sure how to handle the '-f' option. ---------------------------------------- Feature #3216: Bio.Phylo.Applications support for PhyML https://redmine.open-bio.org/issues/3216 Author: Eric Talevich Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: Not Applicable URL: http://www.atgc-montpellier.fr/phyml/ PhyML is another popular tool for inferring phylogenies by maximum likelihood, and can also be considered reasonably best-practice (alongside RAxML, which isn't yet packaged). The Debian package for PhyML was recently updated to the latest version (3.0), and should be trickling down to the distros soon. Let's create a wrapper for it in Biopython. The input is just a Phylip alignment, so this should be easier than MrBayes. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Tue Jun 21 11:07:04 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 21 Jun 2011 15:07:04 +0000 Subject: [Biopython-dev] [Biopython - Bug #3170] (Resolved) Integration of external package: pypaml References: Message-ID: Issue #3170 has been updated by Peter Cock. Status changed from New to Resolved % Done changed from 0 to 100 Brad's merged Brandon's code, so this can be closed now I think. 
---------------------------------------- Bug #3170: Integration of external package: pypaml https://redmine.open-bio.org/issues/3170 Author: Brandon Invergo Status: Resolved Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: Not Applicable URL: PAML (Phylogenetic Analysis by Maximum Likelihood; http://abacus.gene.ucl.ac.uk/software/paml.html) is a package of programs written by Ziheng Yang. The programs are used widely, especially CODEML which is used to estimate evolutionary rate parameters for a given sequence alignment. There is currently a PAML library for BioPerl but, to my knowledge, no such wrapper exists for Python. I have independently written a Python interface to the CODEML program of the PAML package, with the intention of eventually covering all of the programs in the package. You can find my code here: http://code.google.com/p/pypaml/ I believe it would be beneficial to integrate my pypaml package into the main Biopython project and to continue its development as such. Before it can be integrated, some immediate tasks must be done: - change the licensing: currently it's GPL, as described in the code and on the project page. Is it sufficient to simply remove its dedicated project page and change the verbiage in the code? - check coding standards as described in the Contributing to Biopython wiki - make some changes to be compatible with Python 2.5: I use @property and @x.setter decorator tags which are only 2.6+. I think that's the only incompatability - double-check the CODEML output parsing for many PAML versions; the output is notoriously non-standard from release to release. I may have to build some version-checking into the parser. I wrote it based on the output of PAML 4.3. 
I propose that compatibility with only 4.X+ be implemented (current version = 4.4c - build some unit tests (I'm new to this in Python so I need to learn a bit about that I've tried from the start to make it very generalized so I don't think any major changes need to be made. Plus, I think structurally it should be easy to implement the other PAML programs by copying a lot of the code. The output parsing for each program is a different story, though. I will implement many of the above changes first in my stand-alone library before merging it with a branch of the Biopython git repository. Because CODEML appears to be the most commonly used program from the package, for the immediate future it will continue to receive most of the focus, but with time the other programs will be implemented. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From updates at feedmyinbox.com Wed Jun 22 07:28:56 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Wed, 22 Jun 2011 07:28:56 -0400 Subject: [Biopython-dev] 6/22 newest questions tagged biopython - Stack Overflow Message-ID: <5bc82b573fb2cf78281fc02161e0cd04@74.63.51.88> // How can I run clustalw using Biopython // June 22, 2011 at 5:30 AM http://stackoverflow.com/questions/6437754/how-can-i-run-clustalw-using-biopython I am doing research in Bioinformatics. I don't know, How to run clustalw using Biopython. please help me // How to get distance between two atoms using for loop? // June 22, 2011 at 5:01 AM http://stackoverflow.com/questions/6437391/how-to-get-distance-between-two-atoms-using-for-loop I have one PDB structure. This structure has 13 residues. I have to find the distance between two atoms(only C,O,N,S) using for loop. First I have to find the distance between first and second residue. after that first and third residue.up to first and 13 th residue and so on. 
How can I write the python script using for loop? -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782465/c6ce4e74edf1048798e4b627c86b1b0b51013840/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From redmine at redmine.open-bio.org Fri Jun 24 20:32:31 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sat, 25 Jun 2011 00:32:31 +0000 Subject: [Biopython-dev] [Biopython - Feature #3258] (New) phastCons score parser Message-ID: Issue #3258 has been reported by Beisi Xu. ---------------------------------------- Feature #3258: phastCons score parser https://redmine.open-bio.org/issues/3258 Author: Beisi Xu Status: New Priority: Normal Assignee: Beisi Xu Category: Main Distribution Target version: 1.57 URL: usage: chr*.phastCons46way.placental.wigFix.gz should be downloaded: mkdir -p /home/user/data/hg19/phastcons/ cd /home/user/data/hg19/phastcons/ for i in `seq 1 21` X Y do wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/placentalMammals/chr${i}.phastCons46way.placental.wigFix.gz done you can download http://hgdownload.cse.ucsc.edu/goldenPath/hg19/phastCons46way/placentalMammals/chr21.phastCons46way.placental.wigFix.gz only for test test_phast.py can be found in the source code file ######### it takes 100 seconds to read a 30M(80k lines) gziped phastCons compressed file. And stored the offset of each record that allow quick search. so it takes very little memory for compressed phastCons file. 
uncompress file format take more memories, optimized that only loading one chrom one time, so it will be effient and lower memory if you are scoring chrom one after one, but it will take more time if you scoring like: chr1 1, chr2 1, chr1 2, chr2 3 result for hg19 46ways : $time python test_phast.py chr21 9411193 0 chr21 9411194 0.053 chr21 9411195 0.044 chr21 9448727 0.009 chr21 9448728 0 chr21 9448729 0 chr21 9448878 0 chr21 9448879 0.002 chr21 9448880 0.004 real 1m47.140s user 1m46.996s sys 0m0.079s ######### http://genome.ucsc.edu/goldenPath/help/phastCons.html phastCons File Format phastCons data files contain the compressed conservation scores that underlie the Conservation annotation track and the phastCons table. For a detailed description of the algorithm used to produce the scores, see the Genome Browser description page associated with the Conservation track. File Format (assemblies released Nov. 2004 and later) When uncompressed, the file contains a declaration line and one column of data in wiggle table fixed-step format: fixedStep chrom=scaffold_1 start=3462 step=1 0.0978. 0.1588 0.1919 0.1948. 0.1684. 1. Declaration line: The declaration line specifies the starting point of the data in the assembly. It consists of the following fields: * fixedStep -- keyword indicating the wiggle track format used to write the data. In fixed step format, the data is single-column with a fixed interval between values. * chrom -- chromosome or scaffold on which first value is located. * start -- position of first value on chromosome or scaffold specified by chrom. NOTE: Unlike most Genome Browser coordinates, these are one-based. * step -- size of the interval (in bases) between values.. A new declaration line is inserted in the file when the chrom value changes, when a gap is encountered (requiring a new start value), or when the step interval changes. 2. 
Data lines: The first data value below the header shows the score corresponding to the position specified in the header. Subsequent score values step along the assembly in one-base intervals. The score shows the posterior probability that phastCons's phylogenetic hidden Markov model (HMM) is in its most-conserved state at that base position. File Format (assemblies prior to Nov. 2004) When uncompressed, the data file contains two columns: 294 0.0953 295 0.0948 296 0.0943 297 0.0936 298 0.0929 299 0.0921 Column #1 contains a one-based position coordinate. Column #2 contains a score showing the posterior probability that phastCons's phylogenetic hidden Markov model (HMM) is in its most conserved state at that base position. References for phastCons Siepel A and Haussler D (2005). Phylogenetic hidden Markov models. In R. Nielsen, ed., Statistical Methods in Molecular Evolution, pp. 325-351, Springer, New York. Siepel, A., Bejerano, G., Pedersen, J.S., Hinrichs, A., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L.W., Richards, S., Weinstock, G.M., Wilson, R. K., Gibbs, R.A., Kent, W.J., Miller, W., and Haussler, D. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034-1050 (2005). For a discussion of the methods used to calculate the phastCons scores, see the description page for the hg17 Conservation track in the Genome Browser ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Sun Jun 26 13:21:04 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 26 Jun 2011 17:21:04 +0000 Subject: [Biopython-dev] [Biopython - Feature #3216] (Closed) Bio.Phylo.Applications support for PhyML References: Message-ID: Issue #3216 has been updated by Eric Talevich. Status changed from New to Closed % Done changed from 0 to 100 Committed: https://github.com/biopython/biopython/commit/b22304a40df4abfdf177af5e7d8fd4254d44833f I also added a smidgen of documentation to the Tutorial. ---------------------------------------- Feature #3216: Bio.Phylo.Applications support for PhyML https://redmine.open-bio.org/issues/3216 Author: Eric Talevich Status: Closed Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: Not Applicable URL: http://www.atgc-montpellier.fr/phyml/ PhyML is another popular tool for inferring phylogenies by maximum likelihood, and can be also be considered reasonably best-practice (alongside RAxML, which isn't yet packaged). The Debian package for PhyML was recently updated to the latest version (3.0), and should be trickling down to the distros soon. Let's create a wrapper for it in Biopython. The input is just a Phylip alignment, so this should be easier than MrBayes. -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From eric.talevich at gmail.com Sun Jun 26 13:32:52 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 26 Jun 2011 13:32:52 -0400 Subject: [Biopython-dev] pypaml In-Reply-To: References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: Hi Brandon, I just added a stub for Bio.Phylo.PAML to the main Tutorial: https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8 Do you think you could add some more to that section, maybe pulling a chunk of content from the wiki page you just wrote? If you're not comfortable with LaTeX you can just point me to some text and I'll add it. Thanks, Eric On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo wrote: > Ok, the documentation is finished: > http://biopython.org/wiki/PAML > > Cheers, > Brandon > > On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman wrote: > > Brandon; > > > >> Ok I've just sent the email to the main list. > > > > Awesome, thanks for this. Hope this convinces some other folks to > > take a look. > > > >> I can write up some documentation this week. What is the official > >> procedure for adding documentation to the wiki, if any? Or can I just > >> create an account and start writing? > > > > Create an account and start writing. Nothing official except that > > documentation is good. > > > >> Also, just to double-check, are my docstrings all sufficient or should > >> I expand those? > > > > Your code comments looked great to me. The end user documentation > > seems to be the main thing at this point: describing how someone can > > pick up and get started with the code. 
> > > > Thanks again for all the work, > > Brad > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From b.invergo at gmail.com Mon Jun 27 11:26:50 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Mon, 27 Jun 2011 17:26:50 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: Hi Eric, No problem, I'll start writing something up now. Cheers, -brandon On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich wrote: > Hi Brandon, > > I just added a stub for Bio.Phylo.PAML to the main Tutorial: > https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8 > > Do you think you could add some more to that section, maybe pulling a chunk > of content from the wiki page you just wrote? If you're not comfortable with > LaTeX you can just point me to some text and I'll add it. > > Thanks, > Eric > > On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo > wrote: >> >> Ok, the documentation is finished: >> http://biopython.org/wiki/PAML >> >> Cheers, >> Brandon >> >> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman wrote: >> > Brandon; >> > >> >> Ok I've just sent the email to the main list. >> > >> > Awesome, thanks for this. Hope this convinces some other folks to >> > take a look. >> > >> >> I can write up some documentation this week. What is the official >> >> procedure for adding documentation to the wiki, if any? Or can I just >> >> create an account and start writing? >> > >> > Create an account and start writing. Nothing official except that >> > documentation is good. >> > >> >> Also, just to double-check, are my docstrings all sufficient or should >> >> I expand those? >> > >> > Your code comments looked great to me. The end user documentation >> > seems to be the main thing at this point: describing how someone can >> > pick up and get started with the code. 
>> > >> > Thanks again for all the work, >> > Brad >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From redmine at redmine.open-bio.org Wed Jun 29 10:52:30 2011 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Wed, 29 Jun 2011 14:52:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #3259] (New) PAML: run() methods can't find output file for parsing Message-ID: Issue #3259 has been reported by Brandon Invergo. ---------------------------------------- Bug #3259: PAML: run() methods can't find output file for parsing https://redmine.open-bio.org/issues/3259 Author: Brandon Invergo Status: New Priority: Normal Assignee: Category: Target version: URL: All three PAML programs (baseml, codeml, and yn00) attempt to parse the output file given a path relative to the paml working directory. If this is different then the current (python) working directory, the output file won't be found. The erroneous line is the same in all three files and thus the same fix is needed for all three: baseml.py (line 177), codeml.py (line 187), yn00.py (line 104)

        if parse:
---->       results = read(self._rel_out_file)
        else:
            results = None

change to:

        if parse:
---->       results = read(self.out_file)
        else:
            results = None
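
The failure mode described above can be reproduced outside of PAML. The following is only a minimal sketch, not the Bio.Phylo.PAML code itself: the directory names, the file name and the `demo` function are hypothetical, chosen purely to illustrate why a path stored relative to the working directory stops resolving once the Python process runs from somewhere else, while the full path still resolves.

```python
import os
import tempfile


def demo():
    """Show that a path relative to one directory fails from another."""
    # Hypothetical layout: the program's working directory is not the
    # directory the Python script is started from.
    base = os.path.realpath(tempfile.mkdtemp())
    working_dir = os.path.join(base, "paml_work")
    script_dir = os.path.join(base, "script_dir")
    os.mkdir(working_dir)
    os.mkdir(script_dir)

    # Full path to the output file, analogous to out_file in the report.
    out_file = os.path.join(working_dir, "results.out")
    with open(out_file, "w") as handle:
        handle.write("dummy output\n")

    # Path relative to the working directory, analogous to _rel_out_file.
    rel_out_file = os.path.relpath(out_file, working_dir)

    old_cwd = os.getcwd()
    os.chdir(script_dir)
    try:
        # From script_dir the relative path no longer points at the file.
        found_relative = os.path.exists(rel_out_file)
        found_full = os.path.exists(out_file)
    finally:
        os.chdir(old_cwd)
    return rel_out_file, found_relative, found_full


if __name__ == "__main__":
    print(demo())
```

Parsing from the full path, as in the proposed change, avoids depending on whatever the current directory happens to be at the time of the call.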

---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From b.invergo at gmail.com Wed Jun 29 11:10:30 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 29 Jun 2011 17:10:30 +0200 Subject: [Biopython-dev] [Biopython - Bug #3259] (New) PAML: run() methods can't find output file for parsing In-Reply-To: References: Message-ID: Hi everyone, This was a stupid mistake and I'm not sure how it got past me. I've already addressed it and I'm about to do a pull request. Sorry about that, -brandon On Wed, Jun 29, 2011 at 4:52 PM, wrote: > > Issue #3259 has been reported by Brandon Invergo. > > ---------------------------------------- > Bug #3259: PAML: run() methods can't find output file for parsing > https://redmine.open-bio.org/issues/3259 > > Author: Brandon Invergo > Status: New > Priority: Normal > Assignee: > Category: > Target version: > URL: > > > All three PAML programs (baseml, codeml, and yn00) attempt to parse the output file given a path relative to the paml working directory. If this is different then the current (python) working directory, the output file won't be found. > > The erroneous line is the same in all three files and thus the same fix is needed for all three: > baseml.py (line 177), codeml.py (line 187), yn00.py (line 104) > >

>         if parse:
> ---->       results = read(self._rel_out_file)
>         else:
>             results = None
>

> > change to: >

>         if parse:
> ---->       results = read(self.out_file)
>         else:
>             results = None
>

> > > > > > > ---------------------------------------- > You have received this notification because this email was added to the New Issue Alert plugin > > > -- > You have received this notification because you have either subscribed to it, or are involved in it. > To change your notification preferences, please click here and login: http://redmine.open-bio.org > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From b.invergo at gmail.com Wed Jun 29 11:27:27 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Wed, 29 Jun 2011 17:27:27 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak> Message-ID: Well, it's not much, but how's this? https://github.com/brandoninvergo/biopython/tree/doc-branch Do you want me to go more into detail about the options available like in the wikior is this sufficient as a tutorial? Just let me know... Cheers, Brandon On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo wrote: > Hi Eric, > No problem, I'll start writing something up now. > Cheers, > -brandon > > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich wrote: >> Hi Brandon, >> >> I just added a stub for Bio.Phylo.PAML to the main Tutorial: >> https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8 >> >> Do you think you could add some more to that section, maybe pulling a chunk >> of content from the wiki page you just wrote? If you're not comfortable with >> LaTeX you can just point me to some text and I'll add it. >> >> Thanks, >> Eric >> >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo >> wrote: >>> >>> Ok, the documentation is finished: >>> http://biopython.org/wiki/PAML >>> >>> Cheers, >>> Brandon >>> >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman wrote: >>> > Brandon; >>> > >>> >> Ok I've just sent the email to the main list. >>> > >>> > Awesome, thanks for this. Hope this convinces some other folks to >>> > take a look. >>> > >>> >> I can write up some documentation this week. What is the official >>> >> procedure for adding documentation to the wiki, if any? Or can I just >>> >> create an account and start writing? >>> > >>> > Create an account and start writing. Nothing official except that >>> > documentation is good. >>> > >>> >> Also, just to double-check, are my docstrings all sufficient or should >>> >> I expand those? >>> > >>> > Your code comments looked great to me. 
The end user documentation >>> > seems to be the main thing at this point: describing how someone can >>> > pick up and get started with the code. >>> > >>> > Thanks again for all the work, >>> > Brad >>> > _______________________________________________ >>> > Biopython-dev mailing list >>> > Biopython-dev at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >>> > >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > From p.j.a.cock at googlemail.com Wed Jun 29 17:10:02 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 29 Jun 2011 22:10:02 +0100 Subject: [Biopython-dev] [Biopython] DIHEDRAL ANGLES from PDB In-Reply-To: References: Message-ID: On Wed, Jun 29, 2011 at 7:54 PM, Babban Mia wrote: > Hello Everyone > > > I am looking for a tool that can calculate dihedral angle with in a python > script between four atoms in PDB file. > > I hope Biopython has something to offer. > > Please advise. > > Best Yes, try this: http://www.warwick.ac.uk/go/peter_cock/python/ramachandran/ Peter From eric.talevich at gmail.com Thu Jun 30 22:17:26 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 30 Jun 2011 22:17:26 -0400 Subject: [Biopython-dev] pypaml In-Reply-To: References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> <20110615115425.GB22528@sobchak>
Message-ID:

Hi Brandon,

Looks good, thanks! It's just enough to get the point across, and the wiki is a fine place for extended examples.

Reading this, I notice that the cml.set_option(key, value) gets kind of tedious when a lot of options need to be set. It would be nice to be able to set them all in one go, as keyword arguments:

    cml.set_options(
        seqtype=1,
        verbose=0,
        noisy=0,
        RateAncestor=0,
        model=0,
        NSsites=[0, 1, 2],
        CodonFreq=2,
        cleandata=1,
        fix_alpha=1,
        kappa=4.54006,
    )

What do you think? Worth implementing?

Cheers,
Eric

On Wed, Jun 29, 2011 at 11:27 AM, Brandon Invergo wrote:

> Well, it's not much, but how's this?
> https://github.com/brandoninvergo/biopython/tree/doc-branch
> Do you want me to go more into detail about the options available like
> in the wikior is this sufficient as a tutorial? Just let me know...
>
> Cheers,
> Brandon
>
> On Mon, Jun 27, 2011 at 5:26 PM, Brandon Invergo wrote:
> > Hi Eric,
> > No problem, I'll start writing something up now.
> > Cheers,
> > -brandon
> >
> > On Sun, Jun 26, 2011 at 7:32 PM, Eric Talevich wrote:
> >> Hi Brandon,
> >>
> >> I just added a stub for Bio.Phylo.PAML to the main Tutorial:
> >> https://github.com/biopython/biopython/commit/190a85c5bde9c079fa5cee4ab9f8ee3362538cb8
> >>
> >> Do you think you could add some more to that section, maybe pulling a chunk
> >> of content from the wiki page you just wrote? If you're not comfortable with
> >> LaTeX you can just point me to some text and I'll add it.
> >> > >> Thanks, > >> Eric > >> > >> On Thu, Jun 16, 2011 at 11:34 AM, Brandon Invergo > >> wrote: > >>> > >>> Ok, the documentation is finished: > >>> http://biopython.org/wiki/PAML > >>> > >>> Cheers, > >>> Brandon > >>> > >>> On Wed, Jun 15, 2011 at 1:54 PM, Brad Chapman > wrote: > >>> > Brandon; > >>> > > >>> >> Ok I've just sent the email to the main list. > >>> > > >>> > Awesome, thanks for this. Hope this convinces some other folks to > >>> > take a look. > >>> > > >>> >> I can write up some documentation this week. What is the official > >>> >> procedure for adding documentation to the wiki, if any? Or can I > just > >>> >> create an account and start writing? > >>> > > >>> > Create an account and start writing. Nothing official except that > >>> > documentation is good. > >>> > > >>> >> Also, just to double-check, are my docstrings all sufficient or > should > >>> >> I expand those? > >>> > > >>> > Your code comments looked great to me. The end user documentation > >>> > seems to be the main thing at this point: describing how someone can > >>> > pick up and get started with the code. > >>> > > >>> > Thanks again for all the work, > >>> > Brad > >>> > _______________________________________________ > >>> > Biopython-dev mailing list > >>> > Biopython-dev at lists.open-bio.org > >>> > http://lists.open-bio.org/mailman/listinfo/biopython-dev > >>> > > >>> _______________________________________________ > >>> Biopython-dev mailing list > >>> Biopython-dev at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev > >> > >> > > > From chapmanb at 50mail.com Thu Jun 2 19:49:00 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 2 Jun 2011 15:49:00 -0400 Subject: [Biopython-dev] Early registration for BOSC ends tomorrow, Friday June 3 Message-ID: <20110602194900.GG21074@sobchak> If you haven't already registered for BOSC, now is your chance--after June 3, prices will go up! 
Registration for BOSC is through the ISMB main conference website: http://www.iscb.org/ismbeccb2011-registration#sigs . Since BOSC is a two-day SIG, the price is 2x the one-day SIG price listed on the ISMB website. You can register for BOSC without registering for the main ISMB conference, if you want. The preliminary BOSC schedule (subject to change) is now up at http://www.open-bio.org/wiki/BOSC_2011_Schedule (more details will be added soon). There is also a two day Codefest proceeding BOSC; please add yourself to the list of attendees if you are interested: http://www.open-bio.org/wiki/Codefest_2011 The BOSC talks have already been chosen, but we have spaces for last-minute posters. If you'd like your poster abstract to appear in the BOSC program, you should submit it now--see http://www.open-bio.org/wiki/BOSC_2011#Abstract_Submission_Information We hope to see you at BOSC! Nomi Harris Co-Chair, BOSC 2011 From eric.talevich at gmail.com Sun Jun 5 16:09:59 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 5 Jun 2011 12:09:59 -0400 Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7) In-Reply-To: References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com> <7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com> Message-ID: On Fri, May 27, 2011 at 9:48 AM, Erick Matsen wrote: > Hello everyone-- > > > Hope you don't mind my chiming into this discussion. > > > Good to know. The format for confidences is also hard-coded ("%1.2f"), do > > you suppose that should be given the same treatment? > > I think this would be entirely appropriate. There are some cases (eg > bootstrap) where the confidence is actually a count, and being able to > express it as such might be convenient. > > I have one related point to discuss if you don't mind. In > > > https://github.com/biopython/biopython/blob/master/Bio/Phylo/NewickIO.py#L246 > > trees without confidence values get written out as trees with confidence > values of zero. These are of course two different things. > > I realize that if we want to write out a tree without confidence values > we can specify branchlengths_only, but it would seem to me that the most > natural behavior would be to just write out confidence values when they > are specified. > > Hi Erick & folks, This commit should fix both those issues: https://github.com/biopython/biopython/commit/4ce56619cb13e27659927707e2979807d37b26b0 There's an issue with naming -- I called the argument "format_support" because all the other arguments refer to confidence as "support", since they were all copied from Bio.Nexus. The Bio.Phylo Clade attribute this affects is called "confidence", though. It's confusing either way. Thoughts & suggestions? -E From eric.talevich at gmail.com Sun Jun 5 18:53:58 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Sun, 5 Jun 2011 14:53:58 -0400 Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7) In-Reply-To: References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com> <7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com> Message-ID: On Sun, Jun 5, 2011 at 12:09 PM, Eric Talevich wrote: > This commit should fix both those issues: > > https://github.com/biopython/biopython/commit/4ce56619cb13e27659927707e2979807d37b26b0 > > There's an issue with naming -- I called the argument "format_support" > because all the other arguments refer to confidence as "support", since they > were all copied from Bio.Nexus. The Bio.Phylo Clade attribute this affects > is called "confidence", though. It's confusing either way. Thoughts & > suggestions? > To follow up on that last bit, would anyone be opposed to deprecating/renaming the other arguments in NewickIO functions to change all references from "support" to "confidence"? The keyword arguments are a little esoteric, I think; can we try a 1-release (or even 0-release) deprecation cycle here? -Eric From p.j.a.cock at googlemail.com Sun Jun 5 20:11:26 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 5 Jun 2011 21:11:26 +0100 Subject: [Biopython-dev] [biopython] Bugfix in test_Phylo; branch length formatter for Newick trees (#7) In-Reply-To: References:

<3B2A0BA4-3B13-4DEE-ADFB-E7253857E8DA@gmail.com> <7B5DB32C-25FB-43F6-A3CB-15848A975418@gmail.com> Message-ID: On Sun, Jun 5, 2011 at 7:53 PM, Eric Talevich wrote: >> >> There's an issue with naming -- I called the argument "format_support" >> because all the other arguments refer to confidence as "support", since they >> were all copied from Bio.Nexus. The Bio.Phylo Clade attribute this affects >> is called "confidence", though. It's confusing either way. Thoughts & >> suggestions? > > To follow up on that last bit, would anyone be opposed to > deprecating/renaming the other arguments in NewickIO functions to > change all references from "support" to "confidence"? The keyword > arguments are a little esoteric, I think; can we try a 1-release (or > even 0-release) deprecation cycle here? Do you think you could have both arg names supported in 1.58 (with a deprecation warning if the old names are used)? If so, and you post this to the main list and get no objections, I'm open to adding a deprecation warning in 1.58 (which could be relatively soon, before BOSC/ISMB would be my guess) and dropping the old names in 1.59. Peter From updates at feedmyinbox.com Sun Jun 5 19:27:28 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Sun, 5 Jun 2011 15:27:28 -0400 Subject: [Biopython-dev] 6/5 biopython Questions - BioStar Message-ID: <0de76f2a5cecc44812c6261fb9e96b92@74.63.51.88> // What is biopython and bioeclipse used for? 
// June 4, 2011 at 12:10 PM http://biostar.stackexchange.com/questions/8844/what-is-biopython-and-bioeclipse-used-for Just would like to know why do things like biopython and bioeclipse exist and in which bioinformatics context are they used -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/687953/851dd4cd10a2537cf271a85dfd1566976527e0cd/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. PO Box 682532 Franklin, TN 37068 From p.j.a.cock at googlemail.com Sun Jun 5 21:51:26 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 5 Jun 2011 22:51:26 +0100 Subject: [Biopython-dev] float('inf') or float('-inf') Message-ID: Hi all, As explained in PEP 754, prior to Python 2.6 float('inf'), float('-inf') and also float('nan') were passed to the underlying C library, which may or may not return the IEEE special floating point value for infinity, minus infinity or nan. See: http://www.python.org/dev/peps/pep-0754/ This is the root cause of this unit test failure on Windows Python 2.5, http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.5/builds/224/steps/shell/logs/stdio ====================================================================== ERROR: Test a simple model with 2 states and 2 symbols. 
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\repositories\BuildBot\win25\build\Tests\test_HMMGeneral.py", line 260, in test_simple_hmm
    viterbi = model.viterbi(observed_emissions, NumberAlphabet)
  File "c:\repositories\BuildBot\win25\build\build\lib.win32-2.5\Bio\HMM\MarkovModel.py", line 499, in viterbi
    log_initial = self._log_transform(self.initial_prob)
  File "c:\repositories\BuildBot\win25\build\build\lib.win32-2.5\Bio\HMM\MarkovModel.py", line 598, in _log_transform
    neg_inf = float("-inf")
ValueError: invalid literal for float(): -inf

and similarly on Windows Python 2.4,
http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.4/builds/226/steps/shell/logs/stdio

This test failure is a result of me committing Walter Gillett's fix to Txt Bio/HMM/MarkovModel.py last week:
https://github.com/biopython/biopython/commit/152f469179d4a142858a04c02169f8d1fc5f8c83

We would have spotted this earlier, but the Windows buildslave machine has been in use by a new staff member (a temporary arrangement). I guess we need more volunteer buildslave machines...

Based on the PEP 754 example, 1E400 may work instead:

    try:
        neg_inf = float("-inf")
    except ValueError:
        neg_inf = -1E400

I'll try to test this on the machine later in the week when I get a chance. If anyone else has Python 2.4 or 2.5 on Windows and wants to look at this now, please go ahead.

Regards,
Peter

From b.invergo at gmail.com Fri Jun 10 10:53:12 2011
From: b.invergo at gmail.com (Brandon Invergo)
Date: Fri, 10 Jun 2011 12:53:12 +0200
Subject: [Biopython-dev] pypaml
In-Reply-To: <20110228163521.GF9652@sobchak.mgh.harvard.edu>
References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> Message-ID: Hi everyone, It's been quite a while since I've updated you with my PAML progress. My side projects had to take a back seat to my PhD research for a while, so I couldn't work on it. Anyway, I finally got back to it and implemented some much-needed restructuring as suggested. I've implemented a generalized PAML class which the others inherit from to reduce duplicated code. I've also created files containing helper functions explicitly for parsing the result files. So, the Codeml.read() function now does all of its parsing through functions held in the _parse_codeml.py file. This was done for both clarity and cleanliness. I've taken the suggestion to split the parsing task into several functions so I hope it's all a bit more readable now. I certainly think it is; I was hesitant at first but now that it's done I see how much better it is. I also caught some really poor parts of the code which I was able to fix. Codeml parsing remains a bit complicated compared to Baseml and Yn00 but I think that's just the nature of the beast. So, if anyone has a moment, could you take a moment to do a quick code review? Barring any major changes, I'll send a message to the users list to see if people would be willing to take it for a test drive. https://github.com/brandoninvergo/biopython/tree/paml-branch/Bio/Phylo/PAML There is one other problem. As you may recall, I decided to reimplement the Chi2 program from the PAML package to provide a convenient means to do likelihood ratio testing without having to load another library (scipy, rpy2). The original was written in C but had limited command-line options so I couldn't just write an interface to it. Re-writing the code in Python seemed to work fine, as far as getting the correct results/output. 
However, I later found that doing tests with large degrees of freedom (one codeml model comparison requires 41 df) takes an exorbitant amount of time compared to the C code. So, I see three options: dig into the code to try to find ways to optimize it, look into something like Weave for compiling the C code into a Python module, or just remove Chi2 for now and wait for him to release a version that takes command line arguments (which he claims is coming in the next version). Any thoughts on this matter? Cheers, Brandon On Mon, Feb 28, 2011 at 5:35 PM, Brad Chapman wrote: > Brandon; > > [pypaml branch: https://github.com/brandoninvergo/biopython/tree/paml-branch] > > [base class] >> This is a really good idea and I'm a bit disappointed that I didn't >> see it myself! Indeed, most of the functionality is just copied/pasted >> between the classes, with only some variation in the >> read/write_ctl_file functions for codeml and baseml. So, writing a >> base class would really simplify things. I do have one question, >> though, since this is my first time organizing my code in a >> large-scale Python project. Where would be the best place to implement >> this base paml class? In __init__.py or in its own paml.py file? I >> know the end result would be the same but I figure I should start >> learning some of these best practices. > > It's always easier to get perspective on code when you haven't been > directly in the middle of it. Even if you don't have someone to do > code reviews, stepping away from a project and coming back later > will often lead to a bunch of insights. > > For the base class, I would follow Eric and Peter's example and use > files in the same directory with an underscore: something like _shared.py > or _base.py. 
>
> [read functions]
>> This mess is precisely why I had to include so many different
>> output files for the unittesting (codeml is the main culprit; baseml
>> is moderately bad; yn00 isn't a problem)
>
> I definitely feel your pain on this. This is exactly why your work
> doing this is appreciated; you'll save someone a lot of headache
> later on.
>
>> So, because I would potentially end up scanning almost the entire file
>> just to figure out what's going on, I think just parsing-as-you-go,
>> using elif statements to short-circuit and skip further evaluations of
>> a line after a match has been found, would be the better option.
>> Perhaps the files aren't long enough to be able to make an appeal for
>> computational efficiency but at the same time, I hesitate to read
>> through the file multiple times unnecessarily. I agree, though, that
>> this makes the read() function quite long. For that, though, I tried
>> to provide descriptive comments before each parsing case, describing
>> exactly what the next block of code is meant to parse and also
>> including a specific example line which should be parsed by it.
>
> The issue really is that deeply nested code is hard to read,
> long functions are hard to read, and when you combine them together
> it just makes it very difficult for others to follow your logic.
>
> I don't think you necessarily have to make multiple passes to parse it
> in a more structured way, but what you would want to focus on is making
> the flow through the function simpler. The way I would normally attack
> this is to break components into smaller more re-usable functions.
> Here's a concrete example from the start of the codeml parser:
>
> https://github.com/brandoninvergo/biopython/blob/paml-branch/Bio/Phylo/PAML/codeml.py
>
> siteclass_re = re.match("Site-class models:\s*(.*)", line)
> if siteclass_re is not None:
>     siteclass_model = siteclass_re.group(1)
>     if siteclass_model == "":
>         multi_models = True
>         continue
>     results["site-class model"] = siteclass_model
>     if siteclass_model == "NearlyNeutral":
>         current_model = 1
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "PositiveSelection":
>         current_model = 2
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "discrete (4 categories)":
>         current_model = 3
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "beta (4 categories)":
>         current_model = 7
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>     elif siteclass_model == "beta&w>1 (5 categories)":
>         current_model = 8
>         results["NSsites"][current_model] = \
>             {"description":siteclass_model}
>         if 0 in results["NSsites"]:
>             del results["NSsites"][0]
>
> You could refactor this something along the lines of:
>
> class _CodemlParser:
>     def __init__(self):
>         self.results = {}
>         self.flags = dict(multi_models = False)
>
>     def read(self, results_handle):
>         for line in results_handle:
>             siteclass_re = re.match("Site-class models:\s*(.*)", line)
>             if siteclass_re is not None:
>                 self._siteclass_parse(siteclass_re)
>
>     def _add_siteclass_model(self, siteclass_model):
>         self.results["site-class model"] = siteclass_model
>         name_to_num = {"NearlyNeutral": 1,
>                        "PositiveSelection": 2,
>                        "discrete (4 categories)": 3,
>                        "beta (4 categories)": 7,
>                        "beta&w>1 (5 categories)": 8}
>         current_model = name_to_num[siteclass_model]
>         self.results["NSsites"][current_model] = {"description":siteclass_model}
>         if 0 in self.results["NSsites"]:
>             del self.results["NSsites"][0]
>
>     def _siteclass_parse(self, siteclass_re):
>         siteclass_model = siteclass_re.group(1)
>         if siteclass_model == "":
>             self.flags["multi_models"] = True
>         else:
>             self._add_siteclass_model(siteclass_model)
>
> You are not changing the parsing strategy, but now you've got
> individual functions handling each of the steps so it's clear that
> the _siteclass_parse either sets multi_models or adds details about
> the single model. Then you can dig into the _add_siteclass_model
> function to see what it is doing. To the reader, each individual
> unit can be read and understood separately.
>
> This type of refactoring work is useful generally. I have to do it all
> the time in my work and discover new tricks and approaches. Hope this
> is helpful and thanks again for all the work on this,
> Brad
>

From p.j.a.cock at googlemail.com Fri Jun 10 10:58:37 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 10 Jun 2011 11:58:37 +0100
Subject: [Biopython-dev] pypaml
In-Reply-To:
References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> Message-ID:

On Fri, Jun 10, 2011 at 11:53 AM, Brandon Invergo wrote:
>
> There is one other problem. As you may recall, I decided to
> reimplement the Chi2 program from the PAML package to provide a
> convenient means to do likelihood ratio testing without having to load
> another library (scipy, rpy2). The original was written in C but had
> limited command-line options so I couldn't just write an interface to
> it. Re-writing the code in Python seemed to work fine, as far as
> getting the correct results/output. However, I later found that doing
> tests with large degrees of freedom (one codeml model comparison
> requires 41 df) takes an exorbitant amount of time compared to the C
> code. So, I see three options: dig into the code to try to find ways
> to optimize it, look into something like Weave for compiling the C
> code into a Python module, or just remove Chi2 for now and wait for
> him to release a version that takes command line arguments (which he
> claims is coming in the next version). Any thoughts on this matter?

Adding C code has drawbacks as it has to compile on multiple
platforms, and cannot be used in Jython (and likely not in IronPython
or PyPy - I've not kept up to date with their progress). Also, it doesn't
seem a good use of time to reimplement something the next version
of PAML will include.

I would wait for the next version of Chi2.

Peter

From p.j.a.cock at googlemail.com Fri Jun 10 11:17:11 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 10 Jun 2011 12:17:11 +0100 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu>

Message-ID:

On Fri, Jun 10, 2011 at 12:09 PM, Brandon Invergo wrote:
> On Fri, Jun 10, 2011 at 12:58 PM, Peter Cock wrote:
>> Adding C code has drawbacks as it has to compile on multiple
>> platforms, and cannot be used in Jython (and likely not in IronPython
>> or PyPy - I've not kept up to date with their progress). Also, it doesn't
>> seem a good use of time to reimplement something the next version
>> of PAML will include.
>>
>> I would wait for the next version of Chi2.
>
> Ok that makes sense. I'll remove chi2.py from the module (though if
> people still want it, it'll still be over at my pypaml.googlecode.com
> page). I'll keep my eye on his page for new versions.

OK

> I also just now decided that I'm going to change all of the default
> control file option values. At the moment, they're set to the values
> found in example files when you download the PAML package. However
> these don't necessarily reflect the most common values. I just see it
> being a source of confusion and/or problems in the future (i.e., people
> not realizing that an option is set to a value that will negatively
> affect their results). Better to set all the values to None and let
> the user decide what are the best values.

That sounds best.

Peter

From eric.talevich at gmail.com Fri Jun 10 15:59:44 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 10 Jun 2011 11:59:44 -0400 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> Message-ID: On Fri, Jun 10, 2011 at 6:53 AM, Brandon Invergo wrote: > There is one other problem. As you may recall, I decided to > reimplement the Chi2 program from the PAML package to provide a > convenient means to do likelihood ratio testing without having to load > another library (scipy, rpy2). The original was written in C but had > limited command-line options so I couldn't just write an interface to > it. Re-writing the code in Python seemed to work fine, as far as > getting the correct results/output. However, I later found that doing > tests with large degrees of freedom (one codeml model comparison > requires 41 df) takes an exorbitant amount of time compared to the C > code. So, I see three options: dig into the code to try to find ways > to optimize it, look into something like Weave for compiling the C > code into a Python module, or just remove Chi2 for now and wait for > him to release a version that takes command line arguments (which he > claims is coming in the next version). Any thoughts on this matter? > If you've already ported the code to pure Python or Python+Numpy/Scipy, do you think it would make sense to provide this function under Bio.Phylo._utils instead of in your PAML module? Then users would be able to do a likelihood ratio test on trees without having the PAML binaries installed. The pure-Python version would still be handy for smaller degrees of freedom, and if someone happens to be using PyPy it would probably be wicked fast. The best solution is probably Numpy, rather than Scipy, since other parts of Biopython already use Numpy as an optional dependency. (Right now, Bio.Phylo runs on Python 3, Jython, and Pypy, so adding and supporting a hand-written C extension on all of these platforms is probably not worth the trouble.) 
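
A likelihood ratio test of the kind discussed above needs only the chi-square survival function, which pure Python can compute from the regularized incomplete gamma function. What follows is a minimal sketch under stated assumptions, not the code from pypaml or from PAML's Chi2 program: the names `chi2_sf` and `likelihood_ratio_test` are hypothetical, and the split between a series expansion and a continued fraction is a standard textbook approach chosen here for illustration.

```python
import math


def _gamma_p_series(a, x, eps=1e-14, max_iter=10000):
    """Lower regularized incomplete gamma P(a, x) via its power series."""
    term = 1.0 / a
    total = term
    n = a
    for _ in range(max_iter):
        n += 1.0
        term *= x / n
        total += term
        if abs(term) < abs(total) * eps:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def _gamma_q_cf(a, x, eps=1e-14, max_iter=10000):
    """Upper regularized incomplete gamma Q(a, x) via a continued fraction
    evaluated with the modified Lentz method."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter + 1):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h * math.exp(-x + a * math.log(x) - math.lgamma(a))


def chi2_sf(x, df):
    """P(X >= x) for a chi-square variable with df degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    if x <= 0:
        return 1.0
    a = df / 2.0
    half_x = x / 2.0
    # The series converges quickly below a + 1, the continued fraction above.
    if half_x < a + 1.0:
        return 1.0 - _gamma_p_series(a, half_x)
    return _gamma_q_cf(a, half_x)


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (test statistic, p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)
```

Because both branches use `math.lgamma` and a bounded number of iterations, the cost does not grow sharply with the degrees of freedom, which is the case the thread reports as slow (41 df).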
Thanks, Eric From b.invergo at gmail.com Fri Jun 10 16:11:19 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Fri, 10 Jun 2011 18:11:19 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: References: <20110114154035.GC30193@sobchak.mgh.harvard.edu>

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> Message-ID: On Fri, Jun 10, 2011 at 5:59 PM, Eric Talevich wrote: > If you've already ported the code to pure Python or Python+Numpy/Scipy, do > you think it would make sense to provide this function under > Bio.Phylo._utils instead of in your PAML module? Then users would be able to > do a likelihood ratio test on trees without having the PAML binaries > installed. That would be fine by me. I guess there's not much sense in getting rid of it entirely since it's already been written. > The pure-Python version would still be handy for smaller degrees of freedom, > and if someone happens to be using PyPy it would probably be wicked fast. > The best solution is probably Numpy, rather than Scipy, since other parts of > Biopython already use Numpy as an optional dependency. I don't think Numpy has a Chi^2 cumulative distribution function; you can only draw random numbers from the distribution. Scipy has that, but as a user, I would be annoyed to have to install a huge package just to perform likelihood ratio tests. I like the idea of having it built into biopython, since I think it's a fairly common procedure given all the maximum likelihood techniques in biology. That's just my 2 cents though... Cheers, -Brandon From chapmanb at 50mail.com Sat Jun 11 15:59:00 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Sat, 11 Jun 2011 11:59:00 -0400 Subject: [Biopython-dev] pypaml In-Reply-To: References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> Message-ID: <20110611155900.GB2831@kunkel> Brandon; > It's been quite a while since I've updated you with my PAML progress. > My side projects had to take a back seat to my PhD research for a > while, so I couldn't work on it. Anyway, I finally got back to it and > implemented some much-needed restructuring as suggested. Thanks very much for taking this on. The restructuring looks fantastic. > I've taken the suggestion to split the parsing task into > several functions so I hope it's all a bit more readable now. I > certainly think it is; I was hesitant at first but now that it's done > I see how much better it is. Really glad the comments were helpful. It really is the hardest thing in programming to mess with a bunch of working code for the sake of trying to refactor it, and you've done excellent work. I only have one more small suggestion. A number of the functions take a results dictionary and then modify it directly, taking advantage of the fact that it's the same object. For instance, 'parse_parameters' in _parse_baseml.py looks like: results["parameters"] = {} parse_parameter_list(lines, results, num_params) parse_kappas(lines, results) parse_rates(lines, results) parse_freqs(lines, results) A nice way to do this is to pass in and return the modified dictionary, so it is clear what is happening in the function. Ideally, this would look like: parameters = {} parameters = parse_parameter_list(lines, parameters, num_params) parameters = parse_kappas(lines, parameters) parameters = parse_rates(lines, parameters) parameters = parse_freqs(lines, parameters) results["parameters"] = parameters For someone reading the code this makes it more explicit that each of those functions modifies the 'parameters' dictionary. Otherwise the side effects that change the results or parameters dictionary could be missed. For the Chi2 question, I'm 100% agreed with Peter and Eric. 
The pure python version could be useful, but no sense re-writing a C version if an external one exists in Scipy. PyCogent also has some functionality here as well: http://pycogent.sourceforge.net/cookbook/standard_statistical_analyses.html#chi-square Thanks again for all your work, Brad From b.invergo at gmail.com Sun Jun 12 12:28:31 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Sun, 12 Jun 2011 14:28:31 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: <20110611155900.GB2831@kunkel> References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> Message-ID: > I only have one more small suggestion. A number of the functions > take a results dictionary and then modify it directly, taking > advantage of the fact that it's the same object. For instance, > 'parse_parameters' in _parse_baseml.py looks like: > > results["parameters"] = {} > parse_parameter_list(lines, results, num_params) > parse_kappas(lines, results) > parse_rates(lines, results) > parse_freqs(lines, results) > > A nice way to do this is to pass in and return the modified > dictionary, so it is clear what is happening in the function. > Ideally, this would look like: > > parameters = {} > parameters = parse_parameter_list(lines, parameters, num_params) > parameters = parse_kappas(lines, parameters) > parameters = parse_rates(lines, parameters) > parameters = parse_freqs(lines, parameters) > results["parameters"] = parameters > > For someone reading the code this makes it more explicit that each > of those functions modifies the 'parameters' dictionary. Otherwise > the side effects that change the results or parameters dictionary > could be missed. Done! Funnily enough, it was originally that way but then I remembered that Python passes arguments by reference, so I changed it to not return the dict every time. I thought I was being clever and Pythonic but I agree that this way is more readable/maintainable. > For the Chi2 question, I'm 100% agreed with Peter and Eric. The pure > python version could be useful, but no sense re-writing a C version > if an external one exists in Scipy. PyCogent also has some > functionality here as well: > > http://pycogent.sourceforge.net/cookbook/standard_statistical_analyses.html#chi-square Ok, the pure Python version is back in. Once PAML is officially part of Biopython, I can write some documentation for the wiki and provide a warning about the high df values... 
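
The pass-and-return convention discussed earlier in this message can be sketched as follows. The parser functions and input lines here are hypothetical stand-ins, not the actual code in `_parse_baseml.py`; they only illustrate that mutating a dictionary in place and returning it afterwards produce the same result, while the second form makes the data flow visible at the call site.

```python
def parse_kappas_in_place(lines, parameters):
    """Hypothetical parser: mutates `parameters` and returns nothing."""
    for line in lines:
        if line.startswith("kappa"):
            parameters["kappa"] = float(line.split("=")[1])


def parse_kappas(lines, parameters):
    """Same parsing, but hands back the dictionary it modified."""
    for line in lines:
        if line.startswith("kappa"):
            parameters["kappa"] = float(line.split("=")[1])
    return parameters


def parse_freqs(lines, parameters):
    """Hypothetical parser for a whitespace-separated frequency list."""
    for line in lines:
        if line.startswith("freqs"):
            values = line.split("=")[1].split()
            parameters["freqs"] = [float(v) for v in values]
    return parameters


# Made-up input, only loosely shaped like a results file.
lines = ["kappa = 2.5", "freqs = 0.1 0.2 0.3 0.4"]

results = {}
parameters = {}
parameters = parse_kappas(lines, parameters)
parameters = parse_freqs(lines, parameters)
results["parameters"] = parameters
```

In both styles the caller's dictionary is the same object that the function modified; returning it changes nothing about behaviour, only about how obvious the modification is to someone reading the caller.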
Cheers, Brandon From chapmanb at 50mail.com Tue Jun 14 12:17:54 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 14 Jun 2011 08:17:54 -0400 Subject: [Biopython-dev] pypaml In-Reply-To: References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> Message-ID: <20110614121754.GF2552@kunkel> Brandon; [pass and return parameters] > Done! > Funnily enough, it was originally that way but then I remembered that > Python passes arguments by reference, so I changed it to not return > the dict every time. I thought I was being clever and Pythonic but I > agree that this way is more readable/maintainable. That looks fabulous; thanks much. We can invoke the 'Explicit is better than implicit' clause of the Zen of Python for this one. > Ok, the pure Python version is back in. Once PAML is officially part > of Biopython, I can write some documentation for the wiki and provide > a warning about the high df values... Great, thanks for this as well. From my point of view, the only thing you need is some documentation to finish it off. It would definitely be worthwhile to send it to the main list to see if others have feedback. I'm happy to work on merging it in if everyone else is agreed. Thanks again, Brad From b.invergo at gmail.com Tue Jun 14 13:23:12 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Tue, 14 Jun 2011 15:23:12 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: <20110614121754.GF2552@kunkel> References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> Message-ID: > Great, thanks for this as well. From my point of view, the only > thing you need is some documentation to finish it off. It would > definitely be worthwhile to send it to the main list to see if > others have feedback. I'm happy to work on merging it in if everyone > else is agreed. Ok I've just sent the email to the main list. I can write up some documentation this week. What is the official procedure for adding documentation to the wiki, if any? Or can I just create an account and start writing? Also, just to double-check, are my docstrings all sufficient or should I expand those? Lastly, I've been having trouble trying to merge the upstream repo with my master branch using $ git pull upstream master (I have set the upstream to the biopython repo as described in the wiki) The connection routinely times out. This is on my lab computer, though, which is behind a proxy that always causes troubles. I'll try again from my home computer this evening. Just thought I'd mention it for now... Cheers, Brandon From chapmanb at 50mail.com Wed Jun 15 11:54:25 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 15 Jun 2011 07:54:25 -0400 Subject: [Biopython-dev] pypaml In-Reply-To: References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu> <20110611155900.GB2831@kunkel> <20110614121754.GF2552@kunkel> Message-ID: <20110615115425.GB22528@sobchak> Brandon; > Ok I've just sent the email to the main list. Awesome, thanks for this. Hope this convinces some other folks to take a look. > I can write up some documentation this week. What is the official > procedure for adding documentation to the wiki, if any? Or can I just > create an account and start writing? Create an account and start writing. Nothing official except that documentation is good. > Also, just to double-check, are my docstrings all sufficient or should > I expand those? Your code comments looked great to me. The end user documentation seems to be the main thing at this point: describing how someone can pick up and get started with the code. Thanks again for all the work, Brad From chapmanb at 50mail.com Wed Jun 15 13:03:54 2011 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 15 Jun 2011 09:03:54 -0400 Subject: [Biopython-dev] [Fwd: Bug in Biopython Phyloxml reader] Message-ID: <20110615130354.GD22528@sobchak> Nicolas; Thanks for reporting the problem. The Biopython mailing lists (http://biopython.org/wiki/Mailing_lists) are the right place to report these types of issues. Hopefully Eric or someone else on the list will be able to help. Thanks again, Brad ----- Forwarded message from Nicolas Rochette ----- Date: Wed, 15 Jun 2011 14:25:09 +0200 Sir, I apologize for contacting you directly, but I could not find the right place for this report. Could you please forward it ? The bug is about 0-length terminal branches being given a length of 1 ; please find an example below. 
Regards, Nicolas Rochette PhD student Laboratory for Biometry and Evolutive Biology Lyon, France // echo '(A:2,(B:0,C:3):0,D:5);' > foo.newick java -cp forester.jar org.forester.application.phyloxml_converter -f=nn -i foo.newick foo.phyloxml python -c 'from Bio import Phylo; Phylo.write(Phylo.read("foo.phyloxml","phyloxml"), "bar.newick", "newick")' cat bar.newick (A:2.00000,(B:1.00000,C:3.00000)0.00000:0.00000,D:5.00000)0.00:0.00000; ----- End forwarded message ----- From eric.talevich at gmail.com Wed Jun 15 14:20:41 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 15 Jun 2011 10:20:41 -0400 Subject: [Biopython-dev] [Fwd: Bug in Biopython Phyloxml reader] In-Reply-To: <20110615130354.GD22528@sobchak> References: <20110615130354.GD22528@sobchak> Message-ID: Hi Nicolas and Brad, Thanks for reporting and forwarding this. The 0 -> 1 terminal branch lengths are another surprising default that crept in during the port of NewickIO; I don't see a problem with changing it to keep 0-length branches at the tips. The phyloXML parser and writer shouldn't be introducing this bug; it should just be occurring when writing in Newick or Nexus formats. I'll keep you posted on the fix, probably this weekend. If you'd like to try it yourself, the code to edit is in Bio/Phylo/NewickIO.py. Best, Eric On Wed, Jun 15, 2011 at 9:03 AM, Brad Chapman wrote: > Nicolas; > Thanks for reporting the problem. The Biopython mailing lists > (http://biopython.org/wiki/Mailing_lists) are the right place to > report these types of issues. > > Hopefully Eric or someone else on the list will be able to help. > Thanks again, > Brad > > ----- Forwarded message from Nicolas Rochette < > nicolas.rochette at univ-lyon1.fr> ----- > > Date: Wed, 15 Jun 2011 14:25:09 +0200 > > Sir, > > I apologize for contacting you directly, but I could not find the > right place for this report. Could you please forward it ? 
> > The bug is about 0-length terminal branches being given a length of 1 > ; please find an example below. > > Regards, > > Nicolas Rochette > PhD student > Laboratory for Biometry and Evolutive Biology > Lyon, France > > // > > echo '(A:2,(B:0,C:3):0,D:5);' > foo.newick > java -cp forester.jar org.forester.application.phyloxml_converter > -f=nn -i foo.newick foo.phyloxml > python -c 'from Bio import Phylo; > Phylo.write(Phylo.read("foo.phyloxml","phyloxml"), "bar.newick", > "newick")' > cat bar.newick > > (A:2.00000,(B:1.00000,C:3.00000)0.00000:0.00000,D:5.00000)0.00:0.00000; > > > ----- End forwarded message ----- > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From eric.talevich at gmail.com Thu Jun 16 02:29:18 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 15 Jun 2011 22:29:18 -0400 Subject: [Biopython-dev] [Fwd: Bug in Biopython Phyloxml reader] In-Reply-To: References: <20110615130354.GD22528@sobchak> Message-ID: Folks, I pushed a very small fix: https://github.com/biopython/biopython/commit/db9d876ba2199efcce067241f5554ef701cb70e3 It appears I misunderstood the Nexus.Trees code while I was porting NewickIO, so there was no good reason for this behavior to be the default. Anyway, it behaves as expected now. Nicolas: Bio.Phylo includes a function called 'convert' which you may also find useful. >>> Phylo.convert('foo.xml', 'phyloxml', 'bar.nwk', 'newick') Cheers, Eric On Wed, Jun 15, 2011 at 10:20 AM, Eric Talevich wrote: > Hi Nicolas and Brad, > > Thanks for reporting and forwarding this. The 0 -> 1 terminal branch > lengths are another surprising default that crept in during the port of > NewickIO; I don't see a problem with changing it to keep 0-length branches > at the tips. The phyloXML parser and writer shouldn't be introducing this > bug; it should just be occurring when writing in Newick or Nexus formats. 
> > I'll keep you posted on the fix, probably this weekend. If you'd like to > try it yourself, the code to edit is in Bio/Phylo/NewickIO.py. > > Best, > Eric > > > On Wed, Jun 15, 2011 at 9:03 AM, Brad Chapman wrote: > >> Nicolas; >> Thanks for reporting the problem. The Biopython mailing lists >> (http://biopython.org/wiki/Mailing_lists) are the right place to >> report these types of issues. >> >> Hopefully Eric or someone else on the list will be able to help. >> Thanks again, >> Brad >> >> ----- Forwarded message from Nicolas Rochette < >> nicolas.rochette at univ-lyon1.fr> ----- >> >> Date: Wed, 15 Jun 2011 14:25:09 +0200 >> >> Sir, >> >> I apologize for contacting you directly, but I could not find the >> right place for this report. Could you please forward it ? >> >> The bug is about 0-length terminal branches being given a length of 1 >> ; please find an example below. >> >> Regards, >> >> Nicolas Rochette >> PhD student >> Laboratory for Biometry and Evolutive Biology >> Lyon, France >> >> // >> >> echo '(A:2,(B:0,C:3):0,D:5);' > foo.newick >> java -cp forester.jar org.forester.application.phyloxml_converter >> -f=nn -i foo.newick foo.phyloxml >> python -c 'from Bio import Phylo; >> Phylo.write(Phylo.read("foo.phyloxml","phyloxml"), "bar.newick", >> "newick")' >> cat bar.newick >> >> (A:2.00000,(B:1.00000,C:3.00000)0.00000:0.00000,D:5.00000)0.00:0.00000; >> >> >> ----- End forwarded message ----- >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > From updates at feedmyinbox.com Thu Jun 16 11:12:51 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 16 Jun 2011 07:12:51 -0400 Subject: [Biopython-dev] 6/16 biopython Questions - BioStar Message-ID: <5a4c5e8b0ca0dec3061963e9b2bd7aa2@74.63.51.88> // Reduce BLAST XML size? 
// June 15, 2011 at 10:59 PM http://biostar.stackexchange.com/questions/9246/reduce-blast-xml-size Hi, I have a really large BLAST XML file - something like 30gb in size. I'd like to reduce it so I can run through it quicker with Biopython. Is there a way to reduce the file by keeping something like the top 25 hits based on bitscore for each query. Preferably I'd like to do with Python/Biopython. Thanks // running tests/tools in MTAP platform // June 14, 2011 at 12:46 PM http://biostar.stackexchange.com/questions/9155/running-tests-tools-in-mtap-platform My goal is to run different tools and compare the results in MTAP. Tools include SLIMFINDER, D-STAR, MEME, DILIMOT - input known linear motif sequences and get the results. How do i run these tests in MTAP? What files do i need to edit? What are the command line options that will do this? // What is biopython and bioeclipse used for? // June 4, 2011 at 12:10 PM http://biostar.stackexchange.com/questions/8844/what-is-biopython-and-bioeclipse-used-for Just would like to know why do things like biopython and bioeclipse exist and in which bioinformatics context are they used -- Website: http://biostar.stackexchange.com/questions/tagged/biopython Account Login: https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email Unsubscribe here: http://www.feedmyinbox.com/feeds/unsubscribe/782463/cfe3e2c307e215f87d612a439b646b9c22290b84/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email -- This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068 From updates at feedmyinbox.com Thu Jun 16 11:12:51 2011 From: updates at feedmyinbox.com (Feed My Inbox) Date: Thu, 16 Jun 2011 07:12:51 -0400 Subject: [Biopython-dev] 6/16 newest questions tagged biopython - Stack Overflow Message-ID: <1e867a9860819ff5bf648e82dcf1d07b@74.63.51.88> // Convert GenBank Flatfiles to FASTA // June 13, 2011 at 5:55 PM http://stackoverflow.com/questions/6336853/convert-genbank-flatfiles-to-fasta I need to parse a preliminary GenBank Flatfile. The sequence hasn't been published yet, so I can't look it up by accession and download a FASTA file. I'm new to Bioinformatics, so could someone show me where I could find a BioPerl or BioPython script to do this myself? Thanks! // have trouble in installing biopython package // June 7, 2011 at 3:44 PM http://stackoverflow.com/questions/6270730/have-trouble-in-installing-biopython-package im using mac 10.6.7, and xcode 4 with gcc 4.2 installed. but when i was installing biopython with: python setup.py install on the command, it gives out error on gcc: 10-54-41-155-wireless1x:biopython-1.57 xueran2010$ python setup.py install running install running build running build_py running build_ext building 'Bio.cpairwise2' extension gcc-4.2 -fno-strict-aliasing -fno-common -dynamic -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch i386 -arch ppc -arch x86_64 -pipe -IBio -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c Bio/cpairwise2module.c -o build/temp.macosx-10.6-universal-2.6/Bio/cpairwise2module.o /usr/libexec/gcc/powerpc-apple-darwin10/4.2.1/as: assembler (/usr/bin/../libexec/gcc/darwin/ppc/as or /usr/bin/../local/libexec/gcc/darwin/ppc/as) for architecture ppc not installed Installed assemblers are: /usr/bin/../libexec/gcc/darwin/x86_64/as for architecture x86_64 /usr/bin/../libexec/gcc/darwin/i386/as for architecture i386 Bio/cpairwise2module.c:639: fatal error: error writing to -: Broken pipe compilation 
terminated. lipo: can't open input file: /var/folders/ir/ir6RCJTKGB4QU5sVdTXwt++++TI/-Tmp-//cccUvTiF.out (No such file or directory) error: command 'gcc-4.2' failed with exit status 1 -- Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest From b.invergo at gmail.com Thu Jun 16 15:34:00 2011 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 16 Jun 2011 17:34:00 +0200 Subject: [Biopython-dev] pypaml In-Reply-To: <20110615115425.GB22528@sobchak> References:

<20110223131151.GE4922@sobchak.mgh.harvard.edu> <20110228163521.GF9652@sobchak.mgh.harvard.edu>