From harsh.beria93 at gmail.com Mon Mar 3 16:57:35 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 4 Mar 2014 03:27:35 +0530 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393582069.6863.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: The pairwise alignment project is not listed on the Ideas page. If I work on it and make a GUI or command line frontend, can that be taken up as a GSOC project? Who can be the potential mentor for this project so that I can chalk out the details before starting to code? Also, do I need to add the project to the Ideas page? On Sat, Mar 1, 2014 at 12:40 AM, Harsh Beria wrote: > I can work on pairwise sequence alignment. Actually, I have previously > worked on this using dynamic programming. But I doubt whether this can be a > GSOC project because the workload will not be too much. If we use > different methods to predict sequence alignment and make a front-end which > allows the user to input the sequence or even a pdb file and method of > alignment and predict the alignment, the work can be substantial enough. > > Also, as suggested by Christopher, sequence alignment is pretty basic and > we can use a C backend, which can significantly improve the runtime. So, we > can discuss it and I can start working on it. > > > On Fri, Feb 28, 2014 at 11:15 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> I'm wondering, with something that is as broadly applicable as pairwise >> alignment, would it be better to implement only in Python (or implement in >> Python wedded to a C backend)? Or maybe set up something in python that >> taps into an already well-defined C/C++ library that does this? >> >> The reason I mention this: with bioperl we went down this route with >> bioperl-ext a long time ago (these are generally C-based backend tools with >> a perl front-end), that bit-rotted simply b/c there were other more >> maintainable options. 
IIUC from this post, similar issues re: >> maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is >> entirely possible). However, tools like pysam and Bio::DB::Samtools (on >> the perl end) seem to have been maintained much more readily since they tap >> into a common library. >> >> For instance, my suggestion would be to implement a Biopython tool that >> does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a >> generic python front-end that allows users to pick the tool/method for the >> alignment, with maybe a library binding as an initial implementation. >> >> chris >> >> On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: >> >> > Hi Harsh Beria, >> > >> > One option is to work on pairwise sequence alignments. Currently there >> is some code for that in Biopython (in Bio/pairwise2.py), but it is not >> general and is not being maintained. This may need to be rebuilt from the >> ground up. >> > >> > Best, >> > -Michiel. >> > >> > -------------------------------------------- >> > On Wed, 2/26/14, Harsh Beria wrote: >> > >> > Subject: [Biopython-dev] Gsoc 2014 aspirant >> > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, >> gsoc at lists.open-bio.org >> > Date: Wednesday, February 26, 2014, 11:14 AM >> > >> > Hi, >> > >> > I am Harsh Beria, third year UG student at Indian >> > Institute of >> > Technology, Kharagpur. I have started working in >> > Computational Biophysics >> > recently, having written code for pdb to fasta parser, >> > sequence alignment >> > using Needleman Wunsch and Smith Waterman, Secondary >> > Structure prediction, >> > Henikoff's weight and am currently working on Monte Carlo >> > simulation. >> > Overall, I have started to like this field and want to carry >> > my interest >> > forward by pursuing a relevant project for GSOC 2014. I >> > mainly code in C >> > and python and would like to start contributing to the >> > Biopython library. 
I >> > started going through the official contribution wiki page ( >> > http://biopython.org/wiki/Contributing) >> > >> > I also went through the wiki page of Bio.SeqIO. I >> > seriously want to >> > contribute to the Biopython library through GSOC. What do I >> > do next? >> > >> > Thanks >> > -- >> > >> > Harsh Beria, >> > Indian Institute of Technology,Kharagpur >> > E-mail: harsh.beria93 at gmail.com >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > > > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > > Ph: +919332157616 > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From mjldehoon at yahoo.com Tue Mar 4 05:40:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 4 Mar 2014 02:40:52 -0800 (PST) Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> I would suggest to implement this in C, with a thin wrapper in Python. Using 3rd-party libraries would increase the compile-time dependencies of Biopython. Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. Best, -Michiel. 
-------------------------------------------- On Fri, 2/28/14, Fields, Christopher J wrote: Subject: Re: [Biopython-dev] Gsoc 2014 aspirant To: "Michiel de Hoon" Cc: "biopython-dev at lists.open-bio.org" , "Harsh Beria" Date: Friday, February 28, 2014, 12:45 PM I'm wondering, with something that is as broadly applicable as pairwise alignment, would it be better to implement only in Python (or implement in Python wedded to a C backend)? Or maybe set up something in python that taps into an already well-defined C/C++ library that does this? The reason I mention this: with bioperl we went down this route with bioperl-ext a long time ago (these are generally C-based backend tools with a perl front-end), that bit-rotted simply b/c there were other more maintainable options. IIUC from this post, similar issues re: maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is entirely possible). However, tools like pysam and Bio::DB::Samtools (on the perl end) seem to have been maintained much more readily since they tap into a common library. For instance, my suggestion would be to implement a Biopython tool that does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a generic python front-end that allows users to pick the tool/method for the alignment, with maybe a library binding as an initial implementation. chris On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: > Hi Harsh Beria, > > One option is to work on pairwise sequence alignments. Currently there is some code for that in Biopython (in Bio/pairwise2.py), but it is not general and is not being maintained. This may need to be rebuilt from the ground up. > > Best, > -Michiel. 
> > -------------------------------------------- > On Wed, 2/26/14, Harsh Beria wrote: > > Subject: [Biopython-dev] Gsoc 2014 aspirant > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, gsoc at lists.open-bio.org > Date: Wednesday, February 26, 2014, 11:14 AM > > Hi, > > I am Harsh Beria, third year UG student at Indian > Institute of > Technology, Kharagpur. I have started working in > Computational Biophysics > recently, having written code for pdb to fasta parser, > sequence alignment > using Needleman Wunsch and Smith Waterman, Secondary > Structure prediction, > Henikoff's weight and am currently working on Monte Carlo > simulation. > Overall, I have started to like this field and want to carry > my interest > forward by pursuing a relevant project for GSOC 2014. I > mainly code in C > and python and would like to start contributing to the > Biopython library. I > started going through the official contribution wiki page ( > http://biopython.org/wiki/Contributing) > > I also went through the wiki page of Bio.SeqIO. I > seriously want to > contribute to the Biopython library through GSOC. What do I > do next? 
> > Thanks > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 4 07:32:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 12:32:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: > I would suggest to implement this in C, with a thin wrapper in Python. > Using 3rd-party libraries would increase the compile-time dependencies of Biopython. > Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. > Best, > -Michiel. I would also consider a pure Python implementation on top, both for cross-testing, but also for use under Jython or PyPy where using the C code wouldn't be possible (or at least, becomes more complicated). (This is what the existing Bio.pairwise2 module does) Adding third party C libraries would also make life hard for cross platform testing (Linux, Mac, Windows). 
Peter From cjfields at illinois.edu Tue Mar 4 15:25:47 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 4 Mar 2014 20:25:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mar 4, 2014, at 6:32 AM, Peter Cock wrote: > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: >> I would suggest to implement this in C, with a thin wrapper in Python. >> Using 3rd-party libraries would increase the compile-time dependencies of Biopython. >> Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. >> Best, >> -Michiel. > > I would also consider a pure Python implementation on top, > both for cross-testing, but also for use under Jython or PyPy > where using the C code wouldn't be possible (or at least, > becomes more complicated). > > (This is what the existing Bio.pairwise2 module does) Ah, so it's pure python. Makes sense to have it for that purpose. You could simply repurpose the existing code. > Adding third party C libraries would also make life hard for > cross platform testing (Linux, Mac, Windows). > > Peter This is a problem with bioinformatics tools in general; they simply aren't Windows-friendly. However, one can write code with portability in mind (even C/C++). chris From p.j.a.cock at googlemail.com Tue Mar 4 16:45:09 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 21:45:09 +0000 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tuesday, March 4, 2014, Fields, Christopher J wrote: > On Mar 4, 2014, at 6:32 AM, Peter Cock > > wrote: > > > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon > > wrote: > >> I would suggest to implement this in C, with a thin wrapper in Python. 
> >> Using 3rd-party libraries would increase the compile-time dependencies > of Biopython. > >> Anyway I expect that the tricky part will be the design of the module, > rather than the algorithms themselves, so using 3rd-party libraries > wouldn't help us so much. > >> Best, > >> -Michiel. > > > > I would also consider a pure Python implementation on top, > > both for cross-testing, but also for use under Jython or PyPy > > where using the C code wouldn't be possible (or at least, > > becomes more complicated). > > > > (This is what the existing Bio.pairwise2 module does) > > Ah, so it's pure python. Makes sense to have it for that purpose. You > could simply repurpose the existing code. > > Apologies if unclear - Biopython has both a C and pure Python version of pairwise2 - although most of our bits of C code don't have a fallback and so break under Jython or PyPy etc. Personally I am optimistic about the potential of PyPy to speed up most Python code with its JIT so am a little wary of adding more C code (which may act as a barrier to entry for future maintainers) without a matching Python implementation - but appreciate that for typical C Python this is often the best way to attain high performance. But Michiel is absolutely right - the algorithm choice is even more important. > > Adding third party C libraries would also make life hard for > > cross platform testing (Linux, Mac, Windows). > > > > Peter > > This is a problem with bioinformatics tools in general; they simply aren't > Windows-friendly. However, one can write code with portability in mind > (even C/C++). > > chris Yes indeed - this is one reason why the buildbot for automated cross-platform testing is really helpful (since few if currently any of the Biopython developers use Windows as their primary system). 
Peter From nigel.delaney at outlook.com Tue Mar 4 17:39:04 2014 From: nigel.delaney at outlook.com (Nigel Delaney) Date: Tue, 4 Mar 2014 17:39:04 -0500 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. 
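[The dynamic programming approach discussed in this thread fits in a few dozen lines of pure Python. The sketch below is illustrative only — a simple Needleman-Wunsch with a linear gap penalty and made-up default scores, not the Bio.pairwise2 API:]

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment by dynamic programming (Needleman-Wunsch).

    Returns (score, aligned_a, aligned_b). Linear gap penalty only;
    the default scoring parameters are illustrative, not Biopython's.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

[With match=1, mismatch=-1, gap=-1 the classic GCATGCU vs GATTACA example scores 0. A C rewrite would keep something like this as the pure-Python reference for cross-testing under Jython or PyPy, per Peter's suggestion above.]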
From mjldehoon at yahoo.com Wed Mar 5 20:49:49 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 5 Mar 2014 17:49:49 -0800 (PST) Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Hi Nigel, While compiling Biopython on Windows can be tricky, in my experience it has been easy to compile the C libraries in Biopython on other platforms (Unix/Linux/MacOSX). Have you run into specific problems compiling Biopython? I would think that wrapping 3rd-party libraries or executables is much more error-prone. Best, -Michiel. -------------------------------------------- On Tue, 3/4/14, Nigel Delaney wrote: Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant To: "'Peter Cock'" , "'Fields, Christopher J'" Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" Date: Tuesday, March 4, 2014, 5:39 PM As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) 
are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From harsh.beria93 at gmail.com Fri Mar 7 18:41:53 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Sat, 8 Mar 2014 05:11:53 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, Regarding the algorithm part of Pairwise Sequence Alignment, I can use Dynamic Programming (Smith Waterman for local and Needleman Wunsch for Global Alignment). Please suggest if I should go for dynamic programming. Also, the above discussion points out that the implementation should be purely python based for cross-platform compatibility. On Thu, Mar 6, 2014 at 7:19 AM, Michiel de Hoon wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. 
> > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. 
> _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From tra at popgen.net Mon Mar 10 13:02:17 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:02:17 +0000 Subject: [Biopython-dev] Installing all biopython dependencies Message-ID: <20140310170217.048f00a1@lnx> Hi, I am trying to create an easy-to-install, easy-to-replicate Virtual Machine(*) with all the requirements for Biopython. The idea is mainly to make it easy to have reliable testing, but it can also be used as a very fast installation of Biopython. The VM is currently based on Ubuntu saucy, and I am trying to make sure all the dependencies are met. I would like some advice on the following please: EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency problems, so I guess this requires a manual download/install? reportlab - What is the best way to get the fonts? XXmotif - What is this??? PAML - There seemed to be a ubuntu package, but no more? The following packages require manual installation (no ubuntu package), please correct me if I am wrong (makes my life easier)... 
DSSP Dialign msaprobs NACCESS Prank Probcons TCoffee (*) Actually I am building a docker container, but for ease of explanation it is similar to the more familiar Virtual Machine concept Thanks, Tiago From p.j.a.cock at googlemail.com Mon Mar 10 13:20:43 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 10 Mar 2014 17:20:43 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: On Mon, Mar 10, 2014 at 5:02 PM, Tiago Antao wrote: > Hi, > > I am trying to create an easy-to-install, easy-to-replicate Virtual > Machine(*) with all the requirements for Biopython. The idea is mainly > to make it easy to have reliable testing, but it can also be used as a > very fast installation of Biopython. Sounds good :) > The VM is currently based on Ubuntu saucy, and I am trying to > make sure all the dependencies are met. Some of this would apply to the TravisCI VM, which is also Debian/Ubuntu based. There we have to balance total run time (install everything & run tests) against full coverage. https://travis-ci.org/biopython/biopython/builds It would be neat to have an instance of your docker based VM running as a buildslave too... http://testing.open-bio.org/biopython/tgrid > I would like some advice on the following please: > > EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency > problems, so I guess this requires a manual download/install? > reportlab - What is the best way to get the fonts? > XXmotif - What is this??? > PAML - There seemed to be a ubuntu package, but no more? > > > The following packages require manual installation (no ubuntu > package), please correct me if I am wrong (makes my life easier)... > > DSSP > Dialign > msaprobs > NACCESS > Prank > Probcons > TCoffee For TravisCI we install a Debian/Ubuntu package for t-coffee, so at least that ought to be easy. e.g. 
https://packages.debian.org/sid/t-coffee Others (where the licence permits) we can request DebianMed/ BioLinux look at for packaging... Peter From anaryin at gmail.com Mon Mar 10 13:20:37 2014 From: anaryin at gmail.com (João Rodrigues) Date: Mon, 10 Mar 2014 18:20:37 +0100 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: Hi Tiago, For DSSP and NACCESS, you need a manual installation. DSSP is publicly available (binaries): ftp://ftp.cmbi.ru.nl/pub/software/dssp/ NACCESS is more complicated.. you need a license to get it and g77 installed to compile. You might have to contact the authors to allow such a broad distribution.. Cheers, João From tra at popgen.net Mon Mar 10 13:28:54 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:28:54 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310172854.07b0c1df@lnx> On Mon, 10 Mar 2014 18:20:37 +0100 João Rodrigues wrote: > NACCESS is more complicated.. you need a license to get it and g77 > installed to compile. You might have to contact the authors to allow > such a broad distribution.. Thanks, I might skip NACCESS at this stage. From tra at popgen.net Mon Mar 10 13:33:20 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:33:20 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310173320.228a7866@lnx> On Mon, 10 Mar 2014 17:20:43 +0000 Peter Cock wrote: > It would be neat to have an instance of your docker based > VM running as a buildslave too... > http://testing.open-bio.org/biopython/tgrid That was my original objective, which I have split into two: 1. A biopython docker container 2. 
A buildbot docker container for biopython (a different kind of beast) And then research how this might integrate with BioCloudLinux. As an aside, I have to say that using docker is progressing quite well and it seems a very interesting platform for deployment and testing. > Others (where the licence permits) we can request DebianMed/ > BioLinux look at for packaging... From the problematic list, I will gather a list of software whose license permits packaging and report back on this. Tiago From tra at popgen.net Tue Mar 11 08:04:13 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 12:04:13 +0000 Subject: [Biopython-dev] test_Fasttree_tool Message-ID: <20140311120413.73bbac2e@lnx> Hi, When I run test_Fasttree_tool standalone, all goes well. But if I run it through run_tests.py I get this: ====================================================================== FAIL: runTest (__main__.ComparisonTestCase) test_Fasttree_tool ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 302, in runTest self.fail("Warning: Can't open %s for test %s" % (outputfile, self.name)) AssertionError: Warning: Can't open ./output/test_Fasttree_tool for test test_Fasttree_tool ---------------------------------------------------------------------- Any ideas? 
Thanks, T From harijay at gmail.com Tue Mar 11 09:37:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 09:37:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Hi all, I just pull-ed from the git repository just now and after installing the newest numpy and scipy ( also from their respective git repos)..when I try to install biopython I get the same error complaining that I need to define : #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] I tried adding to file "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" the following line #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION But it still fails to install with an error as indicated below. I am sorry I don't know how to work around this. Thanks for your help Hari ################# error message ################# In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] #warning "Using deprecated NumPy API, disable it by #defining ... ^ /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:23:2: error: Should never include npy_deprecated_api directly. 
^ In file included from Bio/Cluster/clustermodule.c:3: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:126: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h:6:2: error: The header "old_defines.h" is deprecated as of NumPy 1.7. #error The header "old_defines.h" is deprecated as of NumPy 1.7. ^ 1 warning and 2 errors generated. error: command 'cc' failed with exit status 1 On Thu, Dec 26, 2013 at 5:28 AM, Michiel de Hoon wrote: > Fixed; please let us know if you encounter any problems. > > -Michiel. > > > > -------------------------------------------- > On Mon, 9/23/13, Peter Cock wrote: > > Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings > To: "Biopython-Dev Mailing List" > Date: Monday, September 23, 2013, 4:58 PM > > Hi all, > > I'm seeing the following warning from NumPy 1.7 with Python > 3.3 on Mac > OS X, and on Linux too. 
I believe the NumPy version is the > critical > factor: > > building 'Bio.Cluster.cluster' extension > building 'Bio.KDTree._CKDTree' extension > building 'Bio.Motif._pwm' extension > building 'Bio.motifs._pwm' extension > > all give: > > > /Users/peterjc/lib/python3.3/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: > warning: "Using > deprecated NumPy API, disable it by > #defining > NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] > > According to this page, > http://docs.scipy.org/doc/numpy-dev/reference/c-api.deprecations.html > > If we add this line it should confirm our code is clean for > NumPy 1.7 > (and implies to side effects on older NumPy): > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > Unfortunately that seems all four modules have problems > doing > that, presumably planned NumPy C API changes we need to > handle via a version conditional #ifdef? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Tue Mar 11 09:42:55 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:42:55 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > Hi all, > I just pull-ed from the git repository just now and after installing the > newest numpy and scipy ( also from their respective git repos)..when I try > to install biopython I get the same error complaining that I need to define > : > > #defining NPY_NO_DEPRECATED_API > NPY_1_7_API_VERSION" [-W#warnings] > > I tried adding to file > 
"/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > the following line > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > But it still fails to install with an error as indicated below. > > I am sorry I don't know how to work around this. > Thanks for your help > > Hari I suspect based on this NumPy thread that it is a problem with your NumPy install, perhaps you have some old files from a previous NumPy installation which are confusing things? http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html Peter From p.j.a.cock at googlemail.com Tue Mar 11 09:52:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:52:47 +0000 Subject: [Biopython-dev] test_Fasttree_tool In-Reply-To: <20140311120413.73bbac2e@lnx> References: <20140311120413.73bbac2e@lnx> Message-ID: On Tue, Mar 11, 2014 at 12:04 PM, Tiago Antao wrote: > Hi, > > When I run test_Fasttree_tool standalone, all goes well. But if I run > it through run_tests.py I get this: > ====================================================================== > FAIL: runTest (__main__.ComparisonTestCase) > test_Fasttree_tool > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "run_tests.py", line 302, in runTest > self.fail("Warning: Can't open %s for test %s" % (outputfile, > self.name)) AssertionError: Warning: Can't > open ./output/test_Fasttree_tool for test test_Fasttree_tool > > ---------------------------------------------------------------------- > > > Any ideas? 
> > Thanks, > T I'm surprised it ever works - the expected output file is not in git :( Try: $ run_tests.py -g test_Fasttree_tool $ more output/test_Fasttree_tool $ git add output/test_Fasttree_tool $ git commit -m "Checking in missing output file for test_Fasttree_tool.py" Peter From tra at popgen.net Tue Mar 11 10:38:53 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 14:38:53 +0000 Subject: [Biopython-dev] A docker container for Biopython Message-ID: <20140311143853.054f89fe@lnx> Hi, In an effort to have a complete, reliable and easy to replicate testing platform for Biopython I am in the process of creating a docker container (inspired by Brad's CloudBioLinux work) with everything needed for Biopython. I currently have a container that allows easy installation of Biopython. I have documented the process here: http://fe.popgen.net/2014/03/a-docker-container-for-biopython/ A few points: 1. A few applications still missing, not many 2. The fasttree test case is still failing 3. Database servers are included 4. This can be used to do a very fast deploy of Biopython (teaching, demo, etc...) 5. The container to test biopython (buildbot based) will be a different one (and probably only of interest to Peter and me ;) ) This is my first container, problems & suggestions most welcome! Tiago From harijay at gmail.com Tue Mar 11 20:17:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 20:17:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Thanks Peter .. That was indeed the case. I had a python in /usr/local/lib/python2.7/site-packages/numpy That was getting called rather than the one in my .virtualenv Once I removed that python . 
The install progressed very smoothly Thanks for your help Hari On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock wrote: > On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > > Hi all, > > I just pull-ed from the git repository just now and after installing the > > newest numpy and scipy ( also from their respective git repos)..when I > try > > to install biopython I get the same error complaining that I need to > define > > : > > > > #defining NPY_NO_DEPRECATED_API > > NPY_1_7_API_VERSION" [-W#warnings] > > > > I tried adding to file > > > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > > the following line > > > > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > > > > But it still fails to install with an error as indicated below. > > > > I am sorry I dont know how to work around this. > > Thanks for your help > > > > Hari > > I suspect based on this NumPy thread that it is a problem with > your NumPy install, perhaps you have some old files from a > previous NumPy installation which are confusing things? > > http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html > > Peter > From p.j.a.cock at googlemail.com Tue Mar 11 20:25:33 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 00:25:33 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Great - thanks for letting us know that solved the problem :) Peter On Wed, Mar 12, 2014 at 12:17 AM, hari jayaram wrote: > Thanks Peter .. > That was indeed the case. I had a python in > > /usr/local/lib/python2.7/site-packages/numpy > > That was getting called rather than the one in my .virtualenv > > Once I removed that python . 
The install progressed very smoothly > > Thanks for your help > > Hari > > > > On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock > wrote: >> >> On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: >> > Hi all, >> > I just pull-ed from the git repository just now and after installing >> > the >> > newest numpy and scipy ( also from their respective git repos)..when I >> > try >> > to install biopython I get the same error complaining that I need to >> > define >> > : >> > >> > #defining NPY_NO_DEPRECATED_API >> > NPY_1_7_API_VERSION" [-W#warnings] >> > >> > I tried adding to file >> > >> > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" >> > the following line >> > >> > >> > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION >> > >> > >> > But it still fails to install with an error as indicated below. >> > >> > I am sorry I dont know how to work around this. >> > Thanks for your help >> > >> > Hari >> >> I suspect based on this NumPy thread that it is a problem with >> your NumPy install, perhaps you have some old files from a >> previous NumPy installation which are confusing things? >> >> http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html >> >> Peter > > From p.j.a.cock at googlemail.com Wed Mar 12 05:48:44 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:48:44 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 Message-ID: Hi all, I installed the Xcode 5.1 update last night on my work Mac, and this seems to have broken the builds on Python 2.6 and 2.7 (run via builtbot). http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio ... 
running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.6 creating build/temp.macosx-10.9-intel-2.6/Bio cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio ... running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.7 creating build/temp.macosx-10.9-intel-2.7/Bio cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 This looks like a problem where distutils is using a gcc argument which cc (clang) used to ignore but now treats as an error. 
There will probably be similar reports on other Python projects as well... Peter From p.j.a.cock at googlemail.com Wed Mar 12 05:59:31 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:59:31 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: > Hi all, > > I installed the Xcode 5.1 update last night on my work Mac, and > this seems to have broken the builds on Python 2.6 and 2.7 > (run via builtbot). > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > ... > running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.6 > creating build/temp.macosx-10.9-intel-2.6/Bio > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > ... 
> running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.7 > creating build/temp.macosx-10.9-intel-2.7/Bio > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > This looks like a problem where distutils is using a gcc argument > which cc (clang) used to ignore but not treats as an error. There > will probably be similar reports on other Python projects as well... > > Peter This looks relevant, especially this reply from Paul Kehrer which suggests this is entirely Apple's fault for shipping a Python and clang compiler which don't get along with the default settings: http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure The suggested workaround seems to do the trick, $ export CFLAGS=-Qunused-arguments $ export CPPFLAGS=-Qunused-arguments Perhaps we can add this hack to our setup.py on Mac OS X... it seems harmless under gcc (e.g. my locally compiled version of Python 3.3 used gcc rather than clang)? Or it could be done via the buildbot setup, or on this buildslave directly (e.g. the ~/.bash_profile). What are folks' thoughts on this? We want it to remain easy to install Biopython from source under Mac OS X. 
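A minimal sketch of how such a guard might look near the top of setup.py (assuming the environment-variable approach from the Stack Overflow thread; the helper name is illustrative, not actual Biopython code):

```python
import sys

def add_clang_workaround(environ, platform=sys.platform):
    """Append -Qunused-arguments to CFLAGS/CPPFLAGS on Mac OS X.

    Apple's clang now rejects the '-mno-fused-madd' flag that
    distutils copies from the system Python's build settings;
    -Qunused-arguments makes clang silently ignore unknown
    arguments (reportedly harmless under gcc per the thread above).
    """
    if not platform.startswith("darwin"):
        return environ  # only needed for Apple's clang
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            environ[var] = (flags + " -Qunused-arguments").strip()
    return environ

# e.g. in setup.py, before setup() is called:
# import os
# add_clang_workaround(os.environ)
```

Passing the environment mapping in explicitly (rather than mutating os.environ directly) keeps the helper testable and leaves non-Mac builds untouched.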
Peter From tra at popgen.net Wed Mar 12 09:09:20 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:09:20 +0000 Subject: [Biopython-dev] Logging in on the wiki Message-ID: <20140312130920.701a656e@lnx> Hi, Are people able to log in on the wiki? I am getting back a page with: " Google Error: invalid_request Error in parsing the OpenID auth request. Learn more" Maybe its a google thing, but it might be on our side? Tiago From w.arindrarto at gmail.com Wed Mar 12 09:15:45 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 12 Mar 2014 14:15:45 +0100 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: <20140312130920.701a656e@lnx> References: <20140312130920.701a656e@lnx> Message-ID: Hi Tiago, I can log in using my Google OpenID. Best, Bow On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: > Hi, > > Are people able to log in on the wiki? I am getting back a page with: > > " > Google > Error: invalid_request > Error in parsing the OpenID auth request. > Learn more" > > Maybe its a google thing, but it might be on our side? > > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 12 09:19:45 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 13:19:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: I too can log in to the wiki with my Google OpenID. (Probably unrelated, we had to restart MySQL on the server earlier this week) Peter On Wed, Mar 12, 2014 at 1:15 PM, Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > > Best, > Bow > > On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: >> Hi, >> >> Are people able to log in on the wiki? 
I am getting back a page with: >> >> " >> Google >> Error: invalid_request >> Error in parsing the OpenID auth request. >> Learn more" >> >> Maybe its a google thing, but it might be on our side? >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From tra at popgen.net Wed Mar 12 09:23:45 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:23:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: <20140312132345.3b577a47@lnx> On Wed, 12 Mar 2014 14:15:45 +0100 Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > Thanks. I also can login now. I suppose it was something temporary on the google side... Tiago From cjfields at illinois.edu Wed Mar 12 10:46:15 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Mar 2014 14:46:15 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Mar 12, 2014, at 4:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: >> Hi all, >> >> I installed the Xcode 5.1 update last night on my work Mac, and >> this seems to have broken the builds on Python 2.6 and 2.7 >> (run via builtbot). >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.6 >> creating build/temp.macosx-10.9-intel-2.6/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common >> -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX >> -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g >> -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 >> -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.7 >> creating build/temp.macosx-10.9-intel-2.7/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 >> -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd >> -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes >> -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes >> -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> This looks like a problem where distutils is using a gcc argument >> which cc (clang) used to ignore but not treats as an error. There >> will probably be similar reports on other Python projects as well... >> >> Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. 
> Peter That's scary. I planned on updating to the latest Xcode myself today, nice to be forewarned. I've been seeing clang complaints with various tools already, so I wouldn't be surprised if this problem is more widespread than Python. chris From tra at popgen.net Wed Mar 12 11:10:48 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 15:10:48 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container Message-ID: <20140312151048.45066ade@lnx> Hi, I have a docker container ready (save for a few applications). Simple usage instructions: 1. Create a directory and download this file inside it: https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test 2. Rename it Dockerfile (capital D) 3. Get a buildbot username and password (from Peter or me), edit the file and replace CHANGEUSER CHANGEPASS 4. do docker build -t biopython-buildbot . 5. do docker run biopython-buildbot Beta-version, comments appreciated ;) If people like this, I will amend the Continuous Integration page on the wiki accordingly Tiago From eparker at ucdavis.edu Wed Mar 12 20:06:51 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 12 Mar 2014 17:06:51 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here Message-ID: Hello, My name is Evan Parker; I am a third-year graduate student studying analytical chemistry at UC Davis. Coding was my hobby in undergrad and has become a major component of my current graduate work in the context of mass-spectral interpretation software. I use Biopython for parsing Uniprot sequence data/annotations and I would be delighted to have the opportunity to give back, especially under the umbrella of the Google Summer of Code. The project on implementing an indexing & lazy-loading sequence parser looks interesting to me and, while difficult, it is something that I could wrap my mind around. 
I apologize in advance for the wall of text but if you have the time I'd like to ask a couple of questions relating to implementation as I prepare my proposal. 1) Should the lazy loading be done primarily in the context of records returned from the SeqIO.index() dict-like object, or should the lazy loading be available to the generator made by SeqIO.parse()? The project idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems to me that the best implementation of lazy loading in these two SeqIO functions would be significantly different. My initial impression of the project would be for SeqIO.parse() to stage a file segment and selectively generate information when called while SeqIO.index() would use a more detailed map created at instantiation to pull information selectively. 2) Is slower instantiation an acceptable trade-off for memory efficiency? In the current implementation of SeqIO.index(), sequence files are read twice, once to annotate beginning points of entries and a second time to load the SeqRecord requested by __getitem__(). A lazy-loading parser could amplify this issue if it works by indexing locations other than the start of the record. The alternative approach of passing the complete textual sequence record and selectively parsing would be easier to implement (and would include dual compatibility with parse and index) but it seems that it would be slower when called and potentially less memory efficient. Any of your thoughts and comments are appreciated, - Evan From w.arindrarto at gmail.com Thu Mar 13 05:04:16 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 13 Mar 2014 10:04:16 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Thank you for your interest in the project :). It's good to know you're already quite familiar with SeqIO as well. My replies are below. 
> 1) Should the lazy loading be done primarily in the context of records > returned from the SeqIO.index() dict-like object, or should the lazy > loading be available to the generator made by SeqIO.parse()? The project > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems > to me that the best implementation of lazy loading in these two SeqIO > functions would be significantly different. My initial impression of the > project would be for SeqIO.parse() to stage a file segment and selectively > generate information when called while SeqIO.index() would use a more > detailed map created at instantiation to pull information selectively. We don't necessarily have to be restricted to SeqIO.index() objects here. You'll notice of course that SeqIO.index() indexes complete records without granularity up to the possible subsequences. What we're looking for is compatibility with our existing SeqIO parsers. The lazy parser may well be a new object implemented alongside SeqIO, but the parsing logic itself (the one whose invocation is delayed by the lazy parser) should rely on existing parsers. > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > In the current implementation of SeqIO.index(), sequence files are read > twice, once to annotate beginning points of entries and a second time to > load the SeqRecord requested by __getitem__(). A lazy-loading parser could > amplify this issue if it works by indexing locations other than the start > of the record. The alternative approach of passing the complete textual > sequence record and selectively parsing would be easier to implement (and > would include dual compatibility with parse and index) but it seems that it > would be slower when called and potentially less memory efficient. I think this will depend on what you want to store in the indices and how you store them, which will most likely differ per sequencing file format. 
Coming up with this, we expect, is an important part of the project implementation. Doing a first pass for indexing is acceptable. Instantiation of the object using the index doesn't necessarily have to be slow. Retrieval of the actual (sub)sequence will be slower since we will touch the disk and do the actual parsing by then. But this can also be improved, perhaps by caching the result so subsequent retrieval is faster. One important point (and the use case that we envision for this project) is that subsequences in large sequence files (genome assemblies, for example) can be retrieved quite quickly. Take a look at some existing indexing implementations, such as faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] tool may also help. The faidx indexing, for example, relies on the FASTA file having the same line length, which means it can be used to retrieve subsequences given only the file offset of a FASTA record. Hope this gives you some useful hints. Good luck with your proposal :). Cheers, Bow [1] http://samtools.sourceforge.net/samtools.shtml [2] http://samtools.github.io/hts-specs/SAMv1.pdf [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 From eparker at ucdavis.edu Thu Mar 13 15:04:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Thu, 13 Mar 2014 12:04:34 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Thank you Bow, I'll need to digest this a bit, but you have given me a good direction. My inclination for the proposal is to focus on sequential file formats used to transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and others) and to mostly ignore formats used to convey alignment (ie. anything covered exclusively by parsers in AlignIO). If this is a poor direction please tell me so that I can add to my preparation. -Evan Evan Parker Ph.D. Candidate Dept. 
of Chemistry - Lebrilla Lab University of California, Davis On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto wrote: > Hi Evan, > > Thank you for your interest in the project :). It's good to know > you're already quite familiar with SeqIO as well. > > My replies are below. > > > 1) Should the lazy loading be done primarily in the context of records > > returned from the SeqIO.index() dict-like object, or should the lazy > > loading be available to the generator made by SeqIO.parse()? The project > > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it > seems > > to me that the best implementation of lazy loading in these two SeqIO > > functions would be significantly different. My initial impression of the > > project would be for SeqIO.parse() to stage a file segment and > selectively > > generate information when called while SeqIO.index() would use a more > > detailed map created at instantiation to pull information selectively. > > We don't necessarily have to be restricted to SeqIO.index() objects > here. You'll notice of course that SeqIO.index() indexes complete > records without granularity up to the possible subsequences. What > we're looking for is compatibility with our existing SeqIO parsers. > The lazy parser may well be a new object implemented alongside SeqIO, > but the parsing logic itself (the one whose invocation is delayed by > the lazy parser) should rely on existing parsers. > > > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > > In the current implementation of SeqIO.index(), sequence files are read > > twice, once to annotate beginning points of entries and a second time to > > load the SeqRecord requested by __getitem__(). A lazy-loading parser > could > > amplify this issue if it works by indexing locations other than the start > > of the record. 
The alternative approach of passing the complete textual > > sequence record and selectively parsing would be easier to implement (and > > would include dual compatibility with parse and index) but it seems that > it > > would be slower when called and potentially less memory efficient. > > I think this will depend on what you want to store in the indices and > how you store them, which will most likely differ per sequencing file > format. Coming up with this, we expect, is an important part of the > project implementation. Doing a first pass for indexing is acceptable. > Instantiation of the object using the index doesn't necessarily have > to be slow. Retrieval of the actual (sub)sequence will be slower since > we will touch the disk and do the actual parsing by then. But this can > also be improved, perhaps by caching the result so subsequent > retrieval is faster. One important point (and the use case that we > envision for this project) is that subsequences in large sequence > files (genome assemblies, for example) can be retrieved quite quickly. > > Take a look at some existing indexing implementations, such as > faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] > tool may also help. The faidx indexing, for example, relies on the > FASTA file having the same line length, which means it can be used to > retrieve subsequences given only the file offset of a FASTA record. > > Hope this gives you some useful hints. Good luck with your proposal :). > > Cheers, > Bow > > [1] http://samtools.sourceforge.net/samtools.shtml > [2] http://samtools.github.io/hts-specs/SAMv1.pdf > [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 > From w.arindrarto at gmail.com Fri Mar 14 01:30:13 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 14 Mar 2014 06:30:13 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Focusing on the SeqIO parsers is ok. 
That's where having lazy parsers would help most (and you've got a handful of formats there already). Remember that you'll also need to account for time to write tests, possibly benchmark or profile the code (lazy parsers should improve performance after all), and write documentation, outside of writing the code itself. You'll also want to be clear about this in your proposed timeline, since that will be your main guide during the coding period. Looking forward to reading your proposal :), Bow On Thu, Mar 13, 2014 at 8:04 PM, Evan Parker wrote: > Thank you Bow, > > I'll need to digest this a bit, but you have given me a good direction. My > inclination for the proposal is to focus on sequential file formats used to > transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and > others) and to mostly ignore formats used to convey alignment (ie. anything > covered exclusively by parsers in AlignIO). If this is a poor direction > please tell me so that I can add to my preparation. > > -Evan > > Evan Parker > Ph.D. Candidate > Dept. of Chemistry - Lebrilla Lab > University of California, Davis > > > On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto > wrote: >> >> Hi Evan, >> >> Thank you for your interest in the project :). It's good to know >> you're already quite familiar with SeqIO as well. >> >> My replies are below. >> >> > 1) Should the lazy loading be done primarily in the context of records >> > returned from the SeqIO.index() dict-like object, or should the lazy >> > loading be available to the generator made by SeqIO.parse()? The project >> > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it >> > seems >> > to me that the best implementation of lazy loading in these two SeqIO >> > functions would be significantly different. 
My initial impression of the >> > project would be for SeqIO.parse() to stage a file segment and >> > selectively >> > generate information when called while SeqIO.index() would use a more >> > detailed map created at instantiation to pull information selectively. >> >> We don't necessarily have to be restricted to SeqIO.index() objects >> here. You'll notice of course that SeqIO.index() indexes complete >> records without granularity up to the possible subsequences. What >> we're looking for is compatibility with our existing SeqIO parsers. >> The lazy parser may well be a new object implemented alongside SeqIO, >> but the parsing logic itself (the one whose invocation is delayed by >> the lazy parser) should rely on existing parsers. >> >> > 2) Is slower instantiation an acceptable trade-off for memory >> > efficiency? >> > In the current implementation of SeqIO.index(), sequence files are read >> > twice, once to annotate beginning points of entries and a second time to >> > load the SeqRecord requested by __getitem__(). A lazy-loading parser >> > could >> > amplify this issue if it works by indexing locations other than the >> > start >> > of the record. The alternative approach of passing the complete textual >> > sequence record and selectively parsing would be easier to implement >> > (and >> > would include dual compatibility with parse and index) but it seems that >> > it >> > would be slower when called and potentially less memory efficient. >> >> I think this will depend on what you want to store in the indices and >> how you store them, which will most likely differ per sequencing file >> format. Coming up with this, we expect, is an important part of the >> project implementation. Doing a first pass for indexing is acceptable. >> Instantiation of the object using the index doesn't necessarily have >> to be slow. Retrieval of the actual (sub)sequence will be slower since >> we will touch the disk and do the actual parsing by then. 
But this can >> also be improved, perhaps by caching the result so subsequent >> retrieval is faster. One important point (and the use case that we >> envision for this project) is that subsequences in large sequence >> files (genome assemblies, for example) can be retrieved quite quickly. >> >> Take a look at some existing indexing implementations, such as >> faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] >> tool may also help. The faidx indexing, for example, relies on the >> FASTA file having the same line length, which means it can be used to >> retrieve subsequences given only the file offset of a FASTA record. >> >> Hope this gives you some useful hints. Good luck with your proposal :). >> >> Cheers, >> Bow >> >> [1] http://samtools.sourceforge.net/samtools.shtml >> [2] http://samtools.github.io/hts-specs/SAMv1.pdf >> [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 > > From p.j.a.cock at googlemail.com Fri Mar 14 09:34:40 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 14 Mar 2014 13:34:40 +0000 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: On Fri, Mar 14, 2014 at 5:30 AM, Wibowo Arindrarto wrote: > Hi Evan, > > Focusing on the SeqIO parsers is ok. That's where having lazy parsers > would help most (and you've got a handful of formats there already). > Remember that you'll also need to account for time to write tests, > possibly benchmark or profile the code (lazy parsers should improve > performance after all), and write documentation, outside of writing > the code itself. You'll also want to be clear about this in your > proposed timeline, since that will be your main guide during the > coding period. > > Looking forward to reading your proposal :), > Bow Yes, profiling will be important here - if your script accesses all the annotation/sequence/etc of a record, then the lazy parser will probably be slower (all the same work, plus an overhead). 
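The trade-off Peter describes (lazy parsing only wins when a subset of the record is touched) is easy to see in a toy deferred-parsing object. The names below are invented for illustration and are not Bio.SeqIO's design:

```python
# Toy deferred-parsing record (hypothetical names, not part of Bio.SeqIO).
# The raw text is stored up front; the expensive work happens only on
# first attribute access, and is cached afterwards.

class LazyRecord:
    def __init__(self, raw):
        self._raw = raw    # complete textual record, kept verbatim
        self._seq = None   # parsed lazily, then cached

    @property
    def id(self):
        # Cheap: the ID sits on the first line, after the ">" marker.
        return self._raw.splitlines()[0][1:].split()[0]

    @property
    def seq(self):
        if self._seq is None:  # parse once, reuse on later accesses
            self._seq = "".join(self._raw.splitlines()[1:])
        return self._seq

record = LazyRecord(">demo some description\nACGT\nACGT\n")
```

A script that reads only `record.id` never pays for assembling the sequence; one that touches every attribute does all the original parsing work plus the bookkeeping overhead, which is the case where profiling should show the lazy parser losing.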
It should win when only a subset of the data is needed, both in terms of speed and memory usage. Peter From eric.talevich at gmail.com Sat Mar 15 01:29:21 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 14 Mar 2014 22:29:21 -0700 Subject: [Biopython-dev] Google Summer of Code 2014: Call for student applications Message-ID: Hi everyone, Google Summer of Code is an annual program that funds students all over the world to work with open-source software projects to develop new code. This summer, the Open Bioinformatics Foundation (OBF) is taking on students through the Google Summer of Code program to work with mentors on established bioinformatics software projects including BioPython. We invite students to submit applications by Friday, March 21. Full details are here: http://news.open-bio.org/news/2014/03/obf-gsoc-2014-call-for-student-applications/ All the best, Eric & Raoul OBF GSoC organization admins From arklenna at gmail.com Sun Mar 16 16:53:22 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Sun, 16 Mar 2014 16:53:22 -0400 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock > wrote: > > Hi all, > > > > I installed the Xcode 5.1 update last night on my work Mac, and > > this seems to have broken the builds on Python 2.6 and 2.7 > > (run via builtbot). > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > > ... 
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.6 > > creating build/temp.macosx-10.9-intel-2.6/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > > -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > > ... 
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.7 > > creating build/temp.macosx-10.9-intel-2.7/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > This looks like a problem where distutils is using a gcc argument > > which cc (clang) used to ignore but not treats as an error. There > > will probably be similar reports on other Python projects as well... > > > > Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > I encountered the same problem (clean install of Mavericks, vanilla Python, latest XCode from App Store). One answer [1] suggests this is not a guaranteed solution but offers a different flag (which I did not test). I chose to edit system python files [2] which is definitely not the best option for most users. 
[1]: http://stackoverflow.com/a/22315129 [2]: http://stackoverflow.com/a/22322068 > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > Do you mean editing environment variables with `os.environ`? I don't know enough about the details of how packages are built to know what will work with both compiling from source, easy_install, pip, etc. > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > It's a dilemma, because asking users to edit their .bashrc or .bash_profile before installation is annoying and easy to overlook, but modifying them in setup.py feels hacky (i.e. how long will this solution work?). Crossing my fingers and hoping Apple fixes this in an update... > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Sun Mar 16 17:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 16 Mar 2014 21:15:06 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Sun, Mar 16, 2014 at 8:53 PM, Lenna Peterson wrote: > On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock > wrote: >> >> ... 
>> >> This looks relevant, especially this reply from Paul Kehrer which >> suggests this is entirely Apple's fault for shipping a Python and >> clang compiler which don't get along with the default settings: >> >> >> http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure >> > >> The suggested workaround seems to do the trick, >> >> $ export CFLAGS=-Qunused-arguments >> $ export CPPFLAGS=-Qunused-arguments >> > > I encountered the same problem (clean install of Mavericks, vanilla Python, > latest XCode from App Store). > > One answer [1] suggests this is not a guaranteed solution but offers a > different flag (which I did not test). > > I chose to edit system python files [2] which is definitely not the best > option for most users. > > [1]: http://stackoverflow.com/a/22315129 > [2]: http://stackoverflow.com/a/22322068 > >> Perhaps we can add this hack to our setup.py on Mac OS X... >> it seems harmless under gcc (e.g. my locally compiled version >> of Python 3.3 used gcc rather than clang)? > > Do you mean editing environment variables with `os.environ`? I don't know > enough about the details of how packages are built to know what will work > with both compiling from source, easy_install, pip, etc. Yes, I was thinking about editing the environment variables in setup.py via the os module. I agree there are potential risks with 3rd party installers, but adding -Qunused-arguments to any existing CFLAGS (within the scope of the Biopython install) is hopefully low risk... >> Or it could be done via the buildbot setup, or on this buildslave >> directly (e.g. the ~/.bash_profile). > > It's a dilemma, because asking users to edit their .bashrc or .bash_profile > before installation is annoying and easy to overlook, but modifying them in > setup.py feels hacky (i.e. how long will this solution work?). Crossing my > fingers and hoping Apple fixes this in an update... 
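The setup.py approach Peter floats might look roughly like the sketch below. This is an illustration only, with a made-up helper name, not Biopython's actual setup.py:

```python
# Hypothetical sketch: append -Qunused-arguments to CFLAGS/CPPFLAGS so
# clang ignores the gcc-only -mno-fused-madd flag instead of treating it
# as an error.  Not Biopython's actual setup.py code.

def patch_cflags_for_clang(environ, platform):
    """Return environ with -Qunused-arguments added on Mac OS X builds."""
    if platform != "darwin":  # only Apple's clang setup needs the workaround
        return environ
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            # Preserve any flags the user already exported.
            environ[var] = (flags + " -Qunused-arguments").strip()
    return environ
```

In a real setup.py this would be called with `os.environ` and `sys.platform` before `setup()` runs, so the change is scoped to the install process and its child compiler invocations rather than the user's shell.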
> Fingers crossed Apple pushes another update in the next few weeks to resolve this... Peter From anaryin at gmail.com Mon Mar 17 12:05:04 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:05:04 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Dear all, I created a new 'empty' branch called bio.struct on my github account. The only change from the master branch is a new folder - Bio/Struct - that has an empty __init__.py in there. Please add issues with feature requests, and if you are willing to start coding, I'd say fork and go ahead! https://github.com/JoaoRodrigues/biopython/tree/bio.struct I also added a small wiki page with a description. 2014-02-20 0:05 GMT+01:00 Morten Kjeldgaard : > > On 19/02/2014, at 17:35, David Cain wrote: > > > I frequently make use of Bio.PDB, and agree wholeheartedly that certain > > aspects of it are very dated, or haphazardly organized. > > > > The module as a whole would benefit greatly from some extra attention. > I'm > > happy to lend a hand in whatever revamp takes place. > > I second that. I am also willing to participate in this project! > > Cheers, > Morten > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Mon Mar 17 12:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:15:06 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:05 PM, Jo?o Rodrigues wrote: > Dear all, > > I created a new 'empty' branch called bio.struct on my github account. The > only change from the master branch is a new folder - Bio/Struct - that has > an empty __init__.py in there. Please add issues with feature requests, and > if you are willing to start coding, I'd say fork and go ahead! 
> > https://github.com/JoaoRodrigues/biopython/tree/bio.struct > > I also added a small wiki page with a > description. Are we all generally in favour of lower case for new module names (as per PEP8)? i.e. Bio/struct not Bio/Struct ? Peter From anaryin at gmail.com Mon Mar 17 12:19:31 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:19:31 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Hello Peter, Sorry, typo actually, wrote with small case everywhere but the module name.. thanks. Also something I have in mind. Should wrappers for NACCESS and DSSP be refactored to use Bio.Application? From p.j.a.cock at googlemail.com Mon Mar 17 12:32:30 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:32:30 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:19 PM, Jo?o Rodrigues wrote: > Hello Peter, > > Sorry, typo actually, wrote with small case everywhere but the module name.. > thanks. > > Also something I have in mind. Should wrappers for NACCESS and > DSSP be refactored to use Bio.Application? If you think it would help, sure. Peter From anaryin at gmail.com Mon Mar 17 12:33:55 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:33:55 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: To be honest, more of an issue of internal consistency with the rest of the code base. I'd have to look into it more carefully to see if it fits.. 2014-03-17 17:32 GMT+01:00 Peter Cock : > On Mon, Mar 17, 2014 at 4:19 PM, Jo?o Rodrigues wrote: > > Hello Peter, > > > > Sorry, typo actually, wrote with small case everywhere but the module > name.. > > thanks. > > > > Also something I have in mind. Should wrappers for NACCESS and > > DSSP be refactored to use Bio.Application? > > If you think it would help, sure. 
> > Peter > From tra at popgen.net Mon Mar 17 12:53:52 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 17 Mar 2014 16:53:52 +0000 Subject: [Biopython-dev] Dialign2 testing... Message-ID: <20140317165352.36db07ee@lnx> Hi, Still on the quest for a test run that actually runs all the tests. Can someone suggest what would be a sensible value for DIALIGN2-DIR? It seems that setting up the test is not trivial: there seems to be a need a BLOSUM file inside the dialign directory? Any clues would be appreciated... From p.j.a.cock at googlemail.com Mon Mar 17 14:35:25 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 18:35:25 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi all, Bow (regarding SearchIO) others should probably read this... I've commented, see also: http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html Peter ---------- Forwarded message ---------- From: Maloney, Christopher (NIH/NLM/NCBI) [C] Date: Mon, Mar 17, 2014 at 5:17 PM Subject: [Open-bio-l] Proposed BLAST XML Changes To: "open-bio-l at lists.open-bio.org" We are not directly soliciting comments, but if anyone would like to make any technical or programmatic suggestions, there is a link from which anyone may comment in the document. ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf Thank you. P.S. Please re-post this to other lists that might have interested readers. Chris Maloney NIH/NLM/NCBI (Contractor) Building 45, 5AN.24D-22 301-594-2842 _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From kirkolem at gmail.com Mon Mar 17 16:07:26 2014 From: kirkolem at gmail.com (Dan K.) Date: Tue, 18 Mar 2014 00:07:26 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello! 
I'm a third year student learning bioengineering and bioinformatics and I'm interested in participating in GSoC and contributing to the BioPython project. In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I find convenient to implement Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What my next steps should be like? Thank you for your attention. From ben at benfulton.net Mon Mar 17 20:38:17 2014 From: ben at benfulton.net (Ben Fulton) Date: Mon, 17 Mar 2014 20:38:17 -0400 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: <20140317165352.36db07ee@lnx> References: <20140317165352.36db07ee@lnx> Message-ID: I looked at that last year. As far as I could tell the actual code didn't do anything useful with that value; I removed the precondition checks from the tests and it ran properly. On Mon, Mar 17, 2014 at 12:53 PM, Tiago Antao wrote: > Hi, > > Still on the quest for a test run that actually runs all the tests. > > Can someone suggest what would be a sensible value for DIALIGN2-DIR? > It seems that setting up the test is not trivial: there seems to be a > need a BLOSUM file inside the dialign directory? > > Any clues would be appreciated... > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From harsh.beria93 at gmail.com Mon Mar 17 20:39:25 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 18 Mar 2014 06:09:25 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, I have started to write a proposal for a project on pair wise sequence alignment. Is there anyone interested in mentoring the project so that I can discuss some of the algorithmic problems in detail? 
Also, do I need to add the project to the ideas page as it is not there yet? Thanks On Mar 6, 2014 7:19 AM, "Michiel de Hoon" wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. > > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) 
are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From w.arindrarto at gmail.com Tue Mar 18 05:52:29 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 10:52:29 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi Peter, everyone, Thanks for the heads up. If implemented as it is, the updates will change our underlying SearchIO model (aside from the blast-xml parser itself), by allowing a Hit retrieval using multiple different keys. I have a feeling it will be difficult to jam all the new changes into a backwards-compatible parser. One way to make it transparent to users is to use the underlying DTD to do validation before parsing (for the two BLAST DTDs, use the one which the file can be validated against). However, this comes at a price. Since the standard library-bundled elementtree doesn't seem to support validation, we have to use another library (lxml is my choice). This means adding 3rd party dependency which require compiling (lxml is also partly written in C). The other option is to introduce a new format name (e.g. 'blast-xml2'), which makes the user responsible for knowing which BLAST XML he/she is parsing. It feels more explicit this way, so I am leaning towards this option, despite 'blast-xml2' not sounding very nice to me ;). Any other thoughts? 
Best, Bow On Mon, Mar 17, 2014 at 7:35 PM, Peter Cock wrote: > Hi all, > > Bow (regarding SearchIO) others should probably read this... > > I've commented, see also: > http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html > > Peter > > > ---------- Forwarded message ---------- > From: Maloney, Christopher (NIH/NLM/NCBI) [C] > Date: Mon, Mar 17, 2014 at 5:17 PM > Subject: [Open-bio-l] Proposed BLAST XML Changes > To: "open-bio-l at lists.open-bio.org" > > > We are not directly soliciting comments, but if anyone would like to > make any technical or programmatic suggestions, there is a link from > which anyone may comment in the document. > > ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf > > Thank you. > > > P.S. Please re-post this to other lists that might have interested readers. > > Chris Maloney > NIH/NLM/NCBI (Contractor) > Building 45, 5AN.24D-22 > 301-594-2842 > > > _______________________________________________ > Open-Bio-l mailing list > Open-Bio-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/open-bio-l > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 06:17:48 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:17:48 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > Thanks for the heads up. If implemented as it is, the updates will > change our underlying SearchIO model (aside from the blast-xml parser > itself), by allowing a Hit retrieval using multiple different keys. Could you clarify what you mean by multiple keys here? > I have a feeling it will be difficult to jam all the new changes into > a backwards-compatible parser. 
One way to make it transparent to users > is to use the underlying DTD to do validation before parsing (for the > two BLAST DTDs, use the one which the file can be validated against). > However, this comes at a price. Since the standard library-bundled > elementtree doesn't seem to support validation, we have to use another > library (lxml is my choice). This means adding 3rd party dependency > which require compiling (lxml is also partly written in C). We can probably tell by sniffing the first few lines... but how to do that without using a handle seek to rewind may be tricky (desirable to support parsing streams, e.g. stdin). > The other option is to introduce a new format name (e.g. > 'blast-xml2'), which makes the user responsible for knowing which > BLAST XML he/she is parsing. It feels more explicit this way, so I am > leaning towards this option, despite 'blast-xml2' not sounding very > nice to me ;). > > Any other thoughts? > > Best, > Bow I agree for the SearchIO interface, two format names makes sense - unless there is a neat way to auto-detect this on input. Using "blast-xml2" would work, or maybe something like "blast-xml-2014" (too long?). We could even go for "blast-xml-old" and "blast-xml" perhaps? Peter From w.arindrarto at gmail.com Tue Mar 18 06:33:55 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 11:33:55 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> Thanks for the heads up. If implemented as it is, the updates will >> change our underlying SearchIO model (aside from the blast-xml parser >> itself), by allowing a Hit retrieval using multiple different keys. > > Could you clarify what you mean by multiple keys here? Currently, we can retrieve hits from a query using its ID, aside from its numeric index. 
With their proposed changes to the Hit element here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, it means that a given Hit can now be annotated with more than one ID. Ideally, this should also be reflected in the QueryResult object: a hit item should be retrievable using any of the IDs it has. This will also affect membership checking on the QueryResult object. >> I have a feeling it will be difficult to jam all the new changes into >> a backwards-compatible parser. One way to make it transparent to users >> is to use the underlying DTD to do validation before parsing (for the >> two BLAST DTDs, use the one which the file can be validated against). >> However, this comes at a price. Since the standard library-bundled >> elementtree doesn't seem to support validation, we have to use another >> library (lxml is my choice). This means adding 3rd party dependency >> which require compiling (lxml is also partly written in C). > > We can probably tell by sniffing the first few lines... but how > to do that without using a handle seek to rewind may be > tricky (desirable to support parsing streams, e.g. stdin). Ah yes. We have a rewindable file seek object in Bio.File, don't we :)? I'll have to play around with some real datasets first, I think. The other thing we should take into account is the Xinclude tag. Would we want to make it possible to query *either* the single query XML results or the master Xinclude document (point 2 of the proposed change)? Or should we restrict our parser only to the single query files? >> The other option is to introduce a new format name (e.g. >> 'blast-xml2'), which makes the user responsible for knowing which >> BLAST XML he/she is parsing. It feels more explicit this way, so I am >> leaning towards this option, despite 'blast-xml2' not sounding very >> nice to me ;). >> >> Any other thoughts? 
>> >> Best, >> Bow > > I agree for the SearchIO interface, two format names makes > sense - unless there is a neat way to auto-detect this on input. > > Using "blast-xml2" would work, or maybe something like > "blast-xml-2014" (too long?). > > We could even go for "blast-xml-old" and "blast-xml" perhaps? Hmm..'blast-xml-old', may make it difficult to adapt for future XML schema changes. How about renaming the current parser to 'blast-xml-legacy', and the new one to just 'blast-xml'? Cheers, Bow From p.j.a.cock at googlemail.com Tue Mar 18 06:38:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:38:53 +0000 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: References: <20140317165352.36db07ee@lnx> Message-ID: Hi Tiago, Ben, >From memory if the environment variable was not set, the command line tool would fail in a strange way - so I made the test conditional on having the variable set. Perhaps things have changed slightly since 2009, https://github.com/biopython/biopython/commit/d4ea47e27f3a8aa7ebe1460b0d96c3135f6bfba5 or maybe this depends on how dialign2 is installed... possibly the Linux packages didn't exist back then? Peter On Tue, Mar 18, 2014 at 12:38 AM, Ben Fulton wrote: > I looked at that last year. As far as I could tell the actual code didn't > do anything useful with that value; I removed the precondition checks from > the tests and it ran properly. > > On Mon, Mar 17, 2014 at 12:53 PM, Tiago Antao wrote: > >> Hi, >> >> Still on the quest for a test run that actually runs all the tests. >> >> Can someone suggest what would be a sensible value for DIALIGN2-DIR? >> It seems that setting up the test is not trivial: there seems to be a >> need a BLOSUM file inside the dialign directory? >> >> Any clues would be appreciated... 
>> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 06:58:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:58:06 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto wrote: > On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >> wrote: >>> Hi Peter, everyone, >>> >>> Thanks for the heads up. If implemented as it is, the updates will >>> change our underlying SearchIO model (aside from the blast-xml parser >>> itself), by allowing a Hit retrieval using multiple different keys. >> >> Could you clarify what you mean by multiple keys here? > > Currently, we can retrieve hits from a query using its ID, aside from > its numeric index. With their proposed changes to the Hit element > here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, > it means that a given Hit can now be annotated with more than one ID. But this happens already in the current output from merged entries in databases like NR - we effectively use the first alternative ID as the hit ID. See for example the nasty > separated entries in the legacy BLAST XML's tag where only the first ID appears in the tag: http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html See also the new optional fields in the tabular output which explicitly list all the aliases for the merge record (e.g. sallseqid). 
> Ideally, this should also be reflected in the QueryResult object: a > hit item should be retrievable using any of the IDs it has. > > This will also affect membership checking on the QueryResult object. This looks like something we should review anyway, regardless of the new BLAST XML format. >>> I have a feeling it will be difficult to jam all the new changes into >>> a backwards-compatible parser. One way to make it transparent to users >>> is to use the underlying DTD to do validation before parsing (for the >>> two BLAST DTDs, use the one which the file can be validated against). >>> However, this comes at a price. Since the standard library-bundled >>> elementtree doesn't seem to support validation, we have to use another >>> library (lxml is my choice). This means adding 3rd party dependency >>> which require compiling (lxml is also partly written in C). >> >> We can probably tell by sniffing the first few lines... but how >> to do that without using a handle seek to rewind may be >> tricky (desirable to support parsing streams, e.g. stdin). > > Ah yes. We have a rewindable file seek object in Bio.File, don't we > :)? I'll have to play around with some real datasets first, I think. Yes, the UndoHandle in Bio.File might be the best solution here for auto-detection. But two explicit formats is probably better. > The other thing we should take into account is the Xinclude tag. Would > we want to make it possible to query *either* the single query XML > results or the master Xinclude document (point 2 of the proposed > change)? Or should we restrict our parser only to the single query > files? I think single files is a reasonable restriction... assuming BLAST will still have the option of producing a big multi-query XML? Probably we should ask the NCBI about that... I would hope the Bio.SearchIO.index_db(...) approach could be used on a collection of little XML files, one for each query. >>> The other option is to introduce a new format name (e.g. 
>>> 'blast-xml2'), which makes the user responsible for knowing which >>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>> leaning towards this option, despite 'blast-xml2' not sounding very >>> nice to me ;). >>> >>> Any other thoughts? >>> >>> Best, >>> Bow >> >> I agree for the SearchIO interface, two format names makes >> sense - unless there is a neat way to auto-detect this on input. >> >> Using "blast-xml2" would work, or maybe something like >> "blast-xml-2014" (too long?). >> >> We could even go for "blast-xml-old" and "blast-xml" perhaps? > > Hmm..'blast-xml-old', may make it difficult to adapt for future XML > schema changes. How about renaming the current parser to > 'blast-xml-legacy', and the new one to just 'blast-xml'? A possible downside of 'blast-xml-legacy' over 'blast-xml-old' is that it may be confused with the move from the "legacy" BLAST (in C) to the current BLAST+ (in C++), which happened well before this XML format change. Peter From tra at popgen.net Tue Mar 18 07:22:15 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 11:22:15 +0000 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: References: <20140317165352.36db07ee@lnx> Message-ID: <20140318112215.4207072e@lnx> On Tue, 18 Mar 2014 10:38:53 +0000 Peter Cock wrote: > From memory if the environment variable was not > set, the command line tool would fail in a strange > way - so I made the test conditional on having the > variable set. I noticed that and created an environment variable, then I got stuck on the BLOSUM issue. Per Ben's suggestion, should we remove the check? Or should I use a non-standard package? 
Thanks, Tiago From w.arindrarto at gmail.com Tue Mar 18 07:48:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 12:48:56 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:58 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto > wrote: >> On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >>> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >>> wrote: >>>> Hi Peter, everyone, >>>> >>>> Thanks for the heads up. If implemented as it is, the updates will >>>> change our underlying SearchIO model (aside from the blast-xml parser >>>> itself), by allowing a Hit retrieval using multiple different keys. >>> >>> Could you clarify what you mean by multiple keys here? >> >> Currently, we can retrieve hits from a query using its ID, aside from >> its numeric index. With their proposed changes to the Hit element >> here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, >> it means that a given Hit can now be annotated with more than one ID. > > But this happens already in the current output from merged entries > in databases like NR - we effectively use the first alternative ID as > the hit ID. See for example the nasty > separated entries in > the legacy BLAST XML's tag where only the first ID > appears in the tag: > > http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html > > See also the new optional fields in the tabular output which > explicitly list all the aliases for the merge record (e.g. sallseqid). In the BLAST outputs, yes. However, there is no explicit support for this in SearchIO yet. Currently we only parse whatever is in the ID tag as the ID and the description tag as the description. If the tag's content is separated by semicolons / has more than one ID, the current parser does not try to split it into multiple IDs. Instead it takes the whole string as the ID. 
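To make the idea above concrete, here is a rough, hypothetical sketch of hit storage that supports lookup by any alias. Every name here is illustrative, not the real SearchIO API - the point is only the shape of the multi-key lookup being discussed:

```python
# Hypothetical sketch of multi-ID hit lookup for a QueryResult-like
# container. All names here are illustrative, NOT the real SearchIO API.

class MultiKeyHits:
    """Store each hit once, retrievable via any of its IDs."""

    def __init__(self):
        self._hits = []    # hits in input order
        self._id_map = {}  # every alias -> index into self._hits

    def append(self, hit, ids):
        # 'ids' would come from splitting the aliases of a merged
        # database entry (e.g. the sallseqid values).
        index = len(self._hits)
        self._hits.append(hit)
        for hit_id in ids:
            self._id_map[hit_id] = index

    def __getitem__(self, key):
        # Integer keys keep the positional access; string keys go
        # through the alias map.
        if isinstance(key, int):
            return self._hits[key]
        return self._hits[self._id_map[key]]

    def __contains__(self, key):
        # Membership checking honours every alias too.
        return key in self._id_map


hits = MultiKeyHits()
hits.append("hit-object", ["gi|123|ref|NP_1", "sp|P001|PROT_HUMAN"])
print(hits["sp|P001|PROT_HUMAN"])  # prints: hit-object
print("gi|123|ref|NP_1" in hits)   # prints: True
```

The hit itself is stored once, and the alias map simply carries one entry per ID, so item access and membership checks work for any of the aliases.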
Also, in the blast tabular format, even though sallseqid is parsed, it's merely stored as an attribute of the hit object, not something that can be used to retrieve Hits from the QueryResult object. >> Ideally, this should also be reflected in the QueryResult object: a >> hit item should be retrievable using any of the IDs it has. >> >> This will also affect membership checking on the QueryResult object. > > This looks like something we should review anyway, regardless > of the new BLAST XML format. Of course :). >>>> I have a feeling it will be difficult to jam all the new changes into >>>> a backwards-compatible parser. One way to make it transparent to users >>>> is to use the underlying DTD to do validation before parsing (for the >>>> two BLAST DTDs, use the one which the file can be validated against). >>>> However, this comes at a price. Since the standard library-bundled >>>> elementtree doesn't seem to support validation, we have to use another >>>> library (lxml is my choice). This means adding 3rd party dependency >>>> which require compiling (lxml is also partly written in C). >>> >>> We can probably tell by sniffing the first few lines... but how >>> to do that without using a handle seek to rewind may be >>> tricky (desirable to support parsing streams, e.g. stdin). >> >> Ah yes. We have a rewindable file seek object in Bio.File, don't we >> :)? I'll have to play around with some real datasets first, I think. > > Yes, the UndoHandle in Bio.File might be the best solution > here for auto-detection. But two explicit formats is probably better. > >> The other thing we should take into account is the Xinclude tag. Would >> we want to make it possible to query *either* the single query XML >> results or the master Xinclude document (point 2 of the proposed >> change)? Or should we restrict our parser only to the single query >> files? > > I think single files is a reasonable restriction... 
assuming BLAST > will still have the option of producing a big multi-query XML? > Probably we should ask the NCBI about that... In a way, the Xinclude file is the file containing the multi-query XML. I have a feeling that if Xinclude is proposed, producing multi-output BLAST XML files will not be an option anymore (otherwise it seems redundant). But yes, the NCBI should have more info about this. > I would hope the Bio.SearchIO.index_db(...) approach could > be used on a collection of little XML files, one for each query. >>>> The other option is to introduce a new format name (e.g. >>>> 'blast-xml2'), which makes the user responsible for knowing which >>>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>>> leaning towards this option, despite 'blast-xml2' not sounding very >>>> nice to me ;). >>>> >>>> Any other thoughts? >>>> >>>> Best, >>>> Bow >>> >>> I agree for the SearchIO interface, two format names makes >>> sense - unless there is a neat way to auto-detect this on input. >>> >>> Using "blast-xml2" would work, or maybe something like >>> "blast-xml-2014" (too long?). >>> >>> We could even go for "blast-xml-old" and "blast-xml" perhaps? >> >> Hmm..'blast-xml-old', may make it difficult to adapt for future XML >> schema changes. How about renaming the current parser to >> 'blast-xml-legacy', and the new one to just 'blast-xml'? > > A possible downside of 'blast-xml-legacy' over 'blast-xml-old' > is this may be confused with the "legacy" BLAST in C to the > current BLAST+ in C++ move (which happened well before > this XML format change). Hmm. In that case I am leaning towards 'blast-xml2', I think. It's the shortest and most future-proof (subsequent changes to the XML format could be denoted as 'blast-xml3'). But it does make it slightly inconsistent with the names we have for HMMER (i.e. 'hmmer2-text' is for HMMER version 2 text output, 'hmmer3-text' is for HMMER version 3 text output). 
Cheers, Bow From p.j.a.cock at googlemail.com Tue Mar 18 09:15:16 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 13:15:16 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140312151048.45066ade@lnx> References: <20140312151048.45066ade@lnx> Message-ID: On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: > Hi, > > I have a docker container ready (save for a few applications). Simple > usage instructions: > > 1. Create a directory and download inside this file: > https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test Things moved, https://github.com/tiagoantao/my-containers/tree/master/biopython I guess you mean: https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test > 2. Rename it Dockerfile (capital D) > > 3. Get a buildbot username and password (from Peter or me), edit the > file and replace CHANGEUSER CHANGEPASS > > 4. do > docker build -t biopython-buildbot . > > 5. do > docker run biopython-buildbot > > Beta-version, comments appreciated ;) > > If people like this, I will amend the Continuous Integration page on > the wiki accordingly > > Tiago Is this a 32 or 64 bit VM, or either? I'm asking because we may want to source a replacement 32 bit Linux buildslave - the hard drive in the old machine we've been using is failing, and it is probably not worth replacing. Peter From mjldehoon at yahoo.com Tue Mar 18 10:21:48 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 18 Mar 2014 07:21:48 -0700 (PDT) Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: Message-ID: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> On Mon, 3/17/14, Peter Cock wrote: > Are we all generally in favour of lower case for new module > names (as per PEP8)? > i.e. Bio/struct not Bio/Struct ? You may want to consider Bio/structure instead of Bio/struct. To me "struct" sounds like the C programming term, rather than a protein structure. 
Best, -Michiel From p.j.a.cock at googlemail.com Tue Mar 18 10:43:56 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 14:43:56 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon wrote: > On Mon, 3/17/14, Peter Cock wrote: >> Are we all generally in favour of lower case for new module >> names (as per PEP8)? >> i.e. Bio/struct not Bio/Struct ? > > You may want to consider Bio/structure instead of Bio/struct. > To me "struct" sounds like the C programming term, > rather than a protein structure. > > Best, > -Michiel I like Bio.structure too :) Peter From anaryin at gmail.com Tue Mar 18 10:46:34 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 18 Mar 2014 15:46:34 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Makes sense! If nobody complains I'll change it. From eric.talevich at gmail.com Tue Mar 18 11:23:29 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 18 Mar 2014 08:23:29 -0700 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mon, Mar 17, 2014 at 5:39 PM, Harsh Beria wrote: > Hi, > > I have started to write a proposal for a project on pair wise sequence > alignment. Is there anyone interested in mentoring the project so that I > can discuss some of the algorithmic problems in detail? Also, do I need to > add the project to the ideas page as it is not there yet? > It's not necessary to add the project to the public Ideas page if you've come up with it yourself. Just share your own proposal with us here and we'll discuss it with you. 
-Eric From tra at popgen.net Tue Mar 18 12:12:50 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 16:12:50 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: References: <20140312151048.45066ade@lnx> Message-ID: <20140318161250.77a269fe@lnx> Hi, On Tue, 18 Mar 2014 13:15:16 +0000 Peter Cock wrote: > > Things moved, > https://github.com/tiagoantao/my-containers/tree/master/biopython > > I guess you mean: > https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test Ah, sorry. Because this is a first version, it is still undergoing heavy refactoring. I plan to document the final version well. For now maybe it is better to go to the top level: https://github.com/tiagoantao/my-containers The example in the README documents the biopython containers as they stand. > Is this a 32 or 64 bit VM, or either? I am afraid it is 64-bit; doing a 32-bit docker image is possible but not trivial. Tiago From arklenna at gmail.com Tue Mar 18 12:48:51 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Tue, 18 Mar 2014 12:48:51 -0400 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 10:43 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon > wrote: > > On Mon, 3/17/14, Peter Cock wrote: > >> Are we all generally in favour of lower case for new module > >> names (as per PEP8)? > >> i.e. Bio/struct not Bio/Struct ? > > > > You may want to consider Bio/structure instead of Bio/struct. > > To me "struct" sounds like the C programming term, > > rather than a protein structure. > > > > Best, > > -Michiel > > I like Bio.structure too :) > Thirded! I'm in a particularly busy portion of my PhD right now but hopefully over the summer I'll have a little more spare time for open source work. 
Cheers, Lenna From tra at popgen.net Tue Mar 18 13:13:34 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 17:13:34 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal Message-ID: <20140318171334.0edc2b45@lnx> Hi, We have now gone through the procedure of asking on the mailing lists about SimCoal deprecation (now that we have fastsimcoal). Three proposals and one question: 1. Deprecate https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py 2. Delete the SimCoal tests 3. Amend the tutorial The question: I would like to deprecate a class inside https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py but not the whole Controller (the fastsimcoal code is there). Question: Is there a procedure for a partial deprecation? Thanks, T From p.j.a.cock at googlemail.com Tue Mar 18 13:15:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 17:15:41 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, We've previously put a deprecation warning inside the __init__ method so anyone actually using the class will be warned. Peter On Tue, Mar 18, 2014 at 5:13 PM, Tiago Antao wrote: > Hi, > > Currently we have went through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3. 
Amend the tutorial > > The doubt: I would like to deprecate a class inside > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T From tra at popgen.net Tue Mar 18 14:26:10 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 18:26:10 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140318161250.77a269fe@lnx> References: <20140312151048.45066ade@lnx> <20140318161250.77a269fe@lnx> Message-ID: <20140318182610.35082103@lnx> An update on the status: 1. A couple of problems with fasttree and dialign2. These seem to be genuine problems with the test code/modules. 2. Prank will wait for ubuntu trusty (it will be a standard package). I will then include it. 3. I was only able to find part of the fonts for the graphics packages, so a couple of tests are being skipped. 4. naccess has a very restrictive activation system, so it is impossible to add. 1 and 3 are solvable (2 will sort itself out with time). 1 is really a problem with the biopython code, I think. For 3, if someone could have a look at the existing fonts here: https://github.com/tiagoantao/my-containers/blob/master/biopython/Biopython-Basic and tell me which ones are missing, I would take care of adding them. Tiago PS - In the near future I will do a Python 3 container also. From kirkolem at gmail.com Tue Mar 18 17:31:45 2014 From: kirkolem at gmail.com (Dan K.) Date: Wed, 19 Mar 2014 01:31:45 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello! I'm a third-year student studying bioengineering and bioinformatics, and I'm interested in participating in GSoC and contributing to the Biopython project. 
In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I think it would be convenient to implement the Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What should my next steps be? Thank you for your attention. From p.j.a.cock at googlemail.com Wed Mar 19 13:00:37 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:00:37 +0000 Subject: [Biopython-dev] SQLite test failure on Windows, OperationalError: unable to open database file Message-ID: Hi all, About a week ago most of the Windows nightly tests broke - e.g. here on the same revision (!) 79f9054e5246ba30816ff93a775d594ae7da6fc6 https://github.com/biopython/biopython/commit/79f9054e5246ba30816ff93a775d594ae7da6fc6 Worked, Fri Mar 14 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1129 Failed, Sat Mar 15 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1130 ... test_BioSQL_sqlite3 ... FAIL ... ====================================================================== ERROR: Check list, keys, length etc ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 193, in setUp load_database(gb_handle) File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 166, in load_database create_database() File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 148, in create_database server.load_database_sql(SQL_FILE) File "c:\repositories\BuildBotBiopython\win26\build\build\lib.win32-2.6\BioSQL\BioSeqDatabase.py", line 281, in load_database_sql self.adaptor.cursor.execute(sql_line) OperationalError: unable to open database file (etc) Presumably something changed on the machine itself - perhaps a Windows security update? 
Any guesses for what might be wrong and why it broke on Python 2.6, PyPy 1.9, 2.0, 2.1 - yet works fine on Python 2.7, Python 3.3, PyPy 2.2 and Jython 2.7? Logged into this machine, I can reproduce the error with: c:\python26\python test_BioSQL_sqlite3.py Thanks, Peter From eparker at ucdavis.edu Wed Mar 19 12:49:04 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 09:49:04 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers Message-ID: Hi all, I have a rough draft of my GSoC proposal and would appreciate comments from anybody who might be willing to eventually mentor this project, or anybody who has opinions on implementation. It's about 3 pages of text + several figures. I'll be submitting a final draft Friday on the GSoC website pending your comments. Thank you, -Evan -------------- next part -------------- A non-text attachment was scrubbed... Name: Evan-Parker-GSOC-2014-proposal.pdf Type: application/pdf Size: 68577 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Wed Mar 19 13:26:10 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:26:10 +0000 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: > Hi all, > > I have a rough draft of my GSoC proposal and would appreciate comments from > anybody who might be willing to eventually mentor this project, or anybody > who has opinions on implementation. It's about 3 pages of text + several > figures. > > I'll be submitting a final draft Friday on the GSoC website pending your > comments. > > Thank you, > -Evan Hi Evan, That's a nice job so far - although questions about your time availability will be raised (sadly the GSoC schedule isn't fair to students depending on regional University term schedules). However, you are a PhD student (which is normally full time). 
You will need to clear this with your PhD supervisors - since you would be spending a large chunk of time not working directly on your thesis project, and there can be strict deadlines for completion. Here's a selection of points in no particular order: Have you looked at Bio.SeqIO.index_db(...) which works like Bio.SeqIO.index(...) but stores the offsets etc in an SQLite database? When pondering how to design this kind of thing myself, I had suspected multiple SeqRecProxy classes might be needed (one per file format potentially), although run time selection of internal parsing methods might work too. I would also ask why not have the slicing of a SeqRecProxy return another SeqRecProxy? This means creating a new proxy object with different offset values - but would be fast. Only when the seq/annotation/etc is accessed would the proxy have to go to the disk drive. This becomes more interesting when accessing the features in the slice of interest (e.g. if the full record was for a whole chromosome and only region [1000:2000] was of interest). This idea about windows onto the data is key to how the SAM/BAM file format is used (coordinate sorting with an index). Are you familiar with that, or tabix? Another open question is what to do with file handles - specifically the question of when to close them? e.g. via garbage collection, context managers, etc. See for example this blog post - the lazy parsing approach may result in ResourceWarnings as a side effect: http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ I appreciate you are unlikely to have ready answers to all of that - I've probably given you a whole load more background reading. I hope some of the other Biopython developers (or GSoC mentors on other OBF projects - you could post this to the OBF GSoC mailing list too) will have further feedback. 
Regards, Peter From harsh.beria93 at gmail.com Wed Mar 19 13:44:47 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Wed, 19 Mar 2014 23:14:47 +0530 Subject: [Biopython-dev] GSOC Proposal (Pairwise Sequence Alignment in Biopython) Message-ID: Hi, Please take a look at my GSOC proposal on Pairwise Sequence Alignment and suggest improvements. https://gist.github.com/harshberia93/9647053 Thanks -- Harsh Beria, Indian Institute of Technology, Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From nejat.arinik at insa-lyon.fr Wed Mar 19 13:45:53 2014 From: nejat.arinik at insa-lyon.fr (Nejat Arinik) Date: Wed, 19 Mar 2014 18:45:53 +0100 (CET) Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> Message-ID: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Hi all, I would like to show you my detailed plan, month by month: https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit It is not neat, I know, but it is just so I can hear your ideas about the plan. I'll finish it tonight. Do you think I have understood the subject correctly? Could this plan be a solution? Thanks in advance. PS: My English is not good, so it is a little bit difficult to write a detailed proposal plan, but I'm trying. I hope that's not a big problem :) Unfortunately, I'm more comfortable with French. 
Nejat From p.j.a.cock at googlemail.com Wed Mar 19 14:10:54 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 18:10:54 +0000 Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> References: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Message-ID: On Wed, Mar 19, 2014 at 5:45 PM, Nejat Arinik wrote: > > Hi all, > > I would show you my detailed plan per mounth. > https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit > > It's not neatly I know but it's just for learn yours ideas at about that > plan. I'll finish it this night. I understood correctly the subject, you think? > That plan can be solution? Thanks in advance. > > PS: My english level is not good so It is a little bit difficult to write a > proposal-plan detailed but I'm trying. I hope it's not a big problem :) > I'm more comfortable with the french language unfortunately. > > Nejat Hi Nejat, I can try to answer some of the questions at the start of the document: Q: Lazy-load ~= load partially (depends on demands)? A: Yes. For example, only load the sequence if the user tries to access the sequence. This should speed up tasks like counting the records, or building a list of all the record identifiers. Q: How large are small to medium sized sequences/genomes in general, and how long do they take? A: Bacterial genomes usually are small enough to load into memory without worrying about RAM. Eukaryote genomes (e.g. mouse, human, plants) are typically large enough that you may not want to load an entire annotated chromosome into memory. Q: Is a python dictionary used for the SeqRecord object? A: Yes, the SeqRecord object uses a Python dictionary for the annotations property, and a dictionary-like object for the letter_annotations property. 
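For illustration only, here is a minimal stand-in for those two containers: a plain dict for annotations, and a restricted dict-like object for letter_annotations whose values must match the sequence length. This mimics the behaviour described above; it is not Biopython's actual implementation, and the class names are made up.

```python
# Minimal mimic of SeqRecord's annotation containers (hypothetical
# sketch, NOT Biopython's real code).

class RestrictedDict(dict):
    """Dict whose values must be sequences of a fixed length."""

    def __init__(self, length):
        super().__init__()
        self._length = length

    def __setitem__(self, key, value):
        # Per-letter annotations must have one value per letter.
        if len(value) != self._length:
            raise ValueError("value length must match the sequence length")
        super().__setitem__(key, value)


class MiniRecord:
    """Stand-in for a SeqRecord-like object."""

    def __init__(self, seq, record_id):
        self.seq = seq
        self.id = record_id
        self.annotations = {}  # free-form, record-level annotations
        self.letter_annotations = RestrictedDict(len(seq))  # per-letter


rec = MiniRecord("ACGT", "example1")
rec.annotations["organism"] = "Escherichia coli"
rec.letter_annotations["phred_quality"] = [40, 40, 38, 35]  # OK: length 4
# rec.letter_annotations["bad"] = [1, 2]  # would raise ValueError
```

The key design point is the length check: record-level annotations are unconstrained, while per-letter annotations are tied to the sequence.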
The SeqRecord object also uses Python lists, and the Biopython Seq object. Q: Will writing data back to the file be supported? If yes, what is the relation with BioSQL? Any modification such as an update would need attention. A: The SeqRecord-like objects from the lazy-parsers could be read only. However, if they act enough like the original SeqRecord, then they can be used with Bio.SeqIO.write(...) to save them to disk. It would be nice if (like the BioSQL SeqRecord-like objects) it was possible to modify the records in memory. Q: For very large indexing jobs, could we index on multiple machines running simultaneously, and then merge the indexes? A: This seems too complicated. If building the index is slow, I suggest saving the index on disk (e.g. as an SQLite database). For comparison, see the BAM and tabix index files, or Biopython's Bio.SeqIO.index_db(...) function. Regards, Peter From w.arindrarto at gmail.com Wed Mar 19 15:42:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 19 Mar 2014 20:42:50 +0100 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Hi Evan, Looks like this is shaping up in a good direction :). In addition to Peter's earlier comments, I also have some remarks: * How would the indices of the files be stored? Are they simply stored in-memory or as files? Is their creation invisible to the user (i.e. invoking the `lazy=True` argument is enough to create the index) or does the user need to create the index explicitly? For `SeqIO.index(lazy=True)` in particular, does this mean that we will have two indices (one for the currently implemented SQLite database that stores offsets for record positions, and the other to store the other information necessary for the lazy parser)? * It would be nice to also have some notes on the relation between SeqRecProxy and SeqRecord (is it a subclass, or are they distinct classes that inherit from a common base class?). 
As an alternative, it is also possible to have a regular SeqRecord object, but with lazy Seq objects and lazy annotation objects instead. * Have you thought about what to store in the indices of the different formats? It's a good idea to explain this further in your proposal (e.g. what to store when indexing GenBank files, UniprotXML files, etc.). It doesn't have to be concrete (it will be in the code anyway), but having an idea of the possible implementations you have in mind would be nice. * And finally, the schedule. It looks like the early weeks will be quite packed, considering your other obligations. I think it is expected that students spend close to 8 hours per day (or 40 hours per week) during the coding period. Of course this is much more sensible when the student does not have other pressing obligations. I do agree with Peter here that you have to at least discuss this with your PhD supervisor. I personally do not mind that for the week you have the conference the workload is reduced. But in the first four weeks, I would prefer that you have more time to spend on GSoC. Cheers & good luck, Bow On Wed, Mar 19, 2014 at 6:26 PM, Peter Cock wrote: > On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: >> Hi all, >> >> I have a rough draft of my GSoC proposal and would appreciate comments from >> anybody who might be willing to eventually mentor this project, or anybody >> who has opinions on implementation. It's about 3 pages of text + several >> figures. >> >> I'll be submitting a final draft Friday on the GSoC website pending your >> comments. >> >> Thank you, >> -Evan > > Hi Evan, > > That's a nice job so far - although questions about your time > availability will be raised (sadly the GSoC schedule isn't fair to > students depending on regional University term schedules). > However, you are a PhD student (which is normally full time). 
> You will need to clear this with your PhD supervisors - since > you would be spending a large chunk of time not working > directly on your thesis project, and there can be strict > deadlines for completion. > > Here's a selection of points in no particular order: > > Have you looked at Bio.SeqIO.index_db(...) which works > like Bio.SeqIO.index(...) but stores the offsets etc in an > SQLite database? > > When pondering how to design this kind of thing myself, > I had suspected multiple SeqRecProxy classes might be > needed (one per file format potentially), although run > time selection of internal parsing methods might work too. > > I would also ask why not have the slicing of a SeqRecProxy > return another SeqRecProxy? This means creating a new > proxy object with different offset values - but would be fast. > Only when the seq/annotation/etc is accessed would the > proxy have to go to the disk drive. This becomes more > interesting when accessing the features in the slice of > interest (e.g. if the full record was for a whole chromosome > and only region [1000:2000] was of interest). > > This idea about windows onto the data is key to how > the SAM/BAM file format is used (coordinate sorting > with an index). Are you familiar with that, or tabix? > > Another open question is what to do with file handles - > specifically the question of when to close them? e.g. > via garbage collection, context managers, etc. See > for example this blog post - the lazy parsing approach > may result in ResourceWarnings as a side effect: > http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ > > I appreciate you are unlikely to have ready answers to > all of that - I've probably given you a whole load more > background reading. I hope some of the other Biopython > developers (or GSoC mentors on other OBF projects - > you could post this to the OBF GSoC mailing list too) > will have further feedback. 
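Peter's slicing suggestion can be sketched in a few lines of plain Python (class and attribute names here are hypothetical; a real implementation would compute offsets from an actual file format). Slicing only does offset arithmetic, and the disk is touched when `.seq` is finally read; making the proxy a context manager is one possible answer to the handle-closing question:

```python
class SeqRecProxy:
    """Toy lazy record proxy: slicing is cheap, disk access is deferred."""

    def __init__(self, path, start, end):
        self.path = path
        self.start = start   # byte offsets into a plain-text sequence file
        self.end = end
        self._handle = None

    def __getitem__(self, sl):
        # Return another proxy with adjusted offsets - no disk access yet.
        start = self.start + (sl.start or 0)
        end = self.start + sl.stop if sl.stop is not None else self.end
        return SeqRecProxy(self.path, start, min(end, self.end))

    @property
    def seq(self):
        # Only now do we actually touch the disk.
        if self._handle is None:
            self._handle = open(self.path)
        self._handle.seek(self.start)
        return self._handle.read(self.end - self.start)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Deterministic close avoids relying on garbage collection
        # (and the ResourceWarnings mentioned in the blog post above).
        if self._handle is not None:
            self._handle.close()
```

With this design, taking region [1000:2000] of a whole-chromosome record costs one object allocation, exactly as in the coordinate-sorted BAM/tabix access pattern.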
> > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From eparker at ucdavis.edu Wed Mar 19 20:34:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 17:34:34 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Thank you both for your fast and thorough evaluation of my proposal. *Regarding time requirements:* My adviser is aware of the possibility that I may participate in this program. During the summer I would file a "planned educational leave" instead of enrollment to accommodate my full-time participation in GSoC. As for the time requirements: I cannot avoid my obligations prior to ASMS, although I can promise to spend every extra minute I have to honor my obligations to Biopython. If my lack of full-time availability prior to June precludes me from participation I will understand. *Regarding specific suggestions:* I will come up with a deeper description of the relationship between SeqRecProxy and SeqRecord before Friday. I like the idea of a SeqRecProxy returning itself when sliced; I had not thought of it, but it would be an elegant solution to the problem of unparsed-vs-parsed annotations. This feature would also allow more transparent use of proxy objects and would pave the way for compatibility with SeqIO.write(). I considered using multiple proxy classes, but I prefer making a standardized binding for a lazy parsing function that can be accepted by a single SeqRecProxy at run-time. I'll make this more explicit in my proposal. There are many other questions and points of clarification that I still need to evaluate. I'll incorporate as much as I can in my proposal without overloading it and without making statements that I cannot back up with my own understanding. 
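The run-time binding Evan describes might look roughly like this (all names invented for illustration): one proxy class, with the format-specific logic passed in as a callable, so no per-format subclasses are needed:

```python
def parse_fasta_block(raw):
    """Format-specific callable: turn raw FASTA text into (id, seq)."""
    header, _, body = raw.partition("\n")
    return header.lstrip(">").split()[0], body.replace("\n", "")

class LazyRecord:
    """One proxy class; the file format is a run-time parameter."""

    def __init__(self, raw, parser):
        self._raw = raw          # raw text, e.g. read via an offset index
        self._parser = parser    # bound at run-time, one per file format
        self._parsed = None      # cached (id, seq), filled on first access

    def _ensure_parsed(self):
        if self._parsed is None:
            self._parsed = self._parser(self._raw)
        return self._parsed

    @property
    def id(self):
        return self._ensure_parsed()[0]

    @property
    def seq(self):
        return self._ensure_parsed()[1]
```

A GenBank or UniprotXML parser function with the same signature could be swapped in without changing the proxy class itself.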
Thanks again, -Evan From p.j.a.cock at googlemail.com Thu Mar 20 07:19:27 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 11:19:27 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: FYI, in addition to the SciPy conference in Texas this summer, there is also EuroSciPy which will be in England this year - deadline for abstracts is 14 April (see below). Is anyone planning to attend? If not maybe I should...? Thanks, Peter P.S. Don't forget to consider submitting a talk/poster abstract to BOSC 2014 (which I am co-chairing this year), especially students who can get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ ---------- Forwarded message ---------- From: Ralf Gommers Date: Wed, Mar 5, 2014 at 7:37 PM Subject: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts To: Organisation of EuroScipy , conferences at python.org, numfocus at googlegroups.com, Discussion of Numerical Python , SciPy Users List Dear all, EuroSciPy 2014, the Seventh Annual Conference on Python in Science, takes place in Cambridge, UK on 27 - 30 August 2014. The conference features two days of tutorials followed by two days of scientific talks. The day after the main conference, developer sprints will be organized on projects of interest to attendees. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and the industry. The program includes keynotes, contributed talks and posters. Submissions for talks and posters are welcome on our website (http://www.euroscipy.org/2014/). In your abstract, please provide details on what Python tools are being employed, and how. The deadline for submission is 14 April 2014. 
Also until 14 April 2014, you can apply for a sprint session on 31 August 2014. See https://www.euroscipy.org/2014/calls/sprints/ for details. Important dates: April 14th: Presentation abstracts, poster, tutorial submission deadline. Application for sponsorship deadline. May 17th: Speakers selected May 22nd: Sponsorship acceptance deadline June 1st: Speaker schedule announced June 6th, or 150 registrants: Early-bird registration ends August 27-31st: 2 days of tutorials, 2 days of conference, 1 day of sprints We look forward to an exciting conference and hope to see you in Cambridge in August! The EuroSciPy 2014 Team http://www.euroscipy.org/2014/ Conference Chairs -------------------------- Mark Hayes, Cambridge University, UK Didrik Pinte, Enthought Europe, UK Tutorial Chair ------------------- David Cournapeau, Enthought Europe, UK Program Chair -------------------- Ralf Gommers, ASML, The Netherlands Program Committee ----------------------------- Tiziano Zito, Humboldt-Universität zu Berlin, Germany Pierre de Buyl, Université libre de Bruxelles, Belgium Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Konrad Hinsen, Centre National de la Recherche Scientifique (CNRS), France Raphael Ritz, Garching Computing Centre of the Max Planck Society, Germany Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France Pauli Virtanen, Aalto University, Finland Evgeni Burovski, Lancaster University, UK Robert Cimrman, New Technologies Research Centre, University of West Bohemia, Czech Republic Almar Klein, Cybermind, The Netherlands Organizing Committee ------------------------------ Simon Jagoe, Enthought Europe, UK Pierre de Buyl, Université 
libre de Bruxelles, Belgium _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From tra at popgen.net Thu Mar 20 07:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:48:15 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: <20140320114815.70210fd5@grandao> On Thu, 20 Mar 2014 11:19:27 +0000 Peter Cock wrote: > Is anyone planning to attend? If not maybe I should...? Wild thought here: Considering that Cambridge is a geographic focal point for some of us (I am looking at you Dutch-based Biopythoneers, for instance), I am wondering if we could use this for a "local" Biopython meetup... Does this make any sense? Would there be interest? As I said, wild thought (silly?)... Tiago From anaryin at gmail.com Thu Mar 20 07:54:05 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Thu, 20 Mar 2014 12:54:05 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> References: <20140320114815.70210fd5@grandao> Message-ID: Lovely geographical clustering :) I'd be in. 2014-03-20 12:48 GMT+01:00 Tiago Antao : > On Thu, 20 Mar 2014 11:19:27 +0000 > Peter Cock wrote: > > > Is anyone planning to attend? If not maybe I should...? > > > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... 
> > > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 07:42:44 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:42:44 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials Message-ID: <20140320114244.4022cbc7@grandao> Hi all, Just to announce a potential project that I might embark on very soon and see the reaction of the community: Get all the tutorial materials that I can find and create a ipython notebook version of them. Does this sound like a good idea? Tiago (your ipython notebook fanatic) From w.arindrarto at gmail.com Thu Mar 20 08:10:15 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:10:15 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: <20140320114815.70210fd5@grandao> Message-ID: Sounds good :). However, my passport is still Indonesian and I'd have to apply for a visa first in Germany in order to come to the UK :|. So I'll pass this one I guess. On Thu, Mar 20, 2014 at 12:54 PM, Jo?o Rodrigues wrote: > Lovely geographical clustering :) > > I'd be in. > > > > 2014-03-20 12:48 GMT+01:00 Tiago Antao : > >> On Thu, 20 Mar 2014 11:19:27 +0000 >> Peter Cock wrote: >> >> > Is anyone planning to attend? If not maybe I should...? >> >> >> Wild thought here: Considering that Cambridge is a geographic focal >> point for some of us (I am looking at you Dutch-based Biopythoneers, >> for instance), I am wondering if the could use this for a "local" >> Biopython meetup... Does this make any sense? Would there be interest? >> >> As I said, wild thought (silly?)... 
>> >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From w.arindrarto at gmail.com Thu Mar 20 08:15:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:15:56 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320114244.4022cbc7@grandao> References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, Do you plan to put the .ipynb file in the repo or will this be separate? Either way, I like the idea of having an .ipynb version of the tutorials around :). (from another IPython user). On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > Hi all, > > Just to announce a potential project that I might embark on very soon > and see the reaction of the community: > > Get all the tutorial materials that I can find and create a ipython > notebook version of them. > > Does this sound like a good idea? > > Tiago > (your ipython notebook fanatic) > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mgymrek at mit.edu Thu Mar 20 09:50:39 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:50:39 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, I'm happy to update this section in the tutorial if you'd like help with that. 
Cheers, ~M On Tue, Mar 18, 2014 at 1:13 PM, Tiago Antao wrote: > Hi, > > We have now gone through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3. Amend the tutorial > > The doubt: I would like to deprecate a class inside > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mgymrek at mit.edu Thu Mar 20 09:57:13 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:57:13 -0400 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, I also really like this idea. Seems like it would make sense to have them as part of the repository to make it easy for others to contribute. (yet another IPython notebook user :) ) ~M On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto wrote: > Hi Tiago, > > Do you plan to put the .ipynb file in the repo or will this be > separate? Either way, I like the idea of having an .ipynb version of > the tutorials around :). > > (from another IPython user). > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > > Hi all, > > > > Just to announce a potential project that I might embark on very soon > > and see the reaction of the community: > > > > Get all the tutorial materials that I can find and create a ipython > > notebook version of them. > > > > Does this sound like a good idea? 
> > > > Tiago > > (your ipython notebook fanatic) > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 10:11:16 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:11:16 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320141116.3f98384f@grandao> Hi Bow and Melissa, I was planning on doing this separately. But happy to do it otherwise. Or maybe we could start a git repo, do some examples and see where it goes? Considering that this would be starting from scratch I was planning on doing this on ipython 2.0 with python 3.4. You know, living on the edge ;) Tiago On Thu, 20 Mar 2014 09:57:13 -0400 Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have > them as part of the repository to make it easy for others to > contribute. (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > > > Hi Tiago, > > > > Do you plan to put the .ipynb file in the repo or will this be > > separate? Either way, I like the idea of having an .ipynb version of > > the tutorials around :). > > > > (from another IPython user). > > > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao > > wrote: > > > Hi all, > > > > > > Just to announce a potential project that I might embark on very > > > soon and see the reaction of the community: > > > > > > Get all the tutorial materials that I can find and create a > > > ipython notebook version of them. > > > > > > Does this sound like a good idea? 
> > > > > > Tiago > > > (your ipython notebook fanatic) > > > _______________________________________________ > > > Biopython-dev mailing list > > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From p.j.a.cock at googlemail.com Thu Mar 20 10:19:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:19:51 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: +1 for any *.ipynb files being under source code control. There are perhaps advantages to using a separate repository, but still under Biopython on GitHub? This might also help if we wanted to build on existing external tutorials which are under a CC licence etc... Peter On Thu, Mar 20, 2014 at 1:57 PM, Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have them > as part of the repository to make it easy for others to contribute. > (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > >> Hi Tiago, >> >> Do you plan to put the .ipynb file in the repo or will this be >> separate? Either way, I like the idea of having an .ipynb version of >> the tutorials around :). >> >> (from another IPython user). >> >> On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: >> > Hi all, >> > >> > Just to announce a potential project that I might embark on very soon >> > and see the reaction of the community: >> > >> > Get all the tutorial materials that I can find and create a ipython >> > notebook version of them. >> > >> > Does this sound like a good idea? 
>> > >> > Tiago >> > (your ipython notebook fanatic) >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From anaryin at gmail.com Thu Mar 20 10:21:12 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Thu, 20 Mar 2014 15:21:12 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320141116.3f98384f@grandao> References: <20140320114244.4022cbc7@grandao> <20140320141116.3f98384f@grandao> Message-ID: +1 too. Maybe adding some support for oldies (Python 2.x) or are there features in iPython 2.0 that cannot be used in these older versions?? From tra at popgen.net Thu Mar 20 10:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:48:15 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320144815.30b7a138@grandao> On Thu, 20 Mar 2014 14:19:51 +0000 Peter Cock wrote: > +1 for any *.ipynb files being under source code control. > > There are perhaps advantages to using a separate repository, > but still under Biopython on GitHub? This might also help if we > wanted to build on existing external tutorials which are under > a CC licence etc... My original plan was to draw "heavy inspiration" (credited, of course) from the existing Tutorial and maybe your workshop work. This all started when I noticed the need to change the tutorial due to simcoal changes... As I had to re-visit this, the idea followed... 
If people are fine with something under the biopython organization, I am fine with that. I have two proposals, though: 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above all ipython 2.0) 2. Go on and "do stuff", see where it goes and then maybe re-organize in the future (as opposed to do lots of planning first). This is, in some sense, a new line of direction and I would suggest that being exploratory would be better than being cautious... Tiago From p.j.a.cock at googlemail.com Thu Mar 20 10:53:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:53:41 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320144815.30b7a138@grandao> References: <20140320114244.4022cbc7@grandao> <20140320144815.30b7a138@grandao> Message-ID: On Thu, Mar 20, 2014 at 2:48 PM, Tiago Antao wrote: > On Thu, 20 Mar 2014 14:19:51 +0000 > Peter Cock wrote: > >> +1 for any *.ipynb files being under source code control. >> >> There are perhaps advantages to using a separate repository, >> but still under Biopython on GitHub? This might also help if we >> wanted to build on existing external tutorials which are under >> a CC licence etc... > > > My original plan was to draw "heavy inspiration" (credited, of course) > from the existing Tutorial and maybe your workshop work. > > This all started when I noticed the need to change the tutorial due to > simcoal changes... As I had to re-visit this, the idea followed... > > If people are fine with something under the biopython organization, I > am fine with that. > > I have two proposals, though: > > 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above > all ipython 2.0) > > 2. Go on and "do stuff", see where it goes and then maybe re-organize > in the future (as opposed to do lots of planning first). This is, in > some sense, a new line of direction and I would suggest that being > exploratory would be better than being cautious... 
> > Tiago So make a new repository and explore away :) Regarding https://github.com/peterjc/biopython_workshop - my workshop stuff I did wonder at the time about using iPython notebook but it adds another step to the workshop setup - and another barrier for people to repeat what they did at home. I was/am hoping to improve the TravisCI coverage of that work to check all the examples work under Python 2.6, 2.7 3.3 etc. I wonder if iPython notebooks make automated testing any easier or not? Peter From tra at popgen.net Thu Mar 20 11:27:38 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 15:27:38 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> Message-ID: <20140320152738.3b3ab8ac@grandao> Hi Melissa, On Thu, 20 Mar 2014 09:50:39 -0400 Melissa Gymrek wrote: > I'm happy to update this section in the tutorial if you'd like help > with that. I just did all the changes (not much really). I was planning on committing the changes (Peter, can I?) and then some reviewing (or changing, if needed) would really be appreciated. Tiago From p.j.a.cock at googlemail.com Thu Mar 20 11:29:15 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 15:29:15 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140320152738.3b3ab8ac@grandao> References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > Hi Melissa, > > On Thu, 20 Mar 2014 09:50:39 -0400 > Melissa Gymrek wrote: > >> I'm happy to update this section in the tutorial if you'd like help >> with that. > > I just did all the changes (not much really). I was planning on > committing the changes (Peter, can I?) and then some reviewing (or > changing, if needed) would really be appreciated. 
> > Tiago Please do :) Peter From mgymrek at mit.edu Thu Mar 20 11:34:52 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 11:34:52 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: sounds good! happy to have a look ~M On Thu, Mar 20, 2014 at 11:29 AM, Peter Cock wrote: > On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > > Hi Melissa, > > > > On Thu, 20 Mar 2014 09:50:39 -0400 > > Melissa Gymrek wrote: > > > >> I'm happy to update this section in the tutorial if you'd like help > >> with that. > > > > I just did all the changes (not much really). I was planning on > > committing the changes (Peter, can I?) and then some reviewing (or > > changing, if needed) would really be appreciated. > > > > Tiago > > Please do :) > > Peter > From b.invergo at gmail.com Thu Mar 20 09:39:34 2014 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 20 Mar 2014 13:39:34 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> (Tiago Antao's message of "Thu, 20 Mar 2014 11:48:15 +0000") References: <20140320114815.70210fd5@grandao> Message-ID: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... Since I'm now based in Cambridge, it would be silly for me not to attend. I'm not all that active lately (biopython's doing what I want it to do) but it'd still be nice to meet up. Cheers, Brandon -- Brandon Invergo http://brandon.invergo.net -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 489 bytes Desc: not available URL: From w.arindrarto at gmail.com Fri Mar 21 10:59:40 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 21 Mar 2014 15:59:40 +0100 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Zheng, The nucleotide information is stored as the alignment annotation. You can access it using hsp.aln_annotation['query_annotation']. There, they are stored as triplets, representing the codons. This is indeed a tradeoff that I had to make because there is no proper model yet to represent alignment objects containing sequences with different lengths in our master branch. In this case, the length of the DNA is most of the time 3x the length of the protein. And yes, this is not ideal since the actual query is now stored as an annotation ~ trading places with the translated query. HSPs themselves are basically modelled on our MultipleSeqAlignment objects (you can get such objects when accessing the `aln` attribute from an HSP object). I think in order to properly model these types of alignment, we need to have a proper model of three-letter protein Seq objects as well. Your CodonSeqAlignment object may help here :), but I have not looked into it that much to be honest. How does it work with Seq objects with ProteinAlphabet? Is it possible to align protein and codon sequences? I tried storing as much information as possible using the current approach (e.g. notice the start and end coordinates of each hit and query; they are parsed from the file, and the difference is not the same as the value you get when doing a `len` on hsp.query and/or hsp.hit). Note also that when dealing with frameshifts, you may want to access the hsp.fragments attribute, since frameshifts mean that you can further break your HSP alignment into multiple subalignments (fragments, as they are called in SearchIO). Hope this helps :), Bow P.S. 
Also CC-ing the Development list ~ this looks like something interesting for dev in general. On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > Thanks Bow, > > That works for me. But it seems the parser doesn't take the nucleotide > information into the hsps. All I get is a pairwise alignment between two > proteins. Nucleotide information is useful because I want to know the codon > -- amino acid correspondence. In the case of frameshift the situation may > not be that straightforward. Maybe you have other concern of not doing this. > > Best, > Zheng > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto > wrote: >> >> Hi Zheng, >> >> Thank you for the files :). I found out what was causing the error and >> have pushed a patch along with some tests to our codebase >> >> (https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d). >> You should be able to parse your file using the latest `master` >> branch. >> >> Hope this helps, >> Bow >> >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan wrote: >> > Hi Bow, >> > >> > I'm happy to provide the example for testing. See attachment. >> > >> > The command to generate the output above. >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa >> > >> > I'll check the test suite to see if I can find why. >> > >> > Best, >> > Zheng >> > >> > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto >> > >> > wrote: >> >> >> >> > Looking at our test cases, this particular case may have slipped >> >> > testing. We do test for several cases of dna2protein (which could >> >> > explain why it works when the nucleotide sequence comes first), but >> >> > not protein2dna. Please let me know if I can also use your example as >> >> > a test in our test corpus :). >> >> >> >> Oops, I meant the reverse ~ we have several test cases for protein2dna >> >> which may explain why it works when the protein sequence comes first >> >> ;). 
>> > >> > > > From Tom.Brown at enmu.edu Fri Mar 21 12:30:06 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 16:30:06 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? Thanks Tom ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. If you are not the intended recipient, please contact the sender and destroy all copies of this message From p.j.a.cock at googlemail.com Fri Mar 21 12:35:00 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 21 Mar 2014 16:35:00 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). 
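Peter's second option translates into a single BLAST+ invocation; here is a sketch that merely assembles the argument list (the `query.fasta` filename is a placeholder, and the command is not actually executed here):

```python
def remote_blastp_args(query_file, taxid):
    """Assemble a BLAST+ blastp command that runs remotely on NCBI's
    servers, restricted to one taxon via an Entrez filter."""
    return [
        "blastp",
        "-query", query_file,
        "-db", "nr",
        "-remote",                                # search runs at NCBI
        "-entrez_query", "txid%d[ORGN]" % taxid,  # e.g. 6231 = Nematoda
        "-outfmt", "5",                           # XML output, parseable later
    ]

args = remote_blastp_args("query.fasta", 6231)
# To actually run it:  import subprocess; subprocess.run(args, check=True)
```

The resulting XML could then be parsed with Biopython's BLAST parsing code.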
Peter From Tom.Brown at enmu.edu Fri Mar 21 15:23:14 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 19:23:14 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FEC5@ITSNV499.ad.enet.enmu.edu> Peter, Thanks. It is working. Tom -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: Friday, March 21, 2014 10:35 AM To: Brown, Tom Cc: Biopython-dev at lists.open-bio.org Subject: Re: [Biopython-dev] Blastp for Proteins from Nematoda On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). Peter ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. 
If you are not the intended recipient, please contact the sender and destroy all copies of this message From zruan1991 at gmail.com Fri Mar 21 15:32:33 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 15:32:33 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Bow, I have the same problem when trying to model codon alignment with frameshift being considered. Basically, I have a CodonSeq object to store a coding sequence. The only difference between CodonSeq and Seq object is that CodonSeq has an attribute -- `rf_table` (reading frame table). It's actually a list of positions each codon starts with, so that the translate() method will go through the list to translate each codon into an amino acid. In this case, it is easy to store a coding sequence with frameshift events. And it's not necessary to split the protein to dna alignment into multiple parts when frameshift occurs. However, the problem now becomes how to obtain such information (`rf_table`). I find exonerate is quite capable of handling this task, especially with introns in the dna. I do think an object to store protein to dna alignment is necessary in this scenario. Best, Zheng On Fri, Mar 21, 2014 at 10:59 AM, Wibowo Arindrarto wrote: > Hi Zheng, > > The nucleotide information is stored as the alignment annotation. You > can access it using hsp.aln_annotation['query_annotation']. There, > they are stored as triplets, representing the codons. > > This is indeed a tradeoff that I had to make because there is no > proper model yet to represent alignment objects containing sequences > with different length in our master branch. In this case, the length > of the DNA is most of > the time 3x the length of the protein. And yes, this is not ideal > since the actual query is now stored as an annotation ~ trading > places with the translated query.
HSPs themselves are basically > modelled based on our MultipleSeqAlignment objects (you can get such > objects when accessing the `aln` attribute from an HSP object). I > think in order to properly model these types of alignment, we need to > have a proper model of three-letter protein Seq objects as well. > > Your CodonSeqAlignment object may help here :), but I have not looked > into it that much to be honest. How does it work with Seq objects with > ProteinAlphabet? Is it possible to align protein and codon sequences? > > I tried storing as much information as possible using the current > approach (e.g. notice the start and end coordinates of each hit and > query, they are parsed from the file and the difference is not the > same as the value you get when doing a `len` on hsp.query and/or > hsp.hit). Note also that when dealing with frameshifts, you may want > to access the hsp.fragments attribute, since frameshifts mean that you > can break further your HSP alignment into multiple subalignments > (fragments as it is called in SearchIO). > > Hope this helps :), > Bow > > P.S. Also CC-ing the Development list ~ this looks like something > interesting for dev in general. > > On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > > Thanks Bow, > > > > That works for me. But it seems the parser doesn't take the nucleotide > > information into the hsps. All I get is a pairwise alignment between two > > proteins. Nucleotide information is useful because I want to know the > codon > > -- amino acid correspondence. In the case of frameshift the situation may > > not be that straightforward. Maybe you have other concern of not doing > this. > > > > Best, > > Zheng > > > > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto < > w.arindrarto at gmail.com> > > wrote: > >> > >> Hi Zheng, > >> > >> Thank you for the files :). 
I found out what was causing the error and > >> have pushed a patch along with some tests to our codebase > >> > >> ( > https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d > ). > >> You should be able to parse your file using the latest `master` > >> branch. > >> > >> Hope this helps, > >> Bow > >> > >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan > wrote: > >> > Hi Bow, > >> > > >> > I'm happy to provide the example for testing. See attachment. > >> > > >> > The command to generate the output above. > >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa > >> > > >> > I'll check the test suite to see if I can find why. > >> > > >> > Best, > >> > Zheng > >> > > >> > > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto > >> > > >> > wrote: > >> >> > >> >> > Looking at our test cases, this particular case may have slipped > >> >> > testing. We do test for several cases of dna2protein (which could > >> >> > explain why it works when the nucleotide sequence comes first), but > >> >> > not protein2dna. Please let me know if I can also use your example > as > >> >> > a test in our test corpus :). > >> >> > >> >> Oops, I meant the reverse ~ we have several test cases for > protein2dna > >> >> which may explain why it works when the protein sequence comes first > >> >> ;). > >> > > >> > > > > > > From arklenna at gmail.com Fri Mar 21 16:54:05 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Fri, 21 Mar 2014 16:54:05 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: > Hi Bow, > > I have the same problem when trying to model codon alignment with > frameshift being considered. Basically, I have a CodonSeq object to store a > coding sequence. The only difference between CodonSeq and Seq object is > that CodonSeq has an attribute -- `rf_table` (reading frame table). 
It's > actually a list of positions each codon starts with, so that the translate() > method will go through the list to translate each codon into an amino acid. In this > case, it is easy to store a coding sequence with frameshift events. And > it's not necessary to split the protein to dna alignment into multiple parts > when frameshift occurs. However, the problem now becomes how to obtain such > information (`rf_table`). I find exonerate is quite capable of handling > this task, especially with introns in the dna. I do think an object to > store protein to dna alignment is necessary in this scenario. > Is the (still unmerged) CoordinateMapper the solution to this? http://biopython.org/wiki/Coordinate_mapping If so, let me know and I'll rebase and refresh the pull request. If not, I misunderstood the problem. Cheers, Lenna From zruan1991 at gmail.com Fri Mar 21 17:53:13 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 17:53:13 -0400 Subject: [Biopython-dev] Fwd: [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Forgot to cc the dev list. Hi Lenna, I'm not quite sure about CoordinateMapper, but it seems to deal with sequence files with rich annotation like GenBank. However, in our case, we are typically not sure about the coordinate correspondence between dna and protein sequence. That's why exonerate can help. Thanks! On Fri, Mar 21, 2014 at 4:54 PM, Lenna Peterson wrote: > On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: >> Hi Bow, >> >> I have the same problem when trying to model codon alignment with >> frameshift being considered. Basically, I have a CodonSeq object to store >> a >> coding sequence. The only difference between CodonSeq and Seq object is >> that CodonSeq has an attribute -- `rf_table` (reading frame table).
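The `rf_table` idea being quoted here can be illustrated with a toy sketch; the codon table and function name below are invented for illustration and are not part of the CodonSeq branch:

```python
# Toy codon table (subset of the standard genetic code).
CODONS = {"ATG": "M", "AAA": "K", "TTT": "F"}

def translate_with_rf_table(seq, rf_table):
    """Translate using a list of codon start positions.

    A frameshift is simply a jump in the table, so the coding
    sequence never needs to be split into multiple alignments.
    """
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in rf_table)

# In-frame coding sequence:
print(translate_with_rf_table("ATGAAATTT", [0, 3, 6]))   # MKF
# Single-base insertion after the second codon; rf_table jumps from 3 to 7:
print(translate_with_rf_table("ATGAAACTTT", [0, 3, 7]))  # MKF
```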
In >> this >> case, it is easy to store a coding sequence with frameshift events. And >> it's not necessary to split the protein to dna alignment into multiple >> part >> when frameshift occurs. However, the problem now becomes how to obtain >> such >> information (`rf_table`). I find exonerate is quite capable of handling >> this task, especially with introns in the dna. I do think an object to >> store protein to dna alignment is necessary in this scenario. >> > > Is the (still unmerged) CoordinateMapper the solution to this? > http://biopython.org/wiki/Coordinate_mapping > If so, let me know and I'll rebase and refresh the pull request. > If not, I misunderstood the problem. > > Cheers, > > Lenna > From p.j.a.cock at googlemail.com Mon Mar 24 07:57:14 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 11:57:14 +0000 Subject: [Biopython-dev] Volunteer buildslave machines? e.g. Windows & 32 bit Linux Message-ID: Hello all, Tiago and I have been looking after a range of machines covering different operating systems and Python versions, running as volunteer buildslaves for Biopython using buildbot: http://testing.open-bio.org/biopython/tgrid Does anyone else have a lab/home server which could be setup to run nightly Biopython tests for us via buildbot? Ideally the machine needs to be online overnight (European time) when the server is currently setup to schedule tests: http://www.biopython.org/wiki/Continuous_integration Our elderly 32 bit Linux desktop which has been running as a Biopython buildslave for the last few years is finally failing (hard drive problem). I would particularly like to see new buildslaves for: * 32 bit Linux * 64 bit Windows * Windows 7 or 8 (we have a 32 bit XP machine) If you think you might be able to help, the first hurdle is verifying you can checkout Biopython from github, and then compile the source (this is non-trivial on Windows, especially for 64 bit Windows). 
Note that this is separate from the continuous integration testing done for use via TravisCI whenever the GitHub repository is updated - this is very useful but currently only covers Linux: https://travis-ci.org/biopython/biopython/builds The key benefit of the buildbot server is cross platform testing - but this requires a range of volunteer machines. Thanks, Peter RE: http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011158.html On Tue, Mar 18, 2014 at 1:15 PM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: >> Hi, >> >> I have a docker container ready (save for a few applications). Simple >> usage instructions: >> >> ... >> >> Tiago > > Is this a 32 or 64 bit VM, or either? > > I'm asking because we may want to source a replacement > 32 bit Linux buildslave - the hard drive in the old machine > we've been using is failing, and it is probably not worth > replacing. > > Peter From p.j.a.cock at googlemail.com Mon Mar 24 12:42:29 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 16:42:29 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: References: Message-ID: Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g. $ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. 
With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three though, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM, wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps. From marco.galardini at unifi.it Tue Mar 25 19:40:44 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 25 Mar 2014 23:40:44 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52CD2948.7050102@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> Message-ID: <533213FC.8010304@unifi.it> Hi all, following your suggestions (as well as the other modules implementations) I've just committed a couple of commits to my biopython fork, featuring the Bio.Phenomics module. The module capabilities are limited to reading/writing Phenotype Microarray files and basic operations on the PlateRecord/WellRecord objects.
The module requires numpy to interpolate the signal when the user requests a time point that wasn't in the input file (this way the WellRecord object can be queried with slices). I'm thinking about how to implement the parameter extraction from WellRecord objects without the use of scipy. Here's the link to my branch: https://github.com/mgalardini/biopython/tree/phenomics The module and functions have been documented taking inspiration from the other modules: hope they are clear enough for you to try it out. Some example files can be found in Tests/Phenomics. Marco On 08/01/2014 10:32, Marco Galardini wrote: > Hi, > > On 01/08/2014 06:53 AM, Michiel de Hoon wrote: >>> any specification on the style guide for the biopython parsers? >> There is no strict set of rules, but to get you started, many modules >> follow this format: >> - Assuming a PM data file contains only a single data set, the module >> should contain a function "read" that takes either a file name or a file >> handle as the argument. > Unfortunately, the situation is a bit mixed up: there are basically > three file formats for PM data: as csv files (which can contain one or > more data sets or 'plates') and as yaml/json, which can contain also > some metadata. I would therefore use a similar approach as the SeqIO > module, having a parse() and a read() method that raises an exception > if the file contains more than one record. > >> - The module should contain a class (typically called "Record") that >> can store the data in the data file. The "read" function returns an >> object of this class. >> - Try to avoid third-party dependencies if at all possible. > So far the dependencies would be pyYaml (for the yaml/json parsing, > but maybe I could use the stdlib json module) and numpy/scipy for the > extraction of curve parameters. Does this sound ok? >> >> Would it make sense to have a single Bio.Microarray module that can >> house the various microarray parsers (PM, Affy, others)?
> I don't know if that would be a good strategy: the Phenotype > Microarrays are very different from the other proper microarrays; how > about a "phenomics" module? > >> >> Best, >> -Michiel. > Kind regards, > Marco > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- From mjldehoon at yahoo.com Tue Mar 25 22:15:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 25 Mar 2014 19:15:52 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> We could consider not including any DTDs with Biopython, and relying on downloading them automatically. This seems a better test case than what we currently have, because as NCBI updates their DTDs, Bio.Entrez depends on this automatic download capability. Best, -Michiel. -------------------------------------------- On Mon, 3/24/14, Peter Cock wrote: Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] To: "Biopython-Dev Mailing List" Date: Monday, March 24, 2014, 12:42 PM Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g.
$ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three though, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM, wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps.
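The `$ ls` release-process check quoted in this thread could also be scripted portably; a small sketch (the cache path is the one mentioned in the thread, and the directory may simply not exist yet):

```python
import os

def cached_dtds():
    """List any DTD files Bio.Entrez has cached locally (empty if none)."""
    dtd_dir = os.path.expanduser("~/.config/biopython/Bio/Entrez/DTDs")
    if not os.path.isdir(dtd_dir):
        return []
    return sorted(os.listdir(dtd_dir))

print(cached_dtds())
```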
_______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 26 05:18:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 09:18:21 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:15 AM, Michiel de Hoon wrote: > We could consider to not include any DTDs with Biopython, > and rely on downloading them automatically. > This seems a better test case than what we currently have, > because as NCBI updates their DTDs, Bio.Entrez depends > on this automatic download capability. > > Best, > -Michiel. Long term not bundling the DTD files seems a good idea. Being cautious we could bundle them for the next release, see how the download mechanism works in the wild, and drop the DTD files for the release after that? This would mean all the Entrez parser tests would require internet access (even if using an old XML file on disk), but given that most of Bio.Entrez requires a connection to the NCBI anyway this isn't such a problem. If we do go down this route, would the current once-a-week running of the online tests with buildbot be enough? Peter From p.j.a.cock at googlemail.com Wed Mar 26 06:14:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 10:14:53 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? 
In-Reply-To: <533213FC.8010304@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini wrote: > Hi all, > > following your suggestions (as well as the other modules implementations) > I've just committed a couple of commits to my biopython fork, featuring the > Bio.Phenomics module. > The module capabilities are limited to reading/writing Phenotype Microarray > files and basic operations on the PlateRecord/WellRecord objects. The module > requires numpy to interpolate the signal when the user requests a time point > that wasn't in the input file (this way the WellRecord object can be queried > with slices). > I'm thinking about how to implement the parameter extraction from WellRecord > objects without the use of scipy. > > Here's the link to my branch: > https://github.com/mgalardini/biopython/tree/phenomics > The module and functions have been documented taking inspiration from the > other modules: hope they are clear enough for you to try it out. > Some example files can be found in Tests/Phenomics. > > Marco Hi Marco, I've not worked with this kind of data so my comments are not on the application specifics. But I'm pleased to see unit tests :) One thought was that while you define (Java like?) getRow and getColumn methods, your __getitem__ does not support (NumPy like) access, which is something we do for multiple sequence alignments. I guess while most plates are laid out in a grid, the row/column for each sample is not the most important thing - the sample identifier is? Thinking out loud, would properties `rows` and `columns` etc be nicer than `getRow` and `getColumn`, supporting iteration over the rows/columns/etc and indexing? Minor: Your longer function docstrings do not follow PEP257, specifically starting with a one line summary, then a blank line, then the details.
Also you are using triple single-quotes, rather than triple double-quotes (like the rest of Biopython). http://legacy.python.org/dev/peps/pep-0257/ Peter P.S. Also, I'm not very keen on the module name, phenomics - I wonder if it would earn Biopython a badomics award? ;) http://dx.doi.org/10.1186/2047-217X-1-6 From marco.galardini at unifi.it Wed Mar 26 09:26:42 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Wed, 26 Mar 2014 14:26:42 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Hi, many thanks for your comments, below some replies: ----- Message from p.j.a.cock at googlemail.com --------- Date: Wed, 26 Mar 2014 10:14:53 +0000 From: Peter Cock Reply-To: Peter Cock Subject: Re: [Biopython-dev] Interested in a Phenotype Microarray parser? To: Marco Galardini Cc: Biopython-Dev Mailing List > On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini > wrote: >> Hi all, >> >> following your suggestions (as well as the other modules implementations) >> I've just committed a couple of commits to my biopython fork, featuring the >> Bio.Phenomics module. >> The module capabilities are limited to reading/writing Phenotype Microarray >> files and basic operations on the PlateRecord/WellRecord objects. The module >> requires numpy to interpolate the signal when the user requests a time point >> that wasn't in the input file (this way the WellRecord object can be queried >> with slices). >> I'm thinking about how to implement the parameter extraction from WellRecord >> objects without the use of scipy. >> >> Here's the link to my branch: >> https://github.com/mgalardini/biopython/tree/phenomics >> The module and functions have been documented taking inspiration from the >> other modules: hope they are clear enough for you to try it out.
>> Some example files can be found in Tests/Phenomics. >> >> Marco > > Hi Marco, > > I've not worked with this kind of data so my comments are not on > the application specifics. But I'm pleased to see unit tests :) > > One thought was that while you define (Java like?) getRow and getColumn > methods, your __getitem__ does not support (NumPy like) access, > which is something we do for multiple sequence alignments. I guess > while most plates are laid out in a grid, the row/column for each > sample is not the most important thing - the sample identifier is? > > Thinking out loud, would properties `rows` and `columns` etc be > nicer than `getRow` and `getColumn`, supporting iteration over > the rows/columns/etc and indexing? Yeah, absolutely: I'll work on some changes to have a more straightforward way to select multiple WellRecords on row/column basis. > > Minor: Your longer function docstrings do not follow PEP257, > specifically starting with a one line summary, then a blank line, > then the details. Also you are using triple single-quotes, rather > than triple double-quotes (like the rest of Biopython). > http://legacy.python.org/dev/peps/pep-0257/ Whoops, I'll change it, thanks > > Peter > > P.S. Also, I'm not very keen on the module name, phenomics - > I wonder if it would earn Biopython a badomics award? ;) > http://dx.doi.org/10.1186/2047-217X-1-6 That's meta-omics right? :p What about 'Phenotype' then? Maybe it's too general, but future extensions may include other phenotypic readouts.
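The slice-based WellRecord querying discussed in this thread relies on interpolating the signal between measured time points; the numpy-based interpolation can be sketched in pure Python (a toy sketch, not the actual Bio.Phenomics code):

```python
import bisect

def interpolate(times, signals, t):
    """Linear interpolation of a well's signal at an unmeasured time point.

    Clamps to the first/last measurement outside the measured range.
    """
    if t <= times[0]:
        return signals[0]
    if t >= times[-1]:
        return signals[-1]
    i = bisect.bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    s0, s1 = signals[i - 1], signals[i]
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)

times = [0.0, 0.25, 0.5, 1.0]      # hours
signals = [10.0, 20.0, 30.0, 50.0]
print(interpolate(times, signals, 0.75))  # -> 40.0
```

numpy.interp does the same job in one call, which is presumably why the module depends on numpy.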
Marco > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > ----- End of message from p.j.a.cock at googlemail.com ----- Marco Galardini Postdoctoral Fellow EMBL-EBI - European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD, UK Phone: +44 (0)1223 49 2547 From mjldehoon at yahoo.com Wed Mar 26 10:55:46 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 26 Mar 2014 07:55:46 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Hi Peter, On Wed, 3/26/14, Peter Cock wrote: > Long term not bundling the DTD files seems a good idea. > Being cautious we could bundle them for the next release, > see how the download mechanism works in the wild, and > drop the DTD files for the release after that? I don't think we need to be so cautious. > This would mean all the Entrez parser tests would require > internet access (even if using an old XML file on disk), But only the first time. After a DTD is downloaded, it is stored locally, and internet access won't be needed the next time the XML (or other XML files relying on the same DTD) is parsed. In my experience, using local DTDs is much much faster than accessing them through the internet for each XML file, so I would not advocate an internet-only solution. As an alternative to local storage, we could consider downloading all DTDs for each Biopython session, but keeping the results of parsing the DTD in memory (so we won't have to download each DTD over and over again if we're parsing many XML files). This can be almost as fast as using local storage, but will require internet access, and also Bio.Entrez would have to be changed. Best, -Michiel.
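The in-memory, once-per-session alternative described here can be sketched with a memoised loader; this is an assumed design, not Bio.Entrez code, and the download is stubbed out so the caching behaviour is visible:

```python
from functools import lru_cache

downloads = []  # records how many network round-trips would have happened

@lru_cache(maxsize=None)
def load_dtd(name):
    # Real code would fetch and parse the DTD from NCBI here; this stub
    # just logs the call and returns a placeholder "parsed" result.
    downloads.append(name)
    return "<!-- parsed DTD: %s -->" % name

load_dtd("pubmed_140101.dtd")
load_dtd("pubmed_140101.dtd")  # second call is served from memory
print(len(downloads))          # -> 1
```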
From p.j.a.cock at googlemail.com Wed Mar 26 11:04:28 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 15:04:28 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> References: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:55 PM, Michiel de Hoon wrote: > Hi Peter, > > On Wed, 3/26/14, Peter Cock wrote: >> Long term not bundling the DTD files seems a good idea. >> Being cautious we could bundle them for the next release, >> see how the download mechanism works in the wild, and >> drop the DTD files for the release after that? > > I don't think we need to be so cautious. OK. We could then get rid of the DTDs folder under Bio/Entrez and tweak the Entrez XML parsing tests to ensure they are only run if the internet is available. >> This would mean all the Entrez parser tests would require >> internet access (even if using an old XML file on disk), > > But only the first time. After a DTD is downloaded, it is stored > locally, and internet access won't be needed the next time the XML > (or other XML files relying on the same DTD) is parsed. Yes, but for many test environments, it is always the first time ;) e.g. TravisCI uses a clean VM for each test run. > In my experience, using local DTDs is much much faster than > accessing them through the internet for each XML file, so I > would not advocate an internet-only solution. Yes (I didn't mean to imply that - sorry for any confusion). > As an alternative to local storage, we could consider downloading > all DTDs for each Biopython session, but keeping the results of > parsing the DTD in memory (so we won't have to download each > DTD over and over again if we're parsing many XML files). > This can be almost as fast as using local storage, but will require > internet access, and also Bio.Entrez would have to be changed. 
A local cache (as implemented) seems fine to me. Peter From p.j.a.cock at googlemail.com Thu Mar 27 07:40:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 27 Mar 2014 11:40:41 +0000 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 Message-ID: Hello all, As a co-chair of BOSC this year, I'd like to remind you all that the abstract deadline is about a week away now (April 4): http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ Also this year student presenters will get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ Would anyone like to volunteer to give this year's Biopython Update talk at BOSC 2014 in Boston? I would prefer one of the newer project members have a turn - but also I'll be busier than usual with BOSC organisation duties. Note that giving a talk often helps with getting travel funding to attend a meeting - and in addition to BOSC, you can combine the trip with the BOSC CodeFest beforehand and/or the ISMB meeting afterwards. Thanks, Peter From w.arindrarto at gmail.com Fri Mar 28 17:22:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 28 Mar 2014 22:22:56 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, If there are no objections from anyone, I would like to volunteer :). I am planning to come to ISMB anyway, though this isn't 100% confirmed as I am still applying for the visa. 
Cheers, Bowo On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: > Hello all, > > As a co-chair of BOSC this year, I'd like to remind you all that the > abstract deadline is about a week away now (April 4): > > http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ > > Also this year student presenters will get free BOSC registration: > > http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ > > Would anyone like to volunteer to give this year's Biopython Update > talk at BOSC 2014 in Boston? I would prefer one of the newer project > members have a turn - but also I'll be busier than usual with BOSC > organisation duties. > > Note that giving a talk often helps with getting travel funding to > attend a meeting - and in addition to BOSC, you can combine the > trip with the BOSC CodeFest beforehand and/or the ISMB meeting > afterwards. > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From ashok.bioinformatics at gmail.com Sun Mar 30 14:11:49 2014 From: ashok.bioinformatics at gmail.com (T. Ashok Kumar) Date: Sun, 30 Mar 2014 23:41:49 +0530 Subject: [Biopython-dev] Contributing code to Biopython Message-ID: Dear Sir/Madam, I wish to contribute a code on predicting *hydropathy plot of a protein sequence* using biopython. Please help me regarding this issue. -- *T. 
Ashok Kumar* Head, Department of Bioinformatics Noorul Islam College of Arts and Science Kumaracoil, Thuckalay - 629 180 Kanyakumari District, INDIA Mobile:- 00 91 9655307178 *E-Mail:* *ashok.bioinformatics at gmail.com *, *ashok at biogem.org * *Website:* *www.biogem.org * From p.j.a.cock at googlemail.com Mon Mar 31 05:12:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 10:12:51 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Thanks for volunteering Bow :) I can send you the LaTeX files I used for the abstract in previous years (each talk gets one page in the BOSC abstract booklet). You should be able to find our past talks online, some as PDFs, some on SlideShare etc: http://biopython.org/wiki/Documentation#Presentations Peter On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > If there are no objections from anyone, I would like to volunteer :). > > I am planning to come to ISMB anyway, though this isn't 100% confirmed as I > am still applying for the visa. > > Cheers, > Bowo > > On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >> >> Hello all, >> >> As a co-chair of BOSC this year, I'd like to remind you all that the >> abstract deadline is about a week away now (April 4): >> >> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >> >> Also this year student presenters will get free BOSC registration: >> >> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >> >> Would anyone like to volunteer to give this year's Biopython Update >> talk at BOSC 2014 in Boston? I would prefer one of the newer project >> members have a turn - but also I'll be busier than usual with BOSC >> organisation duties. >> >> Note that giving a talk often helps with getting travel funding to >> attend a meeting - and in addition to BOSC, you can combine the >> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >> afterwards. 
>> >> Thanks, >> >> Peter >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Mon Mar 31 12:30:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 17:30:21 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> References: <20140320114815.70210fd5@grandao> <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> Message-ID: I'm glad to see there are several people interested in EuroSciPy. Would one of you like to submit a Biopython talk? The deadline is 14 April: https://www.euroscipy.org/2014/calls/abstracts/ Peter From w.arindrarto at gmail.com Mon Mar 31 17:08:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 31 Mar 2014 23:08:50 +0200 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, A LaTeX template would be great :). I'm still preparing the abstract, should be ready for everyone to check soon. Cheers, Bow On Mon, Mar 31, 2014 at 11:12 AM, Peter Cock wrote: > Thanks for volunteering Bow :) > > I can send you the LaTeX files I used for the abstract in > previous years (each talk gets one page in the BOSC > abstract booklet). You should be able to find our past > talks online, some as PDFs, some on SlideShare etc: > http://biopython.org/wiki/Documentation#Presentations > > Peter > > On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> If there are no objections from anyone, I would like to volunteer :). >> >> I am planning to come to ISMB anyway, though this isn't 100% confirmed as I >> am still applying for the visa. 
>> >> Cheers, >> Bowo >> >> On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >>> >>> Hello all, >>> >>> As a co-chair of BOSC this year, I'd like to remind you all that the >>> abstract deadline is about a week away now (April 4): >>> >>> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >>> >>> Also this year student presenters will get free BOSC registration: >>> >>> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >>> >>> Would anyone like to volunteer to give this year's Biopython Update >>> talk at BOSC 2014 in Boston? I would prefer one of the newer project >>> members have a turn - but also I'll be busier than usual with BOSC >>> organisation duties. >>> >>> Note that giving a talk often helps with getting travel funding to >>> attend a meeting - and in addition to BOSC, you can combine the >>> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >>> afterwards. >>> >>> Thanks, >>> >>> Peter >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev From marco.galardini at unifi.it Mon Mar 31 19:59:32 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 01 Apr 2014 00:59:32 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Message-ID: <533A0164.9010700@unifi.it> Hi, as suggested, I've made a few changes to the proposed Bio.Phenotype module (apart from the less-omics name). 
The PlateRecord object can now be indexed in a similar fashion to AlignIO multiple alignments: it is still possible to use the WellRecord identifier as an index, but when integers or slices are used, new sub-plates or single wells are returned. The system uses the well identifier as a means to divide the plate into rows/columns. Thanks for pointing out the AlignIO system, it has been very useful. I've left the two getColumns and getRows functions, since for some people it may still be useful to use the well identifiers. If you feel like they are too confusing I can remove them. The updated branch is here: https://github.com/mgalardini/biopython/tree/phenomics Kind regards, Marco On 26/03/2014 13:26, Marco Galardini wrote: > Hi, > > many thanks for your comments, below some replies: > > ----- Messaggio da p.j.a.cock at googlemail.com --------- > Data: Wed, 26 Mar 2014 10:14:53 +0000 > Da: Peter Cock > Rispondi-A:Peter Cock > Oggetto: Re: [Biopython-dev] Interested in a Phenotype Microarray > parser? > A: Marco Galardini > Cc: Biopython-Dev Mailing List > > >> On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini >> wrote: >>> Hi all, >>> >>> following your suggestions (as well as the other modules >>> implementations) >>> I've just committed a couple of commits to my biopython fork, >>> featuring the >>> Bio.Phenomics module. >>> The module capabilities are limited to reading/writing Phenotype >>> Microarray >>> files and basic operations on the PlateRecord/WellRecord objects. >>> The module >>> requires numpy to interpolate the signal when the user request a >>> time point >>> that wasn't in the input file (this way the WellRecord object can be >>> queried >>> with slices). >>> I'm thinking on how to implement the parameters extraction from >>> WellRecord >>> objects without the use of scipy. 
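[Editor's note: the slice-based querying mentioned above amounts to interpolating between measured time points; without numpy or scipy a linear version takes only a few lines. A toy helper for illustration, not the code in the branch.]

```python
def signal_at(times, values, t):
    """Linearly interpolate a well's signal at time t.

    times must be strictly increasing; pure Python, no numpy/scipy.
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("time point outside the measured range")
    # walk consecutive (time, value) pairs until t falls inside one segment
    for (t0, v0), (t1, v1) in zip(zip(times, values),
                                  zip(times[1:], values[1:])):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```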
>>> >>> Here's the link to my branch: >>> https://github.com/mgalardini/biopython/tree/phenomics >>> The module and functions have been documented taking inspiration >>> from the >>> other modules: hope they are clear enough for you to try it out. >>> Some example files can be found in Tests/Phenomics. >>> >>> Marco >> >> Hi Marco, >> >> I've not worked with this kind of data so my comments are not on >> the application specifics. But I'm pleased to see unit tests :) >> >> One thought was while you define (Java like?) getRow and getColumn >> methods, your __getitem__ does not support (NumPy like) access, >> which is something we do for multiple sequence alignments. I guess >> while most plates are laid out in a grid, the row/column for each >> sample is not the most important thing - the sample identifier is? >> >> Thinking out loud, would properties `rows` and `columns` etc be >> nicer than `getRow` and `getColumn`, supporting iteration over >> the rows/columns/etc and indexing? > > Yeah, absolutely: I'll work on some changes to have a more > straightforward way to select multiple WellRecords on a row/column basis. > >> >> Minor: Your longer function docstrings do not follow PEP257, >> specifically starting with a one line summary, then a blank line, >> then the details. Also you are using triple single-quotes, rather >> than triple double-quotes (like the rest of Biopython). >> http://legacy.python.org/dev/peps/pep-0257/ > > Whoops, I'll change it, thanks > >> >> Peter >> >> P.S. Also, I'm not very keen on the module name, phenomics - >> I wonder if it would earn Biopython a badomics award? ;) >> http://dx.doi.org/10.1186/2047-217X-1-6 > > That's meta-omics right? :p > What about 'Phenotype' then? Maybe it's too general, but future > extensions may include other phenotypic readouts. 
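[Editor's note: the dual indexing being discussed - well identifiers as string keys, integers and slices returning row-wise sub-plates - can be illustrated with a toy class. Names and behaviour here are illustrative only, not the actual Bio.Phenotype API.]

```python
import string


class TinyPlate:
    """Toy plate: index by well ID ('A01') or by row number/slice."""

    def __init__(self, wells):
        self.wells = dict(wells)  # e.g. {"A01": 0.5, "A02": 0.7, ...}

    def __getitem__(self, index):
        if isinstance(index, str):  # well identifier, e.g. "B07"
            return self.wells[index]
        if isinstance(index, int):  # one row as a sub-plate
            row = string.ascii_uppercase[index]
            return TinyPlate({k: v for k, v in self.wells.items()
                              if k.startswith(row)})
        if isinstance(index, slice):  # several rows as a sub-plate
            rows = string.ascii_uppercase[index]
            return TinyPlate({k: v for k, v in self.wells.items()
                              if k[0] in rows})
        raise TypeError("use a well ID string, an int or a slice")
```

The well identifier itself carries the row/column position, which is what lets integer and slice access coexist with key access.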
> > Marco >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > > ----- Fine del messaggio da p.j.a.cock at googlemail.com ----- > > > > Marco Galardini > Postdoctoral Fellow > EMBL-EBI - European Bioinformatics Institute > Wellcome Trust Genome Campus > Hinxton, Cambridge CB10 1SD, UK > Phone: +44 (0)1223 49 2547 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- From harsh.beria93 at gmail.com Mon Mar 3 21:57:35 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 4 Mar 2014 03:27:35 +0530 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393582069.6863.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: The pairwise alignment project is not listed in the Ideas page. If I work on it and make a GUI or command line frontend, can that be taken up as a GSOC project? Who can be the potential mentor for this project so that I can chalk out the details before starting to code? Also, do I need to add the project on the idea page? On Sat, Mar 1, 2014 at 12:40 AM, Harsh Beria wrote: > I can work on pairwise sequence alignment. Actually, I have previously > worked on this using Dynamic programming. But I doubt whether this can be a > GSOC project because the work load will not be too much. 
If we use > different methods to predict sequence alignment and make a front-end which > allows the user to input the sequence or even a pdb file and method of > alignment and predict the alignment, the work can be substantial enough. > > Also, as suggested by Christopher, sequence alignment is pretty basic and > we can use C backend, which can significantly improve the runtime. So, we > can discuss it and I can start working on it. > > > On Fri, Feb 28, 2014 at 11:15 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> I'm wondering, with something that is as broadly applicable as pairwise >> alignment, would it be better to implement only in Python (or implement in >> Python wedded to a C backend)? Or maybe set up something in python that >> taps into an already well-defined C/C++ library that does this? >> >> The reason I mention this: with bioperl we went down this route with >> bioperl-ext a long time ago (these are generally C-based backend tools with >> a perl front-end), that bit-rotted simply b/c there were other more >> maintainable options. IIUC from this post, similar issues re: >> maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is >> entirely possible). However, tools like pysam and Bio::DB::Samtools (on >> the perl end) seem to have been maintained much more readily since they tap >> into a common library. >> >> For instance, my suggestion would be to implement a Biopython tool that >> does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a >> generic python front-end that allows users to pick the tool/method for the >> alignment, with maybe a library binding as an initial implementation. >> >> chris >> >> On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: >> >> > Hi Harsh Beria, >> > >> > One option is to work on pairwise sequence alignments. Currently there >> is some code for that in Biopython (in Bio/pairwise2.py), but it is not >> general and is not being maintained. 
This may need to be rebuilt from the >> ground up. >> > >> > Best, >> > -Michiel. >> > >> > -------------------------------------------- >> > On Wed, 2/26/14, Harsh Beria wrote: >> > >> > Subject: [Biopython-dev] Gsoc 2014 aspirant >> > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, >> gsoc at lists.open-bio.org >> > Date: Wednesday, February 26, 2014, 11:14 AM >> > >> > Hi, >> > >> > I am Harsh Beria, third year UG student at Indian >> > Institute of >> > Technology, Kharagpur. I have started working in >> > Computational Biophysics >> > recently, having written code for pdb to fasta parser, >> > sequence alignment >> > using Needleman Wunsch and Smith Waterman, Secondary >> > Structure prediction, >> > Henikoff's weight and am currently working on Monte Carlo >> > simulation. >> > Overall, I have started to like this field and want to carry >> > my interest >> > forward by pursuing a relevant project for GSOC 2014. I >> > mainly code in C >> > and python and would like to start contributing to the >> > Biopython library. I >> > started going through the official contribution wiki page ( >> > http://biopython.org/wiki/Contributing) >> > >> > I also went through the wiki page of Bio.SeqIO's. I >> > seriously want to >> > contribute to the Biopython library through GSOC. What do I >> > do next ? 
>> > >> > Thanks >> > -- >> > >> > Harsh Beria, >> > Indian Institute of Technology,Kharagpur >> > E-mail: harsh.beria93 at gmail.com >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > > > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > > Ph: +919332157616 > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From mjldehoon at yahoo.com Tue Mar 4 10:40:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 4 Mar 2014 02:40:52 -0800 (PST) Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> I would suggest to implement this in C, with a thin wrapper in Python. Using 3rd-party libraries would increase the compile-time dependencies of Biopython. Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. Best, -Michiel. -------------------------------------------- On Fri, 2/28/14, Fields, Christopher J wrote: Subject: Re: [Biopython-dev] Gsoc 2014 aspirant To: "Michiel de Hoon" Cc: "biopython-dev at lists.open-bio.org" , "Harsh Beria" Date: Friday, February 28, 2014, 12:45 PM I'm wondering, with something that is as broadly applicable as pairwise alignment, would it be better to implement only in Python (or implement in Python wedded to a C backend)? Or maybe set up something in python that taps into an already well-defined C/C++ library that does this? 
The reason I mention this: with bioperl we went down this route with bioperl-ext a long time ago (these are generally C-based backend tools with a perl front-end), that bit-rotted simply b/c there were other more maintainable options. IIUC from this post, similar issues re: maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is entirely possible). However, tools like pysam and Bio::DB::Samtools (on the perl end) seem to have been maintained much more readily since they tap into a common library. For instance, my suggestion would be to implement a Biopython tool that does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a generic python front-end that allows users to pick the tool/method for the alignment, with maybe a library binding as an initial implementation. chris On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: > Hi Harsh Beria, > > One option is to work on pairwise sequence alignments. Currently there is some code for that in Biopython (in Bio/pairwise2.py), but it is not general and is not being maintained. This may need to be rebuilt from the ground up. > > Best, > -Michiel. > > -------------------------------------------- > On Wed, 2/26/14, Harsh Beria wrote: > > Subject: [Biopython-dev] Gsoc 2014 aspirant > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, gsoc at lists.open-bio.org > Date: Wednesday, February 26, 2014, 11:14 AM > > Hi, > > I am Harsh Beria, third year UG student at Indian > Institute of > Technology, Kharagpur. I have started working in > Computational Biophysics > recently, having written code for pdb to fasta parser, > sequence alignment > using Needleman Wunsch and Smith Waterman, Secondary > Structure prediction, > Henikoff's weight and am currently working on Monte Carlo > simulation. > Overall, I have started to like this field and want to carry > my interest > forward by pursuing a relevant project for GSOC 2014. 
I > mainly code in C > and python and would like to start contributing to the > Biopython library. I > started going through the official contribution wiki page ( > http://biopython.org/wiki/Contributing) > > I also went through the wiki page of Bio.SeqIO's. I > seriously want to > contribute to the Biopython library through GSOC. What do I > do next ? > > Thanks > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 4 12:32:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 12:32:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: > I would suggest to implement this in C, with a thin wrapper in Python. > Using 3rd-party libraries would increase the compile-time dependencies of Biopython. > Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. > Best, > -Michiel. I would also consider a pure Python implementation on top, both for cross-testing, but also for use under Jython or PyPy where using the C code wouldn't be possible (or at least, becomes more complicated). (This is what the existing Bio.pairwise2 module does) Adding third party C libraries would also make life hard for cross platform testing (Linux, Mac, Windows). 
Peter From cjfields at illinois.edu Tue Mar 4 20:25:47 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 4 Mar 2014 20:25:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mar 4, 2014, at 6:32 AM, Peter Cock wrote: > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: >> I would suggest to implement this in C, with a thin wrapper in Python. >> Using 3rd-party libraries would increase the compile-time dependencies of Biopython. >> Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. >> Best, >> -Michiel. > > I would also consider a pure Python implementation on top, > both for cross-testing, but also for use under Jython or PyPy > where using the C code wouldn't be possible (or at least, > becomes more complicated). > > (This is what the existing Bio.pairwise2 module does) Ah, so it's pure python. Makes sense to have it for that purpose. You could simply repurpose the existing code. > Adding third party C libraries would also make life hard for > cross platform testing (Linux, Mac, Windows). > > Peter This is a problem with bioinformatics tools in general; they simply aren't Windows-friendly. However, one can write code with portability in mind (even C/C++). chris From p.j.a.cock at googlemail.com Tue Mar 4 21:45:09 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 21:45:09 +0000 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tuesday, March 4, 2014, Fields, Christopher J wrote: > On Mar 4, 2014, at 6:32 AM, Peter Cock > > wrote: > > > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon > > wrote: > >> I would suggest to implement this in C, with a thin wrapper in Python. 
> >> Using 3rd-party libraries would increase the compile-time dependencies > of Biopython. > >> Anyway I expect that the tricky part will be the design of the module, > rather than the algorithms themselves, so using 3rd-party libraries > wouldn't help us so much. > >> Best, > >> -Michiel. > > > > I would also consider a pure Python implementation on top, > > both for cross-testing, but also for use under Jython or PyPy > > where using the C code wouldn't be possible (or at least, > > becomes more complicated). > > > > (This is what the existing Bio.pairwise2 module does) > > Ah, so it's pure python. Makes sense to have it for that purpose. You > could simply repurpose the existing code. > > Apologies if unclear - Biopython has both a C and pure Python version of pairwise2 - although most of our bits of C code don't have a fallback and so break under Jython or PyPy etc. Personally I am optimistic about the potential of PyPy to speed up most Python code with its JIT so am a little wary of adding more C code (which may act as a barrier to entry for future maintainers) without a matching Python implementation - but appreciate that for typical C Python this is often the best way to attain high performance. But Michiel is absolutely right - the algorithm choice is even more important. > > Adding third party C libraries would also make life hard for > > cross platform testing (Linux, Mac, Windows). > > > > Peter > > This is a problem with bioinformatics tools in general; they simply aren't > Windows-friendly. However, one can write code with portability in mind > (even C/C++). > > chris Yes indeed - this is one reason why the buildbot for automated cross-platform testing is really helpful (since few if currently any of the Biopython developers use Windows as their primary system). 
Peter From nigel.delaney at outlook.com Tue Mar 4 22:39:04 2014 From: nigel.delaney at outlook.com (Nigel Delaney) Date: Tue, 4 Mar 2014 17:39:04 -0500 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. 
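[Editor's note: for the algorithmic side of this thread, the global-alignment dynamic programming recurrence (Needleman-Wunsch) is short in pure Python. A score-only sketch with illustrative scoring parameters - not Bio.pairwise2's actual API or defaults.]

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, O(len(b)) memory."""
    # prev[j] = best score aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # best of: substitution, gap in b (up), gap in a (left)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

A C rewrite speeds up exactly this double loop; the pure-Python version stays useful for cross-checking and for Jython/PyPy, as argued earlier in the thread.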
From mjldehoon at yahoo.com Thu Mar 6 01:49:49 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 5 Mar 2014 17:49:49 -0800 (PST) Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Hi Nigel, While compiling Biopython on Windows can be tricky, in my experience it has been easy to compile the C libraries in Biopython on other platforms (Unix/Linux/MacOSX). Have you run into specific problems compiling Biopython? I would think that wrapping 3rd-party libraries or executables is much more error-prone. Best, -Michiel. -------------------------------------------- On Tue, 3/4/14, Nigel Delaney wrote: Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant To: "'Peter Cock'" , "'Fields, Christopher J'" Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" Date: Tuesday, March 4, 2014, 5:39 PM As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) 
are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From harsh.beria93 at gmail.com Fri Mar 7 23:41:53 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Sat, 8 Mar 2014 05:11:53 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, Regarding the algorithm part of Pairwise Sequence Alignment, I can use Dynamic Programming (Smith Waterman for local and Needleman Wunsch for Global Alignment). Please suggest if I should go for dynamic programming. Also, the above discussion points out that the implementation should be purely python based for cross-platform compatibility. On Thu, Mar 6, 2014 at 7:19 AM, Michiel de Hoon wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. 
> > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. 
> _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From tra at popgen.net Mon Mar 10 17:02:17 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:02:17 +0000 Subject: [Biopython-dev] Installing all biopython dependencies Message-ID: <20140310170217.048f00a1@lnx> Hi, I am trying to create a easy-to-install, easy-to-replicate Virtual Machine(*) with all the requirements for Biopython. The idea is mainly to make it easy to have reliable testing, but it can also be used as a very fast installation of Biopython. The VM is currently based on Ubuntu saucy, and I am trying to make sure all the dependencies are met. I would like some advice on the following please: EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency problems, so I guess this requires a manual download/install? reportlab - What is the best way to get the fonts? XXmotif - What is this??? PAML - There seemed to be a ubuntu package, but no more? The following packages require manual installation (no ubuntu package), please correct me if I am wrong (makes my life easier)... 
DSSP Dialign msaprobs NACCESS Prank Probcons TCoffee (*) Actually I am building a docker container, but for ease of explanation it is similar to the more familiar Virtual Machine concept Thanks, Tiago From p.j.a.cock at googlemail.com Mon Mar 10 17:20:43 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 10 Mar 2014 17:20:43 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: On Mon, Mar 10, 2014 at 5:02 PM, Tiago Antao wrote: > Hi, > > I am trying to create a easy-to-install, easy-to-replicate Virtual > Machine(*) with all the requirements for Biopython. The idea is mainly > to make it easy to have reliable testing, but it can also be used as a > very fast installation of Biopython. Sounds good :) > The VM is currently based on Ubuntu saucy, and I am trying to > make sure all the dependencies are met. Some of this would apply to the TravisCI VM, which is also Debian/Ubuntu based. There we have to balance total run time (install everything & run tests) against full coverage. https://travis-ci.org/biopython/biopython/builds It would be neat to have an instance of your docker based VM running as a buildslave too... http://testing.open-bio.org/biopython/tgrid > I would like some advice on the following please: > > EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency > problems, so I guess this requires a manual download/install? > reportlab - What is the best way to get the fonts? > XXmotif - What is this??? > PAML - There seemed to be a ubuntu package, but no more? > > > The following packages require manual installation (no ubuntu > package), please correct me if I am wrong (makes my life easier)... > > DSSP > Dialign > msaprobs > NACCESS > Prank > Probcons > TCoffee For TravisCI we install a Debian/Ubuntu package for t-coffee, so at least that ought to be easy. e.g.
https://packages.debian.org/sid/t-coffee Others (where the licence permits) we can request DebianMed/ BioLinux look at for packaging... Peter From anaryin at gmail.com Mon Mar 10 17:20:37 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 10 Mar 2014 18:20:37 +0100 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: Hi Tiago, For DSSP and NACCESS, you need a manual installation. DSSP is publicly available (binaries): ftp://ftp.cmbi.ru.nl/pub/software/dssp/ NACCESS is more complicated.. you need a license to get it and g77 installed to compile. You might have to contact the authors to allow such a broad distribution.. ? Cheers, João From tra at popgen.net Mon Mar 10 17:28:54 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:28:54 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310172854.07b0c1df@lnx> On Mon, 10 Mar 2014 18:20:37 +0100 João Rodrigues wrote: > NACCESS is more complicated.. you need a license to get it and g77 > installed to compile. You might have to contact the authors to allow > such a broad distribution.. Thanks, I might skip NACCESS at this stage. From tra at popgen.net Mon Mar 10 17:33:20 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:33:20 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310173320.228a7866@lnx> On Mon, 10 Mar 2014 17:20:43 +0000 Peter Cock wrote: > It would be neat to have an instance of your docker based > VM running as a buildslave too... > http://testing.open-bio.org/biopython/tgrid That was my original objective, which I have split into two: 1. A biopython docker container 2.
A buildbot docker container for biopython (a different kind of beast) And then research how this might integrate with CloudBioLinux. As an aside, I have to say that using docker is progressing quite well and it seems a very interesting platform for deployment and testing. > Others (where the licence permits) we can request DebianMed/ > BioLinux look at for packaging... From the problematic list, I will gather a list of software whose license permits packaging and report back on this. Tiago From tra at popgen.net Tue Mar 11 12:04:13 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 12:04:13 +0000 Subject: [Biopython-dev] test_Fasttree_tool Message-ID: <20140311120413.73bbac2e@lnx> Hi, When I run test_Fasttree_tool standalone, all goes well. But if I run it through run_tests.py I get this: ====================================================================== FAIL: runTest (__main__.ComparisonTestCase) test_Fasttree_tool ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 302, in runTest self.fail("Warning: Can't open %s for test %s" % (outputfile, self.name)) AssertionError: Warning: Can't open ./output/test_Fasttree_tool for test test_Fasttree_tool ---------------------------------------------------------------------- Any ideas?
Thanks, T From harijay at gmail.com Tue Mar 11 13:37:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 09:37:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Hi all, I just pull-ed from the git repository just now and after installing the newest numpy and scipy ( also from their respective git repos)..when I try to install biopython I get the same error complaining that I need to define : #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] I tried adding to file "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" the following line #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION But it still fails to install with an error as indicated below. I am sorry I dont know how to work around this. Thanks for your help Hari ################# error message ################# In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] #warning "Using deprecated NumPy API, disable it by #defining ... ^ /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:23:2: error: Should never include npy_deprecated_api directly. #error Should never include npy_deprecated_api directly. 
^ In file included from Bio/Cluster/clustermodule.c:3: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:126: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h:6:2: error: The header "old_defines.h" is deprecated as of NumPy 1.7. #error The header "old_defines.h" is deprecated as of NumPy 1.7. ^ 1 warning and 2 errors generated. error: command 'cc' failed with exit status 1 On Thu, Dec 26, 2013 at 5:28 AM, Michiel de Hoon wrote: > Fixed; please let us know if you encounter any problems. > > -Michiel. > > > > -------------------------------------------- > On Mon, 9/23/13, Peter Cock wrote: > > Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings > To: "Biopython-Dev Mailing List" > Date: Monday, September 23, 2013, 4:58 PM > > Hi all, > > I'm seeing the following warning from NumPy 1.7 with Python > 3.3 on Mac > OS X, and on Linux too. 
I believe the NumPy version is the > critical > factor: > > building 'Bio.Cluster.cluster' extension > building 'Bio.KDTree._CKDTree' extension > building 'Bio.Motif._pwm' extension > building 'Bio.motifs._pwm' extension > > all give: > > > /Users/peterjc/lib/python3.3/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: > warning: "Using > deprecated NumPy API, disable it by > #defining > NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] > > According to this page, > http://docs.scipy.org/doc/numpy-dev/reference/c-api.deprecations.html > > If we add this line it should confirm our code is clean for > NumPy 1.7 > (and implies to side effects on older NumPy): > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > Unfortunately that seems all four modules have problems > doing > that, presumably planned NumPy C API changes we need to > handle via a version conditional #ifdef? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Tue Mar 11 13:42:55 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:42:55 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > Hi all, > I just pull-ed from the git repository just now and after installing the > newest numpy and scipy ( also from their respective git repos)..when I try > to install biopython I get the same error complaining that I need to define > : > > #defining NPY_NO_DEPRECATED_API > NPY_1_7_API_VERSION" [-W#warnings] > > I tried adding to file > 
"/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > the following line > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > But it still fails to install with an error as indicated below. > > I am sorry I dont know how to work around this. > Thanks for your help > > Hari I suspect based on this NumPy thread that it is a problem with your NumPy install, perhaps you have some old files from a previous NumPy installation which are confusing things? http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html Peter From p.j.a.cock at googlemail.com Tue Mar 11 13:52:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:52:47 +0000 Subject: [Biopython-dev] test_Fasttree_tool In-Reply-To: <20140311120413.73bbac2e@lnx> References: <20140311120413.73bbac2e@lnx> Message-ID: On Tue, Mar 11, 2014 at 12:04 PM, Tiago Antao wrote: > Hi, > > When I run test_Fasttree_tool standalone, all goes well. But if I run > it through run_test.py I get this: > ====================================================================== > FAIL: runTest (__main__.ComparisonTestCase) > test_Fasttree_tool > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "run_tests.py", line 302, in runTest > self.fail("Warning: Can't open %s for test %s" % (outputfile, > self.name)) AssertionError: Warning: Can't > open ./output/test_Fasttree_tool for test test_Fasttree_tool > > ---------------------------------------------------------------------- > > > Any ideas? 
> > Thanks, > T I'm surprised it ever works - the expected output file is not in git :( Try: $ run_tests.py -g test_Fasttree_tool $ more output/test_Fasttree_tool $ git add output/test_Fasttree_tool $ git commit -m "Checking in missing output file for test_Fasttree_tool.py" Peter From tra at popgen.net Tue Mar 11 14:38:53 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 14:38:53 +0000 Subject: [Biopython-dev] A docker container for Biopython Message-ID: <20140311143853.054f89fe@lnx> Hi, In a effort to have a complete, reliable and easy to replicate testing platform for Biopython I am in the process of creating a docker container (inspired by Brad's CloudBioLinux work) with everything needed for Biopython. I currently have a container that allows easy installation of Biopython. I have documented the process here: http://fe.popgen.net/2014/03/a-docker-container-for-biopython/ A few points: 1. A few applications still missing, not many 2. The fasttree test case is still failing 3. Database servers are included 4. This can be used to do a very fast deploy of Biopython (teaching, demo, etc...) 5. The container to test biopython (buildbot based) will be a different one (and probably only of interest to Peter and me ;) ) This is my first container, problems & suggestions most welcome! Tiago From harijay at gmail.com Wed Mar 12 00:17:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 20:17:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Thanks Peter .. That was indeed the case. I had a python in /usr/local/lib/python2.7/site-packages/numpy That was getting called rather than the one in my .virtualenv Once I removed that python . 
The install progressed very smoothly Thanks for your help Hari On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock wrote: > On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > > Hi all, > > I just pull-ed from the git repository just now and after installing the > > newest numpy and scipy ( also from their respective git repos)..when I > try > > to install biopython I get the same error complaining that I need to > define > > : > > > > #defining NPY_NO_DEPRECATED_API > > NPY_1_7_API_VERSION" [-W#warnings] > > > > I tried adding to file > > > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > > the following line > > > > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > > > > But it still fails to install with an error as indicated below. > > > > I am sorry I dont know how to work around this. > > Thanks for your help > > > > Hari > > I suspect based on this NumPy thread that it is a problem with > your NumPy install, perhaps you have some old files from a > previous NumPy installation which are confusing things? > > http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html > > Peter > From p.j.a.cock at googlemail.com Wed Mar 12 00:25:33 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 00:25:33 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Great - thanks for letting us know that solved the problem :) Peter On Wed, Mar 12, 2014 at 12:17 AM, hari jayaram wrote: > Thanks Peter .. > That was indeed the case. I had a python in > > /usr/local/lib/python2.7/site-packages/numpy > > That was getting called rather than the one in my .virtualenv > > Once I removed that python . 
The install progressed very smoothly > > Thanks for your help > > Hari > > > > On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock > wrote: >> >> On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: >> > Hi all, >> > I just pull-ed from the git repository just now and after installing >> > the >> > newest numpy and scipy ( also from their respective git repos)..when I >> > try >> > to install biopython I get the same error complaining that I need to >> > define >> > : >> > >> > #defining NPY_NO_DEPRECATED_API >> > NPY_1_7_API_VERSION" [-W#warnings] >> > >> > I tried adding to file >> > >> > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" >> > the following line >> > >> > >> > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION >> > >> > >> > But it still fails to install with an error as indicated below. >> > >> > I am sorry I dont know how to work around this. >> > Thanks for your help >> > >> > Hari >> >> I suspect based on this NumPy thread that it is a problem with >> your NumPy install, perhaps you have some old files from a >> previous NumPy installation which are confusing things? >> >> http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html >> >> Peter > > From p.j.a.cock at googlemail.com Wed Mar 12 09:48:44 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:48:44 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 Message-ID: Hi all, I installed the Xcode 5.1 update last night on my work Mac, and this seems to have broken the builds on Python 2.6 and 2.7 (run via builtbot). http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio ... 
running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.6 creating build/temp.macosx-10.9-intel-2.6/Bio cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio ... running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.7 creating build/temp.macosx-10.9-intel-2.7/Bio cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 This looks like a problem where distutils is using a gcc argument which cc (clang) used to ignore but now treats as an error.
There will probably be similar reports on other Python projects as well... Peter From p.j.a.cock at googlemail.com Wed Mar 12 09:59:31 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:59:31 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: > Hi all, > > I installed the Xcode 5.1 update last night on my work Mac, and > this seems to have broken the builds on Python 2.6 and 2.7 > (run via builtbot). > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > ... > running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.6 > creating build/temp.macosx-10.9-intel-2.6/Bio > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > ... 
> running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.7 > creating build/temp.macosx-10.9-intel-2.7/Bio > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > This looks like a problem where distutils is using a gcc argument > which cc (clang) used to ignore but not treats as an error. There > will probably be similar reports on other Python projects as well... > > Peter This looks relevant, especially this reply from Paul Kehrer which suggests this is entirely Apple's fault for shipping a Python and clang compiler which don't get along with the default settings: http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure The suggested workaround seems to do the trick, $ export CFLAGS=-Qunused-arguments $ export CPPFLAGS=-Qunused-arguments Perhaps we can add this hack to our setup.py on Mac OS X... it seems harmless under gcc (e.g. my locally compiled version of Python 3.3 used gcc rather than clang)? Or it could be done via the buildbot setup, or on this buildslave directly (e.g. the ~/.bash_profile). What are folks' thoughts on this? We want it to remain easy to install Biopython from source under Mac OS X. 
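One way the suggested workaround could be expressed in setup.py is sketched below. This is only an illustration of the idea, not actual Biopython code: the helper name is invented, and it simply applies the two exports above programmatically when building on Mac OS X (Darwin), where Apple's clang otherwise rejects gcc-only flags such as -mno-fused-madd:

```python
import os
import sys

def append_clang_workaround(environ, platform=sys.platform):
    """Hypothetical helper: on Mac OS X, append -Qunused-arguments to
    CFLAGS/CPPFLAGS so clang ignores unknown gcc flags instead of
    treating them as hard errors. Name and approach are a sketch only."""
    if platform != "darwin":
        return environ  # harmless no-op elsewhere
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            environ[var] = (flags + " -Qunused-arguments").strip()
    return environ

# In a real setup.py this would run against os.environ before build_ext:
# append_clang_workaround(os.environ)
```

Setting the variables in the buildslave's environment (e.g. via ~/.bash_profile) would have the same effect without touching setup.py.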
Peter From tra at popgen.net Wed Mar 12 13:09:20 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:09:20 +0000 Subject: [Biopython-dev] Logging in on the wiki Message-ID: <20140312130920.701a656e@lnx> Hi, Are people able to log in on the wiki? I am getting back a page with: " Google Error: invalid_request Error in parsing the OpenID auth request. Learn more" Maybe its a google thing, but it might be on our side? Tiago From w.arindrarto at gmail.com Wed Mar 12 13:15:45 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 12 Mar 2014 14:15:45 +0100 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: <20140312130920.701a656e@lnx> References: <20140312130920.701a656e@lnx> Message-ID: Hi Tiago, I can log in using my Google OpenID. Best, Bow On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: > Hi, > > Are people able to log in on the wiki? I am getting back a page with: > > " > Google > Error: invalid_request > Error in parsing the OpenID auth request. > Learn more" > > Maybe its a google thing, but it might be on our side? > > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 12 13:19:45 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 13:19:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: I too can log in to the wiki with my Google OpenID. (Probably unrelated, we had to restart MySQL on the server earlier this week) Peter On Wed, Mar 12, 2014 at 1:15 PM, Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > > Best, > Bow > > On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: >> Hi, >> >> Are people able to log in on the wiki? 
I am getting back a page with: >> >> " >> Google >> Error: invalid_request >> Error in parsing the OpenID auth request. >> Learn more" >> >> Maybe its a google thing, but it might be on our side? >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From tra at popgen.net Wed Mar 12 13:23:45 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:23:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: <20140312132345.3b577a47@lnx> On Wed, 12 Mar 2014 14:15:45 +0100 Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > Thanks. I also can login now. I suppose it was something temporary on the google side... Tiago From cjfields at illinois.edu Wed Mar 12 14:46:15 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Mar 2014 14:46:15 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Mar 12, 2014, at 4:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: >> Hi all, >> >> I installed the Xcode 5.1 update last night on my work Mac, and >> this seems to have broken the builds on Python 2.6 and 2.7 >> (run via builtbot). >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.6 >> creating build/temp.macosx-10.9-intel-2.6/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common >> -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX >> -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g >> -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 >> -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.7 >> creating build/temp.macosx-10.9-intel-2.7/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 >> -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd >> -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes >> -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes >> -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> This looks like a problem where distutils is using a gcc argument >> which cc (clang) used to ignore but not treats as an error. There >> will probably be similar reports on other Python projects as well... >> >> Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. 
> > Peter That's scary. I planned on updating to the latest Xcode myself today, nice to be forewarned. I've been seeing clang complaints with various tools already, so I wouldn't be surprised if this problem is more wide-spread than python. chris From tra at popgen.net Wed Mar 12 15:10:48 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 15:10:48 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container Message-ID: <20140312151048.45066ade@lnx> Hi, I have a docker container ready (save for a few applications). Simple usage instructions: 1. Create a directory and download inside this file: https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test 2. Rename it Dockerfile (capital D) 3. Get a buildbot username and password (from Peter or me), edit the file and replace CHANGEUSER CHANGEPASS 4. do docker build -t biopython-buildbot . 5. do docker run biopython-buildbot Beta-version, comments appreciated ;) If people like this, I will amend the Continuous Integration page on the wiki accordingly Tiago From eparker at ucdavis.edu Thu Mar 13 00:06:51 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 12 Mar 2014 17:06:51 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here Message-ID: Hello, My name is Evan Parker, I am a third year graduate student studying analytical chemistry at UC Davis. Coding was my hobby in undergrad and has become a major component of my current graduate work in the context of mass-spectral interpretation software. I use Biopython for parsing Uniprot sequence data/annotations and I would be delighted to have the opportunity to give back, especially under the umbrella of the Google Summer of Code. The project on implementing an indexing & lazy-loading sequence parser looks interesting to me and, while difficult, it is something that I could wrap my mind around.
I apologize in advance for the wall of text but if you have the time I'd like to ask a couple of questions relating to implementation as I prepare my proposal. 1) Should the lazy loading be done primarily in the context of records returned from the SeqIO.index() dict-like object, or should the lazy loading be available to the generator made by SeqIO.parse()? The project idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems to me that the best implementation of lazy loading in these two SeqIO functions would be significantly different. My initial impression of the project would be for SeqIO.parse() to stage a file segment and selectively generate information when called while SeqIO.index() would use a more detailed map created at instantiation to pull information selectively. 2) Is slower instantiation an acceptable trade-off for memory efficiency? In the current implementation of SeqIO.index(), sequence files are read twice, once to annotate beginning points of entries and a second time to load the SeqRecord requested by __getitem__(). A lazy-loading parser could amplify this issue if it works by indexing locations other than the start of the record. The alternative approach of passing the complete textual sequence record and selectively parsing would be easier to implement (and would include dual compatibility with parse and index) but it seems that it would be slower when called and potentially less memory efficient. Any of your thoughts and comments are appreciated, - Evan From w.arindrarto at gmail.com Thu Mar 13 09:04:16 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 13 Mar 2014 10:04:16 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Thank you for your interest in the project :). It's good to know you're already quite familiar with SeqIO as well. My replies are below. 
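A toy sketch of the lazy-loading idea under discussion (all names here are hypothetical, not Biopython's actual API): index a FASTA file by recording only the byte offset of each record during a cheap first pass, and defer parsing until the sequence is first accessed.

```python
# Hypothetical sketch, not Biopython's actual API: a lazy FASTA record that
# stores only a file offset when indexed and parses on first .seq access.
from io import StringIO

class LazyFastaRecord:
    def __init__(self, handle, offset):
        self._handle = handle  # shared, seekable file-like object
        self._offset = offset  # offset of this record's '>' header
        self._seq = None       # filled in lazily, then cached
        self.id = None

    @property
    def seq(self):
        if self._seq is None:  # first access: seek back and parse
            self._handle.seek(self._offset)
            header = self._handle.readline()
            self.id = header[1:].split()[0]
            lines = []
            for line in self._handle:
                if line.startswith(">"):  # start of the next record
                    break
                lines.append(line.strip())
            self._seq = "".join(lines)
        return self._seq

def lazy_index(handle):
    """Cheap first pass: remember where each record starts, parse nothing."""
    index, offset = {}, 0
    for line in iter(handle.readline, ""):
        if line.startswith(">"):
            index[line[1:].split()[0]] = LazyFastaRecord(handle, offset)
        offset += len(line)
    return index

records = lazy_index(StringIO(">a some description\nACGT\nACGT\n>b\nTTTT\n"))
print(records["a"].seq)  # parsing happens here, not during indexing -> ACGTACGT
```

A SeqIO.parse()-style generator could yield such proxies one by one, while a SeqIO.index()-style mapping could keep a richer per-record map; the deferral mechanism is the same either way.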
> 1) Should the lazy loading be done primarily in the context of records > returned from the SeqIO.index() dict-like object, or should the lazy > loading be available to the generator made by SeqIO.parse()? The project > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems > to me that the best implementation of lazy loading in these two SeqIO > functions would be significantly different. My initial impression of the > project would be for SeqIO.parse() to stage a file segment and selectively > generate information when called while SeqIO.index() would use a more > detailed map created at instantiation to pull information selectively. We don't necessarily have to be restricted to SeqIO.index() objects here. You'll notice of course that SeqIO.index() indexes complete records, without finer granularity down to their subsequences. What we're looking for is compatibility with our existing SeqIO parsers. The lazy parser may well be a new object implemented alongside SeqIO, but the parsing logic itself (the one whose invocation is delayed by the lazy parser) should rely on existing parsers. > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > In the current implementation of SeqIO.index(), sequence files are read > twice, once to annotate beginning points of entries and a second time to > load the SeqRecord requested by __getitem__(). A lazy-loading parser could > amplify this issue if it works by indexing locations other than the start > of the record. The alternative approach of passing the complete textual > sequence record and selectively parsing would be easier to implement (and > would include dual compatibility with parse and index) but it seems that it > would be slower when called and potentially less memory efficient.
Coming up with this, we expect, is an important part of the project implementation. Doing a first pass for indexing is acceptable. Instantiation of the object using the index doesn't necessarily have to be slow. Retrieval of the actual (sub)sequence will be slower since we will touch the disk and do the actual parsing by then. But this can also be improved, perhaps by caching the result so subsequent retrieval is faster. One important point (and the use case that we envision for this project) is that subsequences in large sequence files (genome assemblies, for example) can be retrieved quite quickly. Take a look at some existing indexing implementations, such as faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] tool may also help. The faidx indexing, for example, relies on the FASTA file having the same line length, which means it can be used to retrieve subsequences given only the file offset of a FASTA record. Hope this gives you some useful hints. Good luck with your proposal :). Cheers, Bow [1] http://samtools.sourceforge.net/samtools.shtml [2] http://samtools.github.io/hts-specs/SAMv1.pdf [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 From eparker at ucdavis.edu Thu Mar 13 19:04:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Thu, 13 Mar 2014 12:04:34 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Thank you Bow, I'll need to digest this a bit, but you have given me a good direction. My inclination for the proposal is to focus on sequential file formats used to transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and others) and to mostly ignore formats used to convey alignment (ie. anything covered exclusively by parsers in AlignIO). If this is a poor direction please tell me so that I can add to my preparation. -Evan Evan Parker Ph.D. Candidate Dept. 
of Chemistry - Lebrilla Lab University of California, Davis On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto wrote: > Hi Evan, > > Thank you for your interest in the project :). It's good to know > you're already quite familiar with SeqIO as well. > > My replies are below. > > > 1) Should the lazy loading be done primarily in the context of records > > returned from the SeqIO.index() dict-like object, or should the lazy > > loading be available to the generator made by SeqIO.parse()? The project > > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it > seems > > to me that the best implementation of lazy loading in these two SeqIO > > functions would be significantly different. My initial impression of the > > project would be for SeqIO.parse() to stage a file segment and > selectively > > generate information when called while SeqIO.index() would use a more > > detailed map created at instantiation to pull information selectively. > > We don't necessarily have to be restricted to SeqIO.index() objects > here. You'll notice of course that SeqIO.index() indexes complete > records without granularity up to the possible subsequences. What > we're looking for is compatibility with our existing SeqIO parsers. > The lazy parser may well be a new object implemented alongside SeqIO, > but the parsing logic itself (the one whose invocation is delayed by > the lazy parser) should rely on existing parsers. > > > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > > In the current implementation of SeqIO.index(), sequence files are read > > twice, once to annotate beginning points of entries and a second time to > > load the SeqRecord requested by __getitem__(). A lazy-loading parser > could > > amplify this issue if it works by indexing locations other than the start > > of the record. 
The alternative approach of passing the complete textual > > sequence record and selectively parsing would be easier to implement (and > > would include dual compatibility with parse and index) but it seems that > it > > would be slower when called and potentially less memory efficient. > > I think this will depend on what you want to store in the indices and > how you store them, which will most likely differ per sequencing file > format. Coming up with this, we expect, is an important part of the > project implementation. Doing a first pass for indexing is acceptable. > Instantiation of the object using the index doesn't necessarily have > to be slow. Retrieval of the actual (sub)sequence will be slower since > we will touch the disk and do the actual parsing by then. But this can > also be improved, perhaps by caching the result so subsequent > retrieval is faster. One important point (and the use case that we > envision for this project) is that subsequences in large sequence > files (genome assemblies, for example) can be retrieved quite quickly. > > Take a look at some existing indexing implementations, such as > faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] > tool may also help. The faidx indexing, for example, relies on the > FASTA file having the same line length, which means it can be used to > retrieve subsequences given only the file offset of a FASTA record. > > Hope this gives you some useful hints. Good luck with your proposal :). > > Cheers, > Bow > > [1] http://samtools.sourceforge.net/samtools.shtml > [2] http://samtools.github.io/hts-specs/SAMv1.pdf > [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 > From w.arindrarto at gmail.com Fri Mar 14 05:30:13 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 14 Mar 2014 06:30:13 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Focusing on the SeqIO parsers is ok. 
That's where having lazy parsers would help most (and you've got a handful of formats there already). Remember that you'll also need to account for time to write tests, possibly benchmark or profile the code (lazy parsers should improve performance after all), and write documentation, outside of writing the code itself. You'll also want to be clear about this in your proposed timeline, since that will be your main guide during the coding period. Looking forward to reading your proposal :), Bow On Thu, Mar 13, 2014 at 8:04 PM, Evan Parker wrote: > Thank you Bow, > > I'll need to digest this a bit, but you have given me a good direction. My > inclination for the proposal is to focus on sequential file formats used to > transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and > others) and to mostly ignore formats used to convey alignment (ie. anything > covered exclusively by parsers in AlignIO). If this is a poor direction > please tell me so that I can add to my preparation. > > -Evan > > Evan Parker > Ph.D. Candidate > Dept. of Chemistry - Lebrilla Lab > University of California, Davis > > > On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto > wrote: >> ... > > From p.j.a.cock at googlemail.com Fri Mar 14 13:34:40 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 14 Mar 2014 13:34:40 +0000 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: On Fri, Mar 14, 2014 at 5:30 AM, Wibowo Arindrarto wrote: > ... > Looking forward to reading your proposal :), > Bow Yes, profiling will be important here - if your script accesses all the annotation/sequence/etc of a record, then the lazy parser will probably be slower (all the same work, plus an overhead).
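The faidx-style scheme Bow pointed to earlier in the thread is worth a concrete illustration, since fixed line lengths are what make cheap subsequence access possible. A hedged toy version (not the samtools implementation; the function and its parameters are invented here):

```python
# Illustrative faidx-style lookup, not the actual samtools code: with a fixed
# number of bases per line, the file offset of any base is pure arithmetic.
from io import StringIO

def subsequence(handle, seq_offset, line_bases, line_bytes, start, length):
    """Fetch `length` bases from 0-based `start`, skipping embedded newlines."""
    handle.seek(seq_offset + (start // line_bases) * line_bytes + start % line_bases)
    bases = []
    while len(bases) < length:
        ch = handle.read(1)
        if not ch:          # ran off the end of the file
            break
        if ch != "\n":      # newline characters are not bases
            bases.append(ch)
    return "".join(bases)

# Toy FASTA with 4 bases per line (5 bytes per line including the newline):
fasta = StringIO(">chr1\nACGT\nTTAA\nGGCC\n")
seq_offset = len(">chr1\n")  # sequence body starts right after the header
print(subsequence(fasta, seq_offset, line_bases=4, line_bytes=5, start=5, length=6))
# -> TAAGGC, without ever reading the whole record
```

A real .fai index stores exactly these per-record numbers (offset, bases per line, bytes per line), which is why retrieval needs no scan of the record.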
It should win when only a subset of the data is needed, both in terms of speed and memory usage. Peter From eric.talevich at gmail.com Sat Mar 15 05:29:21 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 14 Mar 2014 22:29:21 -0700 Subject: [Biopython-dev] Google Summer of Code 2014: Call for student applications Message-ID: Hi everyone, Google Summer of Code is an annual program that funds students all over the world to work with open-source software projects to develop new code. This summer, the Open Bioinformatics Foundation (OBF) is taking on students through the Google Summer of Code program to work with mentors on established bioinformatics software projects including BioPython. We invite students to submit applications by Friday, March 21. Full details are here: http://news.open-bio.org/news/2014/03/obf-gsoc-2014-call-for-student-applications/ All the best, Eric & Raoul OBF GSoC organization admins From arklenna at gmail.com Sun Mar 16 20:53:22 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Sun, 16 Mar 2014 16:53:22 -0400 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock > wrote: > > Hi all, > > > > I installed the Xcode 5.1 update last night on my work Mac, and > > this seems to have broken the builds on Python 2.6 and 2.7 > > (run via buildbot). > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > > ...
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.6 > > creating build/temp.macosx-10.9-intel-2.6/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > > -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > > ... 
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.7 > > creating build/temp.macosx-10.9-intel-2.7/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > This looks like a problem where distutils is using a gcc argument > > which cc (clang) used to ignore but now treats as an error. There > > will probably be similar reports on other Python projects as well... > > > > Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > I encountered the same problem (clean install of Mavericks, vanilla Python, latest Xcode from App Store). One answer [1] suggests this is not a guaranteed solution but offers a different flag (which I did not test). I chose to edit system python files [2] which is definitely not the best option for most users.
[1]: http://stackoverflow.com/a/22315129 [2]: http://stackoverflow.com/a/22322068 > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > Do you mean editing environment variables with `os.environ`? I don't know enough about the details of how packages are built to know what will work with both compiling from source, easy_install, pip, etc. > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > It's a dilemma, because asking users to edit their .bashrc or .bash_profile before installation is annoying and easy to overlook, but modifying them in setup.py feels hacky (i.e. how long will this solution work?). Crossing my fingers and hoping Apple fixes this in an update... > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Sun Mar 16 21:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 16 Mar 2014 21:15:06 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Sun, Mar 16, 2014 at 8:53 PM, Lenna Peterson wrote: > On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock > wrote: >> >> ... 
>> >> This looks relevant, especially this reply from Paul Kehrer which >> suggests this is entirely Apple's fault for shipping a Python and >> clang compiler which don't get along with the default settings: >> >> >> http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure >> > >> The suggested workaround seems to do the trick, >> >> $ export CFLAGS=-Qunused-arguments >> $ export CPPFLAGS=-Qunused-arguments >> > > I encountered the same problem (clean install of Mavericks, vanilla Python, > latest XCode from App Store). > > One answer [1] suggests this is not a guaranteed solution but offers a > different flag (which I did not test). > > I chose to edit system python files [2] which is definitely not the best > option for most users. > > [1]: http://stackoverflow.com/a/22315129 > [2]: http://stackoverflow.com/a/22322068 > >> Perhaps we can add this hack to our setup.py on Mac OS X... >> it seems harmless under gcc (e.g. my locally compiled version >> of Python 3.3 used gcc rather than clang)? > > Do you mean editing environment variables with `os.environ`? I don't know > enough about the details of how packages are built to know what will work > with both compiling from source, easy_install, pip, etc. Yes, I was thinking about editing the environment variables in setup.py via the os module. I agree there are potential risks with 3rd party installers, but adding -Qunused-arguments to any existing CFLAGS (within the scope of the Biopython install) is hopefully low risk... >> Or it could be done via the buildbot setup, or on this buildslave >> directly (e.g. the ~/.bash_profile). > > It's a dilemma, because asking users to edit their .bashrc or .bash_profile > before installation is annoying and easy to overlook, but modifying them in > setup.py feels hacky (i.e. how long will this solution work?). Crossing my > fingers and hoping Apple fixes this in an update... 
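For concreteness, the setup.py idea being discussed (patching CFLAGS/CPPFLAGS via os.environ) might look like the following hedged sketch; the function name and the injectable environ argument are illustrative, not Biopython's actual setup.py:

```python
# Hedged sketch of the setup.py idea, not Biopython's actual code: append
# -Qunused-arguments to CFLAGS/CPPFLAGS on Mac OS X only, keeping whatever
# flags the user has already exported.
import os
import sys

def patch_compiler_flags(environ=os.environ, platform=sys.platform):
    if not platform.startswith("darwin"):
        return  # only Apple's clang needs the workaround
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            environ[var] = (flags + " -Qunused-arguments").strip()

# Example with a plain dict standing in for os.environ:
env = {"CFLAGS": "-O2"}
patch_compiler_flags(environ=env, platform="darwin")
print(env["CFLAGS"])  # -> -O2 -Qunused-arguments
```

Appending rather than overwriting keeps any flags the user already set, which is what makes the hack comparatively low risk.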
> Fingers crossed Apple pushes another update in the next few weeks to resolve this... Peter From anaryin at gmail.com Mon Mar 17 16:05:04 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:05:04 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Dear all, I created a new 'empty' branch called bio.struct on my github account. The only change from the master branch is a new folder - Bio/Struct - that has an empty __init__.py in there. Please add issues with feature requests, and if you are willing to start coding, I'd say fork and go ahead! https://github.com/JoaoRodrigues/biopython/tree/bio.struct I also added a small wiki page with a description. 2014-02-20 0:05 GMT+01:00 Morten Kjeldgaard : > > On 19/02/2014, at 17:35, David Cain wrote: > > > I frequently make use of Bio.PDB, and agree wholeheartedly that certain > > aspects of it are very dated, or haphazardly organized. > > > > The module as a whole would benefit greatly from some extra attention. I'm > > happy to lend a hand in whatever revamp takes place. > > I second that. I am also willing to participate in this project! > > Cheers, > Morten > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Mon Mar 17 16:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:15:06 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:05 PM, João Rodrigues wrote: > Dear all, > > I created a new 'empty' branch called bio.struct on my github account. The > only change from the master branch is a new folder - Bio/Struct - that has > an empty __init__.py in there. Please add issues with feature requests, and > if you are willing to start coding, I'd say fork and go ahead!
> > https://github.com/JoaoRodrigues/biopython/tree/bio.struct > > I also added a small wiki page with a > description. Are we all generally in favour of lower case for new module names (as per PEP8)? i.e. Bio/struct not Bio/Struct ? Peter From anaryin at gmail.com Mon Mar 17 16:19:31 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:19:31 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Hello Peter, Sorry, typo actually, wrote with small case everywhere but the module name... thanks. Also something I have in mind. Should wrappers for NACCESS and DSSP be refactored to use Bio.Application? From p.j.a.cock at googlemail.com Mon Mar 17 16:32:30 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:32:30 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:19 PM, João Rodrigues wrote: > Hello Peter, > > Sorry, typo actually, wrote with small case everywhere but the module name... > thanks. > > Also something I have in mind. Should wrappers for NACCESS and > DSSP be refactored to use Bio.Application? If you think it would help, sure. From anaryin at gmail.com Mon Mar 17 16:33:55 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:33:55 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: To be honest, more of an issue of internal consistency with the rest of the code base. I'd have to look into it more carefully to see if it fits... 2014-03-17 17:32 GMT+01:00 Peter Cock : > On Mon, Mar 17, 2014 at 4:19 PM, João Rodrigues wrote: > > Hello Peter, > > > > Sorry, typo actually, wrote with small case everywhere but the module > name... > > thanks. > > > > Also something I have in mind. Should wrappers for NACCESS and > > DSSP be refactored to use Bio.Application? > > If you think it would help, sure.
> > Peter > From tra at popgen.net Mon Mar 17 16:53:52 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 17 Mar 2014 16:53:52 +0000 Subject: [Biopython-dev] Dialign2 testing... Message-ID: <20140317165352.36db07ee@lnx> Hi, Still on the quest for a test run that actually runs all the tests. Can someone suggest what would be a sensible value for DIALIGN2-DIR? It seems that setting up the test is not trivial: there seems to be a need for a BLOSUM file inside the dialign directory? Any clues would be appreciated... From p.j.a.cock at googlemail.com Mon Mar 17 18:35:25 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 18:35:25 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi all, Bow (regarding SearchIO) and others should probably read this... I've commented, see also: http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html Peter ---------- Forwarded message ---------- From: Maloney, Christopher (NIH/NLM/NCBI) [C] Date: Mon, Mar 17, 2014 at 5:17 PM Subject: [Open-bio-l] Proposed BLAST XML Changes To: "open-bio-l at lists.open-bio.org" We are not directly soliciting comments, but if anyone would like to make any technical or programmatic suggestions, there is a link from which anyone may comment in the document. ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf Thank you. P.S. Please re-post this to other lists that might have interested readers. Chris Maloney NIH/NLM/NCBI (Contractor) Building 45, 5AN.24D-22 301-594-2842 _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From kirkolem at gmail.com Mon Mar 17 20:07:26 2014 From: kirkolem at gmail.com (Dan K.) Date: Tue, 18 Mar 2014 00:07:26 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello!
I'm a third year student learning bioengineering and bioinformatics and I'm interested in participating in GSoC and contributing to the BioPython project. In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I find it convenient to implement the Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What should my next steps be? Thank you for your attention. From ben at benfulton.net Tue Mar 18 00:38:17 2014 From: ben at benfulton.net (Ben Fulton) Date: Mon, 17 Mar 2014 20:38:17 -0400 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: <20140317165352.36db07ee@lnx> References: <20140317165352.36db07ee@lnx> Message-ID: I looked at that last year. As far as I could tell the actual code didn't do anything useful with that value; I removed the precondition checks from the tests and it ran properly. On Mon, Mar 17, 2014 at 12:53 PM, Tiago Antao wrote: > Hi, > > Still on the quest for a test run that actually runs all the tests. > > Can someone suggest what would be a sensible value for DIALIGN2-DIR? > It seems that setting up the test is not trivial: there seems to be a > need for a BLOSUM file inside the dialign directory? > > Any clues would be appreciated... > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From harsh.beria93 at gmail.com Tue Mar 18 00:39:25 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 18 Mar 2014 06:09:25 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, I have started to write a proposal for a project on pairwise sequence alignment. Is there anyone interested in mentoring the project so that I can discuss some of the algorithmic problems in detail?
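As background to the Henikoff weighting Dan K. mentions above, the position-based scheme of Henikoff & Henikoff (1994) is short enough to sketch; this is an illustrative toy, not Biopython code:

```python
# Hedged sketch of Henikoff & Henikoff (1994) position-based sequence weights
# (an illustrative toy, not Biopython's implementation). In each column a
# sequence scores 1/(k*n): k = distinct residues in the column, n = number of
# sequences sharing this sequence's residue there.
from collections import Counter

def henikoff_weights(alignment):
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):        # walk the alignment column-wise
        counts = Counter(column)
        k = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]   # normalize so the weights sum to 1

print(henikoff_weights(["ACGT", "ACGA", "TCGA"]))
# the two more distinctive rows (first and third) get the larger weights
```

Rows carrying rarer residues in a column receive larger weights, which is the intended down-weighting of redundant sequences.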
Also, do I need to add the project to the ideas page as it is not there yet? Thanks On Mar 6, 2014 7:19 AM, "Michiel de Hoon" wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. > > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) 
are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From w.arindrarto at gmail.com Tue Mar 18 09:52:29 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 10:52:29 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi Peter, everyone, Thanks for the heads up. If implemented as it is, the updates will change our underlying SearchIO model (aside from the blast-xml parser itself), by allowing Hit retrieval using multiple different keys. I have a feeling it will be difficult to jam all the new changes into a backwards-compatible parser. One way to make it transparent to users is to use the underlying DTD to do validation before parsing (for the two BLAST DTDs, use the one which the file can be validated against). However, this comes at a price. Since the standard library-bundled elementtree doesn't seem to support validation, we have to use another library (lxml is my choice). This means adding a 3rd-party dependency which requires compiling (lxml is also partly written in C). The other option is to introduce a new format name (e.g. 'blast-xml2'), which makes the user responsible for knowing which BLAST XML he/she is parsing. It feels more explicit this way, so I am leaning towards this option, despite 'blast-xml2' not sounding very nice to me ;). Any other thoughts?
Best, Bow On Mon, Mar 17, 2014 at 7:35 PM, Peter Cock wrote: > Hi all, > > Bow (regarding SearchIO) and others should probably read this... > > I've commented, see also: > http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html > > Peter > > > ---------- Forwarded message ---------- > From: Maloney, Christopher (NIH/NLM/NCBI) [C] > Date: Mon, Mar 17, 2014 at 5:17 PM > Subject: [Open-bio-l] Proposed BLAST XML Changes > To: "open-bio-l at lists.open-bio.org" > > > We are not directly soliciting comments, but if anyone would like to > make any technical or programmatic suggestions, there is a link from > which anyone may comment in the document. > > ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf > > Thank you. > > > P.S. Please re-post this to other lists that might have interested readers. > > Chris Maloney > NIH/NLM/NCBI (Contractor) > Building 45, 5AN.24D-22 > 301-594-2842 > > > _______________________________________________ > Open-Bio-l mailing list > Open-Bio-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/open-bio-l > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 10:17:48 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:17:48 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > Thanks for the heads up. If implemented as it is, the updates will > change our underlying SearchIO model (aside from the blast-xml parser > itself), by allowing a Hit retrieval using multiple different keys. Could you clarify what you mean by multiple keys here? > I have a feeling it will be difficult to jam all the new changes into > a backwards-compatible parser.
One way to make it transparent to users > is to use the underlying DTD to do validation before parsing (for the > two BLAST DTDs, use the one which the file can be validated against). > However, this comes at a price. Since the standard library-bundled > elementtree doesn't seem to support validation, we have to use another > library (lxml is my choice). This means adding 3rd party dependency > which require compiling (lxml is also partly written in C). We can probably tell by sniffing the first few lines... but how to do that without using a handle seek to rewind may be tricky (desirable to support parsing streams, e.g. stdin). > The other option is to introduce a new format name (e.g. > 'blast-xml2'), which makes the user responsible for knowing which > BLAST XML he/she is parsing. It feels more explicit this way, so I am > leaning towards this option, despite 'blast-xml2' not sounding very > nice to me ;). > > Any other thoughts? > > Best, > Bow I agree for the SearchIO interface, two format names makes sense - unless there is a neat way to auto-detect this on input. Using "blast-xml2" would work, or maybe something like "blast-xml-2014" (too long?). We could even go for "blast-xml-old" and "blast-xml" perhaps? Peter From w.arindrarto at gmail.com Tue Mar 18 10:33:55 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 11:33:55 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> Thanks for the heads up. If implemented as it is, the updates will >> change our underlying SearchIO model (aside from the blast-xml parser >> itself), by allowing a Hit retrieval using multiple different keys. > > Could you clarify what you mean by multiple keys here? Currently, we can retrieve hits from a query using its ID, aside from its numeric index. 
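The current behaviour Bow refers to - hits retrievable by numeric index or by ID - amounts to keeping a list and a dictionary in sync. The class below is a hypothetical, much-simplified stand-in for Bio.SearchIO's QueryResult, just to make the lookup semantics concrete (it is not the real class):

```python
class MiniQueryResult:
    """Simplified stand-in for SearchIO-style hit lookup (hypothetical sketch).

    Hits are kept in insertion order and retrievable either by integer
    position or by their (single) ID string.
    """

    def __init__(self, hits):
        self._hits = list(hits)                       # ordered (id, hit) pairs
        self._index = {hid: hit for hid, hit in hits}  # ID -> hit lookup

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._hits[key][1]  # positional lookup
        return self._index[key]        # ID lookup

    def __contains__(self, hid):
        return hid in self._index


# Toy example with made-up IDs:
result = MiniQueryResult([("gi|1234", "hit-A"), ("gi|5678", "hit-B")])
```

Supporting the proposed multi-ID hits would mean mapping several keys in the dictionary to the same hit object, which is why membership checking is affected too.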
With their proposed changes to the Hit element here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, it means that a given Hit can now be annotated with more than one ID. Ideally, this should also be reflected in the QueryResult object: a hit item should be retrievable using any of the IDs it has. This will also affect membership checking on the QueryResult object. >> I have a feeling it will be difficult to jam all the new changes into >> a backwards-compatible parser. One way to make it transparent to users >> is to use the underlying DTD to do validation before parsing (for the >> two BLAST DTDs, use the one which the file can be validated against). >> However, this comes at a price. Since the standard library-bundled >> elementtree doesn't seem to support validation, we have to use another >> library (lxml is my choice). This means adding 3rd party dependency >> which require compiling (lxml is also partly written in C). > > We can probably tell by sniffing the first few lines... but how > to do that without using a handle seek to rewind may be > tricky (desirable to support parsing streams, e.g. stdin). Ah yes. We have a rewindable file seek object in Bio.File, don't we :)? I'll have to play around with some real datasets first, I think. The other thing we should take into account is the Xinclude tag. Would we want to make it possible to query *either* the single query XML results or the master Xinclude document (point 2 of the proposed change)? Or should we restrict our parser only to the single query files? >> The other option is to introduce a new format name (e.g. >> 'blast-xml2'), which makes the user responsible for knowing which >> BLAST XML he/she is parsing. It feels more explicit this way, so I am >> leaning towards this option, despite 'blast-xml2' not sounding very >> nice to me ;). >> >> Any other thoughts? 
>> >> Best, >> Bow I agree for the SearchIO interface, two format names makes sense - unless there is a neat way to auto-detect this on input. Using "blast-xml2" would work, or maybe something like "blast-xml-2014" (too long?). We could even go for "blast-xml-old" and "blast-xml" perhaps? Peter From w.arindrarto at gmail.com Tue Mar 18 10:33:55 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 11:33:55 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> Thanks for the heads up. If implemented as it is, the updates will >> change our underlying SearchIO model (aside from the blast-xml parser >> itself), by allowing a Hit retrieval using multiple different keys. > > Could you clarify what you mean by multiple keys here? Currently, we can retrieve hits from a query using its ID, aside from its numeric index.
>> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 10:58:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:58:06 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto wrote: > On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >> wrote: >>> Hi Peter, everyone, >>> >>> Thanks for the heads up. If implemented as it is, the updates will >>> change our underlying SearchIO model (aside from the blast-xml parser >>> itself), by allowing a Hit retrieval using multiple different keys. >> >> Could you clarify what you mean by multiple keys here? > > Currently, we can retrieve hits from a query using its ID, aside from > its numeric index. With their proposed changes to the Hit element > here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, > it means that a given Hit can now be annotated with more than one ID. But this happens already in the current output from merged entries in databases like NR - we effectively use the first alternative ID as the hit ID. See for example the nasty ">" separated entries in the legacy BLAST XML's <Hit_def> tag where only the first ID appears in the <Hit_id> tag: http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html See also the new optional fields in the tabular output which explicitly list all the aliases for the merge record (e.g. sallseqid).
> Ideally, this should also be reflected in the QueryResult object: a > hit item should be retrievable using any of the IDs it has. > > This will also affect membership checking on the QueryResult object. This looks like something we should review anyway, regardless of the new BLAST XML format. >>> I have a feeling it will be difficult to jam all the new changes into >>> a backwards-compatible parser. One way to make it transparent to users >>> is to use the underlying DTD to do validation before parsing (for the >>> two BLAST DTDs, use the one which the file can be validated against). >>> However, this comes at a price. Since the standard library-bundled >>> elementtree doesn't seem to support validation, we have to use another >>> library (lxml is my choice). This means adding 3rd party dependency >>> which require compiling (lxml is also partly written in C). >> >> We can probably tell by sniffing the first few lines... but how >> to do that without using a handle seek to rewind may be >> tricky (desirable to support parsing streams, e.g. stdin). > > Ah yes. We have a rewindable file seek object in Bio.File, don't we > :)? I'll have to play around with some real datasets first, I think. Yes, the UndoHandle in Bio.File might be the best solution here for auto-detection. But two explicit formats is probably better. >> The other thing we should take into account is the Xinclude tag. Would >> we want to make it possible to query *either* the single query XML >> results or the master Xinclude document (point 2 of the proposed >> change)? Or should we restrict our parser only to the single query >> files? I think single files is a reasonable restriction... assuming BLAST will still have the option of producing a big multi-query XML? Probably we should ask the NCBI about that... I would hope the Bio.SearchIO.index_db(...) approach could be used on a collection of little XML files, one for each query. >>> The other option is to introduce a new format name (e.g.
>>> 'blast-xml2'), which makes the user responsible for knowing which >>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>> leaning towards this option, despite 'blast-xml2' not sounding very >>> nice to me ;). >>> >>> Any other thoughts? >>> >>> Best, >>> Bow >> >> I agree for the SearchIO interface, two format names makes >> sense - unless there is a neat way to auto-detect this on input. >> >> Using "blast-xml2" would work, or maybe something like >> "blast-xml-2014" (too long?). >> >> We could even go for "blast-xml-old" and "blast-xml" perhaps? > > Hmm..'blast-xml-old', may make it difficult to adapt for future XML > schema changes. How about renaming the current parser to > 'blast-xml-legacy', and the new one to just 'blast-xml'? A possible downside of 'blast-xml-legacy' over 'blast-xml-old' is that this may be confused with the "legacy" BLAST in C to the current BLAST+ in C++ move (which happened well before this XML format change). Peter From tra at popgen.net Tue Mar 18 11:22:15 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 11:22:15 +0000 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: References: <20140317165352.36db07ee@lnx> Message-ID: <20140318112215.4207072e@lnx> On Tue, 18 Mar 2014 10:38:53 +0000 Peter Cock wrote: > From memory if the environment variable was not > set, the command line tool would fail in a strange > way - so I made the test conditional on having the > variable set. I noticed that and created an environment variable, then I got stuck on the BLOSUM issue. Per Ben's suggestion, should we remove the check? Or should I use a non-standard package?
Thanks, Tiago From w.arindrarto at gmail.com Tue Mar 18 11:48:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 12:48:56 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:58 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto > wrote: >> On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >>> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >>> wrote: >>>> Hi Peter, everyone, >>>> >>>> Thanks for the heads up. If implemented as it is, the updates will >>>> change our underlying SearchIO model (aside from the blast-xml parser >>>> itself), by allowing a Hit retrieval using multiple different keys. >>> >>> Could you clarify what you mean by multiple keys here? >> >> Currently, we can retrieve hits from a query using its ID, aside from >> its numeric index. With their proposed changes to the Hit element >> here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, >> it means that a given Hit can now be annotated with more than one ID. > > But this happens already in the current output from merged entries > in databases like NR - we effectively use the first alternative ID as > the hit ID. See for example the nasty ">" separated entries in > the legacy BLAST XML's <Hit_def> tag where only the first ID > appears in the <Hit_id> tag: > > http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html > > See also the new optional fields in the tabular output which > explicitly list all the aliases for the merge record (e.g. sallseqid). In the BLAST outputs, yes. However, there's no explicit support yet in SearchIO for this. Currently we only parse whatever is in <Hit_id> as the ID and <Hit_def> as the description. If the <Hit_id> tag is separated by semicolons / has more than one ID, the current parser does not try to split it into multiple IDs. Instead it takes the whole string as the ID.
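The missing splitting step could look something like this hypothetical helper (not part of Biopython); the ';' and ' >' delimiters are assumptions based on the merged-ID conventions discussed in this thread:

```python
def split_merged_ids(raw_id):
    """Split a merged BLAST ID string into its component IDs.

    Hypothetical helper: assumes components are separated either by
    semicolons or by the ' >' convention seen in legacy merged entries.
    """
    if ";" in raw_id:
        parts = raw_id.split(";")
    else:
        parts = raw_id.split(" >")
    # Drop surrounding whitespace and any empty fragments.
    return [p.strip() for p in parts if p.strip()]
```

A parser using such a helper could then register every returned ID as a lookup key for the same Hit object.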
Also, in the blast tabular format, even though sallseqid is parsed, it's merely stored as an attribute of the hit object, not something that can be used to retrieve Hits from the QueryResult object. >> Ideally, this should also be reflected in the QueryResult object: a >> hit item should be retrievable using any of the IDs it has. >> >> This will also affect membership checking on the QueryResult object. > > This looks like something we should review anyway, regardless > of the new BLAST XML format. Of course :). >>>> I have a feeling it will be difficult to jam all the new changes into >>>> a backwards-compatible parser. One way to make it transparent to users >>>> is to use the underlying DTD to do validation before parsing (for the >>>> two BLAST DTDs, use the one which the file can be validated against). >>>> However, this comes at a price. Since the standard library-bundled >>>> elementtree doesn't seem to support validation, we have to use another >>>> library (lxml is my choice). This means adding 3rd party dependency >>>> which require compiling (lxml is also partly written in C). >>> >>> We can probably tell by sniffing the first few lines... but how >>> to do that without using a handle seek to rewind may be >>> tricky (desirable to support parsing streams, e.g. stdin). >> >> Ah yes. We have a rewindable file seek object in Bio.File, don't we >> :)? I'll have to play around with some real datasets first, I think. > > Yes, the UndoHandle in Bio.File might be the best solution > here for auto-detection. But two explicit formats is probably better. > >> The other thing we should take into account is the Xinclude tag. Would >> we want to make it possible to query *either* the single query XML >> results or the master Xinclude document (point 2 of the proposed >> change)? Or should we restrict our parser only to the single query >> files? > > I think single files is a reasonable restriction... 
assuming BLAST > will still have the option of producing a big multi-query XML? > Probably we should ask the NCBI about that... In a way, the Xinclude file is the file containing multi-query XML. I have a feeling that if Xinclude is proposed, producing multi-output BLAST XML files will not be an option anymore (otherwise it seems redundant). But yes, NCBI should have more info about this. > I would hope the Bio.SearchIO.index_db(...) approach could > be used on a collection of little XML files, one for each query. > >>>> The other option is to introduce a new format name (e.g. >>>> 'blast-xml2'), which makes the user responsible for knowing which >>>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>>> leaning towards this option, despite 'blast-xml2' not sounding very >>>> nice to me ;). >>>> >>>> Any other thoughts? >>>> >>>> Best, >>>> Bow >>> >>> I agree for the SearchIO interface, two format names makes >>> sense - unless there is a neat way to auto-detect this on input. >>> >>> Using "blast-xml2" would work, or maybe something like >>> "blast-xml-2014" (too long?). >>> >>> We could even go for "blast-xml-old" and "blast-xml" perhaps? >> >> Hmm..'blast-xml-old', may make it difficult to adapt for future XML >> schema changes. How about renaming the current parser to >> 'blast-xml-legacy', and the new one to just 'blast-xml'? > > A possible downside of 'blast-xml-legacy' over 'blast-xml-old' > is this may be confused with the "legacy" BLAST in C to the > current BLAST+ in C++ move (which happened well before > this XML format change). Hmm. In this case then I am leaning to 'blast-xml2', I think. It's the shortest and most future-proof (subsequent changes to the XML format could be denoted as 'blast-xml3'). But it does make it slightly inconsistent with the names we have for HMMER (i.e. 'hmmer2-text' is for HMMER version 2 text output, 'hmmer3-text' is for HMMER version 3 text output).
Cheers, Bow From p.j.a.cock at googlemail.com Tue Mar 18 13:15:16 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 13:15:16 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140312151048.45066ade@lnx> References: <20140312151048.45066ade@lnx> Message-ID: On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: > Hi, > > I have a docker container ready (save for a few applications). Simple > usage instructions: > > 1. Create a directory and download this file inside it: > https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test Things moved, https://github.com/tiagoantao/my-containers/tree/master/biopython I guess you mean: https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test > 2. Rename it Dockerfile (capital D) > > 3. Get a buildbot username and password (from Peter or me), edit the > file and replace CHANGEUSER CHANGEPASS > > 4. do > docker build -t biopython-buildbot . > > 5. do > docker run biopython-buildbot > > Beta-version, comments appreciated ;) > > If people like this, I will amend the Continuous Integration page on > the wiki accordingly > > Tiago Is this a 32 or 64 bit VM, or either? I'm asking because we may want to source a replacement 32 bit Linux buildslave - the hard drive in the old machine we've been using is failing, and it is probably not worth replacing. Peter From mjldehoon at yahoo.com Tue Mar 18 14:21:48 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 18 Mar 2014 07:21:48 -0700 (PDT) Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: Message-ID: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> On Mon, 3/17/14, Peter Cock wrote: > Are we all generally in favour of lower case for new module > names (as per PEP8)? > i.e. Bio/struct not Bio/Struct ? You may want to consider Bio/structure instead of Bio/struct. To me "struct" sounds like the C programming term, rather than a protein structure.
Best, -Michiel From p.j.a.cock at googlemail.com Tue Mar 18 14:43:56 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 14:43:56 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon wrote: > On Mon, 3/17/14, Peter Cock wrote: >> Are we all generally in favour of lower case for new module >> names (as per PEP8)? >> i.e. Bio/struct not Bio/Struct ? > > You may want to consider Bio/structure instead of Bio/struct. > To me "struct" sounds like the C programming term, > rather than a protein structure. > > Best, > -Michiel I like Bio.structure too :) Peter From anaryin at gmail.com Tue Mar 18 14:46:34 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 18 Mar 2014 15:46:34 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Makes sense! If nobody complains I'll change it. From eric.talevich at gmail.com Tue Mar 18 15:23:29 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 18 Mar 2014 08:23:29 -0700 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mon, Mar 17, 2014 at 5:39 PM, Harsh Beria wrote: > Hi, > > I have started to write a proposal for a project on pair wise sequence > alignment. Is there anyone interested in mentoring the project so that I > can discuss some of the algorithmic problems in detail? Also, do I need to > add the project to the ideas page as it is not there yet? > It's not necessary to add the project to the public Ideas page if you've come up with it yourself. Just share your own proposal with us here and we'll discuss it with you. 
-Eric From tra at popgen.net Tue Mar 18 16:12:50 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 16:12:50 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: References: <20140312151048.45066ade@lnx> Message-ID: <20140318161250.77a269fe@lnx> Hi, On Tue, 18 Mar 2014 13:15:16 +0000 Peter Cock wrote: > > Things moved, > https://github.com/tiagoantao/my-containers/tree/master/biopython > > I guess you mean: > https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test Ah, sorry. Because this is a first version, it is still being heavily refactored. I plan to document the final version well. For now maybe it is better to go to the top level: https://github.com/tiagoantao/my-containers The example in the README documents the biopython containers as they stand. > Is this a 32 or 64 bit VM, or either? I am afraid it is 64-bit, and doing a 32-bit docker is possible but not trivial. Tiago From arklenna at gmail.com Tue Mar 18 16:48:51 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Tue, 18 Mar 2014 12:48:51 -0400 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 10:43 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon > wrote: > > On Mon, 3/17/14, Peter Cock wrote: > >> Are we all generally in favour of lower case for new module > >> names (as per PEP8)? > >> i.e. Bio/struct not Bio/Struct ? > > > > You may want to consider Bio/structure instead of Bio/struct. > > To me "struct" sounds like the C programming term, > > rather than a protein structure. > > > > Best, > > -Michiel > > I like Bio.structure too :) > Thirded! I'm in a particularly busy portion of my PhD right now but hopefully over the summer I'll have a little more spare time for open source work.
Cheers, Lenna From tra at popgen.net Tue Mar 18 17:13:34 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 17:13:34 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal Message-ID: <20140318171334.0edc2b45@lnx> Hi, Currently we have gone through the procedure of asking on the mailing lists about Simcoal deprecation (now that we have fastsimcoal). 3 proposals and a doubt: 1. Deprecate https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py 2. Delete the Simcoal tests 3. Amend the tutorial The doubt: I would like to deprecate a class inside https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py But not the whole Controller (the fastsimcoal code is there). Question: Is there a procedure for a partial deprecation? Thanks, T From p.j.a.cock at googlemail.com Tue Mar 18 17:15:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 17:15:41 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, We've previously put a deprecation warning inside the __init__ method so anyone actually using the class will be warned. Peter On Tue, Mar 18, 2014 at 5:13 PM, Tiago Antao wrote: > Hi, > > Currently we have gone through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3.
Amend the tutorial > > The doubt: I would like to deprecate a class inside > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From tra at popgen.net Tue Mar 18 18:26:10 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 18:26:10 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140318161250.77a269fe@lnx> References: <20140312151048.45066ade@lnx> <20140318161250.77a269fe@lnx> Message-ID: <20140318182610.35082103@lnx> An update on the status: 1. A couple of problems with fasttree and dialign2. These seem genuine problems with the test code/modules. 2. Prank will wait for ubuntu trusty (it will be a standard package). I will then include it. 3. I was just able to find part of the fonts for the graphics packages, so a couple of tests are being skipped. 4. naccess has a very restrictive activation system, impossible to add. 1 and 3 are solvable (2 will sort itself out with time). 1 is really a problem with the biopython code, I think. For 3, if someone could have a look at the existing fonts here: https://github.com/tiagoantao/my-containers/blob/master/biopython/Biopython-Basic And tell me which ones are missing, I would take care of adding them. Tiago PS - In the near future I will do a Python 3 container also. From kirkolem at gmail.com Tue Mar 18 21:31:45 2014 From: kirkolem at gmail.com (Dan K.) Date: Wed, 19 Mar 2014 01:31:45 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello! I'm a third year student learning bioengineering and bioinformatics and I'm interested in participating in GSoC and contributing to the BioPython project.
In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I would find it interesting to implement the Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What should my next steps be? Thank you for your attention. From p.j.a.cock at googlemail.com Wed Mar 19 17:00:37 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:00:37 +0000 Subject: [Biopython-dev] SQLite test failure on Windows, OperationalError: unable to open database file Message-ID: Hi all, About a week ago most of the Windows nightly tests broke - e.g. here on the same revision (!) 79f9054e5246ba30816ff93a775d594ae7da6fc6 https://github.com/biopython/biopython/commit/79f9054e5246ba30816ff93a775d594ae7da6fc6 Worked, Fri Mar 14 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1129 Failed, Sat Mar 15 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1130 ... test_BioSQL_sqlite3 ... FAIL ... ====================================================================== ERROR: Check list, keys, length etc ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 193, in setUp load_database(gb_handle) File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 166, in load_database create_database() File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 148, in create_database server.load_database_sql(SQL_FILE) File "c:\repositories\BuildBotBiopython\win26\build\build\lib.win32-2.6\BioSQL\BioSeqDatabase.py", line 281, in load_database_sql self.adaptor.cursor.execute(sql_line) OperationalError: unable to open database file (etc) Presumably something changed on the machine itself - perhaps a Windows security update?
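A minimal probe, independent of Biopython and BioSQL, can help establish whether sqlite3 itself is able to create a database file on the buildslave - "unable to open database file" usually points at a path or permission problem rather than at the SQL being executed. This is a hypothetical diagnostic sketch, not part of the test suite:

```python
import os
import sqlite3
import tempfile


def can_create_sqlite_db(directory):
    """Return True if sqlite3 can create and open a database file in directory."""
    path = os.path.join(directory, "probe.sqlite")  # hypothetical probe filename
    try:
        con = sqlite3.connect(path)
        try:
            con.execute("CREATE TABLE probe (id INTEGER)")
        finally:
            con.close()
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        # Clean up the probe file if it was created.
        if os.path.exists(path):
            os.remove(path)


tmp_ok = can_create_sqlite_db(tempfile.gettempdir())
print(tmp_ok)
```

Running this in the directory the failing test uses (rather than the system temp directory) would distinguish a machine-wide problem from one specific to the BioSQL test setup.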
Any guesses for what might be wrong and why it broke on Python 2.6, PyPy 1.9, 2.0, 2.1 - yet works fine on Python 2.7, Python 3.3, PyPy 2.2 and Jython 2.7? Logged into this machine, I can reproduce the error with: c:\python26\python test_BioSQL_sqlite3.py Thanks, Peter From eparker at ucdavis.edu Wed Mar 19 16:49:04 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 09:49:04 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers Message-ID: Hi all, I have a rough draft of my GSoC proposal and would appreciate comments from anybody who might be willing to eventually mentor this project, or anybody who has opinions on implementation. It's about 3 pages of text + several figures. I'll be submitting a final draft Friday on the GSoC website pending your comments. Thank you, -Evan -------------- next part -------------- A non-text attachment was scrubbed... Name: Evan-Parker-GSOC-2014-proposal.pdf Type: application/pdf Size: 68577 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Wed Mar 19 17:26:10 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:26:10 +0000 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: > Hi all, > > I have a rough draft of my GSoC proposal and would appreciate comments from > anybody who might be willing to eventually mentor this project, or anybody > who has opinions on implementation. It's about 3 pages of text + several > figures. > > I'll be submitting a final draft Friday on the GSoC website pending your > comments. > > Thank you, > -Evan Hi Evan, That's a nice job so far - although questions about your time availability will be raised (sadly the GSoC schedule isn't fair to students depending on regional University term schedules). However, you are a PhD student (which is normally full time). 
You will need to clear this with your PhD supervisors - since you would be spending a large chunk of time not working directly on your thesis project, and there can be strict deadlines for completion. Here's a selection of points in no particular order: Have you looked at Bio.SeqIO.index_db(...) which works like Bio.SeqIO.index(...) but stores the offsets etc in an SQLite database? When pondering how to design this kind of thing myself, I had suspected multiple SeqRecProxy classes might be needed (one per file format potentially), although run time selection of internal parsing methods might work too. I would also ask why not have the slicing of a SeqRecProxy return another SeqRecProxy? This means creating a new proxy object with different offset values - but would be fast. Only when the seq/annotation/etc is accessed would the proxy have to go to the disk drive. This becomes more interesting when accessing the features in the slice of interest (e.g. if the full record was for a whole chromosome and only region [1000:2000] was of interest). This idea about windows onto the data is key to how the SAM/BAM file format is used (coordinate sorting with an index). Are you familiar with that, or tabix? Another open question is what to do with file handles - specifically the question of when to close them? e.g. via garbage collection, context managers, etc. See for example this blog post - the lazy parsing approach may result in ResourceWarnings as a side effect: http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ I appreciate you are unlikely to have ready answers to all of that - I've probably given you a whole load more background reading. I hope some of the other Biopython developers (or GSoC mentors on other OBF projects - you could post this to the OBF GSoC mailing list too) will have further feedback. 
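As a rough illustration of the proxy-slicing idea, something like the following could work — a minimal sketch in plain Python, where the class name, attributes, and file layout are all hypothetical, not Biopython's actual API:

```python
# Sketch: slicing a lazy proxy returns another proxy with adjusted
# offsets; the file is only read when .seq is actually accessed.
# All names here are illustrative, not real Biopython classes.
import io


class LazySeqProxy:
    """Proxy for a sequence stored in a file; reads letters on demand."""

    def __init__(self, handle, start, end):
        self.handle = handle
        self.start = start  # file offset of first letter of interest
        self.end = end      # file offset just past the last letter

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, index):
        # Slicing only adjusts offsets -- no disk access happens here.
        if isinstance(index, slice):
            start, stop, stride = index.indices(len(self))
            if stride != 1:
                raise ValueError("stride not supported in this sketch")
            return LazySeqProxy(self.handle, self.start + start,
                                self.start + stop)
        raise TypeError("only slices supported in this sketch")

    @property
    def seq(self):
        # Only now do we touch the "disk drive".
        self.handle.seek(self.start)
        return self.handle.read(self.end - self.start)


# A file-like stand-in for a plain sequence file:
handle = io.StringIO("ACGTACGTACGTACGT")
proxy = LazySeqProxy(handle, 0, 16)
window = proxy[4:8]  # fast: just a new proxy, nothing read from disk
print(window.seq)    # slow path: seeks and reads "ACGT"
```

For a whole-chromosome record, `proxy[1000:2000].seq` would then read only the region of interest, which is the same windowing idea BAM/tabix indexes rely on.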
Regards, Peter From harsh.beria93 at gmail.com Wed Mar 19 17:44:47 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Wed, 19 Mar 2014 23:14:47 +0530 Subject: [Biopython-dev] GSOC Proposal (Pairwise Sequence Alignment in Biopython) Message-ID: Hi, Please take a look at my GSOC proposal on Pairwise Sequence Alignment and suggest improvements. https://gist.github.com/harshberia93/9647053 Thanks -- Harsh Beria, Indian Institute of Technology, Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From nejat.arinik at insa-lyon.fr Wed Mar 19 17:45:53 2014 From: nejat.arinik at insa-lyon.fr (Nejat Arinik) Date: Wed, 19 Mar 2014 18:45:53 +0100 (CET) Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> Message-ID: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Hi all, I would like to show you my detailed plan, month by month. https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit It's not very neat, I know, but it is just to get your ideas about the plan. I'll finish it tonight. Do you think I have understood the subject correctly? Could this plan be a solution? Thanks in advance. PS: My English is not very good, so it is a little bit difficult to write a detailed proposal plan, but I'm trying. I hope it's not a big problem :) Unfortunately, I'm more comfortable with French. 
Nejat From p.j.a.cock at googlemail.com Wed Mar 19 18:10:54 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 18:10:54 +0000 Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> References: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Message-ID: On Wed, Mar 19, 2014 at 5:45 PM, Nejat Arinik wrote: > > Hi all, > > I would show you my detailed plan per mounth. > https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit > > It's not neatly I know but it's just for learn yours ideas at about that > plan. I'll finish it this night. I understood correctly the subject, you think? > That plan can be solution? Thanks in advance. > > PS: My english level is not good so It is a little bit difficult to write a > proposal-plan detailed but I'm trying. I hope it's not a big problem :) > I'm more comfortable with the french language unfortunately. > > Nejat Hi Nejat, I can try to answer some of the questions at the start of the document: Q: Lazy-load ~= load partially (depends on demands) ? Yes. For example, only load the sequence if the user tries to access the sequence. For example, this should speed up tasks like counting the records, or building a list of all the record identifiers. Q: small to medium sized sequences/genomes is how much in general? It takes how many times? A: Bacterial genomes usually are small enough to load into memory without worrying about RAM. Eukaryote genomes (e.g. mouse, human, plants) are typically large enough that you may not want to load an entire annotated chromosome into memory. Q: python dictionary is used for SeqRecord object ? A: Yes, the SeqRecord object uses a Python dictionary for the annotations property, and a dictionary like object for the letter_annotations property. 
The SeqRecord object also uses Python lists, and the Biopython Seq object. Q: Putting some data in the file will be done? If yes, relation with Biosql? So any modification as an update will be considerable/ be paid attention. A: The SeqRecord-like objects from the lazy-parsers could be read only. However, if they act enough like the original SeqRecord, then they can be used with Bio.SeqIO.write(...) to save them to disk. It would be nice if (like the BioSQL SeqRecord-like objects) it was possible to modify the records in memory. Q: For very large indexing jobs, index on multiple machines running simultaneously, and then merge the indexes. A: This seems too complicated. If building the index is slow, I suggest saving the index on disk (e.g. as an SQLite database). For comparison, see the BAM and tabix index files, or Biopython's Bio.SeqIO.index_db(...) function. Regards, Peter From w.arindrarto at gmail.com Wed Mar 19 19:42:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 19 Mar 2014 20:42:50 +0100 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Hi Evan, Looks like this is shaping up in a good direction :). In addition to Peter's earlier comments, I also have some remarks: * How would the indices of the files be stored? Are they simply stored in-memory or as files? Is their creation invisible to the user (i.e. invoking the `lazy=True` argument is enough to create the index) or does the user need to create the index explicitly? For `SeqIO.index(lazy=True)` in particular, does this mean that we will have two indices then (one for the currently implemented SQLite database that stores offsets for record positions and the other to store the other information necessary for the lazy parser)? * It would be nice to also have some notes on the relation between SeqRecProxy and SeqRecord (is it a subclass perhaps, or are they different classes that both inherit from a common base class). 
As an alternative, it is also possible to have a regular SeqRecord object, but with lazy Seq objects and lazy annotation objects instead. * Have you thought about what to store in the indices of the different formats? It's a good idea to explain this further in your proposal (e.g. what to store when indexing GenBank files, UniprotXML files, etc.). It doesn't have to be concrete (it will be in the code anyway), but having an idea of the possible implementations you have in mind would be nice. * And finally, the schedule. It looks like the early weeks will be quite packed, considering your other obligations. I think it is expected that students spend close to 8 hours per day (or 40 hours per week) during the coding period. Of course this is much more sensible when the student does not have other pressing obligations. I do agree with Peter here that you have to at least discuss this with your PhD supervisor. I personally do not mind that for the week you have the conference the workload is reduced. But in the first four weeks, I would prefer that you have more time to spend on GSoC. Cheers & good luck, Bow On Wed, Mar 19, 2014 at 6:26 PM, Peter Cock wrote: > On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: >> Hi all, >> >> I have a rough draft of my GSoC proposal and would appreciate comments from >> anybody who might be willing to eventually mentor this project, or anybody >> who has opinions on implementation. It's about 3 pages of text + several >> figures. >> >> I'll be submitting a final draft Friday on the GSoC website pending your >> comments. >> >> Thank you, >> -Evan > > Hi Evan, > > That's a nice job so far - although questions about your time > availability will be raised (sadly the GSoC schedule isn't fair to > students depending on regional University term schedules). > However, you are a PhD student (which is normally full time). 
> You will need to clear this with your PhD supervisors - since > you would be spending a large chunk of time not working > directly on your thesis project, and there can be strict > deadlines for completion. > > Here's a selection of points in no particular order: > > Have you looked at Bio.SeqIO.index_db(...) which works > like Bio.SeqIO.index(...) but stores the offsets etc in an > SQLite database? > > When pondering how to design this kind of thing myself, > I had suspected multiple SeqRecProxy classes might be > needed (one per file format potentially), although run > time selection of internal parsing methods might work too. > > I would also ask why not have the slicing of a SeqRecProxy > return another SeqRecProxy? This means creating a new > proxy object with different offset values - but would be fast. > Only when the seq/annotation/etc is accessed would the > proxy have to go to the disk drive. This becomes more > interesting when accessing the features in the slice of > interest (e.g. if the full record was for a whole chromosome > and only region [1000:2000] was of interest). > > This idea about windows onto the data is key to how > the SAM/BAM file format is used (coordinate sorting > with an index). Are you familiar with that, or tabix? > > Another open question is what to do with file handles - > specifically the question of when to close them? e.g. > via garbage collection, context managers, etc. See > for example this blog post - the lazy parsing approach > may result in ResourceWarnings as a side effect: > http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ > > I appreciate you are unlikely to have ready answers to > all of that - I've probably given you a whole load more > background reading. I hope some of the other Biopython > developers (or GSoC mentors on other OBF projects - > you could post this to the OBF GSoC mailing list too) > will have further feedback. 
> > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From eparker at ucdavis.edu Thu Mar 20 00:34:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 17:34:34 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Thank you both for your fast and thorough evaluation of my proposal. *Regarding time requirements:* My adviser is aware of the possibility that I may participate in this program. During the summer I would file a "planned educational leave" instead of enrollment to accommodate my full-time participation in GSoC. As for the time requirements: I cannot avoid my obligations prior to ASMS, although I can promise to spend every extra minute I have to honor my obligations to Biopython. If my lack of full time availability prior to June precludes me from participation I will understand. *Regarding specific suggestions:* I will come up with a deeper description of the relationship between SeqRecProxy and SeqRecord before Friday. I like the idea of a SeqRecProxy returning a new proxy when sliced; I had not thought of it, but it would be an elegant solution to the problem of unparsed-vs-parsed annotations. This feature would also allow more transparent use of proxy objects and would pave the way for compatibility with SeqIO.write(). I considered using multiple proxy classes, but I prefer making a standardized binding for a lazy parsing function that can be accepted by a single SeqRecProxy at run-time. I'll make this more explicit in my proposal. There are many other questions and points of clarification that I still need to evaluate. I'll incorporate as much as I can in my proposal without overloading it and without making statements that I cannot back up with my own understanding. 
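The run-time binding idea could be sketched roughly like this — a toy example with hypothetical names and deliberately simplified "formats", not a committed design:

```python
# Sketch: one proxy class, with the format-specific parsing work
# supplied as a callable chosen at run time. Names are illustrative
# only; these are not real Biopython classes or parsers.

def parse_fasta_lazy(raw):
    """Toy 'parser': first line is the title, the rest is sequence."""
    title, _, seq = raw.partition("\n")
    return {"id": title.lstrip(">"), "seq": seq.replace("\n", "")}


def parse_tab_lazy(raw):
    """Toy parser for a two-column id<TAB>seq format."""
    rec_id, _, seq = raw.partition("\t")
    return {"id": rec_id, "seq": seq.strip()}


class SeqRecProxy:
    """A single proxy class; the format-specific work is a bound callable."""

    def __init__(self, raw, parser):
        self._raw = raw        # in a real parser: a handle plus offsets
        self._parser = parser  # selected at run time from the format name
        self._record = None    # parsed lazily, on first access

    @property
    def record(self):
        if self._record is None:  # parse only once, on demand
            self._record = self._parser(self._raw)
        return self._record


_PARSERS = {"fasta": parse_fasta_lazy, "tab": parse_tab_lazy}


def lazy_proxy(raw, fmt):
    """Bind the right parsing function to a proxy at run time."""
    return SeqRecProxy(raw, _PARSERS[fmt])


rec = lazy_proxy(">seq1\nACGT\nTTGA", "fasta")
print(rec.record["seq"])  # parsing happens only here
```

The design trade-off is between this (one class, many bound functions) and one proxy subclass per file format; either way, nothing is parsed until the record is actually touched.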
Thanks again, -Evan From p.j.a.cock at googlemail.com Thu Mar 20 11:19:27 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 11:19:27 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: FYI, in addition to the SciPy conference in Texas this summer, there is also EuroSciPy which will be in England this year - deadline for abstracts is 14 April (see below). Is anyone planning to attend? If not maybe I should...? Thanks, Peter P.S. Don't forget to consider submitting a talk/poster abstract to BOSC 2014 (which I am co-chairing this year), especially students who can get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ ---------- Forwarded message ---------- From: Ralf Gommers Date: Wed, Mar 5, 2014 at 7:37 PM Subject: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts To: Organisation of EuroScipy , conferences at python.org, numfocus at googlegroups.com, Discussion of Numerical Python , SciPy Users List Dear all, EuroSciPy 2014, the Seventh Annual Conference on Python in Science, takes place in Cambridge, UK on 27 - 30 August 2014. The conference features two days of tutorials followed by two days of scientific talks. The day after the main conference, developer sprints will be organized on projects of interest to attendees. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and the industry. The program includes keynotes, contributed talks and posters. Submissions for talks and posters are welcome on our website (http://www.euroscipy.org/2014/). In your abstract, please provide details on what Python tools are being employed, and how. The deadline for submission is 14 April 2014. 
Also until 14 April 2014, you can apply for a sprint session on 31 August 2014. See https://www.euroscipy.org/2014/calls/sprints/ for details. Important dates: April 14th: Presentation abstracts, poster, tutorial submission deadline. Application for sponsorship deadline. May 17th: Speakers selected May 22nd: Sponsorship acceptance deadline June 1st: Speaker schedule announced June 6th, or 150 registrants: Early-bird registration ends August 27-31st: 2 days of tutorials, 2 days of conference, 1 day of sprints We look forward to an exciting conference and hope to see you in Cambridge in August! The EuroSciPy 2014 Team http://www.euroscipy.org/2014/ Conference Chairs -------------------------- Mark Hayes, Cambridge University, UK Didrik Pinte, Enthought Europe, UK Tutorial Chair ------------------- David Cournapeau, Enthought Europe, UK Program Chair -------------------- Ralf Gommers, ASML, The Netherlands Program Committee ----------------------------- Tiziano Zito, Humboldt-Universität zu Berlin, Germany Pierre de Buyl, Université libre de Bruxelles, Belgium Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Konrad Hinsen, Centre National de la Recherche Scientifique (CNRS), France Raphael Ritz, Garching Computing Centre of the Max Planck Society, Germany Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France Pauli Virtanen, Aalto University, Finland Evgeni Burovski, Lancaster University, UK Robert Cimrman, New Technologies Research Centre, University of West Bohemia, Czech Republic Almar Klein, Cybermind, The Netherlands Organizing Committee ------------------------------ Simon Jagoe, Enthought Europe, UK Pierre de Buyl, Université 
libre de Bruxelles, Belgium _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From tra at popgen.net Thu Mar 20 11:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:48:15 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: <20140320114815.70210fd5@grandao> On Thu, 20 Mar 2014 11:19:27 +0000 Peter Cock wrote: > Is anyone planning to attend? If not maybe I should...? Wild thought here: Considering that Cambridge is a geographic focal point for some of us (I am looking at you Dutch-based Biopythoneers, for instance), I am wondering if we could use this for a "local" Biopython meetup... Does this make any sense? Would there be interest? As I said, wild thought (silly?)... Tiago From anaryin at gmail.com Thu Mar 20 11:54:05 2014 From: anaryin at gmail.com (João Rodrigues) Date: Thu, 20 Mar 2014 12:54:05 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> References: <20140320114815.70210fd5@grandao> Message-ID: Lovely geographical clustering :) I'd be in. 2014-03-20 12:48 GMT+01:00 Tiago Antao : > On Thu, 20 Mar 2014 11:19:27 +0000 > Peter Cock wrote: > > > Is anyone planning to attend? If not maybe I should...? > > > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... 
> > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 11:42:44 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:42:44 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials Message-ID: <20140320114244.4022cbc7@grandao> Hi all, Just to announce a potential project that I might embark on very soon and see the reaction of the community: Get all the tutorial materials that I can find and create an ipython notebook version of them. Does this sound like a good idea? Tiago (your ipython notebook fanatic) From w.arindrarto at gmail.com Thu Mar 20 12:10:15 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:10:15 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: <20140320114815.70210fd5@grandao> Message-ID: Sounds good :). However, my passport is still Indonesian and I'd have to apply for a visa first in Germany in order to come to the UK :|. So I'll pass this one I guess. On Thu, Mar 20, 2014 at 12:54 PM, João Rodrigues wrote: > Lovely geographical clustering :) > > I'd be in. > > > > 2014-03-20 12:48 GMT+01:00 Tiago Antao : > >> On Thu, 20 Mar 2014 11:19:27 +0000 >> Peter Cock wrote: >> >> > Is anyone planning to attend? If not maybe I should...? >> >> >> Wild thought here: Considering that Cambridge is a geographic focal >> point for some of us (I am looking at you Dutch-based Biopythoneers, >> for instance), I am wondering if we could use this for a "local" >> Biopython meetup... Does this make any sense? Would there be interest? >> >> As I said, wild thought (silly?)... 
>> >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From w.arindrarto at gmail.com Thu Mar 20 12:15:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:15:56 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320114244.4022cbc7@grandao> References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, Do you plan to put the .ipynb file in the repo or will this be separate? Either way, I like the idea of having an .ipynb version of the tutorials around :). (from another IPython user). On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > Hi all, > > Just to announce a potential project that I might embark on very soon > and see the reaction of the community: > > Get all the tutorial materials that I can find and create a ipython > notebook version of them. > > Does this sound like a good idea? > > Tiago > (your ipython notebook fanatic) > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mgymrek at mit.edu Thu Mar 20 13:50:39 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:50:39 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, I'm happy to update this section in the tutorial if you'd like help with that. 
Cheers, ~M On Tue, Mar 18, 2014 at 1:13 PM, Tiago Antao wrote: > Hi, > > Currently we have went through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3. Amend the tutorial > > The doubt: I would like to deprecate a class inside > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mgymrek at mit.edu Thu Mar 20 13:57:13 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:57:13 -0400 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, I also really like this idea. Seems like it would make sense to have them as part of the repository to make it easy for others to contribute. (yet another IPython notebook user :) ) ~M On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto wrote: > Hi Tiago, > > Do you plan to put the .ipynb file in the repo or will this be > separate? Either way, I like the idea of having an .ipynb version of > the tutorials around :). > > (from another IPython user). > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > > Hi all, > > > > Just to announce a potential project that I might embark on very soon > > and see the reaction of the community: > > > > Get all the tutorial materials that I can find and create a ipython > > notebook version of them. > > > > Does this sound like a good idea? 
> > > > Tiago > > (your ipython notebook fanatic) > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 14:11:16 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:11:16 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320141116.3f98384f@grandao> Hi Bow and Melissa, I was planning on doing this separately. But happy to do it otherwise. Or maybe we could start a git repo, do some examples and see where it goes? Considering that this would be starting from scratch I was planning on doing this on ipython 2.0 with python 3.4. You know, living on the edge ;) Tiago On Thu, 20 Mar 2014 09:57:13 -0400 Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have > them as part of the repository to make it easy for others to > contribute. (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > > > Hi Tiago, > > > > Do you plan to put the .ipynb file in the repo or will this be > > separate? Either way, I like the idea of having an .ipynb version of > > the tutorials around :). > > > > (from another IPython user). > > > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao > > wrote: > > > Hi all, > > > > > > Just to announce a potential project that I might embark on very > > > soon and see the reaction of the community: > > > > > > Get all the tutorial materials that I can find and create a > > > ipython notebook version of them. > > > > > > Does this sound like a good idea? 
> > > > > > Tiago > > > (your ipython notebook fanatic) > > > _______________________________________________ > > > Biopython-dev mailing list > > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From p.j.a.cock at googlemail.com Thu Mar 20 14:19:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:19:51 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: +1 for any *.ipynb files being under source code control. There are perhaps advantages to using a separate repository, but still under Biopython on GitHub? This might also help if we wanted to build on existing external tutorials which are under a CC licence etc... Peter On Thu, Mar 20, 2014 at 1:57 PM, Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have them > as part of the repository to make it easy for others to contribute. > (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > >> Hi Tiago, >> >> Do you plan to put the .ipynb file in the repo or will this be >> separate? Either way, I like the idea of having an .ipynb version of >> the tutorials around :). >> >> (from another IPython user). >> >> On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: >> > Hi all, >> > >> > Just to announce a potential project that I might embark on very soon >> > and see the reaction of the community: >> > >> > Get all the tutorial materials that I can find and create a ipython >> > notebook version of them. >> > >> > Does this sound like a good idea? 
>> > >> > Tiago >> > (your ipython notebook fanatic) >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From anaryin at gmail.com Thu Mar 20 14:21:12 2014 From: anaryin at gmail.com (João Rodrigues) Date: Thu, 20 Mar 2014 15:21:12 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320141116.3f98384f@grandao> References: <20140320114244.4022cbc7@grandao> <20140320141116.3f98384f@grandao> Message-ID: +1 too. Maybe adding some support for oldies (Python 2.x)? Or are there features in iPython 2.0 that cannot be used in these older versions? From tra at popgen.net Thu Mar 20 14:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:48:15 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320144815.30b7a138@grandao> On Thu, 20 Mar 2014 14:19:51 +0000 Peter Cock wrote: > +1 for any *.ipynb files being under source code control. > > There are perhaps advantages to using a separate repository, > but still under Biopython on GitHub? This might also help if we > wanted to build on existing external tutorials which are under > a CC licence etc... My original plan was to draw "heavy inspiration" (credited, of course) from the existing Tutorial and maybe your workshop work. This all started when I noticed the need to change the tutorial due to simcoal changes... As I had to re-visit this, the idea followed... 
If people are fine with something under the biopython organization, I am fine with that. I have two proposals, though: 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above all ipython 2.0) 2. Go on and "do stuff", see where it goes and then maybe re-organize in the future (as opposed to do lots of planning first). This is, in some sense, a new line of direction and I would suggest that being exploratory would be better than being cautious... Tiago From p.j.a.cock at googlemail.com Thu Mar 20 14:53:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:53:41 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320144815.30b7a138@grandao> References: <20140320114244.4022cbc7@grandao> <20140320144815.30b7a138@grandao> Message-ID: On Thu, Mar 20, 2014 at 2:48 PM, Tiago Antao wrote: > On Thu, 20 Mar 2014 14:19:51 +0000 > Peter Cock wrote: > >> +1 for any *.ipynb files being under source code control. >> >> There are perhaps advantages to using a separate repository, >> but still under Biopython on GitHub? This might also help if we >> wanted to build on existing external tutorials which are under >> a CC licence etc... > > > My original plan was to draw "heavy inspiration" (credited, of course) > from the existing Tutorial and maybe your workshop work. > > This all started when I noticed the need to change the tutorial due to > simcoal changes... As I had to re-visit this, the idea followed... > > If people are fine with something under the biopython organization, I > am fine with that. > > I have two proposals, though: > > 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above > all ipython 2.0) > > 2. Go on and "do stuff", see where it goes and then maybe re-organize > in the future (as opposed to do lots of planning first). This is, in > some sense, a new line of direction and I would suggest that being > exploratory would be better than being cautious... 
> > Tiago So make a new repository and explore away :) Regarding https://github.com/peterjc/biopython_workshop - my workshop stuff I did wonder at the time about using iPython notebook but it adds another step to the workshop setup - and another barrier for people to repeat what they did at home. I was/am hoping to improve the TravisCI coverage of that work to check all the examples work under Python 2.6, 2.7 3.3 etc. I wonder if iPython notebooks make automated testing any easier or not? Peter From tra at popgen.net Thu Mar 20 15:27:38 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 15:27:38 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> Message-ID: <20140320152738.3b3ab8ac@grandao> Hi Melissa, On Thu, 20 Mar 2014 09:50:39 -0400 Melissa Gymrek wrote: > I'm happy to update this section in the tutorial if you'd like help > with that. I just did all the changes (not much really). I was planning on committing the changes (Peter, can I?) and then some reviewing (or changing, if needed) would really be appreciated. Tiago From p.j.a.cock at googlemail.com Thu Mar 20 15:29:15 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 15:29:15 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140320152738.3b3ab8ac@grandao> References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > Hi Melissa, > > On Thu, 20 Mar 2014 09:50:39 -0400 > Melissa Gymrek wrote: > >> I'm happy to update this section in the tutorial if you'd like help >> with that. > > I just did all the changes (not much really). I was planning on > committing the changes (Peter, can I?) and then some reviewing (or > changing, if needed) would really be appreciated. 
> > Tiago Please do :) Peter From mgymrek at mit.edu Thu Mar 20 15:34:52 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 11:34:52 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: sounds good! happy to have a look ~M On Thu, Mar 20, 2014 at 11:29 AM, Peter Cock wrote: > On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > > Hi Melissa, > > > > On Thu, 20 Mar 2014 09:50:39 -0400 > > Melissa Gymrek wrote: > > > >> I'm happy to update this section in the tutorial if you'd like help > >> with that. > > > > I just did all the changes (not much really). I was planning on > > committing the changes (Peter, can I?) and then some reviewing (or > > changing, if needed) would really be appreciated. > > > > Tiago > > Please do :) > > Peter > From b.invergo at gmail.com Thu Mar 20 13:39:34 2014 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 20 Mar 2014 13:39:34 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> (Tiago Antao's message of "Thu, 20 Mar 2014 11:48:15 +0000") References: <20140320114815.70210fd5@grandao> Message-ID: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... Since I'm now based in Cambridge, it would be silly for me not to attend. I'm not all that active lately (biopython's doing what I want it to do) but it'd still be nice to meet up. Cheers, Brandon -- Brandon Invergo http://brandon.invergo.net
From w.arindrarto at gmail.com Fri Mar 21 14:59:40 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 21 Mar 2014 15:59:40 +0100 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Zheng, The nucleotide information is stored as the alignment annotation. You can access it using hsp.aln_annotation['query_annotation']. There, they are stored as triplets, representing the codons. This is indeed a tradeoff that I had to make because there is no proper model yet to represent alignment objects containing sequences with different lengths in our master branch. In this case, the length of the DNA is most of the time 3x the length of the protein. And yes, this is not ideal since the actual query is now stored as an annotation ~ trading places with the translated query. HSPs themselves are basically modelled based on our MultipleSeqAlignment objects (you can get such objects when accessing the `aln` attribute from an HSP object). I think in order to properly model these types of alignment, we need to have a proper model of three-letter protein Seq objects as well. Your CodonSeqAlignment object may help here :), but I have not looked into it that much to be honest. How does it work with Seq objects with ProteinAlphabet? Is it possible to align protein and codon sequences? I tried storing as much information as possible using the current approach (e.g. notice the start and end coordinates of each hit and query, they are parsed from the file and the difference is not the same as the value you get when doing a `len` on hsp.query and/or hsp.hit). Note also that when dealing with frameshifts, you may want to access the hsp.fragments attribute, since frameshifts mean that you can break further your HSP alignment into multiple subalignments (fragments as it is called in SearchIO). Hope this helps :), Bow P.S.
Also CC-ing the Development list ~ this looks like something interesting for dev in general. On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > Thanks Bow, > > That works for me. But it seems the parser doesn't take the nucleotide > information into the hsps. All I get is a pairwise alignment between two > proteins. Nucleotide information is useful because I want to know the codon > -- amino acid correspondence. In the case of frameshift the situation may > not be that straightforward. Maybe you have other concern of not doing this. > > Best, > Zheng > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto > wrote: >> >> Hi Zheng, >> >> Thank you for the files :). I found out what was causing the error and >> have pushed a patch along with some tests to our codebase >> >> (https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d). >> You should be able to parse your file using the latest `master` >> branch. >> >> Hope this helps, >> Bow >> >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan wrote: >> > Hi Bow, >> > >> > I'm happy to provide the example for testing. See attachment. >> > >> > The command to generate the output above. >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa >> > >> > I'll check the test suite to see if I can find why. >> > >> > Best, >> > Zheng >> > >> > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto >> > >> > wrote: >> >> >> >> > Looking at our test cases, this particular case may have slipped >> >> > testing. We do test for several cases of dna2protein (which could >> >> > explain why it works when the nucleotide sequence comes first), but >> >> > not protein2dna. Please let me know if I can also use your example as >> >> > a test in our test corpus :). >> >> >> >> Oops, I meant the reverse ~ we have several test cases for protein2dna >> >> which may explain why it works when the protein sequence comes first >> >> ;). 
>> > >> > > > From Tom.Brown at enmu.edu Fri Mar 21 16:30:06 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 16:30:06 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? Thanks Tom ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. If you are not the intended recipient, please contact the sender and destroy all copies of this message From p.j.a.cock at googlemail.com Fri Mar 21 16:35:00 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 21 Mar 2014 16:35:00 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). 
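[Editorial note: option 2 maps onto the `entrez_query` argument of Biopython's `Bio.Blast.NCBIWWW.qblast`, which is a genuine keyword of that function. In the sketch below the helper name and the query sequence are made up for illustration, and the import and network call are kept inside the helper so the example can be read without contacting NCBI.]

```python
# Option 2 as a script: remote blastp against NR, restricted to
# Nematoda with an Entrez filter (taxonomy ID 6231, per the thread).
entrez_filter = "txid6231[ORGN]"

def blast_nematoda(fasta_query):
    # Hypothetical helper; NCBIWWW.qblast and its entrez_query keyword
    # are real Biopython API, but calling it needs network access.
    from Bio.Blast import NCBIWWW
    return NCBIWWW.qblast("blastp", "nr", fasta_query,
                          entrez_query=entrez_filter)

# Placeholder query sequence, not a real protein of interest:
query = ">example\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(entrez_filter)  # → txid6231[ORGN]
```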
Peter From Tom.Brown at enmu.edu Fri Mar 21 19:23:14 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 19:23:14 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FEC5@ITSNV499.ad.enet.enmu.edu> Peter, Thanks. It is working. Tom -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: Friday, March 21, 2014 10:35 AM To: Brown, Tom Cc: Biopython-dev at lists.open-bio.org Subject: Re: [Biopython-dev] Blastp for Proteins from Nematoda On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). Peter ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. 
If you are not the intended recipient, please contact the sender and destroy all copies of this message From zruan1991 at gmail.com Fri Mar 21 19:32:33 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 15:32:33 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Bow, I have the same problem when trying to model codon alignment with frameshift being considered. Basically, I have a CodonSeq object to store a coding sequence. The only difference between CodonSeq and Seq object is that CodonSeq has an attribute -- `rf_table` (reading frame table). It's actually a list of positions each codon starts with, so that translate() method will go through the list to translate codon into amino acid. In this case, it is easy to store a coding sequence with frameshift events. And it's not necessary to split the protein to dna alignment into multiple part when frameshift occurs. However, the problem now becomes how to obtain such information (`rf_table`). I find exonerate is quite capable of handling this task, especially with introns in the dna. I do think an object to store protein to dna alignment is necessary in this scenario. Best, Zheng On Fri, Mar 21, 2014 at 10:59 AM, Wibowo Arindrarto wrote: > Hi Zheng, > > The nucleotide information is stored as the alignment annotation. You > can access it using hsp.aln_annotation['query_annotation']. There, > they are stored as triplets, reprensenting the codons. > > This is indeed a tradeoff that I had to make because there is no > proper model yet to represent alignment objects containing sequences > with different length in our master branch. In this case, the length > of the DNA is most of > the time 3x the length of the protein. And yes, this is not ideal > since the actual query are now stored as an annotation ~ trading > places with the translated query. 
HSPs themselves are basically > modelled based on our MultipleSeqAlignment objects (you can get such > objects when accessing the `aln` attribute from an HSP object). I > think in order to properly model these types of alignment, we need to > have a proper model of three-letter protein Seq objects as well. > > Your CodonSeqAlignment object may help here :), but I have not looked > into it that much to be honest. How does it work with Seq objects with > ProteinAlphabet? Is it possible to align protein and codon sequences? > > I tried storing as much information as possible using the current > approach (e.g. notice the start and end coordinates of each hit and > query, they are parsed from the file and the difference is not the > same as the value you get when doing a `len` on hsp.query and/or > hsp.hit). Note also that when dealing with frameshifts, you may want > to access the hsp.fragments attribute, since frameshifts mean that you > can break further your HSP alignment into multiple subalignments > (fragments as it is called in SearchIO). > > Hope this helps :), > Bow > > P.S. Also CC-ing the Development list ~ this looks like something > interesting for dev in general. > > On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > > Thanks Bow, > > > > That works for me. But it seems the parser doesn't take the nucleotide > > information into the hsps. All I get is a pairwise alignment between two > > proteins. Nucleotide information is useful because I want to know the > codon > > -- amino acid correspondence. In the case of frameshift the situation may > > not be that straightforward. Maybe you have other concern of not doing > this. > > > > Best, > > Zheng > > > > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto < > w.arindrarto at gmail.com> > > wrote: > >> > >> Hi Zheng, > >> > >> Thank you for the files :). 
I found out what was causing the error and > >> have pushed a patch along with some tests to our codebase > >> > >> ( > https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d > ). > >> You should be able to parse your file using the latest `master` > >> branch. > >> > >> Hope this helps, > >> Bow > >> > >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan > wrote: > >> > Hi Bow, > >> > > >> > I'm happy to provide the example for testing. See attachment. > >> > > >> > The command to generate the output above. > >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa > >> > > >> > I'll check the test suite to see if I can find why. > >> > > >> > Best, > >> > Zheng > >> > > >> > > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto > >> > > >> > wrote: > >> >> > >> >> > Looking at our test cases, this particular case may have slipped > >> >> > testing. We do test for several cases of dna2protein (which could > >> >> > explain why it works when the nucleotide sequence comes first), but > >> >> > not protein2dna. Please let me know if I can also use your example > as > >> >> > a test in our test corpus :). > >> >> > >> >> Oops, I meant the reverse ~ we have several test cases for > protein2dna > >> >> which may explain why it works when the protein sequence comes first > >> >> ;). > >> > > >> > > > > > > From arklenna at gmail.com Fri Mar 21 20:54:05 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Fri, 21 Mar 2014 16:54:05 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: > Hi Bow, > > I have the same problem when trying to model codon alignment with > frameshift being considered. Basically, I have a CodonSeq object to store a > coding sequence. The only difference between CodonSeq and Seq object is > that CodonSeq has an attribute -- `rf_table` (reading frame table). 
It's > actually a list of positions each codon starts with, so that translate() > method will go through the list to translate codon into amino acid. In this > case, it is easy to store a coding sequence with frameshift events. And > it's not necessary to split the protein to dna alignment into multiple part > when frameshift occurs. However, the problem now becomes how to obtain such > information (`rf_table`). I find exonerate is quite capable of handling > this task, especially with introns in the dna. I do think an object to > store protein to dna alignment is necessary in this scenario. > Is the (still unmerged) CoordinateMapper the solution to this? http://biopython.org/wiki/Coordinate_mapping If so, let me know and I'll rebase and refresh the pull request. If not, I misunderstood the problem. Cheers, Lenna From zruan1991 at gmail.com Fri Mar 21 21:53:13 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 17:53:13 -0400 Subject: [Biopython-dev] Fwd: [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Forget to cc'd to dev list. Hi Lenna, I'm not quite sure about CoordinateMapper, but it seems to deal with sequence files with rich annotation like genbank. However, In our case, we are typically not sure about the coordinate correspondence between dna and protein sequence. That's why exonerate can help. Thanks! On Fri, Mar 21, 2014 at 4:54 PM, Lenna Peterson wrote: > On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: > >> Hi Bow, >> >> I have the same problem when trying to model codon alignment with >> frameshift being considered. Basically, I have a CodonSeq object to store >> a >> coding sequence. The only difference between CodonSeq and Seq object is >> that CodonSeq has an attribute -- `rf_table` (reading frame table). It's >> actually a list of positions each codon starts with, so that translate() >> method will go through the list to translate codon into amino acid. 
In >> this >> case, it is easy to store a coding sequence with frameshift events. And >> it's not necessary to split the protein to dna alignment into multiple >> part >> when frameshift occurs. However, the problem now becomes how to obtain >> such >> information (`rf_table`). I find exonerate is quite capable of handling >> this task, especially with introns in the dna. I do think an object to >> store protein to dna alignment is necessary in this scenario. >> > > Is the (still unmerged) CoordinateMapper the solution to this? > http://biopython.org/wiki/Coordinate_mapping > If so, let me know and I'll rebase and refresh the pull request. > If not, I misunderstood the problem. > > Cheers, > > Lenna > From p.j.a.cock at googlemail.com Mon Mar 24 11:57:14 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 11:57:14 +0000 Subject: [Biopython-dev] Volunteer buildslave machines? e.g. Windows & 32 bit Linux Message-ID: Hello all, Tiago and I have been looking after a range of machines covering different operating systems and Python versions, running as volunteer buildslaves for Biopython using buildbot: http://testing.open-bio.org/biopython/tgrid Does anyone else have a lab/home server which could be setup to run nightly Biopython tests for us via buildbot? Ideally the machine needs to be online overnight (European time) when the server is currently setup to schedule tests: http://www.biopython.org/wiki/Continuous_integration Our elderly 32 bit Linux desktop which has been running as a Biopython buildslave for the last few years is finally failing (hard drive problem). I would particularly like to see new buildslaves for: * 32 bit Linux * 64 bit Windows * Windows 7 or 8 (we have a 32 bit XP machine) If you think you might be able to help, the first hurdle is verifying you can checkout Biopython from github, and then compile the source (this is non-trivial on Windows, especially for 64 bit Windows). 
Note that this is separate from the continuous integration testing done for use via TravisCI whenever the GitHub repository is updated - this is very useful but currently only covers Linux: https://travis-ci.org/biopython/biopython/builds The key benefit of the buildbot server is cross platform testing - but this requires a range of volunteer machines. Thanks, Peter RE: http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011158.html On Tue, Mar 18, 2014 at 1:15 PM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: >> Hi, >> >> I have a docker container ready (save for a few applications). Simple >> usage instructions: >> >> ... >> >> Tiago > > Is this a 32 or 64 bit VM, or either? > > I'm asking because we may want to source a replacement > 32 bit Linux buildslave - the hard drive in the old machine > we've been using is failing, and it is probably not worth > replacing. > > Peter From p.j.a.cock at googlemail.com Mon Mar 24 16:42:29 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 16:42:29 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: References: Message-ID: Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g. $ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. 
With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three though, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM, wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps. From marco.galardini at unifi.it Tue Mar 25 23:40:44 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 25 Mar 2014 23:40:44 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52CD2948.7050102@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> Message-ID: <533213FC.8010304@unifi.it> Hi all, following your suggestions (as well as the other modules implementations) I've just committed a couple of commits to my biopython fork, featuring the Bio.Phenomics module. The module capabilities are limited to reading/writing Phenotype Microarray files and basic operations on the PlateRecord/WellRecord objects.
The module requires numpy to interpolate the signal when the user requests a time point that wasn't in the input file (this way the WellRecord object can be queried with slices). I'm thinking about how to implement the parameters extraction from WellRecord objects without the use of scipy. Here's the link to my branch: https://github.com/mgalardini/biopython/tree/phenomics The module and functions have been documented taking inspiration from the other modules: hope they are clear enough for you to try it out. Some example files can be found in Tests/Phenomics. Marco On 08/01/2014 10:32, Marco Galardini wrote: > Hi, > > On 01/08/2014 06:53 AM, Michiel de Hoon wrote: >>> any specification on the style guide for the biopython parsers? >> There is no strict set of rules, but to get you started, many modules >> follow this format: >> - Assuming a PM data file contains only a single data set, the module >> should contain a function "read" that takes either a file name or a file >> handle as the argument. > Unfortunately, the situation is a bit mixed up: there are basically > three file formats for PM data: as csv files (which can contain one or > more data sets or 'plates') and as yaml/json, which can contain also > some metadata. I would therefore use a similar approach to the SeqIO > module, having a parse() and a read() method that raises an exception > if the file contains more than one record. > >> - The module should contain a class (typically called "Record") that >> can store the data in the data file. The "read" function returns an >> object of this class. >> - Try to avoid third-party dependencies if at all possible. > So far the dependencies would be pyYaml (for the yaml/json parsing, > but maybe I could use the stdlib json module) and numpy/scipy for the > extraction of curve parameters. Does this sound ok? >> >> Would it make sense to have a single Bio.Microarray module that can >> house the various microarray parsers (PM, Affy, others)?
> I don't know if that would be a good strategy: the Phenotype > Microarrays are very different from the other proper microarrays; how > about a "phenomics" module? > >> >> Best, >> -Michiel. > Kind regards, > Marco > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- From mjldehoon at yahoo.com Wed Mar 26 02:15:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 25 Mar 2014 19:15:52 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> We could consider to not include any DTDs with Biopython, and rely on downloading them automatically. This seems a better test case than what we currently have, because as NCBI updates their DTDs, Bio.Entrez depends on this automatic download capability. Best, -Michiel. -------------------------------------------- On Mon, 3/24/14, Peter Cock wrote: Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] To: "Biopython-Dev Mailing List" Date: Monday, March 24, 2014, 12:42 PM Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g. 
$ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three through, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM,? wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps. 
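[Editorial note: the release-process check suggested above — an `ls` of the cache directory — can be made cross-platform with a few lines of Python. The path is the one quoted in the thread; the `cached_dtds` helper name is illustrative, not Biopython API.]

```python
import os

# Directory where Bio.Entrez caches DTD files it had to download
# (path as given earlier in this thread).
DTD_DIR = os.path.expanduser("~/.config/biopython/Bio/Entrez/DTDs")

def cached_dtds(base=DTD_DIR):
    """List locally cached DTD file names, or [] if nothing is cached."""
    if not os.path.isdir(base):
        return []
    return sorted(name for name in os.listdir(base)
                  if name.endswith(".dtd"))

print(cached_dtds())
```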
_______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 26 09:18:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 09:18:21 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:15 AM, Michiel de Hoon wrote: > We could consider to not include any DTDs with Biopython, > and rely on downloading them automatically. > This seems a better test case than what we currently have, > because as NCBI updates their DTDs, Bio.Entrez depends > on this automatic download capability. > > Best, > -Michiel. Long term not bundling the DTD files seems a good idea. Being cautious we could bundle them for the next release, see how the download mechanism works in the wild, and drop the DTD files for the release after that? This would mean all the Entrez parser tests would require internet access (even if using an old XML file on disk), but given that most of Bio.Entrez requires a connection to the NCBI anyway this isn't such a problem. If we do go down this route, would the current once-a-week running of the online tests with buildbot be enough? Peter From p.j.a.cock at googlemail.com Wed Mar 26 10:14:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 10:14:53 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? 
In-Reply-To: <533213FC.8010304@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini wrote: > Hi all, > > following your suggestions (as well as the other modules implementations) > I've just committed a couple of commits to my biopython fork, featuring the > Bio.Phenomics module. > The module capabilities are limited to reading/writing Phenotype Microarray > files and basic operations on the PlateRecord/WellRecord objects. The module > requires numpy to interpolate the signal when the user requests a time point > that wasn't in the input file (this way the WellRecord object can be queried > with slices). > I'm thinking about how to implement the parameters extraction from WellRecord > objects without the use of scipy. > > Here's the link to my branch: > https://github.com/mgalardini/biopython/tree/phenomics > The module and functions have been documented taking inspiration from the > other modules: hope they are clear enough for you to try it out. > Some example files can be found in Tests/Phenomics. > > Marco Hi Marco, I've not worked with this kind of data so my comments are not on the application specifics. But I'm pleased to see unit tests :) One thought was while you define (Java like?) getRow and getColumn methods, your __getitem__ does not support (NumPy like) access, which is something we do for multiple sequence alignments. I guess while most plates are laid out in a grid, the row/column for each sample is not the most important thing - the sample identifier is? Thinking out loud, would properties `rows` and `columns` etc be nicer than `getRow` and `getColumn`, supporting iteration over the rows/columns/etc and indexing? Minor: Your longer function docstrings do not follow PEP257, specifically starting with a one line summary, then a blank line, then the details.
Also you are using triple single-quotes, rather than triple double-quotes (like the rest of Biopython). http://legacy.python.org/dev/peps/pep-0257/ Peter P.S. Also, I'm not very keen on the module name, phenomics - I wonder if it would earn Biopython a badomics award? ;) http://dx.doi.org/10.1186/2047-217X-1-6 From marco.galardini at unifi.it Wed Mar 26 13:26:42 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Wed, 26 Mar 2014 14:26:42 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Hi, many thanks for your comments, below some replies: ----- Message from p.j.a.cock at googlemail.com --------- Date: Wed, 26 Mar 2014 10:14:53 +0000 From: Peter Cock Reply-To: Peter Cock Subject: Re: [Biopython-dev] Interested in a Phenotype Microarray parser? To: Marco Galardini Cc: Biopython-Dev Mailing List > On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini > wrote: >> Hi all, >> >> following your suggestions (as well as the other modules' implementations) >> I've just committed a couple of commits to my biopython fork, featuring the >> Bio.Phenomics module. >> The module capabilities are limited to reading/writing Phenotype Microarray >> files and basic operations on the PlateRecord/WellRecord objects. The module >> requires numpy to interpolate the signal when the user requests a time point >> that wasn't in the input file (this way the WellRecord object can be queried >> with slices). >> I'm thinking about how to implement the parameters extraction from WellRecord >> objects without the use of scipy. >> >> Here's the link to my branch: >> https://github.com/mgalardini/biopython/tree/phenomics >> The module and functions have been documented taking inspiration from the >> other modules: hope they are clear enough for you to try it out. 
>> Some example files can be found in Tests/Phenomics. >> >> Marco > > Hi Marco, > > I've not worked with kind of data so my comments are not on > the application specifics. But I'm pleased to see unit tests :) > > One thought was while you define (Java like?) getRow and getColumn > methods, your __getitem__ does not support (NumPy like) access, > which is something we do for multiple sequence alignments. I guess > while most plates are laid out in a grid, the row/column for each > sample is not the most important thing - the sample identifier is? > > Thinking out loud, would properties `rows` and `columns` etc be > nicer than `getRow` and `getColumn`, supporting iteration over > the rows/columns/etc and indexing? Yeah, absolutely: I'll work on some changes to have a more straightforward way to select multiple WellRecords on row/column basis. > > Minor: Your longer function docstrings do not follow PEP257, > specifically starting with a one line summary, then a blank line, > then the details. Also you are using triple single-quotes, rather > than triple double-quotes (like the rest of Biopthon). > http://legacy.python.org/dev/peps/pep-0257/ Whoops, I'll change it, thanks > > Peter > > P.S. Also, I'm not very keen on the module name, phenomics - > I wonder if it would earn Biopython a badomics award? ;) > http://dx.doi.org/10.1186/2047-217X-1-6 That's meta-omics right? :p What about 'Phenotype' then? Maybe it's too general, but future extensions may include other phenotypic readouts. 
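The slice-querying behaviour described in this thread - interpolating a well's signal at a time point that was not in the input file - can be sketched with numpy.interp. The class below is a simplified stand-in for illustration only; its attribute and method names are assumptions, not the actual Bio.Phenomics API:

```python
import numpy as np

class WellRecord:
    """Minimal sketch of a well whose signal is measured over time.

    Illustrative only: the real Bio.Phenomics WellRecord has a richer
    interface; this just shows the interpolation idea.
    """

    def __init__(self, identifier, times, signals):
        self.id = identifier
        self._times = np.asarray(times, dtype=float)
        self._signals = np.asarray(signals, dtype=float)

    def __getitem__(self, time):
        # Linearly interpolate the signal at an arbitrary time point,
        # so queries need not match the measured time points exactly.
        return float(np.interp(time, self._times, self._signals))

well = WellRecord("A01", times=[0.0, 0.25, 0.5], signals=[10.0, 20.0, 30.0])
print(well[0.125])  # halfway between the first two measurements -> 15.0
```

Because numpy.interp only does piecewise-linear interpolation, no scipy dependency is needed for this part; parameter extraction (growth curve fitting) is the harder problem Marco mentions.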
Marco > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > ----- Fine del messaggio da p.j.a.cock at googlemail.com ----- Marco Galardini Postdoctoral Fellow EMBL-EBI - European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD, UK Phone: +44 (0)1223 49 2547 From mjldehoon at yahoo.com Wed Mar 26 14:55:46 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 26 Mar 2014 07:55:46 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Hi Peter, On Wed, 3/26/14, Peter Cock wrote: > Long term not bundling the DTD files seems a good idea. > Being cautious we could bundle them for the next release, > see how the download mechanism works in the wild, and > drop the DTD files for the release after that? I don't think we need to be so cautious. > This would mean all the Entrez parser tests would require > internet access (even if using an old XML file on disk), But only the first time. After a DTD is downloaded, it is stored locally, and internet access won't be needed the next time the XML (or other XML files relying on the same DTD) is parsed. In my experience, using local DTDs is much much faster than accessing them through the internet for each XML file, so I would not advocate an internet-only solution. As an alternative to local storage, we could consider downloading all DTDs for each Biopython session, but keeping the results of parsing the DTD in memory (so we won't have to download each DTD over and over again if we're parsing many XML files). This can be almost as fast as using local storage, but will require internet access, and also Bio.Entrez would have to be changed. Best, -Michiel. 
From p.j.a.cock at googlemail.com Wed Mar 26 15:04:28 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 15:04:28 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> References: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:55 PM, Michiel de Hoon wrote: > Hi Peter, > > On Wed, 3/26/14, Peter Cock wrote: >> Long term not bundling the DTD files seems a good idea. >> Being cautious we could bundle them for the next release, >> see how the download mechanism works in the wild, and >> drop the DTD files for the release after that? > > I don't think we need to be so cautious. OK. We could then get rid of the DTDs folder under Bio/Entrez and tweak the Entrez XML parsing tests to ensure they are only run if the internet is available. >> This would mean all the Entrez parser tests would require >> internet access (even if using an old XML file on disk), > > But only the first time. After a DTD is downloaded, it is stored > locally, and internet access won't be needed the next time the XML > (or other XML files relying on the same DTD) is parsed. Yes, but for many test environments, it is always the first time ;) e.g. TravisCI uses a clean VM for each test run. > In my experience, using local DTDs is much much faster than > accessing them through the internet for each XML file, so I > would not advocate an internet-only solution. Yes (I didn't mean to imply that - sorry for any confusion). > As an alternative to local storage, we could consider downloading > all DTDs for each Biopython session, but keeping the results of > parsing the DTD in memory (so we won't have to download each > DTD over and over again if we're parsing many XML files). > This can be almost as fast as using local storage, but will require > internet access, and also Bio.Entrez would have to be changed. 
A local cache (as implemented) seems fine to me. Peter From p.j.a.cock at googlemail.com Thu Mar 27 11:40:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 27 Mar 2014 11:40:41 +0000 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 Message-ID: Hello all, As a co-chair of BOSC this year, I'd like to remind you all that the abstract deadline is about a week away now (April 4): http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ Also this year student presenters will get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ Would anyone like to volunteer to give this year's Biopython Update talk at BOSC 2014 in Boston? I would prefer one of the newer project members have a turn - but also I'll be busier than usual with BOSC organisation duties. Note that giving a talk often helps with getting travel funding to attend a meeting - and in addition to BOSC, you can combine the trip with the BOSC CodeFest beforehand and/or the ISMB meeting afterwards. Thanks, Peter From w.arindrarto at gmail.com Fri Mar 28 21:22:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 28 Mar 2014 22:22:56 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, If there are no objections from anyone, I would like to volunteer :). I am planning to come to ISMB anyway, though this isn't 100% confirmed as I am still applying for the visa. 
Cheers, Bowo On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: > Hello all, > > As a co-chair of BOSC this year, I'd like to remind you all that the > abstract deadline is about a week away now (April 4): > > http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ > > Also this year student presenters will get free BOSC registration: > > http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ > > Would anyone like to volunteer to give this year's Biopython Update > talk at BOSC 2014 in Boston? I would prefer one of the newer project > members have a turn - but also I'll be busier than usual with BOSC > organisation duties. > > Note that giving a talk often helps with getting travel funding to > attend a meeting - and in addition to BOSC, you can combine the > trip with the BOSC CodeFest beforehand and/or the ISMB meeting > afterwards. > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From ashok.bioinformatics at gmail.com Sun Mar 30 18:11:49 2014 From: ashok.bioinformatics at gmail.com (T. Ashok Kumar) Date: Sun, 30 Mar 2014 23:41:49 +0530 Subject: [Biopython-dev] Contributing code to Biopython Message-ID: Dear Sir/Madam, I wish to contribute code for predicting the *hydropathy plot of a protein sequence* using Biopython. Please help me regarding this issue. -- *T. 
Ashok Kumar* Head, Department of Bioinformatics Noorul Islam College of Arts and Science Kumaracoil, Thuckalay - 629 180 Kanyakumari District, INDIA Mobile:- 00 91 9655307178 *E-Mail:* *ashok.bioinformatics at gmail.com *, *ashok at biogem.org * *Website:* *www.biogem.org * From p.j.a.cock at googlemail.com Mon Mar 31 09:12:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 10:12:51 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Thanks for volunteering Bow :) I can send you the LaTeX files I used for the abstract in previous years (each talk gets one page in the BOSC abstract booklet). You should be able to find our past talks online, some as PDFs, some on SlideShare etc: http://biopython.org/wiki/Documentation#Presentations Peter On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > If there are no objections from anyone, I would like to volunteer :). > > I am planning to come to ISMB anyway, though this isn't 100% confirmed as I > am still applying for the visa. > > Cheers, > Bowo > > On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >> >> Hello all, >> >> As a co-chair of BOSC this year, I'd like to remind you all that the >> abstract deadline is about a week away now (April 4): >> >> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >> >> Also this year student presenters will get free BOSC registration: >> >> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >> >> Would anyone like to volunteer to give this year's Biopython Update >> talk at BOSC 2014 in Boston? I would prefer one of the newer project >> members have a turn - but also I'll be busier than usual with BOSC >> organisation duties. >> >> Note that giving a talk often helps with getting travel funding to >> attend a meeting - and in addition to BOSC, you can combine the >> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >> afterwards. 
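Regarding the hydropathy-plot contribution proposed above: the core of such a plot is a sliding-window average over a per-residue hydropathy scale, which can be sketched in plain Python (Biopython's Bio.SeqUtils.ProtParam module offers related functionality, so any contribution would likely build on that rather than start from scratch). The window size of 9 is just a common choice for the Kyte-Doolittle scale, not a requirement:

```python
# Kyte-Doolittle hydropathy values (J. Mol. Biol. 157:105-132, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle hydropathy over each sliding window.

    Returns one value per window position; positive values suggest
    hydrophobic stretches, strongly positive runs possible membrane spans.
    """
    sequence = sequence.upper()
    return [
        sum(KD[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]

profile = hydropathy_profile("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", window=9)
print(max(profile))  # the most hydrophobic window's mean score
```

Plotting the profile against window position (e.g. with matplotlib) then gives the familiar hydropathy plot.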
>> >> Thanks, >> >> Peter >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Mon Mar 31 16:30:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 17:30:21 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> References: <20140320114815.70210fd5@grandao> <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> Message-ID: I'm glad to see there are several people interested in EuroSciPy. One one of you like to submit a Biopython talk? The deadline is 14 April: https://www.euroscipy.org/2014/calls/abstracts/ Peter From w.arindrarto at gmail.com Mon Mar 31 21:08:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 31 Mar 2014 23:08:50 +0200 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, A LaTeX template would be great :). I'm still preparing the abstract, should be ready for everyone to check soon. Cheers, Bow On Mon, Mar 31, 2014 at 11:12 AM, Peter Cock wrote: > Thanks for volunteering Bow :) > > I can send you the LaTeX files I used for the abstract in > previous years (each talk gets one page in the BOSC > abstract booklet). You should be able to find our past > talks online, some as PDFs, some on SlideShare etc: > http://biopython.org/wiki/Documentation#Presentations > > Peter > > On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> If there are no objections from anyone, I would like to volunteer :). >> >> I am planning to come to ISMB anyway, though this isn't 100% confirmed as I >> am still applying for the visa. 
>> >> Cheers, >> Bowo >> >> On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >>> >>> Hello all, >>> >>> As a co-chair of BOSC this year, I'd like to remind you all that the >>> abstract deadline is about a week away now (April 4): >>> >>> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >>> >>> Also this year student presenters will get free BOSC registration: >>> >>> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >>> >>> Would anyone like to volunteer to give this year's Biopython Update >>> talk at BOSC 2014 in Boston? I would prefer one of the newer project >>> members have a turn - but also I'll be busier than usual with BOSC >>> organisation duties. >>> >>> Note that giving a talk often helps with getting travel funding to >>> attend a meeting - and in addition to BOSC, you can combine the >>> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >>> afterwards. >>> >>> Thanks, >>> >>> Peter >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev From marco.galardini at unifi.it Mon Mar 31 23:59:32 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 01 Apr 2014 00:59:32 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Message-ID: <533A0164.9010700@unifi.it> Hi, as suggested, I've made a few changes to the proposed Bio.Phenotype module (apart from the less-omics name). 
The PlateRecord object can now be indexed in a similar fashion to AlignIO multiple alignments: it is still possible to use the WellRecord identifier as an index, but when integers or slices are used, new sub-plates or single wells are returned. The system uses the well identifier as a means of dividing the plate into rows/columns. Thanks for pointing out the AlignIO system, it has been very useful. I've left the two getColumns and getRows functions, since for some people it may still be useful to use the well identifiers. If you feel they are too confusing I can remove them. The updated branch is here: https://github.com/mgalardini/biopython/tree/phenomics Kind regards, Marco On 26/03/2014 13:26, Marco Galardini wrote: > Hi, > > many thanks for your comments, below some replies: > > ----- Message from p.j.a.cock at googlemail.com --------- > Date: Wed, 26 Mar 2014 10:14:53 +0000 > From: Peter Cock > Reply-To: Peter Cock > Subject: Re: [Biopython-dev] Interested in a Phenotype Microarray > parser? > To: Marco Galardini > Cc: Biopython-Dev Mailing List > > >> On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini >> wrote: >>> Hi all, >>> >>> following your suggestions (as well as the other modules' >>> implementations) >>> I've just committed a couple of commits to my biopython fork, >>> featuring the >>> Bio.Phenomics module. >>> The module capabilities are limited to reading/writing Phenotype >>> Microarray >>> files and basic operations on the PlateRecord/WellRecord objects. >>> The module >>> requires numpy to interpolate the signal when the user requests a >>> time point >>> that wasn't in the input file (this way the WellRecord object can be >>> queried >>> with slices). >>> I'm thinking about how to implement the parameters extraction from >>> WellRecord >>> objects without the use of scipy. 
>>> >>> Here's the link to my branch: >>> https://github.com/mgalardini/biopython/tree/phenomics >>> The module and functions have been documented taking inspiration >>> from the >>> other modules: hope they are clear enough for you to try it out. >>> Some example files can be found in Tests/Phenomics. >>> >>> Marco >> >> Hi Marco, >> >> I've not worked with kind of data so my comments are not on >> the application specifics. But I'm pleased to see unit tests :) >> >> One thought was while you define (Java like?) getRow and getColumn >> methods, your __getitem__ does not support (NumPy like) access, >> which is something we do for multiple sequence alignments. I guess >> while most plates are laid out in a grid, the row/column for each >> sample is not the most important thing - the sample identifier is? >> >> Thinking out loud, would properties `rows` and `columns` etc be >> nicer than `getRow` and `getColumn`, supporting iteration over >> the rows/columns/etc and indexing? > > Yeah, absolutely: I'll work on some changes to have a more > straightforward way to select multiple WellRecords on row/column basis. > >> >> Minor: Your longer function docstrings do not follow PEP257, >> specifically starting with a one line summary, then a blank line, >> then the details. Also you are using triple single-quotes, rather >> than triple double-quotes (like the rest of Biopthon). >> http://legacy.python.org/dev/peps/pep-0257/ > > Whoops, I'll change it, thanks > >> >> Peter >> >> P.S. Also, I'm not very keen on the module name, phenomics - >> I wonder if it would earn Biopython a badomics award? ;) >> http://dx.doi.org/10.1186/2047-217X-1-6 > > That's meta-omics right? :p > What about 'Phenotype' then? Maybe it's too general, but future > extensions may include other phenotypic readouts. 
> > Marco >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > > ----- Fine del messaggio da p.j.a.cock at googlemail.com ----- > > > > Marco Galardini > Postdoctoral Fellow > EMBL-EBI - European Bioinformatics Institute > Wellcome Trust Genome Campus > Hinxton, Cambridge CB10 1SD, UK > Phone: +44 (0)1223 49 2547 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 -------------------------------------------------
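The AlignIO-style indexing Marco describes (well identifiers for single wells; integers and slices for rows and sub-plates) might look roughly like the sketch below. The class and its internals are simplified stand-ins for the real PlateRecord, not the committed code:

```python
class PlateRecord:
    """Sketch of AlignIO-style plate indexing; not the actual branch code.

    Wells are stored by identifier such as "A01" (row letter + zero-padded
    column number); integer and slice indexing derive rows from those
    identifiers, returning sub-plates.
    """

    ROWS = "ABCDEFGH"  # a standard 96-well plate has rows A-H

    def __init__(self, wells):
        self._wells = dict(wells)  # e.g. {"A01": 0.5, "B03": 1.2, ...}

    def __getitem__(self, index):
        if isinstance(index, str):
            # Well identifier: return the single well's value.
            return self._wells[index]
        if isinstance(index, tuple):
            # (row, column) pair of ints: rebuild the identifier.
            row, col = index
            return self._wells["%s%02d" % (self.ROWS[row], col + 1)]
        if isinstance(index, (int, slice)):
            # Row selection: a sub-plate containing the chosen rows.
            rows = self.ROWS[index]
            return PlateRecord(
                {wid: v for wid, v in self._wells.items() if wid[0] in rows}
            )
        raise TypeError("unsupported index: %r" % (index,))

plate = PlateRecord({"A01": 0.5, "A02": 0.7, "B01": 1.1})
print(plate["A01"], plate[0, 1], len(plate[0:1]._wells))
```

Keeping getRows/getColumns alongside this would simply delegate to the same identifier-based selection, so both styles can coexist.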