From harsh.beria93 at gmail.com Mon Mar 3 16:57:35 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 4 Mar 2014 03:27:35 +0530 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393582069.6863.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: The pairwise alignment project is not listed on the Ideas page. If I work on it and make a GUI or command line frontend, can that be taken up as a GSOC project? Who can be the potential mentor for this project so that I can chalk out the details before starting to code? Also, do I need to add the project to the Ideas page? On Sat, Mar 1, 2014 at 12:40 AM, Harsh Beria wrote: > I can work on pairwise sequence alignment. Actually, I have previously > worked on this using dynamic programming. But I doubt whether this can be a > GSOC project because the workload will not be too much. If we use > different methods to predict sequence alignment and make a front-end which > allows the user to input the sequence or even a pdb file and method of > alignment and predict the alignment, the work can be substantial enough. > > Also, as suggested by Christopher, sequence alignment is pretty basic and > we can use a C backend, which can significantly improve the runtime. So, we > can discuss it and I can start working on it. > > > On Fri, Feb 28, 2014 at 11:15 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> I'm wondering, with something that is as broadly applicable as pairwise >> alignment, would it be better to implement only in Python (or implement in >> Python wedded to a C backend)? Or maybe set up something in python that >> taps into an already well-defined C/C++ library that does this? >> >> The reason I mention this: with bioperl we went down this route with >> bioperl-ext a long time ago (these are generally C-based backend tools with >> a perl front-end), that bit-rotted simply b/c there were other more >> maintainable options. 
IIUC from this post, similar issues re: >> maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is >> entirely possible). However, tools like pysam and Bio::DB::Samtools (on >> the perl end) seem to have been maintained much more readily since they tap >> into a common library. >> >> For instance, my suggestion would be to implement a Biopython tool that >> does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a >> generic python front-end that allows users to pick the tool/method for the >> alignment, with maybe a library binding as an initial implementation. >> >> chris >> >> On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: >> >> > Hi Harsh Beria, >> > >> > One option is to work on pairwise sequence alignments. Currently there >> is some code for that in Biopython (in Bio/pairwise2.py), but it is not >> general and is not being maintained. This may need to be rebuilt from the >> ground up. >> > >> > Best, >> > -Michiel. >> > >> > -------------------------------------------- >> > On Wed, 2/26/14, Harsh Beria wrote: >> > >> > Subject: [Biopython-dev] Gsoc 2014 aspirant >> > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, >> gsoc at lists.open-bio.org >> > Date: Wednesday, February 26, 2014, 11:14 AM >> > >> > Hi, >> > >> > I am Harsh Beria, third year UG student at Indian >> > Institute of >> > Technology, Kharagpur. I have started working in >> > Computational Biophysics >> > recently, having written code for pdb to fasta parser, >> > sequence alignment >> > using Needleman Wunsch and Smith Waterman, Secondary >> > Structure prediction, >> > Henikoff's weight and am currently working on Monte Carlo >> > simulation. >> > Overall, I have started to like this field and want to carry >> > my interest >> > forward by pursuing a relevant project for GSOC 2014. I >> > mainly code in C >> > and python and would like to start contributing to the >> > Biopython library. 
I >> > started going through the official contribution wiki page ( >> > http://biopython.org/wiki/Contributing) >> > >> > I also went through the wiki page of Bio.SeqIO. I >> > seriously want to >> > contribute to the Biopython library through GSOC. What do I >> > do next? >> > >> > Thanks >> > -- >> > >> > Harsh Beria, >> > Indian Institute of Technology,Kharagpur >> > E-mail: harsh.beria93 at gmail.com >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > > > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > > Ph: +919332157616 > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From mjldehoon at yahoo.com Tue Mar 4 05:40:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 4 Mar 2014 02:40:52 -0800 (PST) Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> I would suggest to implement this in C, with a thin wrapper in Python. Using 3rd-party libraries would increase the compile-time dependencies of Biopython. Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. Best, -Michiel. 
-------------------------------------------- On Fri, 2/28/14, Fields, Christopher J wrote: Subject: Re: [Biopython-dev] Gsoc 2014 aspirant To: "Michiel de Hoon" Cc: "biopython-dev at lists.open-bio.org" , "Harsh Beria" Date: Friday, February 28, 2014, 12:45 PM I'm wondering, with something that is as broadly applicable as pairwise alignment, would it be better to implement only in Python (or implement in Python wedded to a C backend)? Or maybe set up something in python that taps into an already well-defined C/C++ library that does this? The reason I mention this: with bioperl we went down this route with bioperl-ext a long time ago (these are generally C-based backend tools with a perl front-end), that bit-rotted simply b/c there were other more maintainable options. IIUC from this post, similar issues re: maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is entirely possible). However, tools like pysam and Bio::DB::Samtools (on the perl end) seem to have been maintained much more readily since they tap into a common library. For instance, my suggestion would be to implement a Biopython tool that does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a generic python front-end that allows users to pick the tool/method for the alignment, with maybe a library binding as an initial implementation. chris On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: > Hi Harsh Beria, > > One option is to work on pairwise sequence alignments. Currently there is some code for that in Biopython (in Bio/pairwise2.py), but it is not general and is not being maintained. This may need to be rebuilt from the ground up. > > Best, > -Michiel. 
> > -------------------------------------------- > On Wed, 2/26/14, Harsh Beria wrote: > > Subject: [Biopython-dev] Gsoc 2014 aspirant > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, gsoc at lists.open-bio.org > Date: Wednesday, February 26, 2014, 11:14 AM > > Hi, > > I am Harsh Beria, third year UG student at Indian > Institute of > Technology, Kharagpur. I have started working in > Computational Biophysics > recently, having written code for pdb to fasta parser, > sequence alignment > using Needleman Wunsch and Smith Waterman, Secondary > Structure prediction, > Henikoff's weight and am currently working on Monte Carlo > simulation. > Overall, I have started to like this field and want to carry > my interest > forward by pursuing a relevant project for GSOC 2014. I > mainly code in C > and python and would like to start contributing to the > Biopython library. I > started going through the official contribution wiki page ( > http://biopython.org/wiki/Contributing) > > I also went through the wiki page of Bio.SeqIO. I > seriously want to > contribute to the Biopython library through GSOC. What do I > do next? 
> > Thanks > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 4 07:32:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 12:32:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: > I would suggest to implement this in C, with a thin wrapper in Python. > Using 3rd-party libraries would increase the compile-time dependencies of Biopython. > Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. > Best, > -Michiel. I would also consider a pure Python implementation on top, both for cross-testing, but also for use under Jython or PyPy where using the C code wouldn't be possible (or at least, becomes more complicated). (This is what the existing Bio.pairwise2 module does) Adding third party C libraries would also make life hard for cross platform testing (Linux, Mac, Windows). 
Peter From cjfields at illinois.edu Tue Mar 4 15:25:47 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 4 Mar 2014 20:25:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mar 4, 2014, at 6:32 AM, Peter Cock wrote: > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: >> I would suggest to implement this in C, with a thin wrapper in Python. >> Using 3rd-party libraries would increase the compile-time dependencies of Biopython. >> Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. >> Best, >> -Michiel. > > I would also consider a pure Python implementation on top, > both for cross-testing, but also for use under Jython or PyPy > where using the C code wouldn't be possible (or at least, > becomes more complicated). > > (This is what the existing Bio.pairwise2 module does) Ah, so it's pure python. Makes sense to have it for that purpose. You could simply repurpose the existing code. > Adding third party C libraries would also make life hard for > cross platform testing (Linux, Mac, Windows). > > Peter This is a problem with bioinformatics tools in general; they simply aren't Windows-friendly. However, one can write code with portability in mind (even C/C++). chris From p.j.a.cock at googlemail.com Tue Mar 4 16:45:09 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 21:45:09 +0000 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tuesday, March 4, 2014, Fields, Christopher J wrote: > On Mar 4, 2014, at 6:32 AM, Peter Cock > > wrote: > > > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon > > wrote: > >> I would suggest to implement this in C, with a thin wrapper in Python. 
> >> Using 3rd-party libraries would increase the compile-time dependencies > of Biopython. > >> Anyway I expect that the tricky part will be the design of the module, > rather than the algorithms themselves, so using 3rd-party libraries > wouldn't help us so much. > >> Best, > >> -Michiel. > > > > I would also consider a pure Python implementation on top, > > both for cross-testing, but also for use under Jython or PyPy > > where using the C code wouldn't be possible (or at least, > > becomes more complicated). > > > > (This is what the existing Bio.pairwise2 module does) > > Ah, so it's pure python. Makes sense to have it for that purpose. You > could simply repurpose the existing code. > > Apologies if unclear - Biopython has both a C and pure Python version of pairwise2 - although most of our bits of C code don't have a fallback and so break under Jython or PyPy etc. Personally I am optimistic about the potential of PyPy to speed up most Python code with its JIT so am a little wary of adding more C code (which may act as a barrier to entry for future maintainers) without a matching Python implementation - but appreciate that for typical C Python this is often the best way to attain high performance. But Michiel is absolutely right - the algorithm choice is even more important. > > Adding third party C libraries would also make life hard for > > cross platform testing (Linux, Mac, Windows). > > > > Peter > > This is a problem with bioinformatics tools in general; they simply aren't > Windows-friendly. However, one can write code with portability in mind > (even C/C++). > > chris Yes indeed - this is one reason why the buildbot for automated cross-platform testing is really helpful (since few if currently any of the Biopython developers use Windows as their primary system). 
Peter From nigel.delaney at outlook.com Tue Mar 4 17:39:04 2014 From: nigel.delaney at outlook.com (Nigel Delaney) Date: Tue, 4 Mar 2014 17:39:04 -0500 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. 
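[The dynamic programming approach discussed in this thread fits in a few dozen lines of pure Python. The sketch below is illustrative only — a simple Needleman-Wunsch with a linear gap penalty and made-up default scores, not the Bio.pairwise2 API:]

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global pairwise alignment by dynamic programming (Needleman-Wunsch).

    Returns (score, aligned_a, aligned_b). Linear gap penalty only;
    the default scoring parameters are illustrative, not Biopython's.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

[With match=1, mismatch=-1, gap=-1 the classic GCATGCU vs GATTACA example scores 0. A C rewrite would keep something like this as the pure-Python reference for cross-testing under Jython or PyPy, per Peter's suggestion above.]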
From mjldehoon at yahoo.com Wed Mar 5 20:49:49 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 5 Mar 2014 17:49:49 -0800 (PST) Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Hi Nigel, While compiling Biopython on Windows can be tricky, in my experience it has been easy to compile the C libraries in Biopython on other platforms (Unix/Linux/MacOSX). Have you run into specific problems compiling Biopython? I would think that wrapping 3rd-party libraries or executables is much more error-prone. Best, -Michiel. -------------------------------------------- On Tue, 3/4/14, Nigel Delaney wrote: Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant To: "'Peter Cock'" , "'Fields, Christopher J'" Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" Date: Tuesday, March 4, 2014, 5:39 PM As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) 
are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From harsh.beria93 at gmail.com Fri Mar 7 18:41:53 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Sat, 8 Mar 2014 05:11:53 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, Regarding the algorithm part of Pairwise Sequence Alignment, I can use Dynamic Programming (Smith Waterman for local and Needleman Wunsch for Global Alignment). Please suggest if I should go for dynamic programming. Also, the above discussion points out that the implementation should be purely python based for cross-platform compatibility. On Thu, Mar 6, 2014 at 7:19 AM, Michiel de Hoon wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. 
> > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. 
> _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From tra at popgen.net Mon Mar 10 13:02:17 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:02:17 +0000 Subject: [Biopython-dev] Installing all biopython dependencies Message-ID: <20140310170217.048f00a1@lnx> Hi, I am trying to create an easy-to-install, easy-to-replicate Virtual Machine(*) with all the requirements for Biopython. The idea is mainly to make it easy to have reliable testing, but it can also be used as a very fast installation of Biopython. The VM is currently based on Ubuntu saucy, and I am trying to make sure all the dependencies are met. I would like some advice on the following please: EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency problems, so I guess this requires a manual download/install? reportlab - What is the best way to get the fonts? XXmotif - What is this??? PAML - There seemed to be a ubuntu package, but no more? The following packages require manual installation (no ubuntu package), please correct me if I am wrong (makes my life easier)... 
DSSP Dialign msaprobs NACCESS Prank Probcons TCoffee (*) Actually I am building a docker container, but for ease of explanation it is similar to the more familiar Virtual Machine concept Thanks, Tiago From p.j.a.cock at googlemail.com Mon Mar 10 13:20:43 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 10 Mar 2014 17:20:43 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: On Mon, Mar 10, 2014 at 5:02 PM, Tiago Antao wrote: > Hi, > > I am trying to create an easy-to-install, easy-to-replicate Virtual > Machine(*) with all the requirements for Biopython. The idea is mainly > to make it easy to have reliable testing, but it can also be used as a > very fast installation of Biopython. Sounds good :) > The VM is currently based on Ubuntu saucy, and I am trying to > make sure all the dependencies are met. Some of this would apply to the TravisCI VM, which is also Debian/Ubuntu based. There we have to balance total run time (install everything & run tests) against full coverage. https://travis-ci.org/biopython/biopython/builds It would be neat to have an instance of your docker based VM running as a buildslave too... http://testing.open-bio.org/biopython/tgrid > I would like some advice on the following please: > > EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency > problems, so I guess this requires a manual download/install? > reportlab - What is the best way to get the fonts? > XXmotif - What is this??? > PAML - There seemed to be a ubuntu package, but no more? > > > The following packages require manual installation (no ubuntu > package), please correct me if I am wrong (makes my life easier)... > > DSSP > Dialign > msaprobs > NACCESS > Prank > Probcons > TCoffee For TravisCI we install a Debian/Ubuntu package for t-coffee, so at least that ought to be easy. e.g. 
https://packages.debian.org/sid/t-coffee Others (where the licence permits) we can request DebianMed/ BioLinux look at for packaging... Peter From anaryin at gmail.com Mon Mar 10 13:20:37 2014 From: anaryin at gmail.com (João Rodrigues) Date: Mon, 10 Mar 2014 18:20:37 +0100 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: Hi Tiago, For DSSP and NACCESS, you need a manual installation. DSSP is publicly available (binaries): ftp://ftp.cmbi.ru.nl/pub/software/dssp/ NACCESS is more complicated.. you need a license to get it and g77 installed to compile. You might have to contact the authors to allow such a broad distribution.. Cheers, João From tra at popgen.net Mon Mar 10 13:28:54 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:28:54 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310172854.07b0c1df@lnx> On Mon, 10 Mar 2014 18:20:37 +0100 João Rodrigues wrote: > NACCESS is more complicated.. you need a license to get it and g77 > installed to compile. You might have to contact the authors to allow > such a broad distribution.. Thanks, I might skip NACCESS at this stage. From tra at popgen.net Mon Mar 10 13:33:20 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:33:20 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310173320.228a7866@lnx> On Mon, 10 Mar 2014 17:20:43 +0000 Peter Cock wrote: > It would be neat to have an instance of your docker based > VM running as a buildslave too... > http://testing.open-bio.org/biopython/tgrid That was my original objective, which I have split into two: 1. A biopython docker container 2. 
A buildbot docker container for biopython (a different kind of beast) And then research how this might integrate with BioCloudLinux. As an aside, I have to say that using docker is progressing quite well and it seems a very interesting platform for deployment and testing. > Others (where the licence permits) we can request DebianMed/ > BioLinux look at for packaging... From the problematic list, I will gather a list of software whose license permits packaging and report back on this. Tiago From tra at popgen.net Tue Mar 11 08:04:13 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 12:04:13 +0000 Subject: [Biopython-dev] test_Fasttree_tool Message-ID: <20140311120413.73bbac2e@lnx> Hi, When I run test_Fasttree_tool standalone, all goes well. But if I run it through run_tests.py I get this: ====================================================================== FAIL: runTest (__main__.ComparisonTestCase) test_Fasttree_tool ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 302, in runTest self.fail("Warning: Can't open %s for test %s" % (outputfile, self.name)) AssertionError: Warning: Can't open ./output/test_Fasttree_tool for test test_Fasttree_tool ---------------------------------------------------------------------- Any ideas? 
Thanks, T From harijay at gmail.com Tue Mar 11 09:37:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 09:37:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Hi all, I just pull-ed from the git repository just now and after installing the newest numpy and scipy ( also from their respective git repos)..when I try to install biopython I get the same error complaining that I need to define : #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] I tried adding to file "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" the following line #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION But it still fails to install with an error as indicated below. I am sorry I don't know how to work around this. Thanks for your help Hari ################# error message ################# In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] #warning "Using deprecated NumPy API, disable it by #defining ... ^ /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:23:2: error: Should never include npy_deprecated_api directly. 
^ In file included from Bio/Cluster/clustermodule.c:3: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:126: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h:6:2: error: The header "old_defines.h" is deprecated as of NumPy 1.7. #error The header "old_defines.h" is deprecated as of NumPy 1.7. ^ 1 warning and 2 errors generated. error: command 'cc' failed with exit status 1 On Thu, Dec 26, 2013 at 5:28 AM, Michiel de Hoon wrote: > Fixed; please let us know if you encounter any problems. > > -Michiel. > > > > -------------------------------------------- > On Mon, 9/23/13, Peter Cock wrote: > > Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings > To: "Biopython-Dev Mailing List" > Date: Monday, September 23, 2013, 4:58 PM > > Hi all, > > I'm seeing the following warning from NumPy 1.7 with Python > 3.3 on Mac > OS X, and on Linux too. 
I believe the NumPy version is the > critical > factor: > > building 'Bio.Cluster.cluster' extension > building 'Bio.KDTree._CKDTree' extension > building 'Bio.Motif._pwm' extension > building 'Bio.motifs._pwm' extension > > all give: > > > /Users/peterjc/lib/python3.3/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: > warning: "Using > deprecated NumPy API, disable it by > #defining > NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] > > According to this page, > http://docs.scipy.org/doc/numpy-dev/reference/c-api.deprecations.html > > If we add this line it should confirm our code is clean for > NumPy 1.7 > (and implies to side effects on older NumPy): > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > Unfortunately that seems all four modules have problems > doing > that, presumably planned NumPy C API changes we need to > handle via a version conditional #ifdef? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Tue Mar 11 09:42:55 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:42:55 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > Hi all, > I just pull-ed from the git repository just now and after installing the > newest numpy and scipy ( also from their respective git repos)..when I try > to install biopython I get the same error complaining that I need to define > : > > #defining NPY_NO_DEPRECATED_API > NPY_1_7_API_VERSION" [-W#warnings] > > I tried adding to file > 
"/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > the following line > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > But it still fails to install with an error as indicated below. > > I am sorry I don't know how to work around this. > Thanks for your help > > Hari I suspect based on this NumPy thread that it is a problem with your NumPy install, perhaps you have some old files from a previous NumPy installation which are confusing things? http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html Peter From p.j.a.cock at googlemail.com Tue Mar 11 09:52:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:52:47 +0000 Subject: [Biopython-dev] test_Fasttree_tool In-Reply-To: <20140311120413.73bbac2e@lnx> References: <20140311120413.73bbac2e@lnx> Message-ID: On Tue, Mar 11, 2014 at 12:04 PM, Tiago Antao wrote: > Hi, > > When I run test_Fasttree_tool standalone, all goes well. But if I run > it through run_tests.py I get this: > ====================================================================== > FAIL: runTest (__main__.ComparisonTestCase) > test_Fasttree_tool > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "run_tests.py", line 302, in runTest > self.fail("Warning: Can't open %s for test %s" % (outputfile, > self.name)) AssertionError: Warning: Can't > open ./output/test_Fasttree_tool for test test_Fasttree_tool > > ---------------------------------------------------------------------- > > > Any ideas? 
> > Thanks, > T I'm surprised it ever works - the expected output file is not in git :( Try: $ run_tests.py -g test_Fasttree_tool $ more output/test_Fasttree_tool $ git add output/test_Fasttree_tool $ git commit -m "Checking in missing output file for test_Fasttree_tool.py" Peter From tra at popgen.net Tue Mar 11 10:38:53 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 14:38:53 +0000 Subject: [Biopython-dev] A docker container for Biopython Message-ID: <20140311143853.054f89fe@lnx> Hi, In an effort to have a complete, reliable and easy to replicate testing platform for Biopython I am in the process of creating a docker container (inspired by Brad's CloudBioLinux work) with everything needed for Biopython. I currently have a container that allows easy installation of Biopython. I have documented the process here: http://fe.popgen.net/2014/03/a-docker-container-for-biopython/ A few points: 1. A few applications still missing, not many 2. The fasttree test case is still failing 3. Database servers are included 4. This can be used to do a very fast deploy of Biopython (teaching, demo, etc...) 5. The container to test biopython (buildbot based) will be a different one (and probably only of interest to Peter and me ;) ) This is my first container, problems & suggestions most welcome! Tiago From harijay at gmail.com Tue Mar 11 20:17:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 20:17:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Thanks Peter .. That was indeed the case. I had a python in /usr/local/lib/python2.7/site-packages/numpy That was getting called rather than the one in my .virtualenv Once I removed that python . 
The install progressed very smoothly Thanks for your help Hari On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock wrote: > On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > > Hi all, > > I just pull-ed from the git repository just now and after installing the > > newest numpy and scipy ( also from their respective git repos)..when I > try > > to install biopython I get the same error complaining that I need to > define > > : > > > > #defining NPY_NO_DEPRECATED_API > > NPY_1_7_API_VERSION" [-W#warnings] > > > > I tried adding to file > > > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > > the following line > > > > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > > > > But it still fails to install with an error as indicated below. > > > > I am sorry I dont know how to work around this. > > Thanks for your help > > > > Hari > > I suspect based on this NumPy thread that it is a problem with > your NumPy install, perhaps you have some old files from a > previous NumPy installation which are confusing things? > > http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html > > Peter > From p.j.a.cock at googlemail.com Tue Mar 11 20:25:33 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 00:25:33 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Great - thanks for letting us know that solved the problem :) Peter On Wed, Mar 12, 2014 at 12:17 AM, hari jayaram wrote: > Thanks Peter .. > That was indeed the case. I had a python in > > /usr/local/lib/python2.7/site-packages/numpy > > That was getting called rather than the one in my .virtualenv > > Once I removed that python . 
The install progressed very smoothly > > Thanks for your help > > Hari > > > > On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock > wrote: >> >> On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: >> > Hi all, >> > I just pull-ed from the git repository just now and after installing >> > the >> > newest numpy and scipy ( also from their respective git repos)..when I >> > try >> > to install biopython I get the same error complaining that I need to >> > define >> > : >> > >> > #defining NPY_NO_DEPRECATED_API >> > NPY_1_7_API_VERSION" [-W#warnings] >> > >> > I tried adding to file >> > >> > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" >> > the following line >> > >> > >> > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION >> > >> > >> > But it still fails to install with an error as indicated below. >> > >> > I am sorry I dont know how to work around this. >> > Thanks for your help >> > >> > Hari >> >> I suspect based on this NumPy thread that it is a problem with >> your NumPy install, perhaps you have some old files from a >> previous NumPy installation which are confusing things? >> >> http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html >> >> Peter > > From p.j.a.cock at googlemail.com Wed Mar 12 05:48:44 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:48:44 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 Message-ID: Hi all, I installed the Xcode 5.1 update last night on my work Mac, and this seems to have broken the builds on Python 2.6 and 2.7 (run via builtbot). http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio ... 
running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.6 creating build/temp.macosx-10.9-intel-2.6/Bio cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio ... running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.7 creating build/temp.macosx-10.9-intel-2.7/Bio cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 This looks like a problem where distutils is using a gcc argument which cc (clang) used to ignore but now treats as an error. 
There will probably be similar reports on other Python projects as well... Peter From p.j.a.cock at googlemail.com Wed Mar 12 05:59:31 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:59:31 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: > Hi all, > > I installed the Xcode 5.1 update last night on my work Mac, and > this seems to have broken the builds on Python 2.6 and 2.7 > (run via builtbot). > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > ... > running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.6 > creating build/temp.macosx-10.9-intel-2.6/Bio > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > ... 
> running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.7 > creating build/temp.macosx-10.9-intel-2.7/Bio > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > This looks like a problem where distutils is using a gcc argument > which cc (clang) used to ignore but not treats as an error. There > will probably be similar reports on other Python projects as well... > > Peter This looks relevant, especially this reply from Paul Kehrer which suggests this is entirely Apple's fault for shipping a Python and clang compiler which don't get along with the default settings: http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure The suggested workaround seems to do the trick, $ export CFLAGS=-Qunused-arguments $ export CPPFLAGS=-Qunused-arguments Perhaps we can add this hack to our setup.py on Mac OS X... it seems harmless under gcc (e.g. my locally compiled version of Python 3.3 used gcc rather than clang)? Or it could be done via the buildbot setup, or on this buildslave directly (e.g. the ~/.bash_profile). What are folks' thoughts on this? We want it to remain easy to install Biopython from source under Mac OS X. 
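A minimal sketch of how such a guard might look near the top of setup.py (assuming the environment-variable approach from the Stack Overflow thread; the helper name is illustrative, not actual Biopython code):

```python
import sys

def add_clang_workaround(environ, platform=sys.platform):
    """Append -Qunused-arguments to CFLAGS/CPPFLAGS on Mac OS X.

    Apple's clang now rejects the '-mno-fused-madd' flag that
    distutils copies from the system Python's build settings;
    -Qunused-arguments makes clang silently ignore unknown
    arguments (reportedly harmless under gcc per the thread above).
    """
    if not platform.startswith("darwin"):
        return environ  # only needed for Apple's clang
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            environ[var] = (flags + " -Qunused-arguments").strip()
    return environ

# e.g. in setup.py, before setup() is called:
# import os
# add_clang_workaround(os.environ)
```

Passing the environment mapping in explicitly (rather than mutating os.environ directly) keeps the helper testable and leaves non-Mac builds untouched.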
Peter From tra at popgen.net Wed Mar 12 09:09:20 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:09:20 +0000 Subject: [Biopython-dev] Logging in on the wiki Message-ID: <20140312130920.701a656e@lnx> Hi, Are people able to log in on the wiki? I am getting back a page with: " Google Error: invalid_request Error in parsing the OpenID auth request. Learn more" Maybe its a google thing, but it might be on our side? Tiago From w.arindrarto at gmail.com Wed Mar 12 09:15:45 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 12 Mar 2014 14:15:45 +0100 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: <20140312130920.701a656e@lnx> References: <20140312130920.701a656e@lnx> Message-ID: Hi Tiago, I can log in using my Google OpenID. Best, Bow On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: > Hi, > > Are people able to log in on the wiki? I am getting back a page with: > > " > Google > Error: invalid_request > Error in parsing the OpenID auth request. > Learn more" > > Maybe its a google thing, but it might be on our side? > > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 12 09:19:45 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 13:19:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: I too can log in to the wiki with my Google OpenID. (Probably unrelated, we had to restart MySQL on the server earlier this week) Peter On Wed, Mar 12, 2014 at 1:15 PM, Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > > Best, > Bow > > On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: >> Hi, >> >> Are people able to log in on the wiki? 
I am getting back a page with: >> >> " >> Google >> Error: invalid_request >> Error in parsing the OpenID auth request. >> Learn more" >> >> Maybe its a google thing, but it might be on our side? >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From tra at popgen.net Wed Mar 12 09:23:45 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:23:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: <20140312132345.3b577a47@lnx> On Wed, 12 Mar 2014 14:15:45 +0100 Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > Thanks. I also can login now. I suppose it was something temporary on the google side... Tiago From cjfields at illinois.edu Wed Mar 12 10:46:15 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Mar 2014 14:46:15 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Mar 12, 2014, at 4:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: >> Hi all, >> >> I installed the Xcode 5.1 update last night on my work Mac, and >> this seems to have broken the builds on Python 2.6 and 2.7 >> (run via builtbot). >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.6 >> creating build/temp.macosx-10.9-intel-2.6/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common >> -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX >> -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g >> -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 >> -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.7 >> creating build/temp.macosx-10.9-intel-2.7/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 >> -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd >> -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes >> -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes >> -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> This looks like a problem where distutils is using a gcc argument >> which cc (clang) used to ignore but not treats as an error. There >> will probably be similar reports on other Python projects as well... >> >> Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. 
> Peter That's scary. I planned on updating to the latest Xcode myself today, nice to be forewarned. I've been seeing clang complaints with various tools already, so I wouldn't be surprised if this problem is more widespread than Python. chris From tra at popgen.net Wed Mar 12 11:10:48 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 15:10:48 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container Message-ID: <20140312151048.45066ade@lnx> Hi, I have a docker container ready (save for a few applications). Simple usage instructions: 1. Create a directory and download this file inside it: https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test 2. Rename it Dockerfile (capital D) 3. Get a buildbot username and password (from Peter or me), edit the file and replace CHANGEUSER CHANGEPASS 4. do docker build -t biopython-buildbot . 5. do docker run biopython-buildbot Beta-version, comments appreciated ;) If people like this, I will amend the Continuous Integration page on the wiki accordingly Tiago From eparker at ucdavis.edu Wed Mar 12 20:06:51 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 12 Mar 2014 17:06:51 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here Message-ID: Hello, My name is Evan Parker; I am a third-year graduate student studying analytical chemistry at UC Davis. Coding was my hobby in undergrad and has become a major component of my current graduate work in the context of mass-spectral interpretation software. I use Biopython for parsing Uniprot sequence data/annotations and I would be delighted to have the opportunity to give back, especially under the umbrella of the Google Summer of Code. The project on implementing an indexing & lazy-loading sequence parser looks interesting to me and, while difficult, it is something that I could wrap my mind around. 
I apologize in advance for the wall of text but if you have the time I'd like to ask a couple of questions relating to implementation as I prepare my proposal. 1) Should the lazy loading be done primarily in the context of records returned from the SeqIO.index() dict-like object, or should the lazy loading be available to the generator made by SeqIO.parse()? The project idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems to me that the best implementation of lazy loading in these two SeqIO functions would be significantly different. My initial impression of the project would be for SeqIO.parse() to stage a file segment and selectively generate information when called while SeqIO.index() would use a more detailed map created at instantiation to pull information selectively. 2) Is slower instantiation an acceptable trade-off for memory efficiency? In the current implementation of SeqIO.index(), sequence files are read twice, once to annotate beginning points of entries and a second time to load the SeqRecord requested by __getitem__(). A lazy-loading parser could amplify this issue if it works by indexing locations other than the start of the record. The alternative approach of passing the complete textual sequence record and selectively parsing would be easier to implement (and would include dual compatibility with parse and index) but it seems that it would be slower when called and potentially less memory efficient. Any of your thoughts and comments are appreciated, - Evan From w.arindrarto at gmail.com Thu Mar 13 05:04:16 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 13 Mar 2014 10:04:16 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Thank you for your interest in the project :). It's good to know you're already quite familiar with SeqIO as well. My replies are below. 
> 1) Should the lazy loading be done primarily in the context of records > returned from the SeqIO.index() dict-like object, or should the lazy > loading be available to the generator made by SeqIO.parse()? The project > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems > to me that the best implementation of lazy loading in these two SeqIO > functions would be significantly different. My initial impression of the > project would be for SeqIO.parse() to stage a file segment and selectively > generate information when called while SeqIO.index() would use a more > detailed map created at instantiation to pull information selectively. We don't necessarily have to be restricted to SeqIO.index() objects here. You'll notice of course that SeqIO.index() indexes complete records without granularity up to the possible subsequences. What we're looking for is compatibility with our existing SeqIO parsers. The lazy parser may well be a new object implemented alongside SeqIO, but the parsing logic itself (the one whose invocation is delayed by the lazy parser) should rely on existing parsers. > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > In the current implementation of SeqIO.index(), sequence files are read > twice, once to annotate beginning points of entries and a second time to > load the SeqRecord requested by __getitem__(). A lazy-loading parser could > amplify this issue if it works by indexing locations other than the start > of the record. The alternative approach of passing the complete textual > sequence record and selectively parsing would be easier to implement (and > would include dual compatibility with parse and index) but it seems that it > would be slower when called and potentially less memory efficient. I think this will depend on what you want to store in the indices and how you store them, which will most likely differ per sequencing file format. 
Coming up with this, we expect, is an important part of the project implementation. Doing a first pass for indexing is acceptable. Instantiation of the object using the index doesn't necessarily have to be slow. Retrieval of the actual (sub)sequence will be slower since we will touch the disk and do the actual parsing by then. But this can also be improved, perhaps by caching the result so subsequent retrieval is faster. One important point (and the use case that we envision for this project) is that subsequences in large sequence files (genome assemblies, for example) can be retrieved quite quickly. Take a look at some existing indexing implementations, such as faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] tool may also help. The faidx indexing, for example, relies on the FASTA file having the same line length, which means it can be used to retrieve subsequences given only the file offset of a FASTA record. Hope this gives you some useful hints. Good luck with your proposal :). Cheers, Bow [1] http://samtools.sourceforge.net/samtools.shtml [2] http://samtools.github.io/hts-specs/SAMv1.pdf [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 From eparker at ucdavis.edu Thu Mar 13 15:04:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Thu, 13 Mar 2014 12:04:34 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Thank you Bow, I'll need to digest this a bit, but you have given me a good direction. My inclination for the proposal is to focus on sequential file formats used to transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and others) and to mostly ignore formats used to convey alignment (ie. anything covered exclusively by parsers in AlignIO). If this is a poor direction please tell me so that I can add to my preparation. -Evan Evan Parker Ph.D. Candidate Dept. 
of Chemistry - Lebrilla Lab University of California, Davis On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto wrote: > Hi Evan, > > Thank you for your interest in the project :). It's good to know > you're already quite familiar with SeqIO as well. > > My replies are below. > > > 1) Should the lazy loading be done primarily in the context of records > > returned from the SeqIO.index() dict-like object, or should the lazy > > loading be available to the generator made by SeqIO.parse()? The project > > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it > seems > > to me that the best implementation of lazy loading in these two SeqIO > > functions would be significantly different. My initial impression of the > > project would be for SeqIO.parse() to stage a file segment and > selectively > > generate information when called while SeqIO.index() would use a more > > detailed map created at instantiation to pull information selectively. > > We don't necessarily have to be restricted to SeqIO.index() objects > here. You'll notice of course that SeqIO.index() indexes complete > records without granularity up to the possible subsequences. What > we're looking for is compatibility with our existing SeqIO parsers. > The lazy parser may well be a new object implemented alongside SeqIO, > but the parsing logic itself (the one whose invocation is delayed by > the lazy parser) should rely on existing parsers. > > > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > > In the current implementation of SeqIO.index(), sequence files are read > > twice, once to annotate beginning points of entries and a second time to > > load the SeqRecord requested by __getitem__(). A lazy-loading parser > could > > amplify this issue if it works by indexing locations other than the start > > of the record. 
The alternative approach of passing the complete textual > > sequence record and selectively parsing would be easier to implement (and > > would include dual compatibility with parse and index) but it seems that > it > > would be slower when called and potentially less memory efficient. > > I think this will depend on what you want to store in the indices and > how you store them, which will most likely differ per sequencing file > format. Coming up with this, we expect, is an important part of the > project implementation. Doing a first pass for indexing is acceptable. > Instantiation of the object using the index doesn't necessarily have > to be slow. Retrieval of the actual (sub)sequence will be slower since > we will touch the disk and do the actual parsing by then. But this can > also be improved, perhaps by caching the result so subsequent > retrieval is faster. One important point (and the use case that we > envision for this project) is that subsequences in large sequence > files (genome assemblies, for example) can be retrieved quite quickly. > > Take a look at some existing indexing implementations, such as > faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] > tool may also help. The faidx indexing, for example, relies on the > FASTA file having the same line length, which means it can be used to > retrieve subsequences given only the file offset of a FASTA record. > > Hope this gives you some useful hints. Good luck with your proposal :). > > Cheers, > Bow > > [1] http://samtools.sourceforge.net/samtools.shtml > [2] http://samtools.github.io/hts-specs/SAMv1.pdf > [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 > From w.arindrarto at gmail.com Fri Mar 14 01:30:13 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 14 Mar 2014 06:30:13 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Focusing on the SeqIO parsers is ok. 
That's where having lazy parsers would help most (and you've got a handful of formats there already). Remember that you'll also need to account for time to write tests, possibly benchmark or profile the code (lazy parsers should improve performance after all), and write documentation, outside of writing the code itself. You'll also want to be clear about this in your proposed timeline, since that will be your main guide during the coding period. Looking forward to reading your proposal :), Bow On Thu, Mar 13, 2014 at 8:04 PM, Evan Parker wrote: > Thank you Bow, > > I'll need to digest this a bit, but you have given me a good direction. My > inclination for the proposal is to focus on sequential file formats used to > transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and > others) and to mostly ignore formats used to convey alignment (ie. anything > covered exclusively by parsers in AlignIO). If this is a poor direction > please tell me so that I can add to my preparation. > > -Evan > > Evan Parker > Ph.D. Candidate > Dept. of Chemistry - Lebrilla Lab > University of California, Davis > > > On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto > wrote: >> >> Hi Evan, >> >> Thank you for your interest in the project :). It's good to know >> you're already quite familiar with SeqIO as well. >> >> My replies are below. >> >> > 1) Should the lazy loading be done primarily in the context of records >> > returned from the SeqIO.index() dict-like object, or should the lazy >> > loading be available to the generator made by SeqIO.parse()? The project >> > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it >> > seems >> > to me that the best implementation of lazy loading in these two SeqIO >> > functions would be significantly different. 
My initial impression of the >> > project would be for SeqIO.parse() to stage a file segment and >> > selectively >> > generate information when called while SeqIO.index() would use a more >> > detailed map created at instantiation to pull information selectively. >> >> We don't necessarily have to be restricted to SeqIO.index() objects >> here. You'll notice of course that SeqIO.index() indexes complete >> records without granularity up to the possible subsequences. What >> we're looking for is compatibility with our existing SeqIO parsers. >> The lazy parser may well be a new object implemented alongside SeqIO, >> but the parsing logic itself (the one whose invocation is delayed by >> the lazy parser) should rely on existing parsers. >> >> > 2) Is slower instantiation an acceptable trade-off for memory >> > efficiency? >> > In the current implementation of SeqIO.index(), sequence files are read >> > twice, once to annotate beginning points of entries and a second time to >> > load the SeqRecord requested by __getitem__(). A lazy-loading parser >> > could >> > amplify this issue if it works by indexing locations other than the >> > start >> > of the record. The alternative approach of passing the complete textual >> > sequence record and selectively parsing would be easier to implement >> > (and >> > would include dual compatibility with parse and index) but it seems that >> > it >> > would be slower when called and potentially less memory efficient. >> >> I think this will depend on what you want to store in the indices and >> how you store them, which will most likely differ per sequencing file >> format. Coming up with this, we expect, is an important part of the >> project implementation. Doing a first pass for indexing is acceptable. >> Instantiation of the object using the index doesn't necessarily have >> to be slow. Retrieval of the actual (sub)sequence will be slower since >> we will touch the disk and do the actual parsing by then. 
But this can >> also be improved, perhaps by caching the result so subsequent >> retrieval is faster. One important point (and the use case that we >> envision for this project) is that subsequences in large sequence >> files (genome assemblies, for example) can be retrieved quite quickly. >> >> Take a look at some existing indexing implementations, such as >> faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] >> tool may also help. The faidx indexing, for example, relies on the >> FASTA file having the same line length, which means it can be used to >> retrieve subsequences given only the file offset of a FASTA record. >> >> Hope this gives you some useful hints. Good luck with your proposal :). >> >> Cheers, >> Bow >> >> [1] http://samtools.sourceforge.net/samtools.shtml >> [2] http://samtools.github.io/hts-specs/SAMv1.pdf >> [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 > > From p.j.a.cock at googlemail.com Fri Mar 14 09:34:40 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 14 Mar 2014 13:34:40 +0000 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: On Fri, Mar 14, 2014 at 5:30 AM, Wibowo Arindrarto wrote: > Hi Evan, > > Focusing on the SeqIO parsers is ok. That's where having lazy parsers > would help most (and you've got a handful of formats there already). > Remember that you'll also need to account for time to write tests, > possibly benchmark or profile the code (lazy parsers should improve > performance after all), and write documentation, outside of writing > the code itself. You'll also want to be clear about this in your > proposed timeline, since that will be your main guide during the > coding period. > > Looking forward to reading your proposal :), > Bow Yes, profiling will be important here - if your script accesses all the annotation/sequence/etc of a record, then the lazy parser will probably be slower (all the same work, plus an overhead). 
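The trade-off Peter describes (lazy parsing only wins when a subset of the record is touched) is easy to see in a toy deferred-parsing object. The names below are invented for illustration and are not Bio.SeqIO's design:

```python
# Toy deferred-parsing record (hypothetical names, not part of Bio.SeqIO).
# The raw text is stored up front; the expensive work happens only on
# first attribute access, and is cached afterwards.

class LazyRecord:
    def __init__(self, raw):
        self._raw = raw    # complete textual record, kept verbatim
        self._seq = None   # parsed lazily, then cached

    @property
    def id(self):
        # Cheap: the ID sits on the first line, after the ">" marker.
        return self._raw.splitlines()[0][1:].split()[0]

    @property
    def seq(self):
        if self._seq is None:  # parse once, reuse on later accesses
            self._seq = "".join(self._raw.splitlines()[1:])
        return self._seq

record = LazyRecord(">demo some description\nACGT\nACGT\n")
```

A script that reads only `record.id` never pays for assembling the sequence; one that touches every attribute does all the original parsing work plus the bookkeeping overhead, which is the case where profiling should show the lazy parser losing.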
It should win when only a subset of the data is needed, both in terms of speed and memory usage. Peter From eric.talevich at gmail.com Sat Mar 15 01:29:21 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 14 Mar 2014 22:29:21 -0700 Subject: [Biopython-dev] Google Summer of Code 2014: Call for student applications Message-ID: Hi everyone, Google Summer of Code is an annual program that funds students all over the world to work with open-source software projects to develop new code. This summer, the Open Bioinformatics Foundation (OBF) is taking on students through the Google Summer of Code program to work with mentors on established bioinformatics software projects including BioPython. We invite students to submit applications by Friday, March 21. Full details are here: http://news.open-bio.org/news/2014/03/obf-gsoc-2014-call-for-student-applications/ All the best, Eric & Raoul OBF GSoC organization admins From arklenna at gmail.com Sun Mar 16 16:53:22 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Sun, 16 Mar 2014 16:53:22 -0400 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock > wrote: > > Hi all, > > > > I installed the Xcode 5.1 update last night on my work Mac, and > > this seems to have broken the builds on Python 2.6 and 2.7 > > (run via builtbot). > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > > ... 
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.6 > > creating build/temp.macosx-10.9-intel-2.6/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > > -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > > ... 
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.7 > > creating build/temp.macosx-10.9-intel-2.7/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > This looks like a problem where distutils is using a gcc argument > > which cc (clang) used to ignore but not treats as an error. There > > will probably be similar reports on other Python projects as well... > > > > Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > I encountered the same problem (clean install of Mavericks, vanilla Python, latest XCode from App Store). One answer [1] suggests this is not a guaranteed solution but offers a different flag (which I did not test). I chose to edit system python files [2] which is definitely not the best option for most users. 
[1]: http://stackoverflow.com/a/22315129 [2]: http://stackoverflow.com/a/22322068 > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > Do you mean editing environment variables with `os.environ`? I don't know enough about the details of how packages are built to know what will work with both compiling from source, easy_install, pip, etc. > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > It's a dilemma, because asking users to edit their .bashrc or .bash_profile before installation is annoying and easy to overlook, but modifying them in setup.py feels hacky (i.e. how long will this solution work?). Crossing my fingers and hoping Apple fixes this in an update... > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Sun Mar 16 17:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 16 Mar 2014 21:15:06 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Sun, Mar 16, 2014 at 8:53 PM, Lenna Peterson wrote: > On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock > wrote: >> >> ... 
>> >> This looks relevant, especially this reply from Paul Kehrer which >> suggests this is entirely Apple's fault for shipping a Python and >> clang compiler which don't get along with the default settings: >> >> >> http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure >> > >> The suggested workaround seems to do the trick, >> >> $ export CFLAGS=-Qunused-arguments >> $ export CPPFLAGS=-Qunused-arguments >> > > I encountered the same problem (clean install of Mavericks, vanilla Python, > latest XCode from App Store). > > One answer [1] suggests this is not a guaranteed solution but offers a > different flag (which I did not test). > > I chose to edit system python files [2] which is definitely not the best > option for most users. > > [1]: http://stackoverflow.com/a/22315129 > [2]: http://stackoverflow.com/a/22322068 > >> Perhaps we can add this hack to our setup.py on Mac OS X... >> it seems harmless under gcc (e.g. my locally compiled version >> of Python 3.3 used gcc rather than clang)? > > Do you mean editing environment variables with `os.environ`? I don't know > enough about the details of how packages are built to know what will work > with both compiling from source, easy_install, pip, etc. Yes, I was thinking about editing the environment variables in setup.py via the os module. I agree there are potential risks with 3rd party installers, but adding -Qunused-arguments to any existing CFLAGS (within the scope of the Biopython install) is hopefully low risk... >> Or it could be done via the buildbot setup, or on this buildslave >> directly (e.g. the ~/.bash_profile). > > It's a dilemma, because asking users to edit their .bashrc or .bash_profile > before installation is annoying and easy to overlook, but modifying them in > setup.py feels hacky (i.e. how long will this solution work?). Crossing my > fingers and hoping Apple fixes this in an update... 
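The setup.py approach Peter floats might look roughly like the sketch below. This is an illustration only, with a made-up helper name, not Biopython's actual setup.py:

```python
# Hypothetical sketch: append -Qunused-arguments to CFLAGS/CPPFLAGS so
# clang ignores the gcc-only -mno-fused-madd flag instead of treating it
# as an error.  Not Biopython's actual setup.py code.

def patch_cflags_for_clang(environ, platform):
    """Return environ with -Qunused-arguments added on Mac OS X builds."""
    if platform != "darwin":  # only Apple's clang setup needs the workaround
        return environ
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            # Preserve any flags the user already exported.
            environ[var] = (flags + " -Qunused-arguments").strip()
    return environ
```

In a real setup.py this would be called with `os.environ` and `sys.platform` before `setup()` runs, so the change is scoped to the install process and its child compiler invocations rather than the user's shell.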
> Fingers crossed Apple pushes another update in the next few weeks to resolve this... Peter From anaryin at gmail.com Mon Mar 17 12:05:04 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:05:04 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Dear all, I created a new 'empty' branch called bio.struct on my github account. The only change from the master branch is a new folder - Bio/Struct - that has an empty __init__.py in there. Please add issues with feature requests, and if you are willing to start coding, I'd say fork and go ahead! https://github.com/JoaoRodrigues/biopython/tree/bio.struct I also added a small wiki page with a description. 2014-02-20 0:05 GMT+01:00 Morten Kjeldgaard : > > On 19/02/2014, at 17:35, David Cain wrote: > > > I frequently make use of Bio.PDB, and agree wholeheartedly that certain > > aspects of it are very dated, or haphazardly organized. > > > > The module as a whole would benefit greatly from some extra attention. > I'm > > happy to lend a hand in whatever revamp takes place. > > I second that. I am also willing to participate in this project! > > Cheers, > Morten > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Mon Mar 17 12:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:15:06 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:05 PM, Jo?o Rodrigues wrote: > Dear all, > > I created a new 'empty' branch called bio.struct on my github account. The > only change from the master branch is a new folder - Bio/Struct - that has > an empty __init__.py in there. Please add issues with feature requests, and > if you are willing to start coding, I'd say fork and go ahead! 
> > https://github.com/JoaoRodrigues/biopython/tree/bio.struct > > I also added a small wiki page with a > description. Are we all generally in favour of lower case for new module names (as per PEP8)? i.e. Bio/struct not Bio/Struct ? Peter From anaryin at gmail.com Mon Mar 17 12:19:31 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:19:31 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Hello Peter, Sorry, typo actually, wrote with small case everywhere but the module name.. thanks. Also something I have in mind. Should wrappers for NACCESS and DSSP be refactored to use Bio.Application? From p.j.a.cock at googlemail.com Mon Mar 17 12:32:30 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:32:30 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:19 PM, Jo?o Rodrigues wrote: > Hello Peter, > > Sorry, typo actually, wrote with small case everywhere but the module name.. > thanks. > > Also something I have in mind. Should wrappers for NACCESS and > DSSP be refactored to use Bio.Application? If you think it would help, sure. Peter From anaryin at gmail.com Mon Mar 17 12:33:55 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:33:55 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: To be honest, more of an issue of internal consistency with the rest of the code base. I'd have to look into it more carefully to see if it fits.. 2014-03-17 17:32 GMT+01:00 Peter Cock : > On Mon, Mar 17, 2014 at 4:19 PM, Jo?o Rodrigues wrote: > > Hello Peter, > > > > Sorry, typo actually, wrote with small case everywhere but the module > name.. > > thanks. > > > > Also something I have in mind. Should wrappers for NACCESS and > > DSSP be refactored to use Bio.Application? > > If you think it would help, sure. 
> > Peter > From tra at popgen.net Mon Mar 17 12:53:52 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 17 Mar 2014 16:53:52 +0000 Subject: [Biopython-dev] Dialign2 testing... Message-ID: <20140317165352.36db07ee@lnx> Hi, Still on the quest for a test run that actually runs all the tests. Can someone suggest what would be a sensible value for DIALIGN2-DIR? It seems that setting up the test is not trivial: there seems to be a need a BLOSUM file inside the dialign directory? Any clues would be appreciated... From p.j.a.cock at googlemail.com Mon Mar 17 14:35:25 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 18:35:25 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi all, Bow (regarding SearchIO) others should probably read this... I've commented, see also: http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html Peter ---------- Forwarded message ---------- From: Maloney, Christopher (NIH/NLM/NCBI) [C] Date: Mon, Mar 17, 2014 at 5:17 PM Subject: [Open-bio-l] Proposed BLAST XML Changes To: "open-bio-l at lists.open-bio.org" We are not directly soliciting comments, but if anyone would like to make any technical or programmatic suggestions, there is a link from which anyone may comment in the document. ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf Thank you. P.S. Please re-post this to other lists that might have interested readers. Chris Maloney NIH/NLM/NCBI (Contractor) Building 45, 5AN.24D-22 301-594-2842 _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From kirkolem at gmail.com Mon Mar 17 16:07:26 2014 From: kirkolem at gmail.com (Dan K.) Date: Tue, 18 Mar 2014 00:07:26 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello! 
I'm a third year student learning bioengineering and bioinformatics and I'm interested in participating in GSoC and contributing to the BioPython project. In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I find convenient to implement Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What my next steps should be like? Thank you for your attention. From ben at benfulton.net Mon Mar 17 20:38:17 2014 From: ben at benfulton.net (Ben Fulton) Date: Mon, 17 Mar 2014 20:38:17 -0400 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: <20140317165352.36db07ee@lnx> References: <20140317165352.36db07ee@lnx> Message-ID: I looked at that last year. As far as I could tell the actual code didn't do anything useful with that value; I removed the precondition checks from the tests and it ran properly. On Mon, Mar 17, 2014 at 12:53 PM, Tiago Antao wrote: > Hi, > > Still on the quest for a test run that actually runs all the tests. > > Can someone suggest what would be a sensible value for DIALIGN2-DIR? > It seems that setting up the test is not trivial: there seems to be a > need a BLOSUM file inside the dialign directory? > > Any clues would be appreciated... > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From harsh.beria93 at gmail.com Mon Mar 17 20:39:25 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 18 Mar 2014 06:09:25 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, I have started to write a proposal for a project on pair wise sequence alignment. Is there anyone interested in mentoring the project so that I can discuss some of the algorithmic problems in detail? 
Also, do I need to add the project to the ideas page as it is not there yet? Thanks On Mar 6, 2014 7:19 AM, "Michiel de Hoon" wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. > > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) 
are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From w.arindrarto at gmail.com Tue Mar 18 05:52:29 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 10:52:29 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi Peter, everyone, Thanks for the heads up. If implemented as it is, the updates will change our underlying SearchIO model (aside from the blast-xml parser itself), by allowing a Hit retrieval using multiple different keys. I have a feeling it will be difficult to jam all the new changes into a backwards-compatible parser. One way to make it transparent to users is to use the underlying DTD to do validation before parsing (for the two BLAST DTDs, use the one which the file can be validated against). However, this comes at a price. Since the standard library-bundled elementtree doesn't seem to support validation, we have to use another library (lxml is my choice). This means adding 3rd party dependency which require compiling (lxml is also partly written in C). The other option is to introduce a new format name (e.g. 'blast-xml2'), which makes the user responsible for knowing which BLAST XML he/she is parsing. It feels more explicit this way, so I am leaning towards this option, despite 'blast-xml2' not sounding very nice to me ;). Any other thoughts? 
Best, Bow On Mon, Mar 17, 2014 at 7:35 PM, Peter Cock wrote: > Hi all, > > Bow (regarding SearchIO) others should probably read this... > > I've commented, see also: > http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html > > Peter > > > ---------- Forwarded message ---------- > From: Maloney, Christopher (NIH/NLM/NCBI) [C] > Date: Mon, Mar 17, 2014 at 5:17 PM > Subject: [Open-bio-l] Proposed BLAST XML Changes > To: "open-bio-l at lists.open-bio.org" > > > We are not directly soliciting comments, but if anyone would like to > make any technical or programmatic suggestions, there is a link from > which anyone may comment in the document. > > ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf > > Thank you. > > > P.S. Please re-post this to other lists that might have interested readers. > > Chris Maloney > NIH/NLM/NCBI (Contractor) > Building 45, 5AN.24D-22 > 301-594-2842 > > > _______________________________________________ > Open-Bio-l mailing list > Open-Bio-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/open-bio-l > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 06:17:48 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:17:48 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > Thanks for the heads up. If implemented as it is, the updates will > change our underlying SearchIO model (aside from the blast-xml parser > itself), by allowing a Hit retrieval using multiple different keys. Could you clarify what you mean by multiple keys here? > I have a feeling it will be difficult to jam all the new changes into > a backwards-compatible parser. 
One way to make it transparent to users > is to use the underlying DTD to do validation before parsing (for the > two BLAST DTDs, use the one which the file can be validated against). > However, this comes at a price. Since the standard library-bundled > elementtree doesn't seem to support validation, we have to use another > library (lxml is my choice). This means adding 3rd party dependency > which require compiling (lxml is also partly written in C). We can probably tell by sniffing the first few lines... but how to do that without using a handle seek to rewind may be tricky (desirable to support parsing streams, e.g. stdin). > The other option is to introduce a new format name (e.g. > 'blast-xml2'), which makes the user responsible for knowing which > BLAST XML he/she is parsing. It feels more explicit this way, so I am > leaning towards this option, despite 'blast-xml2' not sounding very > nice to me ;). > > Any other thoughts? > > Best, > Bow I agree for the SearchIO interface, two format names makes sense - unless there is a neat way to auto-detect this on input. Using "blast-xml2" would work, or maybe something like "blast-xml-2014" (too long?). We could even go for "blast-xml-old" and "blast-xml" perhaps? Peter From w.arindrarto at gmail.com Tue Mar 18 06:33:55 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 11:33:55 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> Thanks for the heads up. If implemented as it is, the updates will >> change our underlying SearchIO model (aside from the blast-xml parser >> itself), by allowing a Hit retrieval using multiple different keys. > > Could you clarify what you mean by multiple keys here? Currently, we can retrieve hits from a query using its ID, aside from its numeric index. 
With their proposed changes to the Hit element here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, it means that a given Hit can now be annotated with more than one ID. Ideally, this should also be reflected in the QueryResult object: a hit item should be retrievable using any of the IDs it has. This will also affect membership checking on the QueryResult object. >> I have a feeling it will be difficult to jam all the new changes into >> a backwards-compatible parser. One way to make it transparent to users >> is to use the underlying DTD to do validation before parsing (for the >> two BLAST DTDs, use the one which the file can be validated against). >> However, this comes at a price. Since the standard library-bundled >> elementtree doesn't seem to support validation, we have to use another >> library (lxml is my choice). This means adding 3rd party dependency >> which require compiling (lxml is also partly written in C). > > We can probably tell by sniffing the first few lines... but how > to do that without using a handle seek to rewind may be > tricky (desirable to support parsing streams, e.g. stdin). Ah yes. We have a rewindable file seek object in Bio.File, don't we :)? I'll have to play around with some real datasets first, I think. The other thing we should take into account is the Xinclude tag. Would we want to make it possible to query *either* the single query XML results or the master Xinclude document (point 2 of the proposed change)? Or should we restrict our parser only to the single query files? >> The other option is to introduce a new format name (e.g. >> 'blast-xml2'), which makes the user responsible for knowing which >> BLAST XML he/she is parsing. It feels more explicit this way, so I am >> leaning towards this option, despite 'blast-xml2' not sounding very >> nice to me ;). >> >> Any other thoughts? 
>> >> Best, >> Bow > > I agree for the SearchIO interface, two format names makes > sense - unless there is a neat way to auto-detect this on input. > > Using "blast-xml2" would work, or maybe something like > "blast-xml-2014" (too long?). > > We could even go for "blast-xml-old" and "blast-xml" perhaps? Hmm..'blast-xml-old', may make it difficult to adapt for future XML schema changes. How about renaming the current parser to 'blast-xml-legacy', and the new one to just 'blast-xml'? Cheers, Bow From p.j.a.cock at googlemail.com Tue Mar 18 06:38:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:38:53 +0000 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: References: <20140317165352.36db07ee@lnx> Message-ID: Hi Tiago, Ben, >From memory if the environment variable was not set, the command line tool would fail in a strange way - so I made the test conditional on having the variable set. Perhaps things have changed slightly since 2009, https://github.com/biopython/biopython/commit/d4ea47e27f3a8aa7ebe1460b0d96c3135f6bfba5 or maybe this depends on how dialign2 is installed... possibly the Linux packages didn't exist back then? Peter On Tue, Mar 18, 2014 at 12:38 AM, Ben Fulton wrote: > I looked at that last year. As far as I could tell the actual code didn't > do anything useful with that value; I removed the precondition checks from > the tests and it ran properly. > > On Mon, Mar 17, 2014 at 12:53 PM, Tiago Antao wrote: > >> Hi, >> >> Still on the quest for a test run that actually runs all the tests. >> >> Can someone suggest what would be a sensible value for DIALIGN2-DIR? >> It seems that setting up the test is not trivial: there seems to be a >> need a BLOSUM file inside the dialign directory? >> >> Any clues would be appreciated... 
>> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 06:58:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:58:06 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto wrote: > On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >> wrote: >>> Hi Peter, everyone, >>> >>> Thanks for the heads up. If implemented as it is, the updates will >>> change our underlying SearchIO model (aside from the blast-xml parser >>> itself), by allowing a Hit retrieval using multiple different keys. >> >> Could you clarify what you mean by multiple keys here? > > Currently, we can retrieve hits from a query using its ID, aside from > its numeric index. With their proposed changes to the Hit element > here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, > it means that a given Hit can now be annotated with more than one ID. But this happens already in the current output from merged entries in databases like NR - we effectively use the first alternative ID as the hit ID. See for example the nasty > separated entries in the legacy BLAST XML's tag where only the first ID appears in the tag: http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html See also the new optional fields in the tabular output which explicitly list all the aliases for the merge record (e.g. sallseqid). 
> Ideally, this should also be reflected in the QueryResult object: a > hit item should be retrievable using any of the IDs it has. > > This will also affect membership checking on the QueryResult object. This looks like something we should review anyway, regardless of the new BLAST XML format. >>> I have a feeling it will be difficult to jam all the new changes into >>> a backwards-compatible parser. One way to make it transparent to users >>> is to use the underlying DTD to do validation before parsing (for the >>> two BLAST DTDs, use the one which the file can be validated against). >>> However, this comes at a price. Since the standard library-bundled >>> elementtree doesn't seem to support validation, we have to use another >>> library (lxml is my choice). This means adding 3rd party dependency >>> which require compiling (lxml is also partly written in C). >> >> We can probably tell by sniffing the first few lines... but how >> to do that without using a handle seek to rewind may be >> tricky (desirable to support parsing streams, e.g. stdin). > > Ah yes. We have a rewindable file seek object in Bio.File, don't we > :)? I'll have to play around with some real datasets first, I think. Yes, the UndoHandle in Bio.File might be the best solution here for auto-detection. But two explicit formats is probably better. > The other thing we should take into account is the Xinclude tag. Would > we want to make it possible to query *either* the single query XML > results or the master Xinclude document (point 2 of the proposed > change)? Or should we restrict our parser only to the single query > files? I think single files is a reasonable restriction... assuming BLAST will still have the option of producing a big multi-query XML? Probably we should ask the NCBI about that... I would hope the Bio.SearchIO.index_db(...) approach could be used on a collection of little XML files, one for each query. >>> The other option is to introduce a new format name (e.g. 
>>> 'blast-xml2'), which makes the user responsible for knowing which >>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>> leaning towards this option, despite 'blast-xml2' not sounding very >>> nice to me ;). >>> >>> Any other thoughts? >>> >>> Best, >>> Bow >> >> I agree for the SearchIO interface, two format names makes >> sense - unless there is a neat way to auto-detect this on input. >> >> Using "blast-xml2" would work, or maybe something like >> "blast-xml-2014" (too long?). >> >> We could even go for "blast-xml-old" and "blast-xml" perhaps? > > Hmm..'blast-xml-old', may make it difficult to adapt for future XML > schema changes. How about renaming the current parser to > 'blast-xml-legacy', and the new one to just 'blast-xml'? A possible downside of 'blast-xml-legacy' over 'blast-xml-old' is that it may be confused with the move from the "legacy" BLAST (in C) to the current BLAST+ (in C++), which happened well before this XML format change. Peter From tra at popgen.net Tue Mar 18 07:22:15 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 11:22:15 +0000 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: References: <20140317165352.36db07ee@lnx> Message-ID: <20140318112215.4207072e@lnx> On Tue, 18 Mar 2014 10:38:53 +0000 Peter Cock wrote: > From memory if the environment variable was not > set, the command line tool would fail in a strange > way - so I made the test conditional on having the > variable set. I noticed that and created an environment variable, then I got stuck on the BLOSUM issue. Per Ben's suggestion, should we remove the check? Or should I use a non-standard package? 
Thanks, Tiago From w.arindrarto at gmail.com Tue Mar 18 07:48:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 12:48:56 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:58 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto > wrote: >> On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >>> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >>> wrote: >>>> Hi Peter, everyone, >>>> >>>> Thanks for the heads up. If implemented as it is, the updates will >>>> change our underlying SearchIO model (aside from the blast-xml parser >>>> itself), by allowing a Hit retrieval using multiple different keys. >>> >>> Could you clarify what you mean by multiple keys here? >> >> Currently, we can retrieve hits from a query using its ID, aside from >> its numeric index. With their proposed changes to the Hit element >> here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, >> it means that a given Hit can now be annotated with more than one ID. > > But this happens already in the current output from merged entries > in databases like NR - we effectively use the first alternative ID as > the hit ID. See for example the nasty > separated entries in > the legacy BLAST XML's tag where only the first ID > appears in the tag: > > http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html > > See also the new optional fields in the tabular output which > explicitly list all the aliases for the merge record (e.g. sallseqid). In the BLAST outputs, yes. However, there is no explicit support for this in SearchIO yet. Currently we only parse whatever is in the ID tag as the ID and the description tag as the description. If the tag's content is separated by semicolons / has more than one ID, the current parser does not try to split it into multiple IDs. Instead it takes the whole string as the ID. 
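To make the idea above concrete, here is a rough, hypothetical sketch of hit storage that supports lookup by any alias. Every name here is illustrative, not the real SearchIO API - the point is only the shape of the multi-key lookup being discussed:

```python
# Hypothetical sketch of multi-ID hit lookup for a QueryResult-like
# container. All names here are illustrative, NOT the real SearchIO API.

class MultiKeyHits:
    """Store each hit once, retrievable via any of its IDs."""

    def __init__(self):
        self._hits = []    # hits in input order
        self._id_map = {}  # every alias -> index into self._hits

    def append(self, hit, ids):
        # 'ids' would come from splitting the aliases of a merged
        # database entry (e.g. the sallseqid values).
        index = len(self._hits)
        self._hits.append(hit)
        for hit_id in ids:
            self._id_map[hit_id] = index

    def __getitem__(self, key):
        # Integer keys keep the positional access; string keys go
        # through the alias map.
        if isinstance(key, int):
            return self._hits[key]
        return self._hits[self._id_map[key]]

    def __contains__(self, key):
        # Membership checking honours every alias too.
        return key in self._id_map


hits = MultiKeyHits()
hits.append("hit-object", ["gi|123|ref|NP_1", "sp|P001|PROT_HUMAN"])
print(hits["sp|P001|PROT_HUMAN"])  # prints: hit-object
print("gi|123|ref|NP_1" in hits)   # prints: True
```

The hit itself is stored once, and the alias map simply carries one entry per ID, so item access and membership checks work for any of the aliases.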
Also, in the blast tabular format, even though sallseqid is parsed, it's merely stored as an attribute of the hit object, not something that can be used to retrieve Hits from the QueryResult object. >> Ideally, this should also be reflected in the QueryResult object: a >> hit item should be retrievable using any of the IDs it has. >> >> This will also affect membership checking on the QueryResult object. > > This looks like something we should review anyway, regardless > of the new BLAST XML format. Of course :). >>>> I have a feeling it will be difficult to jam all the new changes into >>>> a backwards-compatible parser. One way to make it transparent to users >>>> is to use the underlying DTD to do validation before parsing (for the >>>> two BLAST DTDs, use the one which the file can be validated against). >>>> However, this comes at a price. Since the standard library-bundled >>>> elementtree doesn't seem to support validation, we have to use another >>>> library (lxml is my choice). This means adding 3rd party dependency >>>> which require compiling (lxml is also partly written in C). >>> >>> We can probably tell by sniffing the first few lines... but how >>> to do that without using a handle seek to rewind may be >>> tricky (desirable to support parsing streams, e.g. stdin). >> >> Ah yes. We have a rewindable file seek object in Bio.File, don't we >> :)? I'll have to play around with some real datasets first, I think. > > Yes, the UndoHandle in Bio.File might be the best solution > here for auto-detection. But two explicit formats is probably better. > >> The other thing we should take into account is the Xinclude tag. Would >> we want to make it possible to query *either* the single query XML >> results or the master Xinclude document (point 2 of the proposed >> change)? Or should we restrict our parser only to the single query >> files? > > I think single files is a reasonable restriction... 
assuming BLAST > will still have the option of producing a big multi-query XML? > Probably we should ask the NCBI about that... In a way, the Xinclude file is the file containing the multi-query XML. I have a feeling that if Xinclude is proposed, producing multi-output BLAST XML files will not be an option anymore (otherwise it seems redundant). But yes, the NCBI should have more info about this. > I would hope the Bio.SearchIO.index_db(...) approach could > be used on a collection of little XML files, one for each query. >>>> The other option is to introduce a new format name (e.g. >>>> 'blast-xml2'), which makes the user responsible for knowing which >>>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>>> leaning towards this option, despite 'blast-xml2' not sounding very >>>> nice to me ;). >>>> >>>> Any other thoughts? >>>> >>>> Best, >>>> Bow >>> >>> I agree for the SearchIO interface, two format names makes >>> sense - unless there is a neat way to auto-detect this on input. >>> >>> Using "blast-xml2" would work, or maybe something like >>> "blast-xml-2014" (too long?). >>> >>> We could even go for "blast-xml-old" and "blast-xml" perhaps? >> >> Hmm..'blast-xml-old', may make it difficult to adapt for future XML >> schema changes. How about renaming the current parser to >> 'blast-xml-legacy', and the new one to just 'blast-xml'? > > A possible downside of 'blast-xml-legacy' over 'blast-xml-old' > is this may be confused with the "legacy" BLAST in C to the > current BLAST+ in C++ move (which happened well before > this XML format change). Hmm. In that case I am leaning towards 'blast-xml2', I think. It's the shortest and most future-proof (subsequent changes to the XML format could be denoted as 'blast-xml3'). But it does make it slightly inconsistent with the names we have for HMMER (i.e. 'hmmer2-text' is for HMMER version 2 text output, 'hmmer3-text' is for HMMER version 3 text output). 
Cheers, Bow From p.j.a.cock at googlemail.com Tue Mar 18 09:15:16 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 13:15:16 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140312151048.45066ade@lnx> References: <20140312151048.45066ade@lnx> Message-ID: On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: > Hi, > > I have a docker container ready (save for a few applications). Simple > usage instructions: > > 1. Create a directory and download inside this file: > https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test Things moved, https://github.com/tiagoantao/my-containers/tree/master/biopython I guess you mean: https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test > 2. Rename it Dockerfile (capital D) > > 3. Get a buildbot username and password (from Peter or me), edit the > file and replace CHANGEUSER CHANGEPASS > > 4. do > docker build -t biopython-buildbot . > > 5. do > docker run biopython-buildbot > > Beta-version, comments appreciated ;) > > If people like this, I will amend the Continuous Integration page on > the wiki accordingly > > Tiago Is this a 32 or 64 bit VM, or either? I'm asking because we may want to source a replacement 32 bit Linux buildslave - the hard drive in the old machine we've been using is failing, and it is probably not worth replacing. Peter From mjldehoon at yahoo.com Tue Mar 18 10:21:48 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 18 Mar 2014 07:21:48 -0700 (PDT) Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: Message-ID: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> On Mon, 3/17/14, Peter Cock wrote: > Are we all generally in favour of lower case for new module > names (as per PEP8)? > i.e. Bio/struct not Bio/Struct ? You may want to consider Bio/structure instead of Bio/struct. To me "struct" sounds like the C programming term, rather than a protein structure. 
Best, -Michiel From p.j.a.cock at googlemail.com Tue Mar 18 10:43:56 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 14:43:56 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon wrote: > On Mon, 3/17/14, Peter Cock wrote: >> Are we all generally in favour of lower case for new module >> names (as per PEP8)? >> i.e. Bio/struct not Bio/Struct ? > > You may want to consider Bio/structure instead of Bio/struct. > To me "struct" sounds like the C programming term, > rather than a protein structure. > > Best, > -Michiel I like Bio.structure too :) Peter From anaryin at gmail.com Tue Mar 18 10:46:34 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 18 Mar 2014 15:46:34 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Makes sense! If nobody complains I'll change it. From eric.talevich at gmail.com Tue Mar 18 11:23:29 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 18 Mar 2014 08:23:29 -0700 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mon, Mar 17, 2014 at 5:39 PM, Harsh Beria wrote: > Hi, > > I have started to write a proposal for a project on pair wise sequence > alignment. Is there anyone interested in mentoring the project so that I > can discuss some of the algorithmic problems in detail? Also, do I need to > add the project to the ideas page as it is not there yet? > It's not necessary to add the project to the public Ideas page if you've come up with it yourself. Just share your own proposal with us here and we'll discuss it with you. 
-Eric From tra at popgen.net Tue Mar 18 12:12:50 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 16:12:50 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: References: <20140312151048.45066ade@lnx> Message-ID: <20140318161250.77a269fe@lnx> Hi, On Tue, 18 Mar 2014 13:15:16 +0000 Peter Cock wrote: > > Things moved, > https://github.com/tiagoantao/my-containers/tree/master/biopython > > I guess you mean: > https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test Ah, sorry. Because this is a first version, it is still undergoing heavy refactoring. I plan to document the final version well. For now maybe it is better to go to the top level: https://github.com/tiagoantao/my-containers The example in the README documents the biopython containers as they stand. > Is this a 32 or 64 bit VM, or either? I am afraid it is 64-bit; doing a 32-bit docker image is possible but not trivial. Tiago From arklenna at gmail.com Tue Mar 18 12:48:51 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Tue, 18 Mar 2014 12:48:51 -0400 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 10:43 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon > wrote: > > On Mon, 3/17/14, Peter Cock wrote: > >> Are we all generally in favour of lower case for new module > >> names (as per PEP8)? > >> i.e. Bio/struct not Bio/Struct ? > > > > You may want to consider Bio/structure instead of Bio/struct. > > To me "struct" sounds like the C programming term, > > rather than a protein structure. > > > > Best, > > -Michiel > > I like Bio.structure too :) > Thirded! I'm in a particularly busy portion of my PhD right now but hopefully over the summer I'll have a little more spare time for open source work. 
Cheers, Lenna From tra at popgen.net Tue Mar 18 13:13:34 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 17:13:34 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal Message-ID: <20140318171334.0edc2b45@lnx> Hi, We have now gone through the procedure of asking on the mailing lists about SimCoal deprecation (now that we have fastsimcoal). Three proposals and one question: 1. Deprecate https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py 2. Delete the SimCoal tests 3. Amend the tutorial The question: I would like to deprecate a class inside https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py but not the whole Controller (the fastsimcoal code is there). Question: Is there a procedure for a partial deprecation? Thanks, T From p.j.a.cock at googlemail.com Tue Mar 18 13:15:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 17:15:41 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, We've previously put a deprecation warning inside the __init__ method so anyone actually using the class will be warned. Peter On Tue, Mar 18, 2014 at 5:13 PM, Tiago Antao wrote: > Hi, > > Currently we have went through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3. 
Amend the tutorial > > The doubt: I would like to deprecate a class inside > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T From tra at popgen.net Tue Mar 18 14:26:10 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 18:26:10 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140318161250.77a269fe@lnx> References: <20140312151048.45066ade@lnx> <20140318161250.77a269fe@lnx> Message-ID: <20140318182610.35082103@lnx> An update on the status: 1. A couple of problems with fasttree and dialign2. These seem to be genuine problems with the test code/modules. 2. Prank will wait for ubuntu trusty (it will be a standard package). I will then include it. 3. I was only able to find part of the fonts for the graphics packages, so a couple of tests are being skipped. 4. naccess has a very restrictive activation system, so it is impossible to add. 1 and 3 are solvable (2 will sort itself out with time). 1 is really a problem with the biopython code, I think. For 3, if someone could have a look at the existing fonts here: https://github.com/tiagoantao/my-containers/blob/master/biopython/Biopython-Basic and tell me which ones are missing, I would take care of adding them. Tiago PS - In the near future I will do a Python 3 container also. From kirkolem at gmail.com Tue Mar 18 17:31:45 2014 From: kirkolem at gmail.com (Dan K.) Date: Wed, 19 Mar 2014 01:31:45 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello! I'm a third-year student studying bioengineering and bioinformatics, and I'm interested in participating in GSoC and contributing to the Biopython project. 
In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I think it would be convenient to implement the Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What should my next steps be? Thank you for your attention. From p.j.a.cock at googlemail.com Wed Mar 19 13:00:37 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:00:37 +0000 Subject: [Biopython-dev] SQLite test failure on Windows, OperationalError: unable to open database file Message-ID: Hi all, About a week ago most of the Windows nightly tests broke - e.g. here on the same revision (!) 79f9054e5246ba30816ff93a775d594ae7da6fc6 https://github.com/biopython/biopython/commit/79f9054e5246ba30816ff93a775d594ae7da6fc6 Worked, Fri Mar 14 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1129 Failed, Sat Mar 15 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1130 ... test_BioSQL_sqlite3 ... FAIL ... ====================================================================== ERROR: Check list, keys, length etc ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 193, in setUp load_database(gb_handle) File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 166, in load_database create_database() File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 148, in create_database server.load_database_sql(SQL_FILE) File "c:\repositories\BuildBotBiopython\win26\build\build\lib.win32-2.6\BioSQL\BioSeqDatabase.py", line 281, in load_database_sql self.adaptor.cursor.execute(sql_line) OperationalError: unable to open database file (etc) Presumably something changed on the machine itself - perhaps a Windows security update? 
Any guesses for what might be wrong and why it broke on Python 2.6, PyPy 1.9, 2.0, 2.1 - yet works fine on Python 2.7, Python 3.3, PyPy 2.2 and Jython 2.7? Logged into this machine, I can reproduce the error with: c:\python26\python test_BioSQL_sqlite3.py Thanks, Peter From eparker at ucdavis.edu Wed Mar 19 12:49:04 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 09:49:04 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers Message-ID: Hi all, I have a rough draft of my GSoC proposal and would appreciate comments from anybody who might be willing to eventually mentor this project, or anybody who has opinions on implementation. It's about 3 pages of text + several figures. I'll be submitting a final draft Friday on the GSoC website pending your comments. Thank you, -Evan -------------- next part -------------- A non-text attachment was scrubbed... Name: Evan-Parker-GSOC-2014-proposal.pdf Type: application/pdf Size: 68577 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Wed Mar 19 13:26:10 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:26:10 +0000 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: > Hi all, > > I have a rough draft of my GSoC proposal and would appreciate comments from > anybody who might be willing to eventually mentor this project, or anybody > who has opinions on implementation. It's about 3 pages of text + several > figures. > > I'll be submitting a final draft Friday on the GSoC website pending your > comments. > > Thank you, > -Evan Hi Evan, That's a nice job so far - although questions about your time availability will be raised (sadly the GSoC schedule isn't fair to students depending on regional University term schedules). However, you are a PhD student (which is normally full time). 
You will need to clear this with your PhD supervisors - since you would be spending a large chunk of time not working directly on your thesis project, and there can be strict deadlines for completion. Here's a selection of points in no particular order: Have you looked at Bio.SeqIO.index_db(...) which works like Bio.SeqIO.index(...) but stores the offsets etc in an SQLite database? When pondering how to design this kind of thing myself, I had suspected multiple SeqRecProxy classes might be needed (one per file format potentially), although run time selection of internal parsing methods might work too. I would also ask why not have the slicing of a SeqRecProxy return another SeqRecProxy? This means creating a new proxy object with different offset values - but would be fast. Only when the seq/annotation/etc is accessed would the proxy have to go to the disk drive. This becomes more interesting when accessing the features in the slice of interest (e.g. if the full record was for a whole chromosome and only region [1000:2000] was of interest). This idea about windows onto the data is key to how the SAM/BAM file format is used (coordinate sorting with an index). Are you familiar with that, or tabix? Another open question is what to do with file handles - specifically the question of when to close them? e.g. via garbage collection, context managers, etc. See for example this blog post - the lazy parsing approach may result in ResourceWarnings as a side effect: http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ I appreciate you are unlikely to have ready answers to all of that - I've probably given you a whole load more background reading. I hope some of the other Biopython developers (or GSoC mentors on other OBF projects - you could post this to the OBF GSoC mailing list too) will have further feedback. 
Regards, Peter From harsh.beria93 at gmail.com Wed Mar 19 13:44:47 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Wed, 19 Mar 2014 23:14:47 +0530 Subject: [Biopython-dev] GSOC Proposal (Pairwise Sequence Alignment in Biopython) Message-ID: Hi, Please take a look at my GSOC proposal on Pairwise Sequence Alignment and suggest improvements. https://gist.github.com/harshberia93/9647053 Thanks -- Harsh Beria, Indian Institute of Technology, Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From nejat.arinik at insa-lyon.fr Wed Mar 19 13:45:53 2014 From: nejat.arinik at insa-lyon.fr (Nejat Arinik) Date: Wed, 19 Mar 2014 18:45:53 +0100 (CET) Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> Message-ID: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Hi all, I would like to show you my detailed plan, month by month: https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit It is not neat, I know, but it is just so I can hear your ideas about the plan. I'll finish it tonight. Do you think I have understood the subject correctly? Could this plan be a solution? Thanks in advance. PS: My English is not good, so it is a little bit difficult to write a detailed proposal plan, but I'm trying. I hope that's not a big problem :) Unfortunately, I'm more comfortable with French. 
Nejat From p.j.a.cock at googlemail.com Wed Mar 19 14:10:54 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 18:10:54 +0000 Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> References: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Message-ID: On Wed, Mar 19, 2014 at 5:45 PM, Nejat Arinik wrote: > > Hi all, > > I would show you my detailed plan per mounth. > https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit > > It's not neatly I know but it's just for learn yours ideas at about that > plan. I'll finish it this night. I understood correctly the subject, you think? > That plan can be solution? Thanks in advance. > > PS: My english level is not good so It is a little bit difficult to write a > proposal-plan detailed but I'm trying. I hope it's not a big problem :) > I'm more comfortable with the french language unfortunately. > > Nejat Hi Nejat, I can try to answer some of the questions at the start of the document: Q: Lazy-load ~= load partially (depends on demands)? A: Yes. For example, only load the sequence if the user tries to access the sequence. This should speed up tasks like counting the records, or building a list of all the record identifiers. Q: How large are small to medium sized sequences/genomes in general, and how long do they take? A: Bacterial genomes usually are small enough to load into memory without worrying about RAM. Eukaryote genomes (e.g. mouse, human, plants) are typically large enough that you may not want to load an entire annotated chromosome into memory. Q: Is a python dictionary used for the SeqRecord object? A: Yes, the SeqRecord object uses a Python dictionary for the annotations property, and a dictionary-like object for the letter_annotations property. 
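For illustration only, here is a minimal stand-in for those two containers: a plain dict for annotations, and a restricted dict-like object for letter_annotations whose values must match the sequence length. This mimics the behaviour described above; it is not Biopython's actual implementation, and the class names are made up.

```python
# Minimal mimic of SeqRecord's annotation containers (hypothetical
# sketch, NOT Biopython's real code).

class RestrictedDict(dict):
    """Dict whose values must be sequences of a fixed length."""

    def __init__(self, length):
        super().__init__()
        self._length = length

    def __setitem__(self, key, value):
        # Per-letter annotations must have one value per letter.
        if len(value) != self._length:
            raise ValueError("value length must match the sequence length")
        super().__setitem__(key, value)


class MiniRecord:
    """Stand-in for a SeqRecord-like object."""

    def __init__(self, seq, record_id):
        self.seq = seq
        self.id = record_id
        self.annotations = {}  # free-form, record-level annotations
        self.letter_annotations = RestrictedDict(len(seq))  # per-letter


rec = MiniRecord("ACGT", "example1")
rec.annotations["organism"] = "Escherichia coli"
rec.letter_annotations["phred_quality"] = [40, 40, 38, 35]  # OK: length 4
# rec.letter_annotations["bad"] = [1, 2]  # would raise ValueError
```

The key design point is the length check: record-level annotations are unconstrained, while per-letter annotations are tied to the sequence.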
The SeqRecord object also uses Python lists, and the Biopython Seq object. Q: Will writing data back to the file be supported? If yes, what is the relation with BioSQL? Any modification such as an update would need attention. A: The SeqRecord-like objects from the lazy-parsers could be read only. However, if they act enough like the original SeqRecord, then they can be used with Bio.SeqIO.write(...) to save them to disk. It would be nice if (like the BioSQL SeqRecord-like objects) it was possible to modify the records in memory. Q: For very large indexing jobs, could we index on multiple machines running simultaneously, and then merge the indexes? A: This seems too complicated. If building the index is slow, I suggest saving the index on disk (e.g. as an SQLite database). For comparison, see the BAM and tabix index files, or Biopython's Bio.SeqIO.index_db(...) function. Regards, Peter From w.arindrarto at gmail.com Wed Mar 19 15:42:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 19 Mar 2014 20:42:50 +0100 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Hi Evan, Looks like this is shaping up in a good direction :). In addition to Peter's earlier comments, I also have some remarks: * How would the indices of the files be stored? Are they simply stored in-memory or as files? Is their creation invisible to the user (i.e. invoking the `lazy=True` argument is enough to create the index) or does the user need to create the index explicitly? For `SeqIO.index(lazy=True)` in particular, does this mean that we will have two indices (one for the currently implemented SQLite database that stores offsets for record positions, and the other to store the other information necessary for the lazy parser)? * It would be nice to also have some notes on the relation between SeqRecProxy and SeqRecord (is it a subclass, or are they distinct classes that inherit from a common base class?). 
As an alternative, it is also possible to have a regular SeqRecord object, but with lazy Seq objects and lazy annotation objects instead. * Have you thought about what to store in the indices of the different formats? It's a good idea to explain this further in your proposal (e.g. what to store when indexing GenBank files, UniprotXML files, etc.). It doesn't have to be concrete (it will be in the code anyway), but having an idea of the possible implementations you have in mind would be nice. * And finally, the schedule. It looks like the early weeks will be quite packed, considering your other obligations. I think it is expected that students spend close to 8 hours per day (or 40 hours per week) during the coding period. Of course this is much more sensible when the student does not have other pressing obligations. I do agree with Peter here that you have to at least discuss this with your PhD supervisor. I personally do not mind that for the week you have the conference the workload is reduced. But in the first four weeks, I would prefer that you have more time to spend on GSoC. Cheers & good luck, Bow On Wed, Mar 19, 2014 at 6:26 PM, Peter Cock wrote: > On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: >> Hi all, >> >> I have a rough draft of my GSoC proposal and would appreciate comments from >> anybody who might be willing to eventually mentor this project, or anybody >> who has opinions on implementation. It's about 3 pages of text + several >> figures. >> >> I'll be submitting a final draft Friday on the GSoC website pending your >> comments. >> >> Thank you, >> -Evan > > Hi Evan, > > That's a nice job so far - although questions about your time > availability will be raised (sadly the GSoC schedule isn't fair to > students depending on regional University term schedules). > However, you are a PhD student (which is normally full time). 
> You will need to clear this with your PhD supervisors - since > you would be spending a large chunk of time not working > directly on your thesis project, and there can be strict > deadlines for completion. > > Here's a selection of points in no particular order: > > Have you looked at Bio.SeqIO.index_db(...) which works > like Bio.SeqIO.index(...) but stores the offsets etc in an > SQLite database? > > When pondering how to design this kind of thing myself, > I had suspected multiple SeqRecProxy classes might be > needed (one per file format potentially), although run > time selection of internal parsing methods might work too. > > I would also ask why not have the slicing of a SeqRecProxy > return another SeqRecProxy? This means creating a new > proxy object with different offset values - but would be fast. > Only when the seq/annotation/etc is accessed would the > proxy have to go to the disk drive. This becomes more > interesting when accessing the features in the slice of > interest (e.g. if the full record was for a whole chromosome > and only region [1000:2000] was of interest). > > This idea about windows onto the data is key to how > the SAM/BAM file format is used (coordinate sorting > with an index). Are you familiar with that, or tabix? > > Another open question is what to do with file handles - > specifically the question of when to close them? e.g. > via garbage collection, context managers, etc. See > for example this blog post - the lazy parsing approach > may result in ResourceWarnings as a side effect: > http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ > > I appreciate you are unlikely to have ready answers to > all of that - I've probably given you a whole load more > background reading. I hope some of the other Biopython > developers (or GSoC mentors on other OBF projects - > you could post this to the OBF GSoC mailing list too) > will have further feedback. 
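Peter's slicing suggestion can be sketched in a few lines of plain Python (class and attribute names here are hypothetical; a real implementation would compute offsets from an actual file format). Slicing only does offset arithmetic, and the disk is touched when `.seq` is finally read; making the proxy a context manager is one possible answer to the handle-closing question:

```python
class SeqRecProxy:
    """Toy lazy record proxy: slicing is cheap, disk access is deferred."""

    def __init__(self, path, start, end):
        self.path = path
        self.start = start   # byte offsets into a plain-text sequence file
        self.end = end
        self._handle = None

    def __getitem__(self, sl):
        # Return another proxy with adjusted offsets - no disk access yet.
        start = self.start + (sl.start or 0)
        end = self.start + sl.stop if sl.stop is not None else self.end
        return SeqRecProxy(self.path, start, min(end, self.end))

    @property
    def seq(self):
        # Only now do we actually touch the disk.
        if self._handle is None:
            self._handle = open(self.path)
        self._handle.seek(self.start)
        return self._handle.read(self.end - self.start)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        # Deterministic close avoids relying on garbage collection
        # (and the ResourceWarnings mentioned in the blog post above).
        if self._handle is not None:
            self._handle.close()
```

With this design, taking region [1000:2000] of a whole-chromosome record costs one object allocation, exactly as in the coordinate-sorted BAM/tabix access pattern.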
> > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From eparker at ucdavis.edu Wed Mar 19 20:34:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 17:34:34 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Thank you both for your fast and thorough evaluation of my proposal. *Regarding time requirements:* My adviser is aware of the possibility that I may participate in this program. During the summer I would file a "planned educational leave" instead of enrollment to accommodate my full-time participation in GSoC. As for the time requirements: I cannot avoid my obligations prior to ASMS, although I can promise to spend every extra minute I have to honor my obligations to Biopython. If my lack of full-time availability prior to June precludes me from participation I will understand. *Regarding specific suggestions:* I will come up with a deeper description of the relationship between SeqRecProxy and SeqRecord before Friday. I like the idea of a SeqRecProxy returning itself when sliced; I had not thought of it, but it would be an elegant solution to the problem of unparsed-vs-parsed annotations. This feature would also allow more transparent use of proxy objects and would pave the way for compatibility with SeqIO.write(). I considered using multiple proxy classes, but I prefer making a standardized binding for a lazy parsing function that can be accepted by a single SeqRecProxy at run-time. I'll make this more explicit in my proposal. There are many other questions and points of clarification that I still need to evaluate. I'll incorporate as much as I can in my proposal without overloading it and without making statements that I cannot back up with my own understanding. 
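The run-time binding Evan describes might look roughly like this (all names invented for illustration): one proxy class, with the format-specific logic passed in as a callable, so no per-format subclasses are needed:

```python
def parse_fasta_block(raw):
    """Format-specific callable: turn raw FASTA text into (id, seq)."""
    header, _, body = raw.partition("\n")
    return header.lstrip(">").split()[0], body.replace("\n", "")

class LazyRecord:
    """One proxy class; the file format is a run-time parameter."""

    def __init__(self, raw, parser):
        self._raw = raw          # raw text, e.g. read via an offset index
        self._parser = parser    # bound at run-time, one per file format
        self._parsed = None      # cached (id, seq), filled on first access

    def _ensure_parsed(self):
        if self._parsed is None:
            self._parsed = self._parser(self._raw)
        return self._parsed

    @property
    def id(self):
        return self._ensure_parsed()[0]

    @property
    def seq(self):
        return self._ensure_parsed()[1]
```

A GenBank or UniprotXML parser function with the same signature could be swapped in without changing the proxy class itself.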
Thanks again, -Evan From p.j.a.cock at googlemail.com Thu Mar 20 07:19:27 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 11:19:27 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: FYI, in addition to the SciPy conference in Texas this summer, there is also EuroSciPy which will be in England this year - deadline for abstracts is 14 April (see below). Is anyone planning to attend? If not maybe I should...? Thanks, Peter P.S. Don't forget to consider submitting a talk/poster abstract to BOSC 2014 (which I am co-chairing this year), especially students who can get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ ---------- Forwarded message ---------- From: Ralf Gommers Date: Wed, Mar 5, 2014 at 7:37 PM Subject: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts To: Organisation of EuroScipy , conferences at python.org, numfocus at googlegroups.com, Discussion of Numerical Python , SciPy Users List Dear all, EuroSciPy 2014, the Seventh Annual Conference on Python in Science, takes place in Cambridge, UK on 27 - 30 August 2014. The conference features two days of tutorials followed by two days of scientific talks. The day after the main conference, developer sprints will be organized on projects of interest to attendees. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and the industry. The program includes keynotes, contributed talks and posters. Submissions for talks and posters are welcome on our website (http://www.euroscipy.org/2014/). In your abstract, please provide details on what Python tools are being employed, and how. The deadline for submission is 14 April 2014. 
Also until 14 April 2014, you can apply for a sprint session on 31 August 2014. See https://www.euroscipy.org/2014/calls/sprints/ for details. Important dates: April 14th: Presentation abstracts, poster, tutorial submission deadline. Application for sponsorship deadline. May 17th: Speakers selected May 22nd: Sponsorship acceptance deadline June 1st: Speaker schedule announced June 6th, or 150 registrants: Early-bird registration ends August 27-31st: 2 days of tutorials, 2 days of conference, 1 day of sprints We look forward to an exciting conference and hope to see you in Cambridge in August! The EuroSciPy 2014 Team http://www.euroscipy.org/2014/ Conference Chairs -------------------------- Mark Hayes, Cambridge University, UK Didrik Pinte, Enthought Europe, UK Tutorial Chair ------------------- David Cournapeau, Enthought Europe, UK Program Chair -------------------- Ralf Gommers, ASML, The Netherlands Program Committee ----------------------------- Tiziano Zito, Humboldt-Universität zu Berlin, Germany Pierre de Buyl, Université libre de Bruxelles, Belgium Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Konrad Hinsen, Centre National de la Recherche Scientifique (CNRS), France Raphael Ritz, Garching Computing Centre of the Max Planck Society, Germany Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France Pauli Virtanen, Aalto University, Finland Evgeni Burovski, Lancaster University, UK Robert Cimrman, New Technologies Research Centre, University of West Bohemia, Czech Republic Almar Klein, Cybermind, The Netherlands Organizing Committee ------------------------------ Simon Jagoe, Enthought Europe, UK Pierre de Buyl, Université 
libre de Bruxelles, Belgium _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From tra at popgen.net Thu Mar 20 07:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:48:15 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: <20140320114815.70210fd5@grandao> On Thu, 20 Mar 2014 11:19:27 +0000 Peter Cock wrote: > Is anyone planning to attend? If not maybe I should...? Wild thought here: Considering that Cambridge is a geographic focal point for some of us (I am looking at you Dutch-based Biopythoneers, for instance), I am wondering if we could use this for a "local" Biopython meetup... Does this make any sense? Would there be interest? As I said, wild thought (silly?)... Tiago From anaryin at gmail.com Thu Mar 20 07:54:05 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Thu, 20 Mar 2014 12:54:05 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> References: <20140320114815.70210fd5@grandao> Message-ID: Lovely geographical clustering :) I'd be in. 2014-03-20 12:48 GMT+01:00 Tiago Antao : > On Thu, 20 Mar 2014 11:19:27 +0000 > Peter Cock wrote: > > > Is anyone planning to attend? If not maybe I should...? > > > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... 
> > > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 07:42:44 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:42:44 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials Message-ID: <20140320114244.4022cbc7@grandao> Hi all, Just to announce a potential project that I might embark on very soon and see the reaction of the community: Get all the tutorial materials that I can find and create a ipython notebook version of them. Does this sound like a good idea? Tiago (your ipython notebook fanatic) From w.arindrarto at gmail.com Thu Mar 20 08:10:15 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:10:15 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: <20140320114815.70210fd5@grandao> Message-ID: Sounds good :). However, my passport is still Indonesian and I'd have to apply for a visa first in Germany in order to come to the UK :|. So I'll pass this one I guess. On Thu, Mar 20, 2014 at 12:54 PM, Jo?o Rodrigues wrote: > Lovely geographical clustering :) > > I'd be in. > > > > 2014-03-20 12:48 GMT+01:00 Tiago Antao : > >> On Thu, 20 Mar 2014 11:19:27 +0000 >> Peter Cock wrote: >> >> > Is anyone planning to attend? If not maybe I should...? >> >> >> Wild thought here: Considering that Cambridge is a geographic focal >> point for some of us (I am looking at you Dutch-based Biopythoneers, >> for instance), I am wondering if the could use this for a "local" >> Biopython meetup... Does this make any sense? Would there be interest? >> >> As I said, wild thought (silly?)... 
>> >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From w.arindrarto at gmail.com Thu Mar 20 08:15:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:15:56 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320114244.4022cbc7@grandao> References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, Do you plan to put the .ipynb file in the repo or will this be separate? Either way, I like the idea of having an .ipynb version of the tutorials around :). (from another IPython user). On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > Hi all, > > Just to announce a potential project that I might embark on very soon > and see the reaction of the community: > > Get all the tutorial materials that I can find and create a ipython > notebook version of them. > > Does this sound like a good idea? > > Tiago > (your ipython notebook fanatic) > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mgymrek at mit.edu Thu Mar 20 09:50:39 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:50:39 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, I'm happy to update this section in the tutorial if you'd like help with that. 
Cheers, ~M On Tue, Mar 18, 2014 at 1:13 PM, Tiago Antao wrote: > Hi, > > We have now gone through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3. Amend the tutorial > > The doubt: I would like to deprecate a class inside > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mgymrek at mit.edu Thu Mar 20 09:57:13 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:57:13 -0400 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, I also really like this idea. Seems like it would make sense to have them as part of the repository to make it easy for others to contribute. (yet another IPython notebook user :) ) ~M On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto wrote: > Hi Tiago, > > Do you plan to put the .ipynb file in the repo or will this be > separate? Either way, I like the idea of having an .ipynb version of > the tutorials around :). > > (from another IPython user). > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > > Hi all, > > > > Just to announce a potential project that I might embark on very soon > > and see the reaction of the community: > > > > Get all the tutorial materials that I can find and create a ipython > > notebook version of them. > > > > Does this sound like a good idea? 
> > > > Tiago > > (your ipython notebook fanatic) > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 10:11:16 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:11:16 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320141116.3f98384f@grandao> Hi Bow and Melissa, I was planning on doing this separately. But happy to do it otherwise. Or maybe we could start a git repo, do some examples and see where it goes? Considering that this would be starting from scratch I was planning on doing this on ipython 2.0 with python 3.4. You know, living on the edge ;) Tiago On Thu, 20 Mar 2014 09:57:13 -0400 Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have > them as part of the repository to make it easy for others to > contribute. (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > > > Hi Tiago, > > > > Do you plan to put the .ipynb file in the repo or will this be > > separate? Either way, I like the idea of having an .ipynb version of > > the tutorials around :). > > > > (from another IPython user). > > > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao > > wrote: > > > Hi all, > > > > > > Just to announce a potential project that I might embark on very > > > soon and see the reaction of the community: > > > > > > Get all the tutorial materials that I can find and create a > > > ipython notebook version of them. > > > > > > Does this sound like a good idea? 
> > > > > > Tiago > > > (your ipython notebook fanatic) > > > _______________________________________________ > > > Biopython-dev mailing list > > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From p.j.a.cock at googlemail.com Thu Mar 20 10:19:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:19:51 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: +1 for any *.ipynb files being under source code control. There are perhaps advantages to using a separate repository, but still under Biopython on GitHub? This might also help if we wanted to build on existing external tutorials which are under a CC licence etc... Peter On Thu, Mar 20, 2014 at 1:57 PM, Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have them > as part of the repository to make it easy for others to contribute. > (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > >> Hi Tiago, >> >> Do you plan to put the .ipynb file in the repo or will this be >> separate? Either way, I like the idea of having an .ipynb version of >> the tutorials around :). >> >> (from another IPython user). >> >> On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: >> > Hi all, >> > >> > Just to announce a potential project that I might embark on very soon >> > and see the reaction of the community: >> > >> > Get all the tutorial materials that I can find and create a ipython >> > notebook version of them. >> > >> > Does this sound like a good idea? 
>> > >> > Tiago >> > (your ipython notebook fanatic) >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From anaryin at gmail.com Thu Mar 20 10:21:12 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Thu, 20 Mar 2014 15:21:12 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320141116.3f98384f@grandao> References: <20140320114244.4022cbc7@grandao> <20140320141116.3f98384f@grandao> Message-ID: +1 too. Maybe adding some support for oldies (Python 2.x) or are there features in iPython 2.0 that cannot be used in these older versions?? From tra at popgen.net Thu Mar 20 10:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:48:15 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320144815.30b7a138@grandao> On Thu, 20 Mar 2014 14:19:51 +0000 Peter Cock wrote: > +1 for any *.ipynb files being under source code control. > > There are perhaps advantages to using a separate repository, > but still under Biopython on GitHub? This might also help if we > wanted to build on existing external tutorials which are under > a CC licence etc... My original plan was to draw "heavy inspiration" (credited, of course) from the existing Tutorial and maybe your workshop work. This all started when I noticed the need to change the tutorial due to simcoal changes... As I had to re-visit this, the idea followed... 
If people are fine with something under the biopython organization, I am fine with that. I have two proposals, though: 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above all ipython 2.0) 2. Go on and "do stuff", see where it goes and then maybe re-organize in the future (as opposed to do lots of planning first). This is, in some sense, a new line of direction and I would suggest that being exploratory would be better than being cautious... Tiago From p.j.a.cock at googlemail.com Thu Mar 20 10:53:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:53:41 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320144815.30b7a138@grandao> References: <20140320114244.4022cbc7@grandao> <20140320144815.30b7a138@grandao> Message-ID: On Thu, Mar 20, 2014 at 2:48 PM, Tiago Antao wrote: > On Thu, 20 Mar 2014 14:19:51 +0000 > Peter Cock wrote: > >> +1 for any *.ipynb files being under source code control. >> >> There are perhaps advantages to using a separate repository, >> but still under Biopython on GitHub? This might also help if we >> wanted to build on existing external tutorials which are under >> a CC licence etc... > > > My original plan was to draw "heavy inspiration" (credited, of course) > from the existing Tutorial and maybe your workshop work. > > This all started when I noticed the need to change the tutorial due to > simcoal changes... As I had to re-visit this, the idea followed... > > If people are fine with something under the biopython organization, I > am fine with that. > > I have two proposals, though: > > 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above > all ipython 2.0) > > 2. Go on and "do stuff", see where it goes and then maybe re-organize > in the future (as opposed to do lots of planning first). This is, in > some sense, a new line of direction and I would suggest that being > exploratory would be better than being cautious... 
> > Tiago So make a new repository and explore away :) Regarding https://github.com/peterjc/biopython_workshop - my workshop stuff I did wonder at the time about using iPython notebook but it adds another step to the workshop setup - and another barrier for people to repeat what they did at home. I was/am hoping to improve the TravisCI coverage of that work to check all the examples work under Python 2.6, 2.7 3.3 etc. I wonder if iPython notebooks make automated testing any easier or not? Peter From tra at popgen.net Thu Mar 20 11:27:38 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 15:27:38 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> Message-ID: <20140320152738.3b3ab8ac@grandao> Hi Melissa, On Thu, 20 Mar 2014 09:50:39 -0400 Melissa Gymrek wrote: > I'm happy to update this section in the tutorial if you'd like help > with that. I just did all the changes (not much really). I was planning on committing the changes (Peter, can I?) and then some reviewing (or changing, if needed) would really be appreciated. Tiago From p.j.a.cock at googlemail.com Thu Mar 20 11:29:15 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 15:29:15 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140320152738.3b3ab8ac@grandao> References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > Hi Melissa, > > On Thu, 20 Mar 2014 09:50:39 -0400 > Melissa Gymrek wrote: > >> I'm happy to update this section in the tutorial if you'd like help >> with that. > > I just did all the changes (not much really). I was planning on > committing the changes (Peter, can I?) and then some reviewing (or > changing, if needed) would really be appreciated. 
> > Tiago Please do :) Peter From mgymrek at mit.edu Thu Mar 20 11:34:52 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 11:34:52 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: sounds good! happy to have a look ~M On Thu, Mar 20, 2014 at 11:29 AM, Peter Cock wrote: > On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > > Hi Melissa, > > > > On Thu, 20 Mar 2014 09:50:39 -0400 > > Melissa Gymrek wrote: > > > >> I'm happy to update this section in the tutorial if you'd like help > >> with that. > > > > I just did all the changes (not much really). I was planning on > > committing the changes (Peter, can I?) and then some reviewing (or > > changing, if needed) would really be appreciated. > > > > Tiago > > Please do :) > > Peter > From b.invergo at gmail.com Thu Mar 20 09:39:34 2014 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 20 Mar 2014 13:39:34 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> (Tiago Antao's message of "Thu, 20 Mar 2014 11:48:15 +0000") References: <20140320114815.70210fd5@grandao> Message-ID: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... Since I'm now based in Cambridge, it would be silly for me not to attend. I'm not all that active lately (biopython's doing what I want it to do) but it'd still be nice to meet up. Cheers, Brandon -- Brandon Invergo http://brandon.invergo.net -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 489 bytes Desc: not available URL: From w.arindrarto at gmail.com Fri Mar 21 10:59:40 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 21 Mar 2014 15:59:40 +0100 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Zheng, The nucleotide information is stored as the alignment annotation. You can access it using hsp.aln_annotation['query_annotation']. There, they are stored as triplets, representing the codons. This is indeed a tradeoff that I had to make because there is no proper model yet to represent alignment objects containing sequences with different lengths in our master branch. In this case, the length of the DNA is most of the time 3x the length of the protein. And yes, this is not ideal since the actual query is now stored as an annotation ~ trading places with the translated query. HSPs themselves are basically modelled on our MultipleSeqAlignment objects (you can get such objects when accessing the `aln` attribute from an HSP object). I think in order to properly model these types of alignment, we need to have a proper model of three-letter protein Seq objects as well. Your CodonSeqAlignment object may help here :), but I have not looked into it that much to be honest. How does it work with Seq objects with ProteinAlphabet? Is it possible to align protein and codon sequences? I tried storing as much information as possible using the current approach (e.g. notice the start and end coordinates of each hit and query; they are parsed from the file, and the difference is not the same as the value you get when doing a `len` on hsp.query and/or hsp.hit). Note also that when dealing with frameshifts, you may want to access the hsp.fragments attribute, since frameshifts mean that you can further break your HSP alignment into multiple subalignments (fragments, as they are called in SearchIO). Hope this helps :), Bow P.S. 
Also CC-ing the Development list ~ this looks like something interesting for dev in general. On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > Thanks Bow, > > That works for me. But it seems the parser doesn't take the nucleotide > information into the hsps. All I get is a pairwise alignment between two > proteins. Nucleotide information is useful because I want to know the codon > -- amino acid correspondence. In the case of frameshift the situation may > not be that straightforward. Maybe you have other concern of not doing this. > > Best, > Zheng > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto > wrote: >> >> Hi Zheng, >> >> Thank you for the files :). I found out what was causing the error and >> have pushed a patch along with some tests to our codebase >> >> (https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d). >> You should be able to parse your file using the latest `master` >> branch. >> >> Hope this helps, >> Bow >> >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan wrote: >> > Hi Bow, >> > >> > I'm happy to provide the example for testing. See attachment. >> > >> > The command to generate the output above. >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa >> > >> > I'll check the test suite to see if I can find why. >> > >> > Best, >> > Zheng >> > >> > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto >> > >> > wrote: >> >> >> >> > Looking at our test cases, this particular case may have slipped >> >> > testing. We do test for several cases of dna2protein (which could >> >> > explain why it works when the nucleotide sequence comes first), but >> >> > not protein2dna. Please let me know if I can also use your example as >> >> > a test in our test corpus :). >> >> >> >> Oops, I meant the reverse ~ we have several test cases for protein2dna >> >> which may explain why it works when the protein sequence comes first >> >> ;). 
>> > >> > > > From Tom.Brown at enmu.edu Fri Mar 21 12:30:06 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 16:30:06 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? Thanks Tom ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. If you are not the intended recipient, please contact the sender and destroy all copies of this message From p.j.a.cock at googlemail.com Fri Mar 21 12:35:00 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 21 Mar 2014 16:35:00 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). 
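Peter's second option translates into a single BLAST+ invocation; here is a sketch that merely assembles the argument list (the `query.fasta` filename is a placeholder, and the command is not actually executed here):

```python
def remote_blastp_args(query_file, taxid):
    """Assemble a BLAST+ blastp command that runs remotely on NCBI's
    servers, restricted to one taxon via an Entrez filter."""
    return [
        "blastp",
        "-query", query_file,
        "-db", "nr",
        "-remote",                                # search runs at NCBI
        "-entrez_query", "txid%d[ORGN]" % taxid,  # e.g. 6231 = Nematoda
        "-outfmt", "5",                           # XML output, parseable later
    ]

args = remote_blastp_args("query.fasta", 6231)
# To actually run it:  import subprocess; subprocess.run(args, check=True)
```

The resulting XML could then be parsed with Biopython's BLAST parsing code.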
Peter From Tom.Brown at enmu.edu Fri Mar 21 15:23:14 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 19:23:14 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FEC5@ITSNV499.ad.enet.enmu.edu> Peter, Thanks. It is working. Tom -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: Friday, March 21, 2014 10:35 AM To: Brown, Tom Cc: Biopython-dev at lists.open-bio.org Subject: Re: [Biopython-dev] Blastp for Proteins from Nematoda On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). Peter ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. 
If you are not the intended recipient, please contact the sender and destroy all copies of this message From zruan1991 at gmail.com Fri Mar 21 15:32:33 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 15:32:33 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Bow, I have the same problem when trying to model codon alignment with frameshift being considered. Basically, I have a CodonSeq object to store a coding sequence. The only difference between CodonSeq and Seq object is that CodonSeq has an attribute -- `rf_table` (reading frame table). It's actually a list of positions each codon starts with, so that the translate() method will go through the list to translate each codon into an amino acid. In this case, it is easy to store a coding sequence with frameshift events. And it's not necessary to split the protein to dna alignment into multiple parts when frameshift occurs. However, the problem now becomes how to obtain such information (`rf_table`). I find exonerate is quite capable of handling this task, especially with introns in the dna. I do think an object to store protein to dna alignment is necessary in this scenario. Best, Zheng On Fri, Mar 21, 2014 at 10:59 AM, Wibowo Arindrarto wrote: > Hi Zheng, > > The nucleotide information is stored as the alignment annotation. You > can access it using hsp.aln_annotation['query_annotation']. There, > they are stored as triplets, representing the codons. > > This is indeed a tradeoff that I had to make because there is no > proper model yet to represent alignment objects containing sequences > with different length in our master branch. In this case, the length > of the DNA is most of > the time 3x the length of the protein. And yes, this is not ideal > since the actual query is now stored as an annotation ~ trading > places with the translated query.
HSPs themselves are basically > modelled based on our MultipleSeqAlignment objects (you can get such > objects when accessing the `aln` attribute from an HSP object). I > think in order to properly model these types of alignment, we need to > have a proper model of three-letter protein Seq objects as well. > > Your CodonSeqAlignment object may help here :), but I have not looked > into it that much to be honest. How does it work with Seq objects with > ProteinAlphabet? Is it possible to align protein and codon sequences? > > I tried storing as much information as possible using the current > approach (e.g. notice the start and end coordinates of each hit and > query, they are parsed from the file and the difference is not the > same as the value you get when doing a `len` on hsp.query and/or > hsp.hit). Note also that when dealing with frameshifts, you may want > to access the hsp.fragments attribute, since frameshifts mean that you > can break further your HSP alignment into multiple subalignments > (fragments as it is called in SearchIO). > > Hope this helps :), > Bow > > P.S. Also CC-ing the Development list ~ this looks like something > interesting for dev in general. > > On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > > Thanks Bow, > > > > That works for me. But it seems the parser doesn't take the nucleotide > > information into the hsps. All I get is a pairwise alignment between two > > proteins. Nucleotide information is useful because I want to know the > codon > > -- amino acid correspondence. In the case of frameshift the situation may > > not be that straightforward. Maybe you have other concern of not doing > this. > > > > Best, > > Zheng > > > > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto < > w.arindrarto at gmail.com> > > wrote: > >> > >> Hi Zheng, > >> > >> Thank you for the files :). 
I found out what was causing the error and > >> have pushed a patch along with some tests to our codebase > >> > >> ( > https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d > ). > >> You should be able to parse your file using the latest `master` > >> branch. > >> > >> Hope this helps, > >> Bow > >> > >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan > wrote: > >> > Hi Bow, > >> > > >> > I'm happy to provide the example for testing. See attachment. > >> > > >> > The command to generate the output above. > >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa > >> > > >> > I'll check the test suite to see if I can find why. > >> > > >> > Best, > >> > Zheng > >> > > >> > > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto > >> > > >> > wrote: > >> >> > >> >> > Looking at our test cases, this particular case may have slipped > >> >> > testing. We do test for several cases of dna2protein (which could > >> >> > explain why it works when the nucleotide sequence comes first), but > >> >> > not protein2dna. Please let me know if I can also use your example > as > >> >> > a test in our test corpus :). > >> >> > >> >> Oops, I meant the reverse ~ we have several test cases for > protein2dna > >> >> which may explain why it works when the protein sequence comes first > >> >> ;). > >> > > >> > > > > > > From arklenna at gmail.com Fri Mar 21 16:54:05 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Fri, 21 Mar 2014 16:54:05 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: > Hi Bow, > > I have the same problem when trying to model codon alignment with > frameshift being considered. Basically, I have a CodonSeq object to store a > coding sequence. The only difference between CodonSeq and Seq object is > that CodonSeq has an attribute -- `rf_table` (reading frame table). 
It's > actually a list of positions each codon starts with, so that the translate() > method will go through the list to translate each codon into an amino acid. In this > case, it is easy to store a coding sequence with frameshift events. And > it's not necessary to split the protein to dna alignment into multiple parts > when frameshift occurs. However, the problem now becomes how to obtain such > information (`rf_table`). I find exonerate is quite capable of handling > this task, especially with introns in the dna. I do think an object to > store protein to dna alignment is necessary in this scenario. > Is the (still unmerged) CoordinateMapper the solution to this? http://biopython.org/wiki/Coordinate_mapping If so, let me know and I'll rebase and refresh the pull request. If not, I misunderstood the problem. Cheers, Lenna From zruan1991 at gmail.com Fri Mar 21 17:53:13 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 17:53:13 -0400 Subject: [Biopython-dev] Fwd: [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Forgot to cc the dev list. Hi Lenna, I'm not quite sure about CoordinateMapper, but it seems to deal with sequence files with rich annotation like GenBank. However, in our case, we are typically not sure about the coordinate correspondence between dna and protein sequence. That's why exonerate can help. Thanks! On Fri, Mar 21, 2014 at 4:54 PM, Lenna Peterson wrote: > On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: >> Hi Bow, >> >> I have the same problem when trying to model codon alignment with >> frameshift being considered. Basically, I have a CodonSeq object to store >> a >> coding sequence. The only difference between CodonSeq and Seq object is >> that CodonSeq has an attribute -- `rf_table` (reading frame table).
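The `rf_table` idea being quoted here can be illustrated with a toy sketch; the codon table and function name below are invented for illustration and are not part of the CodonSeq branch:

```python
# Toy codon table (subset of the standard genetic code).
CODONS = {"ATG": "M", "AAA": "K", "TTT": "F"}

def translate_with_rf_table(seq, rf_table):
    """Translate using a list of codon start positions.

    A frameshift is simply a jump in the table, so the coding
    sequence never needs to be split into multiple alignments.
    """
    return "".join(CODONS.get(seq[i:i + 3], "X") for i in rf_table)

# In-frame coding sequence:
print(translate_with_rf_table("ATGAAATTT", [0, 3, 6]))   # MKF
# Single-base insertion after the second codon; rf_table jumps from 3 to 7:
print(translate_with_rf_table("ATGAAACTTT", [0, 3, 7]))  # MKF
```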
In >> this >> case, it is easy to store a coding sequence with frameshift events. And >> it's not necessary to split the protein to dna alignment into multiple >> part >> when frameshift occurs. However, the problem now becomes how to obtain >> such >> information (`rf_table`). I find exonerate is quite capable of handling >> this task, especially with introns in the dna. I do think an object to >> store protein to dna alignment is necessary in this scenario. >> > > Is the (still unmerged) CoordinateMapper the solution to this? > http://biopython.org/wiki/Coordinate_mapping > If so, let me know and I'll rebase and refresh the pull request. > If not, I misunderstood the problem. > > Cheers, > > Lenna > From p.j.a.cock at googlemail.com Mon Mar 24 07:57:14 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 11:57:14 +0000 Subject: [Biopython-dev] Volunteer buildslave machines? e.g. Windows & 32 bit Linux Message-ID: Hello all, Tiago and I have been looking after a range of machines covering different operating systems and Python versions, running as volunteer buildslaves for Biopython using buildbot: http://testing.open-bio.org/biopython/tgrid Does anyone else have a lab/home server which could be setup to run nightly Biopython tests for us via buildbot? Ideally the machine needs to be online overnight (European time) when the server is currently setup to schedule tests: http://www.biopython.org/wiki/Continuous_integration Our elderly 32 bit Linux desktop which has been running as a Biopython buildslave for the last few years is finally failing (hard drive problem). I would particularly like to see new buildslaves for: * 32 bit Linux * 64 bit Windows * Windows 7 or 8 (we have a 32 bit XP machine) If you think you might be able to help, the first hurdle is verifying you can checkout Biopython from github, and then compile the source (this is non-trivial on Windows, especially for 64 bit Windows). 
Note that this is separate from the continuous integration testing done for use via TravisCI whenever the GitHub repository is updated - this is very useful but currently only covers Linux: https://travis-ci.org/biopython/biopython/builds The key benefit of the buildbot server is cross platform testing - but this requires a range of volunteer machines. Thanks, Peter RE: http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011158.html On Tue, Mar 18, 2014 at 1:15 PM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: >> Hi, >> >> I have a docker container ready (save for a few applications). Simple >> usage instructions: >> >> ... >> >> Tiago > > Is this a 32 or 64 bit VM, or either? > > I'm asking because we may want to source a replacement > 32 bit Linux buildslave - the hard drive in the old machine > we've been using is failing, and it is probably not worth > replacing. > > Peter From p.j.a.cock at googlemail.com Mon Mar 24 12:42:29 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 16:42:29 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: References: Message-ID: Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g. $ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. 
With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three though, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM, wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps. From marco.galardini at unifi.it Tue Mar 25 19:40:44 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 25 Mar 2014 23:40:44 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52CD2948.7050102@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> Message-ID: <533213FC.8010304@unifi.it> Hi all, following your suggestions (as well as the other modules implementations) I've just committed a couple of commits to my biopython fork, featuring the Bio.Phenomics module. The module capabilities are limited to reading/writing Phenotype Microarray files and basic operations on the PlateRecord/WellRecord objects.
The module requires numpy to interpolate the signal when the user requests a time point that wasn't in the input file (this way the WellRecord object can be queried with slices). I'm thinking about how to implement the parameter extraction from WellRecord objects without the use of scipy. Here's the link to my branch: https://github.com/mgalardini/biopython/tree/phenomics The module and functions have been documented taking inspiration from the other modules: hope they are clear enough for you to try it out. Some example files can be found in Tests/Phenomics. Marco On 08/01/2014 10:32, Marco Galardini wrote: > Hi, > > On 01/08/2014 06:53 AM, Michiel de Hoon wrote: >>> any specification on the style guide for the biopython parsers? >> There is no strict set of rules, but to get you started, many modules >> follow this format: >> - Assuming a PM data file contains only a single data set, the module >> should contain a function "read" that takes either a file name or a file >> handle as the argument. > Unfortunately, the situation is a bit mixed up: there are basically > three file formats for PM data: as csv files (which can contain one or > more data sets or 'plates') and as yaml/json, which can contain also > some metadata. I would therefore use a similar approach as the SeqIO > module, having a parse() and a read() method that raises an exception > if the file contains more than one record. > >> - The module should contain a class (typically called "Record") that >> can store the data in the data file. The "read" function returns an >> object of this class. >> - Try to avoid third-party dependencies if at all possible. > So far the dependencies would be pyYaml (for the yaml/json parsing, > but maybe I could use the stdlib json module) and numpy/scipy for the > extraction of curve parameters. Does this sound ok? >> >> Would it make sense to have a single Bio.Microarray module that can >> house the various microarray parsers (PM, Affy, others)?
> I don't know if that would be a good strategy: the Phenotype > Microarrays are very different from the other proper microarrays; how > about a "phenomics" module? > >> >> Best, >> -Michiel. > Kind regards, > Marco > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- From mjldehoon at yahoo.com Tue Mar 25 22:15:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 25 Mar 2014 19:15:52 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> We could consider not including any DTDs with Biopython, and relying on downloading them automatically. This seems a better test case than what we currently have, because as NCBI updates their DTDs, Bio.Entrez depends on this automatic download capability. Best, -Michiel. -------------------------------------------- On Mon, 3/24/14, Peter Cock wrote: Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] To: "Biopython-Dev Mailing List" Date: Monday, March 24, 2014, 12:42 PM Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g.
$ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three though, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM, wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps.
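The `$ ls` release-process check quoted in this thread could also be scripted portably; a small sketch (the cache path is the one mentioned in the thread, and the directory may simply not exist yet):

```python
import os

def cached_dtds():
    """List any DTD files Bio.Entrez has cached locally (empty if none)."""
    dtd_dir = os.path.expanduser("~/.config/biopython/Bio/Entrez/DTDs")
    if not os.path.isdir(dtd_dir):
        return []
    return sorted(os.listdir(dtd_dir))

print(cached_dtds())
```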
_______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 26 05:18:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 09:18:21 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:15 AM, Michiel de Hoon wrote: > We could consider to not include any DTDs with Biopython, > and rely on downloading them automatically. > This seems a better test case than what we currently have, > because as NCBI updates their DTDs, Bio.Entrez depends > on this automatic download capability. > > Best, > -Michiel. Long term not bundling the DTD files seems a good idea. Being cautious we could bundle them for the next release, see how the download mechanism works in the wild, and drop the DTD files for the release after that? This would mean all the Entrez parser tests would require internet access (even if using an old XML file on disk), but given that most of Bio.Entrez requires a connection to the NCBI anyway this isn't such a problem. If we do go down this route, would the current once-a-week running of the online tests with buildbot be enough? Peter From p.j.a.cock at googlemail.com Wed Mar 26 06:14:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 10:14:53 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? 
In-Reply-To: <533213FC.8010304@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini wrote: > Hi all, > > following your suggestions (as well as the other modules implementations) > I've just committed a couple of commits to my biopython fork, featuring the > Bio.Phenomics module. > The module capabilities are limited to reading/writing Phenotype Microarray > files and basic operations on the PlateRecord/WellRecord objects. The module > requires numpy to interpolate the signal when the user requests a time point > that wasn't in the input file (this way the WellRecord object can be queried > with slices). > I'm thinking about how to implement the parameter extraction from WellRecord > objects without the use of scipy. > > Here's the link to my branch: > https://github.com/mgalardini/biopython/tree/phenomics > The module and functions have been documented taking inspiration from the > other modules: hope they are clear enough for you to try it out. > Some example files can be found in Tests/Phenomics. > > Marco Hi Marco, I've not worked with this kind of data so my comments are not on the application specifics. But I'm pleased to see unit tests :) One thought was that while you define (Java like?) getRow and getColumn methods, your __getitem__ does not support (NumPy like) access, which is something we do for multiple sequence alignments. I guess while most plates are laid out in a grid, the row/column for each sample is not the most important thing - the sample identifier is? Thinking out loud, would properties `rows` and `columns` etc be nicer than `getRow` and `getColumn`, supporting iteration over the rows/columns/etc and indexing? Minor: Your longer function docstrings do not follow PEP257, specifically starting with a one line summary, then a blank line, then the details.
Also you are using triple single-quotes, rather than triple double-quotes (like the rest of Biopython). http://legacy.python.org/dev/peps/pep-0257/ Peter P.S. Also, I'm not very keen on the module name, phenomics - I wonder if it would earn Biopython a badomics award? ;) http://dx.doi.org/10.1186/2047-217X-1-6 From marco.galardini at unifi.it Wed Mar 26 09:26:42 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Wed, 26 Mar 2014 14:26:42 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Hi, many thanks for your comments, below some replies: ----- Message from p.j.a.cock at googlemail.com --------- Date: Wed, 26 Mar 2014 10:14:53 +0000 From: Peter Cock Reply-To: Peter Cock Subject: Re: [Biopython-dev] Interested in a Phenotype Microarray parser? To: Marco Galardini Cc: Biopython-Dev Mailing List > On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini > wrote: >> Hi all, >> >> following your suggestions (as well as the other modules implementations) >> I've just committed a couple of commits to my biopython fork, featuring the >> Bio.Phenomics module. >> The module capabilities are limited to reading/writing Phenotype Microarray >> files and basic operations on the PlateRecord/WellRecord objects. The module >> requires numpy to interpolate the signal when the user requests a time point >> that wasn't in the input file (this way the WellRecord object can be queried >> with slices). >> I'm thinking about how to implement the parameter extraction from WellRecord >> objects without the use of scipy. >> >> Here's the link to my branch: >> https://github.com/mgalardini/biopython/tree/phenomics >> The module and functions have been documented taking inspiration from the >> other modules: hope they are clear enough for you to try it out.
>> Some example files can be found in Tests/Phenomics. >> >> Marco > > Hi Marco, > > I've not worked with this kind of data so my comments are not on > the application specifics. But I'm pleased to see unit tests :) > > One thought was that while you define (Java like?) getRow and getColumn > methods, your __getitem__ does not support (NumPy like) access, > which is something we do for multiple sequence alignments. I guess > while most plates are laid out in a grid, the row/column for each > sample is not the most important thing - the sample identifier is? > > Thinking out loud, would properties `rows` and `columns` etc be > nicer than `getRow` and `getColumn`, supporting iteration over > the rows/columns/etc and indexing? Yeah, absolutely: I'll work on some changes to have a more straightforward way to select multiple WellRecords on row/column basis. > > Minor: Your longer function docstrings do not follow PEP257, > specifically starting with a one line summary, then a blank line, > then the details. Also you are using triple single-quotes, rather > than triple double-quotes (like the rest of Biopython). > http://legacy.python.org/dev/peps/pep-0257/ Whoops, I'll change it, thanks > > Peter > > P.S. Also, I'm not very keen on the module name, phenomics - > I wonder if it would earn Biopython a badomics award? ;) > http://dx.doi.org/10.1186/2047-217X-1-6 That's meta-omics right? :p What about 'Phenotype' then? Maybe it's too general, but future extensions may include other phenotypic readouts.
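The slice-based WellRecord querying discussed in this thread relies on interpolating the signal between measured time points; the numpy-based interpolation can be sketched in pure Python (a toy sketch, not the actual Bio.Phenomics code):

```python
import bisect

def interpolate(times, signals, t):
    """Linear interpolation of a well's signal at an unmeasured time point.

    Clamps to the first/last measurement outside the measured range.
    """
    if t <= times[0]:
        return signals[0]
    if t >= times[-1]:
        return signals[-1]
    i = bisect.bisect_right(times, t)
    t0, t1 = times[i - 1], times[i]
    s0, s1 = signals[i - 1], signals[i]
    return s0 + (s1 - s0) * (t - t0) / (t1 - t0)

times = [0.0, 0.25, 0.5, 1.0]      # hours
signals = [10.0, 20.0, 30.0, 50.0]
print(interpolate(times, signals, 0.75))  # -> 40.0
```

numpy.interp does the same job in one call, which is presumably why the module depends on numpy.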
Marco > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > ----- End of message from p.j.a.cock at googlemail.com ----- Marco Galardini Postdoctoral Fellow EMBL-EBI - European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD, UK Phone: +44 (0)1223 49 2547 From mjldehoon at yahoo.com Wed Mar 26 10:55:46 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 26 Mar 2014 07:55:46 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Hi Peter, On Wed, 3/26/14, Peter Cock wrote: > Long term not bundling the DTD files seems a good idea. > Being cautious we could bundle them for the next release, > see how the download mechanism works in the wild, and > drop the DTD files for the release after that? I don't think we need to be so cautious. > This would mean all the Entrez parser tests would require > internet access (even if using an old XML file on disk), But only the first time. After a DTD is downloaded, it is stored locally, and internet access won't be needed the next time the XML (or other XML files relying on the same DTD) is parsed. In my experience, using local DTDs is much much faster than accessing them through the internet for each XML file, so I would not advocate an internet-only solution. As an alternative to local storage, we could consider downloading all DTDs for each Biopython session, but keeping the results of parsing the DTD in memory (so we won't have to download each DTD over and over again if we're parsing many XML files). This can be almost as fast as using local storage, but will require internet access, and also Bio.Entrez would have to be changed. Best, -Michiel.
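The in-memory, once-per-session alternative described here can be sketched with a memoised loader; this is an assumed design, not Bio.Entrez code, and the download is stubbed out so the caching behaviour is visible:

```python
from functools import lru_cache

downloads = []  # records how many network round-trips would have happened

@lru_cache(maxsize=None)
def load_dtd(name):
    # Real code would fetch and parse the DTD from NCBI here; this stub
    # just logs the call and returns a placeholder "parsed" result.
    downloads.append(name)
    return "<!-- parsed DTD: %s -->" % name

load_dtd("pubmed_140101.dtd")
load_dtd("pubmed_140101.dtd")  # second call is served from memory
print(len(downloads))          # -> 1
```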
From p.j.a.cock at googlemail.com Wed Mar 26 11:04:28 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 15:04:28 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> References: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:55 PM, Michiel de Hoon wrote: > Hi Peter, > > On Wed, 3/26/14, Peter Cock wrote: >> Long term not bundling the DTD files seems a good idea. >> Being cautious we could bundle them for the next release, >> see how the download mechanism works in the wild, and >> drop the DTD files for the release after that? > > I don't think we need to be so cautious. OK. We could then get rid of the DTDs folder under Bio/Entrez and tweak the Entrez XML parsing tests to ensure they are only run if the internet is available. >> This would mean all the Entrez parser tests would require >> internet access (even if using an old XML file on disk), > > But only the first time. After a DTD is downloaded, it is stored > locally, and internet access won't be needed the next time the XML > (or other XML files relying on the same DTD) is parsed. Yes, but for many test environments, it is always the first time ;) e.g. TravisCI uses a clean VM for each test run. > In my experience, using local DTDs is much much faster than > accessing them through the internet for each XML file, so I > would not advocate an internet-only solution. Yes (I didn't mean to imply that - sorry for any confusion). > As an alternative to local storage, we could consider downloading > all DTDs for each Biopython session, but keeping the results of > parsing the DTD in memory (so we won't have to download each > DTD over and over again if we're parsing many XML files). > This can be almost as fast as using local storage, but will require > internet access, and also Bio.Entrez would have to be changed. 
A local cache (as implemented) seems fine to me. Peter From p.j.a.cock at googlemail.com Thu Mar 27 07:40:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 27 Mar 2014 11:40:41 +0000 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 Message-ID: Hello all, As a co-chair of BOSC this year, I'd like to remind you all that the abstract deadline is about a week away now (April 4): http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ Also this year student presenters will get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ Would anyone like to volunteer to give this year's Biopython Update talk at BOSC 2014 in Boston? I would prefer one of the newer project members have a turn - but also I'll be busier than usual with BOSC organisation duties. Note that giving a talk often helps with getting travel funding to attend a meeting - and in addition to BOSC, you can combine the trip with the BOSC CodeFest beforehand and/or the ISMB meeting afterwards. Thanks, Peter From w.arindrarto at gmail.com Fri Mar 28 17:22:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 28 Mar 2014 22:22:56 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, If there are no objections from anyone, I would like to volunteer :). I am planning to come to ISMB anyway, though this isn't 100% confirmed as I am still applying for the visa. 
Cheers, Bowo On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: > Hello all, > > As a co-chair of BOSC this year, I'd like to remind you all that the > abstract deadline is about a week away now (April 4): > > http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ > > Also this year student presenters will get free BOSC registration: > > http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ > > Would anyone like to volunteer to give this year's Biopython Update > talk at BOSC 2014 in Boston? I would prefer one of the newer project > members have a turn - but also I'll be busier than usual with BOSC > organisation duties. > > Note that giving a talk often helps with getting travel funding to > attend a meeting - and in addition to BOSC, you can combine the > trip with the BOSC CodeFest beforehand and/or the ISMB meeting > afterwards. > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From ashok.bioinformatics at gmail.com Sun Mar 30 14:11:49 2014 From: ashok.bioinformatics at gmail.com (T. Ashok Kumar) Date: Sun, 30 Mar 2014 23:41:49 +0530 Subject: [Biopython-dev] Contributing code to Biopython Message-ID: Dear Sir/Madam, I wish to contribute a code on predicting *hydropathy plot of a protein sequence* using biopython. Please help me regarding this issue. -- *T. 
Ashok Kumar* Head, Department of Bioinformatics Noorul Islam College of Arts and Science Kumaracoil, Thuckalay - 629 180 Kanyakumari District, INDIA Mobile:- 00 91 9655307178 *E-Mail:* *ashok.bioinformatics at gmail.com *, *ashok at biogem.org * *Website:* *www.biogem.org * From p.j.a.cock at googlemail.com Mon Mar 31 05:12:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 10:12:51 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Thanks for volunteering Bow :) I can send you the LaTeX files I used for the abstract in previous years (each talk gets one page in the BOSC abstract booklet). You should be able to find our past talks online, some as PDFs, some on SlideShare etc: http://biopython.org/wiki/Documentation#Presentations Peter On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > If there are no objections from anyone, I would like to volunteer :). > > I am planning to come to ISMB anyway, though this isn't 100% confirmed as I > am still applying for the visa. > > Cheers, > Bowo > > On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >> >> Hello all, >> >> As a co-chair of BOSC this year, I'd like to remind you all that the >> abstract deadline is about a week away now (April 4): >> >> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >> >> Also this year student presenters will get free BOSC registration: >> >> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >> >> Would anyone like to volunteer to give this year's Biopython Update >> talk at BOSC 2014 in Boston? I would prefer one of the newer project >> members have a turn - but also I'll be busier than usual with BOSC >> organisation duties. >> >> Note that giving a talk often helps with getting travel funding to >> attend a meeting - and in addition to BOSC, you can combine the >> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >> afterwards. 
>> >> Thanks, >> >> Peter >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Mon Mar 31 12:30:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 17:30:21 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> References: <20140320114815.70210fd5@grandao> <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> Message-ID: I'm glad to see there are several people interested in EuroSciPy. Would one of you like to submit a Biopython talk? The deadline is 14 April: https://www.euroscipy.org/2014/calls/abstracts/ Peter From w.arindrarto at gmail.com Mon Mar 31 17:08:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 31 Mar 2014 23:08:50 +0200 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, A LaTeX template would be great :). I'm still preparing the abstract, should be ready for everyone to check soon. Cheers, Bow On Mon, Mar 31, 2014 at 11:12 AM, Peter Cock wrote: > Thanks for volunteering Bow :) > > I can send you the LaTeX files I used for the abstract in > previous years (each talk gets one page in the BOSC > abstract booklet). You should be able to find our past > talks online, some as PDFs, some on SlideShare etc: > http://biopython.org/wiki/Documentation#Presentations > > Peter > > On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> If there are no objections from anyone, I would like to volunteer :). >> >> I am planning to come to ISMB anyway, though this isn't 100% confirmed as I >> am still applying for the visa. 
>> >> Cheers, >> Bowo >> >> On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >>> >>> Hello all, >>> >>> As a co-chair of BOSC this year, I'd like to remind you all that the >>> abstract deadline is about a week away now (April 4): >>> >>> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >>> >>> Also this year student presenters will get free BOSC registration: >>> >>> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >>> >>> Would anyone like to volunteer to give this year's Biopython Update >>> talk at BOSC 2014 in Boston? I would prefer one of the newer project >>> members have a turn - but also I'll be busier than usual with BOSC >>> organisation duties. >>> >>> Note that giving a talk often helps with getting travel funding to >>> attend a meeting - and in addition to BOSC, you can combine the >>> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >>> afterwards. >>> >>> Thanks, >>> >>> Peter >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev From marco.galardini at unifi.it Mon Mar 31 19:59:32 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 01 Apr 2014 00:59:32 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Message-ID: <533A0164.9010700@unifi.it> Hi, as suggested, I've made a few changes to the proposed Bio.Phenotype module (apart from the less-omics name). 
The PlateRecord object can now be indexed in a similar fashion to AlignIO multiple alignments: it is still possible to use the WellRecord identifier as an index, but when integers or slices are used, new sub-plates or single wells are returned. The system uses the well identifier as a means to divide the plate into rows/columns. Thanks for pointing out the AlignIO system, it has been very useful. I've left the two getColumns and getRows functions, since for some people it may still be useful to use the well identifiers. If you feel like they are too confusing I can remove them. The updated branch is here: https://github.com/mgalardini/biopython/tree/phenomics Kind regards, Marco On 26/03/2014 13:26, Marco Galardini wrote: > Hi, > > many thanks for your comments, below some replies: > > ----- Messaggio da p.j.a.cock at googlemail.com --------- > Data: Wed, 26 Mar 2014 10:14:53 +0000 > Da: Peter Cock > Rispondi-A:Peter Cock > Oggetto: Re: [Biopython-dev] Interested in a Phenotype Microarray > parser? > A: Marco Galardini > Cc: Biopython-Dev Mailing List > > >> On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini >> wrote: >>> Hi all, >>> >>> following your suggestions (as well as the other modules >>> implementations) >>> I've just committed a couple of commits to my biopython fork, >>> featuring the >>> Bio.Phenomics module. >>> The module capabilities are limited to reading/writing Phenotype >>> Microarray >>> files and basic operations on the PlateRecord/WellRecord objects. >>> The module >>> requires numpy to interpolate the signal when the user request a >>> time point >>> that wasn't in the input file (this way the WellRecord object can be >>> queried >>> with slices). >>> I'm thinking on how to implement the parameters extraction from >>> WellRecord >>> objects without the use of scipy. 
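[Editor's note: the slice-based querying mentioned above amounts to interpolating between measured time points; without numpy or scipy a linear version takes only a few lines. A toy helper for illustration, not the code in the branch.]

```python
def signal_at(times, values, t):
    """Linearly interpolate a well's signal at time t.

    times must be strictly increasing; pure Python, no numpy/scipy.
    """
    if not times[0] <= t <= times[-1]:
        raise ValueError("time point outside the measured range")
    # walk consecutive (time, value) pairs until t falls inside one segment
    for (t0, v0), (t1, v1) in zip(zip(times, values),
                                  zip(times[1:], values[1:])):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```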
>>> >>> Here's the link to my branch: >>> https://github.com/mgalardini/biopython/tree/phenomics >>> The module and functions have been documented taking inspiration >>> from the >>> other modules: hope they are clear enough for you to try it out. >>> Some example files can be found in Tests/Phenomics. >>> >>> Marco >> >> Hi Marco, >> >> I've not worked with this kind of data so my comments are not on >> the application specifics. But I'm pleased to see unit tests :) >> >> One thought was while you define (Java like?) getRow and getColumn >> methods, your __getitem__ does not support (NumPy like) access, >> which is something we do for multiple sequence alignments. I guess >> while most plates are laid out in a grid, the row/column for each >> sample is not the most important thing - the sample identifier is? >> >> Thinking out loud, would properties `rows` and `columns` etc be >> nicer than `getRow` and `getColumn`, supporting iteration over >> the rows/columns/etc and indexing? > > Yeah, absolutely: I'll work on some changes to have a more > straightforward way to select multiple WellRecords on a row/column basis. > >> >> Minor: Your longer function docstrings do not follow PEP257, >> specifically starting with a one line summary, then a blank line, >> then the details. Also you are using triple single-quotes, rather >> than triple double-quotes (like the rest of Biopython). >> http://legacy.python.org/dev/peps/pep-0257/ > > Whoops, I'll change it, thanks > >> >> Peter >> >> P.S. Also, I'm not very keen on the module name, phenomics - >> I wonder if it would earn Biopython a badomics award? ;) >> http://dx.doi.org/10.1186/2047-217X-1-6 > > That's meta-omics right? :p > What about 'Phenotype' then? Maybe it's too general, but future > extensions may include other phenotypic readouts. 
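[Editor's note: the dual indexing being discussed - well identifiers as string keys, integers and slices returning row-wise sub-plates - can be illustrated with a toy class. Names and behaviour here are illustrative only, not the actual Bio.Phenotype API.]

```python
import string


class TinyPlate:
    """Toy plate: index by well ID ('A01') or by row number/slice."""

    def __init__(self, wells):
        self.wells = dict(wells)  # e.g. {"A01": 0.5, "A02": 0.7, ...}

    def __getitem__(self, index):
        if isinstance(index, str):  # well identifier, e.g. "B07"
            return self.wells[index]
        if isinstance(index, int):  # one row as a sub-plate
            row = string.ascii_uppercase[index]
            return TinyPlate({k: v for k, v in self.wells.items()
                              if k.startswith(row)})
        if isinstance(index, slice):  # several rows as a sub-plate
            rows = string.ascii_uppercase[index]
            return TinyPlate({k: v for k, v in self.wells.items()
                              if k[0] in rows})
        raise TypeError("use a well ID string, an int or a slice")
```

The well identifier itself carries the row/column position, which is what lets integer and slice access coexist with key access.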
> > Marco >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > > ----- Fine del messaggio da p.j.a.cock at googlemail.com ----- > > > > Marco Galardini > Postdoctoral Fellow > EMBL-EBI - European Bioinformatics Institute > Wellcome Trust Genome Campus > Hinxton, Cambridge CB10 1SD, UK > Phone: +44 (0)1223 49 2547 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- From harsh.beria93 at gmail.com Mon Mar 3 21:57:35 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 4 Mar 2014 03:27:35 +0530 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393582069.6863.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: The pairwise alignment project is not listed in the Ideas page. If I work on it and make a GUI or command line frontend, can that be taken up as a GSOC project? Who can be the potential mentor for this project so that I can chalk out the details before starting to code? Also, do I need to add the project on the idea page? On Sat, Mar 1, 2014 at 12:40 AM, Harsh Beria wrote: > I can work on pairwise sequence alignment. Actually, I have previously > worked on this using Dynamic programming. But I doubt whether this can be a > GSOC project because the work load will not be too much. 
If we use > different methods to predict sequence alignment and make a front-end which > allows the user to input the sequence or even a pdb file and method of > alignment and predict the alignment, the work can be substantial enough. > > Also, as suggested by Christopher, sequence alignment is pretty basic and > we can use C backend, which can significantly improve the runtime. So, we > can discuss it and I can start working on it. > > > On Fri, Feb 28, 2014 at 11:15 PM, Fields, Christopher J < > cjfields at illinois.edu> wrote: > >> I'm wondering, with something that is as broadly applicable as pairwise >> alignment, would it be better to implement only in Python (or implement in >> Python wedded to a C backend)? Or maybe set up something in python that >> taps into an already well-defined C/C++ library that does this? >> >> The reason I mention this: with bioperl we went down this route with >> bioperl-ext a long time ago (these are generally C-based backend tools with >> a perl front-end), that bit-rotted simply b/c there were other more >> maintainable options. IIUC from this post, similar issues re: >> maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is >> entirely possible). However, tools like pysam and Bio::DB::Samtools (on >> the perl end) seem to have been maintained much more readily since they tap >> into a common library. >> >> For instance, my suggestion would be to implement a Biopython tool that >> does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a >> generic python front-end that allows users to pick the tool/method for the >> alignment, with maybe a library binding as an initial implementation. >> >> chris >> >> On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: >> >> > Hi Harsh Beria, >> > >> > One option is to work on pairwise sequence alignments. Currently there >> is some code for that in Biopython (in Bio/pairwise2.py), but it is not >> general and is not being maintained. 
This may need to be rebuilt from the >> ground up. >> > >> > Best, >> > -Michiel. >> > >> > -------------------------------------------- >> > On Wed, 2/26/14, Harsh Beria wrote: >> > >> > Subject: [Biopython-dev] Gsoc 2014 aspirant >> > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, >> gsoc at lists.open-bio.org >> > Date: Wednesday, February 26, 2014, 11:14 AM >> > >> > Hi, >> > >> > I am Harsh Beria, third year UG student at Indian >> > Institute of >> > Technology, Kharagpur. I have started working in >> > Computational Biophysics >> > recently, having written code for pdb to fasta parser, >> > sequence alignment >> > using Needleman Wunsch and Smith Waterman, Secondary >> > Structure prediction, >> > Henikoff's weight and am currently working on Monte Carlo >> > simulation. >> > Overall, I have started to like this field and want to carry >> > my interest >> > forward by pursuing a relevant project for GSOC 2014. I >> > mainly code in C >> > and python and would like to start contributing to the >> > Biopython library. I >> > started going through the official contribution wiki page ( >> > http://biopython.org/wiki/Contributing) >> > >> > I also went through the wiki page of Bio.SeqIO's. I >> > seriously want to >> > contribute to the Biopython library through GSOC. What do I >> > do next ? 
>> > >> > Thanks >> > -- >> > >> > Harsh Beria, >> > Indian Institute of Technology,Kharagpur >> > E-mail: harsh.beria93 at gmail.com >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> >> > > > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > > Ph: +919332157616 > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From mjldehoon at yahoo.com Tue Mar 4 10:40:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 4 Mar 2014 02:40:52 -0800 (PST) Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> I would suggest to implement this in C, with a thin wrapper in Python. Using 3rd-party libraries would increase the compile-time dependencies of Biopython. Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. Best, -Michiel. -------------------------------------------- On Fri, 2/28/14, Fields, Christopher J wrote: Subject: Re: [Biopython-dev] Gsoc 2014 aspirant To: "Michiel de Hoon" Cc: "biopython-dev at lists.open-bio.org" , "Harsh Beria" Date: Friday, February 28, 2014, 12:45 PM I'm wondering, with something that is as broadly applicable as pairwise alignment, would it be better to implement only in Python (or implement in Python wedded to a C backend)? Or maybe set up something in python that taps into an already well-defined C/C++ library that does this? 
The reason I mention this: with bioperl we went down this route with bioperl-ext a long time ago (these are generally C-based backend tools with a perl front-end), that bit-rotted simply b/c there were other more maintainable options. IIUC from this post, similar issues re: maintainability held for Bio/pairwise2.py (unless I'm mistaken, which is entirely possible). However, tools like pysam and Bio::DB::Samtools (on the perl end) seem to have been maintained much more readily since they tap into a common library. For instance, my suggestion would be to implement a Biopython tool that does pairwise alignment using library X (SeqAn, EMBOSS, etc). Or maybe a generic python front-end that allows users to pick the tool/method for the alignment, with maybe a library binding as an initial implementation. chris On Feb 28, 2014, at 4:07 AM, Michiel de Hoon wrote: > Hi Harsh Beria, > > One option is to work on pairwise sequence alignments. Currently there is some code for that in Biopython (in Bio/pairwise2.py), but it is not general and is not being maintained. This may need to be rebuilt from the ground up. > > Best, > -Michiel. > > -------------------------------------------- > On Wed, 2/26/14, Harsh Beria wrote: > > Subject: [Biopython-dev] Gsoc 2014 aspirant > To: biopython at lists.open-bio.org, biopython-dev at lists.open-bio.org, gsoc at lists.open-bio.org > Date: Wednesday, February 26, 2014, 11:14 AM > > Hi, > > I am Harsh Beria, third year UG student at Indian > Institute of > Technology, Kharagpur. I have started working in > Computational Biophysics > recently, having written code for pdb to fasta parser, > sequence alignment > using Needleman Wunsch and Smith Waterman, Secondary > Structure prediction, > Henikoff's weight and am currently working on Monte Carlo > simulation. > Overall, I have started to like this field and want to carry > my interest > forward by pursuing a relevant project for GSOC 2014. 
I > mainly code in C > and python and would like to start contributing to the > Biopython library. I > started going through the official contribution wiki page ( > http://biopython.org/wiki/Contributing) > > I also went through the wiki page of Bio.SeqIO's. I > seriously want to > contribute to the Biopython library through GSOC. What do I > do next ? > > Thanks > -- > > Harsh Beria, > Indian Institute of Technology,Kharagpur > E-mail: harsh.beria93 at gmail.com > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 4 12:32:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 12:32:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: > I would suggest to implement this in C, with a thin wrapper in Python. > Using 3rd-party libraries would increase the compile-time dependencies of Biopython. > Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. > Best, > -Michiel. I would also consider a pure Python implementation on top, both for cross-testing, but also for use under Jython or PyPy where using the C code wouldn't be possible (or at least, becomes more complicated). (This is what the existing Bio.pairwise2 module does) Adding third party C libraries would also make life hard for cross platform testing (Linux, Mac, Windows). 
Peter From cjfields at illinois.edu Tue Mar 4 20:25:47 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Tue, 4 Mar 2014 20:25:47 +0000 Subject: [Biopython-dev] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mar 4, 2014, at 6:32 AM, Peter Cock wrote: > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon wrote: >> I would suggest to implement this in C, with a thin wrapper in Python. >> Using 3rd-party libraries would increase the compile-time dependencies of Biopython. >> Anyway I expect that the tricky part will be the design of the module, rather than the algorithms themselves, so using 3rd-party libraries wouldn't help us so much. >> Best, >> -Michiel. > > I would also consider a pure Python implementation on top, > both for cross-testing, but also for use under Jython or PyPy > where using the C code wouldn't be possible (or at least, > becomes more complicated). > > (This is what the existing Bio.pairwise2 module does) Ah, so it's pure python. Makes sense to have it for that purpose. You could simply repurpose the existing code. > Adding third party C libraries would also make life hard for > cross platform testing (Linux, Mac, Windows). > > Peter This is a problem with bioinformatics tools in general; they simply aren't Windows-friendly. However, one can write code with portability in mind (even C/C++). chris From p.j.a.cock at googlemail.com Tue Mar 4 21:45:09 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 4 Mar 2014 21:45:09 +0000 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Tuesday, March 4, 2014, Fields, Christopher J wrote: > On Mar 4, 2014, at 6:32 AM, Peter Cock > > wrote: > > > On Tue, Mar 4, 2014 at 10:40 AM, Michiel de Hoon > > wrote: > >> I would suggest to implement this in C, with a thin wrapper in Python. 
> >> Using 3rd-party libraries would increase the compile-time dependencies > of Biopython. > >> Anyway I expect that the tricky part will be the design of the module, > rather than the algorithms themselves, so using 3rd-party libraries > wouldn't help us so much. > >> Best, > >> -Michiel. > > > > I would also consider a pure Python implementation on top, > > both for cross-testing, but also for use under Jython or PyPy > > where using the C code wouldn't be possible (or at least, > > becomes more complicated). > > > > (This is what the existing Bio.pairwise2 module does) > > Ah, so it's pure python. Makes sense to have it for that purpose. You > could simply repurpose the existing code. > > Apologies if unclear - Biopython has both a C and pure Python version of pairwise2 - although most of our bits of C code don't have a fallback and so break under Jython or PyPy etc. Personally I am optimistic about the potential of PyPy to speed up most Python code with its JIT so am a little wary of adding more C code (which may act as a barrier to entry for future maintainers) without a matching Python implementation - but appreciate that for typical C Python this is often the best way to attain high performance. But Michiel is absolutely right - the algorithm choice is even more important. > > Adding third party C libraries would also make life hard for > > cross platform testing (Linux, Mac, Windows). > > > > Peter > > This is a problem with bioinformatics tools in general; they simply aren't > Windows-friendly. However, one can write code with portability in mind > (even C/C++). > > chris Yes indeed - this is one reason why the buildbot for automated cross-platform testing is really helpful (since few if currently any of the Biopython developers use Windows as their primary system). 
Peter From nigel.delaney at outlook.com Tue Mar 4 22:39:04 2014 From: nigel.delaney at outlook.com (Nigel Delaney) Date: Tue, 4 Mar 2014 17:39:04 -0500 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1393929652.71526.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. 
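[Editor's note: for the algorithmic side of this thread, the global-alignment dynamic programming recurrence (Needleman-Wunsch) is short in pure Python. A score-only sketch with illustrative scoring parameters - not Bio.pairwise2's actual API or defaults.]

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, O(len(b)) memory."""
    # prev[j] = best score aligning the processed prefix of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * gap]  # aligning a[:i] against the empty prefix of b
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # best of: substitution, gap in b (up), gap in a (left)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

A C rewrite speeds up exactly this double loop; the pure-Python version stays useful for cross-checking and for Jython/PyPy, as argued earlier in the thread.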
From mjldehoon at yahoo.com Thu Mar 6 01:49:49 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 5 Mar 2014 17:49:49 -0800 (PST) Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: Message-ID: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Hi Nigel, While compiling Biopython on Windows can be tricky, in my experience it has been easy to compile the C libraries in Biopython on other platforms (Unix/Linux/MacOSX). Have you run into specific problems compiling Biopython? I would think that wrapping 3rd-party libraries or executables is much more error-prone. Best, -Michiel. -------------------------------------------- On Tue, 3/4/14, Nigel Delaney wrote: Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant To: "'Peter Cock'" , "'Fields, Christopher J'" Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" Date: Tuesday, March 4, 2014, 5:39 PM As a quick $0.02 from a library user on this. Back in 2006 when I first started using biopython I was working with Bio.pairwise2, and learned that it was too slow for the task at hand, so wound up switching to passing/parsing files to an aligner on the command line to get around this, as (on windows back then), I never got the compiled C code to work properly. Although pypy may be fast enough and has been very impressive in my experience (plus computers are much faster now), I think wrapping a library that is cross platform and maintains its own binaries would be a great option rather than implementing C-code (I think this might be what the GSoC student did last year for BioPython by wrapping Python). Mostly though I just wanted to second Peter's wariness about adding C-code into the library. I have found over the years that a lot of python scientific tools that in theory should be cross platform (Stampy, IPython, Matplotlib, Numpy, GATK, etc.) 
are really not and can be a huge timesuck of dealing with installation issues as code moves between computers and operating systems, usually due to some C code or OS specific behavior. Since code in python works anywhere python is installed, I really appreciate the extent that the library can be as much pure python as allowable or strictly dependent on a particular downloadable binary for a specific OS/Architecture/Scenario. _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From harsh.beria93 at gmail.com Fri Mar 7 23:41:53 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Sat, 8 Mar 2014 05:11:53 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, Regarding the algorithm part of Pairwise Sequence Alignment, I can use Dynamic Programming (Smith Waterman for local and Needleman Wunsch for Global Alignment). Please suggest if I should go for dynamic programming. Also, the above discussion points out that the implementation should be purely python based for cross-platform compatibility. On Thu, Mar 6, 2014 at 7:19 AM, Michiel de Hoon wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. 
> > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. 
> _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- Harsh Beria, Indian Institute of Technology,Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From tra at popgen.net Mon Mar 10 17:02:17 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:02:17 +0000 Subject: [Biopython-dev] Installing all biopython dependencies Message-ID: <20140310170217.048f00a1@lnx> Hi, I am trying to create a easy-to-install, easy-to-replicate Virtual Machine(*) with all the requirements for Biopython. The idea is mainly to make it easy to have reliable testing, but it can also be used as a very fast installation of Biopython. The VM is currently based on Ubuntu saucy, and I am trying to make sure all the dependencies are met. I would like some advice on the following please: EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency problems, so I guess this requires a manual download/install? reportlab - What is the best way to get the fonts? XXmotif - What is this??? PAML - There seemed to be a ubuntu package, but no more? The following packages require manual installation (no ubuntu package), please correct me if I am wrong (makes my life easier)... 
DSSP Dialign msaprobs NACCESS Prank Probcons TCoffee (*) Actually I am building a docker container, but for ease of explanation it is similar to the more familiar Virtual Machine concept Thanks, Tiago From p.j.a.cock at googlemail.com Mon Mar 10 17:20:43 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 10 Mar 2014 17:20:43 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: On Mon, Mar 10, 2014 at 5:02 PM, Tiago Antao wrote: > Hi, > > I am trying to create a easy-to-install, easy-to-replicate Virtual > Machine(*) with all the requirements for Biopython. The idea is mainly > to make it easy to have reliable testing, but it can also be used as a > very fast installation of Biopython. Sounds good :) > The VM is currently based on Ubuntu saucy, and I am trying to > make sure all the dependencies are met. Some of this would apply to the TravisCI VM, which is also Debian/Ubuntu based. There we have to balance total run time (install everything & run tests) against full coverage. https://travis-ci.org/biopython/biopython/builds It would be neat to have an instance of your docker based VM running as a buildslave too... http://testing.open-bio.org/biopython/tgrid > I would like some advice on the following please: > > EmbossPhylipNew - the ubuntu package (embassy-phylip) has dependency > problems, so I guess this requires a manual download/install? > reportlab - What is the best way to get the fonts? > XXmotif - What is this??? > PAML - There seemed to be a ubuntu package, but no more? > > > The following packages require manual installation (no ubuntu > package), please correct me if I am wrong (makes my life easier)... > > DSSP > Dialign > msaprobs > NACCESS > Prank > Probcons > TCoffee For TravisCI we install a Debian/Ubuntu package for t-coffee, so at least that ought to be easy. e.g.
https://packages.debian.org/sid/t-coffee Others (where the licence permits) we can request DebianMed/ BioLinux look at for packaging... Peter From anaryin at gmail.com Mon Mar 10 17:20:37 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 10 Mar 2014 18:20:37 +0100 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: <20140310170217.048f00a1@lnx> References: <20140310170217.048f00a1@lnx> Message-ID: Hi Tiago, For DSSP and NACCESS, you need a manual installation. DSSP is publicly available (binaries): ftp://ftp.cmbi.ru.nl/pub/software/dssp/ NACCESS is more complicated.. you need a license to get it and g77 installed to compile. You might have to contact the authors to allow such a broad distribution.. ? Cheers, João From tra at popgen.net Mon Mar 10 17:28:54 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:28:54 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310172854.07b0c1df@lnx> On Mon, 10 Mar 2014 18:20:37 +0100 João Rodrigues wrote: > NACCESS is more complicated.. you need a license to get it and g77 > installed to compile. You might have to contact the authors to allow > such a broad distribution.. Thanks, I might skip NACCESS at this stage. From tra at popgen.net Mon Mar 10 17:33:20 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 10 Mar 2014 17:33:20 +0000 Subject: [Biopython-dev] Installing all biopython dependencies In-Reply-To: References: <20140310170217.048f00a1@lnx> Message-ID: <20140310173320.228a7866@lnx> On Mon, 10 Mar 2014 17:20:43 +0000 Peter Cock wrote: > It would be neat to have an instance of your docker based > VM running as a buildslave too... > http://testing.open-bio.org/biopython/tgrid That was my original objective, which I have split into two: 1. A biopython docker container 2.
A buildbot docker container for biopython (a different kind of beast) And then research how this might integrate with CloudBioLinux. As an aside, I have to say that using docker is progressing quite well and it seems a very interesting platform for deployment and testing. > Others (where the licence permits) we can request DebianMed/ > BioLinux look at for packaging... From the problematic list, I will gather a list of software whose license permits packaging and report back on this. Tiago From tra at popgen.net Tue Mar 11 12:04:13 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 12:04:13 +0000 Subject: [Biopython-dev] test_Fasttree_tool Message-ID: <20140311120413.73bbac2e@lnx> Hi, When I run test_Fasttree_tool standalone, all goes well. But if I run it through run_tests.py I get this: ====================================================================== FAIL: runTest (__main__.ComparisonTestCase) test_Fasttree_tool ---------------------------------------------------------------------- Traceback (most recent call last): File "run_tests.py", line 302, in runTest self.fail("Warning: Can't open %s for test %s" % (outputfile, self.name)) AssertionError: Warning: Can't open ./output/test_Fasttree_tool for test test_Fasttree_tool ---------------------------------------------------------------------- Any ideas?
Thanks, T From harijay at gmail.com Tue Mar 11 13:37:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 09:37:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Hi all, I just pull-ed from the git repository just now and after installing the newest numpy and scipy ( also from their respective git repos)..when I try to install biopython I get the same error complaining that I need to define : #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] I tried adding to file "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" the following line #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION But it still fails to install with an error as indicated below. I am sorry I dont know how to work around this. Thanks for your help Hari ################# error message ################# In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: warning: "Using deprecated NumPy API, disable it by #defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] #warning "Using deprecated NumPy API, disable it by #defining ... ^ /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:23:2: error: Should never include npy_deprecated_api directly. #error Should never include npy_deprecated_api directly. 
^ In file included from Bio/Cluster/clustermodule.c:3: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:15: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1725: In file included from /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:126: /usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h:6:2: error: The header "old_defines.h" is deprecated as of NumPy 1.7. #error The header "old_defines.h" is deprecated as of NumPy 1.7. ^ 1 warning and 2 errors generated. error: command 'cc' failed with exit status 1 On Thu, Dec 26, 2013 at 5:28 AM, Michiel de Hoon wrote: > Fixed; please let us know if you encounter any problems. > > -Michiel. > > > > -------------------------------------------- > On Mon, 9/23/13, Peter Cock wrote: > > Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings > To: "Biopython-Dev Mailing List" > Date: Monday, September 23, 2013, 4:58 PM > > Hi all, > > I'm seeing the following warning from NumPy 1.7 with Python > 3.3 on Mac > OS X, and on Linux too. 
I believe the NumPy version is the > critical > factor: > > building 'Bio.Cluster.cluster' extension > building 'Bio.KDTree._CKDTree' extension > building 'Bio.Motif._pwm' extension > building 'Bio.motifs._pwm' extension > > all give: > > > /Users/peterjc/lib/python3.3/site-packages/numpy/core/include/numpy/npy_deprecated_api.h:11:2: > warning: "Using > deprecated NumPy API, disable it by > #defining > NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] > > According to this page, > http://docs.scipy.org/doc/numpy-dev/reference/c-api.deprecations.html > > If we add this line it should confirm our code is clean for > NumPy 1.7 > (and implies to side effects on older NumPy): > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > Unfortunately that seems all four modules have problems > doing > that, presumably planned NumPy C API changes we need to > handle via a version conditional #ifdef? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Tue Mar 11 13:42:55 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:42:55 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > Hi all, > I just pull-ed from the git repository just now and after installing the > newest numpy and scipy ( also from their respective git repos)..when I try > to install biopython I get the same error complaining that I need to define > : > > #defining NPY_NO_DEPRECATED_API > NPY_1_7_API_VERSION" [-W#warnings] > > I tried adding to file > 
"/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > the following line > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > But it still fails to install with an error as indicated below. > > I am sorry I dont know how to work around this. > Thanks for your help > > Hari I suspect based on this NumPy thread that it is a problem with your NumPy install, perhaps you have some old files from a previous NumPy installation which are confusing things? http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html Peter From p.j.a.cock at googlemail.com Tue Mar 11 13:52:47 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 11 Mar 2014 13:52:47 +0000 Subject: [Biopython-dev] test_Fasttree_tool In-Reply-To: <20140311120413.73bbac2e@lnx> References: <20140311120413.73bbac2e@lnx> Message-ID: On Tue, Mar 11, 2014 at 12:04 PM, Tiago Antao wrote: > Hi, > > When I run test_Fasttree_tool standalone, all goes well. But if I run > it through run_test.py I get this: > ====================================================================== > FAIL: runTest (__main__.ComparisonTestCase) > test_Fasttree_tool > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "run_tests.py", line 302, in runTest > self.fail("Warning: Can't open %s for test %s" % (outputfile, > self.name)) AssertionError: Warning: Can't > open ./output/test_Fasttree_tool for test test_Fasttree_tool > > ---------------------------------------------------------------------- > > > Any ideas? 
> > Thanks, > T I'm surprised it ever works - the expected output file is not in git :( Try: $ run_tests.py -g test_Fasttree_tool $ more output/test_Fasttree_tool $ git add output/test_Fasttree_tool $ git commit -m "Checking in missing output file for test_Fasttree_tool.py" Peter From tra at popgen.net Tue Mar 11 14:38:53 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 11 Mar 2014 14:38:53 +0000 Subject: [Biopython-dev] A docker container for Biopython Message-ID: <20140311143853.054f89fe@lnx> Hi, In a effort to have a complete, reliable and easy to replicate testing platform for Biopython I am in the process of creating a docker container (inspired by Brad's CloudBioLinux work) with everything needed for Biopython. I currently have a container that allows easy installation of Biopython. I have documented the process here: http://fe.popgen.net/2014/03/a-docker-container-for-biopython/ A few points: 1. A few applications still missing, not many 2. The fasttree test case is still failing 3. Database servers are included 4. This can be used to do a very fast deploy of Biopython (teaching, demo, etc...) 5. The container to test biopython (buildbot based) will be a different one (and probably only of interest to Peter and me ;) ) This is my first container, problems & suggestions most welcome! Tiago From harijay at gmail.com Wed Mar 12 00:17:39 2014 From: harijay at gmail.com (hari jayaram) Date: Tue, 11 Mar 2014 20:17:39 -0400 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Thanks Peter .. That was indeed the case. I had a python in /usr/local/lib/python2.7/site-packages/numpy That was getting called rather than the one in my .virtualenv Once I removed that python . 
The install progressed very smoothly Thanks for your help Hari On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock wrote: > On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: > > Hi all, > > I just pull-ed from the git repository just now and after installing the > > newest numpy and scipy ( also from their respective git repos)..when I > try > > to install biopython I get the same error complaining that I need to > define > > : > > > > #defining NPY_NO_DEPRECATED_API > > NPY_1_7_API_VERSION" [-W#warnings] > > > > I tried adding to file > > > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" > > the following line > > > > > > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION > > > > > > But it still fails to install with an error as indicated below. > > > > I am sorry I dont know how to work around this. > > Thanks for your help > > > > Hari > > I suspect based on this NumPy thread that it is a problem with > your NumPy install, perhaps you have some old files from a > previous NumPy installation which are confusing things? > > http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html > > Peter > From p.j.a.cock at googlemail.com Wed Mar 12 00:25:33 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 00:25:33 +0000 Subject: [Biopython-dev] NumPy 1.7 and NPY_NO_DEPRECATED_API warnings In-Reply-To: References: <1388053712.20811.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Great - thanks for letting us know that solved the problem :) Peter On Wed, Mar 12, 2014 at 12:17 AM, hari jayaram wrote: > Thanks Peter .. > That was indeed the case. I had a python in > > /usr/local/lib/python2.7/site-packages/numpy > > That was getting called rather than the one in my .virtualenv > > Once I removed that python . 
The install progressed very smoothly > > Thanks for your help > > Hari > > > > On Tue, Mar 11, 2014 at 9:42 AM, Peter Cock > wrote: >> >> On Tue, Mar 11, 2014 at 1:37 PM, hari jayaram wrote: >> > Hi all, >> > I just pull-ed from the git repository just now and after installing >> > the >> > newest numpy and scipy ( also from their respective git repos)..when I >> > try >> > to install biopython I get the same error complaining that I need to >> > define >> > : >> > >> > #defining NPY_NO_DEPRECATED_API >> > NPY_1_7_API_VERSION" [-W#warnings] >> > >> > I tried adding to file >> > >> > "/usr/local/lib/python2.7/site-packages/numpy/core/include/numpy/old_defines.h" >> > the following line >> > >> > >> > #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION >> > >> > >> > But it still fails to install with an error as indicated below. >> > >> > I am sorry I dont know how to work around this. >> > Thanks for your help >> > >> > Hari >> >> I suspect based on this NumPy thread that it is a problem with >> your NumPy install, perhaps you have some old files from a >> previous NumPy installation which are confusing things? >> >> http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069396.html >> >> Peter > > From p.j.a.cock at googlemail.com Wed Mar 12 09:48:44 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:48:44 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 Message-ID: Hi all, I installed the Xcode 5.1 update last night on my work Mac, and this seems to have broken the builds on Python 2.6 and 2.7 (run via builtbot). http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio ... 
running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.6 creating build/temp.macosx-10.9-intel-2.6/Bio cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio ... running build_ext building 'Bio.cpairwise2' extension creating build/temp.macosx-10.9-intel-2.7 creating build/temp.macosx-10.9-intel-2.7/Bio cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c Bio/cpairwise2module.c -o build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o clang: error: unknown argument: '-mno-fused-madd' [-Wunused-command-line-argument-hard-error-in-future] clang: note: this will be a hard error (cannot be downgraded to a warning) in the future error: command 'cc' failed with exit status 1 This looks like a problem where distutils is using a gcc argument which cc (clang) used to ignore but now treats as an error.
There will probably be similar reports on other Python projects as well... Peter From p.j.a.cock at googlemail.com Wed Mar 12 09:59:31 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 09:59:31 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: > Hi all, > > I installed the Xcode 5.1 update last night on my work Mac, and > this seems to have broken the builds on Python 2.6 and 2.7 > (run via builtbot). > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > ... > running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.6 > creating build/temp.macosx-10.9-intel-2.6/Bio > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > ... 
> running build_ext > building 'Bio.cpairwise2' extension > creating build/temp.macosx-10.9-intel-2.7 > creating build/temp.macosx-10.9-intel-2.7/Bio > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > -c Bio/cpairwise2module.c -o > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > clang: error: unknown argument: '-mno-fused-madd' > [-Wunused-command-line-argument-hard-error-in-future] > clang: note: this will be a hard error (cannot be downgraded to a > warning) in the future > error: command 'cc' failed with exit status 1 > > This looks like a problem where distutils is using a gcc argument > which cc (clang) used to ignore but not treats as an error. There > will probably be similar reports on other Python projects as well... > > Peter This looks relevant, especially this reply from Paul Kehrer which suggests this is entirely Apple's fault for shipping a Python and clang compiler which don't get along with the default settings: http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure The suggested workaround seems to do the trick, $ export CFLAGS=-Qunused-arguments $ export CPPFLAGS=-Qunused-arguments Perhaps we can add this hack to our setup.py on Mac OS X... it seems harmless under gcc (e.g. my locally compiled version of Python 3.3 used gcc rather than clang)? Or it could be done via the buildbot setup, or on this buildslave directly (e.g. the ~/.bash_profile). What are folks' thoughts on this? We want it to remain easy to install Biopython from source under Mac OS X. 
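One way the suggested workaround could be expressed in setup.py is sketched below. This is only an illustration of the idea, not actual Biopython code: the helper name is invented, and it simply applies the two exports above programmatically when building on Mac OS X (Darwin), where Apple's clang otherwise rejects gcc-only flags such as -mno-fused-madd:

```python
import os
import sys

def append_clang_workaround(environ, platform=sys.platform):
    """Hypothetical helper: on Mac OS X, append -Qunused-arguments to
    CFLAGS/CPPFLAGS so clang ignores unknown gcc flags instead of
    treating them as hard errors. Name and approach are a sketch only."""
    if platform != "darwin":
        return environ  # harmless no-op elsewhere
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            environ[var] = (flags + " -Qunused-arguments").strip()
    return environ

# In a real setup.py this would run against os.environ before build_ext:
# append_clang_workaround(os.environ)
```

Setting the variables in the buildslave's environment (e.g. via ~/.bash_profile) would have the same effect without touching setup.py.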
Peter From tra at popgen.net Wed Mar 12 13:09:20 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:09:20 +0000 Subject: [Biopython-dev] Logging in on the wiki Message-ID: <20140312130920.701a656e@lnx> Hi, Are people able to log in on the wiki? I am getting back a page with: " Google Error: invalid_request Error in parsing the OpenID auth request. Learn more" Maybe its a google thing, but it might be on our side? Tiago From w.arindrarto at gmail.com Wed Mar 12 13:15:45 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 12 Mar 2014 14:15:45 +0100 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: <20140312130920.701a656e@lnx> References: <20140312130920.701a656e@lnx> Message-ID: Hi Tiago, I can log in using my Google OpenID. Best, Bow On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: > Hi, > > Are people able to log in on the wiki? I am getting back a page with: > > " > Google > Error: invalid_request > Error in parsing the OpenID auth request. > Learn more" > > Maybe its a google thing, but it might be on our side? > > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 12 13:19:45 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 12 Mar 2014 13:19:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: I too can log in to the wiki with my Google OpenID. (Probably unrelated, we had to restart MySQL on the server earlier this week) Peter On Wed, Mar 12, 2014 at 1:15 PM, Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > > Best, > Bow > > On Wed, Mar 12, 2014 at 2:09 PM, Tiago Antao wrote: >> Hi, >> >> Are people able to log in on the wiki? 
I am getting back a page with: >> >> " >> Google >> Error: invalid_request >> Error in parsing the OpenID auth request. >> Learn more" >> >> Maybe its a google thing, but it might be on our side? >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From tra at popgen.net Wed Mar 12 13:23:45 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 13:23:45 +0000 Subject: [Biopython-dev] Logging in on the wiki In-Reply-To: References: <20140312130920.701a656e@lnx> Message-ID: <20140312132345.3b577a47@lnx> On Wed, 12 Mar 2014 14:15:45 +0100 Wibowo Arindrarto wrote: > Hi Tiago, > > I can log in using my Google OpenID. > Thanks. I also can login now. I suppose it was something temporary on the google side... Tiago From cjfields at illinois.edu Wed Mar 12 14:46:15 2014 From: cjfields at illinois.edu (Fields, Christopher J) Date: Wed, 12 Mar 2014 14:46:15 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Mar 12, 2014, at 4:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock wrote: >> Hi all, >> >> I installed the Xcode 5.1 update last night on my work Mac, and >> this seems to have broken the builds on Python 2.6 and 2.7 >> (run via builtbot). >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.6 >> creating build/temp.macosx-10.9-intel-2.6/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common >> -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX >> -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g >> -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 >> -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio >> ... 
>> running build_ext >> building 'Bio.cpairwise2' extension >> creating build/temp.macosx-10.9-intel-2.7 >> creating build/temp.macosx-10.9-intel-2.7/Bio >> cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 >> -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd >> -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes >> -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes >> -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe >> -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 >> -c Bio/cpairwise2module.c -o >> build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o >> clang: error: unknown argument: '-mno-fused-madd' >> [-Wunused-command-line-argument-hard-error-in-future] >> clang: note: this will be a hard error (cannot be downgraded to a >> warning) in the future >> error: command 'cc' failed with exit status 1 >> >> This looks like a problem where distutils is using a gcc argument >> which cc (clang) used to ignore but not treats as an error. There >> will probably be similar reports on other Python projects as well... >> >> Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. 
> > Peter That's scary. I planned on updating to the latest Xcode myself today, nice to be forewarned. I've been seeing clang complaints with various tools already, so I wouldn't be surprised if this problem is more wide-spread than python. chris From tra at popgen.net Wed Mar 12 15:10:48 2014 From: tra at popgen.net (Tiago Antao) Date: Wed, 12 Mar 2014 15:10:48 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container Message-ID: <20140312151048.45066ade@lnx> Hi, I have a docker container ready (save for a few applications). Simple usage instructions: 1. Create a directory and download inside this file: https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test 2. Rename it Dockerfile (capital D) 3. Get a buildbot username and password (from Peter or me), edit the file and replace CHANGEUSER CHANGEPASS 4. do docker build -t biopython-buildbot . 5. do docker run biopython-buildbot Beta-version, comments appreciated ;) If people like this, I will amend the Continuous Integration page on the wiki accordingly Tiago From eparker at ucdavis.edu Thu Mar 13 00:06:51 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 12 Mar 2014 17:06:51 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here Message-ID: Hello, My name is Evan Parker, I am a third year graduate student studying analytical chemistry at UC Davis. Coding was my hobby in undergrad and has become a major component of my current graduate work in the context of mass-spectral interpretation software. I use Biopython for parsing Uniprot sequence data/annotations and I would be delighted to have the opportunity to give back, especially under the umbrella of the Google Summer of Code. The project on implementing an indexing & lazy-loading sequence parser looks interesting to me and, while difficult, it is something that I could wrap my mind around.
I apologize in advance for the wall of text but if you have the time I'd like to ask a couple of questions relating to implementation as I prepare my proposal. 1) Should the lazy loading be done primarily in the context of records returned from the SeqIO.index() dict-like object, or should the lazy loading be available to the generator made by SeqIO.parse()? The project idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems to me that the best implementation of lazy loading in these two SeqIO functions would be significantly different. My initial impression of the project would be for SeqIO.parse() to stage a file segment and selectively generate information when called while SeqIO.index() would use a more detailed map created at instantiation to pull information selectively. 2) Is slower instantiation an acceptable trade-off for memory efficiency? In the current implementation of SeqIO.index(), sequence files are read twice, once to annotate beginning points of entries and a second time to load the SeqRecord requested by __getitem__(). A lazy-loading parser could amplify this issue if it works by indexing locations other than the start of the record. The alternative approach of passing the complete textual sequence record and selectively parsing would be easier to implement (and would include dual compatibility with parse and index) but it seems that it would be slower when called and potentially less memory efficient. Any of your thoughts and comments are appreciated, - Evan From w.arindrarto at gmail.com Thu Mar 13 09:04:16 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 13 Mar 2014 10:04:16 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Thank you for your interest in the project :). It's good to know you're already quite familiar with SeqIO as well. My replies are below. 
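A toy sketch of the lazy-loading idea under discussion (all names here are hypothetical, not Biopython's actual API): index a FASTA file by recording only the byte offset of each record during a cheap first pass, and defer parsing until the sequence is first accessed.

```python
# Hypothetical sketch, not Biopython's actual API: a lazy FASTA record that
# stores only a file offset when indexed and parses on first .seq access.
from io import StringIO

class LazyFastaRecord:
    def __init__(self, handle, offset):
        self._handle = handle  # shared, seekable file-like object
        self._offset = offset  # offset of this record's '>' header
        self._seq = None       # filled in lazily, then cached
        self.id = None

    @property
    def seq(self):
        if self._seq is None:  # first access: seek back and parse
            self._handle.seek(self._offset)
            header = self._handle.readline()
            self.id = header[1:].split()[0]
            lines = []
            for line in self._handle:
                if line.startswith(">"):  # start of the next record
                    break
                lines.append(line.strip())
            self._seq = "".join(lines)
        return self._seq

def lazy_index(handle):
    """Cheap first pass: remember where each record starts, parse nothing."""
    index, offset = {}, 0
    for line in iter(handle.readline, ""):
        if line.startswith(">"):
            index[line[1:].split()[0]] = LazyFastaRecord(handle, offset)
        offset += len(line)
    return index

records = lazy_index(StringIO(">a some description\nACGT\nACGT\n>b\nTTTT\n"))
print(records["a"].seq)  # parsing happens here, not during indexing -> ACGTACGT
```

A SeqIO.parse()-style generator could yield such proxies one by one, while a SeqIO.index()-style mapping could keep a richer per-record map; the deferral mechanism is the same either way.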
> 1) Should the lazy loading be done primarily in the context of records > returned from the SeqIO.index() dict-like object, or should the lazy > loading be available to the generator made by SeqIO.parse()? The project > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it seems > to me that the best implementation of lazy loading in these two SeqIO > functions would be significantly different. My initial impression of the > project would be for SeqIO.parse() to stage a file segment and selectively > generate information when called while SeqIO.index() would use a more > detailed map created at instantiation to pull information selectively. We don't necessarily have to be restricted to SeqIO.index() objects here. You'll notice of course that SeqIO.index() indexes complete records, without finer granularity down to their subsequences. What we're looking for is compatibility with our existing SeqIO parsers. The lazy parser may well be a new object implemented alongside SeqIO, but the parsing logic itself (the one whose invocation is delayed by the lazy parser) should rely on existing parsers. > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > In the current implementation of SeqIO.index(), sequence files are read > twice, once to annotate beginning points of entries and a second time to > load the SeqRecord requested by __getitem__(). A lazy-loading parser could > amplify this issue if it works by indexing locations other than the start > of the record. The alternative approach of passing the complete textual > sequence record and selectively parsing would be easier to implement (and > would include dual compatibility with parse and index) but it seems that it > would be slower when called and potentially less memory efficient.
Coming up with this, we expect, is an important part of the project implementation. Doing a first pass for indexing is acceptable. Instantiation of the object using the index doesn't necessarily have to be slow. Retrieval of the actual (sub)sequence will be slower since we will touch the disk and do the actual parsing by then. But this can also be improved, perhaps by caching the result so subsequent retrieval is faster. One important point (and the use case that we envision for this project) is that subsequences in large sequence files (genome assemblies, for example) can be retrieved quite quickly. Take a look at some existing indexing implementations, such as faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] tool may also help. The faidx indexing, for example, relies on the FASTA file having the same line length, which means it can be used to retrieve subsequences given only the file offset of a FASTA record. Hope this gives you some useful hints. Good luck with your proposal :). Cheers, Bow [1] http://samtools.sourceforge.net/samtools.shtml [2] http://samtools.github.io/hts-specs/SAMv1.pdf [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 From eparker at ucdavis.edu Thu Mar 13 19:04:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Thu, 13 Mar 2014 12:04:34 -0700 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Thank you Bow, I'll need to digest this a bit, but you have given me a good direction. My inclination for the proposal is to focus on sequential file formats used to transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and others) and to mostly ignore formats used to convey alignment (ie. anything covered exclusively by parsers in AlignIO). If this is a poor direction please tell me so that I can add to my preparation. -Evan Evan Parker Ph.D. Candidate Dept. 
of Chemistry - Lebrilla Lab University of California, Davis On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto wrote: > Hi Evan, > > Thank you for your interest in the project :). It's good to know > you're already quite familiar with SeqIO as well. > > My replies are below. > > > 1) Should the lazy loading be done primarily in the context of records > > returned from the SeqIO.index() dict-like object, or should the lazy > > loading be available to the generator made by SeqIO.parse()? The project > > idea in the wiki mentions adding lazy-loading to SeqIO.parse() but it > seems > > to me that the best implementation of lazy loading in these two SeqIO > > functions would be significantly different. My initial impression of the > > project would be for SeqIO.parse() to stage a file segment and > selectively > > generate information when called while SeqIO.index() would use a more > > detailed map created at instantiation to pull information selectively. > > We don't necessarily have to be restricted to SeqIO.index() objects > here. You'll notice of course that SeqIO.index() indexes complete > records without granularity up to the possible subsequences. What > we're looking for is compatibility with our existing SeqIO parsers. > The lazy parser may well be a new object implemented alongside SeqIO, > but the parsing logic itself (the one whose invocation is delayed by > the lazy parser) should rely on existing parsers. > > > 2) Is slower instantiation an acceptable trade-off for memory efficiency? > > In the current implementation of SeqIO.index(), sequence files are read > > twice, once to annotate beginning points of entries and a second time to > > load the SeqRecord requested by __getitem__(). A lazy-loading parser > could > > amplify this issue if it works by indexing locations other than the start > > of the record. 
The alternative approach of passing the complete textual > > sequence record and selectively parsing would be easier to implement (and > > would include dual compatibility with parse and index) but it seems that > it > > would be slower when called and potentially less memory efficient. > > I think this will depend on what you want to store in the indices and > how you store them, which will most likely differ per sequencing file > format. Coming up with this, we expect, is an important part of the > project implementation. Doing a first pass for indexing is acceptable. > Instantiation of the object using the index doesn't necessarily have > to be slow. Retrieval of the actual (sub)sequence will be slower since > we will touch the disk and do the actual parsing by then. But this can > also be improved, perhaps by caching the result so subsequent > retrieval is faster. One important point (and the use case that we > envision for this project) is that subsequences in large sequence > files (genome assemblies, for example) can be retrieved quite quickly. > > Take a look at some existing indexing implementations, such as > faidx[1] for FASTA files and BAM indexing[2]. Looking at the tabix[3] > tool may also help. The faidx indexing, for example, relies on the > FASTA file having the same line length, which means it can be used to > retrieve subsequences given only the file offset of a FASTA record. > > Hope this gives you some useful hints. Good luck with your proposal :). > > Cheers, > Bow > > [1] http://samtools.sourceforge.net/samtools.shtml > [2] http://samtools.github.io/hts-specs/SAMv1.pdf > [3] http://bioinformatics.oxfordjournals.org/content/27/5/718 > From w.arindrarto at gmail.com Fri Mar 14 05:30:13 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 14 Mar 2014 06:30:13 +0100 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: Hi Evan, Focusing on the SeqIO parsers is ok. 
That's where having lazy parsers would help most (and you've got a handful of formats there already). Remember that you'll also need to account for time to write tests, possibly benchmark or profile the code (lazy parsers should improve performance after all), and write documentation, outside of writing the code itself. You'll also want to be clear about this in your proposed timeline, since that will be your main guide during the coding period. Looking forward to reading your proposal :), Bow On Thu, Mar 13, 2014 at 8:04 PM, Evan Parker wrote: > Thank you Bow, > > I'll need to digest this a bit, but you have given me a good direction. My > inclination for the proposal is to focus on sequential file formats used to > transmit 'databases' of sequences (like fasta, embl, uniprot-xml, swiss, and > others) and to mostly ignore formats used to convey alignment (ie. anything > covered exclusively by parsers in AlignIO). If this is a poor direction > please tell me so that I can add to my preparation. > > -Evan > > Evan Parker > Ph.D. Candidate > Dept. of Chemistry - Lebrilla Lab > University of California, Davis > > > On Thu, Mar 13, 2014 at 2:04 AM, Wibowo Arindrarto > wrote: >> ... > > From p.j.a.cock at googlemail.com Fri Mar 14 13:34:40 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 14 Mar 2014 13:34:40 +0000 Subject: [Biopython-dev] Gsoc 2014: another aspirant here In-Reply-To: References: Message-ID: On Fri, Mar 14, 2014 at 5:30 AM, Wibowo Arindrarto wrote: > ... > Looking forward to reading your proposal :), > Bow Yes, profiling will be important here - if your script accesses all the annotation/sequence/etc of a record, then the lazy parser will probably be slower (all the same work, plus an overhead).
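The faidx-style scheme Bow pointed to earlier in the thread is worth a concrete illustration, since fixed line lengths are what make cheap subsequence access possible. A hedged toy version (not the samtools implementation; the function and its parameters are invented here):

```python
# Illustrative faidx-style lookup, not the actual samtools code: with a fixed
# number of bases per line, the file offset of any base is pure arithmetic.
from io import StringIO

def subsequence(handle, seq_offset, line_bases, line_bytes, start, length):
    """Fetch `length` bases from 0-based `start`, skipping embedded newlines."""
    handle.seek(seq_offset + (start // line_bases) * line_bytes + start % line_bases)
    bases = []
    while len(bases) < length:
        ch = handle.read(1)
        if not ch:          # ran off the end of the file
            break
        if ch != "\n":      # newline characters are not bases
            bases.append(ch)
    return "".join(bases)

# Toy FASTA with 4 bases per line (5 bytes per line including the newline):
fasta = StringIO(">chr1\nACGT\nTTAA\nGGCC\n")
seq_offset = len(">chr1\n")  # sequence body starts right after the header
print(subsequence(fasta, seq_offset, line_bases=4, line_bytes=5, start=5, length=6))
# -> TAAGGC, without ever reading the whole record
```

A real .fai index stores exactly these per-record numbers (offset, bases per line, bytes per line), which is why retrieval needs no scan of the record.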
It should win when only a subset of the data is needed, both in terms of speed and memory usage. Peter From eric.talevich at gmail.com Sat Mar 15 05:29:21 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 14 Mar 2014 22:29:21 -0700 Subject: [Biopython-dev] Google Summer of Code 2014: Call for student applications Message-ID: Hi everyone, Google Summer of Code is an annual program that funds students all over the world to work with open-source software projects to develop new code. This summer, the Open Bioinformatics Foundation (OBF) is taking on students through the Google Summer of Code program to work with mentors on established bioinformatics software projects including BioPython. We invite students to submit applications by Friday, March 21. Full details are here: http://news.open-bio.org/news/2014/03/obf-gsoc-2014-call-for-student-applications/ All the best, Eric & Raoul OBF GSoC organization admins From arklenna at gmail.com Sun Mar 16 20:53:22 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Sun, 16 Mar 2014 16:53:22 -0400 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 9:48 AM, Peter Cock > wrote: > > Hi all, > > > > I installed the Xcode 5.1 update last night on my work Mac, and > > this seems to have broken the builds on Python 2.6 and 2.7 > > (run via buildbot). > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.6/builds/174/steps/compile/logs/stdio > > ...
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.6 > > creating build/temp.macosx-10.9-intel-2.6/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -g -Os -pipe -fno-common > > -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX > > -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g > > -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 > > -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.6/include/python2.6 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.6/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > > http://testing.open-bio.org/biopython/builders/OS%20X%20-%20Python%202.7/builds/170/steps/compile/logs/stdio > > ... 
> > running build_ext > > building 'Bio.cpairwise2' extension > > creating build/temp.macosx-10.9-intel-2.7 > > creating build/temp.macosx-10.9-intel-2.7/Bio > > cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 > > -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd > > -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes > > -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes > > -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe > > > -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 > > -c Bio/cpairwise2module.c -o > > build/temp.macosx-10.9-intel-2.7/Bio/cpairwise2module.o > > clang: error: unknown argument: '-mno-fused-madd' > > [-Wunused-command-line-argument-hard-error-in-future] > > clang: note: this will be a hard error (cannot be downgraded to a > > warning) in the future > > error: command 'cc' failed with exit status 1 > > > > This looks like a problem where distutils is using a gcc argument > > which cc (clang) used to ignore but now treats as an error. There > > will probably be similar reports on other Python projects as well... > > > > Peter > > This looks relevant, especially this reply from Paul Kehrer which > suggests this is entirely Apple's fault for shipping a Python and > clang compiler which don't get along with the default settings: > > > http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure > > The suggested workaround seems to do the trick, > > $ export CFLAGS=-Qunused-arguments > $ export CPPFLAGS=-Qunused-arguments > > I encountered the same problem (clean install of Mavericks, vanilla Python, latest Xcode from App Store). One answer [1] suggests this is not a guaranteed solution but offers a different flag (which I did not test). I chose to edit system python files [2] which is definitely not the best option for most users.
[1]: http://stackoverflow.com/a/22315129 [2]: http://stackoverflow.com/a/22322068 > Perhaps we can add this hack to our setup.py on Mac OS X... > it seems harmless under gcc (e.g. my locally compiled version > of Python 3.3 used gcc rather than clang)? > Do you mean editing environment variables with `os.environ`? I don't know enough about the details of how packages are built to know what will work with both compiling from source, easy_install, pip, etc. > > Or it could be done via the buildbot setup, or on this buildslave > directly (e.g. the ~/.bash_profile). > It's a dilemma, because asking users to edit their .bashrc or .bash_profile before installation is annoying and easy to overlook, but modifying them in setup.py feels hacky (i.e. how long will this solution work?). Crossing my fingers and hoping Apple fixes this in an update... > > What are folks' thoughts on this? We want it to remain easy > to install Biopython from source under Mac OS X. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Sun Mar 16 21:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 16 Mar 2014 21:15:06 +0000 Subject: [Biopython-dev] Problem on Mac OS X since update Xcode 5.1 In-Reply-To: References: Message-ID: On Sun, Mar 16, 2014 at 8:53 PM, Lenna Peterson wrote: > On Wed, Mar 12, 2014 at 5:59 AM, Peter Cock > wrote: >> >> ... 
>> >> This looks relevant, especially this reply from Paul Kehrer which >> suggests this is entirely Apple's fault for shipping a Python and >> clang compiler which don't get along with the default settings: >> >> >> http://stackoverflow.com/questions/22313407/clang-error-unknown-argument-mno-fused-madd-psycopg2-installation-failure >> > >> The suggested workaround seems to do the trick, >> >> $ export CFLAGS=-Qunused-arguments >> $ export CPPFLAGS=-Qunused-arguments >> > > I encountered the same problem (clean install of Mavericks, vanilla Python, > latest XCode from App Store). > > One answer [1] suggests this is not a guaranteed solution but offers a > different flag (which I did not test). > > I chose to edit system python files [2] which is definitely not the best > option for most users. > > [1]: http://stackoverflow.com/a/22315129 > [2]: http://stackoverflow.com/a/22322068 > >> Perhaps we can add this hack to our setup.py on Mac OS X... >> it seems harmless under gcc (e.g. my locally compiled version >> of Python 3.3 used gcc rather than clang)? > > Do you mean editing environment variables with `os.environ`? I don't know > enough about the details of how packages are built to know what will work > with both compiling from source, easy_install, pip, etc. Yes, I was thinking about editing the environment variables in setup.py via the os module. I agree there are potential risks with 3rd party installers, but adding -Qunused-arguments to any existing CFLAGS (within the scope of the Biopython install) is hopefully low risk... >> Or it could be done via the buildbot setup, or on this buildslave >> directly (e.g. the ~/.bash_profile). > > It's a dilemma, because asking users to edit their .bashrc or .bash_profile > before installation is annoying and easy to overlook, but modifying them in > setup.py feels hacky (i.e. how long will this solution work?). Crossing my > fingers and hoping Apple fixes this in an update... 
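For concreteness, the setup.py idea being discussed (patching CFLAGS/CPPFLAGS via os.environ) might look like the following hedged sketch; the function name and the injectable environ argument are illustrative, not Biopython's actual setup.py:

```python
# Hedged sketch of the setup.py idea, not Biopython's actual code: append
# -Qunused-arguments to CFLAGS/CPPFLAGS on Mac OS X only, keeping whatever
# flags the user has already exported.
import os
import sys

def patch_compiler_flags(environ=os.environ, platform=sys.platform):
    if not platform.startswith("darwin"):
        return  # only Apple's clang needs the workaround
    for var in ("CFLAGS", "CPPFLAGS"):
        flags = environ.get(var, "")
        if "-Qunused-arguments" not in flags:
            environ[var] = (flags + " -Qunused-arguments").strip()

# Example with a plain dict standing in for os.environ:
env = {"CFLAGS": "-O2"}
patch_compiler_flags(environ=env, platform="darwin")
print(env["CFLAGS"])  # -> -O2 -Qunused-arguments
```

Appending rather than overwriting keeps any flags the user already set, which is what makes the hack comparatively low risk.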
> Fingers crossed Apple pushes another update in the next few weeks to resolve this... Peter From anaryin at gmail.com Mon Mar 17 16:05:04 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:05:04 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Dear all, I created a new 'empty' branch called bio.struct on my github account. The only change from the master branch is a new folder - Bio/Struct - that has an empty __init__.py in there. Please add issues with feature requests, and if you are willing to start coding, I'd say fork and go ahead! https://github.com/JoaoRodrigues/biopython/tree/bio.struct I also added a small wiki page with a description. 2014-02-20 0:05 GMT+01:00 Morten Kjeldgaard : > > On 19/02/2014, at 17:35, David Cain wrote: > > > I frequently make use of Bio.PDB, and agree wholeheartedly that certain > > aspects of it are very dated, or haphazardly organized. > > > > The module as a whole would benefit greatly from some extra attention. I'm > > happy to lend a hand in whatever revamp takes place. > > I second that. I am also willing to participate in this project! > > Cheers, > Morten > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From p.j.a.cock at googlemail.com Mon Mar 17 16:15:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:15:06 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:05 PM, João Rodrigues wrote: > Dear all, > > I created a new 'empty' branch called bio.struct on my github account. The > only change from the master branch is a new folder - Bio/Struct - that has > an empty __init__.py in there. Please add issues with feature requests, and > if you are willing to start coding, I'd say fork and go ahead!
> > https://github.com/JoaoRodrigues/biopython/tree/bio.struct > > I also added a small wiki page with a > description. Are we all generally in favour of lower case for new module names (as per PEP8)? i.e. Bio/struct not Bio/Struct ? Peter From anaryin at gmail.com Mon Mar 17 16:19:31 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:19:31 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: Hello Peter, Sorry, typo actually, wrote with small case everywhere but the module name... thanks. Also something I have in mind. Should wrappers for NACCESS and DSSP be refactored to use Bio.Application? From p.j.a.cock at googlemail.com Mon Mar 17 16:32:30 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 16:32:30 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: On Mon, Mar 17, 2014 at 4:19 PM, João Rodrigues wrote: > Hello Peter, > > Sorry, typo actually, wrote with small case everywhere but the module name... > thanks. > > Also something I have in mind. Should wrappers for NACCESS and > DSSP be refactored to use Bio.Application? If you think it would help, sure. From anaryin at gmail.com Mon Mar 17 16:33:55 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Mon, 17 Mar 2014 17:33:55 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: Message-ID: To be honest, more of an issue of internal consistency with the rest of the code base. I'd have to look into it more carefully to see if it fits... 2014-03-17 17:32 GMT+01:00 Peter Cock : > On Mon, Mar 17, 2014 at 4:19 PM, João Rodrigues wrote: > > Hello Peter, > > > > Sorry, typo actually, wrote with small case everywhere but the module > name... > > thanks. > > > > Also something I have in mind. Should wrappers for NACCESS and > > DSSP be refactored to use Bio.Application? > > If you think it would help, sure.
> > Peter > From tra at popgen.net Mon Mar 17 16:53:52 2014 From: tra at popgen.net (Tiago Antao) Date: Mon, 17 Mar 2014 16:53:52 +0000 Subject: [Biopython-dev] Dialign2 testing... Message-ID: <20140317165352.36db07ee@lnx> Hi, Still on the quest for a test run that actually runs all the tests. Can someone suggest what would be a sensible value for DIALIGN2-DIR? It seems that setting up the test is not trivial: there seems to be a need for a BLOSUM file inside the dialign directory? Any clues would be appreciated... From p.j.a.cock at googlemail.com Mon Mar 17 18:35:25 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 17 Mar 2014 18:35:25 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi all, Bow (regarding SearchIO) and others should probably read this... I've commented, see also: http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html Peter ---------- Forwarded message ---------- From: Maloney, Christopher (NIH/NLM/NCBI) [C] Date: Mon, Mar 17, 2014 at 5:17 PM Subject: [Open-bio-l] Proposed BLAST XML Changes To: "open-bio-l at lists.open-bio.org" We are not directly soliciting comments, but if anyone would like to make any technical or programmatic suggestions, there is a link from which anyone may comment in the document. ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf Thank you. P.S. Please re-post this to other lists that might have interested readers. Chris Maloney NIH/NLM/NCBI (Contractor) Building 45, 5AN.24D-22 301-594-2842 _______________________________________________ Open-Bio-l mailing list Open-Bio-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/open-bio-l From kirkolem at gmail.com Mon Mar 17 20:07:26 2014 From: kirkolem at gmail.com (Dan K.) Date: Tue, 18 Mar 2014 00:07:26 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello!
I'm a third year student learning bioengineering and bioinformatics and I'm interested in participating in GSoC and contributing to the BioPython project. In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I find it convenient to implement the Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What should my next steps be? Thank you for your attention. From ben at benfulton.net Tue Mar 18 00:38:17 2014 From: ben at benfulton.net (Ben Fulton) Date: Mon, 17 Mar 2014 20:38:17 -0400 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: <20140317165352.36db07ee@lnx> References: <20140317165352.36db07ee@lnx> Message-ID: I looked at that last year. As far as I could tell the actual code didn't do anything useful with that value; I removed the precondition checks from the tests and it ran properly. On Mon, Mar 17, 2014 at 12:53 PM, Tiago Antao wrote: > Hi, > > Still on the quest for a test run that actually runs all the tests. > > Can someone suggest what would be a sensible value for DIALIGN2-DIR? > It seems that setting up the test is not trivial: there seems to be a > need for a BLOSUM file inside the dialign directory? > > Any clues would be appreciated... > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From harsh.beria93 at gmail.com Tue Mar 18 00:39:25 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Tue, 18 Mar 2014 06:09:25 +0530 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: Hi, I have started to write a proposal for a project on pairwise sequence alignment. Is there anyone interested in mentoring the project so that I can discuss some of the algorithmic problems in detail?
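As background to the Henikoff weighting Dan K. mentions above, the position-based scheme of Henikoff & Henikoff (1994) is short enough to sketch; this is an illustrative toy, not Biopython code:

```python
# Hedged sketch of Henikoff & Henikoff (1994) position-based sequence weights
# (an illustrative toy, not Biopython's implementation). In each column a
# sequence scores 1/(k*n): k = distinct residues in the column, n = number of
# sequences sharing this sequence's residue there.
from collections import Counter

def henikoff_weights(alignment):
    weights = [0.0] * len(alignment)
    for column in zip(*alignment):        # walk the alignment column-wise
        counts = Counter(column)
        k = len(counts)
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (k * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]   # normalize so the weights sum to 1

print(henikoff_weights(["ACGT", "ACGA", "TCGA"]))
# the two more distinctive rows (first and third) get the larger weights
```

Rows carrying rarer residues in a column receive larger weights, which is the intended down-weighting of redundant sequences.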
Also, do I need to add the project to the ideas page as it is not there yet? Thanks On Mar 6, 2014 7:19 AM, "Michiel de Hoon" wrote: > Hi Nigel, > > While compiling Biopython on Windows can be tricky, in my experience it > has been easy to compile the C libraries in Biopython on other platforms > (Unix/Linux/MacOSX). Have you run into specific problems compiling > Biopython? I would think that wrapping 3rd-party libraries or executables > is much more error-prone. > > Best, > -Michiel. > > -------------------------------------------- > On Tue, 3/4/14, Nigel Delaney wrote: > > Subject: Re: [Biopython-dev] [GSoC] Gsoc 2014 aspirant > To: "'Peter Cock'" , "'Fields, Christopher > J'" > Cc: biopython-dev at lists.open-bio.org, "'Harsh Beria'" < > harsh.beria93 at gmail.com> > Date: Tuesday, March 4, 2014, 5:39 PM > > As a quick $0.02 from a library user > on this. Back in 2006 when I first > started using biopython I was working with Bio.pairwise2, > and learned that > it was too slow for the task at hand, so wound up switching > to > passing/parsing files to an aligner on the command line to > get around this, > as (on windows back then), I never got the compiled C code > to work properly. > Although pypy may be fast enough and has been very > impressive in my > experience (plus computers are much faster now), I think > wrapping a library > that is cross platform and maintains its own binaries would > be a great > option rather than implementing C-code (I think this might > be what the GSoC > student did last year for BioPython by wrapping Python). > > Mostly though I just wanted to second Peter's wariness about > adding C-code > into the library. I have found over the years that a > lot of python > scientific tools that in theory should be cross platform > (Stampy, IPython, > Matplotlib, Numpy, GATK, etc.) 
are really not and can be a > huge timesuck of > dealing with installation issues as code moves between > computers and > operating systems, usually due to some C code or OS specific > behavior. > Since code in python works anywhere python is installed, I > really appreciate > the extent that the library can be as much pure python as > allowable or > strictly dependent on a particular downloadable binary for a > specific > OS/Architecture/Scenario. > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From w.arindrarto at gmail.com Tue Mar 18 09:52:29 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 10:52:29 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: Hi Peter, everyone, Thanks for the heads up. If implemented as it is, the updates will change our underlying SearchIO model (aside from the blast-xml parser itself), by allowing Hit retrieval using multiple different keys. I have a feeling it will be difficult to jam all the new changes into a backwards-compatible parser. One way to make it transparent to users is to use the underlying DTD to do validation before parsing (for the two BLAST DTDs, use the one which the file can be validated against). However, this comes at a price. Since the standard library-bundled elementtree doesn't seem to support validation, we have to use another library (lxml is my choice). This means adding a 3rd-party dependency which requires compiling (lxml is also partly written in C). The other option is to introduce a new format name (e.g. 'blast-xml2'), which makes the user responsible for knowing which BLAST XML he/she is parsing. It feels more explicit this way, so I am leaning towards this option, despite 'blast-xml2' not sounding very nice to me ;). Any other thoughts?
Best, Bow On Mon, Mar 17, 2014 at 7:35 PM, Peter Cock wrote: > Hi all, > > Bow (regarding SearchIO) and others should probably read this... > > I've commented, see also: > http://blastedbio.blogspot.co.uk/2014/02/blast-xml-output-needs-more-love-from.html > > Peter > > > ---------- Forwarded message ---------- > From: Maloney, Christopher (NIH/NLM/NCBI) [C] > Date: Mon, Mar 17, 2014 at 5:17 PM > Subject: [Open-bio-l] Proposed BLAST XML Changes > To: "open-bio-l at lists.open-bio.org" > > > We are not directly soliciting comments, but if anyone would like to > make any technical or programmatic suggestions, there is a link from > which anyone may comment in the document. > > ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf > > Thank you. > > > P.S. Please re-post this to other lists that might have interested readers. > > Chris Maloney > NIH/NLM/NCBI (Contractor) > Building 45, 5AN.24D-22 > 301-594-2842 > > > _______________________________________________ > Open-Bio-l mailing list > Open-Bio-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/open-bio-l > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 10:17:48 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:17:48 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > Thanks for the heads up. If implemented as it is, the updates will > change our underlying SearchIO model (aside from the blast-xml parser > itself), by allowing a Hit retrieval using multiple different keys. Could you clarify what you mean by multiple keys here? > I have a feeling it will be difficult to jam all the new changes into > a backwards-compatible parser.
One way to make it transparent to users > is to use the underlying DTD to do validation before parsing (for the > two BLAST DTDs, use the one which the file can be validated against). > However, this comes at a price. Since the standard library-bundled > elementtree doesn't seem to support validation, we have to use another > library (lxml is my choice). This means adding 3rd party dependency > which require compiling (lxml is also partly written in C). We can probably tell by sniffing the first few lines... but how to do that without using a handle seek to rewind may be tricky (desirable to support parsing streams, e.g. stdin). > The other option is to introduce a new format name (e.g. > 'blast-xml2'), which makes the user responsible for knowing which > BLAST XML he/she is parsing. It feels more explicit this way, so I am > leaning towards this option, despite 'blast-xml2' not sounding very > nice to me ;). > > Any other thoughts? > > Best, > Bow I agree for the SearchIO interface, two format names makes sense - unless there is a neat way to auto-detect this on input. Using "blast-xml2" would work, or maybe something like "blast-xml-2014" (too long?). We could even go for "blast-xml-old" and "blast-xml" perhaps? Peter From w.arindrarto at gmail.com Tue Mar 18 10:33:55 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 11:33:55 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> Thanks for the heads up. If implemented as it is, the updates will >> change our underlying SearchIO model (aside from the blast-xml parser >> itself), by allowing a Hit retrieval using multiple different keys. > > Could you clarify what you mean by multiple keys here? Currently, we can retrieve hits from a query using its ID, aside from its numeric index. 
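The current behaviour Bow refers to - hits retrievable by numeric index or by ID - amounts to keeping a list and a dictionary in sync. The class below is a hypothetical, much-simplified stand-in for Bio.SearchIO's QueryResult, just to make the lookup semantics concrete (it is not the real class):

```python
class MiniQueryResult:
    """Simplified stand-in for SearchIO-style hit lookup (hypothetical sketch).

    Hits are kept in insertion order and retrievable either by integer
    position or by their (single) ID string.
    """

    def __init__(self, hits):
        self._hits = list(hits)                       # ordered (id, hit) pairs
        self._index = {hid: hit for hid, hit in hits}  # ID -> hit lookup

    def __getitem__(self, key):
        if isinstance(key, int):
            return self._hits[key][1]  # positional lookup
        return self._index[key]        # ID lookup

    def __contains__(self, hid):
        return hid in self._index


# Toy example with made-up IDs:
result = MiniQueryResult([("gi|1234", "hit-A"), ("gi|5678", "hit-B")])
```

Supporting the proposed multi-ID hits would mean mapping several keys in the dictionary to the same hit object, which is why membership checking is affected too.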
With their proposed changes to the Hit element here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, it means that a given Hit can now be annotated with more than one ID. Ideally, this should also be reflected in the QueryResult object: a hit item should be retrievable using any of the IDs it has. This will also affect membership checking on the QueryResult object. >> I have a feeling it will be difficult to jam all the new changes into >> a backwards-compatible parser. One way to make it transparent to users >> is to use the underlying DTD to do validation before parsing (for the >> two BLAST DTDs, use the one which the file can be validated against). >> However, this comes at a price. Since the standard library-bundled >> elementtree doesn't seem to support validation, we have to use another >> library (lxml is my choice). This means adding 3rd party dependency >> which require compiling (lxml is also partly written in C). > > We can probably tell by sniffing the first few lines... but how > to do that without using a handle seek to rewind may be > tricky (desirable to support parsing streams, e.g. stdin). Ah yes. We have a rewindable file seek object in Bio.File, don't we :)? I'll have to play around with some real datasets first, I think. The other thing we should take into account is the Xinclude tag. Would we want to make it possible to query *either* the single query XML results or the master Xinclude document (point 2 of the proposed change)? Or should we restrict our parser only to the single query files? >> The other option is to introduce a new format name (e.g. >> 'blast-xml2'), which makes the user responsible for knowing which >> BLAST XML he/she is parsing. It feels more explicit this way, so I am >> leaning towards this option, despite 'blast-xml2' not sounding very >> nice to me ;). >> >> Any other thoughts? 
>> >> Best, >> Bow I agree for the SearchIO interface, two format names makes sense - unless there is a neat way to auto-detect this on input. Using "blast-xml2" would work, or maybe something like "blast-xml-2014" (too long?). We could even go for "blast-xml-old" and "blast-xml" perhaps? Peter From w.arindrarto at gmail.com Tue Mar 18 10:33:55 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 11:33:55 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> Thanks for the heads up. If implemented as it is, the updates will >> change our underlying SearchIO model (aside from the blast-xml parser >> itself), by allowing a Hit retrieval using multiple different keys. > > Could you clarify what you mean by multiple keys here? Currently, we can retrieve hits from a query using its ID, aside from its numeric index.
>> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Tue Mar 18 10:58:06 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 10:58:06 +0000 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto wrote: > On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >> wrote: >>> Hi Peter, everyone, >>> >>> Thanks for the heads up. If implemented as it is, the updates will >>> change our underlying SearchIO model (aside from the blast-xml parser >>> itself), by allowing a Hit retrieval using multiple different keys. >> >> Could you clarify what you mean by multiple keys here? > > Currently, we can retrieve hits from a query using its ID, aside from > its numeric index. With their proposed changes to the Hit element > here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, > it means that a given Hit can now be annotated with more than one ID. But this happens already in the current output from merged entries in databases like NR - we effectively use the first alternative ID as the hit ID. See for example the nasty ">" separated entries in the legacy BLAST XML's <Hit_def> tag where only the first ID appears in the <Hit_id> tag: http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html See also the new optional fields in the tabular output which explicitly list all the aliases for the merge record (e.g. sallseqid).
> Ideally, this should also be reflected in the QueryResult object: a > hit item should be retrievable using any of the IDs it has. > > This will also affect membership checking on the QueryResult object. This looks like something we should review anyway, regardless of the new BLAST XML format. >>> I have a feeling it will be difficult to jam all the new changes into >>> a backwards-compatible parser. One way to make it transparent to users >>> is to use the underlying DTD to do validation before parsing (for the >>> two BLAST DTDs, use the one which the file can be validated against). >>> However, this comes at a price. Since the standard library-bundled >>> elementtree doesn't seem to support validation, we have to use another >>> library (lxml is my choice). This means adding 3rd party dependency >>> which require compiling (lxml is also partly written in C). >> >> We can probably tell by sniffing the first few lines... but how >> to do that without using a handle seek to rewind may be >> tricky (desirable to support parsing streams, e.g. stdin). > > Ah yes. We have a rewindable file seek object in Bio.File, don't we > :)? I'll have to play around with some real datasets first, I think. Yes, the UndoHandle in Bio.File might be the best solution here for auto-detection. But two explicit formats is probably better. >> The other thing we should take into account is the Xinclude tag. Would >> we want to make it possible to query *either* the single query XML >> results or the master Xinclude document (point 2 of the proposed >> change)? Or should we restrict our parser only to the single query >> files? I think single files is a reasonable restriction... assuming BLAST will still have the option of producing a big multi-query XML? Probably we should ask the NCBI about that... I would hope the Bio.SearchIO.index_db(...) approach could be used on a collection of little XML files, one for each query. >>> The other option is to introduce a new format name (e.g.
>>> 'blast-xml2'), which makes the user responsible for knowing which >>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>> leaning towards this option, despite 'blast-xml2' not sounding very >>> nice to me ;). >>> >>> Any other thoughts? >>> >>> Best, >>> Bow >> >> I agree for the SearchIO interface, two format names makes >> sense - unless there is a neat way to auto-detect this on input. >> >> Using "blast-xml2" would work, or maybe something like >> "blast-xml-2014" (too long?). >> >> We could even go for "blast-xml-old" and "blast-xml" perhaps? > > Hmm..'blast-xml-old', may make it difficult to adapt for future XML > schema changes. How about renaming the current parser to > 'blast-xml-legacy', and the new one to just 'blast-xml'? A possible downside of 'blast-xml-legacy' over 'blast-xml-old' is that this may be confused with the "legacy" BLAST in C to the current BLAST+ in C++ move (which happened well before this XML format change). Peter From tra at popgen.net Tue Mar 18 11:22:15 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 11:22:15 +0000 Subject: [Biopython-dev] Dialign2 testing... In-Reply-To: References: <20140317165352.36db07ee@lnx> Message-ID: <20140318112215.4207072e@lnx> On Tue, 18 Mar 2014 10:38:53 +0000 Peter Cock wrote: > From memory if the environment variable was not > set, the command line tool would fail in a strange > way - so I made the test conditional on having the > variable set. I noticed that and created an environment variable, then I got stuck on the BLOSUM issue. Per Ben's suggestion, should we remove the check? Or should I use a non-standard package?
Thanks, Tiago From w.arindrarto at gmail.com Tue Mar 18 11:48:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Tue, 18 Mar 2014 12:48:56 +0100 Subject: [Biopython-dev] Fwd: [Open-bio-l] Proposed BLAST XML Changes In-Reply-To: References: Message-ID: On Tue, Mar 18, 2014 at 11:58 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 10:33 AM, Wibowo Arindrarto > wrote: >> On Tue, Mar 18, 2014 at 11:17 AM, Peter Cock wrote: >>> On Tue, Mar 18, 2014 at 9:52 AM, Wibowo Arindrarto >>> wrote: >>>> Hi Peter, everyone, >>>> >>>> Thanks for the heads up. If implemented as it is, the updates will >>>> change our underlying SearchIO model (aside from the blast-xml parser >>>> itself), by allowing a Hit retrieval using multiple different keys. >>> >>> Could you clarify what you mean by multiple keys here? >> >> Currently, we can retrieve hits from a query using its ID, aside from >> its numeric index. With their proposed changes to the Hit element >> here: ftp://ftp.ncbi.nlm.nih.gov/blast/documents/NEWXML/ProposedBLASTXMLChanges.pdf, >> it means that a given Hit can now be annotated with more than one ID. > > But this happens already in the current output from merged entries > in databases like NR - we effectively use the first alternative ID as > the hit ID. See for example the nasty ">" separated entries in > the legacy BLAST XML's <Hit_def> tag where only the first ID > appears in the <Hit_id> tag: > > http://blastedbio.blogspot.co.uk/2012/05/blast-tabular-missing-descriptions.html > > See also the new optional fields in the tabular output which > explicitly list all the aliases for the merge record (e.g. sallseqid). In the BLAST outputs, yes. However, there's no explicit support yet in SearchIO for this. Currently we only parse whatever is in <Hit_id> as the ID and <Hit_def> as the description. If the <Hit_id> tag is separated by semicolons / has more than one ID, the current parser does not try to split it into multiple IDs. Instead it takes the whole string as the ID.
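The missing splitting step could look something like this hypothetical helper (not part of Biopython); the ';' and ' >' delimiters are assumptions based on the merged-ID conventions discussed in this thread:

```python
def split_merged_ids(raw_id):
    """Split a merged BLAST ID string into its component IDs.

    Hypothetical helper: assumes components are separated either by
    semicolons or by the ' >' convention seen in legacy merged entries.
    """
    if ";" in raw_id:
        parts = raw_id.split(";")
    else:
        parts = raw_id.split(" >")
    # Drop surrounding whitespace and any empty fragments.
    return [p.strip() for p in parts if p.strip()]
```

A parser using such a helper could then register every returned ID as a lookup key for the same Hit object.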
Also, in the blast tabular format, even though sallseqid is parsed, it's merely stored as an attribute of the hit object, not something that can be used to retrieve Hits from the QueryResult object. >> Ideally, this should also be reflected in the QueryResult object: a >> hit item should be retrievable using any of the IDs it has. >> >> This will also affect membership checking on the QueryResult object. > > This looks like something we should review anyway, regardless > of the new BLAST XML format. Of course :). >>>> I have a feeling it will be difficult to jam all the new changes into >>>> a backwards-compatible parser. One way to make it transparent to users >>>> is to use the underlying DTD to do validation before parsing (for the >>>> two BLAST DTDs, use the one which the file can be validated against). >>>> However, this comes at a price. Since the standard library-bundled >>>> elementtree doesn't seem to support validation, we have to use another >>>> library (lxml is my choice). This means adding 3rd party dependency >>>> which require compiling (lxml is also partly written in C). >>> >>> We can probably tell by sniffing the first few lines... but how >>> to do that without using a handle seek to rewind may be >>> tricky (desirable to support parsing streams, e.g. stdin). >> >> Ah yes. We have a rewindable file seek object in Bio.File, don't we >> :)? I'll have to play around with some real datasets first, I think. > > Yes, the UndoHandle in Bio.File might be the best solution > here for auto-detection. But two explicit formats is probably better. > >> The other thing we should take into account is the Xinclude tag. Would >> we want to make it possible to query *either* the single query XML >> results or the master Xinclude document (point 2 of the proposed >> change)? Or should we restrict our parser only to the single query >> files? > > I think single files is a reasonable restriction... 
assuming BLAST > will still have the option of producing a big multi-query XML? > Probably we should ask the NCBI about that... In a way, the Xinclude file is the file containing multi-query XML. I have a feeling that if Xinclude is proposed, producing multi-output BLAST XML files will not be an option anymore (otherwise it seems redundant). But yes, NCBI should have more info about this. > I would hope the Bio.SearchIO.index_db(...) approach could > be used on a collection of little XML files, one for each query. > >>>> The other option is to introduce a new format name (e.g. >>>> 'blast-xml2'), which makes the user responsible for knowing which >>>> BLAST XML he/she is parsing. It feels more explicit this way, so I am >>>> leaning towards this option, despite 'blast-xml2' not sounding very >>>> nice to me ;). >>>> >>>> Any other thoughts? >>>> >>>> Best, >>>> Bow >>> >>> I agree for the SearchIO interface, two format names makes >>> sense - unless there is a neat way to auto-detect this on input. >>> >>> Using "blast-xml2" would work, or maybe something like >>> "blast-xml-2014" (too long?). >>> >>> We could even go for "blast-xml-old" and "blast-xml" perhaps? >> >> Hmm..'blast-xml-old', may make it difficult to adapt for future XML >> schema changes. How about renaming the current parser to >> 'blast-xml-legacy', and the new one to just 'blast-xml'? > > A possible downside of 'blast-xml-legacy' over 'blast-xml-old' > is this may be confused with the "legacy" BLAST in C to the > current BLAST+ in C++ move (which happened well before > this XML format change). Hmm. In this case then I am leaning to 'blast-xml2', I think. It's the shortest and most future-proof (subsequent changes to the XML format could be denoted as 'blast-xml3'). But it does make it slightly inconsistent with the names we have for HMMER (i.e. 'hmmer2-text' is for HMMER version 2 text output, 'hmmer3-text' is for HMMER version 3 text output).
Cheers, Bow From p.j.a.cock at googlemail.com Tue Mar 18 13:15:16 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 13:15:16 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140312151048.45066ade@lnx> References: <20140312151048.45066ade@lnx> Message-ID: On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: > Hi, > > I have a docker container ready (save for a few applications). Simple > usage instructions: > > 1. Create a directory and download this file inside it: > https://raw.github.com/tiagoantao/my-containers/master/Biopython-Test Things moved, https://github.com/tiagoantao/my-containers/tree/master/biopython I guess you mean: https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test > 2. Rename it Dockerfile (capital D) > > 3. Get a buildbot username and password (from Peter or me), edit the > file and replace CHANGEUSER CHANGEPASS > > 4. do > docker build -t biopython-buildbot . > > 5. do > docker run biopython-buildbot > > Beta-version, comments appreciated ;) > > If people like this, I will amend the Continuous Integration page on > the wiki accordingly > > Tiago Is this a 32 or 64 bit VM, or either? I'm asking because we may want to source a replacement 32 bit Linux buildslave - the hard drive in the old machine we've been using is failing, and it is probably not worth replacing. Peter From mjldehoon at yahoo.com Tue Mar 18 14:21:48 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 18 Mar 2014 07:21:48 -0700 (PDT) Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: Message-ID: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> On Mon, 3/17/14, Peter Cock wrote: > Are we all generally in favour of lower case for new module > names (as per PEP8)? > i.e. Bio/struct not Bio/Struct ? You may want to consider Bio/structure instead of Bio/struct. To me "struct" sounds like the C programming term, rather than a protein structure.
Best, -Michiel From p.j.a.cock at googlemail.com Tue Mar 18 14:43:56 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 14:43:56 +0000 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon wrote: > On Mon, 3/17/14, Peter Cock wrote: >> Are we all generally in favour of lower case for new module >> names (as per PEP8)? >> i.e. Bio/struct not Bio/Struct ? > > You may want to consider Bio/structure instead of Bio/struct. > To me "struct" sounds like the C programming term, > rather than a protein structure. > > Best, > -Michiel I like Bio.structure too :) Peter From anaryin at gmail.com Tue Mar 18 14:46:34 2014 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Tue, 18 Mar 2014 15:46:34 +0100 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: Makes sense! If nobody complains I'll change it. From eric.talevich at gmail.com Tue Mar 18 15:23:29 2014 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 18 Mar 2014 08:23:29 -0700 Subject: [Biopython-dev] [GSoC] Gsoc 2014 aspirant In-Reply-To: References: <1394070589.65906.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Mon, Mar 17, 2014 at 5:39 PM, Harsh Beria wrote: > Hi, > > I have started to write a proposal for a project on pair wise sequence > alignment. Is there anyone interested in mentoring the project so that I > can discuss some of the algorithmic problems in detail? Also, do I need to > add the project to the ideas page as it is not there yet? > It's not necessary to add the project to the public Ideas page if you've come up with it yourself. Just share your own proposal with us here and we'll discuss it with you. 
-Eric From tra at popgen.net Tue Mar 18 16:12:50 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 16:12:50 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: References: <20140312151048.45066ade@lnx> Message-ID: <20140318161250.77a269fe@lnx> Hi, On Tue, 18 Mar 2014 13:15:16 +0000 Peter Cock wrote: > > Things moved, > https://github.com/tiagoantao/my-containers/tree/master/biopython > > I guess you mean: > https://raw.github.com/tiagoantao/my-containers/master/biopython/Biopython-Test Ah, sorry. Because this is a first version, it is still being heavily refactored. I plan to document the final version well. For now maybe it is better to go to the top level: https://github.com/tiagoantao/my-containers The example in the README documents the biopython containers as they stand. > Is this a 32 or 64 bit VM, or either? I am afraid it is 64-bit, and doing a 32-bit docker is possible but not trivial. Tiago From arklenna at gmail.com Tue Mar 18 16:48:51 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Tue, 18 Mar 2014 12:48:51 -0400 Subject: [Biopython-dev] Future of Bio.PDB In-Reply-To: References: <1395152508.46913.YahooMailBasic@web164006.mail.gq1.yahoo.com> Message-ID: On Tue, Mar 18, 2014 at 10:43 AM, Peter Cock wrote: > On Tue, Mar 18, 2014 at 2:21 PM, Michiel de Hoon > wrote: > > On Mon, 3/17/14, Peter Cock wrote: > >> Are we all generally in favour of lower case for new module > >> names (as per PEP8)? > >> i.e. Bio/struct not Bio/Struct ? > > > > You may want to consider Bio/structure instead of Bio/struct. > > To me "struct" sounds like the C programming term, > > rather than a protein structure. > > > > Best, > > -Michiel > > I like Bio.structure too :) > Thirded! I'm in a particularly busy portion of my PhD right now but hopefully over the summer I'll have a little more spare time for open source work.
Cheers, Lenna From tra at popgen.net Tue Mar 18 17:13:34 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 17:13:34 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal Message-ID: <20140318171334.0edc2b45@lnx> Hi, Currently we have gone through the procedure of asking on the mailing lists about Simcoal deprecation (now that we have fastsimcoal). 3 proposals and a doubt: 1. Deprecate https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py 2. Delete the Simcoal tests 3. Amend the tutorial The doubt: I would like to deprecate a class inside https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py But not the whole Controller (the fastsimcoal code is there). Question: Is there a procedure for a partial deprecation? Thanks, T From p.j.a.cock at googlemail.com Tue Mar 18 17:15:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 18 Mar 2014 17:15:41 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, We've previously put a deprecation warning inside the __init__ method so anyone actually using the class will be warned. Peter On Tue, Mar 18, 2014 at 5:13 PM, Tiago Antao wrote: > Hi, > > Currently we have gone through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3.
Amend the tutorial > > The doubt: I would like to deprecate a class inside > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From tra at popgen.net Tue Mar 18 18:26:10 2014 From: tra at popgen.net (Tiago Antao) Date: Tue, 18 Mar 2014 18:26:10 +0000 Subject: [Biopython-dev] A biopython buildbot as a docker container In-Reply-To: <20140318161250.77a269fe@lnx> References: <20140312151048.45066ade@lnx> <20140318161250.77a269fe@lnx> Message-ID: <20140318182610.35082103@lnx> An update on the status: 1. A couple of problems with fasttree and dialign2. These seem genuine problems with the test code/modules. 2. Prank will wait for ubuntu trusty (it will be a standard package). I will then include it. 3. I was just able to find part of the fonts for the graphics packages, so a couple of tests are being skipped. 4. naccess has a very restrictive activation system, impossible to add. 1 and 3 are solvable (2 will sort itself out with time). 1 is really a problem with the biopython code, I think. For 3, if someone could have a look at the existing fonts here: https://github.com/tiagoantao/my-containers/blob/master/biopython/Biopython-Basic And tell me which ones are missing, I would take care of adding them. Tiago PS - In the near future I will do a Python 3 container also. From kirkolem at gmail.com Tue Mar 18 21:31:45 2014 From: kirkolem at gmail.com (Dan K.) Date: Wed, 19 Mar 2014 01:31:45 +0400 Subject: [Biopython-dev] GSoC 2014 Message-ID: Hello! I'm a third year student learning bioengineering and bioinformatics and I'm interested in participating in GSoC and contributing to the BioPython project.
In particular, working on multiple alignments and scoring matrices (PSSMs) seems interesting to me. For example, I would find it interesting to implement the Gerstein-Sonnhammer-Chothia and Henikoff weighting procedures. What do you think? What should my next steps be? Thank you for your attention. From p.j.a.cock at googlemail.com Wed Mar 19 17:00:37 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:00:37 +0000 Subject: [Biopython-dev] SQLite test failure on Windows, OperationalError: unable to open database file Message-ID: Hi all, About a week ago most of the Windows nightly tests broke - e.g. here on the same revision (!) 79f9054e5246ba30816ff93a775d594ae7da6fc6 https://github.com/biopython/biopython/commit/79f9054e5246ba30816ff93a775d594ae7da6fc6 Worked, Fri Mar 14 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1129 Failed, Sat Mar 15 http://testing.open-bio.org/biopython/builders/Windows%20XP%20-%20Python%202.6/builds/1130 ... test_BioSQL_sqlite3 ... FAIL ... ====================================================================== ERROR: Check list, keys, length etc ---------------------------------------------------------------------- Traceback (most recent call last): File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 193, in setUp load_database(gb_handle) File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 166, in load_database create_database() File "c:\repositories\BuildBotBiopython\win26\build\Tests\common_BioSQL.py", line 148, in create_database server.load_database_sql(SQL_FILE) File "c:\repositories\BuildBotBiopython\win26\build\build\lib.win32-2.6\BioSQL\BioSeqDatabase.py", line 281, in load_database_sql self.adaptor.cursor.execute(sql_line) OperationalError: unable to open database file (etc) Presumably something changed on the machine itself - perhaps a Windows security update?
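A minimal probe, independent of Biopython and BioSQL, can help establish whether sqlite3 itself is able to create a database file on the buildslave - "unable to open database file" usually points at a path or permission problem rather than at the SQL being executed. This is a hypothetical diagnostic sketch, not part of the test suite:

```python
import os
import sqlite3
import tempfile


def can_create_sqlite_db(directory):
    """Return True if sqlite3 can create and open a database file in directory."""
    path = os.path.join(directory, "probe.sqlite")  # hypothetical probe filename
    try:
        con = sqlite3.connect(path)
        try:
            con.execute("CREATE TABLE probe (id INTEGER)")
        finally:
            con.close()
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        # Clean up the probe file if it was created.
        if os.path.exists(path):
            os.remove(path)


tmp_ok = can_create_sqlite_db(tempfile.gettempdir())
print(tmp_ok)
```

Running this in the directory the failing test uses (rather than the system temp directory) would distinguish a machine-wide problem from one specific to the BioSQL test setup.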
Any guesses for what might be wrong and why it broke on Python 2.6, PyPy 1.9, 2.0, 2.1 - yet works fine on Python 2.7, Python 3.3, PyPy 2.2 and Jython 2.7? Logged into this machine, I can reproduce the error with: c:\python26\python test_BioSQL_sqlite3.py Thanks, Peter From eparker at ucdavis.edu Wed Mar 19 16:49:04 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 09:49:04 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers Message-ID: Hi all, I have a rough draft of my GSoC proposal and would appreciate comments from anybody who might be willing to eventually mentor this project, or anybody who has opinions on implementation. It's about 3 pages of text + several figures. I'll be submitting a final draft Friday on the GSoC website pending your comments. Thank you, -Evan -------------- next part -------------- A non-text attachment was scrubbed... Name: Evan-Parker-GSOC-2014-proposal.pdf Type: application/pdf Size: 68577 bytes Desc: not available URL: From p.j.a.cock at googlemail.com Wed Mar 19 17:26:10 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 17:26:10 +0000 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: > Hi all, > > I have a rough draft of my GSoC proposal and would appreciate comments from > anybody who might be willing to eventually mentor this project, or anybody > who has opinions on implementation. It's about 3 pages of text + several > figures. > > I'll be submitting a final draft Friday on the GSoC website pending your > comments. > > Thank you, > -Evan Hi Evan, That's a nice job so far - although questions about your time availability will be raised (sadly the GSoC schedule isn't fair to students depending on regional University term schedules). However, you are a PhD student (which is normally full time). 
You will need to clear this with your PhD supervisors - since you would be spending a large chunk of time not working directly on your thesis project, and there can be strict deadlines for completion. Here's a selection of points in no particular order: Have you looked at Bio.SeqIO.index_db(...) which works like Bio.SeqIO.index(...) but stores the offsets etc in an SQLite database? When pondering how to design this kind of thing myself, I had suspected multiple SeqRecProxy classes might be needed (one per file format potentially), although run time selection of internal parsing methods might work too. I would also ask why not have the slicing of a SeqRecProxy return another SeqRecProxy? This means creating a new proxy object with different offset values - but would be fast. Only when the seq/annotation/etc is accessed would the proxy have to go to the disk drive. This becomes more interesting when accessing the features in the slice of interest (e.g. if the full record was for a whole chromosome and only region [1000:2000] was of interest). This idea about windows onto the data is key to how the SAM/BAM file format is used (coordinate sorting with an index). Are you familiar with that, or tabix? Another open question is what to do with file handles - specifically the question of when to close them? e.g. via garbage collection, context managers, etc. See for example this blog post - the lazy parsing approach may result in ResourceWarnings as a side effect: http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ I appreciate you are unlikely to have ready answers to all of that - I've probably given you a whole load more background reading. I hope some of the other Biopython developers (or GSoC mentors on other OBF projects - you could post this to the OBF GSoC mailing list too) will have further feedback. 
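As a rough illustration of the proxy-slicing idea, something like the following could work — a minimal sketch in plain Python, where the class name, attributes, and file layout are all hypothetical, not Biopython's actual API:

```python
# Sketch: slicing a lazy proxy returns another proxy with adjusted
# offsets; the file is only read when .seq is actually accessed.
# All names here are illustrative, not real Biopython classes.
import io


class LazySeqProxy:
    """Proxy for a sequence stored in a file; reads letters on demand."""

    def __init__(self, handle, start, end):
        self.handle = handle
        self.start = start  # file offset of first letter of interest
        self.end = end      # file offset just past the last letter

    def __len__(self):
        return self.end - self.start

    def __getitem__(self, index):
        # Slicing only adjusts offsets -- no disk access happens here.
        if isinstance(index, slice):
            start, stop, stride = index.indices(len(self))
            if stride != 1:
                raise ValueError("stride not supported in this sketch")
            return LazySeqProxy(self.handle, self.start + start,
                                self.start + stop)
        raise TypeError("only slices supported in this sketch")

    @property
    def seq(self):
        # Only now do we touch the "disk drive".
        self.handle.seek(self.start)
        return self.handle.read(self.end - self.start)


# A file-like stand-in for a plain sequence file:
handle = io.StringIO("ACGTACGTACGTACGT")
proxy = LazySeqProxy(handle, 0, 16)
window = proxy[4:8]  # fast: just a new proxy, nothing read from disk
print(window.seq)    # slow path: seeks and reads "ACGT"
```

For a whole-chromosome record, `proxy[1000:2000].seq` would then read only the region of interest, which is the same windowing idea BAM/tabix indexes rely on.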
Regards, Peter From harsh.beria93 at gmail.com Wed Mar 19 17:44:47 2014 From: harsh.beria93 at gmail.com (Harsh Beria) Date: Wed, 19 Mar 2014 23:14:47 +0530 Subject: [Biopython-dev] GSOC Proposal (Pairwise Sequence Alignment in Biopython) Message-ID: Hi, Please take a look at my GSOC proposal on Pairwise Sequence Alignment and suggest improvements. https://gist.github.com/harshberia93/9647053 Thanks -- Harsh Beria, Indian Institute of Technology, Kharagpur E-mail: harsh.beria93 at gmail.com Ph: +919332157616 From nejat.arinik at insa-lyon.fr Wed Mar 19 17:45:53 2014 From: nejat.arinik at insa-lyon.fr (Nejat Arinik) Date: Wed, 19 Mar 2014 18:45:53 +0100 (CET) Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> Message-ID: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Hi all, I would like to show you my detailed plan, month by month. https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit It's not very neat, I know, but it is just to get your ideas about the plan. I'll finish it tonight. Do you think I have understood the subject correctly? Could this plan be a solution? Thanks in advance. PS: My English is not very good, so it is a little bit difficult to write a detailed proposal plan, but I'm trying. I hope it's not a big problem :) Unfortunately, I'm more comfortable with French. 
Nejat From p.j.a.cock at googlemail.com Wed Mar 19 18:10:54 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 19 Mar 2014 18:10:54 +0000 Subject: [Biopython-dev] detailed plan - Indexing & Lazy-loading Sequence Parsers In-Reply-To: <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> References: <635897854.576469.1395250362974.JavaMail.root@insa-lyon.fr> <14256604.579277.1395251153640.JavaMail.root@insa-lyon.fr> Message-ID: On Wed, Mar 19, 2014 at 5:45 PM, Nejat Arinik wrote: > > Hi all, > > I would show you my detailed plan per mounth. > https://docs.google.com/document/d/1IKJAs4u4rAVnmaDh0LPyMrd_MuqELqblmveeeOO36aE/edit > > It's not neatly I know but it's just for learn yours ideas at about that > plan. I'll finish it this night. I understood correctly the subject, you think? > That plan can be solution? Thanks in advance. > > PS: My english level is not good so It is a little bit difficult to write a > proposal-plan detailed but I'm trying. I hope it's not a big problem :) > I'm more comfortable with the french language unfortunately. > > Nejat Hi Nejat, I can try to answer some of the questions at the start of the document: Q: Lazy-load ~= load partially (depends on demands) ? Yes. For example, only load the sequence if the user tries to access the sequence. For example, this should speed up tasks like counting the records, or building a list of all the record identifiers. Q: small to medium sized sequences/genomes is how much in general? It takes how many times? A: Bacterial genomes usually are small enough to load into memory without worrying about RAM. Eukaryote genomes (e.g. mouse, human, plants) are typically large enough that you may not want to load an entire annotated chromosome into memory. Q: python dictionary is used for SeqRecord object ? A: Yes, the SeqRecord object uses a Python dictionary for the annotations property, and a dictionary like object for the letter_annotations property. 
The SeqRecord object also uses Python lists, and the Biopython Seq object. Q: Putting some data in the file will be done? If yes, relation with Biosql? So any modification as an update will be considerable/ be paid attention. A: The SeqRecord-like objects from the lazy-parsers could be read only. However, if they act enough like the original SeqRecord, then they can be used with Bio.SeqIO.write(...) to save them to disk. It would be nice if (like the BioSQL SeqRecord-like objects) it was possible to modify the records in memory. Q: For very large indexing jobs, index on multiple machines running simultaneously, and then merge the indexes. A: This seems too complicated. If building the index is slow, I suggest saving the index on disk (e.g. as an SQLite database). For comparison, see the BAM and tabix index files, or Biopython's Bio.SeqIO.index_db(...) function. Regards, Peter From w.arindrarto at gmail.com Wed Mar 19 19:42:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Wed, 19 Mar 2014 20:42:50 +0100 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Hi Evan, Looks like this is shaping up in a good direction :). In addition to Peter's earlier comments, I also have some remarks: * How would the indices of the files be stored? Are they simply stored in-memory or as files? Is their creation invisible to the user (i.e. invoking the `lazy=True` argument is enough to create the index) or does the user need to create the index explicitly? For `SeqIO.index(lazy=True)` in particular, does this mean that we will have two indices then (one for the currently implemented SQLite database that stores offsets for record positions and the other to store the other information necessary for the lazy parser)? * It would be nice to also have some notes on the relation between SeqRecProxy and SeqRecord (is it a subclass perhaps, or are they different classes that both inherit from a common base class). 
As an alternative, it is also possible to have a regular SeqRecord object, but with lazy Seq objects and lazy annotation objects instead. * Have you thought about what to store in the indices of the different formats? It's a good idea to explain this further in your proposal (e.g. what to store when indexing GenBank files, UniprotXML files, etc.). It doesn't have to be concrete (it will be in the code anyway), but having an idea of the possible implementations you have in mind would be nice. * And finally, the schedule. It looks like the early weeks will be quite packed, considering your other obligations. I think it is expected that students spend close to 8 hours per day (or 40 hours per week) during the coding period. Of course this is much more sensible when the student does not have other pressing obligations. I do agree with Peter here that you have to at least discuss this with your PhD supervisor. I personally do not mind that for the week you have the conference the workload is reduced. But in the first four weeks, I would prefer that you have more time to spend on GSoC. Cheers & good luck, Bow On Wed, Mar 19, 2014 at 6:26 PM, Peter Cock wrote: > On Wed, Mar 19, 2014 at 4:49 PM, Evan Parker wrote: >> Hi all, >> >> I have a rough draft of my GSoC proposal and would appreciate comments from >> anybody who might be willing to eventually mentor this project, or anybody >> who has opinions on implementation. It's about 3 pages of text + several >> figures. >> >> I'll be submitting a final draft Friday on the GSoC website pending your >> comments. >> >> Thank you, >> -Evan > > Hi Evan, > > That's a nice job so far - although questions about your time > availability will be raised (sadly the GSoC schedule isn't fair to > students depending on regional University term schedules). > However, you are a PhD student (which is normally full time). 
> You will need to clear this with your PhD supervisors - since > you would be spending a large chunk of time not working > directly on your thesis project, and there can be strict > deadlines for completion. > > Here's a selection of points in no particular order: > > Have you looked at Bio.SeqIO.index_db(...) which works > like Bio.SeqIO.index(...) but stores the offsets etc in an > SQLite database? > > When pondering how to design this kind of thing myself, > I had suspected multiple SeqRecProxy classes might be > needed (one per file format potentially), although run > time selection of internal parsing methods might work too. > > I would also ask why not have the slicing of a SeqRecProxy > return another SeqRecProxy? This means creating a new > proxy object with different offset values - but would be fast. > Only when the seq/annotation/etc is accessed would the > proxy have to go to the disk drive. This becomes more > interesting when accessing the features in the slice of > interest (e.g. if the full record was for a whole chromosome > and only region [1000:2000] was of interest). > > This idea about windows onto the data is key to how > the SAM/BAM file format is used (coordinate sorting > with an index). Are you familiar with that, or tabix? > > Another open question is what to do with file handles - > specifically the question of when to close them? e.g. > via garbage collection, context managers, etc. See > for example this blog post - the lazy parsing approach > may result in ResourceWarnings as a side effect: > http://emptysqua.re/blog/against-resourcewarnings-in-python-3/ > > I appreciate you are unlikely to have ready answers to > all of that - I've probably given you a whole load more > background reading. I hope some of the other Biopython > developers (or GSoC mentors on other OBF projects - > you could post this to the OBF GSoC mailing list too) > will have further feedback. 
> > Regards, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From eparker at ucdavis.edu Thu Mar 20 00:34:34 2014 From: eparker at ucdavis.edu (Evan Parker) Date: Wed, 19 Mar 2014 17:34:34 -0700 Subject: [Biopython-dev] GSoC draft proposal - lazy loading SeqIO parsers In-Reply-To: References: Message-ID: Thank you both for your fast and thorough evaluation of my proposal. *Regarding time requirements:* My adviser is aware of the possibility that I may participate in this program. During the summer I would file a "planned educational leave" instead of enrollment to accommodate my full-time participation in GSoC. As for the time requirements: I cannot avoid my obligations prior to ASMS, although I can promise to spend every extra minute I have to honor my obligations to Biopython. If my lack of full time availability prior to June precludes me from participation I will understand. *Regarding specific suggestions:* I will come up with a deeper description of the relationship between SeqRecProxy and SeqRecord before Friday. I like the idea of a SeqRecProxy returning a new proxy when sliced; I had not thought of it, but it would be an elegant solution to the problem of unparsed-vs-parsed annotations. This feature would also allow more transparent use of proxy objects and would pave the way for compatibility with SeqIO.write(). I considered using multiple proxy classes, but I prefer making a standardized binding for a lazy parsing function that can be accepted by a single SeqRecProxy at run-time. I'll make this more explicit in my proposal. There are many other questions and points of clarification that I still need to evaluate. I'll incorporate as much as I can in my proposal without overloading it and without making statements that I cannot back up with my own understanding. 
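The run-time binding idea could be sketched roughly like this — a toy example with hypothetical names and deliberately simplified "formats", not a committed design:

```python
# Sketch: one proxy class, with the format-specific parsing work
# supplied as a callable chosen at run time. Names are illustrative
# only; these are not real Biopython classes or parsers.

def parse_fasta_lazy(raw):
    """Toy 'parser': first line is the title, the rest is sequence."""
    title, _, seq = raw.partition("\n")
    return {"id": title.lstrip(">"), "seq": seq.replace("\n", "")}


def parse_tab_lazy(raw):
    """Toy parser for a two-column id<TAB>seq format."""
    rec_id, _, seq = raw.partition("\t")
    return {"id": rec_id, "seq": seq.strip()}


class SeqRecProxy:
    """A single proxy class; the format-specific work is a bound callable."""

    def __init__(self, raw, parser):
        self._raw = raw        # in a real parser: a handle plus offsets
        self._parser = parser  # selected at run time from the format name
        self._record = None    # parsed lazily, on first access

    @property
    def record(self):
        if self._record is None:  # parse only once, on demand
            self._record = self._parser(self._raw)
        return self._record


_PARSERS = {"fasta": parse_fasta_lazy, "tab": parse_tab_lazy}


def lazy_proxy(raw, fmt):
    """Bind the right parsing function to a proxy at run time."""
    return SeqRecProxy(raw, _PARSERS[fmt])


rec = lazy_proxy(">seq1\nACGT\nTTGA", "fasta")
print(rec.record["seq"])  # parsing happens only here
```

The design trade-off is between this (one class, many bound functions) and one proxy subclass per file format; either way, nothing is parsed until the record is actually touched.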
Thanks again, -Evan From p.j.a.cock at googlemail.com Thu Mar 20 11:19:27 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 11:19:27 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: FYI, in addition to the SciPy conference in Texas this summer, there is also EuroSciPy which will be in England this year - deadline for abstracts is 14 April (see below). Is anyone planning to attend? If not maybe I should...? Thanks, Peter P.S. Don't forget to consider submitting a talk/poster abstract to BOSC 2014 (which I am co-chairing this year), especially students who can get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ ---------- Forwarded message ---------- From: Ralf Gommers Date: Wed, Mar 5, 2014 at 7:37 PM Subject: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts To: Organisation of EuroScipy , conferences at python.org, numfocus at googlegroups.com, Discussion of Numerical Python , SciPy Users List Dear all, EuroSciPy 2014, the Seventh Annual Conference on Python in Science, takes place in Cambridge, UK on 27 - 30 August 2014. The conference features two days of tutorials followed by two days of scientific talks. The day after the main conference, developer sprints will be organized on projects of interest to attendees. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and the industry. The program includes keynotes, contributed talks and posters. Submissions for talks and posters are welcome on our website (http://www.euroscipy.org/2014/). In your abstract, please provide details on what Python tools are being employed, and how. The deadline for submission is 14 April 2014. 
Also until 14 April 2014, you can apply for a sprint session on 31 August 2014. See https://www.euroscipy.org/2014/calls/sprints/ for details. Important dates: April 14th: Presentation abstracts, poster, tutorial submission deadline. Application for sponsorship deadline. May 17th: Speakers selected May 22nd: Sponsorship acceptance deadline June 1st: Speaker schedule announced June 6th, or 150 registrants: Early-bird registration ends August 27-31st: 2 days of tutorials, 2 days of conference, 1 day of sprints We look forward to an exciting conference and hope to see you in Cambridge in August! The EuroSciPy 2014 Team http://www.euroscipy.org/2014/ Conference Chairs -------------------------- Mark Hayes, Cambridge University, UK Didrik Pinte, Enthought Europe, UK Tutorial Chair ------------------- David Cournapeau, Enthought Europe, UK Program Chair -------------------- Ralf Gommers, ASML, The Netherlands Program Committee ----------------------------- Tiziano Zito, Humboldt-Universität zu Berlin, Germany Pierre de Buyl, Université libre de Bruxelles, Belgium Emmanuelle Gouillart, Joint Unit CNRS/Saint-Gobain, France Konrad Hinsen, Centre National de la Recherche Scientifique (CNRS), France Raphael Ritz, Garching Computing Centre of the Max Planck Society, Germany Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa Gaël Varoquaux, INRIA Parietal, Saclay, France Nelle Varoquaux, Mines ParisTech, France Pauli Virtanen, Aalto University, Finland Evgeni Burovski, Lancaster University, UK Robert Cimrman, New Technologies Research Centre, University of West Bohemia, Czech Republic Almar Klein, Cybermind, The Netherlands Organizing Committee ------------------------------ Simon Jagoe, Enthought Europe, UK Pierre de Buyl, Université 
libre de Bruxelles, Belgium _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From tra at popgen.net Thu Mar 20 11:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:48:15 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: Message-ID: <20140320114815.70210fd5@grandao> On Thu, 20 Mar 2014 11:19:27 +0000 Peter Cock wrote: > Is anyone planning to attend? If not maybe I should...? Wild thought here: Considering that Cambridge is a geographic focal point for some of us (I am looking at you Dutch-based Biopythoneers, for instance), I am wondering if we could use this for a "local" Biopython meetup... Does this make any sense? Would there be interest? As I said, wild thought (silly?)... Tiago From anaryin at gmail.com Thu Mar 20 11:54:05 2014 From: anaryin at gmail.com (João Rodrigues) Date: Thu, 20 Mar 2014 12:54:05 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> References: <20140320114815.70210fd5@grandao> Message-ID: Lovely geographical clustering :) I'd be in. 2014-03-20 12:48 GMT+01:00 Tiago Antao : > On Thu, 20 Mar 2014 11:19:27 +0000 > Peter Cock wrote: > > > Is anyone planning to attend? If not maybe I should...? > > > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... 
> > Tiago > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 11:42:44 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 11:42:44 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials Message-ID: <20140320114244.4022cbc7@grandao> Hi all, Just to announce a potential project that I might embark on very soon and see the reaction of the community: Get all the tutorial materials that I can find and create an ipython notebook version of them. Does this sound like a good idea? Tiago (your ipython notebook fanatic) From w.arindrarto at gmail.com Thu Mar 20 12:10:15 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:10:15 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: References: <20140320114815.70210fd5@grandao> Message-ID: Sounds good :). However, my passport is still Indonesian and I'd have to apply for a visa first in Germany in order to come to the UK :|. So I'll pass this one I guess. On Thu, Mar 20, 2014 at 12:54 PM, João Rodrigues wrote: > Lovely geographical clustering :) > > I'd be in. > > > > 2014-03-20 12:48 GMT+01:00 Tiago Antao : > >> On Thu, 20 Mar 2014 11:19:27 +0000 >> Peter Cock wrote: >> >> > Is anyone planning to attend? If not maybe I should...? >> >> >> Wild thought here: Considering that Cambridge is a geographic focal >> point for some of us (I am looking at you Dutch-based Biopythoneers, >> for instance), I am wondering if we could use this for a "local" >> Biopython meetup... Does this make any sense? Would there be interest? >> >> As I said, wild thought (silly?)... 
>> >> >> Tiago >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From w.arindrarto at gmail.com Thu Mar 20 12:15:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Thu, 20 Mar 2014 13:15:56 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320114244.4022cbc7@grandao> References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, Do you plan to put the .ipynb file in the repo or will this be separate? Either way, I like the idea of having an .ipynb version of the tutorials around :). (from another IPython user). On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > Hi all, > > Just to announce a potential project that I might embark on very soon > and see the reaction of the community: > > Get all the tutorial materials that I can find and create a ipython > notebook version of them. > > Does this sound like a good idea? > > Tiago > (your ipython notebook fanatic) > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mgymrek at mit.edu Thu Mar 20 13:50:39 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:50:39 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140318171334.0edc2b45@lnx> References: <20140318171334.0edc2b45@lnx> Message-ID: Hi Tiago, I'm happy to update this section in the tutorial if you'd like help with that. 
Cheers, ~M On Tue, Mar 18, 2014 at 1:13 PM, Tiago Antao wrote: > Hi, > > Currently we have went through the procedure of asking on the mailing > lists about Simcoal deprecation (now that we have fastsimcoal) > > 3 proposals and a doubt: > > 1. Deprecate > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Async.py > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Cache.py > 2. Delete the Simcoal tests > 3. Amend the tutorial > > The doubt: I would like to deprecate a class inside > > https://github.com/biopython/biopython/blob/master/Bio/PopGen/SimCoal/Controller.py > But not the whole Controller (the fastsimcoal code is there). > Question: Is there a procedure for a partial deprecation? > > Thanks, > T > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From mgymrek at mit.edu Thu Mar 20 13:57:13 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 09:57:13 -0400 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: Hi Tiago, I also really like this idea. Seems like it would make sense to have them as part of the repository to make it easy for others to contribute. (yet another IPython notebook user :) ) ~M On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto wrote: > Hi Tiago, > > Do you plan to put the .ipynb file in the repo or will this be > separate? Either way, I like the idea of having an .ipynb version of > the tutorials around :). > > (from another IPython user). > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: > > Hi all, > > > > Just to announce a potential project that I might embark on very soon > > and see the reaction of the community: > > > > Get all the tutorial materials that I can find and create a ipython > > notebook version of them. > > > > Does this sound like a good idea? 
> > > > Tiago > > (your ipython notebook fanatic) > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From tra at popgen.net Thu Mar 20 14:11:16 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:11:16 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320141116.3f98384f@grandao> Hi Bow and Melissa, I was planning on doing this separately. But happy to do it otherwise. Or maybe we could start a git repo, do some examples and see where it goes? Considering that this would be starting from scratch I was planning on doing this on ipython 2.0 with python 3.4. You know, living on the edge ;) Tiago On Thu, 20 Mar 2014 09:57:13 -0400 Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have > them as part of the repository to make it easy for others to > contribute. (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > > > Hi Tiago, > > > > Do you plan to put the .ipynb file in the repo or will this be > > separate? Either way, I like the idea of having an .ipynb version of > > the tutorials around :). > > > > (from another IPython user). > > > > On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao > > wrote: > > > Hi all, > > > > > > Just to announce a potential project that I might embark on very > > > soon and see the reaction of the community: > > > > > > Get all the tutorial materials that I can find and create a > > > ipython notebook version of them. > > > > > > Does this sound like a good idea? 
> > > > > > Tiago > > > (your ipython notebook fanatic) > > > _______________________________________________ > > > Biopython-dev mailing list > > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > From p.j.a.cock at googlemail.com Thu Mar 20 14:19:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:19:51 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: +1 for any *.ipynb files being under source code control. There are perhaps advantages to using a separate repository, but still under Biopython on GitHub? This might also help if we wanted to build on existing external tutorials which are under a CC licence etc... Peter On Thu, Mar 20, 2014 at 1:57 PM, Melissa Gymrek wrote: > Hi Tiago, > I also really like this idea. Seems like it would make sense to have them > as part of the repository to make it easy for others to contribute. > (yet another IPython notebook user :) ) > ~M > > > On Thu, Mar 20, 2014 at 8:15 AM, Wibowo Arindrarto > wrote: > >> Hi Tiago, >> >> Do you plan to put the .ipynb file in the repo or will this be >> separate? Either way, I like the idea of having an .ipynb version of >> the tutorials around :). >> >> (from another IPython user). >> >> On Thu, Mar 20, 2014 at 12:42 PM, Tiago Antao wrote: >> > Hi all, >> > >> > Just to announce a potential project that I might embark on very soon >> > and see the reaction of the community: >> > >> > Get all the tutorial materials that I can find and create a ipython >> > notebook version of them. >> > >> > Does this sound like a good idea? 
>> > >> > Tiago >> > (your ipython notebook fanatic) >> > _______________________________________________ >> > Biopython-dev mailing list >> > Biopython-dev at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biopython-dev >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From anaryin at gmail.com Thu Mar 20 14:21:12 2014 From: anaryin at gmail.com (João Rodrigues) Date: Thu, 20 Mar 2014 15:21:12 +0100 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320141116.3f98384f@grandao> References: <20140320114244.4022cbc7@grandao> <20140320141116.3f98384f@grandao> Message-ID: +1 too. Maybe adding some support for oldies (Python 2.x)? Or are there features in iPython 2.0 that cannot be used in these older versions? From tra at popgen.net Thu Mar 20 14:48:15 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 14:48:15 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: References: <20140320114244.4022cbc7@grandao> Message-ID: <20140320144815.30b7a138@grandao> On Thu, 20 Mar 2014 14:19:51 +0000 Peter Cock wrote: > +1 for any *.ipynb files being under source code control. > > There are perhaps advantages to using a separate repository, > but still under Biopython on GitHub? This might also help if we > wanted to build on existing external tutorials which are under > a CC licence etc... My original plan was to draw "heavy inspiration" (credited, of course) from the existing Tutorial and maybe your workshop work. This all started when I noticed the need to change the tutorial due to simcoal changes... As I had to re-visit this, the idea followed... 
If people are fine with something under the biopython organization, I am fine with that. I have two proposals, though: 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above all ipython 2.0) 2. Go on and "do stuff", see where it goes and then maybe re-organize in the future (as opposed to do lots of planning first). This is, in some sense, a new line of direction and I would suggest that being exploratory would be better than being cautious... Tiago From p.j.a.cock at googlemail.com Thu Mar 20 14:53:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 14:53:41 +0000 Subject: [Biopython-dev] Ipython notebook biopython tutorials In-Reply-To: <20140320144815.30b7a138@grandao> References: <20140320114244.4022cbc7@grandao> <20140320144815.30b7a138@grandao> Message-ID: On Thu, Mar 20, 2014 at 2:48 PM, Tiago Antao wrote: > On Thu, 20 Mar 2014 14:19:51 +0000 > Peter Cock wrote: > >> +1 for any *.ipynb files being under source code control. >> >> There are perhaps advantages to using a separate repository, >> but still under Biopython on GitHub? This might also help if we >> wanted to build on existing external tutorials which are under >> a CC licence etc... > > > My original plan was to draw "heavy inspiration" (credited, of course) > from the existing Tutorial and maybe your workshop work. > > This all started when I noticed the need to change the tutorial due to > simcoal changes... As I had to re-visit this, the idea followed... > > If people are fine with something under the biopython organization, I > am fine with that. > > I have two proposals, though: > > 1. Base it on recent infrastructure (python 3.4 or maybe 3.3, and above > all ipython 2.0) > > 2. Go on and "do stuff", see where it goes and then maybe re-organize > in the future (as opposed to do lots of planning first). This is, in > some sense, a new line of direction and I would suggest that being > exploratory would be better than being cautious... 
> > Tiago So make a new repository and explore away :) Regarding https://github.com/peterjc/biopython_workshop - my workshop stuff I did wonder at the time about using iPython notebook but it adds another step to the workshop setup - and another barrier for people to repeat what they did at home. I was/am hoping to improve the TravisCI coverage of that work to check all the examples work under Python 2.6, 2.7 3.3 etc. I wonder if iPython notebooks make automated testing any easier or not? Peter From tra at popgen.net Thu Mar 20 15:27:38 2014 From: tra at popgen.net (Tiago Antao) Date: Thu, 20 Mar 2014 15:27:38 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> Message-ID: <20140320152738.3b3ab8ac@grandao> Hi Melissa, On Thu, 20 Mar 2014 09:50:39 -0400 Melissa Gymrek wrote: > I'm happy to update this section in the tutorial if you'd like help > with that. I just did all the changes (not much really). I was planning on committing the changes (Peter, can I?) and then some reviewing (or changing, if needed) would really be appreciated. Tiago From p.j.a.cock at googlemail.com Thu Mar 20 15:29:15 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 20 Mar 2014 15:29:15 +0000 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: <20140320152738.3b3ab8ac@grandao> References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > Hi Melissa, > > On Thu, 20 Mar 2014 09:50:39 -0400 > Melissa Gymrek wrote: > >> I'm happy to update this section in the tutorial if you'd like help >> with that. > > I just did all the changes (not much really). I was planning on > committing the changes (Peter, can I?) and then some reviewing (or > changing, if needed) would really be appreciated. 
> > Tiago Please do :) Peter From mgymrek at mit.edu Thu Mar 20 15:34:52 2014 From: mgymrek at mit.edu (Melissa Gymrek) Date: Thu, 20 Mar 2014 11:34:52 -0400 Subject: [Biopython-dev] Deprecating parts of Bio.PopGen.SimCoal In-Reply-To: References: <20140318171334.0edc2b45@lnx> <20140320152738.3b3ab8ac@grandao> Message-ID: sounds good! happy to have a look ~M On Thu, Mar 20, 2014 at 11:29 AM, Peter Cock wrote: > On Thu, Mar 20, 2014 at 3:27 PM, Tiago Antao wrote: > > Hi Melissa, > > > > On Thu, 20 Mar 2014 09:50:39 -0400 > > Melissa Gymrek wrote: > > > >> I'm happy to update this section in the tutorial if you'd like help > >> with that. > > > > I just did all the changes (not much really). I was planning on > > committing the changes (Peter, can I?) and then some reviewing (or > > changing, if needed) would really be appreciated. > > > > Tiago > > Please do :) > > Peter > From b.invergo at gmail.com Thu Mar 20 13:39:34 2014 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 20 Mar 2014 13:39:34 +0000 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <20140320114815.70210fd5@grandao> (Tiago Antao's message of "Thu, 20 Mar 2014 11:48:15 +0000") References: <20140320114815.70210fd5@grandao> Message-ID: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> > Wild thought here: Considering that Cambridge is a geographic focal > point for some of us (I am looking at you Dutch-based Biopythoneers, > for instance), I am wondering if we could use this for a "local" > Biopython meetup... Does this make any sense? Would there be interest? > > As I said, wild thought (silly?)... Since I'm now based in Cambridge, it would be silly for me not to attend. I'm not all that active lately (biopython's doing what I want it to do) but it'd still be nice to meet up. Cheers, Brandon -- Brandon Invergo http://brandon.invergo.net
From w.arindrarto at gmail.com Fri Mar 21 14:59:40 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 21 Mar 2014 15:59:40 +0100 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Zheng, The nucleotide information is stored as the alignment annotation. You can access it using hsp.aln_annotation['query_annotation']. There, they are stored as triplets, representing the codons. This is indeed a tradeoff that I had to make because there is no proper model yet to represent alignment objects containing sequences with different lengths in our master branch. In this case, the length of the DNA is most of the time 3x the length of the protein. And yes, this is not ideal since the actual query is now stored as an annotation ~ trading places with the translated query. HSPs themselves are basically modelled based on our MultipleSeqAlignment objects (you can get such objects when accessing the `aln` attribute from an HSP object). I think in order to properly model these types of alignment, we need to have a proper model of three-letter protein Seq objects as well. Your CodonSeqAlignment object may help here :), but I have not looked into it that much to be honest. How does it work with Seq objects with ProteinAlphabet? Is it possible to align protein and codon sequences? I tried storing as much information as possible using the current approach (e.g. notice the start and end coordinates of each hit and query, they are parsed from the file and the difference is not the same as the value you get when doing a `len` on hsp.query and/or hsp.hit). Note also that when dealing with frameshifts, you may want to access the hsp.fragments attribute, since frameshifts mean that you can break further your HSP alignment into multiple subalignments (fragments as it is called in SearchIO). Hope this helps :), Bow P.S.
Also CC-ing the Development list ~ this looks like something interesting for dev in general. On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > Thanks Bow, > > That works for me. But it seems the parser doesn't take the nucleotide > information into the hsps. All I get is a pairwise alignment between two > proteins. Nucleotide information is useful because I want to know the codon > -- amino acid correspondence. In the case of frameshift the situation may > not be that straightforward. Maybe you have other concern of not doing this. > > Best, > Zheng > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto > wrote: >> >> Hi Zheng, >> >> Thank you for the files :). I found out what was causing the error and >> have pushed a patch along with some tests to our codebase >> >> (https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d). >> You should be able to parse your file using the latest `master` >> branch. >> >> Hope this helps, >> Bow >> >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan wrote: >> > Hi Bow, >> > >> > I'm happy to provide the example for testing. See attachment. >> > >> > The command to generate the output above. >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa >> > >> > I'll check the test suite to see if I can find why. >> > >> > Best, >> > Zheng >> > >> > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto >> > >> > wrote: >> >> >> >> > Looking at our test cases, this particular case may have slipped >> >> > testing. We do test for several cases of dna2protein (which could >> >> > explain why it works when the nucleotide sequence comes first), but >> >> > not protein2dna. Please let me know if I can also use your example as >> >> > a test in our test corpus :). >> >> >> >> Oops, I meant the reverse ~ we have several test cases for protein2dna >> >> which may explain why it works when the protein sequence comes first >> >> ;). 
>> > >> > > > From Tom.Brown at enmu.edu Fri Mar 21 16:30:06 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 16:30:06 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? Thanks Tom ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. If you are not the intended recipient, please contact the sender and destroy all copies of this message From p.j.a.cock at googlemail.com Fri Mar 21 16:35:00 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 21 Mar 2014 16:35:00 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). 
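[Editorial note: option 2 maps onto the `entrez_query` argument of Biopython's `Bio.Blast.NCBIWWW.qblast`, which is a genuine keyword of that function. In the sketch below the helper name and the query sequence are made up for illustration, and the import and network call are kept inside the helper so the example can be read without contacting NCBI.]

```python
# Option 2 as a script: remote blastp against NR, restricted to
# Nematoda with an Entrez filter (taxonomy ID 6231, per the thread).
entrez_filter = "txid6231[ORGN]"

def blast_nematoda(fasta_query):
    # Hypothetical helper; NCBIWWW.qblast and its entrez_query keyword
    # are real Biopython API, but calling it needs network access.
    from Bio.Blast import NCBIWWW
    return NCBIWWW.qblast("blastp", "nr", fasta_query,
                          entrez_query=entrez_filter)

# Placeholder query sequence, not a real protein of interest:
query = ">example\nMKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(entrez_filter)  # → txid6231[ORGN]
```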
Peter From Tom.Brown at enmu.edu Fri Mar 21 19:23:14 2014 From: Tom.Brown at enmu.edu (Brown, Tom) Date: Fri, 21 Mar 2014 19:23:14 +0000 Subject: [Biopython-dev] Blastp for Proteins from Nematoda In-Reply-To: References: <7D3EF56670A2CC448808CB89D32F7372B3D1FE20@ITSNV499.ad.enet.enmu.edu> Message-ID: <7D3EF56670A2CC448808CB89D32F7372B3D1FEC5@ITSNV499.ad.enet.enmu.edu> Peter, Thanks. It is working. Tom -----Original Message----- From: Peter Cock [mailto:p.j.a.cock at googlemail.com] Sent: Friday, March 21, 2014 10:35 AM To: Brown, Tom Cc: Biopython-dev at lists.open-bio.org Subject: Re: [Biopython-dev] Blastp for Proteins from Nematoda On Fri, Mar 21, 2014 at 4:30 PM, Brown, Tom wrote: > How can I blastp a protein against only proteins from organisms in Nematoda (a phylum)? > > Thanks > > Tom Hi Tom, The two main options I can think of are: 1. Make a database of only nematode proteins - which is probably a good idea anyway as many are not in NR, see also http://www.nematodes.org/ 2. Search against the NCBI NR database doing a taxonomy filter, which is possible if you use the -remote option and an Entrez filter (taxonomy ID is 6231, so try txid6231[ORGN] as the Entrez filter). Peter ________________________________ Confidentiality Notice: This e-mail, including all attachments, is for the sole use of the intended recipient(s) and may contain confidential and privileged information as defined under FERPA. Any unauthorized review, use, disclosure or distribution is prohibited unless specifically provided under the New Mexico Inspection of Public Records Act. 
If you are not the intended recipient, please contact the sender and destroy all copies of this message From zruan1991 at gmail.com Fri Mar 21 19:32:33 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 15:32:33 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Hi Bow, I have the same problem when trying to model codon alignment with frameshift being considered. Basically, I have a CodonSeq object to store a coding sequence. The only difference between CodonSeq and Seq object is that CodonSeq has an attribute -- `rf_table` (reading frame table). It's actually a list of positions each codon starts with, so that translate() method will go through the list to translate codon into amino acid. In this case, it is easy to store a coding sequence with frameshift events. And it's not necessary to split the protein to dna alignment into multiple part when frameshift occurs. However, the problem now becomes how to obtain such information (`rf_table`). I find exonerate is quite capable of handling this task, especially with introns in the dna. I do think an object to store protein to dna alignment is necessary in this scenario. Best, Zheng On Fri, Mar 21, 2014 at 10:59 AM, Wibowo Arindrarto wrote: > Hi Zheng, > > The nucleotide information is stored as the alignment annotation. You > can access it using hsp.aln_annotation['query_annotation']. There, > they are stored as triplets, reprensenting the codons. > > This is indeed a tradeoff that I had to make because there is no > proper model yet to represent alignment objects containing sequences > with different length in our master branch. In this case, the length > of the DNA is most of > the time 3x the length of the protein. And yes, this is not ideal > since the actual query are now stored as an annotation ~ trading > places with the translated query. 
HSPs themselves are basically > modelled based on our MultipleSeqAlignment objects (you can get such > objects when accessing the `aln` attribute from an HSP object). I > think in order to properly model these types of alignment, we need to > have a proper model of three-letter protein Seq objects as well. > > Your CodonSeqAlignment object may help here :), but I have not looked > into it that much to be honest. How does it work with Seq objects with > ProteinAlphabet? Is it possible to align protein and codon sequences? > > I tried storing as much information as possible using the current > approach (e.g. notice the start and end coordinates of each hit and > query, they are parsed from the file and the difference is not the > same as the value you get when doing a `len` on hsp.query and/or > hsp.hit). Note also that when dealing with frameshifts, you may want > to access the hsp.fragments attribute, since frameshifts mean that you > can break further your HSP alignment into multiple subalignments > (fragments as it is called in SearchIO). > > Hope this helps :), > Bow > > P.S. Also CC-ing the Development list ~ this looks like something > interesting for dev in general. > > On Fri, Mar 21, 2014 at 3:39 PM, Zheng Ruan wrote: > > Thanks Bow, > > > > That works for me. But it seems the parser doesn't take the nucleotide > > information into the hsps. All I get is a pairwise alignment between two > > proteins. Nucleotide information is useful because I want to know the > codon > > -- amino acid correspondence. In the case of frameshift the situation may > > not be that straightforward. Maybe you have other concern of not doing > this. > > > > Best, > > Zheng > > > > > > On Thu, Mar 20, 2014 at 7:30 PM, Wibowo Arindrarto < > w.arindrarto at gmail.com> > > wrote: > >> > >> Hi Zheng, > >> > >> Thank you for the files :). 
I found out what was causing the error and > >> have pushed a patch along with some tests to our codebase > >> > >> ( > https://github.com/biopython/biopython/commit/377889b05235c2e6f192916fb610d0da01b45c6d > ). > >> You should be able to parse your file using the latest `master` > >> branch. > >> > >> Hope this helps, > >> Bow > >> > >> On Thu, Mar 20, 2014 at 9:42 PM, Zheng Ruan > wrote: > >> > Hi Bow, > >> > > >> > I'm happy to provide the example for testing. See attachment. > >> > > >> > The command to generate the output above. > >> > exonerate --showvulgar no --showalignment yes nuc.fa pro.fa > >> > > >> > I'll check the test suite to see if I can find why. > >> > > >> > Best, > >> > Zheng > >> > > >> > > >> > On Thu, Mar 20, 2014 at 4:33 PM, Wibowo Arindrarto > >> > > >> > wrote: > >> >> > >> >> > Looking at our test cases, this particular case may have slipped > >> >> > testing. We do test for several cases of dna2protein (which could > >> >> > explain why it works when the nucleotide sequence comes first), but > >> >> > not protein2dna. Please let me know if I can also use your example > as > >> >> > a test in our test corpus :). > >> >> > >> >> Oops, I meant the reverse ~ we have several test cases for > protein2dna > >> >> which may explain why it works when the protein sequence comes first > >> >> ;). > >> > > >> > > > > > > From arklenna at gmail.com Fri Mar 21 20:54:05 2014 From: arklenna at gmail.com (Lenna Peterson) Date: Fri, 21 Mar 2014 16:54:05 -0400 Subject: [Biopython-dev] [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: > Hi Bow, > > I have the same problem when trying to model codon alignment with > frameshift being considered. Basically, I have a CodonSeq object to store a > coding sequence. The only difference between CodonSeq and Seq object is > that CodonSeq has an attribute -- `rf_table` (reading frame table). 
It's > actually a list of positions each codon starts with, so that translate() > method will go through the list to translate codon into amino acid. In this > case, it is easy to store a coding sequence with frameshift events. And > it's not necessary to split the protein to dna alignment into multiple part > when frameshift occurs. However, the problem now becomes how to obtain such > information (`rf_table`). I find exonerate is quite capable of handling > this task, especially with introns in the dna. I do think an object to > store protein to dna alignment is necessary in this scenario. > Is the (still unmerged) CoordinateMapper the solution to this? http://biopython.org/wiki/Coordinate_mapping If so, let me know and I'll rebase and refresh the pull request. If not, I misunderstood the problem. Cheers, Lenna From zruan1991 at gmail.com Fri Mar 21 21:53:13 2014 From: zruan1991 at gmail.com (Zheng Ruan) Date: Fri, 21 Mar 2014 17:53:13 -0400 Subject: [Biopython-dev] Fwd: [Biopython] Exonerate Parser Error In-Reply-To: References: Message-ID: Forget to cc'd to dev list. Hi Lenna, I'm not quite sure about CoordinateMapper, but it seems to deal with sequence files with rich annotation like genbank. However, In our case, we are typically not sure about the coordinate correspondence between dna and protein sequence. That's why exonerate can help. Thanks! On Fri, Mar 21, 2014 at 4:54 PM, Lenna Peterson wrote: > On Fri, Mar 21, 2014 at 3:32 PM, Zheng Ruan wrote: > >> Hi Bow, >> >> I have the same problem when trying to model codon alignment with >> frameshift being considered. Basically, I have a CodonSeq object to store >> a >> coding sequence. The only difference between CodonSeq and Seq object is >> that CodonSeq has an attribute -- `rf_table` (reading frame table). It's >> actually a list of positions each codon starts with, so that translate() >> method will go through the list to translate codon into amino acid. 
In >> this >> case, it is easy to store a coding sequence with frameshift events. And >> it's not necessary to split the protein to dna alignment into multiple >> part >> when frameshift occurs. However, the problem now becomes how to obtain >> such >> information (`rf_table`). I find exonerate is quite capable of handling >> this task, especially with introns in the dna. I do think an object to >> store protein to dna alignment is necessary in this scenario. >> > > Is the (still unmerged) CoordinateMapper the solution to this? > http://biopython.org/wiki/Coordinate_mapping > If so, let me know and I'll rebase and refresh the pull request. > If not, I misunderstood the problem. > > Cheers, > > Lenna > From p.j.a.cock at googlemail.com Mon Mar 24 11:57:14 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 11:57:14 +0000 Subject: [Biopython-dev] Volunteer buildslave machines? e.g. Windows & 32 bit Linux Message-ID: Hello all, Tiago and I have been looking after a range of machines covering different operating systems and Python versions, running as volunteer buildslaves for Biopython using buildbot: http://testing.open-bio.org/biopython/tgrid Does anyone else have a lab/home server which could be setup to run nightly Biopython tests for us via buildbot? Ideally the machine needs to be online overnight (European time) when the server is currently setup to schedule tests: http://www.biopython.org/wiki/Continuous_integration Our elderly 32 bit Linux desktop which has been running as a Biopython buildslave for the last few years is finally failing (hard drive problem). I would particularly like to see new buildslaves for: * 32 bit Linux * 64 bit Windows * Windows 7 or 8 (we have a 32 bit XP machine) If you think you might be able to help, the first hurdle is verifying you can checkout Biopython from github, and then compile the source (this is non-trivial on Windows, especially for 64 bit Windows). 
Note that this is separate from the continuous integration testing done for use via TravisCI whenever the GitHub repository is updated - this is very useful but currently only covers Linux: https://travis-ci.org/biopython/biopython/builds The key benefit of the buildbot server is cross platform testing - but this requires a range of volunteer machines. Thanks, Peter RE: http://lists.open-bio.org/pipermail/biopython-dev/2014-March/011158.html On Tue, Mar 18, 2014 at 1:15 PM, Peter Cock wrote: > On Wed, Mar 12, 2014 at 3:10 PM, Tiago Antao wrote: >> Hi, >> >> I have a docker container ready (save for a few applications). Simple >> usage instructions: >> >> ... >> >> Tiago > > Is this a 32 or 64 bit VM, or either? > > I'm asking because we may want to source a replacement > 32 bit Linux buildslave - the hard drive in the old machine > we've been using is failing, and it is probably not worth > replacing. > > Peter From p.j.a.cock at googlemail.com Mon Mar 24 16:42:29 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 24 Mar 2014 16:42:29 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: References: Message-ID: Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g. $ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. 
With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three though, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM, wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps. From marco.galardini at unifi.it Tue Mar 25 23:40:44 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 25 Mar 2014 23:40:44 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <52CD2948.7050102@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> Message-ID: <533213FC.8010304@unifi.it> Hi all, following your suggestions (as well as the other modules implementations) I've just committed a couple of commits to my biopython fork, featuring the Bio.Phenomics module. The module capabilities are limited to reading/writing Phenotype Microarray files and basic operations on the PlateRecord/WellRecord objects.
The module requires numpy to interpolate the signal when the user requests a time point that wasn't in the input file (this way the WellRecord object can be queried with slices). I'm thinking about how to implement the parameters extraction from WellRecord objects without the use of scipy. Here's the link to my branch: https://github.com/mgalardini/biopython/tree/phenomics The module and functions have been documented taking inspiration from the other modules: hope they are clear enough for you to try it out. Some example files can be found in Tests/Phenomics. Marco On 08/01/2014 10:32, Marco Galardini wrote: > Hi, > > On 01/08/2014 06:53 AM, Michiel de Hoon wrote: >>> any specification on the style guide for the biopython parsers? >> There is no strict set of rules, but to get you started, many modules >> follow this format: >> - Assuming a PM data file contains only a single data set, the module >> should contain a function "read" that takes either a file name or a file >> handle as the argument. > Unfortunately, the situation is a bit mixed up: there are basically > three file formats for PM data: as csv files (which can contain one or > more data sets or 'plates') and as yaml/json, which can contain also > some metadata. I would therefore use a similar approach to the SeqIO > module, having a parse() and a read() method that raises an exception > if the file contains more than one record. > >> - The module should contain a class (typically called "Record") that >> can store the data in the data file. The "read" function returns an >> object of this class. >> - Try to avoid third-party dependencies if at all possible. > So far the dependencies would be pyYaml (for the yaml/json parsing, > but maybe I could use the stdlib json module) and numpy/scipy for the > extraction of curve parameters. Does this sound ok? >> >> Would it make sense to have a single Bio.Microarray module that can >> house the various microarray parsers (PM, Affy, others)?
> I don't know if that would be a good strategy: the Phenotype > Microarrays are very different from the other proper microarrays; how > about a "phenomics" module? > >> >> Best, >> -Michiel. > Kind regards, > Marco > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 ------------------------------------------------- From mjldehoon at yahoo.com Wed Mar 26 02:15:52 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 25 Mar 2014 19:15:52 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> We could consider to not include any DTDs with Biopython, and rely on downloading them automatically. This seems a better test case than what we currently have, because as NCBI updates their DTDs, Bio.Entrez depends on this automatic download capability. Best, -Michiel. -------------------------------------------- On Mon, 3/24/14, Peter Cock wrote: Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] To: "Biopython-Dev Mailing List" Date: Monday, March 24, 2014, 12:42 PM Three more NCBI Entrez DTD files added: https://github.com/biopython/biopython/commit/58e024f7704c3b7d3694fda42be6fa47808dad7d The downside of the new Entrez code is it silently downloads and caches missing DTD files, so we may want to add something to the manual release process to check what missing DTD files have been cached locally... e.g. 
$ ls ~/.config/biopython/Bio/Entrez/DTDs/ Regards, Peter ---------- Forwarded message ---------- From: Peter Cock Date: Mon, Mar 24, 2014 at 4:19 PM Subject: Re: [Fwd: missing NCBI DTDs] To: xxx at uci.edu Cc: "biopython-owner at lists.open-bio.org" Thanks for getting in touch. Sadly back when anyone could email the list we had far too much spam. Unfortunately the only practical solution was to insist people join the mailing list before posting. With hindsight the missing DTD message should have also said please check the latest code / issue tracker - in this case we've fixed the missing esummary-v1.dtd file: https://github.com/biopython/biopython/commit/cb560e79def4b24c831725308f17123af4e8eeff We do seem to be missing the other three through, bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd The sample code you provided makes testing this easier, thank you. Peter ---------- Forwarded message ---------- On Mon, Mar 24, 2014 at 4:04 PM,? wrote: When I attempted to run the following python script: from Bio import Entrez Entrez.email = "esharman at uci.edu" handle = Entrez.efetch(db="pubmed", id="24653700", retmode="xml") record = Entrez.read(handle) handle.close() print record[0]["ArticleTitle"] the following DTDs were reported missing: bookdoc_140101.dtd nlmmedlinecitationset_140101.dtd pubmed_140101.dtd When I ran a similar script to access the SNP database, the following DTD was reported missing: esummary-v1.dtd Downloading and saving these files to the requested python directory eliminated the error messages. Biopython is an absolutely super package! Hope this helps. 
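[Editorial note: the release-process check suggested above — an `ls` of the cache directory — can be made cross-platform with a few lines of Python. The path is the one quoted in the thread; the `cached_dtds` helper name is illustrative, not Biopython API.]

```python
import os

# Directory where Bio.Entrez caches DTD files it had to download
# (path as given earlier in this thread).
DTD_DIR = os.path.expanduser("~/.config/biopython/Bio/Entrez/DTDs")

def cached_dtds(base=DTD_DIR):
    """List locally cached DTD file names, or [] if nothing is cached."""
    if not os.path.isdir(base):
        return []
    return sorted(name for name in os.listdir(base)
                  if name.endswith(".dtd"))

print(cached_dtds())
```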
_______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Wed Mar 26 09:18:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 09:18:21 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> References: <1395800152.18518.YahooMailBasic@web164004.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:15 AM, Michiel de Hoon wrote: > We could consider to not include any DTDs with Biopython, > and rely on downloading them automatically. > This seems a better test case than what we currently have, > because as NCBI updates their DTDs, Bio.Entrez depends > on this automatic download capability. > > Best, > -Michiel. Long term not bundling the DTD files seems a good idea. Being cautious we could bundle them for the next release, see how the download mechanism works in the wild, and drop the DTD files for the release after that? This would mean all the Entrez parser tests would require internet access (even if using an old XML file on disk), but given that most of Bio.Entrez requires a connection to the NCBI anyway this isn't such a problem. If we do go down this route, would the current once-a-week running of the online tests with buildbot be enough? Peter From p.j.a.cock at googlemail.com Wed Mar 26 10:14:53 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 10:14:53 +0000 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? 
In-Reply-To: <533213FC.8010304@unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini wrote: > Hi all, > > following your suggestions (as well as the other modules implementations) > I've just committed a couple of commits to my biopython fork, featuring the > Bio.Phenomics module. > The module capabilities are limited to reading/writing Phenotype Microarray > files and basic operations on the PlateRecord/WellRecord objects. The module > requires numpy to interpolate the signal when the user requests a time point > that wasn't in the input file (this way the WellRecord object can be queried > with slices). > I'm thinking about how to implement the parameters extraction from WellRecord > objects without the use of scipy. > > Here's the link to my branch: > https://github.com/mgalardini/biopython/tree/phenomics > The module and functions have been documented taking inspiration from the > other modules: hope they are clear enough for you to try it out. > Some example files can be found in Tests/Phenomics. > > Marco Hi Marco, I've not worked with this kind of data so my comments are not on the application specifics. But I'm pleased to see unit tests :) One thought was while you define (Java like?) getRow and getColumn methods, your __getitem__ does not support (NumPy like) access, which is something we do for multiple sequence alignments. I guess while most plates are laid out in a grid, the row/column for each sample is not the most important thing - the sample identifier is? Thinking out loud, would properties `rows` and `columns` etc be nicer than `getRow` and `getColumn`, supporting iteration over the rows/columns/etc and indexing? Minor: Your longer function docstrings do not follow PEP257, specifically starting with a one line summary, then a blank line, then the details.
Also you are using triple single-quotes, rather than triple double-quotes (like the rest of Biopython). http://legacy.python.org/dev/peps/pep-0257/ Peter P.S. Also, I'm not very keen on the module name, phenomics - I wonder if it would earn Biopython a badomics award? ;) http://dx.doi.org/10.1186/2047-217X-1-6 From marco.galardini at unifi.it Wed Mar 26 13:26:42 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Wed, 26 Mar 2014 14:26:42 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> Message-ID: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Hi, many thanks for your comments, below some replies: ----- Message from p.j.a.cock at googlemail.com --------- Date: Wed, 26 Mar 2014 10:14:53 +0000 From: Peter Cock Reply-To: Peter Cock Subject: Re: [Biopython-dev] Interested in a Phenotype Microarray parser? To: Marco Galardini Cc: Biopython-Dev Mailing List > On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini > wrote: >> Hi all, >> >> following your suggestions (as well as the other modules' implementations) >> I've just committed a couple of commits to my biopython fork, featuring the >> Bio.Phenomics module. >> The module capabilities are limited to reading/writing Phenotype Microarray >> files and basic operations on the PlateRecord/WellRecord objects. The module >> requires numpy to interpolate the signal when the user requests a time point >> that wasn't in the input file (this way the WellRecord object can be queried >> with slices). >> I'm thinking about how to implement the parameters extraction from WellRecord >> objects without the use of scipy. >> >> Here's the link to my branch: >> https://github.com/mgalardini/biopython/tree/phenomics >> The module and functions have been documented taking inspiration from the >> other modules: hope they are clear enough for you to try it out. 
>> Some example files can be found in Tests/Phenomics. >> >> Marco > > Hi Marco, > > I've not worked with kind of data so my comments are not on > the application specifics. But I'm pleased to see unit tests :) > > One thought was while you define (Java like?) getRow and getColumn > methods, your __getitem__ does not support (NumPy like) access, > which is something we do for multiple sequence alignments. I guess > while most plates are laid out in a grid, the row/column for each > sample is not the most important thing - the sample identifier is? > > Thinking out loud, would properties `rows` and `columns` etc be > nicer than `getRow` and `getColumn`, supporting iteration over > the rows/columns/etc and indexing? Yeah, absolutely: I'll work on some changes to have a more straightforward way to select multiple WellRecords on row/column basis. > > Minor: Your longer function docstrings do not follow PEP257, > specifically starting with a one line summary, then a blank line, > then the details. Also you are using triple single-quotes, rather > than triple double-quotes (like the rest of Biopthon). > http://legacy.python.org/dev/peps/pep-0257/ Whoops, I'll change it, thanks > > Peter > > P.S. Also, I'm not very keen on the module name, phenomics - > I wonder if it would earn Biopython a badomics award? ;) > http://dx.doi.org/10.1186/2047-217X-1-6 That's meta-omics right? :p What about 'Phenotype' then? Maybe it's too general, but future extensions may include other phenotypic readouts. 
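The slice-querying behaviour described in this thread - interpolating a well's signal at a time point that was not in the input file - can be sketched with numpy.interp. The class below is a simplified stand-in for illustration only; its attribute and method names are assumptions, not the actual Bio.Phenomics API:

```python
import numpy as np

class WellRecord:
    """Minimal sketch of a well whose signal is measured over time.

    Illustrative only: the real Bio.Phenomics WellRecord has a richer
    interface; this just shows the interpolation idea.
    """

    def __init__(self, identifier, times, signals):
        self.id = identifier
        self._times = np.asarray(times, dtype=float)
        self._signals = np.asarray(signals, dtype=float)

    def __getitem__(self, time):
        # Linearly interpolate the signal at an arbitrary time point,
        # so queries need not match the measured time points exactly.
        return float(np.interp(time, self._times, self._signals))

well = WellRecord("A01", times=[0.0, 0.25, 0.5], signals=[10.0, 20.0, 30.0])
print(well[0.125])  # halfway between the first two measurements -> 15.0
```

Because numpy.interp only does piecewise-linear interpolation, no scipy dependency is needed for this part; parameter extraction (growth curve fitting) is the harder problem Marco mentions.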
Marco > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > ----- Fine del messaggio da p.j.a.cock at googlemail.com ----- Marco Galardini Postdoctoral Fellow EMBL-EBI - European Bioinformatics Institute Wellcome Trust Genome Campus Hinxton, Cambridge CB10 1SD, UK Phone: +44 (0)1223 49 2547 From mjldehoon at yahoo.com Wed Mar 26 14:55:46 2014 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 26 Mar 2014 07:55:46 -0700 (PDT) Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: Message-ID: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Hi Peter, On Wed, 3/26/14, Peter Cock wrote: > Long term not bundling the DTD files seems a good idea. > Being cautious we could bundle them for the next release, > see how the download mechanism works in the wild, and > drop the DTD files for the release after that? I don't think we need to be so cautious. > This would mean all the Entrez parser tests would require > internet access (even if using an old XML file on disk), But only the first time. After a DTD is downloaded, it is stored locally, and internet access won't be needed the next time the XML (or other XML files relying on the same DTD) is parsed. In my experience, using local DTDs is much much faster than accessing them through the internet for each XML file, so I would not advocate an internet-only solution. As an alternative to local storage, we could consider downloading all DTDs for each Biopython session, but keeping the results of parsing the DTD in memory (so we won't have to download each DTD over and over again if we're parsing many XML files). This can be almost as fast as using local storage, but will require internet access, and also Bio.Entrez would have to be changed. Best, -Michiel. 
From p.j.a.cock at googlemail.com Wed Mar 26 15:04:28 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 26 Mar 2014 15:04:28 +0000 Subject: [Biopython-dev] Fwd: [Fwd: missing NCBI DTDs] In-Reply-To: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> References: <1395845746.7662.YahooMailBasic@web164001.mail.gq1.yahoo.com> Message-ID: On Wed, Mar 26, 2014 at 2:55 PM, Michiel de Hoon wrote: > Hi Peter, > > On Wed, 3/26/14, Peter Cock wrote: >> Long term not bundling the DTD files seems a good idea. >> Being cautious we could bundle them for the next release, >> see how the download mechanism works in the wild, and >> drop the DTD files for the release after that? > > I don't think we need to be so cautious. OK. We could then get rid of the DTDs folder under Bio/Entrez and tweak the Entrez XML parsing tests to ensure they are only run if the internet is available. >> This would mean all the Entrez parser tests would require >> internet access (even if using an old XML file on disk), > > But only the first time. After a DTD is downloaded, it is stored > locally, and internet access won't be needed the next time the XML > (or other XML files relying on the same DTD) is parsed. Yes, but for many test environments, it is always the first time ;) e.g. TravisCI uses a clean VM for each test run. > In my experience, using local DTDs is much much faster than > accessing them through the internet for each XML file, so I > would not advocate an internet-only solution. Yes (I didn't mean to imply that - sorry for any confusion). > As an alternative to local storage, we could consider downloading > all DTDs for each Biopython session, but keeping the results of > parsing the DTD in memory (so we won't have to download each > DTD over and over again if we're parsing many XML files). > This can be almost as fast as using local storage, but will require > internet access, and also Bio.Entrez would have to be changed. 
A local cache (as implemented) seems fine to me. Peter From p.j.a.cock at googlemail.com Thu Mar 27 11:40:41 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 27 Mar 2014 11:40:41 +0000 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 Message-ID: Hello all, As a co-chair of BOSC this year, I'd like to remind you all that the abstract deadline is about a week away now (April 4): http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ Also this year student presenters will get free BOSC registration: http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ Would anyone like to volunteer to give this year's Biopython Update talk at BOSC 2014 in Boston? I would prefer one of the newer project members have a turn - but also I'll be busier than usual with BOSC organisation duties. Note that giving a talk often helps with getting travel funding to attend a meeting - and in addition to BOSC, you can combine the trip with the BOSC CodeFest beforehand and/or the ISMB meeting afterwards. Thanks, Peter From w.arindrarto at gmail.com Fri Mar 28 21:22:56 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Fri, 28 Mar 2014 22:22:56 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, If there are no objections from anyone, I would like to volunteer :). I am planning to come to ISMB anyway, though this isn't 100% confirmed as I am still applying for the visa. 
Cheers, Bowo On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: > Hello all, > > As a co-chair of BOSC this year, I'd like to remind you all that the > abstract deadline is about a week away now (April 4): > > http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ > > Also this year student presenters will get free BOSC registration: > > http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ > > Would anyone like to volunteer to give this year's Biopython Update > talk at BOSC 2014 in Boston? I would prefer one of the newer project > members have a turn - but also I'll be busier than usual with BOSC > organisation duties. > > Note that giving a talk often helps with getting travel funding to > attend a meeting - and in addition to BOSC, you can combine the > trip with the BOSC CodeFest beforehand and/or the ISMB meeting > afterwards. > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From ashok.bioinformatics at gmail.com Sun Mar 30 18:11:49 2014 From: ashok.bioinformatics at gmail.com (T. Ashok Kumar) Date: Sun, 30 Mar 2014 23:41:49 +0530 Subject: [Biopython-dev] Contributing code to Biopython Message-ID: Dear Sir/Madam, I wish to contribute code for predicting the *hydropathy plot of a protein sequence* using Biopython. Please help me regarding this issue. -- *T. 
Ashok Kumar* Head, Department of Bioinformatics Noorul Islam College of Arts and Science Kumaracoil, Thuckalay - 629 180 Kanyakumari District, INDIA Mobile:- 00 91 9655307178 *E-Mail:* *ashok.bioinformatics at gmail.com *, *ashok at biogem.org * *Website:* *www.biogem.org * From p.j.a.cock at googlemail.com Mon Mar 31 09:12:51 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 10:12:51 +0100 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Thanks for volunteering Bow :) I can send you the LaTeX files I used for the abstract in previous years (each talk gets one page in the BOSC abstract booklet). You should be able to find our past talks online, some as PDFs, some on SlideShare etc: http://biopython.org/wiki/Documentation#Presentations Peter On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto wrote: > Hi Peter, everyone, > > If there are no objections from anyone, I would like to volunteer :). > > I am planning to come to ISMB anyway, though this isn't 100% confirmed as I > am still applying for the visa. > > Cheers, > Bowo > > On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >> >> Hello all, >> >> As a co-chair of BOSC this year, I'd like to remind you all that the >> abstract deadline is about a week away now (April 4): >> >> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >> >> Also this year student presenters will get free BOSC registration: >> >> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >> >> Would anyone like to volunteer to give this year's Biopython Update >> talk at BOSC 2014 in Boston? I would prefer one of the newer project >> members have a turn - but also I'll be busier than usual with BOSC >> organisation duties. >> >> Note that giving a talk often helps with getting travel funding to >> attend a meeting - and in addition to BOSC, you can combine the >> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >> afterwards. 
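Regarding the hydropathy-plot contribution proposed above: the core of such a plot is a sliding-window average over a per-residue hydropathy scale, which can be sketched in plain Python (Biopython's Bio.SeqUtils.ProtParam module offers related functionality, so any contribution would likely build on that rather than start from scratch). The window size of 9 is just a common choice for the Kyte-Doolittle scale, not a requirement:

```python
# Kyte-Doolittle hydropathy values (J. Mol. Biol. 157:105-132, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=9):
    """Mean Kyte-Doolittle hydropathy over each sliding window.

    Returns one value per window position; positive values suggest
    hydrophobic stretches, strongly positive runs possible membrane spans.
    """
    sequence = sequence.upper()
    return [
        sum(KD[aa] for aa in sequence[i:i + window]) / window
        for i in range(len(sequence) - window + 1)
    ]

profile = hydropathy_profile("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", window=9)
print(max(profile))  # the most hydrophobic window's mean score
```

Plotting the profile against window position (e.g. with matplotlib) then gives the familiar hydropathy plot.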
>> >> Thanks, >> >> Peter >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Mon Mar 31 16:30:21 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 31 Mar 2014 17:30:21 +0100 Subject: [Biopython-dev] Fwd: [Numpy-discussion] EuroSciPy 2014 Call for Abstracts In-Reply-To: <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> References: <20140320114815.70210fd5@grandao> <87txathtux.fsf@chupacabra.windows.ebi.ac.uk> Message-ID: I'm glad to see there are several people interested in EuroSciPy. One one of you like to submit a Biopython talk? The deadline is 14 April: https://www.euroscipy.org/2014/calls/abstracts/ Peter From w.arindrarto at gmail.com Mon Mar 31 21:08:50 2014 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 31 Mar 2014 23:08:50 +0200 Subject: [Biopython-dev] Biopython update talk at BOSC 2014 In-Reply-To: References: Message-ID: Hi Peter, everyone, A LaTeX template would be great :). I'm still preparing the abstract, should be ready for everyone to check soon. Cheers, Bow On Mon, Mar 31, 2014 at 11:12 AM, Peter Cock wrote: > Thanks for volunteering Bow :) > > I can send you the LaTeX files I used for the abstract in > previous years (each talk gets one page in the BOSC > abstract booklet). You should be able to find our past > talks online, some as PDFs, some on SlideShare etc: > http://biopython.org/wiki/Documentation#Presentations > > Peter > > On Fri, Mar 28, 2014 at 9:22 PM, Wibowo Arindrarto > wrote: >> Hi Peter, everyone, >> >> If there are no objections from anyone, I would like to volunteer :). >> >> I am planning to come to ISMB anyway, though this isn't 100% confirmed as I >> am still applying for the visa. 
>> >> Cheers, >> Bowo >> >> On Mar 27, 2014 12:41 PM, "Peter Cock" wrote: >>> >>> Hello all, >>> >>> As a co-chair of BOSC this year, I'd like to remind you all that the >>> abstract deadline is about a week away now (April 4): >>> >>> http://news.open-bio.org/news/2014/03/bosc-2014-call-for-abstracts/ >>> >>> Also this year student presenters will get free BOSC registration: >>> >>> http://news.open-bio.org/news/2014/03/free-student-presenters-bosc-2014/ >>> >>> Would anyone like to volunteer to give this year's Biopython Update >>> talk at BOSC 2014 in Boston? I would prefer one of the newer project >>> members have a turn - but also I'll be busier than usual with BOSC >>> organisation duties. >>> >>> Note that giving a talk often helps with getting travel funding to >>> attend a meeting - and in addition to BOSC, you can combine the >>> trip with the BOSC CodeFest beforehand and/or the ISMB meeting >>> afterwards. >>> >>> Thanks, >>> >>> Peter >>> _______________________________________________ >>> Biopython-dev mailing list >>> Biopython-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biopython-dev From marco.galardini at unifi.it Mon Mar 31 23:59:32 2014 From: marco.galardini at unifi.it (Marco Galardini) Date: Tue, 01 Apr 2014 00:59:32 +0100 Subject: [Biopython-dev] Interested in a Phenotype Microarray parser? In-Reply-To: <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> References: <1389160399.57779.YahooMailBasic@web164004.mail.gq1.yahoo.com> <52CD2948.7050102@unifi.it> <533213FC.8010304@unifi.it> <20140326142642.9o3hmuipcocsw8g8@webmail.unifi.it> Message-ID: <533A0164.9010700@unifi.it> Hi, as suggested, I've made a few changes to the proposed Bio.Phenotype module (apart from the less-omics name). 
The PlateRecord object can now be indexed in a similar fashion to AlignIO multiple alignments: it is still possible to use the WellRecord identifier as an index, but when integers or slices are used, new sub-plates or single wells are returned. The system uses the well identifier as a means of dividing the plate into rows/columns. Thanks for pointing out the AlignIO system, it has been very useful. I've left the two getColumns and getRows functions, since for some people it may still be useful to use the well identifiers. If you feel they are too confusing I can remove them. The updated branch is here: https://github.com/mgalardini/biopython/tree/phenomics Kind regards, Marco On 26/03/2014 13:26, Marco Galardini wrote: > Hi, > > many thanks for your comments, below some replies: > > ----- Message from p.j.a.cock at googlemail.com --------- > Date: Wed, 26 Mar 2014 10:14:53 +0000 > From: Peter Cock > Reply-To: Peter Cock > Subject: Re: [Biopython-dev] Interested in a Phenotype Microarray > parser? > To: Marco Galardini > Cc: Biopython-Dev Mailing List > > >> On Tue, Mar 25, 2014 at 11:40 PM, Marco Galardini >> wrote: >>> Hi all, >>> >>> following your suggestions (as well as the other modules' >>> implementations) >>> I've just committed a couple of commits to my biopython fork, >>> featuring the >>> Bio.Phenomics module. >>> The module capabilities are limited to reading/writing Phenotype >>> Microarray >>> files and basic operations on the PlateRecord/WellRecord objects. >>> The module >>> requires numpy to interpolate the signal when the user requests a >>> time point >>> that wasn't in the input file (this way the WellRecord object can be >>> queried >>> with slices). >>> I'm thinking about how to implement the parameters extraction from >>> WellRecord >>> objects without the use of scipy. 
>>> >>> Here's the link to my branch: >>> https://github.com/mgalardini/biopython/tree/phenomics >>> The module and functions have been documented taking inspiration >>> from the >>> other modules: hope they are clear enough for you to try it out. >>> Some example files can be found in Tests/Phenomics. >>> >>> Marco >> >> Hi Marco, >> >> I've not worked with kind of data so my comments are not on >> the application specifics. But I'm pleased to see unit tests :) >> >> One thought was while you define (Java like?) getRow and getColumn >> methods, your __getitem__ does not support (NumPy like) access, >> which is something we do for multiple sequence alignments. I guess >> while most plates are laid out in a grid, the row/column for each >> sample is not the most important thing - the sample identifier is? >> >> Thinking out loud, would properties `rows` and `columns` etc be >> nicer than `getRow` and `getColumn`, supporting iteration over >> the rows/columns/etc and indexing? > > Yeah, absolutely: I'll work on some changes to have a more > straightforward way to select multiple WellRecords on row/column basis. > >> >> Minor: Your longer function docstrings do not follow PEP257, >> specifically starting with a one line summary, then a blank line, >> then the details. Also you are using triple single-quotes, rather >> than triple double-quotes (like the rest of Biopthon). >> http://legacy.python.org/dev/peps/pep-0257/ > > Whoops, I'll change it, thanks > >> >> Peter >> >> P.S. Also, I'm not very keen on the module name, phenomics - >> I wonder if it would earn Biopython a badomics award? ;) >> http://dx.doi.org/10.1186/2047-217X-1-6 > > That's meta-omics right? :p > What about 'Phenotype' then? Maybe it's too general, but future > extensions may include other phenotypic readouts. 
> > Marco >> _______________________________________________ >> Biopython-dev mailing list >> Biopython-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biopython-dev >> > > > ----- Fine del messaggio da p.j.a.cock at googlemail.com ----- > > > > Marco Galardini > Postdoctoral Fellow > EMBL-EBI - European Bioinformatics Institute > Wellcome Trust Genome Campus > Hinxton, Cambridge CB10 1SD, UK > Phone: +44 (0)1223 49 2547 > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- ------------------------------------------------- Marco Galardini, PhD Dipartimento di Biologia Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI) e-mail: marco.galardini at unifi.it www: http://www.unifi.it/dblage/CMpro-v-p-51.html phone: +39 055 4574737 mobile: +39 340 2808041 -------------------------------------------------
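The AlignIO-style indexing Marco describes (well identifiers for single wells; integers and slices for rows and sub-plates) might look roughly like the sketch below. The class and its internals are simplified stand-ins for the real PlateRecord, not the committed code:

```python
class PlateRecord:
    """Sketch of AlignIO-style plate indexing; not the actual branch code.

    Wells are stored by identifier such as "A01" (row letter + zero-padded
    column number); integer and slice indexing derive rows from those
    identifiers, returning sub-plates.
    """

    ROWS = "ABCDEFGH"  # a standard 96-well plate has rows A-H

    def __init__(self, wells):
        self._wells = dict(wells)  # e.g. {"A01": 0.5, "B03": 1.2, ...}

    def __getitem__(self, index):
        if isinstance(index, str):
            # Well identifier: return the single well's value.
            return self._wells[index]
        if isinstance(index, tuple):
            # (row, column) pair of ints: rebuild the identifier.
            row, col = index
            return self._wells["%s%02d" % (self.ROWS[row], col + 1)]
        if isinstance(index, (int, slice)):
            # Row selection: a sub-plate containing the chosen rows.
            rows = self.ROWS[index]
            return PlateRecord(
                {wid: v for wid, v in self._wells.items() if wid[0] in rows}
            )
        raise TypeError("unsupported index: %r" % (index,))

plate = PlateRecord({"A01": 0.5, "A02": 0.7, "B01": 1.1})
print(plate["A01"], plate[0, 1], len(plate[0:1]._wells))
```

Keeping getRows/getColumns alongside this would simply delegate to the same identifier-based selection, so both styles can coexist.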