From redmine at redmine.open-bio.org Wed Feb 1 22:22:58 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 2 Feb 2012 03:22:58 +0000 Subject: [Biopython-dev] [Biopython - Bug #3320] (New) Bio.Phylo.PAML KeyError in codeml Message-ID: Issue #3320 has been reported by Timothee Flutre. ---------------------------------------- Bug #3320: Bio.Phylo.PAML KeyError in codeml https://redmine.open-bio.org/issues/3320 Author: Timothee Flutre Status: New Priority: Normal Assignee: Category: Target version: URL: I get the following error while using codeml in Bio.Phylo.PAML (same error for options "fix_rho" and "rho"):
Traceback (most recent call last):
  File "./PamlAnalysis.py", line 42, in main
    cml.read_ctl_file(genericCtlFile)
  File "/home/src/BIOPYTHON/lib/python/Bio/Phylo/PAML/codeml.py", line 133, in read_ctl_file
    raise KeyError, "Invalid option: %s" % option
KeyError: 'Invalid option: fix_rho'
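The traceback points at a whitelist check. The control-file reader only accepts option names that are keys of a dictionary of known options. A minimal sketch of that pattern follows. `KNOWN_OPTIONS` and `read_ctl_text` are hypothetical names, and the logic is a simplified assumption, not the actual Bio.Phylo.PAML.codeml code.

```python
# Hypothetical, simplified stand-in for a PAML control-file reader.
# It is NOT the real Bio.Phylo.PAML.codeml code. It only mimics the
# "validate every option against a whitelist" behaviour from the traceback.
KNOWN_OPTIONS = {
    "seqfile": None,
    "outfile": None,
    "model": None,
    # The two entries the bug report proposes adding:
    "fix_rho": None,
    "rho": None,
}


def read_ctl_text(text, known=KNOWN_OPTIONS):
    """Parse 'name = value' lines; '*' starts a comment, blank lines are skipped."""
    options = {}
    for raw_line in text.splitlines():
        line = raw_line.split("*", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        name, _, value = line.partition("=")
        name, value = name.strip(), value.strip()
        if name not in known:
            # Keys missing from the whitelist are rejected with a KeyError,
            # as "fix_rho" and "rho" were in the traceback above.
            raise KeyError("Invalid option: %s" % name)
        options[name] = value
    return options


ctl = """
seqfile = example.phy  * illustrative file name
fix_rho = 1            * illustrative value
rho = 0.
"""
print(read_ctl_text(ctl))  # accepted: every key is in KNOWN_OPTIONS
# read_ctl_text(ctl, known={"seqfile": None})  # raises KeyError for fix_rho
```

Dropping "fix_rho" from the `known` dictionary reproduces the KeyError. This is why adding the two keys is enough to make such a reader accept them.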
I resolved the problem by adding the following two lines to the file Bio/Phylo/PAML/codeml.py at lines 63-64:
                        "fix_rho": None,
                        "rho": None,
Such errors do not happen when using the example file "codeml.ctl" available with the "PAML":http://abacus.gene.ucl.ac.uk/software/paml.html archive (v4.4 or v4.5), as this file contains neither the "fix_rho" nor the "rho" option. However, these options are described in the PAML "documentation":http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf (see pp. 33-34). Moreover, these two options are present in the file Bio/Phylo/PAML/baseml.py at lines 49-50. Do I need to fork the biopython repository on GitHub and make the changes myself? ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From b.invergo at gmail.com Thu Feb 2 04:26:57 2012 From: b.invergo at gmail.com (Brandon Invergo) Date: Thu, 02 Feb 2012 10:26:57 +0100 Subject: [Biopython-dev] [Biopython - Bug #3320] (New) Bio.Phylo.PAML KeyError in codeml In-Reply-To: References: Message-ID: <1328174817.1038.1.camel@localhost.localdomain> I'll take care of this now... -brandon On Thu, 2012-02-02 at 03:22 +0000, redmine at redmine.open-bio.org wrote: > Issue #3320 has been reported by Timothee Flutre. > > ---------------------------------------- > Bug #3320: Bio.Phylo.PAML KeyError in codeml > https://redmine.open-bio.org/issues/3320 > > Author: Timothee Flutre > Status: New > Priority: Normal > Assignee: > Category: > Target version: > URL: > > > I get the following error while using codeml in Bio.Phylo.PAML (same error for options "fix_rho" and "rho"): > >
> Traceback (most recent call last):
>   File "./PamlAnalysis.py", line 42, in main
>     cml.read_ctl_file(genericCtlFile)
>   File "/home/src/BIOPYTHON/lib/python/Bio/Phylo/PAML/codeml.py", line 133, in read_ctl_file
>     raise KeyError, "Invalid option: %s" % option
> KeyError: 'Invalid option: fix_rho'
> 
> > I resolved the problem by adding the following two lines in the file Bio/Phylo/PAML/codeml.py at the lines 63-64: > >
>                         "fix_rho": None,
>                         "rho": None,
> 
> > Such errors do not happen when using the example file "codeml.ctl" available with the "PAML":http://abacus.gene.ucl.ac.uk/software/paml.html archive (v4.4 or v4.5) as this file does contain neither the option "fix_rho" nor "rho". But these options are present in PAML "documentation":http://abacus.gene.ucl.ac.uk/software/pamlDOC.pdf (see p.33-34). Moreover, these two options are present in the file Bio/Phylo/PAML/baseml.py at the lines 49-50. > > Do I need to fork the biopython repository on github and make the changes myself? > > > ---------------------------------------- > You have received this notification because this email was added to the New Issue Alert plugin > > From redmine at redmine.open-bio.org Thu Feb 2 05:02:31 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 2 Feb 2012 10:02:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3321] (New) Bio.Phylo.PAML.codeml fails to parse the omega tree (free-ratio model) Message-ID: Issue #3321 has been reported by Brandon Invergo. ---------------------------------------- Bug #3321: Bio.Phylo.PAML.codeml fails to parse the omega tree (free-ratio model) https://redmine.open-bio.org/issues/3321 Author: Brandon Invergo Status: New Priority: Normal Assignee: Brandon Invergo Category: Target version: URL: When using the free-ratio model of codeml, Bio.Phylo.PAML.codeml fails to parse the omega tree (the Newick tree following "w ratios as labels for TreeView:" in the codeml results file). ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Tue Feb 7 08:46:33 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 7 Feb 2012 13:46:33 +0000 Subject: [Biopython-dev] Fwd: [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: Sorry, didn't realise this was direct to me and not the mailing list. Would everyone be OK with git pull requests coming straight here? Anyway - Eric, would this be something you could look at? Peter ---------- Forwarded message ---------- From: benreynwar Date: Tue, Jan 24, 2012 at 11:12 PM Subject: [biopython] Added transform and copy method to Entity in biopython.PDB (#25) To: Peter Cock Minor changes to biopython.PDB to allow transforms to be applied to entities and copies of entities to be made. Also fixed a bug in the transform method of Atom. Tests are included. The changes are from a few months ago but they still merge cleanly into the current master. You can merge this Pull Request by running: git pull https://github.com/benreynwar/biopython master Or you can view, comment on it, or merge it online at: https://github.com/biopython/biopython/pull/25 -- Commit Summary -- * Added tranform method to Entity. * Add test for transform method. * Adding copy method for Entities * Added an insert method to Entity. This allows a child to be inserted into a specified position in the child_list which effects position in the PDB output. * Fixed bug in transform method of Atom (dot product order).
-- File Changes -- M Bio/PDB/Atom.py (14) M Bio/PDB/Entity.py (40) M Tests/test_PDB.py (70) -- Patch Links -- https://github.com/biopython/biopython/pull/25.patch https://github.com/biopython/biopython/pull/25.diff --- Reply to this email directly or view it on GitHub: https://github.com/biopython/biopython/pull/25 From eric.talevich at gmail.com Tue Feb 7 10:36:28 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 7 Feb 2012 10:36:28 -0500 Subject: [Biopython-dev] [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: On Tue, Feb 7, 2012 at 8:46 AM, Peter Cock wrote: > Sorry, didn't realise this was direct to me and not the mailing list. > Would everyone be OK with git pull requests coming straight here? > > Anyway - Eric, would this be something you could look at? > > Peter > At a glance, the code looks cool to me, though I faintly recall some overlap with João's unmerged work (copy method). I'll try to find time to test it, but would not be upset if someone else got to it first. -E > ---------- Forwarded message ---------- > From: benreynwar > < > reply+i-2958669-14408e039dee774169d6f09c683146c3f42dd0b9-63959 at reply.github.com > > > Date: Tue, Jan 24, 2012 at 11:12 PM > Subject: [biopython] Added transform and copy method to Entity in > biopython.PDB (#25) > To: Peter Cock > > > Minor changes to biopython.PDB to allow transforms to be applied to > entities and copies of entities to be made. Also fixed a bug in the > transform method of Atom. Tests are included. > The changes are from a few months ago but they still merge cleanly > into the current master. > > You can merge this Pull Request by running: > > git pull https://github.com/benreynwar/biopython master > > Or you can view, comment on it, or merge it online at: > > https://github.com/biopython/biopython/pull/25 > > -- Commit Summary -- > > * Added tranform method to Entity. > * Add test for transform method.
> * Adding copy method for Entities > * Added an insert method to Entity. This allows a child to be > inserted into a specified position in the child_list which effects > position in the PDB output. > * Fixed bug in transform method of Atom (dot product order). > > -- File Changes -- > > M Bio/PDB/Atom.py (14) > M Bio/PDB/Entity.py (40) > M Tests/test_PDB.py (70) > > -- Patch Links -- > > https://github.com/biopython/biopython/pull/25.patch > https://github.com/biopython/biopython/pull/25.diff > > --- > Reply to this email directly or view it on GitHub: > https://github.com/biopython/biopython/pull/25 > From redmine at redmine.open-bio.org Fri Feb 10 03:55:30 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Fri, 10 Feb 2012 08:55:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #3323] (New) Bio.Phylo.draw should accept axes as optional argument Message-ID: Issue #3323 has been reported by Fabio Zanini. ---------------------------------------- Bug #3323: Bio.Phylo.draw should accept axes as optional argument https://redmine.open-bio.org/issues/3323 Author: Fabio Zanini Status: New Priority: Normal Assignee: Category: Target version: URL: Some months ago, the draw function was added to Bio.Phylo. Although that function works, it always opens a new figure. I have slightly modified it to accept an additional optional argument, so that a predefined axes object can be used to plot the phylogram into, if needed. This makes it much easier to embed a phylogram in a grid of subplots. The file with the new function is attached. If you prefer, I can git push it somewhere. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From eric.talevich at gmail.com Fri Feb 10 21:55:06 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 10 Feb 2012 21:55:06 -0500 Subject: [Biopython-dev] [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: On Tue, Feb 7, 2012 at 10:36 AM, Eric Talevich wrote: > On Tue, Feb 7, 2012 at 8:46 AM, Peter Cock wrote: > >> Sorry, didn't realise this was direct to me and not the mailing list. >> Would everyone be OK with git pull requests coming straight here? >> >> Anyway - Eric, would this be something you could look at? >> >> Peter >> > > At a glance, the code looks cool to me, though I faintly recall some > overlap with João's unmerged work (copy method). I'll try to find time to > test it, but would not be upset if someone else got to it first. > -E > > (Not sure who gets follow-up e-mails from Github, if anyone.) Could someone else confirm whether the last patch is correct? It switches the order of the dot product arguments in Atom.transform(). https://github.com/benreynwar/biopython/commit/346df8f2006735129a93508a04c4cdf6acb99a5f Code:
 def transform(self, rot, tran):
     """
     Apply rotation and translation to the atomic coordinates.
     Example:
         >>> rotation=rotmat(pi, Vector(1,0,0))
         >>> translation=array((0,0,1), 'f')
         >>> atom.transform(rotation, translation)
     @param rot: A right multiplying rotation matrix
     @type rot: 3x3 Numeric array
     @param tran: the translation vector
     @type tran: size 3 Numeric array
     """
-    self.coord=numpy.dot(self.coord, rot)+tran
+    self.coord=numpy.dot(rot, self.coord)+tran
This will break every script that uses the transform() method if we apply it. It also breaks the unit test, of course, but I can change the unit test to match if we accept this patch. It seems to me that which way is right is a matter of how the user specifies the input.
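The effect of the argument order can be checked with a small NumPy sketch. It uses a plain rotation about the z axis, an illustrative assumption, rather than Bio.PDB's rotmat. With coordinates stored as a 1-D array, `numpy.dot(coord, rot)` treats `rot` as a right-multiplying matrix and equals `numpy.dot(rot.T, coord)`. The two orderings in the diff therefore differ by a transpose of the rotation matrix.

```python
import numpy as np


def rotation_about_z(theta):
    """Rotation about z in the usual column-vector (left-multiplying) form."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


coord = np.array([1.0, 0.0, 0.0])  # one atom on the x axis
tran = np.array([0.0, 0.0, 1.0])
rot = rotation_about_z(np.pi / 2)

# Current Atom.transform order: rot acts as a right-multiplying matrix.
right = np.dot(coord, rot) + tran
# Order proposed in the patch: rot acts as a left-multiplying matrix.
left = np.dot(rot, coord) + tran

# The two conventions agree only after transposing the rotation matrix.
assert np.allclose(right, np.dot(rot.T, coord) + tran)
print(right, left)
```

For a 90 degree rotation, the two orderings send the x axis in opposite directions (-y versus +y). Applying the patch would therefore change the output of existing scripts that already pass a right-multiplying matrix.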
I'm not a thinking man, so I don't entirely trust my judgment on this one. Thanks, Eric > >> ---------- Forwarded message ---------- >> From: benreynwar >> < >> reply+i-2958669-14408e039dee774169d6f09c683146c3f42dd0b9-63959 at reply.github.com >> > >> Date: Tue, Jan 24, 2012 at 11:12 PM >> Subject: [biopython] Added transform and copy method to Entity in >> biopython.PDB (#25) >> To: Peter Cock >> >> >> Minor changes to biopython.PDB to allow transforms to be applied to >> entities and copies of entities to be made. Also fixed a bug in the >> transform method of Atom. Tests are included. >> The changes are from a few months ago but they still merge cleanly >> into the current master. >> >> You can merge this Pull Request by running: >> >> git pull https://github.com/benreynwar/biopython master >> >> Or you can view, comment on it, or merge it online at: >> >> https://github.com/biopython/biopython/pull/25 >> >> -- Commit Summary -- >> >> * Added tranform method to Entity. >> * Add test for transform method. >> * Adding copy method for Entities >> * Added an insert method to Entity. This allows a child to be >> inserted into a specified position in the child_list which effects >> position in the PDB output. >> * Fixed bug in transform method of Atom (dot product order). 
>> >> -- File Changes -- >> >> M Bio/PDB/Atom.py (14) >> M Bio/PDB/Entity.py (40) >> M Tests/test_PDB.py (70) >> >> -- Patch Links -- >> >> https://github.com/biopython/biopython/pull/25.patch >> https://github.com/biopython/biopython/pull/25.diff >> >> --- >> Reply to this email directly or view it on GitHub: >> https://github.com/biopython/biopython/pull/25 >> > > From redmine at redmine.open-bio.org Sun Feb 12 10:37:25 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 12 Feb 2012 15:37:25 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] (New) MultipleSeqAlignment should support iterators, not only slice objects Message-ID: Issue #3326 has been reported by Fabio Zanini. ---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy:
 # Check whether the index is an iterator
 if hasattr(index, 'next') and hasattr(index, '__iter__'):
     return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet)
Do you think this is useful? ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Sun Feb 12 15:20:01 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 12 Feb 2012 20:20:01 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Peter Cock. Could you give a usage example or two, combining itertools with the alignment (after this change)? I don't really understand the aim here. ---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy: # Check whether the index is an iterator if (hasattr(index, 'next')) and (hasattr(index:, '__iter__')): return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet) Would you think this is useful? -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Mon Feb 13 03:24:02 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 13 Feb 2012 08:24:02 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Fabio Zanini. Sure, here are the examples: one using a plain iter object, one using itertools. # to get a subalignment with only the rows at indices 1, 7 and 8, you could write:
 iterator = iter([1,7,8])
 alignment[iterator]
# to get a subalignment with only those indices from a list index_list that pass a filter function index_filter, i.e. those for which index_filter(index_list[i]) is true:
 from itertools import ifilter
 iterator = ifilter(index_filter, index_list)
 alignment[iterator]
The trivial example from the itertools website on this is the following:
 ifilter(lambda x: x%2, range(10)) --> 1 3 5 7 9
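The proposal can be prototyped on a toy container. The sketch below is hypothetical. `ToyAlignment` is not a Biopython class, it stores rows as plain strings, and it uses `collections.abc.Iterable` instead of the `next`/`__iter__` attribute check from the request (Python 3 iterators define `__next__`, not `next`). Tuples are deliberately excluded, because row/column indexing such as `alignment[2:4, 3:6]` arrives as a tuple.

```python
from collections.abc import Iterable


class ToyAlignment:
    """Hypothetical stand-in for an alignment: a list of equal-length rows."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __len__(self):
        return len(self._rows)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self._rows[index]  # a single row
        if isinstance(index, slice):
            return ToyAlignment(self._rows[index])  # existing slice behaviour
        if isinstance(index, Iterable) and not isinstance(index, (str, tuple)):
            # Proposed extension: any iterable of row indices (a list,
            # iter([...]) or a filter object). An iterator is consumed here.
            return ToyAlignment(self._rows[i] for i in index)
        raise TypeError("Unsupported index: %r" % (index,))


aln = ToyAlignment(["ACGT", "ACGA", "ACGC", "TCGT"])
picked = aln[iter([0, 2])]  # rows 0 and 2
# In Python 3 the built-in filter() plays the role of itertools.ifilter.
odd_rows = aln[filter(lambda i: i % 2, range(len(aln)))]
print(picked._rows, odd_rows._rows)
```

Accepting any non-tuple iterable also makes plain lists such as `aln[[0, 2]]` work. That is a design choice that could be discussed separately from iterator support.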
---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy: # Check whether the index is an iterator if (hasattr(index, 'next')) and (hasattr(index:, '__iter__')): return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet) Would you think this is useful? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From thamelry at binf.ku.dk Mon Feb 13 06:57:41 2012 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Mon, 13 Feb 2012 12:57:41 +0100 Subject: [Biopython-dev] [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 3:55 AM, Eric Talevich wrote: > - self.coord=numpy.dot(self.coord, rot)+tran > + self.coord=numpy.dot(rot, self.coord)+tran > > > This will break every script that uses the transform() method if we apply > it. It also breaks the unit test, of course, but I can change the unit test > to match if we accept this patch. > > It seems to me that which way is right is a matter of how the user > specifies the input. 
I'm not a thinking man, so I don't entirely trust my > judgment on this one. > Indeed. This is not a bug, the method simply assumes a right-multiplying matrix. Changing this will break many scripts for No Good Reason (TM). Cheers, -Thomas -- Thomas Hamelryck Assoc. Prof., University of Copenhagen, Denmark Visiting Prof., University of Leeds, UK Group leader Structural Bioinformatics Bioinformatics center, Department of Biology University of Copenhagen Ole Maaloes Vej 5 DK-2200 Copenhagen N Denmark From redmine at redmine.open-bio.org Thu Feb 16 05:11:55 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 16 Feb 2012 10:11:55 +0000 Subject: [Biopython-dev] [Biopython - Bug #3327] (New) HMMparse.py has some difficulty in loading hmmsearch result file Message-ID: Issue #3327 has been reported by ruan zheng. ---------------------------------------- Bug #3327: HMMparse.py has some difficulty in loading hmmsearch result file https://redmine.open-bio.org/issues/3327 Author: ruan zheng Status: New Priority: Normal Assignee: Category: Target version: URL: Hi, I just downloaded the HMMparse.py file to process my hmmsearch results, but I ran into a problem using it to load my data. When I import HMMparse in a Python session and call HMMparse.HMMparser('hmmsearch_result'), it reports the error "invalid literal for int() with base 10:". I tried to locate the error and found that lines 60 and 61 of HMMparse.py do not handle the possible value '[]'. After adding s != '[]' to the argument list, it works fine for me. I have attached my sample file. Ruan Zheng ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Fri Feb 17 17:40:29 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 22:40:29 +0000 Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utilities Update In-Reply-To: References: Message-ID: Hi all, Just FYI, the following was also changed in this week's Entrez update to EFetch 2.0 (see forwarded email below). This was breaking some Biopython scripts - depending on how they passed in the id parameters. It turns out we relied on the undocumented and now withdrawn form in one of our examples, so some users had copied this style. Biopython 1.59 will solve this. I know BioJava is looking at the more publicised changes to retmode - I don't know if BioPerl or BioRuby was affected. Regards, Peter ---------- Forwarded message ---------- From: Date: Fri, Feb 17, 2012 at 7:09 PM Subject: [Utilities-announce] NCBI E-Utilities Update To: NLM/NCBI List utilities-announce The most recent NCBI E-Utilities update includes a more stringent check for correct URL parameters. EFetch URLs with multiple IDs must be entered as: id=1,2,3 EFetch no longer accepts invalid URL parameters, e.g., id=1&id=2&id=3 Please see the online E-Utilities help for additional information: http://www.ncbi.nlm.nih.gov/books/NBK25500/ EFetch online help: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Thank you. _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From p.j.a.cock at googlemail.com Sat Feb 18 04:15:31 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 09:15:31 +0000 Subject: [Biopython-dev] Biopython 1.59 plans Message-ID: Hello all, Based on the typical release schedule, we're a little overdue for releasing Biopython 1.59 - I would have raised this earlier but January was busy for me. 
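The difference between the accepted and rejected EFetch id styles in the NCBI notice can be shown with the standard library's `urllib.parse.urlencode`. No request is sent. The helper `efetch_url`, its defaults, and the base URL constant are illustrative assumptions, not part of Bio.Entrez. Passing `safe=","` keeps the commas literal, so the query contains `id=1,2,3`.

```python
from urllib.parse import urlencode

# Base URL of the EFetch utility (an assumption for illustration; check the
# NCBI E-utilities documentation for the current endpoint).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Build an EFetch URL with all IDs in one comma-separated id parameter."""
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),  # accepted form: id=1,2,3
        "rettype": rettype,
        "retmode": retmode,
    }
    return EFETCH_URL + "?" + urlencode(params, safe=",")


# The form NCBI no longer accepts: one id=... pair per identifier.
rejected_query = urlencode({"id": [1, 2, 3]}, doseq=True)  # 'id=1&id=2&id=3'

print(efetch_url("nucleotide", [1, 2, 3]))
```

Joining the identifiers before encoding means a single `id` key is emitted no matter how many records are requested.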
With the recent NCBI EFetch change, and the workaround for it, it would be especially good to get the release out soon. I propose we release Biopython 1.59 in the second half of next week - essentially the master branch as it is. Most of the unit tests are also passing under PyPy (bar the C extensions, and external dependencies like NumPy) with the exception of some XML issues with the standard library. If we mark these as known failures and include them in the buildbot before then, we can announce this release as having (partial) PyPy support. Does anyone else want to do the release? If not, I can. Any comments on this? I'll start a new thread for plans for Biopython 1.60 - there are several exciting chunks of new code that look nearly ready for release. Peter From p.j.a.cock at googlemail.com Sat Feb 18 04:39:06 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 09:39:06 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond Message-ID: Hi all, Assuming we ship Biopython 1.59 soon (Feb 2012), we should start thinking about what is ready to merge to the trunk afterwards to be included in Biopython 1.60, and what else is being worked on beyond that. This might also help with GSoC project ideas. http://lists.open-bio.org/pipermail/biopython-dev/2012-February/009383.html ------------------------------------------- Here are some things that I think are strong candidates for 1.60 (not an exclusive list!) MAF support (the multiple alignment format) for AlignIO, including bespoke indexing (MAF specific). BGZF support: Low level module like Python's gzip, support in SeqIO for indexing BGZF compressed files, and probably also indexing large MAF files too (I've had some positive discussions off list about this). Brad's GFF code. People are using this already, so it probably is ready for inclusion (even if we do need some fine tuning for SeqIO integration). Official releases for Python 3 (even if we do call it a beta status release).
Maybe we can even do this with Biopython 1.59? Most things are working (with the exception of some C code and missing third party dependencies), and my concerns about the memory overhead of unicode strings should be resolved with Python 3.3 (the parsing speed overhead perhaps not). -------------------------------------------- Other work at various stages: Ontologies, GO and OBO - several people are looking at this stuff but is anything "ready" yet? I can't see Chris Lasher's repository on github anymore. http://lists.open-bio.org/pipermail/biopython/2011-December/007682.html VCF format? Variant Call Format - Tiago what's your impression of work in this area? I know there are other things but I'm struggling to recall them right now. If I've overlooked your work it isn't malice but forgetfulness - please reply with a status update. So, what other cool things are you all working on, and in particular what is ready or near-ready for inclusion with Biopython this year? Thanks, Peter From tiagoantao at gmail.com Sat Feb 18 05:53:37 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 18 Feb 2012 10:53:37 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: Hello, On Sat, Feb 18, 2012 at 9:39 AM, Peter Cock wrote: > So, what other cool things are you all working on, > and in particular what is ready or near-ready for > inclusion with Biopython this year? I changed jobs 3 months ago, and that has meant that I have been in a hell-hole of overwork for the last 3 months. A hell-hole that I have now crawled out of (with lots of new code written). I am now doing (more standard?) human evolutionary genetics. My previous experience with donating code to Biopython has left me with mixed feelings: people use the applications a lot but very rarely the code directly (a cursory look at the citations of the applications vs the citations of Bio.PopGen clearly shows that).
I have now written a LOT of code in slightly different areas; these might (or might not) interest people:
1. Phasing/imputation: code to parse/convert between Beagle, Shapeit, phase, impute2. Typically to analyse SNP chips of human data (say between 0.5 and 1.5 million SNPs per individual, 1000s of individuals).
2. plink: code to parse plink output files (quite trivial). Do people use plink a lot?
3. GO code: I would REALLY like to start a discussion on what a proper GO approach should be. In my case I am doing gene enrichment analysis. I might start a thread or a blog post on this...
4. Code to do multi-tasking. Actually Bio.PopGen has a scheduler to run multiple (external) tasks at the same time, but I have written a new one. Maybe the code does not belong in Biopython, but we could have a discussion around such an issue (I suppose people analysing lots of data have been having that problem, not just me?).
5. Some Ensembl variation code: things like getting the ancestral SNP (versus the derived one) or getting all the stuff (genes mostly) in a certain window of the genome.
On a side note, I still have to do a 64 bit Windows port (remember?). This will have to be done from home, as my current work computer is a Pentium 4 (not exactly a modern 64 bit machine ;) ) Another issue that has been crossing my mind concerns the inclusion of new code: in my case I would really like to have something like a "beta" version of the API, i.e. releasing something that is deemed "unstable" API-wise (to get comments from the community) and then stabilizing it. Concretely: in the first/second version people should expect the API to be unstable and to change. Another side issue that I would like to discuss (maybe in a different thread) is how people are coping with large amounts of data using Python (or Perl/Ruby for that matter), specifically the problem of performance.
As I see it, we increasingly depend on external (fast) programs, C library extensions or Java extensions to do the bulk of the work. Inner loops in Python simply do not cut it for speed. In the near future (this year) I will probably also be working with sequence data (BAM and VCF stuff might resurface). All for now, T -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From mjldehoon at yahoo.com Sat Feb 18 06:52:38 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 18 Feb 2012 03:52:38 -0800 (PST) Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: Message-ID: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> --- On Sat, 2/18/12, Tiago Antão wrote: > Another side issue that I would like to discuss (maybe a > different > thread): Is how people are coping with large amounts of data > using > Python (or Perl/Ruby for that matter)? Specifically the > problem of > performance. As I see it, there is more and more the case of depending > on external (fast) programs or CLib extensions or Java > extensions to > do the bulk of the work. Inner-loops in Python simply do not > cut for speed. C extensions to Python such as pysam, together with outer loops in Python/Biopython, have been working very well for me. Perhaps at some point pysam can be included in Biopython, but as samtools is still evolving it makes sense for it to be a separate package so that it can be updated more quickly. I am more concerned about relying on external programs, in particular R. Notwithstanding the usefulness of rpy and rpy2, I would prefer to have a pure-Python or Python-with-C-extension solution, ideally as part of Biopython. -Michiel.
From mjldehoon at yahoo.com Sat Feb 18 06:58:12 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 18 Feb 2012 03:58:12 -0800 (PST) Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: Message-ID: <1329566292.98292.YahooMailClassic@web161202.mail.bf1.yahoo.com> --- On Sat, 2/18/12, Peter Cock wrote: > So, what other cool things are you all working on, > and in particular what is ready or near-ready for > inclusion with Biopython this year? > I have written some scripts for microarray data analysis, including both file parsing and data normalization. This will need some discussion on biopython-dev to decide the appropriate data structures and functionality, so it won't be ready for 1.59, but perhaps it will be for 1.60. -Michiel. From eric.talevich at gmail.com Sat Feb 18 11:34:18 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:34:18 -0500 Subject: [Biopython-dev] Bio.Phylo bugs & pain points Message-ID: Folks, Since we're coming up on another release of Biopython, I'd like to identify any remaining bugs, pain points, aesthetic flaws, and minor missing features in Bio.Phylo. (And hopefully, fix them before the release.) In particular, the Phylo.draw() function, which plots a rooted phylogram with matplotlib, appeared in the last Biopython release unannounced. There are already many tree-drawing programs that produce beautiful publication-quality graphics, and we're not trying to compete with those. But we do want it to be useful for quickly visualizing a tree as you develop a script or modify a tree interactively in IPython, for example. So -- do the trees drawn by Phylo.draw() look right? 
Thanks, Eric From mjldehoon at yahoo.com Sat Feb 18 11:46:33 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 18 Feb 2012 08:46:33 -0800 (PST) Subject: [Biopython-dev] [Biopython] Bio.Phylo bugs & pain points In-Reply-To: Message-ID: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> Hi Eric, > But we do want it to be useful for quickly visualizing a > tree as you develop a script or modify a tree > interactively in IPython, for example. Do you need IPython or is regular Python sufficient? -Michiel. --- On Sat, 2/18/12, Eric Talevich wrote: > From: Eric Talevich > Subject: [Biopython] Bio.Phylo bugs & pain points > To: "BioPython Mailing List" , "BioPython-Dev Mailing List" > Date: Saturday, February 18, 2012, 11:34 AM > Folks, > > Since we're coming up on another release of Biopython, I'd > like to identify > any remaing bugs, pain points, aesthetic flaws, and minor > missing features > in Bio.Phylo. (And hopefully, fix them before the release.) > > In particular, the Phylo.draw() function, which plots a > rooted phylogram > with matplotlib, appeared in the last Biopython release > unannounced. There > are already many tree-drawing programs that produce > beautiful > publication-quality graphics, and we're not trying to > compete with those. > But we do want it to be useful for quickly visualizing a > tree as you > develop a script or modify a tree interactively in IPython, > for example. So > -- do the trees drawn by Phylo.draw() look right? > > Thanks, > Eric > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From eric.talevich at gmail.com Sat Feb 18 11:52:16 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:52:16 -0500 Subject: [Biopython-dev] Have you used Bio.Phylo in a published study? 
Message-ID: Folks, Now that Bio.Phylo has reached a somewhat stable point, we're preparing a journal article on it. I'd like to mention and cite some published studies in which Bio.Phylo was used for some part of the analysis. Has anyone here published a study that relied on the Phylo module of Biopython? I know of two so far: http://www.biology-direct.com/content/6/1/34/ http://www.biomedcentral.com/1471-2148/11/321 Thanks, Eric From eric.talevich at gmail.com Sat Feb 18 11:54:03 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:54:03 -0500 Subject: [Biopython-dev] [Biopython] Bio.Phylo bugs & pain points In-Reply-To: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> References: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> Message-ID: On Sat, Feb 18, 2012 at 11:46 AM, Michiel de Hoon wrote: > Hi Eric, > > > But we do want it to be useful for quickly visualizing a > > tree as you develop a script or modify a tree > > interactively in IPython, for example. > > Do you need IPython or is regular Python sufficient? > > -Michiel. > Regular Python plus matplotlib is sufficient. IPython has convenient integration with pylab, that's all. -E > > --- On Sat, 2/18/12, Eric Talevich wrote: > > > From: Eric Talevich > > Subject: [Biopython] Bio.Phylo bugs & pain points > > To: "BioPython Mailing List" , > "BioPython-Dev Mailing List" > > Date: Saturday, February 18, 2012, 11:34 AM > > Folks, > > > > Since we're coming up on another release of Biopython, I'd > > like to identify > > any remaing bugs, pain points, aesthetic flaws, and minor > > missing features > > in Bio.Phylo. (And hopefully, fix them before the release.) > > > > In particular, the Phylo.draw() function, which plots a > > rooted phylogram > > with matplotlib, appeared in the last Biopython release > > unannounced. 
There > > are already many tree-drawing programs that produce > > beautiful > > publication-quality graphics, and we're not trying to > > compete with those. > > But we do want it to be useful for quickly visualizing a > > tree as you > > develop a script or modify a tree interactively in IPython, > > for example. So > > -- do the trees drawn by Phylo.draw() look right? > > > > Thanks, > > Eric > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > From eric.talevich at gmail.com Sat Feb 18 12:11:27 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 12:11:27 -0500 Subject: [Biopython-dev] Bio.Phylo bugs & pain points In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 11:34 AM, Eric Talevich wrote: > So -- do the trees drawn by Phylo.draw() look right? > > Here's how to get a quick tree, using a test file from the Biopython source distribution:
>>> from Bio import Phylo
>>> tree = Phylo.read("Tests/PhyloXML/apaf.xml", "phyloxml")
>>> Phylo.draw(tree)
If you don't have the Tests/ directory, you can use any other Newick, Nexus or PhyloXML tree; just change the file name and format name in the call to Phylo.read(). Thanks, Eric From p.j.a.cock at googlemail.com Sat Feb 18 13:14:34 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 18:14:34 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> References: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> Message-ID: On Sat, Feb 18, 2012 at 11:52 AM, Michiel de Hoon wrote: > --- On Sat, 2/18/12, Tiago Antão wrote: >> Another side issue that I would like to discuss (maybe a >> different thread): Is how people are coping with large >> amounts of data using Python (or Perl/Ruby for that >> matter)? Specifically the problem of performance. 
As >> I see it, there is more and more the case of depending >> on external (fast) programs or CLib extensions or Java >> extensions to do the bulk of the work. Inner-loops in >> Python simply do not cut for speed. > > C extensions to Python such as pysam, together with > outer-loops in Python/Biopython have been working > very well for me. Perhaps at some point pysam can > be included into Biopython, but as samtools is still > evolving it makes sense for it to be a separate package > so that it can be updated more quickly. I've got some partial SAM/BAM code in pure Python, partly as a learning exercise for the format itself and the issues around it. > I am more concerned about relying on external programs, > in particular R. Notwithstanding the usefulness of rpy and > rpy2, I would prefer to have a pure-Python or > Python-with-C-extension solution, ideally as part of > Biopython. Python with C extensions (e.g. via CPython?) certainly has its role to play, and the payback is that it should be much faster than calling separate binaries and parsing their output. However, pure Python is also getting a lot more interesting as PyPy gets better and better. Peter From p.j.a.cock at googlemail.com Sat Feb 18 13:22:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 18:22:07 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Tiago Antão : > 4. Code to do multi-tasking. Actually Bio.PopGen has a scheduler to do > multiple (external) tasks at the same time, but I have written a new > one. Maybe the code does not belong into biopython, but a discussion > could be done around such a issue (I suppose people doing analysis of > lots of data have been having that problem, not just me?). I hear good things about Python's multiprocessing. I don't really see that this is a Biopython issue per se, but a much more general one for high performance computing in Python. 
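For the multi-tasking question, here is a minimal sketch using only the standard library (assuming Python 3.7+ for the capture_output argument of subprocess.run): a thread pool runs several external commands at once and parses each one's output. A tiny Python child process stands in for a real external tool, and run_job/run_all are illustrative names, not an existing Biopython or Bio.PopGen API.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(n):
    """Run one external job and parse its output.

    A small Python child process stands in for a real external program.
    """
    cmd = [sys.executable, "-c", f"print({n} * {n})"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def run_all(inputs, max_workers=4):
    """Run jobs concurrently, at most max_workers external programs at once.

    Threads are enough here: each thread just waits on its child process,
    so the real work happens outside the Python interpreter.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in input order, not completion order.
        return list(pool.map(run_job, inputs))


if __name__ == "__main__":
    print(run_all(range(5)))
```

Threads suffice because the heavy lifting happens in the child processes; for CPU-bound work written in Python itself, multiprocessing.Pool would be the analogous choice.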
We should probably focus on providing some examples of how to do this with the best standard/external libraries. Right now I'm writing some simple job splitting/merging code in Python for Galaxy jobs - we've got an in-house server now hooked up to our cluster, which will be the primary way for non-programmers in the institute to run large jobs (e.g. pre-prepared pipelines for protein annotation). > 5. Some ensembl variation code: things like getting the ancestral > SNP (versus the derived) or getting all the stuff (genes mostly) in > a certain window position of the genome. Sounds good - I've not had much cause to look at Ensembl through work, but there would be interest in using this from Python. > On a side, I still have to do a 64 bit Windows port (remember?). Oh yes - this issue is still being debated for NumPy/SciPy, where they have some quite complex build interdependencies to sort out. Waiting for an official 64bit Windows NumPy was one of our stumbling blocks. > Another issue that has been crossing my mind regards the inclusion of > new code: In my case I would really like to have something like a > "beta" version of the API: ie releasing something that is deemed > "unstable" API wise (to get comments from the community) and then > stabilize it. Concretly: in the first/second version people should > expect the API to not be stable and have changes. We've sort of done that already, in that we've said in the release notes that whole modules are new and experimental (beta), and subject to change. Are you thinking of more than that - e.g. an import time warning? 
Peter From tiagoantao at gmail.com Sat Feb 18 13:57:32 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 18 Feb 2012 18:57:32 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Peter Cock : > We've sort of done that already in that we've said in the release > notes that whole modules are new and experimental (beta), and > subject to change. Are you thinking of more than that - e.g. an > import time warning? I did not know that (I skipped a few things in the last few months, I am afraid). That is what I had in mind. -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Sat Feb 18 14:11:11 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 19:11:11 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Tiago Antão : > 2012/2/18 Peter Cock : >> We've sort of done that already in that we've said in the release >> notes that whole modules are new and experimental (beta), and >> subject to change. Are you thinking of more than that - e.g. an >> import time warning? > > Did not know that (skipped a few things in the last few months, I am > afraid). That is what I had in mind. Actually, looking back, the NEWS file doesn't really say this. Maybe I am thinking of some of the release announcements? Anyway - I am OK with a clear warning up front in the NEWS and accompanying release announcement email/post that a new module Bio.XXX is considered to be in beta-testing and that its API is subject to change. But I would hope this would mean the module was still complete enough and stable enough to be useful - and of course not going to impact anything else in Biopython. Do you think an import time warning on top of this would be prudent? Power users can silence it with the warnings library. 
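Here is a minimal sketch of what such an import-time warning could look like with the standard warnings module. ExperimentalModuleWarning and warn_experimental are hypothetical names, not an existing Biopython API; a beta module such as the Bio.XXX placeholder above would call the helper at the top of its __init__.py.

```python
import warnings


class ExperimentalModuleWarning(UserWarning):
    """Hypothetical warning category for modules whose API may still change."""


def warn_experimental(module_name):
    """Emit the warning; a beta module would call this once, at import time."""
    warnings.warn(
        "%s is experimental and its API may change in future releases."
        % module_name,
        ExperimentalModuleWarning,
        stacklevel=2,
    )


if __name__ == "__main__":
    # Normal users see the warning (printed to stderr).
    warn_experimental("Bio.XXX")

    # Power users can silence just this category with the warnings library.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", ExperimentalModuleWarning)
        warn_experimental("Bio.XXX")  # nothing is printed
```

Using a dedicated subclass rather than a plain UserWarning is what lets users filter out only the experimental-module notices while keeping other warnings visible.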
That seems like a good way to increase visibility of somewhat experimental code, while not giving false impressions of its stability. Peter From eric.talevich at gmail.com Sat Feb 18 14:20:41 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 14:20:41 -0500 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 4:39 AM, Peter Cock wrote: > Hi all, > > Assuming we ship Biopython 1.59 soon (Feb 2012), > we should start thinking about what is ready to merge to > the trunk afterwards to be included in Biopython 1.60, > and what else is being worked on beyond that. This > might also help with GSoC project ideas. > http://lists.open-bio.org/pipermail/biopython-dev/2012-February/009383.html > > My thoughts: 1. Lingering GSoC code: (a) João has some interesting features on his own fork that are not yet complete. I believe he's using this code for his own work and ironing out the details, and will let us know when it's ready to merge. (b) Mikael Trellet's Interface work is good, but not yet merged. Some dependencies on João's work, maybe? (c) Michele Silva wrote two modules under Bio.PDB that use mocapy++ to do stuff. It's based on Thomas Hamelryck's published work, with some modifications to improve Biopython integration. I think it could be merged cleanly at any time, under either Bio.PDB or the proposed Bio.Struct. (d) I squashed Nick Matzke's Bio.Geography branch into a single commit and put it on another branch on my public fork. It needs work, and some day I'll probably take care of that. 2. Bio.Phylo improvements: (a) ETE offered us a public-domain NeXML parser; "all" I have to do is copy it into our codebase and convert/wrap parts of it to fit Bio.Phylo's Tree/Clade object model. Or something along those lines. (b) I have a RAxML wrapper that's mostly complete. It will go in Bio.Phylo.Applications. 
(c) I'd like to add functions for majority-rules consensus and Robinson-Foulds tree distance. Then I'll consider it pretty much feature-complete, pending feature requests from users. 3. SeqIO read-only support for PDB files (https://redmine.open-bio.org/issues/3295). I've been using this code on my own. It fails to parse at least one PDB file I care about (3BEG); I haven't tried it on a larger set of PDB files. In any case this shouldn't be too hard to fix, and I'd like to see it in a stable Biopython release. > Official releases for Python 3 (even if we do call it > a beta status release). Maybe we can even do this > with Biopython 1.59? Most things are working (with > the exception of some C code and missing third > party dependencies), and my concerns about the > memory overhead of unicode strings should be > resolved with Python 3.3 (the parsing speed > overhead perhaps not). > > I sense that Python 3 users are aware there can be minor performance regressions relative to Python 2, and this applies to libraries too. (I could be mistaken.) So I'd be in favor of making Biopython 1.59 an official release for Python 3, but marked "beta" in the release notes since not all parts of Biopython are fully functional in Python 3. -Eric From tiagoantao at gmail.com Sat Feb 18 14:24:00 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 18 Feb 2012 19:24:00 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> References: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> Message-ID: On Sat, Feb 18, 2012 at 11:52 AM, Michiel de Hoon wrote: > I am more concerned about relying on external programs, in particular R. Notwithstanding the usefulness of rpy and rpy2, I would prefer to have a pure-Python or Python-with-C-extension solution, ideally as part of Biopython. 
The problem with the Python-with-C extensions is the Jython universe, which has a few users (I am pretty sure I am not the only one ;) ). In terms of discussion I would separate R (which can be linked via rpy2) from other external programs (executables called explicitly). I do not have a coherent view myself, only that: 1. Python is not computationally efficient for lots of stuff. 2. There are apps that we need to talk to. 3. Talking to external programs/apps/libraries will probably make the solution less than 100% portable. And: 1. External executables (not R) are the easiest dependencies to maintain, as long as they exist on Win+Mac+Lin (or are platform-independent Java). 2. C (or Java) libs are not portable (but sometimes unavoidable?). Note that external executables are portable: in a Jython implementation you can call an external program but not a CLib (and vice versa: in a CLib implementation you can also call an external program but not a Java lib). 3. R dependencies are extremely burdensome: they require management of R plus extra R packages. Outside of Biopython, I am on the verge of needing to write some efficient algorithms for my personal work, which I want to call from Python. I am currently leaning towards doing it in Java (or C as a second option) as a stand-alone executable (and then executing the external program from Python and parsing the results), rather than using the library (either Jython or CLib) approach. And yes, Java kicks Python in terms of speed. Big time. -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From tiagoantao at gmail.com Sat Feb 18 14:29:43 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 18 Feb 2012 19:29:43 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Peter Cock : > Do you think an import time warning on top of this would be prudent? Power users can silence it with the warnings library. 
> That seems like a good way to increase visibility of somewhat > experimental code, while not giving false impressions of its > stability. Oh surely. I never added complex code because I was afraid of breaking things in a second version. I think this is unavoidable: things never get really good on the first attempt - especially complex things. Therefore it would be good if we could warn users about initial versions. Warn as in in-their-face warnings. -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Sat Feb 18 14:54:06 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 19:54:06 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 7:20 PM, Eric Talevich wrote: > > 3. SeqIO read-only support for PDB files > (https://redmine.open-bio.org/issues/3295). I've been using this code on my > own. It fails to parse at least one PDB file I care about (3BEG); I haven't > tried it on a larger set of PDB files. In any case this shouldn't be too > hard to fix, and I'd like to see it in a stable Biopython release. If right now it has known failures, I don't want to squeeze this into Biopython 1.59 next week. Does your code manage to produce the same FASTA sequence as the PDB themselves offer for download? That would be my expectation as an end user. It should be easy enough to test if you've already done a full local PDB download. I'm still uneasy about this making SeqIO depend on NumPy (even as a soft dependency at runtime), given the fact that the rest of SeqIO should work fine under Jython and PyPy. Support for the NumPy API under PyPy is coming along, but isn't likely for Jython for now (although PyPy's efforts may help there). 
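One way to keep NumPy a soft dependency is to attempt the import once and fall back to plain tuples, along the lines of the tuple-of-floats idea raised later in this thread. This is a hypothetical sketch (make_coord and distance are illustrative names, not Bio.PDB functions); real Bio.PDB code relies on NumPy array arithmetic that tuples do not support, so a fallback like this would only cover simple code paths such as reading sequences.

```python
import math

try:
    import numpy as np
except ImportError:  # e.g. Jython, or a minimal install without NumPy
    np = None


def make_coord(x, y, z):
    """Return an atom coordinate as a NumPy array if available, else a tuple."""
    if np is not None:
        return np.array((x, y, z), dtype=float)
    return (float(x), float(y), float(z))


def distance(a, b):
    """Euclidean distance that works for both coordinate representations."""
    # Iterating element-wise avoids relying on NumPy-only vector arithmetic.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


if __name__ == "__main__":
    backend = "NumPy" if np is not None else "pure-Python tuples"
    print("Using", backend)
    print(distance(make_coord(0, 0, 0), make_coord(3, 4, 0)))
```

Code written against this sketch behaves the same with or without NumPy installed; only the speed and the concrete coordinate type differ.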
Peter From eric.talevich at gmail.com Sat Feb 18 17:17:24 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 17:17:24 -0500 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 2:54 PM, Peter Cock wrote: > On Sat, Feb 18, 2012 at 7:20 PM, Eric Talevich > wrote: > > > > 3. SeqIO read-only support for PDB files > > (https://redmine.open-bio.org/issues/3295). I've been using this code > on my > > own. It fails to parse at least one PDB file I care about (3BEG); I > haven't > > tried it on a larger set of PDB files. In any case this shouldn't be too > > hard to fix, and I'd like to see it in a stable Biopython release. > > > If right now it has known failures, I don't want to squeeze this into > Biopython 1.59 next week. > Agreed! But 1.60 sounds like a good goal. > Does your code manage to produce the same FASTA sequence as > the PDB themselves offer for download? That would be my expectation > as an end user. It should be easy enough to test if you've already > done a full local PDB download. > If there are disordered regions (very common), the missing residues are replaced with 'X' characters. These residues can be listed in the SEQRES lines of the PDB header, if it's available, but they're not included with the atomic coordinates, so PdbIO can't reliably fill in these disordered residues for all PDB files. This matches the behavior of the tool I was using before (which is non-free and not widely used). I don't keep a local copy of PDB normally, but I'll download it and do the test before asking to merge PdbIO. > I'm still uneasy about this making SeqIO depend on NumPy (even as > a soft dependency at runtime), given the fact that the rest of SeqIO > should work fine under Jython and PpPy. Support for the NumPy > API under PyPy is coming along, but isn't likely for Jython for now > (although PyPy's efforts may help there). 
> > As an alternative, I could copy the portions of PDBParser and StructureBuilder that are needed to read the amino acid sequence, but skip creating Atoms. That would avoid the need for NumPy, at the cost of some code duplication. Interested in that approach? If so, I can take a closer look and report back on the feasibility. -Eric From p.j.a.cock at googlemail.com Sat Feb 18 17:24:02 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 22:24:02 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 10:17 PM, Eric Talevich wrote: >> If right now it has known failures, I don't want to squeeze this into >> Biopython 1.59 next week. > > Agreed! But 1.60 sounds like a good goal. OK. >> Does your code manage to produce the same FASTA sequence as >> the PDB themselves offer for download? That would be my expectation >> as an end user. It should be easy enough to test if you've already >> done a full local PDB download. > > If there are disordered regions (very common), the missing residues are > replaced with 'X' characters. These residues can be listed in the SEQRES > lines of the PDB header, if it's available, but they're not included with > the atomic coordinates, so PdbIO can't reliably fill in these disordered > residues for all PDB files. This matches the behavior of the tool I was > using before (which is non-free and not widely used). > > I don't keep a local copy of PDB normally, but I'll download it and do the > test before asking to merge PdbIO. Great. >> I'm still uneasy about this making SeqIO depend on NumPy (even as >> a soft dependency at runtime), given the fact that the rest of SeqIO >> should work fine under Jython and PpPy. Support for the NumPy >> API under PyPy is coming along, but isn't likely for Jython for now >> (although PyPy's efforts may help there). 
> > As an alternative, I could copy the portion of PDBParser and > StructureBuilder that are needed to read the amino acid sequence, but skip > creating Atoms. That would avoid the need for Numpy, at the cost of some > code duplication. Interested in that approach? If so, I can take a closer > look and report back on the feasibility. Rather than literally copying it, do you think it is realistic to make some of Bio.PDB work without NumPy? e.g. fall back on tuples of floats (x,y,z) for atom co-ordinates. Just brainstorming - this might be a horrible idea? Peter From kellrott at gmail.com Sat Feb 18 20:19:09 2012 From: kellrott at gmail.com (Kyle) Date: Sat, 18 Feb 2012 17:19:09 -0800 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: > > Ontologies, GO and OBO - several people are looking > at this stuff but is anything "ready" yet? I can't see > Chris Lasher's repository on github anymore. > http://lists.open-bio.org/pipermail/biopython/2011-December/007682.html > I merged the ntamas branch (which I think comes from Chris Lasher's branch) into https://github.com/kellrott/biopython/tree/gosupport Originally his code used NetworkX to provide graph support. I added a class to provide that functionality (probably slower), to be used if importing NetworkX raises an ImportError. Kyle From anaryin at gmail.com Mon Feb 20 09:30:23 2012 From: anaryin at gmail.com (João Rodrigues) Date: Mon, 20 Feb 2012 15:30:23 +0100 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: Hi all, Answering what "concerns" me :) > > > If there are disordered regions (very common), the missing residues are > > replaced with 'X' characters. These residues can be listed in the SEQRES > > lines of the PDB header, if it's available, but they're not included with > > the atomic coordinates, so PdbIO can't reliably fill in these disordered > > residues for all PDB files. 
This matches the behavior of the tool I was > > using before (which is non-free and not widely used). > The SEQRES record contains the sequence of the construct that was expressed and crystallized, so it's never incomplete. What I've done in the past in these situations is iterate over the SEQRES and fill in '-' for those residues that do not have coordinates. I don't know if I have any decent version of my MODELLER PIR format SeqIO stuff on github, but maybe we could work together to make it consistent (since what I wanted was essentially PDB to seq)? Or maybe these are two different points of view on the same problem and need different solutions... https://github.com/JoaoRodrigues/biopython/tree/modeller-pirIO > Rather than literally copying it, do you think it is realistic to make > some of Bio.PDB work without NumPy? e.g. fall back on tuples > of floats (x,y,z) for atom co-ordinates. Just brainstorming - this > might be a horrible idea? > I kind of disagree, because otherwise we'd have to convert them to numpy arrays every time we need them. Regarding my own work, I've been slowly working on cleaning up Bio.PDB a bit (for example, all those get_X methods that just return class attributes) and organising my own GSoC code into it and into Bio.Struct. I don't know when I will have this even "alpha"-testable; it's been a long road and I had a couple of computer crashes that made me lose my data, so... When would the soft deadline for 1.60 be? Best, João From eric.talevich at gmail.com Mon Feb 20 11:20:33 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 20 Feb 2012 11:20:33 -0500 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: Hey João, On Mon, Feb 20, 2012 at 9:30 AM, João Rodrigues wrote: > Hi all, > > Answering what "concerns" me :) > > > >> > If there are disordered regions (very common), the missing residues are >> > replaced with 'X' characters. 
>> > These residues can be listed in the SEQRES >> > lines of the PDB header, if it's available, but they're not included >> with >> > the atomic coordinates, so PdbIO can't reliably fill in these disordered >> > residues for all PDB files. This matches the behavior of the tool I was >> > using before (which is non-free and not widely used). >> > > The SEQRES contains the sequence used in the construct expressed and > crystallized so it's never incomplete. What I've done in the past in these > situations is iterate over the SEQRES and fill as '-' those residues that > do not have coordinates. > OK, we should implement that then. Perhaps we can avoid both the conditional numpy/PDB import and code duplication if we let parse_pdb_header call SeqIO.PdbIO for SEQRES lines. What about PDB files that don't have SEQRES lines? Should we... - Fall back to ATOM parsing automatically - Allow a flag for fallback (use_atoms_if_absolutely_must=False) - Require the user to specify whether to use SEQRES or ATOMs (use_seqres=True) - Use different format names, e.g. "pdb-seqres" and "pdb"/"pdb-atom"? Keeping in mind that secondary structure is also best represented as a SeqRecord, we could use "pdb-ss" or similar as another format eventually. > I don't know if I have any decent version of my MODELLER PIR format SeqIO > stuff on github, but maybe we could work together to make it consistent > (since what I wanted was PDB to seq essentially) ? Or maybe these are two > different points of view for the same problem and need different > solutions... > > https://github.com/JoaoRodrigues/biopython/tree/modeller-pirIO > Let's try to decouple these. I remember the original use case -- our goal would be to create Modeller-ready files with code like:
target = SeqIO.read("foo.fa", "fasta")
template = SeqIO.read("bar.pdb", "pdb")
aln = ... # Pairwise alignment
AlignIO.write(aln, "foobar.pir", "pir")
How much more information would we need to extract from the PDB file (that isn't normally in a SeqRecord) to satisfy Modeller? > Rather than literally copying it, do you think it is realistic to make >> some of Bio.PDB work without NumPy? e.g. fall back on tuples >> of floats (x,y,z) for atom co-ordinates. Just brainstorming - this >> might be a horrible idea? >> > > I kind of disagree because otherwise we'd have to convert them to numpy > arrays everytime we need them. > For atomic coordinates, I don't think there's a pressing need to make numpy optional, but perhaps we could refactor parse_pdb_header to work without loading numpy. That would give us access to SEQRES lines, secondary structure, PDB ID, deposition date, etc. if they're specified in the header. > Regarding my own work, I've been slowly working on cleaning a bit Bio.PDB > (for example, all those get_X methods that just return class attributes) > and organising my own GSoC code into it and in Bio.Struct. I don't know > when I have this even "alpha"-testable, it's been a long road and I had a > couple of computer crashes that made me lose my data so.. When would there > be a soft deadline for 1.60? > > Cool, no worries about the timeline. I think it's generally best if major new feature sets are merged shortly after a stable release, so bleeding-edge users (like us) have time to use the new code in a variety of situations and find bugs and design issues. However, if you have a stub of Bio/Struct/__init__.py that you feel is ready to merge right after this week's release, I think we could start there and add new features under that namespace in the coming months. 
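João's approach of walking the SEQRES sequence and writing '-' wherever a residue has no coordinates can be sketched in a few lines. This is a hypothetical helper, assuming the SEQRES residues are already one-letter codes and that observed residues have already been mapped to 1-based SEQRES positions. Real PDB residue numbering often does not start at 1 and can contain insertion codes, so that mapping is the hard part and is not shown. Passing gap="X" gives the 'X' style described for PdbIO.

```python
def seqres_with_gaps(seqres, observed_positions, gap="-"):
    """Return the SEQRES sequence with unobserved residues replaced by gap.

    seqres: full construct sequence as one-letter codes (assumed to be
        already converted from the three-letter SEQRES records).
    observed_positions: set of 1-based SEQRES positions that have ATOM
        coordinates (mapping from PDB residue numbers is assumed done).
    """
    return "".join(
        residue if position in observed_positions else gap
        for position, residue in enumerate(seqres, start=1)
    )


if __name__ == "__main__":
    # Residues 1-2 and 7-8 have no coordinates (e.g. disordered termini).
    print(seqres_with_gaps("MKTAYIAK", {3, 4, 5, 6}))  # --TAYI--
```

Because the full SEQRES sequence is the template, disordered termini show up as gaps too, which an ATOM-only approach cannot detect.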
Cheers, Eric From b.invergo at gmail.com Mon Feb 20 12:17:19 2012 From: b.invergo at gmail.com (Brandon Invergo) Date: Mon, 20 Feb 2012 18:17:19 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: <1329758239.970.13.camel@localhost.localdomain> Hi Peter, After getting bitten by some subtle output differences between different versions of the PAML programs, I've been writing much stricter unit tests for the parsing routines. I have the big one, codeml, done, so I just have to do the output for the two smaller programs, baseml and yn00. So far, I've only had to change one line in a Bio.Phylo.PAML file to accommodate a parsing mistake for the oldest supported codeml version. It's related to a purely informational line in the output rather than to any generated results. Combined, the insignificance of the changed line and the extremely old software version that produces the difference mean that this change in the code is not mission critical. The testing code and the directory of PAML test resources are, however, significantly different. I can probably have the other two parts done by, say, Wednesday. The question, though, is whether this is worth trying to pull in for Biopython 1.59, or whether I should hold off on the pull request until after the release. Cheers, Brandon On Sat, 2012-02-18 at 09:15 +0000, Peter Cock wrote: > Hello all, > > Based on the typical release schedule, we're a little overdue > for releasing Biopython 1.59 - I would have raised this earlier > but January was busy for me. With the recent NCBI EFetch > change, and the workaround for it, it would be especially > good to get the release out soon. > > I propose we release Biopython 1.59 in the second half of > next week - essentially the master branch as it is. > > Most of the unit tests are also passing under PyPy (bar > the C extensions, and external dependencies like NumPy) > with the exception of some XML issues with the standard > library. 
If we mark these as known failures and include > them in the buildbot before then, we can announce this > release as having (partial) PyPy support. > > Does anyone else want to do the release? If not, I can. > > Any comments on this? I'll start a new thread for plans > for Biopython 1.60 - there are several exciting chunks > of new code that look near ready for release. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Mon Feb 20 13:03:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 20 Feb 2012 18:03:44 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: <1329758239.970.13.camel@localhost.localdomain> References: <1329758239.970.13.camel@localhost.localdomain> Message-ID: On Monday, February 20, 2012, Brandon Invergo wrote: > Hi Peter, > > After getting bitten by some subtle output differences between different > versions of the PAML programs, I've been writing much stricter unit > tests for the parsing routines. I have the big one, codeml, done, so I > just have to do the output for the two smaller programs, baseml and > yn00. > > So far, I've only had to change one line in a Bio.Phylo.PAML file to > accommodate a parsing mistake for the oldest supported codeml version. > It's related to a purely informational line in the output rather than to > any generated results. Combined, the insignificance of the changed line > and the extremely old software version that produces the difference mean > that this change in the code is not mission critical. > > The testing code and the directory of PAML test resources are, however, > significantly different. > > I can probably have the other two parts done by, say, Wednesday. The > question, though, is whether this is worth trying to pull in for > Biopython 1.59 or whether I should hold off on the pull request until after the > release?
> > I'm tied up in a workshop Monday (today) and Tuesday (tomorrow), so Wednesday was the earliest date anyway. If you have these extended unit tests done by Wednesday that would be great. Thanks, Peter From redmine at redmine.open-bio.org Mon Feb 20 17:58:38 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 20 Feb 2012 22:58:38 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Peter Cock. It strikes me that while Python sequences don't support this at all, numpy arrays do allow indexing with a list - but surprisingly perhaps not an iterator. I imagine the problem with iterators is when you have more than one dimension (here we have slicing in one or two dimensions), and the fact that you'd be forced to cache the iterator values in a list. On balance, I would recommend doing this instead: new_align = MultipleSeqAlignment(old_align[i] for i in row_iter) Please bring this up on the mailing list if you want - we might spark some discussion and brainstorming. ---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects.
An extension that includes iterators looks easy: # Check whether the index is an iterator if (hasattr(index, 'next')) and (hasattr(index, '__iter__')): return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet) Do you think this is useful? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Feb 20 18:06:51 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 20 Feb 2012 23:06:51 +0000 Subject: [Biopython-dev] Optional libraries in README file (etc) Message-ID: Hi all, Do we need to have a new section in the README file for other soft dependencies like NetworkX? This file is now especially important as it gets shown on github for the project: https://github.com/biopython/biopython/blob/master/README Similarly, does the long installation manual need some work? https://github.com/biopython/biopython/blob/master/Doc/install/Installation.tex http://biopython.org/DIST/docs/install/Installation.html http://biopython.org/DIST/docs/install/Installation.pdf Peter From eric.talevich at gmail.com Tue Feb 21 09:53:20 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 21 Feb 2012 09:53:20 -0500 Subject: [Biopython-dev] Optional libraries in README file (etc) In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 6:06 PM, Peter Cock wrote: > Hi all, > > Do we need to have a new section in the README file for other soft > dependencies like NetworkX? This file is now especially important > as it gets shown on github for the project: > > https://github.com/biopython/biopython/blob/master/README > It would be good to have this info in the README. These would also be meaningful entries in the Debian package for Biopython, where soft dependencies would go in the "Suggests" field and probably just NumPy would go in "Recommends".
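For readers new to the term, a "soft" dependency is one that a library uses when it is installed but does not require at import time. The sketch below is a generic illustration of that pattern, not Biopython's own mechanism. The helpers `optional_import` and `require` are hypothetical names: the first returns `None` when the package is missing, and the second raises a clear error only when the optional feature is actually used.

```python
import importlib


def optional_import(module_name):
    """Return the module if it is installed, otherwise None (hypothetical helper)."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def require(module, module_name, feature):
    """Fail with a clear message only when the optional feature is used."""
    if module is None:
        raise ImportError("%s needs the optional package %r; please install it"
                          % (feature, module_name))
    return module


# Importing this module never fails, whether or not networkx is installed
networkx = optional_import("networkx")
```

A packager can then list such packages under Debian's "Suggests" field: the core library still works without them.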
A couple of soft dependencies not listed in the README are networkx (which in turn depends on pygraphviz|pydot and graphviz in Debian) and matplotlib|pylab, both for Bio.Phylo._utils. > Similarly, does the long installation manual need some work? > > https://github.com/biopython/biopython/blob/master/Doc/install/Installation.tex > http://biopython.org/DIST/docs/install/Installation.html > http://biopython.org/DIST/docs/install/Installation.pdf > > I guess mxTextTools can probably go now. Does MMCIFlex still work? The easy_install and pip approaches for installing Biopython and its dependencies should probably be prominent. Maybe an extra layer of nesting in the outline, like: 1. Installing Python 2. Installing pre-built packages (the easy way) - PyPI & pip / easy_install - On Linux: Biopython and NumPy are packaged for Ubuntu/Debian and presumably other Linux distros 3. Installing from source - Dependencies - Biopython - Installing with non-admin permissions (Unix/Mac) 4. Testing 5. Third-party tools -E From redmine at redmine.open-bio.org Tue Feb 21 16:36:30 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 21 Feb 2012 21:36:30 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Fabio Zanini. Right, neither Python nor Numpy support iterators, for different reasons - AFAIK. # Python lists actually do support it, kind of; that is the idea behind *list comprehensions*:
 new_list = [rec for rec in iterator]
does exactly this! # Numpy probably avoids it because of problems when extending to many dimensions, as you mentioned. Multiple Sequence Alignments, however, are intrinsically two-dimensional, and have no easy list comprehension. Your compromise is what I am proposing as well. This needs two steps: # we check that the index object supports _for_ loops, i.e. has an __iter__ method (see http://docs.python.org/library/stdtypes.html#iterator-types):
 if hasattr(index, '__iter__'):
# we generate the new MSA with a generator expression:
 return MultipleSeqAlignment((self._records[i] for i in index), self._alphabet)
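Here is a minimal, self-contained sketch of the single-index proposal. `ToyAlignment` is a hypothetical stand-in for `MultipleSeqAlignment`: it stores plain strings instead of SeqRecord objects and has no alphabet. Its only purpose is to show how an `__iter__` check lets a list or a generator of row numbers select a sub-alignment.

```python
class ToyAlignment(object):
    """Hypothetical stand-in for MultipleSeqAlignment (rows are plain strings)."""

    def __init__(self, rows):
        # Materialise once, so any iterable of rows (list, generator...) works
        self._records = list(rows)

    def __len__(self):
        return len(self._records)

    def __getitem__(self, index):
        if isinstance(index, int):
            # A single integer returns one row, as with a list
            return self._records[index]
        if isinstance(index, slice):
            # Slices already work: the sliced list becomes a new alignment
            return ToyAlignment(self._records[index])
        if hasattr(index, "__iter__"):
            # Proposed extension: any iterable of row numbers selects rows
            return ToyAlignment(self._records[i] for i in index)
        raise TypeError("Invalid index type: %r" % type(index))


aln = ToyAlignment(["ACGT", "ACGA", "TCGT", "ACCT", "GCGT"])
sub = aln[[0, 2, 4]]         # a list of row numbers
picked = aln[iter([1, 3])]   # a one-shot iterator works too
```

In the single-index case each row number is read exactly once, so even a one-shot iterator is safe to consume directly.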
Note that double indexing is not really an issue, since in that case *we are already using that approach*! In fact, the current code has:
 #Handle double indexing
 [...]
 else:
     #e.g. sub_align = align[1:4, 5:7], gives another alignment
     return MultipleSeqAlignment((rec[col_index] for rec in self._records[row_index]), self._alphabet)
We would only need to modify this slightly to:
 if hasattr(row_index, '__iter__'):
     return MultipleSeqAlignment((self._records[i][col_index] for i in row_index), self._alphabet)
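The sketch below covers the double-index case under the same simplifications. `ToyAlignment2D` is hypothetical and holds equal-length strings. Rows are visited once, so any iterable works for them. A column index, however, is reused for every row, so the sketch expands it into a list first. The sketch does not reproduce every return type of the real class: an integer column index here gives a toy alignment with one letter per row, not a column string.

```python
class ToyAlignment2D(object):
    """Hypothetical alignment of equal-length strings, for illustration only."""

    def __init__(self, rows):
        self._records = list(rows)

    def _rows(self, row_index):
        # Rows are visited once, so ints, slices and any iterable are all fine
        if isinstance(row_index, int):
            return [self._records[row_index]]
        if isinstance(row_index, slice):
            return self._records[row_index]
        return [self._records[i] for i in row_index]

    def __getitem__(self, index):
        if not isinstance(index, tuple):
            return ToyAlignment2D(self._rows(index))
        row_index, col_index = index
        rows = self._rows(row_index)
        if isinstance(col_index, (int, slice)):
            # Ordinary column index: apply it to each row string directly
            return ToyAlignment2D(row[col_index] for row in rows)
        # The column index is reused for every row, so expand it to a list
        # once; a one-shot iterator would be empty after the first row.
        cols = list(col_index)
        return ToyAlignment2D("".join(row[c] for c in cols) for row in rows)


aln = ToyAlignment2D(["ACGTA", "ACGAA", "TCGTT"])
sub = aln[[0, 2], [0, 3, 4]]   # rows 0 and 2, columns 0, 3 and 4
```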
Finally, I would gladly post to the mailing list. You mean the Biopython-Dev mailing list, right? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From fabio.zanini at fastmail.fm Tue Feb 21 16:44:40 2012 From: fabio.zanini at fastmail.fm (Fabio Zanini) Date: Tue, 21 Feb 2012 22:44:40 +0100 Subject: [Biopython-dev] Should MultipleSequenceAlignment support iterator slicing? Message-ID: <20120221214440.GB2430@X200.local> Hi all! I am using the MultipleSequenceAlignment class a lot these days, and would find it useful to get subalignments using python iterators. I started a discussion on the issue tracker: https://redmine.open-bio.org/issues/3326 Short version: I would like to do things like alignment[[4,5,8]] to get a subalignment with the 5th, 6th, and 9th rows.
This syntax is not working at present, but can be implemented, for single as well as double indices, in a very simple way. For instance, for the single index case, if hasattr(index, '__iter__'): return MultipleSeqAlignment((self._records[i] for i in index), self._alphabet) Questions? Doubts? Cheers, Fabio From p.j.a.cock at googlemail.com Tue Feb 21 17:50:39 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 21 Feb 2012 22:50:39 +0000 Subject: [Biopython-dev] Should MultipleSequenceAlignment support iterator slicing? In-Reply-To: <20120221214440.GB2430@X200.local> References: <20120221214440.GB2430@X200.local> Message-ID: On Tue, Feb 21, 2012 at 9:44 PM, Fabio Zanini wrote: > Hi all! > > I am using the MultipleSequenceAlignment class a lot these days, and > would find it useful to get subalignments using python iterators. I > started a discussion on the issue tracker: > > https://redmine.open-bio.org/issues/3326 > > Short version: I would like to do things like > > alignment[[4,5,8]] > > to get a subalignment with the 5th, 6th, and 9th rows. This syntax is > not working at present, but can be implemented, for single as well as > double indices, in a very simple way. For instance, for the single index > case, > > if hasattr(index, '__iter__'): >     return MultipleSeqAlignment((self._records[i] for i in index), >         self._alphabet) > > Questions? Doubts? > > Cheers, > Fabio As I said on the bug, there are parallels with numpy arrays allowing indexing with lists (but not iterators). The problem with iterator indices for numpy arrays is that you may have many axes - but an iterator can only be looped over once. This effectively means the iterator would have to be expanded into a list inside the __getitem__ code. This isn't so critical with multiple sequence alignments where we have just two dimensions.
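Peter's point about one-shot iterators is easy to see with two columns-only helpers. Both are hypothetical and treat rows as plain strings. The naive version reuses the column index for every row, so an iterator is already exhausted when the second row is reached. The safe version expands the index into a list once. The same thread raises a further use case: producing a bootstrap replicate, where random columns are drawn with replacement. That case is sketched here too, and its index is a plain list, so it needs no iterator support at all.

```python
import random


def take_columns_naive(rows, col_index):
    # Reuses col_index for every row: fine for a list, wrong for an iterator
    return ["".join(row[c] for c in col_index) for row in rows]


def take_columns(rows, col_index):
    # Expand the index once; a generator is exhausted after a single pass
    cols = list(col_index)
    return ["".join(row[c] for c in cols) for row in rows]


def bootstrap_columns(rows, rng):
    """Resample columns with replacement (hypothetical helper)."""
    n = len(rows[0])
    cols = [rng.randrange(n) for _ in range(n)]
    return take_columns(rows, cols)


rows = ["ACGT", "TGCA"]
print(take_columns_naive(rows, iter([0, 2])))  # ['AG', ''] - second row is empty
print(take_columns(rows, iter([0, 2])))        # ['AG', 'TC']
```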
Supporting numpy-style list indexing should cover most use cases, including things like producing a resampled alignment for phylogenetic tree bootstrapping where random columns are selected. Does that sound useful enough to add (for rows and cols)? i.e. support row/col index lists - but not iterators? Peter From p.j.a.cock at googlemail.com Thu Feb 23 08:43:02 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 Feb 2012 13:43:02 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: <1329758239.970.13.camel@localhost.localdomain> Message-ID: On Mon, Feb 20, 2012 at 6:03 PM, Peter Cock wrote: > > > On Monday, February 20, 2012, Brandon Invergo wrote: >> I can probably have the other two parts done by, say, Wednesday. The >> question, though, is whether this is worth trying to pull in for >> Biopython 1.59 or whether I should hold off on the pull request until after the >> release? > > I'm tied up in a workshop Monday (today) and Tuesday (tomorrow), > so Wednesday was the earliest date anyway. If you have these > extended unit tests done by Wednesday that would be great. That's checked in now, and the buildslaves are all green (good). Thanks Brandon. Any last minute requests for Biopython 1.59? I can hold off till tomorrow... or next week if we need to. Peter From p.j.a.cock at googlemail.com Thu Feb 23 10:26:31 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 Feb 2012 15:26:31 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 9:15 AM, Peter Cock wrote: > Most of the unit tests are also passing under PyPy (bar > the C extensions, and external dependencies like NumPy) > with the exception of some XML issues with the standard > library. If we mark these as known failures and include > them in the buildbot before then, we can announce this > release as having (partial) PyPy support.
I forgot ongoing headaches from handle 'leaks' too - for instance test_SeqIO_index.py fails apparently due to the delay in GC leading to a delay in the closing of handles: http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies Peter From redmine at redmine.open-bio.org Thu Feb 23 12:17:59 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 23 Feb 2012 17:17:59 +0000 Subject: [Biopython-dev] [Biopython - Bug #2597] Enforce alphabet letters in Seq objects References: Message-ID: Issue #2597 has been updated by Eric Talevich. It would also be useful to be able to validate alphabets when constructing Seqs or SeqRecords from scratch. Here's a proposal that I believe fits with most of what's been agreed to so far. In Bio/Alphabet/__init__.py, replace _verify_alphabet with an efficiently implemented method on the Alphabet class and perhaps make it public:
def validate(self, sequence):
    """Raise a ValueError if sequence contains letters not allowed by alphabet.

    If alphabet does not define letters, it's all OK.
    ...
    """
    if not self.letters:
        # Generic alphabets set letters to None, so there is nothing to check
        return
    ok_letters = set(self.letters)
    bad_letters = set(str(sequence)) - ok_letters
    if bad_letters:
        raise ValueError("Alphabet does not accept these letters: "
                         + ''.join(sorted(bad_letters)))
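To see how the proposed check behaves, here is a standalone sketch. `ToyAlphabet` is a hypothetical stand-in for an alphabet class, and `letters=None` plays the role of a generic alphabet. An upper-case-only letter set also rejects lower-case input. This is exactly the mixed-case situation that the bug report says must be resolved (Bug 2532) before validation can be enforced.

```python
class ToyAlphabet(object):
    """Hypothetical stand-in for an alphabet class, for illustration only."""

    def __init__(self, letters=None):
        # letters=None mimics a generic alphabet with no fixed letter set
        self.letters = letters

    def validate(self, sequence):
        """Raise ValueError if sequence uses letters outside the alphabet."""
        if not self.letters:
            return  # nothing to check for generic alphabets
        bad_letters = set(str(sequence)) - set(self.letters)
        if bad_letters:
            raise ValueError("Alphabet does not accept these letters: "
                             + "".join(sorted(bad_letters)))


dna = ToyAlphabet("GATC")                  # upper case, unambiguous letters only
dna.validate("GATTACA")                    # passes silently
ToyAlphabet().validate("any text at all")  # generic: always passes
try:
    dna.validate("GATNaca")                # 'N' and lower-case letters are rejected
except ValueError as err:
    print(err)  # Alphabet does not accept these letters: Nac
```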
In the Seq class, optionally add a method 'check_alphabet' which wraps Alphabet.validate:
def check_alphabet(self):
    self.alphabet.validate(self.data)
In SeqIO.parse and SeqIO.read, add an option check_alphabet=False, which calls either Alphabet.validate(seq) or seq.check_alphabet(). If validation fails, the exception is propagated up. I don't know how much this would affect performance, but it seems that users are willing to accept a small performance hit if they explicitly opt into validation. The extra 'if' statement may or may not be noticeable in the default case. ---------------------------------------- Bug #2597: Enforce alphabet letters in Seq objects https://redmine.open-bio.org/issues/2597 Author: Peter Cock Status: In Progress Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: Not Applicable URL: If a Seq object is created with an alphabet with a pre-defined set of letters (e.g. the IUPAC alphabets) then I think Biopython should validate that the sequence does indeed only use those letters. This will catch mis-use of ambiguous sequences with non-ambiguous alphabets, letters in an unexpected case, and most importantly any unexpected symbols (e.g. from a parsing problem). This will impose a performance overhead - which can be avoided if the user instead chooses to use a generic dna/rna/protein alphabet which does not list the letters expected. Note that we will have to resolve Bug 2532 before doing this, as currently some parts of Biopython are mis-using the upper case only IUPAC alphabet objects with mixed case sequences. -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Thu Feb 23 19:24:01 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 00:24:01 +0000 Subject: [Biopython-dev] Entrez documentation and DTD checking Message-ID: Hi all, Prompted by this query I did check the Tutorial and docstrings and updated most (hopefully all) of the efetch examples to include the now required retmode="text" argument. I also found a few missing DTD files while writing some doctests for Bio.Entrez (not sure right now how to integrate these into our run_tests.py framework while making them conditional on the --offline switch which we use on the buildbot). I'd be grateful if anyone has time to check for any other examples that need updating, or further NCBI DTD files we should include. This page claims nothing has changed since 2009, which is wrong: http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/index.shtml This page claims nothing has changed since 2002, which is worse: http://www.ncbi.nlm.nih.gov/data_specs/dtd/other/entrez/ There are a few recent DTD files listed here though they may not apply to Entrez: http://www.ncbi.nlm.nih.gov/dtd/ or alternatively http://www.ncbi.nlm.nih.gov/data_specs/dtd/ Peter ---------- Forwarded message ---------- From: Peter Cock Date: 2012/2/23 Subject: Re: [Biopython] Entrez and SeqIO "no records found in handle" To: "Feng GAO" Cc: "biopython at lists.open-bio.org" 2012/2/23 Feng GAO: > Hi all, > We have some python code using gi number to get record from Genbank. > Part of the code is: > > handle = Entrez.efetch(db="protein", id=ID, rettype="gb") > record = SeqIO.read(handle,"genbank") > > We have had no problem with this code > until this week when we started getting "ValueError: No records found in handle". > Anyone have an idea how to fix it now? Thanks! > Feng Try using an explicit retmode="text" in the efetch call.
The NCBI changed the defaults with EFetch 2.0, which went live earlier this month. You're probably getting XML back instead. Note to self: I wonder if the Biopython tutorial examples need to be updated as well... Peter From p.j.a.cock at googlemail.com Fri Feb 24 08:16:10 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 13:16:10 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 12:02 PM, James Casbon wrote: > Hi Peter et al, > > Bit late on this, but... > > On 18 February 2012 09:39, Peter Cock wrote: >> VCF format? Variant Call Format - Tiago what's >> you're impression of work in this area? Ahem, you're -> your. > If you think the original license is compatible I'd be happy to fold > PyVCF into biopython, if it fits. Excellent. > Aaron Quinlan is evaluating VCF parsers at the moment (not sure if > he's on this list), so he can probably give you some good feedback. Sounds good. > Some unknowns are: > 1. using cython/binding to a c library for speed (my preference is > probably pure python for pypy compatibility) We've not had any cython dependency so far, but it may be desirable in the future rather than writing lots of boilerplate code for calling C libraries. However, I'd hope for a pure Python fallback for Jython and PyPy etc. I presume Windows would be OK? > 2. where is BCF going, is that going to be important for a VCF lib? Not sure. > 3. There is an optional dependency on pysam, how does that fit with > biopython? (other replies in this thread indicate this is already the > case?) If it were a run-time dependency on pysam, that would be workable. I'm unclear if pysam supports Windows, while Python 3 is still pending. Again, my preference is for a pure Python SAM/BAM library (and that is doable), possibly as a fallback for compiled code. > I would also like to know: > 1. is there existing variant code in biopython that a parser needs > integration with?
Might this tie in with any of the population genetics code? > 2. are there other (perhaps more promising) formats that we would like > parse into the same kind of representation, e.g. > http://genomebiology.com/2010/11/8/R88/abstract I don't know. Peter From p.j.a.cock at googlemail.com Fri Feb 24 08:30:52 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 13:30:52 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Hello all, No git commits to the master please until further notice - I'm going to do the release now. Peter From p.j.a.cock at googlemail.com Fri Feb 24 09:35:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 14:35:28 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 1:30 PM, Peter Cock wrote: > Hello all, > > No git commits to the master please until further notice - I'm going > to do the release now. > > Peter OK, git tagged, and release files and installers done, uploaded here as usual: http://biopython.org/DIST/ If anyone would like to grab and check those, that would be great. I haven't pushed this to PyPI yet either. Updated API files and Tutorial now live: http://biopython.org/DIST/docs/api/ http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf I will now prepare a draft announcement for the news blog and emailing... Peter From anaryin at gmail.com Fri Feb 24 09:56:51 2012 From: anaryin at gmail.com (João Rodrigues) Date: Fri, 24 Feb 2012 15:56:51 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Hi Peter, In both Linux and Mac all tests ran fine. A remark, when running the tests I get plenty of these warnings: test_FSSP ...
/Volumes/Home/users/joaor/Software/biopython-1.59/build/lib.macosx-10.6-intel-2.7/Bio/Align/Generic.py:54: BiopythonDeprecationWarning: With the introduction of the MultipleSeqAlignment class in Bio.Align, this base class is deprecated and is likely to be removed in a future release of Biopython. warnings.warn("With the introduction of the MultipleSeqAlignment class in Bio.Align, this base class is deprecated and is likely to be removed in a future release of Biopython.", Bio.BiopythonDeprecationWarning) Is there an option not to have the message repeated? It's just cosmetics but I thought of asking anyways.. Good work, João From p.j.a.cock at googlemail.com Fri Feb 24 10:30:37 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 15:30:37 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 2:35 PM, Peter Cock wrote: > > I will now prepare a draft announcement for the news blog and emailing... > How does this look? Biopython 1.59 released Source distributions and Windows installers for *Biopython 1.59* are now available from the downloads page on the Biopython website and from the Python Package Index (PyPI). Platforms/Deployment We currently support Python 2.5, 2.6 and 2.7 and also test under Jython 2.5 (which does not cover NumPy). Please note that this release will not work on Python 2.4. Most functionality is also working under Python 3.1 and 3.2 (including modules using NumPy), and under PyPy (excluding our NumPy dependencies). We are now encouraging early adopters to help with beta testing on these platforms. The installation setup.py now supports "install_requires" when setuptools is installed. This avoids the manual dialog when installing Biopython via easy_install or pip and NumPy is not installed. It also allows user libraries that require Biopython to include it in their install_requires and get automatic installation of dependencies.
Features New module Bio.TogoWS offers a wrapper for the TogoWS REST API, a web service based in Japan offering access to KEGG, DDBJ, PDBj, CBRC plus access to some NCBI and EBI resources including PubMed, GenBank and UniProt. This is much easier to use than the NCBI Entrez API, and should be especially useful for Biopython users based in Asia. The NCBI Entrez Fetch function Bio.Entrez.efetch has been updated to handle the NCBI's stricter handling of multiple ID arguments in EFetch 2.0 (released February 2012); however, the NCBI have also changed the retmode default argument, so you may need to make this explicit, e.g. retmode="text". The position objects used in Bio.SeqFeature now act almost like integers, making dealing with fuzzy locations in EMBL/GenBank files much easier. Also the SeqFeature's strand and any database reference are now properties of the FeatureLocation object (a more logical placement), with proxy methods for backwards compatibility. Bio.Graphics.BasicChromosome has been extended to allow simple sub-features to be drawn on chromosome segments, suitable to show the position of genes, SNPs or other loci. Bio.Graphics.GenomeDiagram has been extended to allow cross-links between tracks, and track-specific start/end positions for showing regions. This can be used to imitate the output from the Artemis Comparison Tool (ACT). Also, a new attribute circle_core makes it easier to have an empty space in the middle of a circular diagram (see tutorial). Note that Bio.Graphics requires the ReportLab library. Bio.Align.Applications now includes a wrapper for the command line tool Clustal Omega for protein multiple sequence alignment. Bio.AlignIO now supports sequential PHYLIP files (as well as interlaced PHYLIP files) as a separate format variant. Additionally there have been other minor bug fixes and more unit tests, and updates to the documentation including the Biopython Tutorial (PDF).
Contributors Many thanks to the Biopython developers and community for making this release possible, especially the following contributors: - Andreas Wilm (first contribution) - Alessio Papini (first contribution) - Brad Chapman - Brandon Invergo - Connor McCoy - Eric Talevich - Joao Rodrigues - Konrad Förstner (first contribution) - Michiel de Hoon - Matej Repič (first contribution) - Leighton Pritchard - Peter Cock From anaryin at gmail.com Fri Feb 24 10:36:01 2012 From: anaryin at gmail.com (João Rodrigues) Date: Fri, 24 Feb 2012 16:36:01 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Maybe a small reference to the TogoWS issue? Otherwise people might get worried.. And another cosmetic change, would you mind adding a tilde to my name? Copy-paste it from my sig. I'm usually not picky but since there are weird characters in there already ;) João [...] Rodrigues http://nmr.chem.uu.nl/~joao From p.j.a.cock at googlemail.com Fri Feb 24 10:39:54 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 15:39:54 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 3:36 PM, João Rodrigues wrote: > And another cosmetic change, would you mind adding a tilde to my name? > Copy-paste it from my sig. I'm usually not picky but since there are weird > characters in there already ;) Sure, I'll do that for the release notes. You can update the NEWS file if you like for future mentions. Peter From p.j.a.cock at googlemail.com Fri Feb 24 10:45:08 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 15:45:08 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 2:56 PM, João Rodrigues wrote: > Hi Peter, > > In both Linux and Mac all tests ran fine. > Thank you - I would have been surprised if not, though. It would suggest a gap in the buildbot coverage.
> A remark, when running the tests I get plenty of these warnings: > > test_FSSP ... > /Volumes/Home/users/joaor/Software/biopython-1.59/build/lib.macosx-10.6-intel-2.7/Bio/Align/Generic.py:54: > BiopythonDeprecationWarning: With the introduction of the > MultipleSeqAlignment class in Bio.Align, this base class is deprecated and > is likely to be removed in a future release of Biopython. > warnings.warn("With the introduction of the MultipleSeqAlignment class in > Bio.Align, this base class is deprecated and is likely to be removed in a > future release of Biopython.", Bio.BiopythonDeprecationWarning) > > Is there an option not to have the message repeated? It's just cosmetics but > I thought of asking anyways.. > > Good work, > > João I don't think we can alter the repeat when the warning is shown, but we can silence it for this test. We should do that (but really we should update the FSSP code), likewise the warning for the BioSQL feature 'order' thing. We've got similar code in other tests - the trouble is that the warnings module is global and there are subtle interactions between test scripts. I think we need to add some cleanup in run_tests.py to restore the filters as a fallback. Do you want to try this (after the release)? Peter From anaryin at gmail.com Fri Feb 24 10:46:36 2012 From: anaryin at gmail.com (João Rodrigues) Date: Fri, 24 Feb 2012 16:46:36 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Sure, we can look at it afterwards then, it was just a cosmetic issue, it's annoying to see so many repeated lines. João [...] Rodrigues http://nmr.chem.uu.nl/~joao On 24 February 2012 16:45, Peter Cock wrote: > On Fri, Feb 24, 2012 at 2:56 PM, João Rodrigues wrote: > > Hi Peter, > > > > In both Linux and Mac all tests ran fine. > > > > Thank you - I would have been surprised if not though. It would > suggest a gap in the buildbot coverage.
> > > A remark, when running the tests I get plenty of these warnings: > > > > test_FSSP ... > > > /Volumes/Home/users/joaor/Software/biopython-1.59/build/lib.macosx-10.6-intel-2.7/Bio/Align/Generic.py:54: > > BiopythonDeprecationWarning: With the introduction of the > > MultipleSeqAlignment class in Bio.Align, this base class is deprecated > and > > is likely to be removed in a future release of Biopython. > > warnings.warn("With the introduction of the MultipleSeqAlignment class > in > > Bio.Align, this base class is deprecated and is likely to be removed in a > > future release of Biopython.", Bio.BiopythonDeprecationWarning) > > > > Is there an option not to have the message repeated? It's just cosmetics > but > > I thought of asking anyways.. > > > > Good work, > > > > João > > I don't think we can alter the repeat when the warning is shown, > but we can silence it for this test. We should do that (but really > we should update the FSSP code), likewise the warning for the > BioSQL feature 'order' thing. > > We've got similar code in other tests - the trouble is that the > warnings module is global and there are subtle interactions > between test scripts. I think we need to add some cleanup in > run_tests.py to restore the filters as a fall back. > > Do you want to try this (after the release)? > > Peter > From p.j.a.cock at googlemail.com Fri Feb 24 11:44:54 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 16:44:54 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 3:46 PM, João Rodrigues wrote: > Sure, we can look at it afterwards then, it was just a cosmetic issue, it's > annoying to see so many repeated lines. Great - thanks. No one spotted the fact that today is still 24 Feb, not 25 Feb? Oh well... the NEWS file in the release itself will just be a day out. http://news.open-bio.org/news/2012/02/biopython-1-59-released/ Normal git usage can resume...
back to the Biopython 1.60 thread. Peter From fabio.zanini at fastmail.fm Sun Feb 26 06:29:08 2012 From: fabio.zanini at fastmail.fm (Fabio Zanini) Date: Sun, 26 Feb 2012 12:29:08 +0100 Subject: [Biopython-dev] Should MultipleSequenceAlignment support iterator slicing? In-Reply-To: References: <20120221214440.GB2430@X200.local> Message-ID: <20120226112908.GA20547@X200.local> On Tue, Feb 21, 2012 at 10:50:39PM +0000, Peter Cock wrote: > On Tue, Feb 21, 2012 at 9:44 PM, Fabio Zanini wrote: > > Hi all! > > > > I am using the MultipleSequenceAlignment class a lot these days, and > > would find it useful to get subalignments using python iterators. I > > started a discussion on the issue tracker: > > > > https://redmine.open-bio.org/issues/3326 > > > > Short version: I would like to do things like > > > > alignment[[4,5,8]] > > > > Does that sound useful enough to add (for rows and cols)? > i.e. support row/col index lists - but not iterators? > Support for lists in both dimensions is already quite a big improvement, and we should definitely implement this. This should cover most use cases. Since an iterator can be iterated over only once, a memory-efficient solution for simultaneous row and column iterators that does not convert either of them into a list is not that easy. Let's introduce the list support for now! Cheers, Fabio From redmine at redmine.open-bio.org Mon Feb 27 23:56:45 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 28 Feb 2012 04:56:45 +0000 Subject: [Biopython-dev] [Biopython - Feature #3329] (New) MMCIF parser should take an open file handle Message-ID: Issue #3329 has been reported by Mark Diekhans. ---------------------------------------- Feature #3329: MMCIF parser should take an open file handle https://redmine.open-bio.org/issues/3329 Author: Mark Diekhans Status: New Priority: Normal Assignee: Category: Target version: URL: MMCIF parser should take an open file as well as a file path.
We were unable to use this parser because we need to read compressed files. Reading from a file handle is the most flexible API. thanks!!! ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Thu Feb 2 10:02:31 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 2 Feb 2012 10:02:31 +0000 Subject: [Biopython-dev] [Biopython - Bug #3321] (New) Bio.Phylo.PAML.codeml fails to parse the omega tree (free-ratio model) Message-ID: Issue #3321 has been reported by Brandon Invergo. ---------------------------------------- Bug #3321: Bio.Phylo.PAML.codeml fails to parse the omega tree (free-ratio model) https://redmine.open-bio.org/issues/3321 Author: Brandon Invergo Status: New Priority: Normal Assignee: Brandon Invergo Category: Target version: URL: When using the free-ratio model of codeml, Bio.Phylo.PAML.codeml fails to parse the omega tree (the Newick tree following "w ratios as labels for TreeView:" in the codeml results file). ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Tue Feb 7 13:46:33 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 7 Feb 2012 13:46:33 +0000 Subject: [Biopython-dev] Fwd: [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: Sorry, didn't realise this was direct to me and not the mailing list. Would everyone be OK with git pull requests coming straight here? Anyway - Eric, would this be something you could look at? Peter ---------- Forwarded message ---------- From: benreynwar Date: Tue, Jan 24, 2012 at 11:12 PM Subject: [biopython] Added transform and copy method to Entity in biopython.PDB (#25) To: Peter Cock Minor changes to biopython.PDB to allow transforms to be applied to entities and copies of entities to be made. Also fixed a bug in the transform method of Atom. Tests are included. The changes are from a few months ago but they still merge cleanly into the current master. You can merge this Pull Request by running: git pull https://github.com/benreynwar/biopython master Or you can view, comment on it, or merge it online at: https://github.com/biopython/biopython/pull/25 -- Commit Summary -- * Added tranform method to Entity. * Add test for transform method. * Adding copy method for Entities * Added an insert method to Entity. This allows a child to be inserted into a specified position in the child_list which effects position in the PDB output. * Fixed bug in transform method of Atom (dot product order).
-- File Changes -- M Bio/PDB/Atom.py (14) M Bio/PDB/Entity.py (40) M Tests/test_PDB.py (70) -- Patch Links -- https://github.com/biopython/biopython/pull/25.patch https://github.com/biopython/biopython/pull/25.diff --- Reply to this email directly or view it on GitHub: https://github.com/biopython/biopython/pull/25 From eric.talevich at gmail.com Tue Feb 7 15:36:28 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 7 Feb 2012 10:36:28 -0500 Subject: [Biopython-dev] [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: On Tue, Feb 7, 2012 at 8:46 AM, Peter Cock wrote: > Sorry, didn't realise this was direct to me and not the mailing list. > Would everyone be OK with git pull requests coming straight here? > > Anyway - Eric, would this be something you could look at? > > Peter > At a glance, the code looks cool to me, though I faintly recall some overlap with João's unmerged work (copy method). I'll try to find time to test it, but would not be upset if someone else got to it first. -E > ---------- Forwarded message ---------- > From: benreynwar > < > reply+i-2958669-14408e039dee774169d6f09c683146c3f42dd0b9-63959 at reply.github.com > > > Date: Tue, Jan 24, 2012 at 11:12 PM > Subject: [biopython] Added transform and copy method to Entity in > biopython.PDB (#25) > To: Peter Cock > > > Minor changes to biopython.PDB to allow transforms to be applied to > entities and copies of entities to be made. Also fixed a bug in the > transform method of Atom. Tests are included. > The changes are from a few months ago but they still merge cleanly > into the current master. > > You can merge this Pull Request by running: > > git pull https://github.com/benreynwar/biopython master > > Or you can view, comment on it, or merge it online at: > > https://github.com/biopython/biopython/pull/25 > > -- Commit Summary -- > > * Added tranform method to Entity. > * Add test for transform method.
> * Adding copy method for Entities > * Added an insert method to Entity. This allows a child to be > inserted into a specified position in the child_list which effects > position in the PDB output. > * Fixed bug in transform method of Atom (dot product order). > > -- File Changes -- > > M Bio/PDB/Atom.py (14) > M Bio/PDB/Entity.py (40) > M Tests/test_PDB.py (70) > > -- Patch Links -- > > https://github.com/biopython/biopython/pull/25.patch > https://github.com/biopython/biopython/pull/25.diff > > --- > Reply to this email directly or view it on GitHub: > https://github.com/biopython/biopython/pull/25 > From redmine at redmine.open-bio.org Fri Feb 10 08:55:30 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Fri, 10 Feb 2012 08:55:30 +0000 Subject: [Biopython-dev] [Biopython - Bug #3323] (New) Bio.Phylo.draw should accept axes as optional argument Message-ID: Issue #3323 has been reported by Fabio Zanini. ---------------------------------------- Bug #3323: Bio.Phylo.draw should accept axes as optional argument https://redmine.open-bio.org/issues/3323 Author: Fabio Zanini Status: New Priority: Normal Assignee: Category: Target version: URL: Some months ago, the draw function was added to Bio.Phylo. Although that function works, it always opens a new figure. I have slightly modified it to accept an additional optional argument, so that a pre-defined axes object can be used to plot the phylogram in, if needed. This makes it much easier to embed a phylogram in a grid of subplots. The file with the new function is attached. If you prefer, I can git push it somewhere. ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From eric.talevich at gmail.com Sat Feb 11 02:55:06 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 10 Feb 2012 21:55:06 -0500 Subject: [Biopython-dev] [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: On Tue, Feb 7, 2012 at 10:36 AM, Eric Talevich wrote: > On Tue, Feb 7, 2012 at 8:46 AM, Peter Cock wrote: > >> Sorry, didn't realise this was direct to me and not the mailing list. >> Would everyone be OK with git pull requests coming straight here? >> >> Anyway - Eric, would this be something you could look at? >> >> Peter >> > > At a glance, the code looks cool to me, though I faintly recall some > overlap with João's unmerged work (copy method). I'll try to find time to > test it, but would not be upset if someone else got to it first. > -E > > (Not sure who gets follow-up e-mails from Github, if anyone.) Could someone else confirm whether the last patch is correct? It switches the order of the dot product arguments in Atom.transform(). https://github.com/benreynwar/biopython/commit/346df8f2006735129a93508a04c4cdf6acb99a5f Code:
     def transform(self, rot, tran):
         """
         Apply rotation and translation to the atomic coordinates.
         Example:
                 >>> rotation=rotmat(pi, Vector(1,0,0))
                 >>> translation=array((0,0,1), 'f')
                 >>> atom.transform(rotation, translation)
         @param rot: A right multiplying rotation matrix
         @type rot: 3x3 Numeric array
         @param tran: the translation vector
         @type tran: size 3 Numeric array
         """
-        self.coord=numpy.dot(self.coord, rot)+tran
+        self.coord=numpy.dot(rot, self.coord)+tran
This will break every script that uses the transform() method if we apply it. It also breaks the unit test, of course, but I can change the unit test to match if we accept this patch. It seems to me that which way is right is a matter of how the user specifies the input.
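Eric's observation that the right answer depends on how the user specifies the input can be checked numerically: for a 3-vector coord and a 3x3 matrix rot, numpy.dot(coord, rot) gives the same result as numpy.dot(rot.T, coord), so the existing form and the proposed form agree only if the caller transposes the matrix. The sketch below assumes only NumPy and builds its rotation matrix by hand; rotation_about_z is a hypothetical helper, not Bio.PDB's rotmat.

```python
import numpy as np


def rotation_about_z(theta):
    """Hypothetical helper: rotation by theta about z for column vectors (new = R @ old)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


coord = np.array([1.0, 0.0, 0.0])
tran = np.array([0.0, 0.0, 1.0])
rot = rotation_about_z(np.pi / 2)

# Proposed patch: treat coord as a column vector and left-multiply.
left_multiplied = np.dot(rot, coord) + tran

# Existing Atom.transform: treat coord as a row vector, so rot must be a
# "right multiplying" matrix. Passing the same matrix rotates the other way...
right_multiplied_same = np.dot(coord, rot) + tran

# ...while passing its transpose reproduces the left-multiplied result.
right_multiplied_transposed = np.dot(coord, rot.T) + tran
```

Neither form is wrong on its own; swapping the argument order silently changes the meaning of every rotation matrix that existing scripts already pass in.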
I'm not a thinking man, so I don't entirely trust my judgment on this one. Thanks, Eric > >> ---------- Forwarded message ---------- >> From: benreynwar >> < >> reply+i-2958669-14408e039dee774169d6f09c683146c3f42dd0b9-63959 at reply.github.com >> > >> Date: Tue, Jan 24, 2012 at 11:12 PM >> Subject: [biopython] Added transform and copy method to Entity in >> biopython.PDB (#25) >> To: Peter Cock >> >> >> Minor changes to biopython.PDB to allow transforms to be applied to >> entities and copies of entities to be made. Also fixed a bug in the >> transform method of Atom. Tests are included. >> The changes are from a few months ago but they still merge cleanly >> into the current master. >> >> You can merge this Pull Request by running: >> >> git pull https://github.com/benreynwar/biopython master >> >> Or you can view, comment on it, or merge it online at: >> >> https://github.com/biopython/biopython/pull/25 >> >> -- Commit Summary -- >> >> * Added tranform method to Entity. >> * Add test for transform method. >> * Adding copy method for Entities >> * Added an insert method to Entity. This allows a child to be >> inserted into a specified position in the child_list which effects >> position in the PDB output. >> * Fixed bug in transform method of Atom (dot product order). 
>> >> -- File Changes -- >> >> M Bio/PDB/Atom.py (14) >> M Bio/PDB/Entity.py (40) >> M Tests/test_PDB.py (70) >> >> -- Patch Links -- >> >> https://github.com/biopython/biopython/pull/25.patch >> https://github.com/biopython/biopython/pull/25.diff >> >> --- >> Reply to this email directly or view it on GitHub: >> https://github.com/biopython/biopython/pull/25 >> > > From redmine at redmine.open-bio.org Sun Feb 12 15:37:25 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 12 Feb 2012 15:37:25 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] (New) MultipleSeqAlignment should support iterators, not only slice objects Message-ID: Issue #3326 has been reported by Fabio Zanini. ---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy: # Check whether the index is an iterator if (hasattr(index, 'next')) and (hasattr(index, '__iter__')): return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet) Would you think this is useful? ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Sun Feb 12 20:20:01 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Sun, 12 Feb 2012 20:20:01 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Peter Cock. Could you give a usage example or two, combining itertools with the alignment (after this change)? I don't really understand the aim here. ---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy: # Check whether the index is an iterator if (hasattr(index, 'next')) and (hasattr(index, '__iter__')): return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet) Would you think this is useful? -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From redmine at redmine.open-bio.org Mon Feb 13 08:24:02 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 13 Feb 2012 08:24:02 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Fabio Zanini. Sure, here are the examples, one using a plain iter object, one using itertools. # to get a subalignment with only rows at indices 1,7 and 8, you could write:
 iterator = iter([1,7,8])
 alignment[iterator]
# you want a subalignment with only the indices from a list index_list that are True after applying a certain filter index_filter, i.e. for which index_filter(index_list[i]) == True:
 from itertools import ifilter
 iterator = ifilter(index_filter, index_list)
 alignment[iterator]
The trivial example from the itertools website on this is the following:
 ifilter(lambda x: x%2, range(10)) --> 1 3 5 7 9
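The list and iterator cases discussed in this thread can be illustrated without touching MultipleSeqAlignment itself. The sketch below uses hypothetical helpers (select_rows, select_columns) on a plain list of equal-length strings; it is not Biopython's API. It also shows why a column iterator is awkward: the same column selection has to be applied to every row, so a one-shot iterator must first be turned into a list.

```python
def select_rows(rows, index):
    """Return the items picked out by an int, a slice, or any iterable of ints."""
    if isinstance(index, int):
        return [rows[index]]
    if isinstance(index, slice):
        return rows[index]
    # Lists, iterators and generators all work here, but an iterator is
    # consumed by this single pass.
    return [rows[i] for i in index]


def select_columns(rows, index):
    """Apply the same column selection to every row string.

    An iterator can only be consumed once, so it is materialised as a list
    before being reused for each row (the memory cost of a "double iterator").
    """
    if not isinstance(index, (int, slice)):
        index = list(index)
    return ["".join(select_rows(row, index)) for row in rows]
```

For example, with alignment = ["ACGTACGT", "ACGTTCGT", "ACCTACGA", "TCGTACGT"], select_rows(alignment, [0, 2]) returns the first and third rows, and select_columns(alignment, (i for i in range(8) if i % 2)) keeps the odd-numbered columns of every row. Calling select_rows twice with the same exhausted iterator returns an empty list the second time.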
---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy: # Check whether the index is an iterator if (hasattr(index, 'next')) and (hasattr(index, '__iter__')): return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet) Would you think this is useful? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From thamelry at binf.ku.dk Mon Feb 13 11:57:41 2012 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Mon, 13 Feb 2012 12:57:41 +0100 Subject: [Biopython-dev] [biopython] Added transform and copy method to Entity in biopython.PDB (#25) In-Reply-To: References: Message-ID: On Sat, Feb 11, 2012 at 3:55 AM, Eric Talevich wrote: > - self.coord=numpy.dot(self.coord, rot)+tran > + self.coord=numpy.dot(rot, self.coord)+tran > > > This will break every script that uses the transform() method if we apply > it. It also breaks the unit test, of course, but I can change the unit test > to match if we accept this patch. > > It seems to me that which way is right is a matter of how the user > specifies the input.
I'm not a thinking man, so I don't entirely trust my > judgment on this one. > Indeed. This is not a bug, the method simply assumes a right-multiplying matrix. Changing this will break many scripts for No Good Reason (TM). Cheers, -Thomas -- Thomas Hamelryck Assoc. Prof., University of Copenhagen, Denmark Visiting Prof., University of Leeds, UK Group leader Structural Bioinformatics Bioinformatics center, Department of Biology University of Copenhagen Ole Maaloes Vej 5 DK-2200 Copenhagen N Denmark From redmine at redmine.open-bio.org Thu Feb 16 10:11:55 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 16 Feb 2012 10:11:55 +0000 Subject: [Biopython-dev] [Biopython - Bug #3327] (New) HMMparse.py has some difficulty in loading hmmsearch result file Message-ID: Issue #3327 has been reported by ruan zheng. ---------------------------------------- Bug #3327: HMMparse.py has some difficulty in loading hmmsearch result file https://redmine.open-bio.org/issues/3327 Author: ruan zheng Status: New Priority: Normal Assignee: Category: Target version: URL: Hi, I just downloaded the HMMparse.py file to process my hmmsearch results, but I found a problem when using it to load my data. When I import HMMparse in a Python session and type HMMparse.HMMparser('hmmsearch_result'), it reports "invalid literal for int() with base 10:". I tried to locate the error and found that lines 60 and 61 of HMMparse.py do not handle a possible value of '[]'. After adding s != '[]' to the argument list, it works fine for me. I attached my sample file. Ruan Zheng ---------------------------------------- You have received this notification because this email was added to the New Issue Alert plugin -- You have received this notification because you have either subscribed to it, or are involved in it.
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Fri Feb 17 22:40:29 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 17 Feb 2012 22:40:29 +0000 Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utilities Update In-Reply-To: References: Message-ID: Hi all, Just FYI, the following was also changed in this week's Entrez update to EFetch 2.0 (see forwarded email below). This was breaking some Biopython scripts - depending on how they passed in the id parameters. It turns out we relied on the undocumented and now withdrawn form in one of our examples, so some users had copied this style. Biopython 1.59 will solve this. I know BioJava is looking at the more publicised changes to retmode - I don't know if BioPerl or BioRuby was affected. Regards, Peter ---------- Forwarded message ---------- From: Date: Fri, Feb 17, 2012 at 7:09 PM Subject: [Utilities-announce] NCBI E-Utilities Update To: NLM/NCBI List utilities-announce The most recent NCBI E-Utilities update includes a more stringent check for correct URL parameters. EFetch URLs with multiple IDs must be entered as: id=1,2,3 EFetch no longer accepts invalid URL parameters, e.g., id=1&id=2&id=3 Please see the online E-Utilities help for additional information: http://www.ncbi.nlm.nih.gov/books/NBK25500/ EFetch online help: http://www.ncbi.nlm.nih.gov/books/NBK25499/#chapter4.EFetch Thank you. _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From p.j.a.cock at googlemail.com Sat Feb 18 09:15:31 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 09:15:31 +0000 Subject: [Biopython-dev] Biopython 1.59 plans Message-ID: Hello all, Based on the typical release schedule, we're a little overdue for releasing Biopython 1.59 - I would have raised this earlier but January was busy for me. 
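For scripts that assemble EFetch URLs by hand rather than going through Bio.Entrez, the form NCBI now requires (a single id parameter holding a comma-separated list, id=1,2,3) can be built with the Python standard library. This is a sketch only: efetch_url is a hypothetical helper, and it just builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Public E-utilities EFetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(db, ids, **params):
    """Hypothetical helper: build an EFetch URL with one comma-separated id parameter.

    NCBI now rejects repeated parameters such as id=1&id=2&id=3, so all IDs
    are joined into a single value. The comma is left unescaped (safe=",")
    to match the form given in the announcement.
    """
    query = {"db": db, "id": ",".join(str(i) for i in ids)}
    query.update(params)
    return EFETCH_URL + "?" + urlencode(query, safe=",")
```

For example, efetch_url("nucleotide", [1, 2, 3], rettype="fasta") produces a URL ending in ?db=nucleotide&id=1,2,3&rettype=fasta, with exactly one id parameter.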
With the recent NCBI EFetch change, and the workaround for it, it would be especially good to get the release out soon. I propose we release Biopython 1.59 in the second half of next week - essentially the master branch as it is. Most of the unit tests are also passing under PyPy (bar the C extensions, and external dependencies like NumPy) with the exception of some XML issues with the standard library. If we mark these as known failures and include them in the buildbot before then, we can announce this release as having (partial) PyPy support. Does anyone else want to do the release? If not, I can. Any comments on this? I'll start a new thread for plans for Biopython 1.60 - there are several exciting chunks of new code that look nearly ready for release. Peter From p.j.a.cock at googlemail.com Sat Feb 18 09:39:06 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 09:39:06 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond Message-ID: Hi all, Assuming we ship Biopython 1.59 soon (Feb 2012), we should start thinking about what is ready to merge to the trunk afterwards to be included in Biopython 1.60, and what else is being worked on beyond that. This might also help with GSoC project ideas. http://lists.open-bio.org/pipermail/biopython-dev/2012-February/009383.html ------------------------------------------- Here are some things that I think are strong candidates for 1.60 (not an exclusive list!) MAF support (the multiple alignment format) for AlignIO, including bespoke indexing (MAF specific). BGZF support: Low level module like Python's gzip, support in SeqIO for indexing BGZF compressed files, and probably also indexing large MAF files too (I've had some positive discussions off list about this). Brad's GFF code. People are using this already, so it probably is ready for inclusion (even if we do need some fine tuning for SeqIO integration). Official releases for Python 3 (even if we do call it a beta status release).
Maybe we can even do this with Biopython 1.59? Most things are working (with the exception of some C code and missing third party dependencies), and my concerns about the memory overhead of unicode strings should be resolved with Python 3.3 (the parsing speed overhead perhaps not). -------------------------------------------- Other work at various stages: Ontologies, GO and OBO - several people are looking at this stuff but is anything "ready" yet? I can't see Chris Lasher's repository on github anymore. http://lists.open-bio.org/pipermail/biopython/2011-December/007682.html VCF format? Variant Call Format - Tiago, what's your impression of work in this area? I know there are other things but I'm struggling to recall them right now. If I've overlooked your work it isn't malice but forgetfulness - please reply with a status update. So, what other cool things are you all working on, and in particular what is ready or near-ready for inclusion with Biopython this year? Thanks, Peter From tiagoantao at gmail.com Sat Feb 18 10:53:37 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 18 Feb 2012 10:53:37 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: Hello, On Sat, Feb 18, 2012 at 9:39 AM, Peter Cock wrote: > So, what other cool things are you all working on, > and in particular what is ready or near-ready for > inclusion with Biopython this year? I changed jobs 3 months ago, and that has meant that I have been in a hell-hole of overwork for the last 3 months. A hell-hole that I have now crawled out of (and with lots of new code written). I am now doing (more standard?) human evolutionary genetics. My previous experience with donating code to Biopython has left me with mixed feelings: people use the applications a lot but very rarely the code directly (a cursory look at the citations of the applications vs the citations of Bio.PopGen clearly shows that).
I have now written a LOT of code in slightly different areas; these might (or might not) interest people: 1. Phasing/imputation: code to parse/convert between Beagle, Shapeit, phase, impute2. Typically to analyse SNP chips of human data (say between 0.5 and 1.5 Million SNPs per individual, 1000s of individuals). 2. plink: code to parse plink output files (quite trivial). People use plink a lot? 3. GO code: I would REALLY like to start a discussion on what should be a proper GO approach. In my case I am doing gene enrichment analysis. I might start a thread or a blog post on this... 4. Code to do multi-tasking. Actually Bio.PopGen has a scheduler to do multiple (external) tasks at the same time, but I have written a new one. Maybe the code does not belong into biopython, but a discussion could be done around such an issue (I suppose people doing analysis of lots of data have been having that problem, not just me?). 5. Some ensembl variation code: things like getting the ancestral SNP (versus the derived) or getting all the stuff (genes mostly) in a certain window position of the genome. On a side, I still have to do a 64 bit Windows port (remember?). This will have to be done from home, as my current work computer is a Pentium 4 (not precisely a modern 64 bit machine ;) ) Another issue that has been crossing my mind regards the inclusion of new code: In my case I would really like to have something like a "beta" version of the API: ie releasing something that is deemed "unstable" API wise (to get comments from the community) and then stabilize it. Concretely: in the first/second version people should expect the API to not be stable and have changes. Another side issue that I would like to discuss (maybe a different thread): Is how people are coping with large amounts of data using Python (or Perl/Ruby for that matter)? Specifically the problem of performance.
As I see it, there is more and more the case of depending on external (fast) programs or CLib extensions or Java extensions to do the bulk of the work. Inner-loops in Python simply do not cut for speed. In the near future (this year) I will probably also be working with sequence data (BAM and VCF stuff might resurface). All for now, T -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From mjldehoon at yahoo.com Sat Feb 18 11:52:38 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 18 Feb 2012 03:52:38 -0800 (PST) Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: Message-ID: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> --- On Sat, 2/18/12, Tiago Antão wrote: > Another side issue that I would like to discuss (maybe a > different > thread): Is how people are coping with large amounts of data > using > Python (or Perl/Ruby for that matter)? Specifically the > problem of > performance. As I see it, there is more and more the case of > depending > on external (fast) programs or CLib extensions or Java > extensions to > do the bulk of the work. Inner-loops in Python simply do not > cut for speed. C extensions to Python such as pysam, together with outer-loops in Python/Biopython have been working very well for me. Perhaps at some point pysam can be included into Biopython, but as samtools is still evolving it makes sense for it to be a separate package so that it can be updated more quickly. I am more concerned about relying on external programs, in particular R. Notwithstanding the usefulness of rpy and rpy2, I would prefer to have a pure-Python or Python-with-C-extension solution, ideally as part of Biopython. -Michiel.
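Tiago raises (point 4 above) the problem of running many external programs at the same time. Michiel's pattern is to keep the outer loop in Python and let compiled code do the heavy work. Both can be illustrated with the standard library alone. This is a minimal sketch, not Bio.PopGen's scheduler or Tiago's new one. It assumes Python 3.7+ and uses `concurrent.futures` threads rather than the `multiprocessing` module that Peter mentions later in the thread. Threads should be enough here because the real work happens in the child processes. The `run_command` and `run_all` helper names are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd):
    """Run one external command (a list of arguments); return (exit code, stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout


def run_all(commands, max_workers=4):
    """Run external commands with at most max_workers running at once.

    Results are returned in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_command, commands))


if __name__ == "__main__":
    # Stand-in jobs: each one is a tiny Python one-liner run as a child process.
    jobs = [[sys.executable, "-c", "print(%d * %d)" % (i, i)] for i in range(5)]
    for code, out in run_all(jobs, max_workers=2):
        print(code, out.strip())
```

Results come back in input order, so each program's output can be parsed afterwards in an ordinary Python loop.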
From mjldehoon at yahoo.com Sat Feb 18 11:58:12 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 18 Feb 2012 03:58:12 -0800 (PST) Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: Message-ID: <1329566292.98292.YahooMailClassic@web161202.mail.bf1.yahoo.com> --- On Sat, 2/18/12, Peter Cock wrote: > So, what other cool things are you all working on, > and in particular what is ready or near-ready for > inclusion with Biopython this year? > I have written some scripts for microarray data analysis, including both file parsing and data normalization. This will need some discussion on biopython-dev to decide the appropriate data structures and functionality, so it won't be ready for 1.59, but perhaps it will be for 1.60. -Michiel. From eric.talevich at gmail.com Sat Feb 18 16:34:18 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:34:18 -0500 Subject: [Biopython-dev] Bio.Phylo bugs & pain points Message-ID: Folks, Since we're coming up on another release of Biopython, I'd like to identify any remaining bugs, pain points, aesthetic flaws, and minor missing features in Bio.Phylo. (And hopefully, fix them before the release.) In particular, the Phylo.draw() function, which plots a rooted phylogram with matplotlib, appeared in the last Biopython release unannounced. There are already many tree-drawing programs that produce beautiful publication-quality graphics, and we're not trying to compete with those. But we do want it to be useful for quickly visualizing a tree as you develop a script or modify a tree interactively in IPython, for example. So -- do the trees drawn by Phylo.draw() look right?
Thanks, Eric From mjldehoon at yahoo.com Sat Feb 18 16:46:33 2012 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 18 Feb 2012 08:46:33 -0800 (PST) Subject: [Biopython-dev] [Biopython] Bio.Phylo bugs & pain points In-Reply-To: Message-ID: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> Hi Eric, > But we do want it to be useful for quickly visualizing a > tree as you develop a script or modify a tree > interactively in IPython, for example. Do you need IPython or is regular Python sufficient? -Michiel. --- On Sat, 2/18/12, Eric Talevich wrote: > From: Eric Talevich > Subject: [Biopython] Bio.Phylo bugs & pain points > To: "BioPython Mailing List" , "BioPython-Dev Mailing List" > Date: Saturday, February 18, 2012, 11:34 AM > Folks, > > Since we're coming up on another release of Biopython, I'd > like to identify > any remaining bugs, pain points, aesthetic flaws, and minor > missing features > in Bio.Phylo. (And hopefully, fix them before the release.) > > In particular, the Phylo.draw() function, which plots a > rooted phylogram > with matplotlib, appeared in the last Biopython release > unannounced. There > are already many tree-drawing programs that produce > beautiful > publication-quality graphics, and we're not trying to > compete with those. > But we do want it to be useful for quickly visualizing a > tree as you > develop a script or modify a tree interactively in IPython, > for example. So > -- do the trees drawn by Phylo.draw() look right? > > Thanks, > Eric > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From eric.talevich at gmail.com Sat Feb 18 16:52:16 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:52:16 -0500 Subject: [Biopython-dev] Have you used Bio.Phylo in a published study?
Message-ID: Folks, Now that Bio.Phylo has reached a somewhat stable point, we're preparing a journal article on it. I'd like to mention and cite some published studies in which Bio.Phylo was used for some part of the analysis. Has anyone here published a study that relied on the Phylo module of Biopython? I know of two so far: http://www.biology-direct.com/content/6/1/34/ http://www.biomedcentral.com/1471-2148/11/321 Thanks, Eric From eric.talevich at gmail.com Sat Feb 18 16:54:03 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 11:54:03 -0500 Subject: [Biopython-dev] [Biopython] Bio.Phylo bugs & pain points In-Reply-To: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> References: <1329583593.82866.YahooMailClassic@web161202.mail.bf1.yahoo.com> Message-ID: On Sat, Feb 18, 2012 at 11:46 AM, Michiel de Hoon wrote: > Hi Eric, > > > But we do want it to be useful for quickly visualizing a > > tree as you develop a script or modify a tree > > interactively in IPython, for example. > > Do you need IPython or is regular Python sufficient? > > -Michiel. > Regular Python plus matplotlib is sufficient. IPython has convenient integration with pylab, that's all. -E > > --- On Sat, 2/18/12, Eric Talevich wrote: > > > From: Eric Talevich > > Subject: [Biopython] Bio.Phylo bugs & pain points > > To: "BioPython Mailing List" , > "BioPython-Dev Mailing List" > > Date: Saturday, February 18, 2012, 11:34 AM > > Folks, > > > > Since we're coming up on another release of Biopython, I'd > > like to identify > > any remaining bugs, pain points, aesthetic flaws, and minor > > missing features > > in Bio.Phylo. (And hopefully, fix them before the release.) > > > > In particular, the Phylo.draw() function, which plots a > > rooted phylogram > > with matplotlib, appeared in the last Biopython release > > unannounced.
There > > are already many tree-drawing programs that produce > > beautiful > > publication-quality graphics, and we're not trying to > > compete with those. > > But we do want it to be useful for quickly visualizing a > > tree as you > > develop a script or modify a tree interactively in IPython, > > for example. So > > -- do the trees drawn by Phylo.draw() look right? > > > > Thanks, > > Eric > > _______________________________________________ > > Biopython mailing list - Biopython at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biopython > > > From eric.talevich at gmail.com Sat Feb 18 17:11:27 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 12:11:27 -0500 Subject: [Biopython-dev] Bio.Phylo bugs & pain points In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 11:34 AM, Eric Talevich wrote: > So -- do the trees drawn by Phylo.draw() look right? > > Here's how to get a quick tree, using a test file from the Biopython source distribution:
>>> from Bio import Phylo
>>> tree = Phylo.read("Tests/PhyloXML/apaf.xml", "phyloxml")
>>> Phylo.draw(tree)
If you don't have the Tests/ directory, you can use any other Newick, Nexus or PhyloXML tree; just change the file name and format name in the call to Phylo.read(). Thanks, Eric From p.j.a.cock at googlemail.com Sat Feb 18 18:14:34 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 18:14:34 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> References: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> Message-ID: On Sat, Feb 18, 2012 at 11:52 AM, Michiel de Hoon wrote: > --- On Sat, 2/18/12, Tiago Antão wrote: >> Another side issue that I would like to discuss (maybe a >> different thread): Is how people are coping with large >> amounts of data using Python (or Perl/Ruby for that >> matter)? Specifically the problem of performance.
As >> I see it, there is more and more the case of depending >> on external (fast) programs or CLib extensions or Java >> extensions to do the bulk of the work. Inner-loops in >> Python simply do not cut for speed. > > C extensions to Python such as pysam, together with > outer-loops in Python/Biopython have been working > very well for me. Perhaps at some point pysam can > be included into Biopython, but as samtools is still > evolving it makes sense for it to be a separate package > so that it can be updated more quickly. I've got some partial SAM/BAM code in pure Python, partly as a learning exercise for the format itself and issues around that. > I am more concerned about relying on external programs, > in particular R. Notwithstanding the usefulness of rpy and > rpy2, I would prefer to have a pure-Python or > Python-with-C-extension solution, ideally as part of > Biopython. Python with C extensions (e.g. via CPython?) certainly have their role to play - and should be much faster than calling separate binaries and parsing their output as the payback. However, pure Python is also getting a lot more interesting with PyPy getting better and better. Peter From p.j.a.cock at googlemail.com Sat Feb 18 18:22:07 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 18:22:07 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Tiago Antão : > 4. Code to do multi-tasking. Actually Bio.PopGen has a scheduler to do > multiple (external) tasks at the same time, but I have written a new > one. Maybe the code does not belong into biopython, but a discussion > could be done around such an issue (I suppose people doing analysis of > lots of data have been having that problem, not just me?). I hear good things about Python's multiprocessing. I don't really see that this is a Biopython issue per se, but a much more general one for high performance computing in Python.
We should probably focus on providing some examples of how to do this with the best standard/external libraries. Right now I'm working on some simple job splitting/merging code in Python for Galaxy jobs - we've got an in-house server now hooked up to our cluster, which will be the primary way for non-programmers in the institute to run large jobs (e.g. pre-prepared pipelines for protein annotation). > 5. Some ensembl variation code: things like getting the ancestral > SNP (versus the derived) or getting all the stuff (genes mostly) in > a certain window position of the genome. Sounds good - I've not had much cause to look at Ensembl through work, but there would be interest in using this from Python. > On a side, I still have to do a 64 bit Windows port (remember?). Oh yes - this issue is still in debate for NumPy/SciPy where they have some quite complex build interdependencies to sort out. Waiting for an official 64bit Windows NumPy was one of our stumbling blocks. > Another issue that has been crossing my mind regards the inclusion of > new code: In my case I would really like to have something like a > "beta" version of the API: ie releasing something that is deemed > "unstable" API wise (to get comments from the community) and then > stabilize it. Concretely: in the first/second version people should > expect the API to not be stable and have changes. We've sort of done that already in that we've said in the release notes that whole modules are new and experimental (beta), and subject to change. Are you thinking of more than that - e.g. an
Peter From tiagoantao at gmail.com Sat Feb 18 18:57:32 2012 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Sat, 18 Feb 2012 18:57:32 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Peter Cock : > We've sort of done that already in that we've said in the release > notes that whole modules are new and experimental (beta), and > subject to change. Are you thinking of more than that - e.g. an > import time warning? Did not know that (skipped a few things in the last few months, I am afraid). That is what I had in mind. -- "If you want to get laid, go to college.? If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Sat Feb 18 19:11:11 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 19:11:11 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Tiago Ant?o : > 2012/2/18 Peter Cock : >> We've sort of done that already in that we've said in the release >> notes that whole modules are new and experimental (beta), and >> subject to change. Are you thinking of more than that - e.g. an >> import time warning? > > Did not know that (skipped a few things in the last few months, I am > afraid). That is what I had in mind. Actually, looking back, the NEWS file doesn't really say this. Maybe I am thinking of some of the release announcements? Anyway - I am OK with a clear warning up front in the NEWS and accompanying release announcement email/post that a new module Bio.XXX is considered to be in beta-testing and that its API is subject to change. But I would hope this would mean the module was still complete enough and stable enough to be useful - and of course not going to impact anything else in Biopython. Do you think an import time warning on top of this would be prudent? Power users can silence it with the warnings library. 
That seems like a good way to increase visibility of somewhat experimental code, while not giving false impressions of its stability. Peter From eric.talevich at gmail.com Sat Feb 18 19:20:41 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 14:20:41 -0500 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 4:39 AM, Peter Cock wrote: > Hi all, > > Assuming we ship Biopython 1.59 soon (Feb 2012), > we should start thinking about what is ready to merge to > the trunk afterwards to be included in Biopython 1.60, > and what else is being worked on beyond that. This > might also help with GSoC project ideas. > http://lists.open-bio.org/pipermail/biopython-dev/2012-February/009383.html > > My thoughts: 1. Lingering GSoC code: (a) João has some interesting features on his own fork that are not yet complete. I believe he's using this code for his own work and ironing out the details, and will let us know when it's ready to merge. (b) Mikael Trellet's Interface work is good, but not yet merged. Some dependencies on João's work, maybe? (c) Michele Silva wrote two modules under Bio.PDB that use mocapy++ to do stuff. It's based on Thomas Hamelryck's published work, with some modifications to improve Biopython integration. I think it could be merged cleanly at any time, under either Bio.PDB or the proposed Bio.Struct. (d) I squashed Nick Matzke's Bio.Geography branch into a single commit and put it on another branch on my public fork. It needs work, and some day I'll probably take care of that. 2. Bio.Phylo improvements: (a) ETE offered us a public-domain NeXML parser; "all" I have to do is copy it into our codebase and convert/wrap parts of it to fit Bio.Phylo's Tree/Clade object model. Or something along those lines. (b) I have a RAxML wrapper that's mostly complete. It will go in Bio.Phylo.Applications.
(c) I'd like to add functions for majority-rules consensus and Robinson-Foulds tree distance. Then I'll consider it pretty much feature-complete, pending feature requests from users. 3. SeqIO read-only support for PDB files (https://redmine.open-bio.org/issues/3295). I've been using this code on my own. It fails to parse at least one PDB file I care about (3BEG); I haven't tried it on a larger set of PDB files. In any case this shouldn't be too hard to fix, and I'd like to see it in a stable Biopython release. > Official releases for Python 3 (even if we do call it > a beta status release). Maybe we can even do this > with Biopython 1.59? Most things are working (with > the exception of some C code and missing third > party dependencies), and my concerns about the > memory overhead of unicode strings should be > resolved with Python 3.3 (the parsing speed > overhead perhaps not). > > I sense that Python 3 users are aware there can be minor performance regressions relative to Python 2, and this applies to libraries too. (I could be mistaken.) So I'd be in favor of making Biopython 1.59 an official release for Python 3, but marked "beta" in the release notes since not all parts of Biopython are fully functional in Python 3. -Eric From tiagoantao at gmail.com Sat Feb 18 19:24:00 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 18 Feb 2012 19:24:00 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> References: <1329565958.63072.YahooMailClassic@web161201.mail.bf1.yahoo.com> Message-ID: On Sat, Feb 18, 2012 at 11:52 AM, Michiel de Hoon wrote: > I am more concerned about relying on external programs, in particular R. Notwithstanding the usefulness of rpy and rpy2, I would prefer to have a pure-Python or Python-with-C-extension solution, ideally as part of Biopython.
The problem with the Python-with-C extensions is the Jython universe, which has a few users (I am pretty sure I am not the only one ;) ). In terms of discussion I would separate R (which can be linked via rpy2) from other external programs (executables called explicitly). I do not have a coherent view myself, only that: 1. Python is not computationally efficient for lots of stuff. 2. There are apps that we need to talk with. 3. Talking with external programs/apps/libraries will probably cause the solution not to be 100% portable. And, 1. external executables (not R) are the easiest dependencies to maintain as long as they exist in Win+Mac+Lin (or Java platform independent). 2. C (or Java) libs are not portable (but sometimes unavoidable?). Note that external executables are portable: In a Jython implementation you can call an external program but not a CLib (and vice-versa: In a CLib implementation you can also call an external program but not a Java lib). 3. R dependencies are extremely burdensome: they require management of R plus extra R packages. Outside of Biopython, I am on the verge of needing to write some efficient algorithms for my personal work, which I want to call from Python. I am currently leaning to do it in Java (or C as a second option) as a stand-alone executable (and then execute the external program from Python and parse the results). Not using the library (either Jython or CLib) approach. And yes, Java kicks Python in terms of speed. Big time. -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From tiagoantao at gmail.com Sat Feb 18 19:29:43 2012 From: tiagoantao at gmail.com (Tiago Antão) Date: Sat, 18 Feb 2012 19:29:43 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: 2012/2/18 Peter Cock : > Do you think an import time warning on top of this would be > prudent? Power users can silence it with the warnings library.
> That seems like a good way to increase visibility of somewhat > experimental code, while not giving false impressions of its > stability. Oh surely. I never added complex code because I was afraid of breaking things in a second version. I think this is unavoidable: things never get really good at the first attempt - especially complex things. Therefore it would be good if we could warn the users on initial versions. Warn as in-their-face warnings. -- "If you want to get laid, go to college. If you want an education, go to the library." - Frank Zappa From p.j.a.cock at googlemail.com Sat Feb 18 19:54:06 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 19:54:06 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 7:20 PM, Eric Talevich wrote: > > 3. SeqIO read-only support for PDB files > (https://redmine.open-bio.org/issues/3295). I've been using this code on my > own. It fails to parse at least one PDB file I care about (3BEG); I haven't > tried it on a larger set of PDB files. In any case this shouldn't be too > hard to fix, and I'd like to see it in a stable Biopython release. If right now it has known failures, I don't want to squeeze this into Biopython 1.59 next week. Does your code manage to produce the same FASTA sequence as the PDB themselves offer for download? That would be my expectation as an end user. It should be easy enough to test if you've already done a full local PDB download. I'm still uneasy about this making SeqIO depend on NumPy (even as a soft dependency at runtime), given the fact that the rest of SeqIO should work fine under Jython and PyPy. Support for the NumPy API under PyPy is coming along, but isn't likely for Jython for now (although PyPy's efforts may help there).
Peter From eric.talevich at gmail.com Sat Feb 18 22:17:24 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 18 Feb 2012 17:17:24 -0500 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 2:54 PM, Peter Cock wrote: > On Sat, Feb 18, 2012 at 7:20 PM, Eric Talevich > wrote: > > > > 3. SeqIO read-only support for PDB files > > (https://redmine.open-bio.org/issues/3295). I've been using this code > on my > > own. It fails to parse at least one PDB file I care about (3BEG); I > haven't > > tried it on a larger set of PDB files. In any case this shouldn't be too > > hard to fix, and I'd like to see it in a stable Biopython release. > > > If right now it has known failures, I don't want to squeeze this into > Biopython 1.59 next week. > Agreed! But 1.60 sounds like a good goal. > Does your code manage to produce the same FASTA sequence as > the PDB themselves offer for download? That would be my expectation > as an end user. It should be easy enough to test if you've already > done a full local PDB download. > If there are disordered regions (very common), the missing residues are replaced with 'X' characters. These residues can be listed in the SEQRES lines of the PDB header, if it's available, but they're not included with the atomic coordinates, so PdbIO can't reliably fill in these disordered residues for all PDB files. This matches the behavior of the tool I was using before (which is non-free and not widely used). I don't keep a local copy of PDB normally, but I'll download it and do the test before asking to merge PdbIO. > I'm still uneasy about this making SeqIO depend on NumPy (even as > a soft dependency at runtime), given the fact that the rest of SeqIO > should work fine under Jython and PyPy. Support for the NumPy > API under PyPy is coming along, but isn't likely for Jython for now > (although PyPy's efforts may help there).
> > As an alternative, I could copy the portion of PDBParser and StructureBuilder that are needed to read the amino acid sequence, but skip creating Atoms. That would avoid the need for Numpy, at the cost of some code duplication. Interested in that approach? If so, I can take a closer look and report back on the feasibility. -Eric From p.j.a.cock at googlemail.com Sat Feb 18 22:24:02 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 18 Feb 2012 22:24:02 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 10:17 PM, Eric Talevich wrote: >> If right now it has known failures, I don't want to squeeze this into >> Biopython 1.59 next week. > > Agreed! But 1.60 sounds like a good goal. OK. >> Does your code manage to produce the same FASTA sequence as >> the PDB themselves offer for download? That would be my expectation >> as an end user. It should be easy enough to test if you've already >> done a full local PDB download. > > If there are disordered regions (very common), the missing residues are > replaced with 'X' characters. These residues can be listed in the SEQRES > lines of the PDB header, if it's available, but they're not included with > the atomic coordinates, so PdbIO can't reliably fill in these disordered > residues for all PDB files. This matches the behavior of the tool I was > using before (which is non-free and not widely used). > > I don't keep a local copy of PDB normally, but I'll download it and do the > test before asking to merge PdbIO. Great. >> I'm still uneasy about this making SeqIO depend on NumPy (even as >> a soft dependency at runtime), given the fact that the rest of SeqIO >> should work fine under Jython and PyPy. Support for the NumPy >> API under PyPy is coming along, but isn't likely for Jython for now >> (although PyPy's efforts may help there).
> > As an alternative, I could copy the portion of PDBParser and > StructureBuilder that are needed to read the amino acid sequence, but skip > creating Atoms. That would avoid the need for Numpy, at the cost of some > code duplication. Interested in that approach? If so, I can take a closer > look and report back on the feasibility. Rather than literally copying it, do you think it is realistic to make some of Bio.PDB work without NumPy? e.g. fall back on tuples of floats (x,y,z) for atom co-ordinates. Just brainstorming - this might be a horrible idea? Peter From kellrott at gmail.com Sun Feb 19 01:19:09 2012 From: kellrott at gmail.com (Kyle) Date: Sat, 18 Feb 2012 17:19:09 -0800 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: > > Ontologies, GO and OBO - several people are looking > at this stuff but is anything "ready" yet? I can't see > Chris Lasher's repository on github anymore. > http://lists.open-bio.org/pipermail/biopython/2011-December/007682.html > I merged ntamas branch (which I think comes from Chris Lasher's branch), into https://github.com/kellrott/biopython/tree/gosupport Originally his code used NetworkX to provide graph support. I added in a class to provide that functionality (probably slower) should it throw an import error. Kyle From anaryin at gmail.com Mon Feb 20 14:30:23 2012 From: anaryin at gmail.com (João Rodrigues) Date: Mon, 20 Feb 2012 15:30:23 +0100 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: Hi all, Answering what "concerns" me :) > > > If there are disordered regions (very common), the missing residues are > > replaced with 'X' characters. These residues can be listed in the SEQRES > > lines of the PDB header, if it's available, but they're not included with > > the atomic coordinates, so PdbIO can't reliably fill in these disordered > > residues for all PDB files.
This matches the behavior of the tool I was > > using before (which is non-free and not widely used). > The SEQRES contains the sequence used in the construct expressed and crystallized so it's never incomplete. What I've done in the past in these situations is iterate over the SEQRES and fill as '-' those residues that do not have coordinates. I don't know if I have any decent version of my MODELLER PIR format SeqIO stuff on github, but maybe we could work together to make it consistent (since what I wanted was PDB to seq essentially)? Or maybe these are two different points of view for the same problem and need different solutions... https://github.com/JoaoRodrigues/biopython/tree/modeller-pirIO > Rather than literally copying it, do you think it is realistic to make > some of Bio.PDB work without NumPy? e.g. fall back on tuples > of floats (x,y,z) for atom co-ordinates. Just brainstorming - this > might be a horrible idea? > I kind of disagree because otherwise we'd have to convert them to numpy arrays every time we need them. Regarding my own work, I've been slowly working on cleaning a bit Bio.PDB (for example, all those get_X methods that just return class attributes) and organising my own GSoC code into it and in Bio.Struct. I don't know when I have this even "alpha"-testable, it's been a long road and I had a couple of computer crashes that made me lose my data so... When would there be a soft deadline for 1.60? Best, João From eric.talevich at gmail.com Mon Feb 20 16:20:33 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 20 Feb 2012 11:20:33 -0500 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: Hey João, On Mon, Feb 20, 2012 at 9:30 AM, João Rodrigues wrote: > Hi all, > > Answering what "concerns" me :) > > > >> > If there are disordered regions (very common), the missing residues are >> > replaced with 'X' characters.
These residues can be listed in the SEQRES >> > lines of the PDB header, if it's available, but they're not included >> with >> > the atomic coordinates, so PdbIO can't reliably fill in these disordered >> > residues for all PDB files. This matches the behavior of the tool I was >> > using before (which is non-free and not widely used). >> > > The SEQRES contains the sequence used in the construct expressed and > crystallized so it's never incomplete. What I've done in the past in these > situations is iterate over the SEQRES and fill as '-' those residues that > do not have coordinates. > OK, we should implement that then. Perhaps we can avoid both the conditional numpy/PDB import and code duplication if we let parse_pdb_header call SeqIO.PdbIO for SEQRES lines. What about PDB files that don't have SEQRES lines? Should we...
- Fall back to ATOM parsing automatically
- Allow a flag for fallback (use_atoms_if_absolutely_must=False)
- Require the user to specify whether to use SEQRES or ATOMs (use_seqres=True)
- Use different format names, e.g. "pdb-seqres" and "pdb"/"pdb-atom"?
Keeping in mind that secondary structure is also best represented as a SeqRecord, we could use "pdb-ss" or similar as another format eventually. > I don't know if I have any decent version of my MODELLER PIR format SeqIO > stuff on github, but maybe we could work together to make it consistent > (since what I wanted was PDB to seq essentially) ? Or maybe these are two > different points of view for the same problem and need different > solutions... > > https://github.com/JoaoRodrigues/biopython/tree/modeller-pirIO > Let's try to decouple these. I remember the original use case -- our goal would be to create Modeller-ready files with code like:
 target = SeqIO.read("foo.fa", "fasta")
 template = SeqIO.read("bar.pdb", "pdb")
 aln = ...  # Pairwise alignment
 AlignIO.write(aln, "foobar.pir", "pir")
How much more information would we need to extract from the PDB file (that isn't normally in a SeqRecord) to satisfy Modeller? > Rather than literally copying it, do you think it is realistic to make >> some of Bio.PDB work without NumPy? e.g. fall back on tuples >> of floats (x,y,z) for atom co-ordinates. Just brainstorming - this >> might be a horrible idea? >> > > I kind of disagree because otherwise we'd have to convert them to numpy > arrays every time we need them. > For atomic coordinates, I don't think there's a pressing need to make numpy optional, but perhaps we could refactor parse_pdb_header to work without loading numpy. That would give us access to SEQRES lines, secondary structure, PDB ID, deposition date, etc. if they're specified in the header. > Regarding my own work, I've been slowly working on cleaning a bit Bio.PDB > (for example, all those get_X methods that just return class attributes) > and organising my own GSoC code into it and in Bio.Struct. I don't know > when I have this even "alpha"-testable, it's been a long road and I had a > couple of computer crashes that made me lose my data so.. When would there > be a soft deadline for 1.60? > > Cool, no worries about the timeline. I think it's generally best if major new feature sets are merged shortly after a stable release, so bleeding-edge users (like us) have time to use the new code in a variety of situations and find bugs and design issues. However, if you have a stub of Bio/Struct/__init__.py that you feel is ready to merge right after this week's release, I think we could start there and add new features under that namespace in the coming months.
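João's approach (walk the SEQRES residues and write '-' wherever no coordinates were observed) and the idea of reading SEQRES without importing NumPy could look roughly like the plain-Python sketch below. It assumes the fixed-column PDB layout for SEQRES records. `parse_seqres`, `gapped_sequence` and the `observed` set of 0-based positions are hypothetical names, not existing Bio.PDB or SeqIO functions. Matching ATOM residue numbers to SEQRES positions, which is the hard part when there are offsets or insertion codes, is deliberately left out.

```python
# Hypothetical helpers, not part of Bio.PDB or Bio.SeqIO: SEQRES records
# are read with plain string slicing, so NumPy is never imported.

# Minimal three-letter to one-letter table; anything else becomes 'X'.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def parse_seqres(lines):
    """Return {chain_id: [resname, ...]} built from PDB SEQRES records.

    Relies on the fixed PDB columns: chain ID in column 12, residue names
    in columns 20-70 (up to 13 three-letter names per record).
    """
    chains = {}
    for line in lines:
        if not line.startswith("SEQRES"):
            continue  # ATOM, HETATM, HEADER, ... are ignored here
        chain_id = line[11]
        chains.setdefault(chain_id, []).extend(line[19:70].split())
    return chains


def gapped_sequence(resnames, observed):
    """One-letter sequence with '-' at SEQRES positions lacking coordinates.

    `observed` is a set of 0-based SEQRES positions that have ATOM records.
    """
    return "".join(
        THREE_TO_ONE.get(name, "X") if i in observed else "-"
        for i, name in enumerate(resnames)
    )


records = [
    "SEQRES   1 A    5  MET GLY SER HIS LYS",
    "SEQRES   1 B    2  ALA UNK",
    "ATOM      1  N   GLY A   2",
]
chains = parse_seqres(records)
print(gapped_sequence(chains["A"], {1, 2, 3}))  # prints -GSH-
```

Because nothing here touches coordinates, a header-only reader along these lines could, in principle, back a SEQRES-based format without the conditional NumPy import discussed above.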
Cheers, Eric From b.invergo at gmail.com Mon Feb 20 17:17:19 2012 From: b.invergo at gmail.com (Brandon Invergo) Date: Mon, 20 Feb 2012 18:17:19 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: <1329758239.970.13.camel@localhost.localdomain> Hi Peter, After getting bitten by some subtle output differences between different versions of the PAML programs, I've been writing much stricter unit tests for the parsing routines. I have the big one, codeml, done, so I just have to do the output for the two smaller programs, baseml and yn00. So far, I've only had to change one line in a Bio.Phylo.PAML file to accommodate a parsing mistake for the oldest supported codeml version. It's related to a purely informational line in the output rather than to any generated results. Combined, the insignificance of the changed line and the extremely old software version that produces the difference mean that this change in the code is not mission critical. The testing code and the directory of PAML test resources are, however, significantly different. I can probably have the other two parts done by, say, Wednesday. The question, though, is whether this is worth trying to pull in for Biopython 1.59 or should I hold off on the pull request until after the release? Cheers, Brandon On Sat, 2012-02-18 at 09:15 +0000, Peter Cock wrote: > Hello all, > > Based on the typical release schedule, we're a little overdue > for releasing Biopython 1.59 - I would have raised this earlier > but January was busy for me. With the recent NCBI EFetch > change, and the workaround for it, it would be especially > good to get the release out soon. > > I propose we release Biopython 1.59 in the second half of > next week - essentially the master branch as it is. > > Most of the unit tests are also passing under PyPy (bar > the C extensions, and external dependencies like NumPy) > with the exception of some XML issues with the standard > library.
If we mark these as known failures and include > them in the buildbot before then, we can announce this > release as having (partial) PyPy support. > > Does anyone else want to do the release? If not, I can. > > Any comments on this? I'll start a new thread for plans > for Biopython 1.60 - there are several exciting chunks > of new code that looks near ready for release. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Mon Feb 20 18:03:44 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 20 Feb 2012 18:03:44 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: <1329758239.970.13.camel@localhost.localdomain> References: <1329758239.970.13.camel@localhost.localdomain> Message-ID: On Monday, February 20, 2012, Brandon Invergo wrote: > Hi Peter, > > After getting bitten by some subtle output differences between different > versions of the PAML programs, I've been writing much stricter unit > tests for the parsing routines. I have the big one, codeml, done, so I > just have to do the output for the two smaller programs, baseml and > yn00. > > So far, I've only had to change one line in a Bio.Phylo.PAML file to > accommodate a parsing mistake for the oldest supported codeml version. > It's related to a purely informational line in the output rather than to > any generated results. Combined, the insignificance of the changed line > and the extremely old software version that produces the difference mean > that this change in the code is not mission critical. > > The testing code and the directory of PAML test resources is, however, > significantly different. > > I can probably have the other two parts done by, say, Wednesday. The > question, though, is whether this is worth trying to pull in for > Biopython 1.59 or should I hold of on the pull request until after the > release? 
> > I'm tied up in a workshop Monday (today) and Tuesday (tomorrow), so Wednesday was the earliest date anyway. If you have these extended unit tests done by Wednesday that would be great. Thanks, Peter From redmine at redmine.open-bio.org Mon Feb 20 22:58:38 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Mon, 20 Feb 2012 22:58:38 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Peter Cock. It strikes me that while Python sequences don't support this at all, numpy arrays do allow indexing with a list - but surprisingly perhaps not an iterator. I imagine the problem with iterators is when you have more than one dimension (here we have slicing in one or two dimensions), and the fact you'd be forced to cache the iterator values in a list. On balance, I would recommend doing this instead: new_align = MultipleSeqAlignment(old_align[i] for i in row_iter) Please bring this up on the mailing list if you want - we might spark some discussion and brainstorming. ---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects.
An extension that includes iterators looks easy:
 # Check whether the index is an iterator
 if hasattr(index, 'next') and hasattr(index, '__iter__'):
     return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet)
Would you think this is useful? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Mon Feb 20 23:06:51 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 20 Feb 2012 23:06:51 +0000 Subject: [Biopython-dev] Optional libraries in README file (etc) Message-ID: Hi all, Do we need to have a new section in the README file for other soft dependencies like NetworkX? This file is now especially important as it gets shown on github for the project: https://github.com/biopython/biopython/blob/master/README Similarly, does the long installation manual need some work? https://github.com/biopython/biopython/blob/master/Doc/install/Installation.tex http://biopython.org/DIST/docs/install/Installation.html http://biopython.org/DIST/docs/install/Installation.pdf Peter From eric.talevich at gmail.com Tue Feb 21 14:53:20 2012 From: eric.talevich at gmail.com (Eric Talevich) Date: Tue, 21 Feb 2012 09:53:20 -0500 Subject: [Biopython-dev] Optional libraries in README file (etc) In-Reply-To: References: Message-ID: On Mon, Feb 20, 2012 at 6:06 PM, Peter Cock wrote: > Hi all, > > Do we need to have a new section in the README file for other soft > dependencies like NetworkX? This file is now especially important > as it gets shown on github for the project: > > https://github.com/biopython/biopython/blob/master/README >
A couple soft dependencies not listed in the README are networkx (which in turn depends on pygraphviz|pydot and graphviz in Debian) and matplotlib|pylab, both for Bio.Phylo._utils. > Similarly, does the long installation manual need some work? > > https://github.com/biopython/biopython/blob/master/Doc/install/Installation.tex > http://biopython.org/DIST/docs/install/Installation.html > http://biopython.org/DIST/docs/install/Installation.pdf > > I guess mxTextTools can probably go, now. Does MMCIFlex still work? The easy_install and pip approaches for installing Biopython and its dependencies should probably be prominent. Maybe an extra layer of nesting in the outline, like: 1. Installing Python 2. Installing pre-built packages (the easy way) - PyPI & pip / easy_install - On Linux: Biopython and NumPy are packaged for Ubuntu/Debian and presumably other Linux distros 3. Installing from source - Dependencies - Biopython - Installing with non-admin permissions (Unix/Mac) 4. Testing 5. Third-party tools -E From redmine at redmine.open-bio.org Tue Feb 21 21:36:30 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 21 Feb 2012 21:36:30 +0000 Subject: [Biopython-dev] [Biopython - Feature #3326] MultipleSeqAlignment should support iterators, not only slice objects References: Message-ID: Issue #3326 has been updated by Fabio Zanini. Right, neither Python nor Numpy support iterators, for different reasons - AFAIK. # Python lists actually do support it, kind of; that is the idea behind *list comprehensions*:
 new_list = [rec for rec in iterator]
does exactly this! # Numpy probably avoids it for problems when extending to many dimensions, as you mentioned. Multiple Sequence Alignments, however, are intrinsically two dimensional, and have no easy list comprehension. Your compromise is what I am proposing as well. This needs two steps: # we check that the index object supports _for_ cycles, i.e. has an __iter__ method (see http://docs.python.org/library/stdtypes.html#iterator-types):
 if hasattr(index, '__iter__'):
# we generate the new MSA by a for cycle:
 return MultipleSeqAlignment((self._records[i] for i in index), self._alphabet)
Note that double slicing is not really an issue, since in that case *we are already using that method*! In fact, we now have:
 #Handle double indexing
 [...]
 else:
     #e.g. sub_align = align[1:4, 5:7], gives another alignment
     return MultipleSeqAlignment((rec[col_index] for rec in self._records[row_index]), self._alphabet)
We would only need to modify this easily to:
 if hasattr(row_index, '__iter__'):
     return MultipleSeqAlignment((self._records[i][col_index] for i in row_index), self._alphabet)
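Fabio's proposal, combined with the numpy-style index lists Peter suggests further down the thread, might look roughly like the minimal sketch below. `ToyAlignment`, `_pick` and `bootstrap_columns` are hypothetical names for illustration only. They are not Biopython's MultipleSeqAlignment API, and to keep the dispatch short the toy always returns a new alignment, even for a single integer index. The sketch tries to show two things: the `(rows, cols)` tuple must be recognised before any generic `__iter__` check, because a tuple is itself iterable, and a column iterator must be cached in a list because every row reuses it.

```python
import random


class ToyAlignment:
    """Hypothetical stand-in for an alignment, NOT Biopython's
    MultipleSeqAlignment: just a list of equal-length (id, sequence) rows.
    """

    def __init__(self, rows):
        self.rows = [(rid, str(seq)) for rid, seq in rows]

    def get_alignment_length(self):
        return len(self.rows[0][1]) if self.rows else 0

    @staticmethod
    def _pick(items, index):
        """Select from a list by int, slice, or any iterable of ints."""
        if isinstance(index, int):
            return [items[index]]
        if isinstance(index, slice):
            return items[index]
        return [items[i] for i in index]

    def __getitem__(self, index):
        # A tuple must be tested first: align[rows, cols] passes a tuple,
        # which also has __iter__ and would otherwise be read as row numbers.
        if isinstance(index, tuple):
            row_index, col_index = index
        else:
            row_index, col_index = index, slice(None)
        if not isinstance(col_index, (int, slice)):
            # The column selection is reused for every row, so an iterator
            # has to be cached in a list or the first row would use it up.
            col_index = list(col_index)
        return ToyAlignment(
            (rid, "".join(self._pick(list(seq), col_index)))
            for rid, seq in self._pick(self.rows, row_index)
        )


def bootstrap_columns(aln, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    n = aln.get_alignment_length()
    return aln[:, [rng.randrange(n) for _ in range(n)]]


aln = ToyAlignment([("a", "ACGT"), ("b", "AGGT"), ("c", "TCGA")])
print(aln[[0, 2]].rows)                # [('a', 'ACGT'), ('c', 'TCGA')]
print(aln[iter([2, 0]), [0, 3]].rows)  # [('c', 'TA'), ('a', 'AT')]
replicate = bootstrap_columns(aln, random.Random(42))
```

Column resampling of this kind corresponds to the bootstrap use case mentioned in the thread. With a real alignment object, the alphabet and per-record annotations would also need to be carried over, and this toy ignores them.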
Finally, I would gladly post to the mailing list. You mean the Biopython-Dev Mailing List, right? ---------------------------------------- Feature #3326: MultipleSeqAlignment should support iterators, not only slice objects https://redmine.open-bio.org/issues/3326 Author: Fabio Zanini Status: New Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: URL: Currently, the MultipleSeqAlignment object supports slicing via various syntaxes, e.g.: - alignment[4,6] - alignment[2:4,3:6] - alignment[3:4:5] In the latter case, the indices build a so-called slice, a pure Python object, and MultipleSeqAlignment has an explicit if clause for dealing with this case. However, the user might want to iterate over the MSA using the more general *iterators*, e.g. from itertools, rather than simple slice objects. An extension that includes iterators looks easy:
 # Check whether the index is an iterator
 if hasattr(index, 'next') and hasattr(index, '__iter__'):
     return MultipleSeqAlignment([self._records[i] for i in index], self._alphabet)
Would you think this is useful? -- You have received this notification because you have either subscribed to it, or are involved in it. To change your notification preferences, please click here and login: http://redmine.open-bio.org From fabio.zanini at fastmail.fm Tue Feb 21 21:44:40 2012 From: fabio.zanini at fastmail.fm (Fabio Zanini) Date: Tue, 21 Feb 2012 22:44:40 +0100 Subject: [Biopython-dev] Should MultipleSequenceAlignment support iterator slicing? Message-ID: <20120221214440.GB2430@X200.local> Hi all! I am using the MultipleSequenceAlignment class a lot these days, and would find it useful to get subalignments using python iterators. I started a discussion on the issue tracker: https://redmine.open-bio.org/issues/3326 Short version: I would like to do things like alignment[[4,5,8]] to get a subalignment with the 5th, 6th, and 9th rows.
This syntax is not working at present, but can be implemented, for single as well as double indices, in a very simple way. For instance, for the single index case,
 if hasattr(index, '__iter__'):
     return MultipleSeqAlignment((self._records[i] for i in index), self._alphabet)
Questions? Doubts? Cheers, Fabio From p.j.a.cock at googlemail.com Tue Feb 21 22:50:39 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 21 Feb 2012 22:50:39 +0000 Subject: [Biopython-dev] Should MultipleSequenceAlignment support iterator slicing? In-Reply-To: <20120221214440.GB2430@X200.local> References: <20120221214440.GB2430@X200.local> Message-ID: On Tue, Feb 21, 2012 at 9:44 PM, Fabio Zanini wrote: > Hi all! > > I am using the MultipleSequenceAlignment class a lot these days, and > would find it useful to get subalignments using python iterators. I > started a discussion on the issue tracker: > > https://redmine.open-bio.org/issues/3326 > > Short version: I would like to do things like > > alignment[[4,5,8]] > > to get a subalignment with the 5th, 6th, and 9th rows. This syntax is > not working at present, but can be implemented, for single as well as > double indices, in a very simple way. For instance, for the single index > case, > > if hasattr(index, '__iter__'): >     return MultipleSeqAlignment((self._records[i] for i in index), >                                 self._alphabet) > > Questions? Doubts? > > Cheers, > Fabio As I said on the bug, there are parallels with numpy arrays allowing indexing with lists (but not iterators). The problem with iterator indices for numpy arrays is you may have many axes - but an iterator can only be looped over once. This effectively means the iterator would have to be expanded into a list inside the __getitem__ code. This isn't so critical with multiple sequence alignments where we have just two dimensions.
Supporting numpy-like list indexing should cover most use cases, including things like producing a resampled alignment for phylogenetic tree bootstrapping where random columns are selected. Does that sound useful enough to add (for rows and cols)? i.e. support row/col index lists - but not iterators? Peter From p.j.a.cock at googlemail.com Thu Feb 23 13:43:02 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 Feb 2012 13:43:02 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: <1329758239.970.13.camel@localhost.localdomain> Message-ID: On Mon, Feb 20, 2012 at 6:03 PM, Peter Cock wrote: > > > On Monday, February 20, 2012, Brandon Invergo wrote: >> I can probably have the other two parts done by, say, Wednesday. The >> question, though, is whether this is worth trying to pull in for >> Biopython 1.59 or should I hold of on the pull request until after the >> release? > > I'm tied up in a workshop Monday (today) and Tuesday (tomorrow), > so Wednesday was the earliest date anyway. If you have these > extended unit tests done by Wednesday that would be great. That's checked in now, and the buildslaves are all green (good). Thanks Brandon. Any last minute requests for Biopython 1.59? I can hold off till tomorrow... or next week if we need to. Peter From p.j.a.cock at googlemail.com Thu Feb 23 15:26:31 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 23 Feb 2012 15:26:31 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Sat, Feb 18, 2012 at 9:15 AM, Peter Cock wrote: > Most of the unit tests are also passing under PyPy (bar > the C extensions, and external dependencies like NumPy) > with the exception of some XML issues with the standard > library. If we mark these as known failures and include > them in the buildbot before then, we can announce this > release as having (partial) PyPy support.
I forgot ongoing headaches from handle 'leaks' too - for instance test_SeqIO_index.py fails apparently due to the delay in GC leading to a delay in the closing of handles: http://doc.pypy.org/en/latest/cpython_differences.html#differences-related-to-garbage-collection-strategies Peter From redmine at redmine.open-bio.org Thu Feb 23 17:17:59 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Thu, 23 Feb 2012 17:17:59 +0000 Subject: [Biopython-dev] [Biopython - Bug #2597] Enforce alphabet letters in Seq objects References: Message-ID: Issue #2597 has been updated by Eric Talevich. It would also be useful to be able to validate alphabets when constructing Seqs or SeqRecords from scratch. Here's a proposal that I believe fits with most of what's been agreed to so far. In Bio/Alphabet/__init__.py, replace _verify_alphabet with an efficiently implemented method on the Alphabet class and perhaps make it public:
def validate(self, sequence):
    """Raise a ValueError if sequence contains letters not allowed by alphabet.

    If alphabet does not define letters, it's all OK.
    ...
    """
    if self.letters:  # generic alphabets leave letters as None
        ok_letters = set(self.letters)
        bad_letters = set(str(sequence)) - ok_letters
        if bad_letters:
            raise ValueError("Alphabet does not accept these letters: "
                             + ''.join(bad_letters))
In the Seq class, optionally add a method 'check_alphabet' which wraps Alphabet.validate:
def check_alphabet(self):
    self.alphabet.validate(self.data)
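The proposed `validate` method and the opt-in `check_alphabet=False` flag for SeqIO.parse can be tried out without Biopython. In this sketch, `ToyAlphabet` and `read_sequences` are hypothetical stand-ins, not Bio.Alphabet or Bio.SeqIO code. `letters=None` plays the role of a generic alphabet that accepts anything, and the offending letters are sorted so that the error message is deterministic.

```python
class ToyAlphabet:
    """Hypothetical stand-in for Bio.Alphabet.Alphabet; letters may be None."""

    def __init__(self, letters=None):
        self.letters = letters

    def validate(self, sequence):
        """Raise ValueError if sequence uses letters outside self.letters."""
        if not self.letters:
            return  # generic alphabet: nothing to check against
        bad_letters = set(str(sequence)) - set(self.letters)
        if bad_letters:
            raise ValueError("Alphabet does not accept these letters: "
                             + "".join(sorted(bad_letters)))


def read_sequences(lines, alphabet, check_alphabet=False):
    """Hypothetical parser: validation runs only if the caller opts in."""
    for line in lines:
        seq = line.strip()
        if check_alphabet:
            alphabet.validate(seq)  # any ValueError propagates to the caller
        yield seq


dna = ToyAlphabet("GATC")
print(list(read_sequences(["ACGT\n", "acgn\n"], dna)))  # no check: both pass
try:
    list(read_sequences(["ACGT\n", "ACGN\n"], dna, check_alphabet=True))
except ValueError as err:
    print(err)  # Alphabet does not accept these letters: N
```

When the flag is left at its default, the only added cost is one boolean test per record, which fits the opt-in trade-off described in the proposal.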
In SeqIO.parse and SeqIO.read, add an option check_alphabet=False, which calls either Alphabet.validate(seq) or seq.check_alphabet(). If validation fails, the exception is propagated up. I don't know how much this would affect performance, but it seems that users are willing to accept a small performance hit if they explicitly opt into validation. The extra 'if' statement may or may not be noticeable in the default case. ---------------------------------------- Bug #2597: Enforce alphabet letters in Seq objects https://redmine.open-bio.org/issues/2597 Author: Peter Cock Status: In Progress Priority: Normal Assignee: Biopython Dev Mailing List Category: Main Distribution Target version: Not Applicable URL: If a Seq object is created with an alphabet with a pre-defined set of letters (e.g. the IUPAC alphabets) then I think Biopython should validate that the sequence does indeed only use those letters. This will catch mis-use of ambiguous sequences with non-ambiguous alphabets, letters in an unexpected case, and most importantly any unexpected symbols (e.g. from a parsing problem). This will impose a performance overhead - which can be avoided if the user instead chooses to use a generic dna/rna/protein alphabet which does not list the letters expected. Note that we will have to resolve Bug 2532 before doing this, as currently some parts of Biopython are mis-using the upper case only IUPAC alphabet objects with mixed case sequences. -- You have received this notification because you have either subscribed to it, or are involved in it. 
To change your notification preferences, please click here and login: http://redmine.open-bio.org From p.j.a.cock at googlemail.com Fri Feb 24 00:24:01 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 00:24:01 +0000 Subject: [Biopython-dev] Entrez documentation and DTD checking Message-ID: Hi all, Prompted by this query I did check the Tutorial and docstrings and updated most (hopefully all) of the efetch examples to include the now required retmode="text" argument. I also found a few missing DTD files while writing some doctests for Bio.Entrez (not sure right now how to integrate these into our run_tests.py framework while making them conditional on the --offline switch which we use on the buildbot). I'd be grateful if anyone has time to check for any other examples that need updating, or further NCBI DTD files we should include. This page claims nothing has changed since 2009 which is wrong: http://www.ncbi.nlm.nih.gov/corehtml/query/DTD/index.shtml This page claims nothing has changed since 2002 which is worse: http://www.ncbi.nlm.nih.gov/data_specs/dtd/other/entrez/ There are a few recent DTD files listed here though they may not apply to Entrez: http://www.ncbi.nlm.nih.gov/dtd/ or alternatively http://www.ncbi.nlm.nih.gov/data_specs/dtd/ Peter ---------- Forwarded message ---------- From: Peter Cock Date: 2012/2/23 Subject: Re: [Biopython] Entrez and SeqIO "no records found in handle" To: "??(Feng GAO)" Cc: "biopython at lists.open-bio.org" 2012/2/23 ??(Feng GAO) : > Hi all, > We have some python code using gi number to get record from Genbank. > Part of the code is: > > handle = Entrez.efetch(db="protein", id=ID, rettype="gb") > record = SeqIO.read(handle,"genbank") > > We have had no problem with this code > until this week when we started getting "ValueError: No records found in handle". > Anyone have an idea how to fix it now? Thanks! > Feng Try using an explicit retmode="text" in the efetch call.
The NCBI changed the defaults with EFetch 2.0, which went live earlier this month. You're probably getting XML back instead. Note to self: I wonder if the Biopython tutorial examples need to be updated as well... Peter From p.j.a.cock at googlemail.com Fri Feb 24 13:16:10 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 13:16:10 +0000 Subject: [Biopython-dev] Biopython 1.60 plans and beyond In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 12:02 PM, James Casbon wrote: > Hi Peter et al, > > Bit late on this, but... > > On 18 February 2012 09:39, Peter Cock wrote: >> VCF format? Variant Call Format - Tiago what's >> you're impression of work in this area? Ahem, you're -> your. > If you think the original license is compatible I'd be happy to fold > PyVCF into biopython, if it fits. Excellent. > Aaron Quinlan is evaluating VCF parsers at the moment (not sure if > he's on this list), so he can probably give you some good feedback. Sounds good. > Some unknowns are: > 1. using cython/binding to a c library for speed (my preference is > probably pure python for pypy compatibility) We've not had any cython dependency so far, but it may be desirable in the future rather than writing lots of boilerplate code for calling C libraries. However, I'd hope for a pure Python fall back for Jython and PyPy etc. I presume Windows would be OK? > 2. where is BCF going, is that going to be important for a VCF lib? Not sure. > 3. There is an optional dependency on pysam, how does that fit with > biopython? (other replies in this thread indicate this is already the > case?) If it was a run time dependency on pysam that is workable. I'm unclear if pysam supports Windows, while Python 3 is still pending. Again, my preference is for a pure Python SAM/BAM library (and that is doable), possibly as a fallback for compiled code. > I would also like to know: > 1. is there existing variant code in biopython that a parser needs > integration with?
Might this tie in with any of the population genetics code? > 2. are there other (perhaps more promising) formats that we would like > parse into the same kind of representation, e.g. > http://genomebiology.com/2010/11/8/R88/abstract I don't know. Peter From p.j.a.cock at googlemail.com Fri Feb 24 13:30:52 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 13:30:52 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Hello all, No git commits to the master please until further notice - I'm going to do the release now. Peter From p.j.a.cock at googlemail.com Fri Feb 24 14:35:28 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 14:35:28 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 1:30 PM, Peter Cock wrote: > Hello all, > > No git commits to the master please until further notice - I'm going > to do the release now. > > Peter OK, git tagged, and release files and installers done, uploaded here as usual: http://biopython.org/DIST/ If anyone would like to grab and check those that would be great. I haven't pushed this to PyPI yet either. Updated API files and Tutorial now live: http://biopython.org/DIST/docs/api/ http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf I will now prepare a draft announcement for the news blog and emailing... Peter From anaryin at gmail.com Fri Feb 24 14:56:51 2012 From: anaryin at gmail.com (João Rodrigues) Date: Fri, 24 Feb 2012 15:56:51 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Hi Peter, In both Linux and Mac all tests ran fine. A remark, when running the tests I get plenty of these warnings: test_FSSP ...
/Volumes/Home/users/joaor/Software/biopython-1.59/build/lib.macosx-10.6-intel-2.7/Bio/Align/Generic.py:54: BiopythonDeprecationWarning: With the introduction of the MultipleSeqAlignment class in Bio.Align, this base class is deprecated and is likely to be removed in a future release of Biopython. warnings.warn("With the introduction of the MultipleSeqAlignment class in Bio.Align, this base class is deprecated and is likely to be removed in a future release of Biopython.", Bio.BiopythonDeprecationWarning) Is there an option not to have the message repeated? It's just cosmetics but I thought of asking anyways.. Good work, João From p.j.a.cock at googlemail.com Fri Feb 24 15:30:37 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 15:30:37 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 2:35 PM, Peter Cock wrote: > > I will now prepare a draft announcement for the news blog and emailing... > How does this look? Biopython 1.59 released Source distributions and Windows installers for *Biopython 1.59* are now available from the downloads page on the Biopython website and from the Python Package Index (PyPI). Platforms/Deployment We currently support Python 2.5, 2.6 and 2.7 and also test under Jython 2.5 (which does not cover NumPy). Please note that this release will not work on Python 2.4. Most functionality is also working under Python 3.1 and 3.2 (including modules using NumPy), and under PyPy (excluding our NumPy dependencies). We are now encouraging early adopters to help with beta testing on these platforms. The installation setup.py now supports "install_requires" when setuptools is installed. This avoids the manual dialog when installing Biopython via easy_install or pip and numpy is not installed. It also allows user libraries that require Biopython to include it in their install_requires and get automatic installation of dependencies.
Features New module Bio.TogoWS offers a wrapper for the TogoWS REST API, a web service based in Japan offering access to KEGG, DDBJ, PDBj, CBRC plus access to some NCBI and EBI resources including PubMed, GenBank and UniProt. This is much easier to use than the NCBI Entrez API, but should be especially useful for Biopython users based in Asia. The NCBI Entrez Fetch function Bio.Entrez.efetch has been updated to handle the NCBI's stricter handling of multiple ID arguments in EFetch 2.0 (released February 2012), however the NCBI have also changed the retmode default argument so you may need to make this explicit, e.g. retmode="text". The position objects used in Bio.SeqFeature now act almost like integers, making dealing with fuzzy locations in EMBL/GenBank files much easier. Also the SeqFeature's strand and any database reference are now properties of the FeatureLocation object (a more logical placement), with proxy methods for backwards compatibility. Bio.Graphics.BasicChromosome has been extended to allow simple sub-features to be drawn on chromosome segments, suitable to show the position of genes, SNPs or other loci. Bio.Graphics.GenomeDiagram has been extended to allow cross-links between tracks, and track specific start/end positions for showing regions. This can be used to imitate the output from the Artemis Comparison Tool (ACT). Also, a new attribute circle_core makes it easier to have an empty space in the middle of a circular diagram (see tutorial). Note: Bio.Graphics requires the ReportLab library. Bio.Align.Applications now includes a wrapper for command line tool Clustal Omega for protein multiple sequence alignment. Bio.AlignIO now supports sequential PHYLIP files (as well as interlaced PHYLIP files) as a separate format variant. Additionally there have been other minor bug fixes and more unit tests, and updates to the documentation including the Biopython Tutorial (PDF).
Contributors Many thanks to the Biopython developers and community for making this release possible, especially the following contributors:
- Andreas Wilm (first contribution)
- Alessio Papini (first contribution)
- Brad Chapman
- Brandon Invergo
- Connor McCoy
- Eric Talevich
- Joao Rodrigues
- Konrad Förstner (first contribution)
- Michiel de Hoon
- Matej Repič (first contribution)
- Leighton Pritchard
- Peter Cock
From anaryin at gmail.com Fri Feb 24 15:36:01 2012 From: anaryin at gmail.com (João Rodrigues) Date: Fri, 24 Feb 2012 16:36:01 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Maybe a small reference to the TogoWS issue? Otherwise people might get worried.. And another cosmetic change, would you mind adding a tilde to my name? Copy-paste it from my sig. I'm usually not picky but since there are weird characters in there already ;) João [...] Rodrigues http://nmr.chem.uu.nl/~joao From p.j.a.cock at googlemail.com Fri Feb 24 15:39:54 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 15:39:54 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 3:36 PM, João Rodrigues wrote: > And another cosmetic change, would you mind adding a tilde to my name? > Copy-paste it from my sig. I'm usually not picky but since there are weird > characters in there already ;) Sure, I'll do that for the release notes. You can update the NEWS file if you like for future mentions. Peter From p.j.a.cock at googlemail.com Fri Feb 24 15:45:08 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 15:45:08 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 2:56 PM, João Rodrigues wrote: > Hi Peter, > > In both Linux and Mac all tests ran fine. > Thank you - I would have been surprised if not though. It would suggest a gap in the buildbot coverage.
> A remark, when running the tests I get plenty of these warnings: > > test_FSSP ... > /Volumes/Home/users/joaor/Software/biopython-1.59/build/lib.macosx-10.6-intel-2.7/Bio/Align/Generic.py:54: > BiopythonDeprecationWarning: With the introduction of the > MultipleSeqAlignment class in Bio.Align, this base class is deprecated and > is likely to be removed in a future release of Biopython. > warnings.warn("With the introduction of the MultipleSeqAlignment class in > Bio.Align, this base class is deprecated and is likely to be removed in a > future release of Biopython.", Bio.BiopythonDeprecationWarning) > > Is there an option not to have the message repeated? It's just cosmetics but > I thought of asking anyways.. > > Good work, > > João I don't think we can alter the repeat when the warning is shown, but we can silence it for this test. We should do that (but really we should update the FSSP code), likewise the warning for the BioSQL feature 'order' thing. We've got similar code in other tests - the trouble is that the warnings module is global and there are subtle interactions between test scripts. I think we need to add some cleanup in run_tests.py to restore the filters as a fall back. Do you want to try this (after the release)? Peter From anaryin at gmail.com Fri Feb 24 15:46:36 2012 From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=) Date: Fri, 24 Feb 2012 16:46:36 +0100 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: Sure, we can look at it afterwards then, it was just a cosmetic issue, it's annoying to see so many repeated lines. João [...] Rodrigues http://nmr.chem.uu.nl/~joao On 24 February 2012 16:45, Peter Cock wrote: > On Fri, Feb 24, 2012 at 2:56 PM, João Rodrigues wrote: > > Hi Peter, > > > > In both Linux and Mac all tests ran fine. > > > > Thank you - I would have been surprised if not though. It would > suggest a gap in the buildbot coverage.
> > > A remark, when running the tests I get plenty of these warnings: > > > > test_FSSP ... > > > /Volumes/Home/users/joaor/Software/biopython-1.59/build/lib.macosx-10.6-intel-2.7/Bio/Align/Generic.py:54: > > BiopythonDeprecationWarning: With the introduction of the > > MultipleSeqAlignment class in Bio.Align, this base class is deprecated > and > > is likely to be removed in a future release of Biopython. > > warnings.warn("With the introduction of the MultipleSeqAlignment class > in > > Bio.Align, this base class is deprecated and is likely to be removed in a > > future release of Biopython.", Bio.BiopythonDeprecationWarning) > > > > Is there an option not to have the message repeated? It's just cosmetics > but > > I thought of asking anyways.. > > > > Good work, > > > > João > > I don't think we can alter the repeat when the warning is shown, > but we can silence it for this test. We should do that (but really > we should update the FSSP code), likewise the warning for the > BioSQL feature 'order' thing. > > We've got similar code in other tests - the trouble is that the > warnings module is global and there are subtle interactions > between test scripts. I think we need to add some cleanup in > run_tests.py to restore the filters as a fall back. > > Do you want to try this (after the release)? > > Peter > From p.j.a.cock at googlemail.com Fri Feb 24 16:44:54 2012 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 24 Feb 2012 16:44:54 +0000 Subject: [Biopython-dev] Biopython 1.59 plans In-Reply-To: References: Message-ID: On Fri, Feb 24, 2012 at 3:46 PM, João Rodrigues wrote: > Sure, we can look at it afterwards then, it was just a cosmetic issue, it's > annoying to see so many repeated lines. Great - thanks. No one spotted the fact that today is still 24 Feb, not 25 Feb? Oh well... the NEWS file in the release itself will just be a day out. http://news.open-bio.org/news/2012/02/biopython-1-59-released/ Normal git usage can resume...
back to the Biopython 1.60 thread. Peter From fabio.zanini at fastmail.fm Sun Feb 26 11:29:08 2012 From: fabio.zanini at fastmail.fm (Fabio Zanini) Date: Sun, 26 Feb 2012 12:29:08 +0100 Subject: [Biopython-dev] Should MultipleSequenceAlignment support iterator slicing? In-Reply-To: References: <20120221214440.GB2430@X200.local> Message-ID: <20120226112908.GA20547@X200.local> On Tue, Feb 21, 2012 at 10:50:39PM +0000, Peter Cock wrote: > On Tue, Feb 21, 2012 at 9:44 PM, Fabio Zanini wrote: > > Hi all! > > > > I am using the MultipleSequenceAlignment class a lot these days, and > > would find it useful to get subalignments using python iterators. I > > started a discussion on the issue tracker: > > > > https://redmine.open-bio.org/issues/3326 > > > > Short version: I would like to do things like > > > > alignment[[4,5,8]] > > > > Does that sound useful enough to add (for rows and cols)? > i.e. support row/col index lists - but not iterators? > Support for lists in both dimensions is already quite a big improvement and we should definitely implement this. This should cover most use cases. Since iterators can be iterated over only once, a memory-efficient solution for double (row and column) iterators that does not convert either of them into a list is not that easy. Let's introduce the list support for now! Cheers, Fabio From redmine at redmine.open-bio.org Tue Feb 28 04:56:45 2012 From: redmine at redmine.open-bio.org (redmine at redmine.open-bio.org) Date: Tue, 28 Feb 2012 04:56:45 +0000 Subject: [Biopython-dev] [Biopython - Feature #3329] (New) MMCIF parser should take an open file handle Message-ID: Issue #3329 has been reported by Mark Diekhans. ---------------------------------------- Feature #3329: MMCIF parser should take an open file handle https://redmine.open-bio.org/issues/3329 Author: Mark Diekhans Status: New Priority: Normal Assignee: Category: Target version: URL: MMCIF parser should take an open file as well as a file path.
We were unable to use this parser because we need to read compressed files. Reading from a file handle is the most flexible API. thanks!!!
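The request for handle-based input can be illustrated with the standard library. If a parser accepts an open text handle, the caller decides where the bytes come from: a plain file, a gzip-compressed file, or an in-memory string. The following is a minimal, hypothetical sketch. `open_maybe_gzipped` and `read_entry_id` are illustrative names and are not Biopython functions. `read_entry_id` is a toy that only looks for a single `_entry.id` line and does not attempt real mmCIF parsing.

```python
import gzip


def open_maybe_gzipped(path):
    """Return a text-mode handle, transparently decompressing *.gz files."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def read_entry_id(handle):
    """Hypothetical handle-based reader: return the value of _entry.id.

    Toy example only; real mmCIF parsing (loops, quoted values,
    multi-line strings) is far more involved.
    """
    for line in handle:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "_entry.id":
            return fields[1]
    return None
```

Because `read_entry_id` only iterates over lines, it works the same way with `io.StringIO`, a plain file, or a `gzip.open(..., "rt")` handle. The path-based convenience is kept in a separate helper instead of being built into the reader.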