From bugzilla-daemon at portal.open-bio.org Tue Dec 1 07:28:33 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 1 Dec 2009 07:28:33 -0500
Subject: [Biopython-dev] [Bug 2957] GenBank Writer Should Write Out Date
In-Reply-To:
Message-ID: <200912011228.nB1CSXec001831@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2957


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-01 07:28 EST -------
A slightly more robust version of this has been checked in. Future work
could handle date/time objects. Please reopen this bug if there are any
problems.

Thanks,

Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Tue Dec 1 14:34:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 1 Dec 2009 19:34:19 +0000
Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utility Policy Change
In-Reply-To: <320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com>
References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov>
 <320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com>
Message-ID: <320fb6e00912011134u2481644aw5dfdfe9f9a3049f0@mail.gmail.com>

Hi all,

Attention NCBI Entrez users - the NCBI really do want you to include
your email address, and it will be mandatory in future! See below...

If using Bio.Entrez, the tool parameter will by default be set to
Biopython, but the email is omitted.
We already encourage the email to be included in our documentation but
given the new NCBI guidance I'd suggest we make omitting the email
issue a warning in the next release (and an error in the subsequent
release of Biopython?).

Peter

---------- Forwarded message ----------
From: ?
Date: Tue, Dec 1, 2009 at 6:59 PM
Subject: [Utilities-announce] NCBI E-Utility Policy Change
To: utilities-announce at ncbi.nlm.nih.gov


As part of an ongoing effort to ensure efficient access to the Entrez
Utilities (E-utilities) by all users, NCBI has decided to change the
usage policy for the E-utilities effective June 1, 2010.

Effective on June 1, 2010, all E-utility requests, either using
standard URLs or SOAP, must contain non-null values for both the &tool
and &email parameters. Any E-utility request made after June 1, 2010
that does not contain values for both parameters will return an error
explaining that these parameters must be included in E-utility
requests.

The value of the &tool parameter should be a URI-safe string that is
the name of the software package, script or web page producing the
E-utility request. The value of the &email parameter should be a valid
e-mail address for the appropriate contact person or group responsible
for maintaining the tool producing the E-utility request.

NCBI uses these parameters to contact users whose use of the
E-utilities violates the standard usage policies described at
http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements.
These usage policies are designed to prevent excessive requests from a
small group of users from reducing or eliminating the wider
community's access to the E-utilities. NCBI will attempt to contact a
user at the e-mail address provided in the &email parameter prior to
blocking access to the E-utilities.

NCBI realizes that this policy change will require many of our users
to change their code.
Based on past experience, we anticipate that most of our users should
be able to make the necessary changes before the June 1, 2010
deadline. If you have any concerns about making these changes by that
date, or if you have any questions about these policies, please
contact eutilities at ncbi.nlm.nih.gov.

Thank you for your understanding and cooperation in helping us
continue to deliver a reliable and efficient web service.

_______________________________________________
Utilities-announce mailing list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce

From chapmanb at 50mail.com Wed Dec 2 07:57:44 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 2 Dec 2009 07:57:44 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com>
 <20090406220826.GH43636@sobchak.mgh.harvard.edu>
 <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
Message-ID: <20091202125744.GA46415@sobchak.mgh.harvard.edu>

Hi Peter;

> Brad has some GFF parsing code he has been working on, which
> would be nice to merge into Biopython at some point. See:
>
> http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html
>
> As we started to discuss earlier this year, we need to think about
> what to do with the existing (old) Bio.GFF module. This was written
> by Michael Hoffman back in 2002 which accesses MySQL General
> Feature Format (GFF) databases created with BioPerl.
>
> I've been looking at the old Bio.GFF code, and there are a lot of
> redundant things like its own GenBank/EMBL location parsing,
> plus its own location objects and its own Feature objects (rather
> than reusing Bio.SeqFeature which should have sufficed).

I'm ambivalent on deprecating GFF. Agreed that some of it is not
well integrated with the rest of Biopython, with the
Location/LocationFromString code being the most duplicated. It's too
bad features were reimplemented as well. Is Michael around at all?

> I want to suggest we deprecate Michael Hoffman's Bio.GFF module
> in Biopython 1.53 (I'm hoping we can do this next month, Dec 2009).
> Depending on how soon Brad's code is ready to be merged (which I
> am assuming could be Biopython 1.54, spring 2010), we can perhaps
> accelerate removal of the old module.

The current structure of the GFF code does not require removing what
is currently there. It needs a couple of lines in __init__.py to
expose the useful classes at the top level:

    from GFFParser import GFFParser, DiscoGFFParser, GFFExaminer
    from GFFOutput import GFF3Writer

and we'd need to move the MySQLdb check to the Connection class so
it's only needed if you are actually using the database code. So
these can happen in parallel.

Ideally, I'd like to get the GFF stuff in sooner rather than later.
The main item on my todo list is finishing the documentation, with
the stubs here:

http://biopython.org/wiki/GFF_Parsing

If I crank that out what do we think about putting it in with the
__init__.py modifications I suggested?

Brad

From mjldehoon at yahoo.com Wed Dec 2 09:29:27 2009
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 2 Dec 2009 06:29:27 -0800 (PST)
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
Message-ID: <317375.58712.qm@web62401.mail.re1.yahoo.com>

--- On Wed, 12/2/09, Brad Chapman wrote:
> If I crank that out what do we think about putting it in
> with the __init__.py modifications I suggested?

I'd definitely welcome a GFF parser in Biopython, but I think the
current code needs to be simplified and its usage more consistent with
other Biopython modules. It's great that the documentation is
available.
It's a big help in designing the module, in particular what its usage
looks like to the user.

Let's start from basic GFF parsing. This is the example in the
documentation:

>>> from BCBio.GFF import GFFParser
>>> in_file = "your_file.gff"
>>> parser = GFFParser()
>>> in_handle = open(in_file)
>>> for rec in parser.parse(in_handle):
...     print rec
>>> in_handle.close()

What is the purpose of creating the parser first, and then calling
parser.parse on the in_handle? I'd much rather have

>>> from BCBio import GFF
>>> in_file = "your_file.gff"
>>> in_handle = open(in_file)
>>> for rec in GFF.parse(in_handle):
...     print rec
>>> in_handle.close()

which is how most other Biopython parsers work.

--Michiel.

From chapmanb at 50mail.com Thu Dec 3 09:25:34 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 3 Dec 2009 09:25:34 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <317375.58712.qm@web62401.mail.re1.yahoo.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
 <317375.58712.qm@web62401.mail.re1.yahoo.com>
Message-ID: <20091203142534.GF51407@sobchak.mgh.harvard.edu>

Hi Michiel;

> > If I crank that out what do we think about putting it in
> > with the __init__.py modifications I suggested?
>
> I'd definitely welcome a GFF parser in Biopython, but I think the
> current code needs to be simplified and its usage more consistent
> with other Biopython modules. It's great that the documentation is
> available. It's a big help in designing the module, in particular what
> its usage looks like to the user.

Awesome. I welcome these suggestions; it's really helpful to have
fresh eyes looking at it. Hopefully moving it into Biopython will
stimulate that.

> Let's start from basic GFF parsing. This is the example in the documentation:
[...]
> What is the purpose of creating the parser first, and then calling
> parser.parse on the in_handle?
> I'd much rather have
>
> >>> from BCBio import GFF
> >>> in_file = "your_file.gff"
> >>> in_handle = open(in_file)
> >>> for rec in GFF.parse(in_handle):
> ...     print rec
> >>> in_handle.close()

Great -- done for parsing and writing and committed to GitHub. The
documentation is updated as well.

Happy to get other comments and thoughts. Thanks again,
Brad

From biopython at maubp.freeserve.co.uk Thu Dec 3 09:53:44 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 14:53:44 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091203142534.GF51407@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
 <317375.58712.qm@web62401.mail.re1.yahoo.com>
 <20091203142534.GF51407@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>

On Thu, Dec 3, 2009 at 2:25 PM, Brad Chapman wrote:
>
> Great -- done for parsing and writing and committed to GitHub. The
> documentation is updated as well.
>
> Happy to get other comments and thoughts. Thanks again,
>

I understand that GFF files are complex, and a simple "record
iterator" isn't flexible enough to cover all use cases - hence the
need for a complex parser class. That said, Michiel is right that
GFF.parse() or GFF.read() functions would be consistent with
other bits of Biopython, and would provide for the simple use
cases.

Looking at your code, BCBio.GFF.parse(...) would return
SeqRecord objects (with SeqFeatures). That seems
redundant to me as one expects people to just use
Bio.SeqIO.parse(handle, "gff3") instead. I would instead
have expected BCBio.GFF.parse(...) to iterate over the
features in a GFF file.

Also, and we'd touched on this before - I'd much prefer to
have the GFF module quite "low level" using either new
GFF-specific classes or simple Python objects (e.g. for
each feature use a tuple of ints and strings for the first
feature columns plus a dict for the final extendible
column of annotation).
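
The "tuple plus dict" idea in the paragraph above can be sketched
roughly as follows. This is only an illustration, assuming
tab-separated GFF3 input with nine columns and percent-encoded values;
the names parse_gff3_line and iter_gff3_features are hypothetical and
are not part of BCBio.GFF or Biopython. It is written for Python 3,
unlike the Python 2 examples quoted in the thread.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into (fixed_columns, attributes).

    Hypothetical helper for illustration only, not an existing API.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = cols
    # First eight columns become a plain tuple of strings, ints and floats;
    # "." means "no value" and is mapped to None.
    fixed = (
        unquote(seqid),
        unquote(source),
        unquote(ftype),
        int(start),
        int(end),
        None if score == "." else float(score),
        None if strand == "." else strand,
        None if phase == "." else int(phase),
    )
    # Ninth column: semicolon-separated key=value pairs, where a value
    # may itself be a comma-separated list.
    attributes = {}
    if attr_text and attr_text != ".":
        for pair in attr_text.split(";"):
            pair = pair.strip()
            if not pair:
                continue
            key, _, value = pair.partition("=")
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return fixed, attributes


def iter_gff3_features(handle):
    """Yield (fixed_columns, attributes) for each feature line in handle."""
    for line in handle:
        if line.startswith("##FASTA"):
            break  # embedded sequence data follows; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        yield parse_gff3_line(line)
```

Used on an open file handle, iter_gff3_features(handle) gives one
lightweight pair per feature line, which a separate layer could later
turn into SeqFeature objects.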

From a technical point of view, a justification for this
separation is the GFF details are not a perfect fit to the
SeqRecord and SeqFeature objects and forcing their
use adds unnecessary overheads for people wanting
to work directly with the features themselves.

Also, by splitting the code into basic parsing and a
SeqRecord/SeqFeature conversion layer (which I
would put in Bio/SeqIO/GffIO.py) we can add the
code in two steps (first GFF parsing, then SeqIO
support).

I think this split is useful as this is a very big job to do
properly: Once we have GFF to SeqRecord parsing,
we need to try and ensure that it is compatible with the
GenBank to SeqRecord parsing. This is important as
we would in effect be extending Biopython to allow
GFF3 to GenBank conversions. For testing all this,
we can grab the same data in the two file formats
(e.g. from the NCBI) and perhaps also use EMBOSS.

You may recall we talked to Peter Rice (from EMBOSS)
about this - there are some important issues here like
ontology mapping where we should be able to reuse
a lot of the work EMBOSS has already done (and use
the EMBOSS tools to help validate our mapping).

i.e. While I may be being overly cautious, I think that
while adding GFF parsing and GFF to SeqRecord
mapping is very important, it is also very complex.
Therefore breaking this into a two stage task makes
managing and testing it easier - as well as seeming
a good idea for the code itself.

Peter

From bugzilla-daemon at portal.open-bio.org Thu Dec 3 10:03:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Dec 2009 10:03:29 -0500
Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL
In-Reply-To:
Message-ID: <200912031503.nB3F3Tu8013049@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2866


------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-03 10:03 EST -------
Brad,

Now that Chris at BioPerl is interested, I am confident we can get the
SQLite schema into BioSQL in the near future:
http://lists.open-bio.org/pipermail/biosql-l/2009-November/001655.html

Do you want to update your patch (if needed) and put this up on a
Biopython branch in github? How soon do you think it could be ready to
merge? It would be nice to have this in the next release (even if we
put a big "beta" warning in)?

Peter

From biopython at maubp.freeserve.co.uk Thu Dec 3 10:30:54 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 3 Dec 2009 15:30:54 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com>
 <20090406220826.GH43636@sobchak.mgh.harvard.edu>
 <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com>
 <20091202125744.GA46415@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912030730rb66c2dav1993465ba25f9f5f@mail.gmail.com>

On Wed, Dec 2, 2009 at 12:57 PM, Brad Chapman wrote:
> Hi Peter;
>
>> Brad has some GFF parsing code he has been working on, which
>> would be nice to merge into Biopython at some point.
>> See:
>>
>> http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html
>>
>> As we started to discuss earlier this year, we need to think about
>> what to do with the existing (old) Bio.GFF module. This was written
>> by Michael Hoffman back in 2002 which accesses MySQL General
>> Feature Format (GFF) databases created with BioPerl.
>>
>> I've been looking at the old Bio.GFF code, and there are a lot of
>> redundant things like its own GenBank/EMBL location parsing,
>> plus its own location objects and its own Feature objects (rather
>> than reusing Bio.SeqFeature which should have sufficed).
>
> I'm ambivalent on deprecating GFF. Agreed that some of it is not
> well integrated with the rest of Biopython, with the
> Location/LocationFromString code being the most duplicated. It's too
> bad features were reimplemented as well. Is Michael around at all?

I got in touch with Michael Hoffman - he has moved from the EBI
to the University of Washington but his EBI email address still
worked. He said: "Please feel free to deprecate the module or
make any necessary changes for the project."

Even if you (Brad) didn't have a new GFF parser waiting to be
added to Biopython, I would still want to do something with
Bio.GFF to reduce the redundancy of location and feature code.
Deprecation is the simplest solution (but I may be able to
reuse some of his location string parsing code on Bug 2738).

Peter

From bugzilla-daemon at portal.open-bio.org Thu Dec 3 10:32:31 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 3 Dec 2009 10:32:31 -0500
Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing
In-Reply-To:
Message-ID: <200912031532.nB3FWV7G013739@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2738


------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-03 10:32 EST -------
Note - we may be able to reuse some of the location string parsing
ideas in Bio/GFF/easy.py here too...

Peter

From bugzilla-daemon at portal.open-bio.org Fri Dec 4 07:31:51 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 07:31:51 -0500
Subject: [Biopython-dev] [Bug 2961] New: Adding undocumented file format switches to MUSCLE wrapper
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2961

           Summary: Adding undocumented file format switches to MUSCLE
                    wrapper
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


As discussed on the mailing list, and confirmed with MUSCLE author
Robert Edgar, there are a number of useful command line arguments for
things like PHYLIP output (both interlaced and sequential) which the
Bio.Align.Applications wrapper does not support. See:

http://lists.open-bio.org/pipermail/biopython/2009-December/005881.html

We should add these.
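
To make the request above concrete, here is a minimal sketch of what
supporting the PHYLIP switches amounts to at the command line level.
It only builds an argument list and does not use or reproduce the
Bio.Align.Applications wrapper; the function name muscle_args and its
parameters are hypothetical, and the -phyi / -phys flag names are
assumed from this discussion, so check them against your MUSCLE
version.

```python
def muscle_args(in_file, out_file, phylip=None, exe="muscle"):
    """Build a MUSCLE argument list with optional PHYLIP output.

    Hypothetical helper for illustration; flag names are assumptions
    taken from the discussion, not verified against a given release.
    """
    args = [exe, "-in", in_file, "-out", out_file]
    if phylip == "interleaved":
        args.append("-phyi")  # interleaved ("interlaced") PHYLIP
    elif phylip == "sequential":
        args.append("-phys")  # sequential PHYLIP
    elif phylip is not None:
        raise ValueError("phylip must be 'interleaved', 'sequential' or None")
    return args
```

The resulting list could be passed to subprocess.call() or similar;
keeping the arguments as a list avoids shell quoting problems with
file names.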

From bugzilla-daemon at portal.open-bio.org Fri Dec 4 07:50:25 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 07:50:25 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper
In-Reply-To:
Message-ID: <200912041250.nB4CoP72007627@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #1 from cymon.cox at gmail.com 2009-12-04 07:50 EST -------
Created an attachment (id=1408)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1408&action=view)
Add PHYLIP output to Muscle command line interface

From bugzilla-daemon at portal.open-bio.org Fri Dec 4 08:14:08 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:14:08 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper
In-Reply-To:
Message-ID: <200912041314.nB4DE8aA008792@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
Attachment #1408 is|0                           |1
           obsolete|                            |


------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-04 08:14 EST -------
(From update of attachment 1408)
Patch applied.

Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc
(which all take a filename)?

From bugzilla-daemon at portal.open-bio.org Fri Dec 4 08:21:52 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:21:52 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper
In-Reply-To:
Message-ID: <200912041321.nB4DLqsd009037@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #3 from cymon.cox at gmail.com 2009-12-04 08:21 EST -------
(In reply to comment #2)
> (From update of attachment 1408 [details])
> Patch applied.
>
> Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
> all take a filename)?

! Is there anything else undocumented?

OK, I'll do that asap. I'll also add tests - change test suite to use
subprocess module etc.

From bugzilla-daemon at portal.open-bio.org Fri Dec 4 08:36:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 4 Dec 2009 08:36:11 -0500
Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper
In-Reply-To:
Message-ID: <200912041336.nB4DaBvS009365@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2961


------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-04 08:36 EST -------
(In reply to comment #3)
> (In reply to comment #2)
> > (From update of attachment 1408 [details] [details])
> > Patch applied.
> >
> > Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which
> > all take a filename)?
>
> ! Is there anything else undocumented?

Robert did imply there could be other things as his documentation was
out of sync with the code :(

These are of limited value given you can use "-phyi -out filename.phy"
as an alternative to "-phyiout filename.phy" however one bonus feature
is these options allow you to get multiple output files in one run
(e.g. both HTML output to inspect visually and ClustalW output to
parse).

> OK, I'll do that asap. I'll also add tests - change test suite to use
> subprocess module etc.

I'd forgotten about that (using subprocess rather than generic_run in
our unit tests). Could you do that as a separate patch please?

Thanks,

Peter

From chapmanb at 50mail.com Fri Dec 4 08:40:10 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 4 Dec 2009 08:40:10 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
 <317375.58712.qm@web62401.mail.re1.yahoo.com>
 <20091203142534.GF51407@sobchak.mgh.harvard.edu>
 <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
Message-ID: <20091204134010.GK51407@sobchak.mgh.harvard.edu>

Hi all;
Peter, thanks for the feedback. Thoughts below.

> Looking at your code, BCBio.GFF.parse(...) would return
> SeqRecord objects (with SeqFeatures). That seems
> redundant to me as one expects people to just use
> Bio.SeqIO.parse(handle, "gff3") instead. I would instead
> have expected BCBio.GFF.parse(...) to iterate over the
> features in a GFF file.

This would work for simple cases, but for most real life cases you
will likely want to limit the file to a subset of things you are
interested in. It helps reduce memory problems, and is equivalent to
a track system view in UCSC or Ensembl.
I find it very useful for all of the work I've done with it.

We could use SeqIO here, but then there is the issue of passing
along the additional arguments. The simplicity of SeqIO is really
nice, so not sure if we'd want to clutter SeqIO with it.

So we could support basic parsing in SeqIO, but it would be useful to
have this GFF specific parsing as the additional arguments will be a
regular use case.

> Also, and we'd touched on this before - I'd much prefer to
> have the GFF module quite "low level" using either new
> GFF-specific classes or simple Python objects (e.g. for
> each feature use a tuple of ints and strings for the first
> feature columns plus a dict for the final extendible
> column of annotation).

Yes, it is implemented this way. The parse_simple function returns
a line by line parse of the file as a dictionary, which is then used
to build up the SeqFeature objects:

http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py

We can document and flesh that out, although I'm not really sure how
useful it will be. It's pretty easy to build your own simple
line-by-line GFF parser; the only advantage of this code over a
home-brew is that it handles tricky annotation cases.

For all of my uses, the real win was being able to build up the
multiple transcript exon/intron structures from the file. This is
not trivial to do on your own, and the real win of the code is in
handling this, especially for older GFF2 and GTF formatted files.

> From a technical point of view, a justification for this
> separation is the GFF details are not a perfect fit to the
> SeqRecord and SeqFeature objects and forcing their
> use adds unnecessary overheads for people wanting
> to work directly with the features themselves.

Why are SeqRecord and SeqFeature not appropriate for GFF? We could
improve them to make things more lightweight, as we discussed
previously, but conceptually the values fit into the framework fine.

> Also, by splitting the code into basic parsing and a
> SeqRecord/SeqFeature conversion layer (which I
> would put in Bio/SeqIO/GffIO.py) we can add the
> code in two steps (first GFF parsing, then SeqIO
> support).

We can do this as is. I'm not suggesting SeqIO support right now,
and want to target getting the GFF parser as is into Biopython.

> I think this split is useful as this is a very big job to do
> properly: Once we have GFF to SeqRecord parsing,
> we need to try and ensure that it is compatible with the
> GenBank to SeqRecord parsing. This is important as
> we would in effect be extending Biopython to allow
> GFF3 to GenBank conversions. For testing all this,
> we can grab the same data in the two file formats
> (e.g. from the NCBI) and perhaps also use EMBOSS.

Do you think GFF to GenBank is a common use case? Agreed that it is
very hard, but this really had less to do with the object structure
in Biopython and more to do with how things are represented and
named in the original source files. GenBank has some "consistency"
since it is produced mostly by NCBI, but GFF files are all over the
place.

This can be tackled later if someone wants, but right now my goals
are simply:

- Produce Biopython objects from GFF3/GTF/GFF2 files
- Represent nested features
- Allow GFF2/GTF to GFF3 conversion

This should be done with the current code. We can formalize the raw
parse_simple output for the line-by-line if people find it useful,
but otherwise we should leave these bigger projects for down the
line.
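
As a rough illustration of the "represent nested features" goal, the
sketch below groups flat features into a parent/child tree using GFF3
style Parent references. It is not the BCBio.GFF implementation: the
function name build_feature_tree and the dict keys ("type", "id",
"parents", "children") are hypothetical, and it assumes every feature
has already been parsed into a small dict.

```python
def build_feature_tree(features):
    """Nest flat features under their parents; return top-level nodes.

    Each input feature is a dict with a "type", an optional "id" and an
    optional list of "parents" (IDs). Hypothetical layout, for
    illustration only.
    """
    features = list(features)
    nodes = [
        {"type": f["type"], "id": f.get("id"), "children": []}
        for f in features
    ]
    # Index only the features that carry an ID; exons and CDS lines
    # often have a Parent but no ID of their own.
    by_id = {n["id"]: n for n in nodes if n["id"] is not None}
    top_level = []
    for feat, node in zip(features, nodes):
        parents = [by_id[p] for p in feat.get("parents", []) if p in by_id]
        if parents:
            # A feature may be shared by several parents, e.g. an exon
            # used by two transcripts.
            for parent in parents:
                parent["children"].append(node)
        else:
            top_level.append(node)
    return top_level
```

Given a gene, one mRNA and two exons, this returns a single gene node
whose mRNA child holds both exons. Older GFF2 and GTF files express
nesting differently, so they would need their own grouping rules.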

Brad

From biopython at maubp.freeserve.co.uk Fri Dec 4 09:25:40 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 4 Dec 2009 14:25:40 +0000
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <20091204134010.GK51407@sobchak.mgh.harvard.edu>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu>
 <317375.58712.qm@web62401.mail.re1.yahoo.com>
 <20091203142534.GF51407@sobchak.mgh.harvard.edu>
 <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com>
 <20091204134010.GK51407@sobchak.mgh.harvard.edu>
Message-ID: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com>

On Fri, Dec 4, 2009 at 1:40 PM, Brad Chapman wrote:
> Hi all;
> Peter, thanks for the feedback. Thoughts below.
>
>> Looking at your code, BCBio.GFF.parse(...) would return
>> SeqRecord objects (with SeqFeatures). That seems
>> redundant to me as one expects people to just use
>> Bio.SeqIO.parse(handle, "gff3") instead. I would instead
>> have expected BCBio.GFF.parse(...) to iterate over the
>> features in a GFF file.
>
> This would work for simple cases, but for most real life cases you
> will likely want to limit the file to a subset of things you are
> interested in. It helps reduce memory problems, and is equivalent to
> a track system view in UCSC or Ensembl. I find it very useful for
> all of the work I've done with it.

Understood - a feature returning Bio.GFF.parse() function
could take various arguments, or for full flexibility, the
user can use the parser object directly.

> We could use SeqIO here, but then there is the issue of passing
> along the additional arguments. The simplicity of SeqIO is really
> nice, so not sure if we'd want to clutter SeqIO with it.
>
> So we could support basic parsing in SeqIO, but it would be useful to
> have this GFF specific parsing as the additional arguments will be a
> regular use case.

This is already catered for in that Bio.SeqIO.parse() and
read() don't take arbitrary arguments (currently), but the
underlying Bio.SeqIO.XxxxIO.XxxIterator() they invoke
may do so. i.e. You could have Bio.SeqIO.GffIO.GffIterator()
and perhaps variants (e.g. GFF2 vs GFF3) which take
filtering arguments.

>> Also, and we'd touched on this before - I'd much prefer to
>> have the GFF module quite "low level" using either new
>> GFF-specific classes or simple Python objects (e.g. for
>> each feature use a tuple of ints and strings for the first
>> feature columns plus a dict for the final extendible
>> column of annotation).
>
> Yes, it is implemented this way. The parse_simple function returns
> a line by line parse of the file as a dictionary, which is then used
> to build up the SeqFeature objects:
>
> http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py
>
> We can document and flesh that out, although I'm not really sure how
> useful it will be. It's pretty easy to build your own simple
> line-by-line GFF parser; the only advantage of this code over a
> home-brew is that it handles tricky annotation cases.

I still think it would be useful to have Bio/GFF/Parser.py
(or similar) as the low level parser, and Bio/SeqIO/GffIO.py
(or similar) to turn this into SeqRecord and SeqFeature
objects.

> For all of my uses, the real win was being able to build up the
> multiple transcript exon/intron structures from the file. This is
> not trivial to do on your own, and the real win of the code is in
> handling this, especially for older GFF2 and GTF formatted files.
>
>> From a technical point of view, a justification for this
>> separation is the GFF details are not a perfect fit to the
>> SeqRecord and SeqFeature objects and forcing their
>> use adds unnecessary overheads for people wanting
>> to work directly with the features themselves.
>
> Why are SeqRecord and SeqFeature not appropriate for GFF?
> We could improve them to make things more lightweight, as we
> discussed previously, but conceptually the values fit into the
> framework fine.

It's the nested features that worry me. Perhaps the existing
location operator (e.g. "join") could be set to something
like "parent/child" if the subfeatures is used to hold child
features rather than the elements of a join? We need the
GenBank output code etc to be able to tell these apart
reliably.

>> Also, by splitting the code into basic parsing and a
>> SeqRecord/SeqFeature conversion layer (which I
>> would put in Bio/SeqIO/GffIO.py) we can add the
>> code in two steps (first GFF parsing, then SeqIO
>> support).
>
> We can do this as is. I'm not suggesting SeqIO support right now,
> and want to target getting the GFF parser as is into Biopython.

My point is the moment you include GFF -> SeqRecord
code (even if not explicitly via the Bio.SeqIO namespace)
it opens us up to people giving these SeqRecord objects
to SeqIO for output (e.g. as GenBank).

>> I think this split is useful as this is a very big job to do
>> properly: Once we have GFF to SeqRecord parsing,
>> we need to try and ensure that it is compatible with the
>> GenBank to SeqRecord parsing. This is important as
>> we would in effect be extending Biopython to allow
>> GFF3 to GenBank conversions. For testing all this,
>> we can grab the same data in the two file formats
>> (e.g. from the NCBI) and perhaps also use EMBOSS.
>
> Do you think GFF to GenBank is a common use case?

I suspect it's something I'd want to do when working
with new genome annotations. GeneMark produces GFF,
while Prodigal produces (simple) GenBank. The SOLiD
pipeline corona produces GFF. Sometimes you can get
both, the tool RAST outputs GenBank, GFF, GTF and
EMBL files.

> Agreed that it is very hard, but this really had less to do
> with the object structure in Biopython and more to do
> with how things are represented and named in the
> original source files.
GenBank has some "consistency" > since it is produced mostly by NCBI, but GFF files are > all over the place. > > This can be tackled later if someone wants, but right > now my goals are simply: > > - Produce Biopython objects from GFF3/GTF/GFF2 files > - Represent nested features > - Allow GFF2/GTF to GFF3 conversion > > This should be done with the current code. We can > formalize the raw parse_simple output for the line-by-line > if people find it useful, but otherwise we should leave > these bigger projects for down the line. Worthy goals, but if by "Produce Biopython objects from GFF3/GTF/GFF2 files" you mean SeqRecords with SeqFeatures, (as I said above) we are opening up the GFF to GenBank can of worms. There is no "later" :( Peter From mjldehoon at yahoo.com Sat Dec 5 10:54:19 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 5 Dec 2009 07:54:19 -0800 (PST) Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> Message-ID: <983129.2133.qm@web62408.mail.re1.yahoo.com> I didn't realize that the GFF parser returns SeqRecords. I agree with Peter that a parser returning SeqRecords should be accessed through Bio.SeqIO, while a lower-level parser can live in Bio.GFF. --Michiel --- On Thu, 12/3/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio.GFF and Brad's code > To: "Brad Chapman" , biopython-dev at lists.open-bio.org > Date: Thursday, December 3, 2009, 9:53 AM > On Thu, Dec 3, 2009 at 2:25 PM, Brad > Chapman > wrote: > > > > Great -- done for parsing and writing and committed to > GitHub. The > > documentation is updated as well. > > > > Happy to get other comments and thoughts. Thanks > again, > > > > I understand that GFF files are complex, and a simple > "record > iterator" isn't flexible enough to cover all use cases - > hence the > need for a complex parser class.
That said, Michiel is > right that > GFF.parse() or GFF.read() functions would be consistent > with > other bits of Biopython, and would provide for the simple > use > cases. > > Looking at your code, BCBio.GFF.parse(...) would return > SeqRecord objects (with SeqFeatures). That seems > redundant to me as one expect people to just use > Bio.SeqIO.parse(handle, "gff3") instead. I would instead > have expected BCBio.GFF.parse(...) to iterate over the > features in a GFF file. > > Also, and we'd touched on this before - I'd much prefer to > have the GFF module quite "low level" using either new > GFF-specific classes or simple Python objects (e.g. for > each feature use a tuple of ints and strings for the first > feature columns plus a dict for the final extendible > column of annotation). > > >From a technical point of view, a justification for > this > separation is the GFF details are not a perfect fit to the > SeqRecord and SeqFeature objects and forcing their > use adds unnecessary overheads for people wanting > to work directly with the features themselves. > > Also, by splitting the code into basic parsing and a > SeqRecord/SeqFeature conversion layer (which I > would put in Bio/SeqIO/GffIO.py) we can add the > code in two steps (first GFF parsing, then SeqIO > support). > > I think this split is useful as this is a very big job to > do > properly: Once we have GFF to SeqRecord parsing, > we need to try and ensure that it is compatible with the > GenBank to SeqRecord parsing. This is important as > we would in effect be extending Biopython to allow > GFF3 to GenBank conversions. For testing all this, > we can grab the same data in the two file formats > (e.g. from the NCBI) and perhaps also use EMBOSS. 
> > You may recall we talked to Peter Rice (from EMBOSS) > about this - there are some important issues here like > ontology mapping where we should be able to reuse a > lot of the work EMBOSS has already done (and use the > EMBOSS tools to help validate our mapping). > > i.e. While I may be being overly cautious, I think that > while adding GFF parsing and GFF to SeqRecord > mapping is very important, it is also very complex. > Therefore breaking this into a two stage task makes > managing and testing it easier - as well as seeming > a good idea for the code itself. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From MatatTHC at gmx.de Sun Dec 6 09:18:40 2009 From: MatatTHC at gmx.de (Matthias Bernt) Date: Sun, 06 Dec 2009 15:18:40 +0100 Subject: [Biopython-dev] Genetic Code Message-ID: <20091206141840.67400@gmx.net> Hi, The genetic codes you provide in Bio.Data.CodonTable are somewhat out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. Regards, Matthias -- GRATIS f?r alle GMX-Mitglieder: Die maxdome Movie-FLAT! Jetzt freischalten unter http://portal.gmx.net/de/go/maxdome01 From biopython at maubp.freeserve.co.uk Sun Dec 6 09:55:24 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 6 Dec 2009 14:55:24 +0000 Subject: [Biopython-dev] Genetic Code In-Reply-To: <20091206141840.67400@gmx.net> References: <20091206141840.67400@gmx.net> Message-ID: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com> On Sun, Dec 6, 2009 at 2:18 PM, Matthias Bernt wrote: > Hi, > > The genetic codes you provide in Bio.Data.CodonTable are somewhat > out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code > one start codon is missing. Confirmed - could you file a bug please? 
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython It looks like we have only got Version 3.4 (based on a visual inspection), but the latest version is Version 3.9. We should just need to re-run the script to generate these. Also the original URL noted in the Biopython source code of ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt Peter From bugzilla-daemon at portal.open-bio.org Sun Dec 6 10:07:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:07:23 -0500 Subject: [Biopython-dev] [Bug 2962] New: deprecated generic code Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2962 Summary: deprecated generic code Product: Biopython Version: 1.52 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: MatatTHC at gmx.de The genetic codes provided in Bio.Data.CodonTable are out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. It looks like we have only got Version 3.4 (based on a visual inspection), but the latest version is Version 3.9. We should just need to re-run the script to generate these. Also the original URL noted in the Biopython source code of ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
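The "missing start codon" report above can be pictured with a small sketch. It assumes the gc.prt convention of a 64-character start line in which "M" flags a start codon, with codons listed in T, C, A, G base order. The helper names (start_codons, make_starts_line) and the two toy start lines are hypothetical and made up for illustration; they are not copied from any NCBI release or from the Biopython generation script.

```python
from itertools import product

# NCBI lists the 64 codons with bases in T, C, A, G order,
# with the first base varying slowest (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]


def start_codons(starts_line):
    """Return the codons flagged with 'M' in a 64-character start line."""
    if len(starts_line) != len(CODONS):
        raise ValueError("expected %d characters" % len(CODONS))
    return [codon for codon, flag in zip(CODONS, starts_line) if flag == "M"]


def make_starts_line(codons):
    """Build a toy start line with 'M' at the given codons (illustration only)."""
    return "".join("M" if codon in codons else "-" for codon in CODONS)


# Toy data, NOT taken from a real gc.prt file: an "old" table that knows
# one start codon and a "new" table that adds a second one.
old_line = make_starts_line({"ATG"})
new_line = make_starts_line({"ATG", "GTG"})

# Start codons present in the newer line but absent from the older one.
missing = sorted(set(start_codons(new_line)) - set(start_codons(old_line)))
print(missing)
```

Comparing the decoded start lines of two table versions in this way is one possible check that a regenerated table picked up newly added start codons.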
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 10:07:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:07:43 -0500 Subject: [Biopython-dev] [Bug 2963] New: deprecated genetic code Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2963 Summary: deprecated genetic code Product: Biopython Version: 1.52 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: MatatTHC at gmx.de The genetic codes provided in Bio.Data.CodonTable are out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. It looks like we have only got Version 3.4 (based on a visual inspection), but the latest version is Version 3.9. We should just need to re-run the script to generate these. Also the original URL noted in the Biopython source code of ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 10:35:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:35:09 -0500 Subject: [Biopython-dev] [Bug 2963] deprecated genetic code In-Reply-To: Message-ID: <200912061535.nB6FZ9qY029156@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2963 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 10:35 EST ------- *** This bug has been marked as a duplicate of bug 2962 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 10:35:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:35:21 -0500 Subject: [Biopython-dev] [Bug 2962] deprecated generic code In-Reply-To: Message-ID: <200912061535.nB6FZL0I029172@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2962 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 10:35 EST ------- *** Bug 2963 has been marked as a duplicate of this bug. *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 11:09:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 11:09:28 -0500 Subject: [Biopython-dev] [Bug 2962] deprecated generic code In-Reply-To: Message-ID: <200912061609.nB6G9Sk9030056@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2962 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 11:09 EST ------- The NCBI codon tables have been updated from version 3.4 to 3.9, which adds a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23). Note that Table 14 which used to be called "Flatworm Mitochondrial" is now called "Alternative Flatworm Mitochondrial", and "Flatworm Mitochondrial" is now an alias for Table 9 ("Echinoderm Mitochondrial"). See: http://github.com/biopython/biopython/commit/74ba9d295b2cd6c6fa6862e91f1e1e59300deeb6 Marking as fixed - but feel free to reopen this if I missed anything. Thanks! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Sun Dec 6 11:11:08 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 6 Dec 2009 16:11:08 +0000 Subject: [Biopython-dev] Genetic Code In-Reply-To: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com> References: <20091206141840.67400@gmx.net> <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com> Message-ID: <320fb6e00912060811x1fc336ech6245221741372c62@mail.gmail.com> On Sun, Dec 6, 2009 at 2:55 PM, Peter wrote: > Confirmed - could you file a bug please?
> http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Thanks - I was expecting to look at this next week, but had some spare time this afternoon after all. It should be fixed, you can grab the latest code and reinstall to test: http://www.biopython.org/wiki/SourceCode Peter From bugzilla-daemon at portal.open-bio.org Sun Dec 6 12:46:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 12:46:55 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061746.nB6Hkt7x032479@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #5 from cymon.cox at gmail.com 2009-12-06 12:46 EST ------- Created an attachment (id=1409) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view) Patch for output file fomat options -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 13:50:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 13:50:08 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061850.nB6Io80P001234@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 13:50 EST ------- (In reply to comment #5) > Created an attachment (id=1409) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view) [details] > Patch for output file fomat options > Applied with minor changes to the docstrings - Bio.AlignIO will now accept the default CLUSTALW output from MUSCLE as is. Thanks! 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 14:10:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:10:01 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061910.nB6JA1p3001668@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #7 from cymon.cox at gmail.com 2009-12-06 14:10 EST ------- Created an attachment (id=1410) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) Change Application cmdline tests to use subprocess module -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 14:36:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:36:27 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061936.nB6JaRo0002258@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 14:36 EST ------- (In reply to comment #7) > Created an attachment (id=1410) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) [details] > Change Application cmdline tests to use subprocess module > Lovely - applied as is - thanks again :) Did you want to add tests for the new MUSCLE output options, or can we close this bug now Cymon? 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 14:43:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:43:12 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061943.nB6JhCOd002510@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #9 from cymon.cox at gmail.com 2009-12-06 14:43 EST ------- (In reply to comment #8) > (In reply to comment #7) > > Created an attachment (id=1410) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) [details] [details] > > Change Application cmdline tests to use subprocess module > > > > Lovely - applied as is - thanks again :) > > Did you want to add tests for the new MUSCLE output options, or can we close > this bug now Cymon? There's is one in the patch called: test_with_multiple_output_formats that writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and -clwstrict options. I think it can be closed. Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 14:47:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:47:11 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061947.nB6JlBHi002609@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 14:47 EST ------- (In reply to comment #9) > > Did you want to add tests for the new MUSCLE output options, or can we > > close this bug now Cymon? > > There's is one in the patch called: test_with_multiple_output_formats that > writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and > -clwstrict options. So there is - I missed that. Lovely :) Marking bug as fixed. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 04:16:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 04:16:42 -0500 Subject: [Biopython-dev] [Bug 2964] New: placing x-axis of graph track at the bottom or top of the track Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2964 Summary: placing x-axis of graph track at the bottom or top of the track Product: Biopython Version: 1.52 Platform: PC OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: Daniel.Nicorici at gmail.com By default, when one uses the graph track, the axis is placed automatically in the middle of the track (which is given by the mean of all the plotted values). It would be great if the x-axis of the graph track could also be placed at the bottom of the track, with the values plotted accordingly. This would allow one to plot, for example, the short-read coverage in next-gen sequencing data. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 7 04:48:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 04:48:11 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912070948.nB79mBTh022941@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #1 from Daniel.Nicorici at gmail.com 2009-12-07 04:48 EST ------- This feature has been added in: http://github.com/ndaniel/biopython/tree/x-axis_GenomeDiagram/Bio/Graphics/GenomeDiagram/ Also, a small additional bug has been fixed here, i.e.
the line/bar graphs are drawn from the first element to the last element of the graph and not from the origin to the end of the x-axis as they were originally. One can specify that the x-axis should be drawn at the bottom of the track by passing the argument x_axis='bottom' to new_track, e.g. gdt_features=gdd.new_track(2,x_axis='bottom'). Below one may find two examples where the x-axis is drawn in the middle of the track (as GenomeDiagram does originally) and at the bottom of the track (the new feature added to GenomeDiagram).
====Example_1:_Using_Graph_from_GenomeDiagram_where_the_x-axis_is_at_the_middle_of_track(as_it_is_originally)=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random
gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')
gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()
# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)
# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
coverage.append((50,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
coverage.append((250,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
gds_features.new_graph(coverage, 'coverage', style='bar')
gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
============================================
====Example_2:_Using_Graph_from_GenomeDiagram_where_x-axis_is_at_the_bottom_of_track=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random
gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')
gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()
# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)
# Add graph
gdt_features=gdd.new_track(2,x_axis='bottom')
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
coverage.append((50,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
coverage.append((250,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0))) # this is needed in order to skip the interpolation done by GenomeDiagram for missing values
gds_features.new_graph(coverage, 'coverage', style='bar')
gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
============================================
Best Regards, Daniel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 7 05:55:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 05:55:12 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912071055.nB7AtCol024504@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-07 05:55 EST ------- I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to having the x-axis line at the middle y-value (center or centre=None). Try setting center to zero when you create the Graph object. If you could give a cut down example it would be easier to help. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Mon Dec 7 06:34:11 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 11:34:11 +0000 Subject: [Biopython-dev] Biopython git access for Cymon Message-ID: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com> Dear all, It is a little overdue, but I'm pleased to announce Cymon Cox now has write access to the Biopython repository. Cymon has made contributions to Biopython over many years, initially with the modules Bio.Nexus and Bio.Sequencing (together with Frank Kauff), and more recently with improvements to our BioSQL wrappers (especially on PostgreSQL) and his recent work on alignment wrappers.
I'd previously talked to Cymon about giving him CVS access, and he said we might as well wait until after the git transition. I've just checked in a few patches on his behalf (alignment tool wrappers), which served to remind me of this - it would have saved me some work to just say "Yes, please check that in" ;) On behalf of the Biopython project, welcome (fully) to the development team Cymon, and thanks again for all your work to date. Regards, Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 7 06:38:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 06:38:27 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912071138.nB7BcROx026201@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #3 from Daniel.Nicorici at gmail.com 2009-12-07 06:38 EST ------- (In reply to comment #2) > I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to > having the x-axis line at the middle y-value (center or centre=None). Try > setting > center to zero when you create the Graph object. If you could give a cut down > example it would be easier to help. Yes, I am referring to GenomeDiagram. If one sets the center to zero then the lower half of the track (below the x-axis) is always empty and unused when all values are positive, e.g. CG content, short-read coverage have positive values. This feature allows one to use the entire track for plotting and not only half of it when setting center to zero is used. Best Regards, Daniel > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
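The point made in comment #3 (half the track is wasted when all values are positive and the axis is centred on zero) can be shown with simple arithmetic. The sketch below is only an illustration under stated assumptions: y_limits and fraction_used are hypothetical helpers, not GenomeDiagram functions, and the default of centring on the midpoint of the data is an assumption. The x_axis="bottom" keyword merely mirrors the argument name proposed in the bug report.

```python
def y_limits(values, x_axis="middle", center=None):
    """Return (low, high) y-limits for a graph track (illustrative only)."""
    lowest, highest = min(values), max(values)
    if x_axis == "bottom":
        # Axis sits at the lowest value, so the whole track height is used.
        return (lowest, highest)
    if center is None:
        # Assumption: default to the midpoint of the data range.
        center = (lowest + highest) / 2.0
    # Symmetric range about the axis, wide enough for the furthest value.
    half_span = max(highest - center, center - lowest)
    return (center - half_span, center + half_span)


def fraction_used(values, limits):
    """Fraction of the track height actually covered by the data."""
    low, high = limits
    return (max(values) - min(values)) / float(high - low)


# Made-up, all non-negative values, in the spirit of read coverage.
coverage = [0, 35, 120, 400]

centered = y_limits(coverage, center=0)
bottom = y_limits(coverage, x_axis="bottom")

print(centered, fraction_used(coverage, centered))
print(bottom, fraction_used(coverage, bottom))
```

With the axis centred on zero the limits become symmetric about the axis, so non-negative data can only ever fill the upper half of the track; with the axis at the bottom the same data spans the full height.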
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 06:48:32 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 06:48:32 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912071148.nB7BmW8A026423@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #4 from Daniel.Nicorici at gmail.com 2009-12-07 06:48 EST ------- Here is the cut down example of what I mean:
=====================================================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random
gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')
gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)
# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')
# generate some random values for plotting
coverage=[]
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in xrange(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in xrange(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar',center=0)
gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_gaph.pdf","pdf")
===========================================
The values plotted here are in the range 0 to 400, and GenomeDiagram's y-axis range is from -400 to 400 when
center is set to 0. A y-axis range of -n to n is a really odd choice when all the values to be plotted are in the range 0 to n. The feature proposed here allows the entire track to be used instead of only half of it, and also gives a better range for the y-axis. (In reply to comment #2) > I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to > having the x-axis line at the middle y-value (center or centre=None). Try > setting > center to zero when you create the Graph object. If you could give a cut down > example it would be easier to help. > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 7 06:59:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 06:59:33 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071159.nB7BxXs5026717@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Severity|normal |enhancement Summary|placing x-axis of graph |placing x-axis of graph |track at the bottom or top |track at the bottom or top |of the track |of the track in | |GenomeDiagram ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-07 06:59 EST ------- When I wrote comment 2, I hadn't seen comment 1 with the github link and examples. Leighton and I had (some time ago now) chatted about a related enhancement allowing the user to give the y-limits.
With than in mind, it makes sense to give the x-axis vertical position in terms of a y-coordinate (rather than a few limited options like top, middle and bottom). This would be more flexible. Marking this as an enhancement. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Mon Dec 7 07:12:45 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 7 Dec 2009 07:12:45 -0500 Subject: [Biopython-dev] Biopython git access for Cymon In-Reply-To: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com> References: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com> Message-ID: <20091207121245.GM51407@sobchak.mgh.harvard.edu> Hi all; > It is a little overdue, but I'm pleased to announce Cymon Cox > now has write access to the Biopython repository. > > Cymon has made contributions to Biopython over many years, > initially with the modules Bio.Nexus and Bio.Sequencing > (together with Frank Kauff), and more recently with > improvements to our BioSQL wrappers (especially on > PostgreSQL) and his recent work on alignment wrappers. Awesome. Congrats Cymon and thanks for all your excellent work. Well deserved. Brad From bugzilla-daemon at portal.open-bio.org Mon Dec 7 07:15:03 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 07:15:03 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071215.nB7CF3pE027513@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #6 from Daniel.Nicorici at gmail.com 2009-12-07 07:15 EST ------- (In reply to comment #5) > When I wrote comment 2, I hadn't seen comment 1 with the github link and > examples. 
> ;-) > Leighton and I had (some time ago now) chatted about a related enhancement > allowing the user to give the y-limits. I think that it is need enhancement. Let's see if others think that same! ;-) > With than in mind, it makes sense to > give the x-axis vertical position in terms of a y-coordinate (rather than a few > limited options like top, middle and bottom). This would be more flexible. This sounds good and I agree that it is more flexible. Indeed that options like "top, middle, bottom" are limited but still the scaling is done automatically and the user does not have to know in what range are his/her values are and what are the minimum and maximum and what axis position matches all the graphs which he/she wants to generate. I am sure that this can be done better than I did it. > > Marking this as an enhancement. Ok. Daniel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 7 08:03:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 08:03:14 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071303.nB7D3Esa029362@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #7 from lpritc at scri.sari.ac.uk 2009-12-07 08:03 EST ------- (In reply to comment #6) > > (In reply to comment #5) > > Leighton and I had (some time ago now) chatted about a related enhancement > > allowing the user to give the y-limits. > > I think that it is need enhancement. Let's see if others think that same! ;-) Oh, it definitely does! ;) Thank you for taking the time to improve it. 
> > With than in mind, it makes sense to
> > give the x-axis vertical position in terms of a y-coordinate (rather than a few
> > limited options like top, middle and bottom). This would be more flexible.
>
> This sounds good and I agree that it is more flexible.

This is my preferred option.

> Indeed that options like "top, middle, bottom" are limited but still the
> scaling is done automatically and the user does not have to know in what range
> are his/her values are and what are the minimum and maximum and what axis
> position matches all the graphs which he/she wants to generate.
>
> I am sure that this can be done better than I did it.

By allowing the position of the axis to take any value within the data range,
this still allows 'top', 'middle' and 'bottom' to be defined as functions of
the data with, e.g.

x_axis_pos = min(data) # bottom
x_axis_pos = max(data) # middle
x_axis_pos = median(data) # top

and also allows for explicit placing of the axis at specified points on the
y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.)

Cheers,

L.

From bugzilla-daemon at portal.open-bio.org Mon Dec 7 08:05:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:05:11 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram
In-Reply-To:
Message-ID: <200912071305.nB7D5B22029508@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964

------- Comment #8 from lpritc at scri.sari.ac.uk 2009-12-07 08:05 EST -------
(In reply to comment #7)
> x_axis_pos = min(data) # bottom
> x_axis_pos = max(data) # middle
> x_axis_pos = median(data) # top

x_axis_pos = min(data) # bottom
x_axis_pos = max(data) # top
x_axis_pos = median(data) # middle

D'oh!

L.

From bugzilla-daemon at portal.open-bio.org Mon Dec 7 08:25:29 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 7 Dec 2009 08:25:29 -0500
Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram
In-Reply-To:
Message-ID: <200912071325.nB7DPTSH030274@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2964

------- Comment #9 from Daniel.Nicorici at gmail.com 2009-12-07 08:25 EST -------
(In reply to comment #8)

Ok.

> (In reply to comment #7)
>
> > x_axis_pos = min(data) # bottom
> > x_axis_pos = max(data) # middle
> > x_axis_pos = median(data) # top
>
> x_axis_pos = min(data) # bottom
> x_axis_pos = max(data) # top
> x_axis_pos = median(data) # middle
>
> D'oh!
>
> L.
>

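The corrected mapping from comment #8 (bottom = min, top = max, middle = median) can be sketched in a few lines of plain Python. This is only an illustration of the idea discussed in the bug: the function name `x_axis_position` and the keyword strings are hypothetical and are not part of GenomeDiagram's API.

```python
from statistics import median


def x_axis_position(values, where):
    """Return the y-coordinate at which the x-axis would be drawn.

    `where` is either an explicit y-coordinate (a number) or one of the
    hypothetical keywords 'bottom', 'top' or 'middle', which are resolved
    as functions of the data, as in comment #8.
    """
    named = {"bottom": min, "top": max, "middle": median}
    if isinstance(where, str):
        return named[where](values)
    # Any explicit number is used as given.
    return float(where)


coverage = [0.0, 12.5, 40.0, 7.5, 100.0]
bottom = x_axis_position(coverage, "bottom")
top = x_axis_position(coverage, "top")
middle = x_axis_position(coverage, "middle")
explicit = x_axis_position(coverage, 25)
```

Accepting a plain y-coordinate keeps the flexible option preferred in comments #5 and #7, while the named positions stay available as shortcuts for users who do not know the range of their data in advance.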
From biopython at maubp.freeserve.co.uk Mon Dec 7 08:28:10 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 13:28:10 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 Message-ID: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> Hi all, I would like us to do the Biopython 1.53 release this month. We still have lots of new stuff that hasn't yet landed on the trunk, but despite that, looking at the NEWS file we have had plenty of improvements in the two months and a bit since Biopython 1.52 was released: http://biopython.open-bio.org/SRC/biopython/NEWS http://github.com/biopython/biopython/blob/master/NEWS One good reason for doing Biopython 1.53 soon is the NCBI said they plan to start using the new Jan 2010 DTD files for MedLine/PubMed as early as mid December: http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html Any comments on how things stand on the trunk - is there anything people think needs to be fixed before the release? Thanks, Peter From eric.talevich at gmail.com Mon Dec 7 11:33:30 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 7 Dec 2009 11:33:30 -0500 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> Message-ID: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> On Mon, Dec 7, 2009 at 8:28 AM, Peter wrote: > Hi all, > > I would like us to do the Biopython 1.53 release this month. 
>
> We still have lots of new stuff that hasn't yet landed on the
> trunk, but despite that, looking at the NEWS file we have
> had plenty of improvements in the two months and a bit
> since Biopython 1.52 was released:
>
> http://biopython.open-bio.org/SRC/biopython/NEWS
> http://github.com/biopython/biopython/blob/master/NEWS
>
> One good reason for doing Biopython 1.53 soon is the
> NCBI said they plan to start using the new Jan 2010 DTD
> files for MedLine/PubMed as early as mid December:
> http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html
>
> Any comments on how things stand on the trunk - is there
> anything people think needs to be fixed before the release?
>
>

I'll chime in about the status of the Summer of Code stuff.

For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees
and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so
the TreeIO API will work independently of file formats. For Bio.Tree, I'm
about halfway done porting the Nexus tree methods, though it'll go faster
now that the semester's over. (I'll post the details and ask for a code
review soon.)

My phyloxml branch won't be ready to land in time for a December release,
but merging it into the trunk right after that is feasible. That would give
everyone time to try it out and suggest changes before Biopython 1.54
cements the API.

Separately: GitHub says Nick Matzke's BioGeography branch hasn't been
touched since Aug. 19. It will need some love before it can be merged into
the trunk. Is there a plan for this, Peter or Brad? If not, should I try to
rescue it after TreeIO lands?
Cheers, Eric From biopython at maubp.freeserve.co.uk Mon Dec 7 11:48:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 16:48:34 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> Message-ID: <320fb6e00912070848i4153ee33w9df5c7df65a4c225@mail.gmail.com> On Mon, Dec 7, 2009 at 4:33 PM, Eric Talevich wrote: > > I'll chime in about the status of the Summer of Code stuff. Thanks > For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees > and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so > the TreeIO API will work independently of file formats. For Bio.Tree, I'm > about halfway done porting the Nexus tree methods, though it'll go faster > now that the semester's over. (I'll post the details and ask for a code > review soon.) > > My phyloxml branch won't be ready to land in time for a December release, > but merging it into the trunk right after that is feasible. That would > everyone time to try it out and suggest changes before Biopython 1.54 > cements the API. That is what I was hoping for. Fingers crossed Tiago will be able to spare some time to go over the basics of the phyloxml and TreeIO work - more eyes on the code would be great. > Separately: GitHub says Nick Matzke's BioGeography branch hasn't been > touched since Aug. 19. It will need some love before it can be merged into > the trunk. Is there a plan for this, Peter or Brad? If not, should I try to > rescue it after TreeIO lands? That sounds good as a tentative plan - Nick may want to be more involved, but you would be the next logical choice to handle this. 
Cheers, Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 7 13:56:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 13:56:20 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071856.nB7IuKI7007552@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #10 from Daniel.Nicorici at gmail.com 2009-12-07 13:56 EST ------- (In reply to comment #7) > (In reply to comment #6) > > > > (In reply to comment #5) > > > Leighton and I had (some time ago now) chatted about a related enhancement > > > allowing the user to give the y-limits. > > > > I think that it is need enhancement. Let's see if others think that same! ;-) > > Oh, it definitely does! ;) Thank you for taking the time to improve it. > > > > With than in mind, it makes sense to > > > give the x-axis vertical position in terms of a y-coordinate (rather than a few > > > limited options like top, middle and bottom). This would be more flexible. > > > > This sounds good and I agree that it is more flexible. > > This is my preferred option. > > > Indeed that options like "top, middle, bottom" are limited but still the > > scaling is done automatically and the user does not have to know in what range > > are his/her values are and what are the minimum and maximum and what axis > > position matches all the graphs which he/she wants to generate. > > > > I am sure that this can be done better than I did it. > > By allowing the position of the axis to take any value within the data range, > this still allows 'top', 'middle' and 'bottom' to be defined as functions of > the data with, e.g. 
>
> x_axis_pos = min(data) # bottom
> x_axis_pos = max(data) # middle
> x_axis_pos = median(data) # top
>
> and also allows for explicit placing of the axis at specified points on the
> y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.)
>

It looks a little bit confusing to me now because I see that there are two
sides of the problem (or two bugs?), as follows:
1) drawing a line orthogonal to the y-axis, at any position, which represents
the x-axis (this does not affect how the values are plotted and in what
interval)
2) in the case of bar plotting (this partially affects linear plotting too),
the values should be drawn automatically from zero (zero on y-axis, i.e. x=0
and y=-inf...+inf) unless the user specifies something else, and should not be
drawn by default from some arbitrary point, e.g. median, mean, etc., as is
done now.

I have the feeling that the solution presented here affects only point 1)
and not 2).

Please, could you elaborate so that I could perhaps implement your
suggestion?

BR,
Daniel

> Cheers,
>
> L.
>

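The distinction drawn in comment #10 can be made concrete with a small sketch: the y-limits decide how much of the track a bar can occupy, while the baseline decides where each bar starts. The helper names below (`to_track_fraction`, `bar_extent`) are hypothetical and only illustrate the arithmetic; they are not how GenomeDiagram is implemented.

```python
def to_track_fraction(y, y_min, y_max):
    """Map a data value to a 0..1 position across the track height."""
    return (y - y_min) / float(y_max - y_min)


def bar_extent(value, baseline, y_min, y_max):
    """Return (bottom, top) of one bar as fractions of the track height.

    The bar is drawn from `baseline` to `value`; `y_min` and `y_max` are
    the y-limits of the track.
    """
    a = to_track_fraction(baseline, y_min, y_max)
    b = to_track_fraction(value, y_min, y_max)
    return (min(a, b), max(a, b))


# Symmetric limits -400..400 with the baseline at 0, as reported in
# comment #4: the tallest bar only reaches across the upper half.
symmetric = bar_extent(400.0, 0.0, -400.0, 400.0)

# Limits 0..400 with the baseline at 0: the same bar spans the whole track.
full_track = bar_extent(400.0, 0.0, 0.0, 400.0)
```

With the same baseline of zero, only the choice of y-limits changes how much of the track is used, which is why the two settings are discussed together in this bug.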
From bugzilla-daemon at portal.open-bio.org Tue Dec 8 03:49:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Dec 2009 03:49:59 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912080849.nB88nx00030750@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #11 from lpritc at scri.sari.ac.uk 2009-12-08 03:49 EST ------- (In reply to comment #10) > It looks a little bit confusing too me now because I see that there are two > sides of the problem (or two bugs?), as following: > 1) drawing a line orthogonal on y-axis at any position which represents the > x-axis (this does not affect how the values are plotted and in what interval) > 2) in the case of bar plotting (partially affects also linear plotting), the > values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and > y=-inf...+inf) unless the user specify something else and not to be drawn by > default from some arbitrary point, e.g. median, mean, etc., as it is done now. > > I have the feeling that the solution presented here affects only the point 1) > and not 2). > > Please, could you elaborate more such that maybe I could implement your > suggestion? I see why you've distinguished between the two cases, but I think they can be handled by the earlier suggestion to implement the location of the x-axis in the context of also allowing the user to set y-axis limits (see comment #5). It's the combination of allowing y-axis limits and the location of x-axis crossing that gives the greatest flexibility. For example, if y-limit selection and x-axis crossing point were under user control... ...if you wanted to continue with the current behaviour, you'd not set any y-limits, and not specify the location of the x-axis. 
...if you wanted to draw short read coverage, you'd set the lower y-limit to 0, and set the location of the x-axis to zero (if that was not the default). This should draw bars with their bases on the bottom/inner of the track, and the scale running along the bottom/inner of the track. ...if you wanted to represent some data as a bar graph, with a special meaning for the mean (or median) value, you could optionally set y-limits, but have the x-axis cross at mean(data) or median(data). This should draw bars with their bases on the x-axis, and the axis located at the mean/median value for the data. Does this help clarify what I meant, above? L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Tue Dec 8 08:33:12 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 8 Dec 2009 08:33:12 -0500 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> Message-ID: <20091208133312.GE74538@sobchak.mgh.harvard.edu> Peter and Michiel; Thanks for the thoughts. Tried to combine these below: Michiel: > I didn't realize that the GFF parser returns SeqRecords. I agree with > Peter that a parser returning SeqRecords should be accessed through > Bio.SeqIO, while a lower-level parser can live in Bio.GFF. Peter: > My point is the moment you include GFF -> SeqRecord > code (even if not explicitly via the Bio.SeqIO namespace) > it opens us up to people giving these SeqRecord objects > to SeqIO for output (e.g. as GenBank). 
[...] > Worth goals, but if by "Produce Biopython objects from > GFF3/GTF/GFF2 files" you mean SeqRecords with > SeqFeatures, (as I said above) we are opening up the > GFF to GenBank can of worms. There is no "later" :( We seem to have a very different view of SeqRecords/SeqFeatures. To me, they are a convenient well thought out object model to capture annotations and features associated with a sequence. They have the advantage that people who have used Biopython will be familiar with the object model. That's why I chose to use them for representing GFF, as opposed to a GFF specific class. You are adding on two extra conditions: - If something produces SeqRecords, it needs to come from SeqIO. - If you have a SeqRecord, it has to be compatible with GenBank output. This quickly ties us up to the not-that-great GenBank way of representing features and locations, and makes it hard to add on more flexible formats like GFF. Converting between very different feature representations is going to be complex and a whole new problem; why do you have to support that to use a SeqRecord in your code? Overall, I'd like to see it be simpler for people to contribute and add parsers to Biopython. > I still think it would be useful to have Bio/GFF/Parser.py (or > similar) as the low level parser, and Bio/SeqIO/GffIO.py (or > similar) to turn this into SeqRecord and SeqFeature objects. This appears to be about where the code lives. Personally, I prefer having things under the GFF namespace and then building thin wrappers around if in SeqIO if desired. Practically, I want to leave SeqIO inclusion out right now and try to argue only for getting the GFF specific parser in. > The nested features that worry me. Perhaps the existing > location operator (e.g. "join") could be set to something > like "parent/child" if the subfeatures is used to hold child > features rather than the elements of a join? We need > the GenBank output code etc to be able to tell these > apart reliably. 
Right now I don't set the location operator at all. The parent/child model is much more flexible than the GenBank operator stuff, so maybe the right way to go is to phase out using the operator at all. If it is set to nothing than parent/child is assumed, and GenBank output can add in all of the operators at output time. Brad From chapmanb at 50mail.com Tue Dec 8 09:03:54 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 8 Dec 2009 09:03:54 -0500 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> Message-ID: <20091208140354.GG74538@sobchak.mgh.harvard.edu> Hi Eric; > I'll chime in about the status of the Summer of Code stuff. > > For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees > and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so > the TreeIO API will work independently of file formats. For Bio.Tree, I'm > about halfway done porting the Nexus tree methods, though it'll go faster > now that the semester's over. (I'll post the details and ask for a code > review soon.) > > My phyloxml branch won't be ready to land in time for a December release, > but merging it into the trunk right after that is feasible. That would > everyone time to try it out and suggest changes before Biopython 1.54 > cements the API. This sounds awesome. Thanks for keeping up with the code; looking forward to seeing it get in to the main branch. > Separately: GitHub says Nick Matzke's BioGeography branch hasn't been > touched since Aug. 19. It will need some love before it can be merged into > the trunk. Is there a plan for this, Peter or Brad? If not, should I try to > rescue it after TreeIO lands? No plan from my end; hopefully Nick will chime in. 
If Nick doesn't have time, it would be beyond great if you could finalize and merge the most useful parts. Thanks for volunteering on this. Brad From biopython at maubp.freeserve.co.uk Tue Dec 8 09:15:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 14:15:30 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <20091208133312.GE74538@sobchak.mgh.harvard.edu> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> <20091208133312.GE74538@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> On Tue, Dec 8, 2009 at 1:33 PM, Brad Chapman wrote: > > We seem to have a very different view of SeqRecords/SeqFeatures. To > me, they are a convenient well thought out object model to capture > annotations and features associated with a sequence. They have the > advantage that people who have used Biopython will be familiar with > the object model. That's why I chose to use them for representing GFF, > as opposed to a GFF specific class. OK, but (as I expand on below), your planned use of the SeqFeature (while legitimate) appears to risk being inconsistent with existing parts of the Biopython code base (in particular, GenBank output, and maybe GenomeDiagram). > You are adding on two extra conditions: > > - If something produces SeqRecords, it needs to come from SeqIO. It was more of an aim than a rule. Isn't true of all the existing code for historical reasons, e.g. Bio.SeqIO "genbank" support acts as a thin wrapper to Bio.GenBank which does offer SeqRecord objects. For a user perspective, if you want a SeqRecord from a sequence file, the first point of call should be Bio.SeqIO. 
> - If you have a SeqRecord, it has to be compatible with GenBank
>   output.
>
> This quickly ties us up to the not-that-great GenBank way of
> representing features and locations, and makes it hard to add on more
> flexible formats like GFF. Converting between very different feature
> representations is going to be complex and a whole new problem;
> why do you have to support that to use a SeqRecord in your code?

The big aim of Bio.SeqIO was to allow using many different file
formats with the same object representation. Implicitly (assuming
the required data is present), input from one file format could be
output in another format.

The problem is that lots of current code in Biopython uses
SeqRecord/SeqFeatures in a particular way (GenBank/EMBL
parsers, GenomeDiagram, GenBank output). Unfortunately,
for GFF files it seems this isn't the most natural way to use
SeqFeature objects (where you need real nesting).

> Overall, I'd like to see it be simpler for people to contribute and
> add parsers to Biopython.

I hope that for simple file formats this is already the case. But for
annotation rich file formats, if we want SeqIO to continue to be
useful for conversion, this by necessity requires some awareness
of how the other parsers/writers will represent the same data.

One option for contributions is to offer a "low level" parser
using basic Python datatypes or simple file-type specific
records. Then someone more familiar with SeqIO and the
other file formats can write a SeqRecord converter in order
to integrate it into Bio.SeqIO. This is basically how Ace,
Phred, SwissProt (and probably others) were done.

>> I still think it would be useful to have Bio/GFF/Parser.py (or
>> similar) as the low level parser, and Bio/SeqIO/GffIO.py (or
>> similar) to turn this into SeqRecord and SeqFeature objects.
>
> This appears to be about where the code lives. Personally, I prefer
> having things under the GFF namespace and then building thin
> wrappers around if in SeqIO if desired.
Practically, I want to leave > SeqIO inclusion out right now and try to argue only for getting the > GFF specific parser in. Where the code lives isn't a big issue. You can do a thin wrapper in Bio.SeqIO calling Bio.GFF (where Bio.GFF makes SeqRecords), or a fat wrapper (where Bio.GFF does not make SeqRecords). The problem (as I see it) is SeqIO integration and how your desired use of SeqFeatures will impact this. >> The nested features that worry me. Perhaps the existing >> location operator (e.g. "join") could be set to something >> like "parent/child" if the subfeatures is used to hold child >> features rather than the elements of a join? We need >> the GenBank output code etc to be able to tell these >> apart reliably. > > Right now I don't set the location operator at all. The parent/child > model is much more flexible than the GenBank operator stuff, so > maybe the right way to go is to phase out using the operator at all. > If it is set to nothing than parent/child is assumed, and GenBank > output can add in all of the operators at output time. I agree that using SeqFeature sub-features for parent/child relationships makes a lot of sense. BUT, we have a lot of existing code which follows the GenBank/EMBL parser route of using this for joins (and a few other corner cases). There are other annoyances with the current SeqFeature and FeatureLocation model - the strand and location operator are part of the SeqFeature not the FeatureLocation. It would make more sense to me to move them to the FeatureLocation (and have that handle joins itself). Or, move everything to the SeqFeature (and get rid of the FeatureLocation object). I think the best route forward is to plan a transition of the SeqFeature object to allow nice handling of real nested relationships, and a reworking of complex location handling. Then (hopefully) we can have the GenBank/EMBL/GFF3 parsers all using the SeqFeature in a consistent way. 
Peter

From bugzilla-daemon at portal.open-bio.org Tue Dec 8 11:56:17 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 11:56:17 -0500
Subject: [Biopython-dev] [Bug 2965] New: Updating Bio.Restriction with latest REBASE data
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2965

           Summary: Updating Bio.Restriction with latest REBASE data
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk

The Bio/Restriction/Restriction_Dictionary.py file hasn't been updated since
2004. The latest REBASE restriction digest files seem to be from Nov 29 2009,
ftp://ftp.neb.com/pub/rebase/

This bug is to update Restriction_Dictionary.py to use the Nov 2009 data. I
have tried and failed as described below:

----------------------------------------------------------------------------

I manually downloaded these files to the Scripts/Restriction directory:

ftp://ftp.neb.com/pub/rebase/emboss_e.912
ftp://ftp.neb.com/pub/rebase/emboss_r.912
ftp://ftp.neb.com/pub/rebase/emboss_s.912

And then ran ranacompiler.py which generated a new Restriction_Dictionary.py

As an aside, module sre is deprecated, re is suggested instead.

Other interesting output:

WARNING : HaeIV cut twice with different overhang length each time.
Unable to deal with this behaviour.
This enzyme will not be included in the database. Sorry.
Checking : Anyway, HaeIV is not commercially available.

WARNING : TaqII has two different sites.

The new database contains 753 enzymes.

So far so good, but using the new Restriction_Dictionary.py the unit tests
fail:

$ python test_Restriction.py
Traceback (most recent call last):
  File "test_Restriction.py", line 6, in
    from Bio.Restriction import *
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/__init__.py", line 61, in
    from Bio.Restriction.Restriction import *
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py", line 2358, in
    newenz = T(k, bases, enzymedict[k])
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py", line 216, in __init__
    cls.compsite = re.compile(cls.compsite)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 188, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 241, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character in group name

From bugzilla-daemon at portal.open-bio.org Tue Dec 8 12:02:42 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 8 Dec 2009 12:02:42 -0500
Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest REBASE data
In-Reply-To:
Message-ID: <200912081702.nB8H2g4b014553@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2965

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-08 12:02 EST -------
To be more precise, running Bio/Restriction/Restriction.py in IDLE and looking
at the stack trace, the regular expression failing is for enzyme CviKI-1,

(?P[AG]GC[CT])|(?P[AG]GC[CT])

The problem seems to be the hyphen/minus sign in the enzyme name which is being
used as a group name in the regular expression. I think this is the only
enzyme with this name. Since it can't be used as a Python name either, we
should probably map it to an underscore:

>>> import re
>>> re.compile('(?P[AG]GC[CT])|(?P[AG]GC[CT])')
...
error: bad character in group name
>>> re.compile('(?P[AG]GC[CT])|(?P[AG]GC[CT])')
<_sre.SRE_Pattern object at 0xe8d700>

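The failure described in comment #1 can be reproduced with the standard `re` module alone. The sketch below assumes the group name is the enzyme name itself, as the comment states (the group names appear to have been lost from the archived patterns above). The helpers `group_name` and `compile_site` are hypothetical and are not the code in Bio.Restriction.

```python
import re


def group_name(enzyme_name):
    """Map an enzyme name to a valid regex group name (hyphen -> underscore)."""
    return enzyme_name.replace("-", "_")


def compile_site(enzyme_name, site_pattern):
    """Compile a recognition site as a named group keyed by the enzyme."""
    return re.compile("(?P<%s>%s)" % (group_name(enzyme_name), site_pattern))


# Using the raw enzyme name as the group name is rejected by `re`,
# because group names must be valid Python identifiers.
try:
    re.compile("(?P<CviKI-1>[AG]GC[CT])")
    raw_name_accepted = True
except re.error:
    raw_name_accepted = False

# After mapping the hyphen to an underscore the pattern compiles and matches.
pattern = compile_site("CviKI-1", "[AG]GC[CT]")
match = pattern.search("TTAGCTAA")
matched_site = match.group("CviKI_1")
```

Because the mapped name is also a valid Python identifier, the same string could serve as both the regex group name and an attribute or class name, which is the second constraint mentioned in the comment.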
From bugzilla-daemon at portal.open-bio.org Tue Dec 8 12:50:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Dec 2009 12:50:29 -0500 Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest REBASE data In-Reply-To: Message-ID: <200912081750.nB8HoTDW016476@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2965 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-08 12:50 EST ------- Fixed by mapping hyphen to an underscore. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From kellrott at gmail.com Tue Dec 8 17:00:11 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Tue, 8 Dec 2009 14:00:11 -0800 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <20091208140354.GG74538@sobchak.mgh.harvard.edu> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> Message-ID: Speaking of stuff that may not be ready for 1.53, but should start speeding up for 1.54, I've translated a bunch of HMMER3 / PfamScan code in the Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting in the jython branch, but I can spin it into a separate branch). Right now it's missing the code to parse HMMER2, there needs to be more extensive unit testing, and the API needs to be nailed down with some documentation. Is there anybody else that needs HMMER and Pfam support? 
Kyle From biopython at maubp.freeserve.co.uk Tue Dec 8 17:18:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 22:18:03 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> On Tue, Dec 8, 2009 at 10:00 PM, Kyle Ellrott wrote: > > Speaking of stuff that may not be ready for 1.53, but should start speeding > up for 1.54, I've translated a bunch of HMMER3 / PfamScan code in the > Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting > in the jython branch, but I can spin it into a separate branch). > Right now it's missing the code to parse HMMER2, there needs to be more > extensive unit testing, and the API needs to be nailed down with some > documentation. > Is there anybody else that needs HMMER and Pfam support? > > Kyle That had caught my eye, and it is potentially of direct interest to me personally. I will probably skip HMMER2 and go straight to HMMER3 though ;) On a related point, I am reasonably confident we can get most of Biopython running on Jython 2.5.1 in time for the release. Other than things that Jython doesn't support at all, i.e. the C code, DTD parsing (needed for Bio.Entrez), and the lack of a buffer function (not important, only used in deprecated code now), the only remaining hurdle is Bio.Restriction, and I think I have solved that. I will be testing this tomorrow (time permitting). Your groundwork has been very useful here Kyle. 
Thanks, Peter From biopython at maubp.freeserve.co.uk Tue Dec 8 17:30:20 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 22:30:20 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> <20091208133312.GE74538@sobchak.mgh.harvard.edu> <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> Message-ID: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com> On Tue, Dec 8, 2009 at 2:15 PM, Peter wrote: > > I agree that using SeqFeature sub-features for parent/child > relationships makes a lot of sense. BUT, we have a lot of > existing code which follows the GenBank/EMBL parser > route of using this for joins (and a few other corner cases). > > There are other annoyances with the current SeqFeature > and FeatureLocation model - the strand and location operator > are part of the SeqFeature not the FeatureLocation. It would > make more sense to me to move them to the FeatureLocation > (and have that handle joins itself). Or, move everything to > the SeqFeature (and get rid of the FeatureLocation object). > > I think the best route forward is to plan a transition of the > SeqFeature object to allow nice handling of real nested > relationships, and a reworking of complex location handling. > Then (hopefully) we can have the GenBank/EMBL/GFF3 > parsers all using the SeqFeature in a consistent way. > Just to add some ideas to this thread for discussion, on possible ways forward without breaking backwards compatibility... 
hopefully this is clear, I did have a glass of wine with dinner ;) Given the way the existing SeqFeature list property subfeatures is used (by the GenBank/EMBL parser etc), would it make sense for the GFF needs to add a new list for child features (say property "children"), and perhaps another property (maybe "parent") which can point back at the parent SeqFeature. i.e. A sort of tree, allowing us to represent genes, exons, etc. Note we may want to use weak references in the above (children/parent references) to assist the python GC. Given the above, potentially the GenBank/EMBL parser could be enhanced to use these new properties (e.g. for linking gene and CDS features in bacteria, or CDS and mat_peptide features in viruses etc). [This still leaves the ontology issues - which might be best dealt with by the GenBank output code] Peter From kellrott at gmail.com Tue Dec 8 17:42:54 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Tue, 8 Dec 2009 14:42:54 -0800 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> Message-ID: > > On a related point, I am reasonably confident we can get most > of Biopython running on Jython 2.5.1 in time for the release. > Other than things that Jython doesn't support at all, i.e. the C > code, DTD parsing (needed for Bio.Entrez), and the lack of a > buffer function (not important, only used in deprecated code > now), the only remaining hurdle is Bio.Restriction, and I think > I have solved that. I will be testing this tomorrow (time > permitting). The last bit for 'full' jython support is getting BioSQL working. Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in Jython. 
My jython port also has work that moves the BioSQL interface from the internal ORM to a SqlAlchemy interface. Of course that is a little controversial because it introduces a dependency on another python package. Of course it takes care of sqlite and Java MySql connector support at the same time, so it does have some pluses.

Kyle

From biopython at maubp.freeserve.co.uk Tue Dec 8 17:46:19 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 8 Dec 2009 22:46:19 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To:
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
Message-ID: <320fb6e00912081446w303edd73qe3a5dad964314487@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:42 PM, Kyle Ellrott wrote:
>
> The last bit for 'full' jython support is getting BioSQL working.
> Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in
> Jython. My jython port also has work that moves the BioSQL interface from
> the internal ORM to a SqlAlchemy interface. Of course that is a little
> controversial because it introduces a dependency on another python package.
> Of course it takes care of sqlite and Java MySql connector support at the
> same time, so it does have some pluses.

Fair point w.r.t. "full" jython support ;)

I would be more comfortable with BioSQL on Jython working directly with sqlite (once we add that to BioSQL) and the Java MySql connector directly (without the extra dependency on SQLAlchemy).

Peter From biopython at maubp.freeserve.co.uk Tue Dec 8 18:38:04 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 23:38:04 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> <20091208133312.GE74538@sobchak.mgh.harvard.edu> <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com> Message-ID: <320fb6e00912081538o635347ceh8e10aa4863e538e9@mail.gmail.com> On Tue, Dec 8, 2009 at 2:15 PM, Peter wrote: >> >> There are other annoyances with the current SeqFeature >> and FeatureLocation model - the strand and location operator >> are part of the SeqFeature not the FeatureLocation. It would >> make more sense to me to move them to the FeatureLocation >> (and have that handle joins itself). Or, move everything to >> the SeqFeature (and get rid of the FeatureLocation object). >> In addition to the strand and location operator, there is also (sometimes) a database cross reference (properties ref and db_ref, e.g. in contig files). Again, this is conceptually part of the feature location (and stored that way in BioSQL if I recall correctly). One example of where it would make sense to move things like the database, operator and strand to the FeatureLocation is the coded_by information in some GenPept file annotation, a use case very recently raised on the main mailing list: http://lists.open-bio.org/pipermail/biopython/2009-December/005910.html The current FeatureLocation simply can't be used here - although a full SeqFeature could be. 
Peter From bugzilla-daemon at portal.open-bio.org Wed Dec 9 04:56:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 04:56:34 -0500 Subject: [Biopython-dev] [Bug 2966] New: Primer3Commandline does not use EMBOSS 6.1.0 arguments Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2966 Summary: Primer3Commandline does not use EMBOSS 6.1.0 arguments Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk Several arguments for EMBOSS eprimer3 are different in version 6.1.0 from those used in Primer3Commandline. I have updated Primer3Commandline locally (and added documentation strings), and will make this available via github with some other proposed changes shortly, after talking to Peter. This revealed another bug, which I will submit separately. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 05:07:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 05:07:14 -0500 Subject: [Biopython-dev] [Bug 2967] New: AbstractCommandline silently accepts invalid parameter options Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2967 Summary: AbstractCommandline silently accepts invalid parameter options Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk While investigating Bug 2996 I noticed that AbstractCommandline was silently accepting invalid parameter options when passed by setting attributes. 
For example:

cline = Primer3Commandline(bogus=True)
cline.sequence = filename

raises the appropriate ValueError, as the parameter name 'bogus' is being compared to the self.parameters list when setting, and is found not to be valid. However, the following code:

cline = Primer3Commandline()
cline.sequence = filename
cline.bogus = True # Invalid argument not flagged up
cline.sequnce = True # Mistyped argument not flagged up

silently sets the invalid cline.bogus and cline.sequnce attributes without warning. Parameters set via attribute are not validated with the setter/getters defined for the properties in AbstractCommandline.__init__. This could (did!) lead the user to think that parameters are set when they are not, under at least two circumstances:

1) Typos in the parameter name
2) Using a parameter unsupported by the interface (see Bug 2996).

L.

From bugzilla-daemon at portal.open-bio.org Wed Dec 9 05:08:12 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:08:12 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options
In-Reply-To:
Message-ID: <200912091008.nB9A8Cc5008147@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967

------- Comment #1 from lpritc at scri.sari.ac.uk 2009-12-09 05:08 EST -------
Sorry, I'm referring to bug 2966 in the post above. My bad.

L.

From bugzilla-daemon at portal.open-bio.org Wed Dec 9 05:46:11 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 05:46:11 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options
In-Reply-To:
Message-ID: <200912091046.nB9AkBXi009268@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 05:46 EST -------
(In reply to comment #0)
> While investigating Bug 2996 I noticed that AbstractCommandline was silently
> accepting invalid parameter options when passed by setting attributes. For
> example:
>
> cline = Primer3Commandline(bogus=True)
> cline.sequence = filename
>
> raises the appropriate ValueError, as the parameter name 'bogus' is being
> compared to the self.parameters list when setting, and is found not to be
> valid. However, the following code:
>
> cline = Primer3Commandline()
> cline.sequence = filename
> cline.bogus = True # Invalid argument not flagged up
> cline.sequnce = True # Mistyped argument not flagged up
>
>
> silently sets the invalid cline.bogus and cline.sequnce attributes without
> warning. Parameters set via attribute are not validated with the
> setter/getters defined for the properties in AbstractCommandline.__init__
> This could (did!) lead the user to think that parameters are set when they
> are not, under at least two circumstances:
>
> 1) Typos in the parameter name
> 2) Using a parameter unsupported by the interface

This is normal Python object behaviour - you can add any "property" like this at run time,

>>> class Dummy(object) :
...     pass
...
>>> d = Dummy()
>>> d.name = "Fred"
>>> dir(d)
['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__', 'name']
>>> d.name
'Fred'

We might still be able to block this via __setattr__, this needs some experimentation.

From bugzilla-daemon at portal.open-bio.org Wed Dec 9 07:23:34 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:23:34 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options
In-Reply-To:
Message-ID: <200912091223.nB9CNYtT012354@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967

------- Comment #3 from lpritc at scri.sari.ac.uk 2009-12-09 07:23 EST -------
(In reply to comment #2)
> This is normal Python object behaviour - you can add any "property" like this
> at run time, [...]

Oddly enough, I was already aware of that... ;)

The issue is that the setting of parameters via attributes fails silently, but is demonstrated in the tutorial and is in any case often rather more convenient than declaring the parameters on instantiation, so is very likely to be used in anger. This potentially (and *actually* in my case, when attempting to use EMBOSS 6.1.0 parameter names with eprimer3) leads to cases where the user might expect that command-line options have been set, when they in fact haven't.

> We might still be able to block this via __setattr__, this needs some
> experimentation.

That seems to be the best route to me, initially.
It might be worth removing the property magic in the AbstractCommandline.__init__(), and instead use __setattr__, __getattr__, and __delattr__, having them behave appropriately for known parameter names. I'll have a go at doing that and put it in with the EMBOSS stuff I'm working on, just now. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 07:28:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 07:28:07 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091228.nB9CS7vS012457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-12-09 07:28 EST ------- (In reply to comment #3) > (In reply to comment #2) > > > We might still be able to block this via __setattr__, this needs some > > experimentation. > > That seems to be the best route to me, initially. It might be worth removing > the property magic in the AbstractCommandline.__init__(), and instead use > __setattr__, __getattr__, and __delattr__, having them behave appropriately for > known parameter names. > > I'll have a go at doing that and put it in with the EMBOSS stuff I'm working > on, just now. Peter has pointed out that he'd like to retain discoverability, and so restrict the change to a validating __setattr__ - which seems reasonable. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Dec 9 07:53:00 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 9 Dec 2009 07:53:00 -0500
Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options
In-Reply-To:
Message-ID: <200912091253.nB9Cr0cP013048@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2967

------- Comment #5 from lpritc at scri.sari.ac.uk 2009-12-09 07:53 EST -------
This works for me, at the moment:

    def __setattr__(self, name, value):
        """ Workaround for a user interface issue.

            Without this __setattr__ attribute-based assignment of parameters
            will silently accept invalid parameters, leading to known instances
            of the user assuming that parameters for the application are set,
            when they are not.

            This workaround uses a whitelist of object attributes, and sets the
            object attribute list as normal, for these. Other attributes are
            assumed to be parameters, and passed to the self.set_parameter
            method for validation and assignment.
        """
        attr_whitelist = ['parameters', 'program_name'] # Allowed attributes
        if name not in attr_whitelist:                  # If not in whitelist, treat
            self.set_parameter(name, value)             # as parameter
        else:
            self.__dict__[name] = value

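For anyone wanting to experiment with the idea outside Biopython, here is a self-contained sketch of a validating __setattr__. The class `CommandlineSketch`, its constructor arguments and the error message are assumptions made for this illustration; this is not the AbstractCommandline code that was committed.

```python
class CommandlineSketch(object):
    """Toy stand-in for a command line wrapper (not Biopython's class)."""

    def __init__(self, program_name, valid_names):
        # Both assignments pass through __setattr__ and hit the whitelist.
        self.program_name = program_name
        self.parameters = dict((name, None) for name in valid_names)

    def set_parameter(self, name, value):
        # Reject anything that is not a known parameter name.
        if name not in self.parameters:
            raise ValueError("Option name %s was not found." % name)
        self.parameters[name] = value

    def __setattr__(self, name, value):
        if name in ("parameters", "program_name"):
            # Real object attributes are stored as normal.
            self.__dict__[name] = value
        else:
            # Everything else is treated as a parameter and validated.
            self.set_parameter(name, value)


cline = CommandlineSketch("eprimer3", ["sequence", "outfile"])
cline.sequence = "input.fasta"  # known parameter, accepted

try:
    cline.sequnce = "input.fasta"  # typo, now flagged instead of ignored
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

The whitelist has to name every genuine instance attribute, otherwise setting it in __init__ would itself be rejected as an unknown parameter.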
From biopython at maubp.freeserve.co.uk Wed Dec 9 08:21:50 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 13:21:50 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com>
Message-ID: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com>

On Tue, Dec 8, 2009 at 10:18 PM, Peter wrote:
>
> On a related point, I am reasonably confident we can get most
> of Biopython running on Jython 2.5.1 in time for the release.
> Other than things that Jython doesn't support at all, i.e. the C
> code, DTD parsing (needed for Bio.Entrez), and the lack of a
> buffer function (not important, only used in deprecated code
> now), the only remaining hurdle is Bio.Restriction, and I think
> I have solved that. I will be testing this tomorrow (time
> permitting). Your groundwork has been very useful here Kyle.

I'm stuck again with Bio.Restriction under Jython. I've got the Bio/Restriction/Restriction_Dictionary.py to load under Jython (just - the Nov 2009 update isn't helping to keep the code size down), but running test_Restriction.py hits the JVM limit. Furthermore, there is a little bit of C code in Bio.Restriction (which I think we can replace with plain python).

Peter From biopython at maubp.freeserve.co.uk Wed Dec 9 09:18:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Dec 2009 14:18:19 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> Message-ID: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> On Wed, Dec 9, 2009 at 1:21 PM, Peter wrote: > > Furthermore, there is a little bit of C code in Bio.Restriction > (which I think we can replace with plain python). > I've replaced the C module Bio.Restriction.DNAUtils with Python code, and deprecated it. I am surprised it was written in C in the first place! Peter From bugzilla-daemon at portal.open-bio.org Wed Dec 9 10:04:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 10:04:10 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091504.nB9F4AUM017626@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 10:04 EST ------- Fix committed - almost as is, I also added a doctest. Thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Dec 9 10:57:20 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 15:57:20 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com>
Message-ID: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>

Good news:

I've tweaked the RestrictionCompiler.py code to modify how it generates Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries incrementally. Together with the removal of the C code DNAUtils, this means (after a clean install) that Jython likes Bio.Restriction and that test_Restriction.py passes on Jython 2.5.1 (and C Python too).

Bad news:

I think I have broken test_CAPS.py (under both Jython and Python). It looks like it hits some bits of Bio.Restriction that are not covered by test_Restriction.py

I'm working on it still ...

Peter

From biopython at maubp.freeserve.co.uk Wed Dec 9 11:25:28 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 9 Dec 2009 16:25:28 +0000
Subject: [Biopython-dev] Plans for Biopython 1.53
In-Reply-To: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com>
Message-ID: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com>

On Wed, Dec 9, 2009 at 3:57 PM, Peter wrote:
> Good news:
>
> I've tweaked the RestrictionCompiler.py code to modify how it generates
> Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries
> incrementally. Together with the removal of the C code DNAUtils, this
> means (after a clean install) that Jython likes Bio.Restriction and that
> test_Restriction.py passes on Jython 2.5.1 (and C Python too).
>
> Bad news:
>
> I think I have broken test_CAPS.py (under both Jython and Python).
> It looks like it hits some bits of Bio.Restriction that are not covered by
> test_Restriction.py
>
> I'm working on it still ...

Solved: the check_bases function in Bio.Restriction also used to make things uppercase (but the docstring didn't make this clear and the C code was non-obvious).

I think this means the whole test suite passes on Jython 2.5.1 (barring those bits with C code dependencies, BioSQL, or the known Jython issues with DTD parsing or the missing buffer function).

Kyle - could you confirm this on your machine please?

Thanks, Peter From bugzilla-daemon at portal.open-bio.org Wed Dec 9 12:57:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 12:57:37 -0500 Subject: [Biopython-dev] [Bug 2968] New: Modifications to Emboss eprimer3 parser and associated files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2968 Summary: Modifications to Emboss eprimer3 parser and associated files Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk The existing Emboss primer3/eprimer3 code has a couple of issues, and some scope for improvement: - The existing Primer3.py parser code can only parse output when eprimer3 is applied to a single sequence. When eprimer3 is applied to multiple sequence input, it groups all primers for all sequences into a single record, which may incorrectly associate primers with the wrong sequences in downstream analysis. - The current parser lacks an iterator for iterating over multiple sequence output - The current parser creates 'ghost' primers for all primer pairs, with length zero and sequence as an empty string; it does not do this for internal oligos. A more intuitive solution might be to return None for absent primers/oligos - The current data model stores all primer data as individual attributes. It might be more useful to group the attributes of individual primers into their natural associations I have written new code for Emboss/Primer3.py that adds iterator/multiple sequence parsing functionality to the parser, and extensively revises the object model for the data. The Record and Primers objects are retained, but each primer/oligo is now represented by a Primer object that collects the relevant data together. 
The Record object has a new attribute that allows the sequence to be recorded directly, rather than having to be parsed from the comments attribute. The new data model retains the old attribute-based access for compatibility, but adds direct access to the Primer objects (where present) by .forward, .reverse and .oligo attributes, and by keywords. One change was required to the unit test, to account for the reporting of absent primers as None, rather than having 'null' attributes. I've added two further test output files, which may be rather large for the distribution (60kb total), and doctests that use these. The code can be inspected at my GitHub repository: http://github.com/widdowquinn/biopython/commit/b4701079ced297d7af5aa75b46738280c8783fe0 This enhancement request also relates to bug 2966. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 12:59:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 12:59:14 -0500 Subject: [Biopython-dev] [Bug 2968] Modifications to Emboss eprimer3 parser and associated files In-Reply-To: Message-ID: <200912091759.nB9HxErQ022462@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2968 ------- Comment #1 from lpritc at scri.sari.ac.uk 2009-12-09 12:59 EST ------- I forgot to mention - the new code still passes the test_EmbossPrimer.py unit test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
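As a rough illustration of the data model proposed in bug 2968 (one object per primer, with absent primers reported as None), here is a minimal sketch. The class names `Primer` and `PrimerSet`, the field names and the example values are all hypothetical and made up for this illustration; the actual code is in the GitHub commit linked from the bug report.

```python
class Primer(object):
    """Hypothetical per-primer record; field names are illustrative only."""

    def __init__(self, start, length, tm, gc, seq):
        self.start = start
        self.length = length
        self.tm = tm
        self.gc = gc
        self.seq = seq


class PrimerSet(object):
    """Hypothetical grouping of the primers designed for one target."""

    def __init__(self, forward=None, reverse=None, oligo=None):
        # Absent primers stay None instead of zero-length 'ghost' entries.
        self.forward = forward
        self.reverse = reverse
        self.oligo = oligo

    def __getitem__(self, key):
        # Keyword-style access mirroring the attribute names.
        if key not in ("forward", "reverse", "oligo"):
            raise KeyError(key)
        return getattr(self, key)


# Made-up values: only a forward primer was designed for this target.
primers = PrimerSet(forward=Primer(25, 20, 59.6, 50.0, "GATCGATCGATCGATCGATC"))
```

Testing `primers.oligo is None` is then enough to tell that no internal oligo was reported, without inspecting lengths or empty strings.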
From bugzilla-daemon at portal.open-bio.org Wed Dec 9 13:01:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 13:01:13 -0500 Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS 6.1.0 arguments In-Reply-To: Message-ID: <200912091801.nB9I1DMe022568@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2966 ------- Comment #1 from lpritc at scri.sari.ac.uk 2009-12-09 13:01 EST ------- I have made changes to Primer3Commandline that involve adding the EMBOSS 6.1.0 arguments, and docstrings to each argument. I have also added doctests. The proposed code can be inspected at my GitHub repository: http://github.com/widdowquinn/biopython/commit/9c0643e333b0cafb4e356426fb4902e0e9d2385c -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 13:03:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 13:03:30 -0500 Subject: [Biopython-dev] [Bug 2969] New: Addition of SeqmatchallCommandline to Emboss/Applications.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2969 Summary: Addition of SeqmatchallCommandline to Emboss/Applications.py Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk I thought it would be useful to have a command line wrapper to the EMBOSS seqmatchall application, and have added this to Emboss/Applications.py, with doctests. 
The proposed code can be inspected at my GitHub repository: http://github.com/widdowquinn/biopython/commit/ced72a34b2565b97f3ad2c77a66e1083375cff02 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From kellrott at gmail.com Wed Dec 9 14:22:01 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Wed, 9 Dec 2009 11:22:01 -0800 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com> <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com> Message-ID: > Kyle - could you confirm this on your machine please? > It looks like the master branch is working well. I guess the next step will be looking into the zxJDBC to expand the BioSQL ORM. Intro can be found at: http://www.informit.com/articles/article.aspx?p=26143 Kyle From bugzilla-daemon at portal.open-bio.org Wed Dec 9 16:53:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 16:53:42 -0500 Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to Emboss/Applications.py In-Reply-To: Message-ID: <200912092153.nB9LrgYN027652@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2969 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 16:53 EST ------- A nice easy one to wrap at first glance. 
I would like to also include the "aformat" output to set the output alignment format (useful to set to pair or simple for AlignIO to parse it as the "emboss" alignment format - see the needle and water wrappers). You could then also add a run time test to test_Emboss.py piping this to AlignIO... ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 17:42:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 17:42:26 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912092242.nB9MgQS9028588@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #8 from chapmanb at 50mail.com 2009-12-09 17:42 EST ------- Great idea Peter -- happy to get this in. It's now on a branch here: http://github.com/chapmanb/biopython/tree/biosql-sqlite It would be excellent if you, Cymon or anyone else interested could review and merge it in. This also includes a small typo fix on Bio/SeqIO/InsdcIO.py which isn't really related but came up when I was running the BioSQL tests. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Dec 9 18:51:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 18:51:14 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912092351.nB9NpESn030303@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 18:51 EST ------- Hi Brad, My only immediate comment is it might make sense to split the BioSQL tests in two, one for SQLite which we can try and make 100% automatic (at least on Python 2.5+), and one for a user specified back end (MySQL, PostgreSQL etc) which requires a username and password. It's midnight here in the UK, so feel free to tweak things this evening your time and I'll take a full look tomorrow. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 06:12:36 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 06:12:36 -0500 Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to Emboss/Applications.py In-Reply-To: Message-ID: <200912101112.nBABCaRr015734@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2969 ------- Comment #2 from lpritc at scri.sari.ac.uk 2009-12-10 06:12 EST ------- (In reply to comment #1) > A nice easy one to wrap at first glance. I would like to also include the > "aformat" output to set the output alignment format (useful to set to pair or > simple for AlignIO to parse it as the "emboss" alignment format - see the > needle and water wrappers). You could then also add a run time test to > test_Emboss.py piping this to AlignIO...
;) That shouldn't take too long to do (though probably won't get done by me this week). Do we want to set any particular policy for the sequence-associated and outfile-associated arguments? Their inclusion in the command-line wrappers is pretty inconsistent, which is why I left them out in the first place. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 06:15:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 06:15:09 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912101115.nBABF90t015907@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #12 from Daniel.Nicorici at gmail.com 2009-12-10 06:15 EST ------- (In reply to comment #11) > (In reply to comment #10) > > > It looks a little bit confusing too me now because I see that there are two > > sides of the problem (or two bugs?), as following: > > 1) drawing a line orthogonal on y-axis at any position which represents the > > x-axis (this does not affect how the values are plotted and in what interval) > > 2) in the case of bar plotting (partially affects also linear plotting), the > > values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and > > y=-inf...+inf) unless the user specify something else and not to be drawn by > > default from some arbitrary point, e.g. median, mean, etc., as it is done now. > > > > I have the feeling that the solution presented here affects only the point 1) > > and not 2). > > > > Please, could you elaborate more such that maybe I could implement your > > suggestion? 
> > I see why you've distinguished between the two cases, but I think they can be > handled by the earlier suggestion to implement the location of the x-axis in > the context of also allowing the user to set y-axis limits (see comment #5). > It's the combination of allowing y-axis limits and the location of x-axis > crossing that gives the greatest flexibility. For example, if y-limit > selection and x-axis crossing point were under user control... > > ...if you wanted to continue with the current behaviour, you'd not set any > y-limits, and not specify the location of the x-axis. > > ...if you wanted to draw short read coverage, you'd set the lower y-limit to 0, > and set the location of the x-axis to zero (if that was not the default). This > should draw bars with their bases on the bottom/inner of the track, and the > scale running along the bottom/inner of the track. > > ...if you wanted to represent some data as a bar graph, with a special meaning > for the mean (or median) value, you could optionally set y-limits, but have the > x-axis cross at mean(data) or median(data). This should draw bars with their > bases on the x-axis, and the axis located at the mean/median value for the > data. I submitted the changes which do somehow what is described above, i.e. still by default the x-axis is drawn in the middle of the track (it is still left for now like this in order not to change the default behavior of GenomeDiagram). If the x-axis is specified to be drawn at the bottom or top of the track then the x-axis is drawn there and the values for bars/lines in the graph are drawn using zero-based (if the some values are positive and other are negative) or min (if all values are positive) or max (all values are negative). Hence only when specifying the x-axis to be drawn at the bottom or top for the track, the behavior of the graph and plotting are affected. The limits are computed automatically. > > Does this help clarify what I meant, above? It helped. Thanks! 
BR, Daniel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Dec 10 07:20:55 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Dec 2009 12:20:55 +0000 Subject: [Biopython-dev] Removing C implementation of deprecated listfns, mathfns, stringfns Message-ID: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com> Hi all, The modules listfns, mathfns, stringfns are now all deprecated. They all have both a C implementation and a pure Python implementation. We could wait for the complete deprecation process, and remove the C code when the Python code gets removed. However, I would like remove their C implementations for the next release, as this will simplify our code base. The only downside is anyone still using these modules will get a deprecation warning and a possible slow down (as the C code wouldn't exist any more). Also anyone using the C code directly will be in trouble (but no-one should be doing that...). Any comments? Objections? Thanks, Peter From bugzilla-daemon at portal.open-bio.org Thu Dec 10 07:39:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:39:15 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101239.nBACdFtu018207@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #10 from chapmanb at 50mail.com 2009-12-10 07:39 EST ------- Thanks Peter. All of the tests will run on SQLite provided sqlite3 is installed, so there is no need to split them. I enabled SQLite by default, so they will run automatically if a user has sqlite3 and fail gracefully with a dependency error if not. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 07:43:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:43:28 -0500 Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to Emboss/Applications.py In-Reply-To: Message-ID: <200912101243.nBAChSHg018300@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2969 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 07:43 EST ------- (In reply to comment #2) > Do we want to set any particular policy for the sequence-associated and > outfile-associated arguments? Their inclusion in the command-line wrappers > is pretty inconsistent, which is why I left them out in the first place. In the long term, I'd like us to look at generating the wrappers automatically from the EMBOSS ACD files which define their tool options. For now, since some EMBOSS tools have so many options, they have been added in a somewhat ad-hoc basis based on what the coder thought most important, or user feedback. Fix checked in with addition of aformat option. Thanks! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
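A note on the seqmatchall wrapper discussed in bug 2969: the sketch below shows, using only the standard library, roughly what kind of command line such a wrapper would need to assemble once the "aformat" option is included. It is not the code that was checked in. The helper name seqmatchall_args and the file names are hypothetical, and the qualifier names (-sequence, -outfile, -wordsize, -aformat) are assumed from the EMBOSS documentation rather than taken from this thread.

```python
import shlex


def seqmatchall_args(sequence, outfile, wordsize=None, aformat=None):
    """Build an argument list for EMBOSS seqmatchall (hypothetical helper).

    Optional arguments are only added when the caller supplies them, which
    mirrors the ad-hoc way options are exposed in the existing wrappers.
    """
    args = ["seqmatchall", "-sequence", sequence, "-outfile", outfile]
    if wordsize is not None:
        args += ["-wordsize", str(wordsize)]
    if aformat is not None:
        # "pair" or "simple" are the formats mentioned in comment #1 as
        # suitable for parsing as the "emboss" alignment format.
        args += ["-aformat", aformat]
    return args


# Hypothetical input and output file names, for illustration only.
args = seqmatchall_args("input.fasta", "matches.txt", wordsize=18, aformat="pair")
command_line = " ".join(shlex.quote(arg) for arg in args)
print(command_line)
```

Building a list first and quoting each item separately keeps file names with spaces intact if the string is later passed to a shell.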
From bugzilla-daemon at portal.open-bio.org Thu Dec 10 07:52:16 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:52:16 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101252.nBACqGp6018512@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 07:52 EST ------- (In reply to comment #10) > Thanks Peter. All of the tests will run on SQLite provided sqlite3 is > installed, so there is no need to split them. I enabled SQLite by default, so > they will run automatically if a user has sqlite3 and fail gracefully with a > dependency error if not. That's great as is. I was thinking about something more: What I meant was, I want to be able to run all the tests on SQLite (by default) AND on another back end (e.g. MySQL) if the user has configured it. Otherwise we (as developers) have to manually switch the BioSQL settings and rerun the BioSQL unit tests. I will be able to test the effect of your changes on MySQL, hopefully Cymon can do this on PostgreSQL - not that I anticipate any regressions, but best to be sure ;) Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 07:56:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:56:44 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101256.nBACuheQ018635@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 07:56 EST ------- (In reply to comment #11) > > That's great as is.
I was thinking about something more: What I meant was, I > want to be able to run all the tests on SQLite (by default) AND on another > back end (e.g. MySQL) if the user has configured it. Otherwise we (as > developers) have to manually switch the BioSQL settings and rerun the BioSQL > unit tests. > On reflection, that kind of improvement can wait until after Biopython 1.53 is out. It would be great to make it completely general so that if you have all the backends installed the test suite could check on SQLite, MySQL, PostgreSQL etc. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 08:15:45 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 08:15:45 -0500 Subject: [Biopython-dev] [Bug 2495] parse element symbols for ATOM/HETATM records (Bio.PDB.PDBParser) In-Reply-To: Message-ID: <200912101315.nBADFj7O019533@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2495 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 08:15 EST ------- (In reply to comment #2) > > Leaving bug open to deal with the output as well. > Marking bug as fixed. I've just committed a change based on a patch from Frederik Gwinner via GitHub - Bio.PDB.PDBIO should now save the element on output now, Please reopen this bug if there is any problem. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
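A note on the element column mentioned in bug 2495: in the fixed-width PDB format, the element symbol of an ATOM or HETATM record sits right-justified in columns 77-78. The sketch below uses only the standard library to write and read that column. It is an illustration of the file layout, not the Bio.PDB code; the helper names format_atom_line and element_of and the example coordinates are made up for this note.

```python
# Fixed-width layout of an ATOM record; the ten spaces cover the unused
# columns 67-76, and the last two fields are element (77-78) and charge (79-80).
ATOM_FORMAT = (
    "%-6s%5d %-4s%1s%3s %1s%4d%1s   "
    "%8.3f%8.3f%8.3f%6.2f%6.2f" + " " * 10 + "%2s%2s"
)


def format_atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Return an 80-column ATOM line (hypothetical helper)."""
    return ATOM_FORMAT % (
        "ATOM", serial, name, " ", resname, chain, resseq, " ",
        x, y, z, 1.0, 0.0, element, "  ",
    )


def element_of(line):
    """Return the element symbol from columns 77-78, or "" if absent."""
    if len(line) < 78:
        # Older files often stop before the element column.
        return ""
    return line[76:78].strip()


# Example values invented for illustration.
line = format_atom_line(1, " CA ", "GLY", "A", 1, 11.104, 6.134, -6.504, "C")
print(repr(line))
print(element_of(line))
```

Returning an empty string for short lines is one possible choice for files written without the element column; a real parser might instead guess the element from the atom name.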
From biopython at maubp.freeserve.co.uk Thu Dec 10 09:25:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Dec 2009 14:25:53 +0000 Subject: [Biopython-dev] Removing C implementation of deprecated listfns, mathfns, stringfns In-Reply-To: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com> References: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com> Message-ID: <320fb6e00912100625s48ba290cj1234d757da0b94f@mail.gmail.com> On Thu, Dec 10, 2009 at 12:20 PM, Peter wrote: > Hi all, > > The modules listfns, mathfns, stringfns are now all deprecated. They > all have both a C implementation and a pure Python implementation. > > We could wait for the complete deprecation process, and remove > the C code when the Python code gets removed. However, I would > like remove their C implementations for the next release, as this will > simplify our code base. > > The only downside is anyone still using these modules will get > a deprecation warning and a possible slow down (as the C code > wouldn't exist any more). Also anyone using the C code directly > will be in trouble (but no-one should be doing that...). > > Any comments? Objections? 
I hope there are no objections as I've just done this on the trunk ;) Peter From bugzilla-daemon at portal.open-bio.org Thu Dec 10 09:54:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 09:54:17 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101454.nBAEsHdi023376@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 09:54 EST ------- (In reply to comment #11) > > I will be able to test the effect of your changes on MySQL, hopefully Cymon > can do this on PostgreSQL - not that I anticipate and regressions, but best > to be sure ;) > The branch still merges cleanly onto the trunk (I had already manually applied the Bio/SeqIO/InsdcIO.py date fix to the trunk). Testing "as is" on Mac OS X 10.5 with Apple's Python 2.5.2 uses SQLite, and works. Changing setup_BioSQL.py to use MySQL also works fine :) I have not yet tried this on Windows. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 12 13:12:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 13:12:23 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121812.nBCICNWt003206@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #14 from cymon.cox at gmail.com 2009-12-12 13:12 EST ------- (In reply to comment #11) > I will be able to test the effect of your changes on MySQL, hopefully Cymon > can do this on PostgreSQL - not that I anticipate and regressions, but best > to be sure ;) Is SQLite ":memory:" TESTDB working for you on Brads branch? 
It fails for me, all else is fine (incl the SQLite file db).

[cymon at spiro Tests]$ python test_BioSQL_SeqIO.py
Connecting to database
Removing existing sub-database 'biosql-seqio-test' (if exists)
Traceback (most recent call last):
  File "test_BioSQL_SeqIO.py", line 134, in
    if db_name in server.keys():
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 123, in keys
    return self.adaptor.list_biodatabase_names()
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 306, in list_biodatabase_names
    "SELECT name FROM biodatabase")
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 355, in execute_and_fetch_col0
    self.execute(sql, args or ())
  File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 336, in execute
    self.dbutils.execute(self.cursor, sql, args)
  File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 53, in execute
    cursor.execute(sql, args or ())
sqlite3.OperationalError: no such table: biodatabase

Perhaps it's my sqlite installation - I'm not familiar with it:

[cymon at spiro BioSQL]$ dpkg -l|egrep sqlite
ii libmono-sqlite2.0-cil 2.4.2.3+dfsg-2 Mono Sqlite library (for CLI 2.0)
ii libsqlite0 2.8.17-6build1 SQLite shared library
ii libsqlite3-0 3.6.16-1ubuntu1 SQLite 3 shared library
ii sqlite3 3.6.16-1ubuntu1 A command line interface for SQLite 3

C.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Dec 12 13:33:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 13:33:15 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121833.nBCIXFCH003747@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-12 13:33 EST ------- (In reply to comment #14) > (In reply to comment #11) > > I will be able to test the effect of your changes on MySQL, hopefully Cymon > > can do this on PostgreSQL - not that I anticipate any regressions, but best > > to be sure ;) > > Is SQLite ":memory:" TESTDB working for you on Brad's branch? I didn't try that specifically - just SQLite on disk. Brad? > > It fails for me, all else is fine (incl the SQLite file db) > But the good news is Brad's changes to BioSQL/*.py haven't caused any regressions on PostgreSQL :) Thanks Cymon, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Sat Dec 12 13:39:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 13:39:07 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121839.nBCId7U6003831@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #16 from cymon.cox at gmail.com 2009-12-12 13:39 EST ------- (In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #11) > > > I will be able to test the effect of your changes on MySQL, hopefully Cymon > > > can do this on PostgreSQL - not that I anticipate and regressions, but best > > > to be sure ;) > > > > Is SQLite ":memory:" TESTDB working for you on Brads branch? > > I didn't try that specifically - just SQLite on disk. Brad? > > > > > It fails for me, all else is fin (incl the SQLite file db) > > > > But the good news is Brad's changes to BioSQL/*.py haven't caused any > regressions on PostreSQL :) Yep, no problems, although I only tried the psycopg2 driver (with and without rules deletion). Psycopg version 1 support has had a deprecation warning since version 1.53 http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it? C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 12 14:05:02 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 14:05:02 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121905.nBCJ52Nn004276@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #17 from chapmanb at 50mail.com 2009-12-12 14:05 EST ------- Thanks Cymon -- glad nothing is broken on Postgres. 
The in memory database (:memory:) doesn't work for the tests, because they assume a database created by previous test cases. Since the memory one keeps going away, they will get plenty of errors about non-existing tables. It would work in theory with some test re-writing, but it's not too necessary. Sorry, should have added a note about this. Thanks again for double checking that everything works. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 12 14:41:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 14:41:12 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121941.nBCJfCXr004756@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-12 14:41 EST ------- (In reply to comment #16) > > Yep, no problems, although I only tried the psycopg2 driver (with and > without rules deletion). > > Psycopg version 1 support has had a deprecation warning since version 1.53 > http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it? > > C. > Minor typo - Psycopg v1 support was deprecated in Biopython 1.51 (August 2009). In line with the current deprecation policy, we aim for two releases with the warning (which has happened already, 1.51 and 1.52) plus at least one year - which means we can drop Psycopg v1 in summer 2010. Given in this case its a fairly simple task for someone to just install Psycopg v2, we might look at dropping the Psycopg v1 support a little quicker (say Biopython 1.54?). See: http://www.biopython.org/wiki/Deprecation_policy (In reply to comment #17) > Thanks Cymon -- glad nothing is broken on Postgres. 
> > The in memory database (:memory:) doesn't work for the tests, because they > assume a database created by previous test cases. Since the memory one keeps > going away, they will get plenty of errors about non-existing tables. It would > work in theory with some test re-writing, but it's not too necessary. > > Sorry, should have added a note about this. Thanks again for double checking > that everything works. OK then - Brad, would you like to merge this to the trunk now (or in the next few days), add a note about not using :memory: in Tests/setup_BioSQL.py, and something to the NEWS file (with a proviso about the SQLite schema not yet being official)? Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 14 07:48:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 14 Dec 2009 07:48:28 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912141248.nBECmS6b007714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #19 from chapmanb at 50mail.com 2009-12-14 07:48 EST ------- Peter and Cymon -- thanks again for the help. Merged into the main trunk and marking this as resolved. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
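A note on the ":memory:" failure reported in bug 2866: the "no such table: biodatabase" error can be reproduced with the standard library alone. The sketch below assumes the mechanism is simply that every new sqlite3 connection to ":memory:" starts with an empty database, so nothing created by an earlier connection is visible to a later one. It is not the BioSQL test code; the helper name table_survives_reconnect is hypothetical and the one-column biodatabase table is a stand-in for the real schema.

```python
import os
import shutil
import sqlite3
import tempfile


def table_survives_reconnect(path):
    """Create a table, close the connection, reconnect and look for it."""
    first = sqlite3.connect(path)
    first.execute("CREATE TABLE biodatabase (name TEXT)")
    first.commit()
    first.close()

    second = sqlite3.connect(path)
    try:
        second.execute("SELECT name FROM biodatabase").fetchall()
        return True
    except sqlite3.OperationalError:
        # Raised as "no such table: biodatabase" when the table is gone.
        return False
    finally:
        second.close()


# An in-memory database is discarded when its connection closes.
in_memory = table_survives_reconnect(":memory:")

# A file-based database keeps its tables between connections.
tmp_dir = tempfile.mkdtemp()
try:
    on_disk = table_survives_reconnect(os.path.join(tmp_dir, "biosql-test.db"))
finally:
    shutil.rmtree(tmp_dir, ignore_errors=True)

print("in memory:", in_memory)
print("on disk:", on_disk)
```

This matches the observation in the thread that the file-based SQLite database worked while ":memory:" did not, since the tests rely on a database left behind by earlier test cases.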
From biopython at maubp.freeserve.co.uk Mon Dec 14 11:24:44 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 14 Dec 2009 16:24:44 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> Message-ID: <320fb6e00912140824x3bfa58cfy8520142c0fea3a45@mail.gmail.com> On Mon, Dec 7, 2009 at 1:28 PM, Peter wrote: > > One good reason for doing Biopython 1.53 soon is the > NCBI said they plan to start using the new Jan 2010 DTD > files for MedLine/PubMed as early as mid December: > http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html I've just checked the PubMed XML from efetch, and the NCBI are still using the old 2009 DTD file. I guess it is only midday in the USA, so plenty of time for them to make the switch on 14 Dec as announced... Once that happens (hopefully within hours), and I've checked the Entrez parser is still happy, we can do the Biopython release. Until then, only documentation and unit tests fixes on the trunk please. Thanks, Peter From biopython at maubp.freeserve.co.uk Tue Dec 15 05:45:31 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 10:45:31 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 Message-ID: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> Hello all, I plan to do the Biopython 1.53 release this afternoon (in a few hours time). If there are any last minute changes anyone wants to make on the trunk, please email first. 
Ideally just documentation or additional unit tests at this point ;) Thanks Peter From biopython at maubp.freeserve.co.uk Tue Dec 15 10:29:48 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 15:29:48 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> Message-ID: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> On Tue, Dec 15, 2009 at 10:45 AM, Peter wrote: > Hello all, > > I plan to do the Biopython 1.53 release this afternoon (in a few hours time). > OK - Everything looks good on the code side, git has been tagged, source archives and windows installers uploaded. If anyone could double check the installers work on your machines that would be great. Brad - could you run a sanity test before uploading to pypi? David - did you manage to draft a release announcement? If not, don't worry, I'll make one up ;) Peter From biopython at maubp.freeserve.co.uk Tue Dec 15 11:28:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 16:28:13 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> Message-ID: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> On Tue, Dec 15, 2009 at 3:29 PM, Peter wrote: > On Tue, Dec 15, 2009 at 10:45 AM, Peter wrote: >> Hello all, >> >> I plan to do the Biopython 1.53 release this afternoon (in a few hours time). >> > > OK - Everything looks good on the code side, git has been tagged, source > archives and windows installers uploaded. If anyone could double check > the installers work on your machines that would be great. > > Brad - could you run a sanity test before uploading to pypi? 
> > David - did you manage to draft a release announcement? If not, don't > worry, I'll make one up ;) Draft text below - any comments? Thanks, Peter ---- We are pleased to announce the availability of Biopython 1.53, a new stable release of the Biopython library, three months after the release of Biopython 1.52. This is our first release since migrating from CVS to git for source code control. There have been some additions to our core objects - the Seq (and related UnknownSeq) objects gained upper and lower methods (like the string methods of the same name but alphabet aware) plus a new ungap method. The SeqFeature object now has an extract method to get the region of sequence it describes (useful for getting CDS nucleotide sequences from GenBank files). Also SeqRecord objects now support addition, giving a new SeqRecord with the combined sequence, all the SeqFeatures, and any common annotation. SQLite support (built into Python 2.5+) was added to our BioSQL interface. This is still a little experimental as we are using a draft BioSQL SQLite schema, but this should be merged into the next BioSQL release. Biopython now includes wrappers for the new NCBI BLAST C++ tools, which will be replacing the old NCBI "legacy" BLAST tools written in C. The plain text BLAST parser has been updated to cope as well. Nevertheless, we (and the NCBI) still recommend using the XML output for parsing. Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for parsing MedLine/PubMed data. The NCBI codon tables have been updated from version 3.4 to 3.9, which adds a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23). The restriction enzyme list in Bio.Restriction has been updated to the Nov 2009 release of REBASE. The Bio.PDB parser and output code has been updated to understand the element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been updated for recent changes to the PDB FTP site.
Finally, support for running Biopython under Jython (using the Java Virtual Machine) has been much improved. Note that Jython does not support C code, and currently Jython does not parse DTD files (needed for the Bio.Entrez XML parser). However, most of the Biopython modules seem fine from testing Jython 2.5.0 and 2.5.1. Sources and Windows Installers are available from our downloads page. Thanks to the Biopython development team and to everyone who has reported bugs or contributed patches since our last release. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:32:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:28 -0500 Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary Jython Error Fix+Patch In-Reply-To: Message-ID: <200912151632.nBFGWS6a022173@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2895 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:32 EST ------- Fixed in Biopython 1.53, using a similar technique but complicated because this file is generated by a separate script. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:32:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:46 -0500 Subject: [Biopython-dev] [Bug 2892] Jython MatrixInfo.py fix+patch In-Reply-To: Message-ID: <200912151632.nBFGWkSA022203@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2892 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:32 EST ------- Fixed in Biopython 1.53 using a similar technique. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:32:48 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:48 -0500 Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary Jython Error Fix+Patch In-Reply-To: Message-ID: <200912151632.nBFGWm0Q022215@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2895 Bug 2895 depends on bug 2892, which changed state. Bug 2892 Summary: Jython MatrixInfo.py fix+patch http://bugzilla.open-bio.org/show_bug.cgi?id=2892 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:32:51 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:51 -0500 Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch In-Reply-To: Message-ID: <200912151632.nBFGWpCp022227@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2893 Bug 2893 depends on bug 2892, which changed state. Bug 2892 Summary: Jython MatrixInfo.py fix+patch http://bugzilla.open-bio.org/show_bug.cgi?id=2892 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:33:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:33:13 -0500 Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch In-Reply-To: Message-ID: <200912151633.nBFGXD3Y022254@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2893 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:33 EST ------- Fixed in Biopython 1.53 using a similar technique -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:33:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:33:15 -0500 Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary Jython Error Fix+Patch In-Reply-To: Message-ID: <200912151633.nBFGXFa7022266@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2895 Bug 2895 depends on bug 2893, which changed state. Bug 2893 Summary: Jython test_prosite fix+patch http://bugzilla.open-bio.org/show_bug.cgi?id=2893 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:41:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:41:30 -0500 Subject: [Biopython-dev] [Bug 2807] Clustalw return codes In-Reply-To: Message-ID: <200912151641.nBFGfUpS022532@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2807 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:41 EST ------- Bio.Clustalw was declared obsolete in Release 1.52, so there is no reason to add better support for return codes. With the new alignment wrappers and subprocess this is a non-issue. Marking as "won't fix". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
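As a rough illustration of the point made in the Bug 2807 comment above: when an external tool is launched through the standard subprocess module, its return code is available directly, so no special support is needed. This is only a sketch. The helper name run_tool is hypothetical and not part of Biopython, and the Python interpreter itself stands in for a command line alignment tool.

```python
import subprocess
import sys


def run_tool(args):
    """Run an external command and return (return code, stdout, stderr).

    Hypothetical helper for illustration only; not a Biopython function.
    """
    child = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # give back text rather than bytes
    )
    stdout, stderr = child.communicate()  # waits for the child to exit
    return child.returncode, stdout, stderr


# Stand-in for an alignment tool that fails with exit status 3.
failing_command = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('bad input\\n'); sys.exit(3)",
]
code, out, err = run_tool(failing_command)
if code != 0:
    print("Tool failed with return code %i: %s" % (code, err.strip()))
```

The caller decides what a non-zero return code means, which is why better return code handling in the obsolete module was judged unnecessary.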
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 11:46:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:46:17 -0500 Subject: [Biopython-dev] [Bug 2820] Convert test_PDB.py to unittest In-Reply-To: Message-ID: <200912151646.nBFGkHAG022705@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2820 ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:46 EST ------- (In reply to comment #1) > > I've checked in a slightly modified version as test_PDB_unit.py - I think > having both this and the original test_PDB.py is sensible in the short term. > I've just removed old print-and-compare test_PDB.py, then renamed test_PDB_unit.py to test_PDB.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Dec 15 12:01:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 17:01:38 +0000 Subject: [Biopython-dev] Biopython 1.53 released Message-ID: <320fb6e00912150901k138ae04bmc5d5af9c867340ec@mail.gmail.com> Dear Biopythoneers, We are pleased to announce the availability of Biopython 1.53, a new stable release of the Biopython library, three months after the release of Biopython 1.52. This is our first release since migrating from CVS to git for source code control. There have been some additions to our core objects - the Seq (and related UnknownSeq) objects gained upper and lower methods (like the string methods of the same name but alphabet aware) plus a new ungap method. The SeqFeature object now has an extract method to get the region of sequence it describes (useful for getting CDS nucleotide sequences from GenBank files).
Also SeqRecord objects now support addition, giving a new SeqRecord with the combined sequence, all the SeqFeatures, and any common annotation. SQLite support (built into Python 2.5+) was added to our BioSQL interface. This is still a little experimental as we are using a draft BioSQL SQLite schema, but this should be merged into the next BioSQL release. Biopython now includes wrappers for the new NCBI BLAST C++ tools, which will be replacing the old NCBI "legacy" BLAST tools written in C. The plain text BLAST parser has been updated to cope as well. Nevertheless, we (and the NCBI) still recommend using the XML output for parsing. Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for parsing MedLine/PubMed data. The NCBI codon tables have been updated from version 3.4 to 3.9, which adds a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23). The restriction enzyme list in Bio.Restriction has been updated to the Nov 2009 release of REBASE. The Bio.PDB parser and output code has been updated to understand the element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been updated for recent changes to the PDB FTP site. Finally, support for running Biopython under Jython (using the Java Virtual Machine) has been much improved. Note that Jython does not support C code, and currently Jython does not parse DTD files (needed for the Bio.Entrez XML parser). However, most of the Biopython modules seem fine from testing Jython 2.5.0 and 2.5.1. Sources and Windows Installers are available from our downloads page. Thanks to the Biopython development team and to everyone who has reported bugs or contributed patches since our last release. --Peter, on behalf of the Biopython developers P.S. This news post is online at http://news.open-bio.org/news/2009/12/biopython-release-153/ You may wish to subscribe to our news feed.
For RSS links etc, see: http://biopython.org/wiki/News Biopython news is also on twitter: http://twitter.com/biopython From chapmanb at 50mail.com Wed Dec 16 07:42:35 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 16 Dec 2009 07:42:35 -0500 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> Message-ID: <20091216124235.GK78379@sobchak.mgh.harvard.edu> Hi Peter; > >> I plan to do the Biopython 1.53 release this afternoon (in a few hours time). Sorry I am too slow with your mails. Thanks for the hard work getting this together. Great stuff. > > Brad - could you run a sanity test before uploading to pypi? Looks good to me, and uploaded to pypi. > Draft text below - any comments? As a thought for next time, what do you think about adding the names of people who have worked on the items mentioned in the release? This would give a bit more public recognition for the contributions, especially to people who only look at the release notes and not mailing list traffic. Thanks again, Brad From biopython at maubp.freeserve.co.uk Wed Dec 16 17:43:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Dec 2009 22:43:16 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <20091216124235.GK78379@sobchak.mgh.harvard.edu> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> <20091216124235.GK78379@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912161443q30f82120of1c98b073136c3f6@mail.gmail.com> Brad wrote: >> Brad - could you run a sanity test before uploading to pypi? > > Looks good to me, and uploaded to pypi. 
Great, thank you. >> Draft text below - any comments? > > As a thought for next time, what do you think about adding the > names of people who have worked on the items mentioned in the > release? This would give a bit more public recognition for the > contributions, especially to people who only look at the release > notes and not mailing list traffic. It's too late for the emails and the source code bundles, but the nice thing about the NEWS file (in the repository) and the OBF news server is we can update them even now. Of course, quite where to draw the line is debatable - a simple patch probably doesn't warrant it (or does it?), but solving a more complex bug or adding some new functionality does. If any existing core developers want more "recognition" we can do that too. For example, Kyle, would you have liked to be named with regard to the Jython work? I almost put you in anyway, but in the end just mentioned it on twitter: http://twitter.com/Biopython/statuses/6502469425 Another idea to showcase new features would be for the author(s) to prepare a (credited) blog post with some examples (to put on our news server). I have already done a few like this, and think it would also be a good thing in general. Peter From kellrott at gmail.com Wed Dec 16 20:39:49 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Wed, 16 Dec 2009 17:39:49 -0800 Subject: [Biopython-dev] zxJDBC support for BioSQL Message-ID: I've pushed a patch to the BioSQL code that enables zxJDBC support. This means that Jython can now run BioSQL through mysql. (SQLite hasn't been ported to Java yet) zxJDBC is a Jython module included in the standard distribution that provides a PythonDB interface through the java sql interfaces. I've only run the unit tests using the mysql-connector, but it should theoretically work with Oracle as well. The biggest issues for changing code: - Java expects ?
instead of %s, so SQL strings have to be altered (I override the execute method in the DBUtils to run a regular expression before execution) - An SQL string with a=? works, one with a='?' does not (Loader.py had some examples of this) - Java returns unicode, not strings (recent patch to the mainline fixes this) Code can be found at http://github.com/kellrott/biopython Kyle From biopython at maubp.freeserve.co.uk Thu Dec 17 05:46:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 10:46:37 +0000 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: References: Message-ID: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott wrote: > > I've pushed a patch to the BioSQL code that enables zxJDBC support. > This means that Jython can now run BioSQL through mysql. (SQLite hasn't > been ported to Java yet) > zxJDBC is a Jython module included in the standard distribution that > provides a PythonDB interface through the java sql interfaces. I've only > run the unit tests using the mysql-connector, but it should theoretically > work with Oracle as well. Sounds good, and ought to work on PostgreSQL too in theory. I should be able to test it on MySQL. > The biggest issues for changing code: > - Java expects ? instead of %s, so SQL strings have to be altered (I > override the execute method in the DBUtils to run a regular expression > before execution) > - An SQL string with a=? works, one with a='?' does not (Loader.py had some > examples of this) > - Java returns unicode, not strings (recent patch to the mainline fixes > this) Some of those issues applied to SQLite (hence the changes on the trunk from Brad). > Code can be found at http://github.com/kellrott/biopython Lovely. That's on your jython branch (along with lots of your other work)?
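For readers following the thread, the placeholder rewrite Kyle describes can be sketched roughly as below. This is not the code from his branch. The function name to_qmark and the example queries are made up for illustration, and the sketch assumes that every %s in the statement is a parameter placeholder (it does not try to handle a literal %% or a %s inside a longer quoted string).

```python
import re

# Matches a placeholder wrapped in single quotes ('%s') or a bare %s.
# The quoted form is listed first so the quotes are removed along with it,
# since a='?' does not work as a parameter marker.
_PLACEHOLDER = re.compile(r"'%s'|%s")


def to_qmark(sql):
    """Convert 'format' style placeholders (%s) to 'qmark' style (?).

    Hypothetical helper for illustration; not part of BioSQL or zxJDBC.
    """
    return _PLACEHOLDER.sub("?", sql)


# Illustrative queries only.
print(to_qmark("SELECT bioentry_id FROM bioentry WHERE accession = %s"))
print(to_qmark("SELECT bioentry_id FROM bioentry WHERE name='%s' AND version=%s"))
```

A wrapper around the cursor's execute method could apply such a function to each statement before passing it on, which matches the approach described in the message.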
Peter From biopython at maubp.freeserve.co.uk Thu Dec 17 08:31:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 13:31:30 +0000 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> References: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> Message-ID: <320fb6e00912170531j3f9c9b38n123e0464fa536e45@mail.gmail.com> On Thu, Dec 17, 2009 at 10:46 AM, Peter wrote: > On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott wrote: >> >> I've pushed a patch to the BioSQL code that enables zxJDBC support. >> This means that Jython can now run BioSQL through mysql. (SQLite hasn't >> been ported to Java yet) >> zxJDBC is a Jython module included in the standard distribution that >> provides a PythonDB interface through the java sql interfaces. I've only >> run the unit tests using the mysql-connector, but it should theoretically >> work with Oracle as well. > > Sounds good, and ought to work on PostgreSQL too in theory. > > I should be able to test it on MySQL. I worked out I needed to install MySQL Connector/J so that org.gjt.mm.mysql.Driver works in Jython; get it from here: http://dev.mysql.com/downloads/connector/j/ Installation seems to be just unzipping this and updating your CLASSPATH environment variable to point at the jar file. Peter From biopython at maubp.freeserve.co.uk Thu Dec 17 09:54:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 14:54:13 +0000 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> References: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> Message-ID: <320fb6e00912170654g41bc8c4eyce0f56b4472076f9@mail.gmail.com> On Thu, Dec 17, 2009 at 10:46 AM, Peter wrote: > On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott wrote: >> >> I've pushed a patch to the BioSQL code that enables zxJDBC support.
>> This means that Jython can now run BioSQL through mysql. (SQLite hasn't >> been ported to Java yet) Maybe one day Jython will have a Python sqlite3-like library built in: http://bugs.jython.org/issue1682864 For now it looks like we could probably use SQLite via zxJDBC instead (see links on that Jython issue). Peter From kellrott at gmail.com Thu Dec 17 13:03:38 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Thu, 17 Dec 2009 10:03:38 -0800 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> References: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> Message-ID: > > Code can be found at http://github.com/kellrott/biopython > > Lovely. That's on your jython branch (along with lots of your other work)? > Yes, but all of the zxJDBC work has been done in the past 2 weeks (just the last three commits), so it should be easy to cherry-pick out the relevant patches. Kyle From mhampton at d.umn.edu Thu Dec 17 13:42:33 2009 From: mhampton at d.umn.edu (Marshall Hampton) Date: Thu, 17 Dec 2009 12:42:33 -0600 (CST) Subject: [Biopython-dev] code credits In-Reply-To: References: Message-ID: I strongly encourage you to list anyone who has contributed a patch, no matter how small. This has worked very well for the Sage project (www.sagemath.org) where credit is given to all contributors and reviewers (every patch must be reviewed by at least one other person).
For example see: http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f Marshall Hampton Department of Mathematics and Statistics University of Minnesota, Duluth > Message: 1 > Date: Wed, 16 Dec 2009 22:43:16 +0000 > From: Peter > Subject: Re: [Biopython-dev] Code freeze for Biopython 1.53 > To: Brad Chapman , biopython-dev at biopython.org > Message-ID: > <320fb6e00912161443q30f82120of1c98b073136c3f6 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Brad wrote: >>> Brad - could you run a sanity test before uploading to pypi? >> >> Looks good to me, and uploaded to pypi. > > Great, thank you. > >>> Draft text below - any comments? >> >> As a thought for next time, what do you think about adding the >> names of people who have worked on the items mentioned in the >> release? This would give a bit more public recognition for the >> contributions, especially to people who only look at the release >> notes and not mailing list traffic. > > Its too late for the emails and the source code bundles, but > the nice thing about the NEWS file (in the repository) and > the OBF news server is we can update them even now. > > Of course, quite where to draw the line is debatable - a simple > patch probably doesn't warrant it (or does it?), but solving a > more complex bug or adding some new functionality does. > If any existing core developers want more "recognition" we > can do that too. > > For example, Kyle, would you have like to be named with > regards to the Jython work? I almost put you in anyway, > but in the end just mentioned it on twitter: > http://twitter.com/Biopython/statuses/6502469425 > > Another idea to showcase new features would be for the > author(s) to prepare a (credited) blog post with some > examples (to put on our news server). I have already done > a few like this, and think it would also be a good thing in > general. 
> > Peter From kellrott at gmail.com Thu Dec 17 16:20:10 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Thu, 17 Dec 2009 13:20:10 -0800 Subject: [Biopython-dev] code credits In-Reply-To: References: Message-ID: I would agree with that. Drawing from broad stereotypes, I would think that a majority of contributors are academic and would be most interested in adding things to their CV. So acknowledgment would be of great value to them at no real cost to the Biopython project. Plus there's the old idea that the more authors a paper has the more important it must be. Kyle > I strongly encourage you to list anyone who has contributed a patch, no > matter how small. This has worked very well for the Sage project ( > www.sagemath.org) where credit is given to all contributors and reviewers > (every patch must be reviewed by at least one other person). For example > see: > > http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f > > Marshall Hampton > Department of Mathematics and Statistics > University of Minnesota, Duluth > > From tallpaulinjax at yahoo.com Thu Dec 17 16:48:25 2009 From: tallpaulinjax at yahoo.com (Paul B) Date: Thu, 17 Dec 2009 13:48:25 -0800 (PST) Subject: [Biopython-dev] code credits In-Reply-To: Message-ID: <928490.72367.qm@web30708.mail.mud.yahoo.com> I also agree completely. Adding value to the code deserves some form of credit, if desired by the contributor. I fixed a bit of code in a couple of the modules and received no credit... that made me a good bit less gung ho about contributing more. --- On Thu, 12/17/09, Kyle Ellrott wrote: From: Kyle Ellrott Subject: Re: [Biopython-dev] code credits To: "Marshall Hampton" Cc: biopython-dev at lists.open-bio.org Date: Thursday, December 17, 2009, 4:20 PM I would agree with that. Drawing from broad stereotypes, I would think that a majority of contributors are academic and would be most interested in adding things to their CV.
So acknowledgment would be of great value to them at no real cost to the Biopython project. Plus there's the old idea that the more authors a paper has the more important it must be. Kyle I strongly encourage you to list anyone who has contributed a patch, no > matter how small. This has worked very well for the Sage project ( > www.sagemath.org) where credit is given to all contributors and reviewers > (every patch must be reviewed by at least one other person). For example > see: > > http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f > > Marshall Hampton > Department of Mathematics and Statistics > University of Minnesota, Duluth > > _______________________________________________ Biopython-dev mailing list Biopython-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Thu Dec 17 17:54:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 22:54:40 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <928490.72367.qm@web30708.mail.mud.yahoo.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> Message-ID: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> Hi all, Marshall Hampton's description of how they do it on Sage sounds worth trying - if we keep track as things are checked in, it won't be too much work either. Do you (sage) have a list of guidelines for what qualifies for a credit? On Thu, Dec 17, 2009 at 9:48 PM, Paul B wrote: > > I also agree completely. Adding value to the code deserves > some form of credit, if desired by the contributor. I fixed a bit > of code in a couple of the modules and received no credit... > that made me a good bit less gung ho about contributing more.
> Sorry :( You did get some credit though: you were named in the commit: http://github.com/biopython/biopython/commit/225fb0eb92c99018c3710c3ec5ac0b22e9706208 Also, people who offer changes via github that can be merged cleanly onto the trunk, or cherry-picked, would automatically get a credit in the repository history. Would someone like to go through the git log for Biopython 1.53 for a full list? e.g. Hongbo Zhu and Frederik Gwinner contributed to a PDB enhancement (Bug 2495), and as he pointed out, so did Paul B (again, PDB stuff). These were the "border line" cases I had in mind here: http://lists.open-bio.org/pipermail/biopython-dev/2009-December/007161.html From personal experience contributing to other open source projects, getting a credit in release notes even for a small bug fix/enhancement as in sage is rare. So while I thought I was following OS norms in writing the last release notes, we can certainly do this differently in future. Regards, Peter From mhampton at d.umn.edu Thu Dec 17 20:54:00 2009 From: mhampton at d.umn.edu (Marshall Hampton) Date: Thu, 17 Dec 2009 19:54:00 -0600 (CST) Subject: [Biopython-dev] code credits In-Reply-To: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> Message-ID: On Thu, 17 Dec 2009, Peter wrote: > Marshall Hampton's description of how they do it on Sage > sounds worth trying - if we keep track as things are checked > in, it won't be too much work either. Do you (sage) have a > list of guidelines for what qualifies for a credit? I don't think we have formal guidelines, but the process is pretty simple. Whoever works on a patch in our bug/feature tracker has to flag it for review. Both the person who implements the patch and the reviewer get credit. It doesn't matter if it's a 1-character change to the documentation, they're listed in the release notes.
Basically, the idea is to err (if that's the right word) on the side of acknowledging any contribution. I think that Sage (really William Stein initially) adopting that philosophy is one of the reasons its gone from 1 to 150 or so developers. I'm cc'ing sage-devel in case anyone there wants to comment on this. Cheers, Marshall Hampton Department of Mathematics and Statistics University of Minnesota, Duluth From bugzilla-daemon at portal.open-bio.org Fri Dec 18 04:44:02 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 04:44:02 -0500 Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error) In-Reply-To: Message-ID: <200912180944.nBI9i22n007947@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2938 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 04:44 EST ------- The offending XML file (the one that does not start with Message-ID: <200912180946.nBI9kjFA008009@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 04:46 EST ------- Peter, are you still looking at this bug report? Otherwise I could have a look at it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 05:00:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:00:50 -0500 Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy In-Reply-To: Message-ID: <200912181000.nBIA0opL008316@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2698 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 05:00 EST ------- Thanks for your test! I would like to simplify the code a bit though. 
How about replacing

    ix, iy = expand_count([0, 0, 1], 'C', 40)
    xm.extend(ix)
    ym.extend(iy)

by

    xm.extend([0,0,1] * 40)
    ym.extend(['C'] * 40)

Or, you could replace this whole section by

    xm = [0,0,1]*40 + [0,0,1]*60 + [0,1,0]*75 + [0,1,0]*25 + [1,0,0]*90 + [1,0,0]*10

and similarly for ym.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 05:08:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:08:24 -0500 Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient In-Reply-To: Message-ID: <200912181008.nBIA8Ogf008537@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2693 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 05:08 EST ------- Sorry for not getting back to this bug report earlier. (In reply to comment #3) > > Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn > > can store the value of llik on each call. > > I guess this is all how you define the purpose of the update_fn function. > Do you have an example of the update_fn function where old_llik is needed? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
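A side note on the list simplification suggested in the Bug 2698 comment above. Repeating a list with * concatenates copies of its elements, so whether [0,0,1] * 40 is the right replacement depends on what the test's expand_count helper returns, which is not shown in this thread. The sketch below assumes, purely for illustration, that each observation is meant to be its own three-element list; under that assumption the nested form is the one that gives forty observations.

```python
# Flat repetition: the three numbers are repeated, giving 120 integers.
flat = [0, 0, 1] * 40

# Nested repetition: forty observations, each a three-element list.
rows = [[0, 0, 1]] * 40
labels = ["C"] * 40

print("%i %i %i" % (len(flat), len(rows), len(labels)))

# With the * form every row is the same list object, which is fine for
# read-only training data. Build separate lists if rows will be modified.
independent_rows = [[0, 0, 1] for _ in range(40)]
```

If the observations really are meant to be flat sequences of numbers, the form given in the comment is already what is wanted.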
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 05:17:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:17:12 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200912181017.nBIAHCJN008837@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 05:17 EST -------
One option is to store these variables inside the function. As an example, if this is a module mymodule.py:

def f(x = None):
    if x is None:
        x = f.x
    print(x)
# the default is stored as an attribute of the function itself
f.x = 3

then we can do the following:

>>> import mymodule
>>> mymodule.f()
3
>>> mymodule.f(5)
5
>>> mymodule.f.x = 9
>>> mymodule.f(5)
5
>>> mymodule.f()
9
>>>

But personally, I think that having module-level defaults is not really necessary. We typically don't have that for other functions, and the only reason for having them here is that once upon a time this module had such module-level defaults.
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 05:24:35 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:24:35 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912181024.nBIAOZj6009054@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:24 EST -------
(In reply to comment #11)
> Peter, are you still looking at this bug report?
> Otherwise I could have a look at it.
Thanks Michiel - Please feel free.
I didn't feel we had time to get this into Biopython 1.53, as I think it is going to be a lot of work to assess, but needs to be done. I think there are two issues here, poor support for multiple models, and re-writing the flex parser in pure python. Given time (!) I would want to take Paul's python parser and use it to replace the flex code (which is currently not compiled or installed by default, Bug 2619) and verify it is backwards compatible, and then add in the model support. If we have enough test coverage already, then doing it in one go might be OK. Up to you. Other relevant issues include Bug 2626 (files the current parser can't read - it may turn out that these are also multi-model CIF files). Also regarding the model support, for PDB files we currently index them 0,1,2,... as found in the file. There are also names given in the PDB file itself, which need not by continuous etc. See Bug 2950 and Bug 2951 for this. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 05:44:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:44:13 -0500 Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error) In-Reply-To: Message-ID: <200912181044.nBIAiDD6009554@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2938 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:44 EST ------- (In reply to comment #6) > The offending XML file (the one that does not start with efetch from the journals database. Upon the EUtils documentation more > carefully, it seems that XML output from the journals database is not > officially supported; only text and html output are supported. 
One option is > to simply remove the offending XML file from the tests, and raise an error > whenever Entrez.read is presented with data that do not start with Additionally, we could add a parser for the text output generated by efetch > from the journals database. Hmm - sounds like a plan, but maybe drop the Entrez team a query about this. Does the current funny XML file have anything useful in it? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 05:50:03 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:50:03 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200912181050.nBIAo39q009740@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:50 EST ------- (In reply to comment #12) > > But personally, I think that having module-level defaults is not really > necessary. We typically don't have that for other functions, and the only > reason for having them here is that once upon a time this module had such > module-level defaults. I agree the module-level defaults are not necessary - but it would be "nice" to have a transition where both can be used. In reality, I may being overly cautious - doubt it would affect many (any?) users to just make a clean switch (which would keep the code simple). I'm happy to leave this to your judgement Michiel. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
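A minimal sketch of the transition discussed in comments #12 and #13 on Bug 2697, where an explicit keyword argument and a legacy module-level default can both be used. The names MAX_ITERATIONS and train are hypothetical and are not taken from Bio.MaxEntropy.

```python
MAX_ITERATIONS = 1000   # hypothetical legacy module-level default


def train(data, max_iterations=None):
    """Toy stand-in: returns the iteration limit that would be used."""
    if max_iterations is None:
        # Looked up at call time, so old code that still rebinds the
        # module-level name keeps working during the transition.
        max_iterations = MAX_ITERATIONS
    return max_iterations


default_limit = train([])                      # falls back to the module-level value
explicit_limit = train([], max_iterations=5)   # the keyword argument wins
```

Making a clean switch, as suggested in the thread, would amount to dropping the module-level name and giving the keyword argument a literal default.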
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 05:54:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:54:24 -0500 Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path In-Reply-To: Message-ID: <200912181054.nBIAsOIw009914@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2947 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:54 EST ------- (In reply to comment #0) > > Thus it appears to me that the viterbi algorithm is not robust enough > and biased towards the last letter of the state alphabet. Quite possibly. Might there be a bug in our code, or do you think this is just an inherent algorithm limitation? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 06:53:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 06:53:14 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200912181153.nBIBrEQi011286@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 06:53 EST ------- Fixed in github. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 09:12:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 09:12:26 -0500 Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path In-Reply-To: Message-ID: <200912181412.nBIECQ59014801@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2947 ------- Comment #2 from georg.lipps at fhnw.ch 2009-12-18 09:12 EST ------- Hi Peter, I am not an expert on the Viterbi algorithm. But as such the algorithm does not do what it is expected to do. So I guess it is indeed an error in the implementation. I would be very happy if it can be fixed. Greetings, Georg -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 11:15:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 11:15:24 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912181615.nBIGFOD2017597@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #13 from TallPaulInJax at yahoo.com 2009-12-18 11:15 EST ------- Michiel, if you have any questions please feel free to contact me! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From rjalves at igc.gulbenkian.pt Fri Dec 18 18:39:28 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Fri, 18 Dec 2009 23:39:28 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> Message-ID: <4B2C12B0.9060806@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Sorry to take this to the discussion list, took a bit longer than I expected to get the approval. Bringing now the subject to the right place. Leaving full quote history to help the reading. Quoting Peter on 12/18/2009 09:39 PM: > Hi Renato, > > I'm cooking dinner while writing this, so it won't be as in depth as > usual... > > On Fri, Dec 18, 2009 at 5:17 PM, Renato Alves wrote: >> [I tried submitting this message to the dev mailing list, but got >> rejected since I'm not yet authorized to post there, so here it goes] > > Have you definitely subscribed to the dev list? That should be all that > is required to post there, and this discussion would be better suited > there. > >> Hi everyone, >> >> I'm working on changes to the Bio.SeqIO.index() function to make it more >> consistent with the .read and .parse i.e. accept a filehandle instead of >> a filename and also to include a way to cache the index into a file to >> speed up the process. >> >> The reason why we are implementing these two is because we were going to >> implement our own index solution until we realized this was added to 1.52. >> >> However the implementation in 1.52 has a few limitations. > > Yes, this was designed to cover basic use cases in a general way, > but with the option in future to do other things - and in particular > saving the index to disk was kept in mind. 
> >> One limitation is that we are using a gzipped database for the sake of >> space and using gzip.open() to create the file-handle that would then be >> passed to .parse(). The same was not doable with .index(). >> This is already implemented in >> http://github.com/Unode/biopython/commit/6fc390151452e3ddf26a117269132125a3ffb3fe > > That was a deliberate choice in that the index code wants to "own" > the handle. If other code has access to the handle, there is a risky > of different bits of code moving the handle pointer etc. But, if you > are careful it could be done. The way I approached it was to reset the handle pointer to the first position, since we would like to index the full file. But I understand that if the user uses the same handle on different files weird results may happen. Something that could be a simple workaround would be to copy the filehandle object in such a way that it's properties are maintained (like being a gzip.open() filehandle) but it's use doesn't affect the use of the original handle. However I don't know if this is possible. > > There are also issues here in combination with saving the index. > With a filename, the code can easily reopen the file in the same > mode. With a handle, things are more tricky. You have non-file > handles to consider - such as the gzip example. There is also the > problem of recording the file mode (normal text, universal text, > or binary - which we will need for SFF files - code already written). > I see, only after your comment I realized handle.name and handle.mode are only available in normal filehandles. The gzip.open() example stores the filename in .filename while the .mode seems to have a different meaning. > If we do change the code to allow handles, it would have to be > to allow handles OR filenames to be compatible with Biopython > 1.52 and 1.53 (which take just filenames). This could be handled > as in Bio.SeqIO.convert(), which also allows both (which was the > subject of some discussion!). 
> I'll have to look more on the example and consider the fact that my current implementation breaks compatibility with previous code and that not everything needed (filename, mode,...) is accessible in filehandles. >> The second is that we are going to use this feature to quick search the >> database in a web application. Here we have the limitation that we don't >> have persistence across web requests, which means that we would need to >> recalculate the index on every web request. >> >> The details of how we plan to implement this are the following: >> >> cPickle the internal dictionary of offsets and save it on the database >> folder with the same name as the database + .index. The consistency >> check on whether the file has changed will be performed based on name >> and timestamp. By default .index() will search for this file, check the >> timestamp and use the cache if they match, otherwise they will be >> recalculated. The save function will be available like: >> >>>>> d = SeqIO.index(...) >>>>> d.save(filename) >> where filename is optional and defaults to "%s.index" % _handle.name >> >> We already have a solution like this implemented with subclasses of >> SeqIO._index, it's just a matter of reworking that and merge it into >> BioPython if you consider a good addition to the code. >> >> I would like to hear your comments and suggestions on this. > > Yes, saving indexes is an obvious addition. I have explored > using pickle via shelve, and also SQLite - there are > implementations of this on my github respository, plus > begun to look into the existing OBF Open Biological > Database Access (OBDA) specification for cross project > compatibility. Other potential benefits here are reduced > memory usage if we don't keep the dictionary > of offsets in RAM. I did try to use pickle directly on the dict like object that is returned from SeqIO.index() but pickle was not happy with it. 
The SQLite approach also crossed my mind and also BioSQL or just some custom SQL database, but the RAM approach seemed good enough, at least for our current uses. I can see though that some file formats will require a lot more RAM depending on what is indexed and their size. In the end it came out as cPickled dictionaries for faster access. > > http://github.com/peterjc/biopython/tree/index-shelve > http://github.com/peterjc/biopython/tree/index-sqlite > > There is a potential complication with index sub-classes > which do more specialised indexing (e.g. GenBank files, > and for a more extreme case, SFF files). See: > http://github.com/peterjc/biopython/tree/sff-seqio For these I would have to do it on a unittest base, I'm not familiar with the formats. Also the implementation I did was based on the current master branch of biopython. I now realize a lot more has been done outside of it that I should look into. > > Anyway - great to see you are finding the code useful, > and have some quite similar ideas for how to extend > it further. > > Peter Thanks for all that info, I have a lot to dig into and see if I can actually contribute with something. 
You seem to have pretty much everything sorted ;) Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkssEqkACgkQYh11EUYTX9QWHwCeOIuuaEGA3qLvB1EHamDohpZ3 bj0AnRAkP9jOGpvTnSc0W7YgFyX/Ard/ =S45W -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Sat Dec 19 04:57:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 19 Dec 2009 09:57:25 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <4B2C12B0.9060806@igc.gulbenkian.pt> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> Message-ID: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> On Fri, Dec 18, 2009 at 11:39 PM, Renato Alves wrote: > Sorry to take this to the discussion list, took a bit longer than I > expected to get the approval. > > Bringing now the subject to the right place. Leaving full quote history > to help the reading. Thanks. >> That was a deliberate choice in that the index code wants to "own" >> the handle. If other code has access to the handle, there is a risk >> of different bits of code moving the handle pointer etc. But, if you >> are careful it could be done. > > The way I approached it was to reset the handle pointer to the first > position, since we would like to index the full file. But I understand > that if the user uses the same handle on different files weird results > may happen. OK > Something that could be a simple workaround would be to copy the > filehandle object in such a way that it's properties are maintained > (like being a gzip.open() filehandle) but it's use doesn't affect the > use of the original handle. However I don't know if this is possible. That may work for some handles but not others. Worth trying. >> There are also issues here in combination with saving the index. 
>> With a filename, the code can easily reopen the file in the same >> mode. With a handle, things are more tricky. You have non-file >> handles to consider - such as the gzip example. There is also the >> problem of recording the file mode (normal text, universal text, >> or binary - which we will need for SFF files - code already written). > > I see, only after your comment I realized handle.name and handle.mode > are only available in normal filehandles. The gzip.open() example stores > the filename in .filename while the .mode seems to have a different > meaning. That would make finding out the filename from a handle tricky. >> If we do change the code to allow handles, it would have to be >> to allow handles OR filenames to be compatible with Biopython >> 1.52 and 1.53 (which take just filenames). This could be handled >> as in Bio.SeqIO.convert(), which also allows both (which was the >> subject of some discussion!). > > I'll have to look more on the example and consider the fact that my > current implementation breaks compatibility with previous code and that > not everything needed (filename, mode,...) is accessible in filehandles. OK. >> Yes, saving indexes is an obvious addition. I have explored >> using pickle via shelve, and also SQLite - there are >> implementations of this on my github respository, plus >> begun to look into the existing OBF Open Biological >> Database Access (OBDA) specification for cross project >> compatibility. Other potential benefits here are reduced >> memory usage if we don't keep the dictionary >> of offsets in RAM. > > I did try to use pickle directly on the dict like object that is > returned from SeqIO.index() but pickle was not happy with it. The SQLite > approach also crossed my mind and also BioSQL or just some custom SQL > database, but the RAM approach seemed good enough, at least for our > current uses. I can see though that some file formats will require a lot > more RAM depending on what is indexed and their size. 
In the end it came > out as cPickled dictionaries for faster access. I agree that an in RAM dictionary works pretty well, even for very large sequence files. In terms of speed, I would expect a two step build index in memory, then save to disk, to be faster than building the index database on disk (which was a bit slow). >> There is a potential complication with index sub-classes >> which do more specialised indexing (e.g. GenBank files, >> and for a more extreme case, SFF files). See: >> http://github.com/peterjc/biopython/tree/sff-seqio > > For these I would have to do it on a unittest base, I'm not familiar > with the formats. Also the implementation I did was based on > the current master branch of biopython. I now realize a lot more > has been done outside of it that I should look into. I'm sorry if the discussion on the (dev) mailing list wasn't clearer - but having a fresh set of eyes looking at the topic is very useful. >> Anyway - great to see you are finding the code useful, >> and have some quite similar ideas for how to extend >> it further. > > Thanks for all that info, I have a lot to dig into and see if I can > actually contribute with something. You seem to have pretty much > everything sorted ;) Well, i hadn't been thinking about gzipped files (or any archives). How does gzip behave with memory use? I assume it doesn't load everything into RAM, but does allow you random access (seek and tell). This is a vague idea (which I haven't tried yet), but maybe the Bio.SeqIO.index() function could take an optional argument (gzip=True, or something more general like archive=...) which would cause the file to be opened via the gzip module instead? 
Regards, Peter From bugzilla-daemon at portal.open-bio.org Sat Dec 19 06:02:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Dec 2009 06:02:44 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912191102.nBJB2iOb014900@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 ------- Comment #6 from robfsouza at gmail.com 2009-12-19 06:02 EST ------- Created an attachment (id=1412) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1412&action=view) Testcase for NCBI's BLAST alignment with errors This is a sequence from Naegleria gruberi and blastpgp output which reproduces a reported bug in NCBI's blastpgp output at the first iteration (see hit against sequence gi|156552846|ref|XP_001600053.1). Search parameters were blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000 -v 10000 -h 0.01 -I T -m 0 -M BLOSUM62 -F F -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 19 06:21:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Dec 2009 06:21:13 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912191121.nBJBLDax015457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 robfsouza at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1412 is|0 |1 obsolete| | ------- Comment #7 from robfsouza at gmail.com 2009-12-19 06:21 EST ------- Created an attachment (id=1413) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1413&action=view) Testcase for NCBI's BLAST alignment with errors Sending the right query sequence now (my mistake! :)) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 19 07:09:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Dec 2009 07:09:57 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912191209.nBJC9vxr016459@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 ------- Comment #8 from ibdeno at gmail.com 2009-12-19 07:09 EST ------- (In reply to comment #7) Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22 using Robson's test case. Thanks to Robson for this and apologies for not having been able to send a test case. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
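On the gzip question raised earlier in the SeqIO.index thread (does a gzip handle allow seek and tell, and could the indexing code be told how to open the file?), the following is a small sketch using only the standard library. The function index_offsets and its opener parameter are hypothetical and are not part of Bio.SeqIO; the example only records the offset of each FASTA header line.

```python
import gzip
import os
import tempfile


def index_offsets(filename, opener=open):
    """Map each FASTA record id to the offset of its '>' line.

    `opener` is a hypothetical parameter: any callable taking
    (filename, mode) and returning a binary handle that supports
    tell() and seek(), for example the built-in open or gzip.open.
    """
    offsets = {}
    handle = opener(filename, "rb")
    try:
        while True:
            start = handle.tell()      # for gzip: offset in the uncompressed data
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                # the record id is the first word after '>'
                offsets[line[1:].split()[0].decode("ascii")] = start
    finally:
        handle.close()
    return offsets


# Build a small gzipped FASTA file to try it on.
path = os.path.join(tempfile.mkdtemp(), "example.fasta.gz")
with gzip.open(path, "wb") as out:
    out.write(b">alpha\nACGT\n>beta\nGGCC\n")

offsets = index_offsets(path, opener=gzip.open)

# Random access: jump straight to the second record.
with gzip.open(path, "rb") as handle:
    handle.seek(offsets["beta"])
    record_line = handle.readline()
```

The offsets refer to positions in the uncompressed stream. As far as I know the gzip module emulates seeking by decompressing up to the requested position, so jumps deep into a large file may be slow even though memory use stays low.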
From rjalves at igc.gulbenkian.pt Sat Dec 19 16:48:10 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Sat, 19 Dec 2009 21:48:10 +0000 Subject: [Biopython-dev] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> Message-ID: <4B2D4A1A.6@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Well, i hadn't been thinking about gzipped files (or any archives). > How does gzip behave with memory use? I assume it doesn't > load everything into RAM, but does allow you random access > (seek and tell). From what I can tell, in terms of RAM it behaves the same way as a normal open(): it only decompresses the segments as they are accessed but doesn't cache them. A reasonable trade-off between space and access time. > This is a vague idea (which I haven't tried yet), but maybe the > Bio.SeqIO.index() function could take an optional argument > (gzip=True, or something more general like archive=...) which > would cause the file to be opened via the gzip module instead? I thought about something similar but using a combination of extension of the file and magic (or actually python-magic[1]). The first one is potentially messy although it's how things are mostly done in Windows. The second one I couldn't confirm is available for Windows but is widely present in Linux (and I suppose MacOS too). In the end I dislike the idea of 'having' to use one approach or the other depending on the OS the code is running on; however, this would fit in without breaking any compatibility with current code.
1 - http://pypi.python.org/pypi/python-magic/0.1 Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.11 (GNU/Linux) iEYEARECAAYFAkstShgACgkQYh11EUYTX9Tu3wCglh6d3rt/ANU5J45bsceqcQ78 TQ0AnjgIlNhYRMqdzl4jBGYOPdMKOY7D =rqsi -----END PGP SIGNATURE----- From eric.talevich at gmail.com Sat Dec 19 17:42:23 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 19 Dec 2009 14:42:23 -0800 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> Message-ID: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote: > This is a vague idea (which I haven't tried yet), but maybe the > Bio.SeqIO.index() function could take an optional argument > (gzip=True, or something more general like archive=...) which > would cause the file to be opened via the gzip module instead? > Or: open=open -- accept a function that opens the file; by default, the built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a user-defined function to open zip files (since that's less straightforward). Otherwise, since the variety of archive formats supported by the Python standard library is limited, archive='gzip'|'bz2'|'zip' sounds good. 
-Eric From rjalves at igc.gulbenkian.pt Sat Dec 19 19:08:42 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Sun, 20 Dec 2009 00:08:42 +0000 Subject: [Biopython-dev] SeqIO.index improvement suggestions In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> Message-ID: <4B2D6B0A.4040008@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 From Eric Talevich on 12/19/2009 10:42 PM: > Or: open=open -- accept a function that opens the file; by default, the > built-in open function, but easily replaced by gzip.open, bz2.BZ2File, > or a user-defined function to open zip files (since that's less > straightforward). > > Otherwise, since the variety of archive formats supported by the Python > standard library is limited, archive='gzip'|'bz2'|'zip' sounds good. I prefer the first option. Flexible, backwards compatible, fits all mentioned cases so far and allows inclusion of other formats. Got my vote on that one.
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.11 (GNU/Linux) iEYEARECAAYFAkstawUACgkQYh11EUYTX9TJbwCgi4fQGQcfaBdJNLbMRsubjz82 4LQAnRgY0IKjwznjtiQzRNd0k8SH4oMN =YNHc -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Sun Dec 20 13:06:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 20 Dec 2009 18:06:33 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> Message-ID: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote: > On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote: >> >> This is a vague idea (which I haven't tried yet), but maybe the >> Bio.SeqIO.index() function could take an optional argument >> (gzip=True, or something more general like archive=...) which >> would cause the file to be opened via the gzip module instead? > > Or: open=open -- accept a function that opens the file; by default, the > built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a > user-defined function to open zip files (since that's less straightforward). That's what I had in mind with the "archive=..." bit (I should have been clearer), but "open" is probably a better name for it (assuming it isn't going to become a reserved word in future versions of Python). > Otherwise, since the variety of archive formats supported by the Python > standard library is limited, archive='gzip'|'bz2'|'zip' sounds good. That would work, but as you say, it is rather limited. 
Peter From biopython at maubp.freeserve.co.uk Mon Dec 21 06:57:51 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 21 Dec 2009 11:57:51 +0000 Subject: [Biopython-dev] code credits In-Reply-To: References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> Message-ID: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> Hello all, This email has been sent to the Biopython developers list, where we are proposing to include a list of contributors in the Biopython 1.53 and future release notes. I have specifically CC'd Chris Lasher, Hongbo Zhu and Paul B as "new contributors". I don't have an email address for Frederik Gwinner but will send him this message via github instead. On Fri, Dec 18, 2009 at 1:54 AM, Marshall Hampton wrote: > > On Thu, 17 Dec 2009, Peter wrote: >> >> Marshall Hampton's description of how they do it on Sage >> sounds worth trying - if we keep track as things are checked >> in, it won't be too much work either. Do you (sage) have a >> list of guidelines for what qualifies for a credit? > > I don't think we have formal guidelines, but the process is pretty simple. > Whoever works on a patch in our bug/feature tracker has to flag it for > review. Both the person who implements the patch and the reviewer get > credit. It doesn't matter if its a 1-character change to the documentation, > they're listed in the release notes. Basically, the idea is to err (if > that's the right word) on the side of acknowledging any contribution. ... On that basis, this is a (partial?) list for Biopython 1.53, given alphabetically as done by Sage:

Bartek Wilczyns
Brad Chapman
Chris Lasher (first contribution?)
Cymon Cox
Frank Kauff
Frederik Gwinner (first contribution?)
Hongbo Zhu (first contribution?)
Kyle Ellrott
Leighton Pritchard
Michiel de Hoon
Paul B (first contribution?)
Peter Cock

Am I missing anyone? Have I spelt all the names right?
(Actually a serious question - I recently made a typo in a git commit comment and mistyped Leighton's surname). We can update the release note on the news server/blog to include this, and send round another announcement email describing this plan. For the source code, I have two suggestions: (1) Include this in the NEWS file for each release, and continue adding names to the single alphabetical list in the CONTRIBUTORS file. (2) Don't include the list of names in the NEWS file, but instead put them in the CONTRIBUTORS file. This can have a section for each future release, with all the existing entries listed as contributors up to and including Biopython 1.52. I prefer the second option - the NEWS file is already quite long, and can refer to the CONTRIBUTORS file (e.g. for Biopython 1.53 we can have a line "(At least) 12 people contributed to this release, including 4 first time contributors"). Peter From chapmanb at 50mail.com Mon Dec 21 08:23:39 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 21 Dec 2009 08:23:39 -0500 Subject: [Biopython-dev] code credits In-Reply-To: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> Message-ID: <20091221132339.GC21580@sobchak.mgh.harvard.edu> Hi Peter; Awesome. Nice to see all the new and familiar names from this latest release. > (1) Include this in the NEWS file for each release, and continue adding > names to the single alphabetical list in the CONTRIBUTORS file. I'd rather see it this way, which is a bit more informal and in context. Something along the lines of: Bob Jones added the FooBar module for parsing the latest NCBI file format. or: Several bug fixes were committed to the PDB module. Thanks to Joe Smith, Steve P and Jorge Garcia for their patches.
If people contributed to something that didn't make the new cut, then we could just list additional contributors near the end. The goal should be to recognize people if they contributed to a release by having their name somewhere in the release notes. For core contributors like yourself, you probably don't want your name next to everything so pick a couple of your favorites for attribution. Brad From biopython at maubp.freeserve.co.uk Mon Dec 21 09:34:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 21 Dec 2009 14:34:38 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <20091221132339.GC21580@sobchak.mgh.harvard.edu> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> <20091221132339.GC21580@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> On Mon, Dec 21, 2009 at 1:23 PM, Brad Chapman wrote: > > Hi Peter; > Awesome. Nice to see all the new and familiar names from this latest > release. > >> (1) Include this in the NEWS file for each release, and continue adding >> names to the single alphabetical list in the CONTRIBUTORS file. > > I'd rather see it this way, which is a bit more informal and in > context. Something along the lines of: > > Bob Jones added the FooBar module for parsing the latest NCBI > file format. > > or: > > Several bug fixes were committed to the PDB module. Thanks to Joe > Smith, Steve P and Jorge Garcia for their patches. > > If people contributed to something that didn't make the new cut, then we > could just list additional contributors near the end. The goal should > be to recognize people if they contributed to a release by having > their name somewhere in the release notes. For core contributors like > yourself, you probably don't want your name next to everything so pick a > couple of your favorites for attribution. 
OK - some under your option (3?), the CONTRIBOTORS file is kept in the existing style, and the NEWS file also continues in a similar *style* to before, but making a more concious effort to include names next to noteworthy features, and ensure any other contributors get included at the end (e.g. "Plus miscelaneous bug fixes from X, Y and Z"). That seems fine. Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 21 10:34:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Dec 2009 10:34:22 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912211534.nBLFYMKt002285@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-21 10:34 EST ------- (In reply to comment #8) > (In reply to comment #7) > Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22 > using Robson's test case. Thanks to Robson for this and apologies for not > having been able to send a test case. I was also able to confirmed the problem is present in blastpgp 2.2.22, however it seems to have been fixed in the "new" BLAST+ suite, psiblast 2.2.22+ as described here: http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html Given this new information, this does look like an NCBI BLAST bug, and not a problem in Biopython itself. We *might* be able to get our parser to cope with the funny BLAST output, but it does look difficult and risky to me. Miguel - Is it possible the BLAST bug is relatively recent and first showed up when you updated blastpgp to 2.2.18? Regards, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Mon Dec 21 11:48:50 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 21 Dec 2009 16:48:50 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> <20091221132339.GC21580@sobchak.mgh.harvard.edu> <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> Message-ID: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com> Peter wrote this (with spelling fixed): > > OK - some under your option (3?), the CONTRIBUTORS file is kept > in the existing style, and the NEWS file also continues in a similar > *style* to before, but making a more concious effort to include names > next to noteworthy features, and ensure any other contributors get > included at the end (e.g. "Plus miscellaneous bug fixes from X, Y > and Z"). > Actually, looking over this again, if we want to include a "Sage style" list of names in the release notes (which looks good), it really would be easier if we kept this list of names in that format within the repository (updating it as needed when new code is checked in). The NEWS and CONTRIBUTORS files are the obvious places to do this. With Brad's outline (3), or at least how I understood it (and maybe I misunderstood you Brad), the NEWS file would have the contributor names for each release, but not in a format where they can be copy and pasted to put together a release notice. Meanwhile the CONTRIBUTORS file would continue as a single list of all contributions to date. This means whomever writes the release notice has to synthesise the contributor list by hand, which is tedious and risks omitting people. My earlier suggestions had the list of names in the NEWS file for each release (1), or in the CONTRIBUTORS file broken down by release (2). 
These options seem better to me just from a practical point of view - and we can still also credit people in the main text of the NEWS file as we do now if appropriate. So, how about a merger of (1) and (3)? i.e. * The CONTRIBUTORS file remains a single alphabetical list of all contributors to date (no change). * Entries in the NEWS file for new features etc may continue to credit authors as appropriate. * The NEWS file will include at the end of each release section an alphabetical list of contributors for that release (with new contributors flagged). This will be re-used in the release notice. Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 21 11:49:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Dec 2009 11:49:29 -0500 Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS 6.1.0 arguments In-Reply-To: Message-ID: <200912211649.nBLGnTed003915@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2966 ------- Comment #2 from lpritc at scri.sari.ac.uk 2009-12-21 11:49 EST ------- I also found an issue with the PrimerSearchCommandline. The command line options -sequences and -primers do not appear to be used in EMBOSS6.1.0, having been replaced by -seqall and -infile, respectively. I changed the options accordingly, and the modified files are available at http://github.com/widdowquinn/biopython/tree/emboss-branch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
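As a rough illustration of the option rename reported in Bug 2966 (comment #2), the sketch below builds a primersearch argument list using -seqall and -infile. It deliberately avoids the Biopython wrapper classes; the helper name primersearch_args is hypothetical, and the -mismatchpercent and -outfile qualifiers are assumed from the EMBOSS documentation rather than taken from the bug report.

```python
def primersearch_args(seqall, infile, mismatchpercent, outfile):
    """Build an argument list for EMBOSS primersearch (6.1.0-style names).

    Older option names -sequences and -primers are replaced here by
    -seqall and -infile, as described in the bug comment.
    """
    return [
        "primersearch",
        "-seqall", seqall,            # sequences to search (was -sequences)
        "-infile", infile,            # primer pairs file (was -primers)
        "-mismatchpercent", str(mismatchpercent),  # assumed qualifier
        "-outfile", outfile,          # assumed qualifier
    ]
```

The resulting list could be handed to subprocess.call() on a machine with EMBOSS installed; building a list rather than a single string sidesteps shell quoting problems with file names.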
From p.j.a.cock at googlemail.com Tue Dec 22 04:25:27 2009 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 22 Dec 2009 09:25:27 +0000 Subject: [Biopython-dev] Fwd: Debian python-biopython packaging for Biopython 1.53 In-Reply-To: References: <320fb6e00908110407x2c4132f8va17e19aaf2b224d0@mail.gmail.com> <48B3E023-F75A-4F50-90CE-6FDA7DDA9E4C@ini.phys.ethz.ch> <320fb6e00908120308w5077f598i428b6011912c6f37@mail.gmail.com> <783F8F61-58D6-4638-B2C7-5C206C321C13@ini.phys.ethz.ch> <320fb6e00908190305o3cb4523ct1645b98f4b284d43@mail.gmail.com> <4151f0acb1da52f12d3f08419d3171e9@ini.phys.ethz.ch> <320fb6e00908200748g78485c64kc19cea88c7c4cee@mail.gmail.com> Message-ID: <320fb6e00912220125w50a600c1xcf5e4750d70b39ca@mail.gmail.com> Hi all, Do any of our C experts know if this compilation warning is important? (This is under Debian Linux; the query was raised by Philipp Benner, who kindly packages Biopython for Debian, and that package also gets used on Ubuntu.) Thanks, Peter ---------- Forwarded message ---------- From: Philipp Benner Date: Mon, Dec 21, 2009 at 6:34 PM Subject: Debian python-biopython packaging for Biopython 1.53 To: Peter Cock Hey Peter, I just uploaded the new release.
Just a minor question: dpkg-shlibdeps: warning: dependency on libpthread.so.0 could be avoided if "debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Cluster/cluster.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Motif/_pwm.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/KDTree/_CKDTree.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/trie.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/PDB/mmCIF/MMCIFlex.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Cluster/cluster.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/trie.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Motif/_pwm.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cMarkovModel.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/PDB/mmCIF/MMCIFlex.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Nexus/cnexus.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cpairwise2.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/KDTree/_CKDTree.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Nexus/cnexus.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cpairwise2.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cMarkovModel.so" were not uselessly linked against it (they use none of its symbols). is this true? It might also be an error of dpkg-shlibdeps. 
Regards, Philipp From biopython at maubp.freeserve.co.uk Tue Dec 22 07:14:32 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Dec 2009 12:14:32 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> <20091221132339.GC21580@sobchak.mgh.harvard.edu> <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com> Message-ID: <320fb6e00912220414t6429f1e5n792e5feeecbe633f@mail.gmail.com> On Mon, Dec 21, 2009 at 4:48 PM, Peter wrote: > So, how about a merger of (1) and (3)? i.e. > > * The CONTRIBUTORS file remains a single alphabetical list > of all contributors to date (no change). > * Entries in the NEWS file for new features etc may continue > to credit authors as appropriate. > * The NEWS file will include at the end of each release section > an alphabetical list of contributors for that release (with new > contributors flagged). This will be re-used in the release notice. I've done that in github - how do the NEWS and CONTRIB file look? http://github.com/biopython/biopython/commit/86d8d99aab894ab5f32a0e7a0c45d63a441da645 I haven't automatically included email addresses for the new contributors since there is a risk of them being harvested for spam, so I figure that should be "opt in". 
Peter From biopython at maubp.freeserve.co.uk Tue Dec 22 10:34:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Dec 2009 15:34:37 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> Message-ID: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> On Sun, Dec 20, 2009 at 6:06 PM, Peter wrote: > On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote: >> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote: >>> >>> This is a vague idea (which I haven't tried yet), but maybe the >>> Bio.SeqIO.index() function could take an optional argument >>> (gzip=True, or something more general like archive=...) which >>> would cause the file to be opened via the gzip module instead? >> >> Or: open=open -- accept a function that opens the file; by default, the >> built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a >> user-defined function to open zip files (since that's less straightforward). > > That's what I had in mind with the "archive=..." bit (I should have > been clearer), but "open" is probably a better name for it (assuming > it isn't going to become a reserved word in future versions of Python). Proof of concept on github: http://github.com/peterjc/biopython/tree/index-zip This is using open_function as the new argument name (to match the existing key_function and avoid any confusion with the built in name open). I'm open to debate on this. Points to note, this is untested on Windows. In particular we need to look at gzipped plain text files using DOS/Windows new lines (rare case?) 
plus gzipped plain text files using Unix new lines (likely to be the more common of the two I'd expect). From my initial checks, while gzip.open() does take a mode argument it doesn't seem to support the "rU" value for universal new line read mode. This spoils my plan to give the open_function both the filename and the desired mode (generally "rU", but for SFF files etc we will want to use "rb"). Peter From biopython at maubp.freeserve.co.uk Tue Dec 22 11:08:50 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Dec 2009 16:08:50 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> Message-ID: <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> On Tue, Dec 22, 2009 at 3:34 PM, Peter wrote: > > Points to note, this is untested on Windows. In particular we need > to look at gzipped plain text files using DOS/Windows new lines > (rare case?) plus gzipped plain text files using Unix new lines > (likely to be the more common of the two I'd expect). From my > initial checks, while gzip.open() does take a mode argument it > doesn't seem to support the "rU" value for universal new line > read mode. This spoils my plan to give the open_function both > the filename and the desired mode (generally "rU", but for SFF > files etc we will want to use "rb"). The gzip mode issue is interesting... 
running on the Mac, Leopard 10.5, using the Apple provided Python 2.5.2, looking at a gzipped QUAL file everything is fine: Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import gzip >>> gzip.open("Quality/example.qual.gz", "r").read() '>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23 18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22 26 26 13 22 26 18\n24 18 18 18 18\n' >>> gzip.open("Quality/example.qual.gz", "rb").read() '>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23 18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22 26 26 13 22 26 18\n24 18 18 18 18\n' >>> gzip.open("Quality/example.qual.gz", "rU").read() '>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23 18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22 26 26 13 22 26 18\n24 18 18 18 18\n' Looking at a gzipped FASTA file everything is fine: >>> gzip.open("Quality/example.fasta.gz", "r").read() '>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n' >>> gzip.open("Quality/example.fasta.gz", "rb").read() '>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n' >>> gzip.open("Quality/example.fasta.gz", "rU").read() 
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n' But, there is a problem with my gzipped FASTQ file: >>> gzip.open("Quality/example.fastq.gz", "r").read() '@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n' >>> gzip.open("Quality/example.fastq.gz", "rb").read() '@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n' >>> gzip.open("Quality/example.fastq.gz", "rU").read() Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 220, in read self._read(readsize) File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 292, in _read self._read_eof() File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 311, in _read_eof raise IOError, "CRC check failed" IOError: CRC check failed I may have stumbled on a bug in the Python gzip library :( Peter From bugzilla-daemon at portal.open-bio.org Thu Dec 24 07:00:56 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 24 Dec 2009 07:00:56 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912241200.nBOC0ukq031745@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 ------- Comment #10 from ibdeno at gmail.com 2009-12-24 07:00 EST ------- (In reply to comment #9) > (In reply to comment #8) > > (In reply to comment #7) > I was
also able to confirmed the problem is present in blastpgp 2.2.22, > however it seems to have been fixed in the "new" BLAST+ suite, psiblast > 2.2.22+ as described here: > http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html > > Given this new information, this does look like an NCBI BLAST bug, and not > a problem in Biopython itself. We *might* be able to get our parser to cope > with the funny BLAST output, but it does look difficult and risky to me. > I think the best strategy will be to use the BLAST+ suite, since the "old" programs will be abandoned, as I learnt from NCBI. Also, I think we should use XML output. I know I promised to work on testing that, but I don't think I will be able to do the test before February... > Miguel - Is it possible the BLAST bug is relatively recent and first showed > up when you updated blastpgp to 2.2.18? > I had been using 2.2.18 for quite a while (months) and never had a problem. I think I initially thought it might be a problem with the actual databases, more than with the program... Best regards, Miguel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 24 10:25:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 24 Dec 2009 10:25:15 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912241525.nBOFPFxH003980@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp 2009-12-24 10:25 EST ------- From testing the current flex-based MMCIF parser, it seems that it is not quite complete. I don't think it is necessary to be backwards compatible with it.
I rather have a well-designed MMCIF parser written independently, like the one by Paul, and have it replace the current MMCIF parser over time. This also allows us to have the design of the new parser more consistent with other Biopython modules. To do so, I suggest to have the new MMCIF parser in a new module MMCIF.py under Bio.PDB, and let it coexist with the current MMCIF parser for the time being. Since the new MMCIF parser does not use flex, I would think that the previous division into MMCIF2Dict and MMCIFParser may not be needed for the new parser. Paul, do you agree? Can the new parser live in a single MMCIF.py module? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 24 10:37:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 24 Dec 2009 10:37:08 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912241537.nBOFb83e004255@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #15 from TallPaulInJax at yahoo.com 2009-12-24 10:37 EST ------- Hi Michiel, "I have a well-designed MMCIF parser written independently": Very interesting! Is it written solely in Python as well? I will say the parser I wrote is slower than I would like, so if you have an alternative? "Since the new MMCIF parser does not use flex, I would think that the previous division into MMCIF2Dict and MMCIFParser may not be needed for the new parser." I'm not expert enough in Python and in BioPython to know the correct call here. Perhaps Peter could answer this? I personally like the separation of concerns so that if someone else wanted to write their own parser, the code is modular in nature and supports doing that. Thanks for your help, Michiel! 
Paul -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 26 05:08:05 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 26 Dec 2009 05:08:05 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912261008.nBQA85So025649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2009-12-26 05:08 EST ------- (In reply to comment #15) > "I have a well-designed MMCIF parser written independently": Very interesting! Actually I wrote "I *rather* have....". I don't have an MMCIF parser myself; I was referring to your parser. Btw, could you add a test case for the MMCIF parser (using some small data file that we can include with the Biopython distribution)? Tests are not just important to make sure everything works; often they are a very good example of how the code works. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Mon Dec 28 20:51:40 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 28 Dec 2009 17:51:40 -0800 Subject: [Biopython-dev] Code review request for phyloxml branch In-Reply-To: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com> References: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com> Message-ID: <3f6baf360912281751g5152a945p951dbbbcbffbddb1@mail.gmail.com> Hi folks, Here's an update on the status of Bio.Tree and TreeIO. I think I've taken care of most of the blockers since the last review in September. 
First, some links: http://github.com/etal/biopython/tree/phyloxml/Bio/Tree/ http://github.com/etal/biopython/tree/phyloxml/Bio/TreeIO/ http://github.com/etal/biopython/tree/phyloxml/Tests/test_PhyloXML.py http://github.com/etal/biopython/tree/phyloxml/Tests/test_Tree.py http://biopython.org/wiki/PhyloXML Discussion: *TreeIO* Conversion between Nexus, Newick and phyloXML tree file formats works; the read/parse/write functions for each IO format use the same object types. Neat! The tree annotations (e.g. id) aren't preserved perfectly during conversions -- I'll keep working on this, but I don't think it's a blocker. The taxon names of terminal nodes are kept as "clade" names in phyloXML for round-tripping. Tree topology and branch lengths seem OK. Under the hood: -- PhyloXMLIO is from GSoC -- NewickIO is ported from the Bio.Nexus.Trees parser. I think it works the same way. -- NexusIO relies on Bio.Nexus.Nexus for parsing, then converts the resulting Nexus.Trees.Tree objects to Bio.Tree.Newick objects. One day, when Nexus.Trees is replaced by NewickIO in the main Nexus parser, then this conversion can be dropped and NexusIO will be very simple. *Tree* The BaseTree object structure looks like this:* -- BaseTree.**Tree* contains global tree information, like whether the tree is rooted, and a reference to the root clade. The phyloXML Phylogeny object inherits from this.* -- BaseTree.**Subtree* contains local (clade- or node-specific) information, and references to each of its direct descendents, recursively. The phyloXML Clade object inherits from this. Nodes are implicit. I could add references to the ancestor of each sub-tree without too much difficulty, but I haven't needed them yet. The same methods (get_terminals et al.) generally apply to both classes, so I created a separate TreeMixin class from which both BaseTree.Tree and BaseTree.Subtree inherit. 
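To make the class layout described above concrete, here is a small sketch, not the actual Bio.Tree code. Only the class names Tree, Subtree and TreeMixin and the method name get_terminals come from the description; the attribute names (root, rooted, clades, name, branch_length) and the _start_clade hook are assumptions made for illustration.

```python
class TreeMixin(object):
    """Traversal methods shared by whole trees and sub-trees."""

    def _start_clade(self):
        # Hypothetical hook: each class says where traversal begins.
        raise NotImplementedError

    def get_terminals(self):
        """Return terminal (leaf) clades in left-to-right order."""
        terminals = []
        stack = [self._start_clade()]
        while stack:
            clade = stack.pop()
            if clade.clades:
                # Push children reversed so the leftmost is visited first.
                stack.extend(reversed(clade.clades))
            else:
                terminals.append(clade)
        return terminals


class Subtree(TreeMixin):
    """Local (clade-specific) information plus direct descendants."""

    def __init__(self, name=None, branch_length=None, clades=None):
        self.name = name
        self.branch_length = branch_length
        self.clades = clades or []

    def _start_clade(self):
        return self


class Tree(TreeMixin):
    """Global tree information and a reference to the root clade."""

    def __init__(self, root=None, rooted=False):
        self.root = root or Subtree()
        self.rooted = rooted

    def _start_clade(self):
        return self.root
```

With this layout the same get_terminals() call works on a whole tree or on any clade inside it, which is the point of putting the shared methods in a mixin rather than duplicating them.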
Bio.Tree.Newick contains simple subclasses of Tree and Subtree, and an incomplete set of shims that track Bio.Nexus.Trees.Tree (minus the I/O). This is to ease the deprecation and eventual replacement of Bio.Nexus.Trees, as I imagine it: (1) Port methods from Nexus.Trees to Bio.Tree, simplifying arguments where reasonable (since the node IDs and adjacency list lookup are no longer needed) (2) Implement methods in Bio.Tree.Newick with the original argument lists, but triggering a deprecation warning indicating the newer replacement method (3) Replace Nexus.Trees with an import of Bio.Tree.Newick(IO) and a few more shims to duplicate the original API -- so test_Nexus.py should still pass, ideally (with deprecation warnings) (4) In Nexus.Nexus, replace all usage of Nexus.Trees with proper usage of NexusIO and Bio.Tree methods. (5) Eventually delete Nexus.Trees and the shims in Bio.Tree.Newick. I'm currently doing (1) and (2), with more emphasis on getting (1) right. Not all of the important methods have been ported, but I'm happy with the tree traversal methods. * Tests *I created test_Tree.py to test the methods in Bio.Tree.BaseTree; test_PhyloXML.py tests Bio.Tree.PhyloXML objects and Bio.TreeIO.PhyloXMLIO parsing/writing. I noticed that in Tests/Nexus/, the example file for internal node labels is actually in Newick/NH format, not Nexus. That was briefly confusing, so maybe that file should be renamed. What do you think? 
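Step (2) of the plan above, keeping an old method signature while warning about its replacement, might look roughly like the following. This is a sketch under stated assumptions: the classes here are stand-ins rather than the real Bio.Tree.Newick code, and get_taxa with a node_id argument is used only as an example of a legacy-style signature.

```python
import warnings


class Subtree(object):
    """Stand-in for the new-style clade class."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def get_terminals(self):
        """Return terminal clades, depth first, left to right."""
        if not self.clades:
            return [self]
        terminals = []
        for clade in self.clades:
            terminals.extend(clade.get_terminals())
        return terminals


class NewickSubtree(Subtree):
    """Adds shims that mimic an older API (names are illustrative)."""

    def get_taxa(self, node_id=None):
        # node_id is accepted only so old call sites keep working;
        # the new structure has no node IDs, so it is ignored here.
        warnings.warn(
            "get_taxa is deprecated; use get_terminals instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return [clade.name for clade in self.get_terminals()]
```

Using stacklevel=2 makes the warning point at the caller's line rather than at the shim itself, which is usually what someone migrating old code wants to see.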
All the best, Eric From bugzilla-daemon at portal.open-bio.org Tue Dec 1 12:28:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 1 Dec 2009 07:28:33 -0500 Subject: [Biopython-dev] [Bug 2957] GenBank Writer Should Write Out Date In-Reply-To: Message-ID: <200912011228.nB1CSXec001831@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2957 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-01 07:28 EST ------- A slightly more robust version of this has been checked in. Future work could handle date/time objects. Please reopen this bug if there are any problems. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Dec 1 19:34:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 1 Dec 2009 19:34:19 +0000 Subject: [Biopython-dev] Fwd: [Utilities-announce] NCBI E-Utility Policy Change In-Reply-To: <320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com> References: <7B6F170840CA6C4DA63EE0C8A7BB43EC09CA7387@NIHCESMLBX15.nih.gov> <320fb6e00912011129j68dda3b2p6df9a232f0462458@mail.gmail.com> Message-ID: <320fb6e00912011134u2481644aw5dfdfe9f9a3049f0@mail.gmail.com> Hi all, Attention NCBI Entrez users - the NCBI really do want you to include your email address, and it will be mandatory in future! See below... If using Bio.Entrez, the tool parameter will by default be set to Biopython, but the email is omitted. 
We already encourage the email to be included in our documentation but given the new NCBI guidance I'd suggest we make omitting the email issue a warning in the next release (and an error in the subsequent release of Biopython?). Peter ---------- Forwarded message ---------- From: ? Date: Tue, Dec 1, 2009 at 6:59 PM Subject: [Utilities-announce] NCBI E-Utility Policy Change To: utilities-announce at ncbi.nlm.nih.gov As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests. The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request. The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request. NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities. NCBI realizes that this policy change will require many of our users to change their code. 
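For anyone building E-utility URLs by hand, the requirement described above (non-null &tool and &email on every request) can be enforced at the point where the URL is assembled. The following is a minimal sketch in modern Python, not Bio.Entrez code: the helper name `build_eutils_url` is hypothetical, and the base URL is an assumption that should be checked against the NCBI documentation.

```python
from urllib.parse import urlencode

# Assumed base URL for illustration only; check the NCBI documentation.
EUTILS_BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_eutils_url(program, tool, email, **params):
    """Build an E-utility URL, refusing to omit the tool or email values.

    Hypothetical helper: it only assembles the URL and sends nothing.
    """
    if not tool or not email:
        raise ValueError("both tool and email must be non-empty")
    query = dict(params)
    query["tool"] = tool
    query["email"] = email
    # Sort the parameters so the resulting URL is deterministic.
    return EUTILS_BASE + program + "?" + urlencode(sorted(query.items()))
```

Raising an error here mirrors the policy itself: a request without both values would be rejected by NCBI after the deadline anyway.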
Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov. Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service. _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce -------------- next part -------------- _______________________________________________ Utilities-announce mailing list http://www.ncbi.nlm.nih.gov/mailman/listinfo/utilities-announce From chapmanb at 50mail.com Wed Dec 2 12:57:44 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 2 Dec 2009 07:57:44 -0500 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com> References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com> <20090406220826.GH43636@sobchak.mgh.harvard.edu> <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com> Message-ID: <20091202125744.GA46415@sobchak.mgh.harvard.edu> Hi Peter; > Brad has some GFF parsing code he as been working on, which > would be nice to merge into Biopython at some point. See: > > http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html > > As we started to discuss earlier this year, we need to think about > what to do with the existing (old) Bio.GFF module. This was written > by Michael Hoffman back in 2002 which accesses MySQL General > Feature Format (GFF) databases created with BioPerl. > > I've been looking at the old Bio.GFF code, and there are a lot of > redundant things like its own GenBank/EMBL location parsing, > plus its own location objects and its own Feature objects (rather > than reusing Bio.SeqFeature which should have sufficed). 
I'm ambivalent on deprecating GFF. Agreed that some of it is not well integrated with the rest of Biopython, with the Location/LocationFromString code being the most duplicated. It's too bad feature were reimplemented as well. Is Michael around at all? > I want to suggest we deprecate Michael Hoffman's Bio.GFF module > in Biopython 1.53 (I'm hoping we can do this next month, Dec 2009). > Depending on how soon Brad's code is ready to be merged (which I > am assuming could be Biopython 1.54, spring 2010), we can perhaps > accelerate removal of the old module. The current structure of the GFF code does not require removing what is currently there. It needs a couple of lines in __init__.py to expose the useful classes at the top level: from GFFParser import GFFParser, DiscoGFFParser, GFFExaminer from GFFOutput import GFF3Writer and we'd need to move the MySQLdb check to the Connection class so it's only needed if you are actually using the database code. So these can happen in parallel. Ideally, I'd like to get the GFF stuff in sooner rather than later. The main item on my todo list is finishing the documentation, with the stubs here: http://biopython.org/wiki/GFF_Parsing If I crank that out what do we think about putting it in with the __init__.py modifications I suggested? Brad From mjldehoon at yahoo.com Wed Dec 2 14:29:27 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 2 Dec 2009 06:29:27 -0800 (PST) Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu> Message-ID: <317375.58712.qm@web62401.mail.re1.yahoo.com> --- On Wed, 12/2/09, Brad Chapman wrote: > If I crank that out what do we think about putting it in > with the __init__.py modifications I suggested? I'd definitely welcome a GFF parser in Biopython, but I think the current code needs to be simplified and its usage more consistent with other Biopython modules. It's great that the documentation is available. 
It's a big help in designing the module, in particular what its usage looks like to the user.

Let's start from basic GFF parsing. This is the example in the documentation:

>>> from BCBio.GFF import GFFParser
>>> in_file = "your_file.gff"
>>> parser = GFFParser()
>>> in_handle = open(in_file)
>>> for rec in parser.parse(in_handle):
...     print rec
>>> in_handle.close()

What is the purpose of creating the parser first, and then calling parser.parse on the in_handle? I'd much rather have

>>> from BCBio import GFF
>>> in_file = "your_file.gff"
>>> in_handle = open(in_file)
>>> for rec in GFF.parse(in_handle):
...     print rec
>>> in_handle.close()

which is how most other Biopython parsers work.

--Michiel.

From chapmanb at 50mail.com Thu Dec 3 14:25:34 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 3 Dec 2009 09:25:34 -0500
Subject: [Biopython-dev] Bio.GFF and Brad's code
In-Reply-To: <317375.58712.qm@web62401.mail.re1.yahoo.com>
References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com>
Message-ID: <20091203142534.GF51407@sobchak.mgh.harvard.edu>

Hi Michiel;

> > If I crank that out what do we think about putting it in
> > with the __init__.py modifications I suggested?
>
> I'd definitely welcome a GFF parser in Biopython, but I think the
> current code needs to be simplified and its usage more consistent
> with other Biopython modules. It's great that the documentation is
> available. It's a big help in designing the module, in particular what
> its usage looks like to the user.

Awesome. I welcome these suggestions; it's really helpful to have fresh eyes looking at it. Hopefully moving it into Biopython will stimulate that.

> Let's start from basic GFF parsing. This is the example in the documentation:
[...]
> What is the purpose of creating the parser first, and then calling
> parser.parse on the in_handle?
I'd much rather have > > >>> from BCBio import GFF > >>> in_file = "your_file.gff" > >>> in_handle = open(in_file) > >>> for rec in GFF.parse(in_handle): > ... print rec > >>> in_handle.close() Great -- done for parsing and writing and committed to GitHub. The documentation is updated as well. Happy to get other comments and thoughts. Thanks again, Brad From biopython at maubp.freeserve.co.uk Thu Dec 3 14:53:44 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 14:53:44 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <20091203142534.GF51407@sobchak.mgh.harvard.edu> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> On Thu, Dec 3, 2009 at 2:25 PM, Brad Chapman wrote: > > Great -- done for parsing and writing and committed to GitHub. The > documentation is updated as well. > > Happy to get other comments and thoughts. Thanks again, > I understand that GFF files are complex, and a simple "record iterator" isn't flexible enough to cover all use cases - hence the need for a complex parser class. That said, Michiel is right that GFF.parse() or GFF.read() functions would be consistent with other bits of Biopython, and would provide for the simple use cases. Looking at your code, BCBio.GFF.parse(...) would return SeqRecord objects (with SeqFeatures). That seems redundant to me as one expect people to just use Bio.SeqIO.parse(handle, "gff3") instead. I would instead have expected BCBio.GFF.parse(...) to iterate over the features in a GFF file. Also, and we'd touched on this before - I'd much prefer to have the GFF module quite "low level" using either new GFF-specific classes or simple Python objects (e.g. for each feature use a tuple of ints and strings for the first feature columns plus a dict for the final extendible column of annotation). 
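The "low level" representation suggested above, a tuple for the fixed columns plus a dict for the final attribute column, could be sketched as follows. This is an illustration in modern Python, not the BCBio.GFF code: the function name `parse_gff3_line` is hypothetical, and the sketch assumes the nine tab-separated columns and `key=value;key=value` attribute syntax of GFF3.

```python
from urllib.parse import unquote


def parse_gff3_line(line):
    """Split one GFF3 feature line into (fixed_columns, attributes).

    Hypothetical helper for illustration; comment and directive lines
    (those starting with '#') are not handled here.
    """
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(parts))
    seqid, source, ftype, start, end, score, strand, phase, attr_text = parts
    # Coordinates become ints; the other fixed columns stay as strings.
    fixed = (seqid, source, ftype, int(start), int(end), score, strand, phase)
    attributes = {}
    if attr_text != ".":
        for pair in attr_text.split(";"):
            pair = pair.strip()
            if not pair:
                continue
            key, _, value = pair.partition("=")
            # Values may be comma-separated lists with URL-style escapes.
            attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return fixed, attributes
```

Keeping this layer free of SeqRecord and SeqFeature objects is what would allow a separate conversion layer to be added later without touching the line parser.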
From a technical point of view, a justification for this separation is that the GFF details are not a perfect fit to the SeqRecord and SeqFeature objects and forcing their use adds unnecessary overheads for people wanting to work directly with the features themselves.

Also, by splitting the code into basic parsing and a SeqRecord/SeqFeature conversion layer (which I would put in Bio/SeqIO/GffIO.py) we can add the code in two steps (first GFF parsing, then SeqIO support).

I think this split is useful as this is a very big job to do properly: Once we have GFF to SeqRecord parsing, we need to try and ensure that it is compatible with the GenBank to SeqRecord parsing. This is important as we would in effect be extending Biopython to allow GFF3 to GenBank conversions. For testing all this, we can grab the same data in the two file formats (e.g. from the NCBI) and perhaps also use EMBOSS.

You may recall we talked to Peter Rice (from EMBOSS) about this - there are some important issues here like ontology mapping where we should be able to reuse a lot of the work EMBOSS has already done (and use the EMBOSS tools to help validate our mapping).

i.e. While I may be being overly cautious, I think that while adding GFF parsing and GFF to SeqRecord mapping is very important, it is also very complex. Therefore breaking this into a two stage task makes managing and testing it easier - as well as seeming a good idea for the code itself.
Peter From bugzilla-daemon at portal.open-bio.org Thu Dec 3 15:03:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Dec 2009 10:03:29 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912031503.nB3F3Tu8013049@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-03 10:03 EST ------- Brad, Now that Chris at BioPerl is interested, I am confident we can get the SQLite schema into BioSQL in the near future: http://lists.open-bio.org/pipermail/biosql-l/2009-November/001655.html Do you want to update your patch (if needed) and put this up on a Biopython branch in github? How soon do you think it could be ready to merge? It would be nice to have this in the next release (even if we put a bug "beta" warning in)? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Dec 3 15:30:54 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 3 Dec 2009 15:30:54 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <20091202125744.GA46415@sobchak.mgh.harvard.edu> References: <320fb6e00904060625v4a49da2au76159eae18f707eb@mail.gmail.com> <20090406220826.GH43636@sobchak.mgh.harvard.edu> <320fb6e00911270823g320c7c24pd0773ae8b72902ee@mail.gmail.com> <20091202125744.GA46415@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912030730rb66c2dav1993465ba25f9f5f@mail.gmail.com> On Wed, Dec 2, 2009 at 12:57 PM, Brad Chapman wrote: > Hi Peter; > >> Brad has some GFF parsing code he as been working on, which >> would be nice to merge into Biopython at some point. 
See: >> >> http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005700.html >> >> As we started to discuss earlier this year, we need to think about >> what to do with the existing (old) Bio.GFF module. This was written >> by Michael Hoffman back in 2002 which accesses MySQL General >> Feature Format (GFF) databases created with BioPerl. >> >> I've been looking at the old Bio.GFF code, and there are a lot of >> redundant things like its own GenBank/EMBL location parsing, >> plus its own location objects and its own Feature objects (rather >> than reusing Bio.SeqFeature which should have sufficed). > > I'm ambivalent on deprecating GFF. Agreed that some of it is not > well integrated with the rest of Biopython, with the > Location/LocationFromString code being the most duplicated. It's too > bad feature were reimplemented as well. Is Michael around at all? I got in touch with Michael Hoffman - he has moved from the EBI to the University of Washington but his EBI email address still worked. He said: "Please feel free to deprecate the module or make any necessary changes for the project." Even if you (Brad) didn't have a new GFF parser waiting to be added to Biopython, I would still want to do something with Bio.GFF to reduce the redundancy of location and feature code. Deprecation is the simplest solution (but I may be able to reuse some of his location string parsing code on Bug 2738). 
Peter From bugzilla-daemon at portal.open-bio.org Thu Dec 3 15:32:31 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 3 Dec 2009 10:32:31 -0500 Subject: [Biopython-dev] [Bug 2738] Speed up GenBank parsing, in particular location parsing In-Reply-To: Message-ID: <200912031532.nB3FWV7G013739@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2738 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-03 10:32 EST ------- Note - we may be able to reuse some of the location string parsing ideas in Bio/GFF/easy.py here too... Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 4 12:31:51 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Dec 2009 07:31:51 -0500 Subject: [Biopython-dev] [Bug 2961] New: Adding undocumented file format switches to MUSCLE wrapper Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2961 Summary: Adding undocumented file format switches to MUSCLE wrapper Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk As discussed on the mailing list, and confirmed with MUSCLE author Robert Edgar, there are a number of useful command line arguments for things like PHYLIP output (both interlaced and sequential) which the Bio.Align.Applications wrapper does not support. See: http://lists.open-bio.org/pipermail/biopython/2009-December/005881.html We should add these. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 4 12:50:25 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Dec 2009 07:50:25 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912041250.nB4CoP72007627@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #1 from cymon.cox at gmail.com 2009-12-04 07:50 EST ------- Created an attachment (id=1408) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1408&action=view) Add PHYLIP output to Muscle command line interface -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 4 13:14:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Dec 2009 08:14:08 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912041314.nB4DE8aA008792@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1408 is|0 |1 obsolete| | ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-04 08:14 EST ------- (From update of attachment 1408) Patch applied. Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which all take a filename)? 
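As a rough illustration of what supporting these switches involves, the sketch below assembles a MUSCLE argument list with one output file per format flag. It is not the Bio.Align.Applications wrapper: the function name `build_muscle_command` is hypothetical, the output flags are the ones named in the comment above, and `-in` is assumed to be the input switch as in MUSCLE's usual command line usage.

```python
def build_muscle_command(input_file, outputs, exe="muscle"):
    """Return an argument list requesting one output file per format flag.

    Hypothetical helper: `outputs` is a sequence of (flag, filename)
    pairs. Nothing is executed here.
    """
    allowed = ("-phyiout", "-physout", "-htmlout", "-msfout", "-clwout")
    cmd = [exe, "-in", input_file]
    for flag, filename in outputs:
        if flag not in allowed:
            raise ValueError("unsupported output flag: %r" % flag)
        cmd.extend([flag, filename])
    return cmd
```

The resulting list could be handed to the subprocess module, which avoids shell quoting problems with unusual filenames.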
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 4 13:21:52 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Dec 2009 08:21:52 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912041321.nB4DLqsd009037@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #3 from cymon.cox at gmail.com 2009-12-04 08:21 EST ------- (In reply to comment #2) > (From update of attachment 1408 [details]) > Patch applied. > > Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which > all take a filename)? ! Is there anything else undocumented? OK, I'll do that asap. I'll also add tests - change test suite to use subprocess module etc. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 4 13:36:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 4 Dec 2009 08:36:11 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912041336.nB4DaBvS009365@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-04 08:36 EST ------- (In reply to comment #3) > (In reply to comment #2) > > (From update of attachment 1408 [details] [details]) > > Patch applied. > > > > Should we also add -phyiout, -physout, -htmlout, -msfout, -clwout etc (which > > all take a filename)? > > ! Is there anything else undocumented? 
Robert did imply there could be other things as his documentation was out of sync with the code :( These are of limited value given you can use "-phyi -out filename.phy" as an alternative to "-phyiout filename.phy" however one bonus feature is these options allow you to get multiple output files in one run (e.g. both HTML output to inspect visually and ClustalW output to parse). > OK, I'll do that asap. I'll also add tests - change test suite to use > subprocess module etc. I'd forgotten about that (using subprocess rather than generic_run in our unit tests). Could you do that as a separate patch please? Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Fri Dec 4 13:40:10 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 4 Dec 2009 08:40:10 -0500 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> Message-ID: <20091204134010.GK51407@sobchak.mgh.harvard.edu> Hi all; Peter, thanks for the feedback. Thoughts below. > Looking at your code, BCBio.GFF.parse(...) would return > SeqRecord objects (with SeqFeatures). That seems > redundant to me as one expect people to just use > Bio.SeqIO.parse(handle, "gff3") instead. I would instead > have expected BCBio.GFF.parse(...) to iterate over the > features in a GFF file. This would work for simple cases, but for most real life cases you will likely want to limit the file to a subset of things you are interested in. It helps reduce memory problems, and is equivalent to a track system view in UCSC or Ensembl.
I find it very useful for all of the work I've done with it. We could use SeqIO here, but then there is the issue of passing along the additional arguments. The simplicity of SeqIO is really nice, so not sure if we'd want to clutter SeqIO with it. So we could support basic parsing in SeqIO, but it would be useful to have this GFF specific parsing as the additional arguments will be a regular use case. > Also, and we'd touched on this before - I'd much prefer to > have the GFF module quite "low level" using either new > GFF-specific classes or simple Python objects (e.g. for > each feature use a tuple of ints and strings for the first > feature columns plus a dict for the final extendible > column of annotation). Yes, it is implemented this way. The parse_simple function returns a line by line parse of the file as a dictionary, which is then used to build up the SeqFeature objects: http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py We can document and flesh that out, although I'm not really sure how useful it will be. It's pretty easy to build your own simple line-by-line GFF parser; the only advantage of this code over a home-brew is that it handles tricky annotation cases. For all of my uses, the real win was being able to build up the multiple transcript exon/intron structures from the file. This is not trivial to do on your own, and the real win of the code is in handling this, especially for older GFF2 and GTF formatted files. > From a technical point of view, a justification for this > separation is the GFF details are not a perfect fit to the > SeqRecord and SeqFeature objects and forcing their > use adds unnecessary overheads for people wanting > to work directly with the features themselves. Why are SeqRecord and SeqFeature not appropriate for GFF? We could improve them to make things more lightweight, as we discussed previously, but conceptually the values fit into the framework fine. 
> Also, by splitting the code into basic parsing and a > SeqRecord/SeqFeature conversion layer (which I > would put in Bio/SeqIO/GffIO.py) we can add the > code in two steps (first GFF parsing, then SeqIO > support). We can do this as is. I'm not suggesting SeqIO support right now, and want to target getting the GFF parser as is into Biopython. > I think this split is useful as this is a very big job to do > properly: Once we have GFF to SeqRecord parsing, > we need to try and ensure that it is compatible with the > GenBank to SeqRecord parsing. This is important as > we would in effect be extending Biopython to allow > GFF3 to GenBank conversions. For testing all this, > we can grab the same data in the two file formats > (e.g. from the NCBI) and perhaps also use EMBOSS. Do you think GFF to GenBank is a common use case? Agreed that it is very hard, but this really had less to do with the object structure in Biopython and more to do with how things are represented and named in the original source files. GenBank has some "consistency" since it is produced mostly by NCBI, but GFF files are all over the place. This can be tackled later if someone wants, but right now my goals are simply: - Produce Biopython objects from GFF3/GTF/GFF2 files - Represent nested features - Allow GFF2/GTF to GFF3 conversion This should be done with the current code. We can formalize the raw parse_simple output for the line-by-line if people find it useful, but otherwise we should leave these bigger projects for down the line. 
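The "represent nested features" goal can be illustrated independently of SeqFeature. The sketch below groups features into a gene/transcript/exon hierarchy using GFF3-style ID and Parent values. It is an assumption-laden illustration in modern Python, not the BCBio.GFF implementation: features are plain dicts, the name `nest_features` and the `sub_features` key are hypothetical, and each feature is assumed to have at most one parent.

```python
from collections import defaultdict


def nest_features(features):
    """Group flat feature dicts into a parent -> children hierarchy.

    Hypothetical helper: each feature is a dict that may carry "ID"
    and "Parent" keys. A feature whose parent is absent from the input
    is treated as top level.
    """
    children = defaultdict(list)
    top_level = []
    known_ids = {f["ID"] for f in features if "ID" in f}
    for feat in features:
        parent = feat.get("Parent")
        if parent is not None and parent in known_ids:
            children[parent].append(feat)
        else:
            top_level.append(feat)

    def build(feat):
        # Copy the feature so the input dicts are left unmodified.
        node = dict(feat)
        kids = children.get(feat.get("ID"), [])
        node["sub_features"] = [build(child) for child in kids]
        return node

    return [build(feat) for feat in top_level]
```

Real GFF2 and GTF files lack explicit ID/Parent links, which is part of why the exon/intron reconstruction mentioned above is harder than this sketch suggests.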
Brad From biopython at maubp.freeserve.co.uk Fri Dec 4 14:25:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 4 Dec 2009 14:25:40 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <20091204134010.GK51407@sobchak.mgh.harvard.edu> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> On Fri, Dec 4, 2009 at 1:40 PM, Brad Chapman wrote: > Hi all; > Peter, thanks for the feedback. Thoughts below. > >> Looking at your code, BCBio.GFF.parse(...) would return >> SeqRecord objects (with SeqFeatures). That seems >> redundant to me as one expect people to just use >> Bio.SeqIO.parse(handle, "gff3") instead. I would instead >> have expected BCBio.GFF.parse(...) to iterate over the >> features in a GFF file. > > This would work for simple cases, but for most real life cases you > will likely want to limit the file to a subset of things you are > interested in. It helps reduce memory problems, and is equivalent to > a track system view in UCSC or Ensembl. I find it very useful for > all of the work I've done with it. Understood - a feature returning Bio.GFF.parse() function could take various arguments, or for full flexibility, the user can use the parser object directly. > We could use SeqIO here, but then there is the issue of passing > along the additional arguments. The simplicity of SeqIO is really > nice, so not sure if we'd want to clutter SeqIO with it. > > So we could support basic parsing in SeqIO, but it would be useful to > have this GFF specific parsing as the additional arguments will be a > regular use case. 
This is already catered for in that Bio.SeqIO.parse() and read() don't take arbitrary arguments (currently), but the underlying Bio.SeqIO.XxxxIO.XxxIterator() they invoke may do so. i.e. You could have Bio.SeqIO.GffIO.GffIterator() and perhaps variants (e.g. GFF2 vs GFF3) which take filtering arguments. >> Also, and we'd touched on this before - I'd much prefer to >> have the GFF module quite "low level" using either new >> GFF-specific classes or simple Python objects (e.g. for >> each feature use a tuple of ints and strings for the first >> feature columns plus a dict for the final extendible >> column of annotation). > > Yes, it is implemented this way. The parse_simple function returns > a line by line parse of the file as a dictionary, which is then used > to build up the SeqFeature objects: > > http://github.com/chapmanb/bcbb/blob/master/gff/BCBio/GFF/GFFParser.py > > We can document and flesh that out, although I'm not really sure how > useful it will be. It's pretty easy to build your own simple > line-by-line GFF parser; the only advantage of this code over a > home-brew is that it handles tricky annotation cases. I still think it would be useful to have Bio/GFF/Parser.py (or similar) as the low level parser, and Bio/SeqIO/GffIO.py (or similar) to turn this into SeqRecord and SeqFeature objects. > For all of my uses, the real win was being able to build up the > multiple transcript exon/intron structures from the file. This is > not trivial to do on your own, and the real win of the code is in > handling this, especially for older GFF2 and GTF formatted files. > >> From a technical point of view, a justification for this >> separation is the GFF details are not a perfect fit to the >> SeqRecord and SeqFeature objects and forcing their >> use adds unnecessary overheads for people wanting >> to work directly with the features themselves. > > Why are SeqRecord and SeqFeature not appropriate for GFF? 
We could > improve them to make things more lightweight, as we discussed > previously, but conceptually the values fit into the framework fine. The nested features that worry me. Perhaps the existing location operator (e.g. "join") could be set to something like "parent/child" if the subfeatures is used to hold child features rather than the elements of a join? We need the GenBank output code etc to be able to tell these apart reliably. >> Also, by splitting the code into basic parsing and a >> SeqRecord/SeqFeature conversion layer (which I >> would put in Bio/SeqIO/GffIO.py) we can add the >> code in two steps (first GFF parsing, then SeqIO >> support). > > We can do this as is. I'm not suggesting SeqIO support right now, > and want to target getting the GFF parser as is into Biopython. My point is the moment you include GFF -> SeqRecord code (even if not explicitly via the Bio.SeqIO namespace) it opens us up to people giving these SeqRecord objects to SeqIO for output (e.g. as GenBank). >> I think this split is useful as this is a very big job to do >> properly: Once we have GFF to SeqRecord parsing, >> we need to try and ensure that it is compatible with the >> GenBank to SeqRecord parsing. This is important as >> we would in effect be extending Biopython to allow >> GFF3 to GenBank conversions. For testing all this, >> we can grab the same data in the two file formats >> (e.g. from the NCBI) and perhaps also use EMBOSS. > > Do you think GFF to GenBank is a common use case? I suspect its something I'd want to do it when working with new genome annotations. GeneMark produces GFF, while Prodigal produces (simple) GenBank. The SOLiD pipeline corona produces GFF. Sometimes you can get both, the tool RAST outputs GenBank, GFF, GTF and EMBL files. > Agreed that it is very hard, but this really had less to do > with the object structure in Biopython and more to do > with how things are represented and named in the > original source files. 
GenBank has some "consistency" > since it is produced mostly by NCBI, but GFF files are > all over the place. > > This can be tackled later if someone wants, but right > now my goals are simply: > > - Produce Biopython objects from GFF3/GTF/GFF2 files > - Represent nested features > - Allow GFF2/GTF to GFF3 conversion > > This should be done with the current code. We can > formalize the raw parse_simple output for the line-by-line > if people find it useful, but otherwise we should leave > these bigger projects for down the line. Worth goals, but if by "Produce Biopython objects from GFF3/GTF/GFF2 files" you mean SeqRecords with SeqFeatures, (as I said above) we are opening up the GFF to GenBank can of worms. There is no "later" :( Peter From mjldehoon at yahoo.com Sat Dec 5 15:54:19 2009 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sat, 5 Dec 2009 07:54:19 -0800 (PST) Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> Message-ID: <983129.2133.qm@web62408.mail.re1.yahoo.com> I didn't realize that the GFF parser returns SeqRecords. I agree with Peter that a parser returning SeqRecords should be accessed through Bio.SeqIO, while a lower-level parser can live in Bio.GFF. --Michiel --- On Thu, 12/3/09, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Bio.GFF and Brad's code > To: "Brad Chapman" , biopython-dev at lists.open-bio.org > Date: Thursday, December 3, 2009, 9:53 AM > On Thu, Dec 3, 2009 at 2:25 PM, Brad > Chapman > wrote: > > > > Great -- done for parsing and writing and committed to > GitHub. The > > documentation is updated as well. > > > > Happy to get other comments and thoughts. Thanks > again, > > > > I understand that GFF files are complex, and a simple > "record > iterator" isn't flexible enough to cover all use cases - > hence the > need for a complex parser class. 
That said, Michiel is > right that > GFF.parse() or GFF.read() functions would be consistent > with > other bits of Biopython, and would provide for the simple > use > cases. > > Looking at your code, BCBio.GFF.parse(...) would return > SeqRecord objects (with SeqFeatures). That seems > redundant to me as one expect people to just use > Bio.SeqIO.parse(handle, "gff3") instead. I would instead > have expected BCBio.GFF.parse(...) to iterate over the > features in a GFF file. > > Also, and we'd touched on this before - I'd much prefer to > have the GFF module quite "low level" using either new > GFF-specific classes or simple Python objects (e.g. for > each feature use a tuple of ints and strings for the first > feature columns plus a dict for the final extendible > column of annotation). > > >From a technical point of view, a justification for > this > separation is the GFF details are not a perfect fit to the > SeqRecord and SeqFeature objects and forcing their > use adds unnecessary overheads for people wanting > to work directly with the features themselves. > > Also, by splitting the code into basic parsing and a > SeqRecord/SeqFeature conversion layer (which I > would put in Bio/SeqIO/GffIO.py) we can add the > code in two steps (first GFF parsing, then SeqIO > support). > > I think this split is useful as this is a very big job to > do > properly: Once we have GFF to SeqRecord parsing, > we need to try and ensure that it is compatible with the > GenBank to SeqRecord parsing. This is important as > we would in effect be extending Biopython to allow > GFF3 to GenBank conversions. For testing all this, > we can grab the same data in the two file formats > (e.g. from the NCBI) and perhaps also use EMBOSS. 
> > You may recall we talked to Peter Rice (from EMBOSS) > about this - there are some important issues here like > ontology mapping where we should be able to reuse a > lot of the work EMBOSS has already done (and use the > EMBOSS tools to help validate our mapping). > > i.e. While I may be being overly cautious, I think that > while adding GFF parsing and GFF to SeqRecord > mapping is very important, it is also very complex. > Therefore breaking this into a two stage task makes > managing and testing it easier - as well as seeming > a good idea for the code itself. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From MatatTHC at gmx.de Sun Dec 6 14:18:40 2009 From: MatatTHC at gmx.de (Matthias Bernt) Date: Sun, 06 Dec 2009 15:18:40 +0100 Subject: [Biopython-dev] Genetic Code Message-ID: <20091206141840.67400@gmx.net> Hi, The genetic codes you provide in Bio.Data.CodonTable are somewhat out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. Regards, Matthias From biopython at maubp.freeserve.co.uk Sun Dec 6 14:55:24 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 6 Dec 2009 14:55:24 +0000 Subject: [Biopython-dev] Genetic Code In-Reply-To: <20091206141840.67400@gmx.net> References: <20091206141840.67400@gmx.net> Message-ID: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com> On Sun, Dec 6, 2009 at 2:18 PM, Matthias Bernt wrote: > Hi, > > The genetic codes you provide in Bio.Data.CodonTable are somewhat > out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code > one start codon is missing. Confirmed - could you file a bug please?
http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython It looks like we have only got Version 3.4 (based on a visual inspection), but the latest version is Version 3.9. We should just need to re-run the script to generate these. Also the original URL noted in the Biopython source code of ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt Peter From bugzilla-daemon at portal.open-bio.org Sun Dec 6 15:07:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:07:23 -0500 Subject: [Biopython-dev] [Bug 2962] New: deprecated generic code Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2962 Summary: deprecated generic code Product: Biopython Version: 1.52 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: MatatTHC at gmx.de The genetic codes provided in Bio.Data.CodonTable are out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. It looks like we have only got Version 3.4 (based on a visual inspection), but the latest version is Version 3.9. We should just need to re-run the script to generate these. Also the original URL noted in the Biopython source code of ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 15:07:43 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:07:43 -0500 Subject: [Biopython-dev] [Bug 2963] New: deprecated genetic code Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2963 Summary: deprecated genetic code Product: Biopython Version: 1.52 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: MatatTHC at gmx.de The genetic codes provided in Bio.Data.CodonTable are out of date. E.g. in the mitochondrial echinoderm (id 9) genetic code one start codon is missing. It looks like we have only got Version 3.4 (based on a visual inspection), but the latest version is Version 3.9. We should just need to re-run the script to generate these. Also the original URL noted in the Biopython source code of ftp://ncbi.nlm.nih.gov/entrez/misc/data/gc.prt is now ftp://ftp.ncbi.nih.gov/entrez/misc/data/gc.prt -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 15:35:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:35:09 -0500 Subject: [Biopython-dev] [Bug 2963] deprecated genetic code In-Reply-To: Message-ID: <200912061535.nB6FZ9qY029156@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2963 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |DUPLICATE ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 10:35 EST ------- *** This bug has been marked as a duplicate of bug 2962 *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 15:35:21 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 10:35:21 -0500 Subject: [Biopython-dev] [Bug 2962] deprecated generic code In-Reply-To: Message-ID: <200912061535.nB6FZL0I029172@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2962 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 10:35 EST ------- *** Bug 2963 has been marked as a duplicate of this bug. *** -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 16:09:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 11:09:28 -0500 Subject: [Biopython-dev] [Bug 2962] deprecated generic code In-Reply-To: Message-ID: <200912061609.nB6G9Sk9030056@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2962 biopython-bugzilla at maubp.freeserve.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 11:09 EST ------- The NCBI codon tables have been updated from version 3.4 to 3.9, which adds a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23). Note that Table 14 which used to be called "Flatworm Mitochondrial" is now called "Alternative Flatworm Mitochondrial", and "Flatworm Mitochondrial" is now an alias for Table 9 ("Echinoderm Mitochondrial"). See: http://github.com/biopython/biopython/commit/74ba9d295b2cd6c6fa6862e91f1e1e59300deeb6 Marking as fixed - but feel free to reopen this if I missed anything. Thanks! Peter
From biopython at maubp.freeserve.co.uk Sun Dec 6 16:11:08 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 6 Dec 2009 16:11:08 +0000 Subject: [Biopython-dev] Genetic Code In-Reply-To: <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com> References: <20091206141840.67400@gmx.net> <320fb6e00912060655r75103918w3122f46b3ccb538f@mail.gmail.com> Message-ID: <320fb6e00912060811x1fc336ech6245221741372c62@mail.gmail.com> On Sun, Dec 6, 2009 at 2:55 PM, Peter wrote: > Confirmed - could you file a bug please?
> http://bugzilla.open-bio.org/enter_bug.cgi?product=Biopython Thanks - I was expecting to look at this next week, but had some spare time this afternoon after all. It should be fixed, you can grab the latest code and reinstall to test: http://www.biopython.org/wiki/SourceCode Peter From bugzilla-daemon at portal.open-bio.org Sun Dec 6 17:46:55 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 12:46:55 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061746.nB6Hkt7x032479@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #5 from cymon.cox at gmail.com 2009-12-06 12:46 EST ------- Created an attachment (id=1409) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view) Patch for output file fomat options -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 18:50:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 13:50:08 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061850.nB6Io80P001234@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 13:50 EST ------- (In reply to comment #5) > Created an attachment (id=1409) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1409&action=view) [details] > Patch for output file fomat options > Applied with minor changes to the docstrings - Bio.AlignIO will now accept the default CLUSTALW output from MUSCLE as is. Thanks! 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 19:10:01 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:10:01 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061910.nB6JA1p3001668@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #7 from cymon.cox at gmail.com 2009-12-06 14:10 EST ------- Created an attachment (id=1410) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) Change Application cmdline tests to use subprocess module -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 19:36:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:36:27 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061936.nB6JaRo0002258@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 14:36 EST ------- (In reply to comment #7) > Created an attachment (id=1410) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) [details] > Change Application cmdline tests to use subprocess module > Lovely - applied as is - thanks again :) Did you want to add tests for the new MUSCLE output options, or can we close this bug now Cymon? 
Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Dec 6 19:43:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:43:12 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061943.nB6JhCOd002510@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 ------- Comment #9 from cymon.cox at gmail.com 2009-12-06 14:43 EST ------- (In reply to comment #8) > (In reply to comment #7) > > Created an attachment (id=1410) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1410&action=view) [details] [details] > > Change Application cmdline tests to use subprocess module > > > > Lovely - applied as is - thanks again :) > > Did you want to add tests for the new MUSCLE output options, or can we close > this bug now Cymon? There's is one in the patch called: test_with_multiple_output_formats that writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and -clwstrict options. I think it can be closed. Cheers, C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sun Dec 6 19:47:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 6 Dec 2009 14:47:11 -0500 Subject: [Biopython-dev] [Bug 2961] Adding undocumented file format switches to MUSCLE wrapper In-Reply-To: Message-ID: <200912061947.nB6JlBHi002609@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2961 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-06 14:47 EST ------- (In reply to comment #9) > > Did you want to add tests for the new MUSCLE output options, or can we > > close this bug now Cymon? > > There's is one in the patch called: test_with_multiple_output_formats that > writes to stdout, phylip interleaved, and clustalw strict, using the -phyi and > -clwstrict options. So there is - I missed that. Lovely :) Marking bug as fixed. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 09:16:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 04:16:42 -0500 Subject: [Biopython-dev] [Bug 2964] New: placing x-axis of graph track at the bottom or top of the track Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2964 Summary: placing x-axis of graph track at the bottom or top of the track Product: Biopython Version: 1.52 Platform: PC OS/Version: Windows Status: NEW Severity: normal Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: Daniel.Nicorici at gmail.com By default, when one uses the graph track, the axis is placed automatically in the middle of the track (which is given by the mean of all the values which are plotted). It would be great if the x-axis of the graph track could also be placed at the bottom of the track, with the values plotted accordingly. This would allow one to plot, for example, the short-read coverage in next-gen sequencing data.
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 09:48:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 04:48:11 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912070948.nB79mBTh022941@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #1 from Daniel.Nicorici at gmail.com 2009-12-07 04:48 EST ------- This feature has been added in: http://github.com/ndaniel/biopython/tree/x-axis_GenomeDiagram/Bio/Graphics/GenomeDiagram/ Also, a small additional bug has been fixed here, i.e. the line/bar graphs are drawn from the first element to the last element of the graph and not from the origin to the end of the x-axis as was originally the case. One can specify that the x-axis should be drawn at the bottom of the track by passing the argument x_axis='bottom' to new_track, e.g. gdt_features=gdd.new_track(2,x_axis='bottom'). Below are two examples where the x-axis is drawn in the middle (as originally done by GenomeDiagram) and at the bottom of the track (the new feature added to GenomeDiagram).
====Example_1:_Using_Graph_from_GenomeDiagram_where_the_x-axis_is_at_the_middle_of_track(as_it_is_originally)=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')
gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()

# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')

# generate some random values for plotting
# (the zero-valued end points are needed in order to skip the
# interpolation done by GenomeDiagram for missing values)
coverage=[]
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in range(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in range(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar')

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_graph.pdf","pdf")
============================================
====Example_2:_Using_Graph_from_GenomeDiagram_where_x-axis_is_at_the_bottom_of_track=============================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')
gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()

# Add three features
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph (x_axis='bottom' is the new argument from the branch linked above)
gdt_features=gdd.new_track(2,x_axis='bottom')
gds_features=gdt_features.new_set('graph')

# generate some random values for plotting
# (the zero-valued end points are needed in order to skip the
# interpolation done by GenomeDiagram for missing values)
coverage=[]
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in range(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in range(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar')

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_graph.pdf","pdf")
============================================
Best Regards, Daniel
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 10:55:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 05:55:12 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912071055.nB7AtCol024504@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-07 05:55 EST ------- I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to having the x-axis line at the middle y-value (center or centre=None). Try setting center to zero when you create the Graph object. If you could give a cut down example it would be easier to help. Peter
From biopython at maubp.freeserve.co.uk Mon Dec 7 11:34:11 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 11:34:11 +0000 Subject: [Biopython-dev] Biopython git access for Cymon Message-ID: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com> Dear all, It is a little overdue, but I'm pleased to announce Cymon Cox now has write access to the Biopython repository. Cymon has made contributions to Biopython over many years, initially with the modules Bio.Nexus and Bio.Sequencing (together with Frank Kauff), and more recently with improvements to our BioSQL wrappers (especially on PostgreSQL) and his recent work on alignment wrappers.
I'd previously talked to Cymon about giving him CVS access, and he said we might as well wait until after the git transition. I've just checked in a few patches on his behalf (alignment tool wrappers), which served to remind me of this - it would have saved me some work to just say "Yes, please check that in" ;) On behalf of the Biopython project, welcome (fully) to the development team Cymon, and thanks again for all your work to date. Regards, Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 7 11:38:27 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 06:38:27 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912071138.nB7BcROx026201@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #3 from Daniel.Nicorici at gmail.com 2009-12-07 06:38 EST ------- (In reply to comment #2) > I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to > having the x-axis line at the middle y-value (center or centre=None). Try > setting > center to zero when you create the Graph object. If you could give a cut down > example it would be easier to help. Yes, I am referring to GenomeDiagram. If one sets the center to zero then the lower half of the track (below the x-axis) is always empty and unused when all values are positive, e.g. CG content, short-read coverage have positive values. This feature allows one to use the entire track for plotting and not only half of it when setting center to zero is used. Best Regards, Daniel > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
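The point made in comment #3 can be illustrated with a little arithmetic. This is only a sketch: it assumes the track's y-range is made symmetric about the chosen center, which is consistent with the behaviour described in this report but is not taken from the GenomeDiagram source. The helper names symmetric_range and used_fraction are hypothetical and are not part of Biopython.

```python
def symmetric_range(values, center):
    """Hypothetical helper: y-range symmetric about `center` (an assumption)."""
    half = max(abs(v - center) for v in values)
    return center - half, center + half


def used_fraction(values, y_min, y_max):
    """Fraction of the y-range that the data actually spans."""
    return (max(values) - min(values)) / float(y_max - y_min)


# Made-up, all-positive values standing in for short-read coverage
coverage = [0.0, 120.0, 400.0, 35.5]

lo, hi = symmetric_range(coverage, center=0)
print(lo, hi)                                                  # -400.0 400.0
print(used_fraction(coverage, lo, hi))                         # 0.5
print(used_fraction(coverage, min(coverage), max(coverage)))   # 1.0
```

Under that assumption, all-positive data with center=0 can only ever occupy the upper half of the track, whereas a range running from the minimum to the maximum of the data uses all of it.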
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 11:48:32 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 06:48:32 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track In-Reply-To: Message-ID: <200912071148.nB7BmW8A026423@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #4 from Daniel.Nicorici at gmail.com 2009-12-07 06:48 EST ------- Here is the cut down example of what I mean:
=====================================================
import Bio.SeqFeature
import Bio.Graphics.GenomeDiagram
import random

gdd=Bio.Graphics.GenomeDiagram.Diagram('Test diagram')
gdt_features=gdd.new_track(1)
gds_features=gdt_features.new_set()
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(25,125),strand=+1)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(150,250),strand=None)
gds_features.add_feature(feature,name="Forward",label=True)
feature=Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(275,375),strand=-1)
gds_features.add_feature(feature,name="Forward",label=True)

# Add graph
gdt_features=gdd.new_track(2)
gds_features=gdt_features.new_set('graph')

# generate some random values for plotting
coverage=[]
coverage.append((50,float(0)))
coverage.extend( [ (i, random.uniform(0,100)) for i in range(51,100)])
coverage.append((100,float(0)))
coverage.append((250,float(0)))
coverage.extend( [ (i, random.uniform(50,400)) for i in range(251,400)])
coverage.append((400,float(0)))
gds_features.new_graph(coverage, 'coverage', style='bar',center=0)

gdd.draw(format='linear',orientation='landscape',pagesize='A4',fragments=1,start=1,end=500)
gdd.write("Test_graph.pdf","pdf")
===========================================
The values plotted here are in the range 0 to 400, and GenomeDiagram's y-axis range is from -400 to 400 when center is set to 0. A y-axis range of -n to n is a really odd choice when all the values to be plotted are in the range 0 to n. The feature proposed here allows the entire track to be used instead of only half of it, and also gives a better range for the y-axis. (In reply to comment #2) > I'm guessing you are talking about GenomeDiagram? If so, yes, tracks default to > having the x-axis line at the middle y-value (center or centre=None). Try > setting > center to zero when you create the Graph object. If you could give a cut down > example it would be easier to help. > > Peter >
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 11:59:33 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 06:59:33 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071159.nB7BxXs5026717@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 biopython-bugzilla at maubp.freeserve.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement
            Summary|placing x-axis of graph     |placing x-axis of graph
                   |track at the bottom or top  |track at the bottom or top
                   |of the track                |of the track in
                   |                            |GenomeDiagram
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-07 06:59 EST ------- When I wrote comment 2, I hadn't seen comment 1 with the github link and examples. Leighton and I had (some time ago now) chatted about a related enhancement allowing the user to give the y-limits.
With that in mind, it makes sense to give the x-axis vertical position in terms of a y-coordinate (rather than a few limited options like top, middle and bottom). This would be more flexible. Marking this as an enhancement.
From chapmanb at 50mail.com Mon Dec 7 12:12:45 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Mon, 7 Dec 2009 07:12:45 -0500 Subject: [Biopython-dev] Biopython git access for Cymon In-Reply-To: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com> References: <320fb6e00912070334m311dd287r4a20f1e399413adc@mail.gmail.com> Message-ID: <20091207121245.GM51407@sobchak.mgh.harvard.edu> Hi all; > It is a little overdue, but I'm pleased to announce Cymon Cox > now has write access to the Biopython repository. > > Cymon has made contributions to Biopython over many years, > initially with the modules Bio.Nexus and Bio.Sequencing > (together with Frank Kauff), and more recently with > improvements to our BioSQL wrappers (especially on > PostgreSQL) and his recent work on alignment wrappers. Awesome. Congrats Cymon and thanks for all your excellent work. Well deserved. Brad
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 12:15:03 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 07:15:03 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071215.nB7CF3pE027513@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #6 from Daniel.Nicorici at gmail.com 2009-12-07 07:15 EST ------- (In reply to comment #5) > When I wrote comment 2, I hadn't seen comment 1 with the github link and > examples.
> ;-) > Leighton and I had (some time ago now) chatted about a related enhancement > allowing the user to give the y-limits. I think that it is need enhancement. Let's see if others think that same! ;-) > With than in mind, it makes sense to > give the x-axis vertical position in terms of a y-coordinate (rather than a few > limited options like top, middle and bottom). This would be more flexible. This sounds good and I agree that it is more flexible. Indeed that options like "top, middle, bottom" are limited but still the scaling is done automatically and the user does not have to know in what range are his/her values are and what are the minimum and maximum and what axis position matches all the graphs which he/she wants to generate. I am sure that this can be done better than I did it. > > Marking this as an enhancement. Ok. Daniel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 7 13:03:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 08:03:14 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071303.nB7D3Esa029362@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #7 from lpritc at scri.sari.ac.uk 2009-12-07 08:03 EST ------- (In reply to comment #6) > > (In reply to comment #5) > > Leighton and I had (some time ago now) chatted about a related enhancement > > allowing the user to give the y-limits. > > I think that it is need enhancement. Let's see if others think that same! ;-) Oh, it definitely does! ;) Thank you for taking the time to improve it. 
> > With than in mind, it makes sense to > > give the x-axis vertical position in terms of a y-coordinate (rather than a few > > limited options like top, middle and bottom). This would be more flexible. > > This sounds good and I agree that it is more flexible. This is my preferred option. > Indeed that options like "top, middle, bottom" are limited but still the > scaling is done automatically and the user does not have to know in what range > are his/her values are and what are the minimum and maximum and what axis > position matches all the graphs which he/she wants to generate. > > I am sure that this can be done better than I did it. By allowing the position of the axis to take any value within the data range, this still allows 'top', 'middle' and 'bottom' to be defined as functions of the data with, e.g. x_axis_pos = min(data) # bottom x_axis_pos = max(data) # middle x_axis_pos = median(data) # top and also allows for explicit placing of the axis at specified points on the y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.) Cheers, L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Mon Dec 7 13:05:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 08:05:11 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071305.nB7D5B22029508@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #8 from lpritc at scri.sari.ac.uk 2009-12-07 08:05 EST ------- (In reply to comment #7) > x_axis_pos = min(data) # bottom > x_axis_pos = max(data) # middle > x_axis_pos = median(data) # top x_axis_pos = min(data) # bottom x_axis_pos = max(data) # top x_axis_pos = median(data) # middle D'oh! L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 7 13:25:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 08:25:29 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071325.nB7DPTSH030274@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #9 from Daniel.Nicorici at gmail.com 2009-12-07 08:25 EST ------- (In reply to comment #8) Ok. > (In reply to comment #7) > > > x_axis_pos = min(data) # bottom > > x_axis_pos = max(data) # middle > > x_axis_pos = median(data) # top > > x_axis_pos = min(data) # bottom > x_axis_pos = max(data) # top > x_axis_pos = median(data) # middle > > D'oh! > > L. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
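For readers following the thread, the corrected mapping from comment #8 can be sketched in plain Python. This is only an illustration of the idea being discussed, not GenomeDiagram code: the helper name `x_axis_position` and its `where` argument are hypothetical, and the median stands in for "middle" exactly as proposed in the comment.

```python
from statistics import median


def x_axis_position(data, where="middle"):
    """Return the y-coordinate at which the x-axis would be drawn.

    Hypothetical helper, not part of GenomeDiagram. `where` may be one of
    the named positions from the thread or an explicit y-coordinate.
    """
    if where == "bottom":
        return min(data)
    if where == "top":
        return max(data)
    if where == "middle":
        return median(data)
    # Anything else is treated as an explicit position on the y-axis.
    return float(where)


coverage = [0, 2, 10]
bottom = x_axis_position(coverage, "bottom")
top = x_axis_position(coverage, "top")
middle = x_axis_position(coverage, "middle")
explicit = x_axis_position(coverage, 5)
```

Treating the named positions as functions of the data keeps the automatic scaling Daniel asked for, while an explicit number still allows the axis to sit at any y-value.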
From biopython at maubp.freeserve.co.uk Mon Dec 7 13:28:10 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 13:28:10 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 Message-ID: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> Hi all, I would like us to do the Biopython 1.53 release this month. We still have lots of new stuff that hasn't yet landed on the trunk, but despite that, looking at the NEWS file we have had plenty of improvements in the two months and a bit since Biopython 1.52 was released: http://biopython.open-bio.org/SRC/biopython/NEWS http://github.com/biopython/biopython/blob/master/NEWS One good reason for doing Biopython 1.53 soon is the NCBI said they plan to start using the new Jan 2010 DTD files for MedLine/PubMed as early as mid December: http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html Any comments on how things stand on the trunk - is there anything people think needs to be fixed before the release? Thanks, Peter From eric.talevich at gmail.com Mon Dec 7 16:33:30 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 7 Dec 2009 11:33:30 -0500 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> Message-ID: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> On Mon, Dec 7, 2009 at 8:28 AM, Peter wrote: > Hi all, > > I would like us to do the Biopython 1.53 release this month. 
> > We still have lots of new stuff that hasn't yet landed on the > trunk, but despite that, looking at the NEWS file we have > had plenty of improvements in the two months and a bit > since Biopython 1.52 was released: > > http://biopython.open-bio.org/SRC/biopython/NEWS > http://github.com/biopython/biopython/blob/master/NEWS > > One good reason for doing Biopython 1.53 soon is the > NCBI said they plan to start using the new Jan 2010 DTD > files for MedLine/PubMed as early as mid December: > http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html > > Any comments on how things stand on the trunk - is there > anything people think needs to be fixed before the release? > > I'll chime in about the status of the Summer of Code stuff. For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so the TreeIO API will work independently of file formats. For Bio.Tree, I'm about halfway done porting the Nexus tree methods, though it'll go faster now that the semester's over. (I'll post the details and ask for a code review soon.) My phyloxml branch won't be ready to land in time for a December release, but merging it into the trunk right after that is feasible. That would give everyone time to try it out and suggest changes before Biopython 1.54 cements the API. Separately: GitHub says Nick Matzke's BioGeography branch hasn't been touched since Aug. 19. It will need some love before it can be merged into the trunk. Is there a plan for this, Peter or Brad? If not, should I try to rescue it after TreeIO lands?
Cheers, Eric From biopython at maubp.freeserve.co.uk Mon Dec 7 16:48:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 7 Dec 2009 16:48:34 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> Message-ID: <320fb6e00912070848i4153ee33w9df5c7df65a4c225@mail.gmail.com> On Mon, Dec 7, 2009 at 4:33 PM, Eric Talevich wrote: > > I'll chime in about the status of the Summer of Code stuff. Thanks > For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees > and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so > the TreeIO API will work independently of file formats. For Bio.Tree, I'm > about halfway done porting the Nexus tree methods, though it'll go faster > now that the semester's over. (I'll post the details and ask for a code > review soon.) > > My phyloxml branch won't be ready to land in time for a December release, > but merging it into the trunk right after that is feasible. That would > everyone time to try it out and suggest changes before Biopython 1.54 > cements the API. That is what I was hoping for. Fingers crossed Tiago will be able to spare some time to go over the basics of the phyloxml and TreeIO work - more eyes on the code would be great. > Separately: GitHub says Nick Matzke's BioGeography branch hasn't been > touched since Aug. 19. It will need some love before it can be merged into > the trunk. Is there a plan for this, Peter or Brad? If not, should I try to > rescue it after TreeIO lands? That sounds good as a tentative plan - Nick may want to be more involved, but you would be the next logical choice to handle this. 
Cheers, Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 7 18:56:20 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 7 Dec 2009 13:56:20 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912071856.nB7IuKI7007552@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #10 from Daniel.Nicorici at gmail.com 2009-12-07 13:56 EST ------- (In reply to comment #7) > (In reply to comment #6) > > > > (In reply to comment #5) > > > Leighton and I had (some time ago now) chatted about a related enhancement > > > allowing the user to give the y-limits. > > > > I think that it is need enhancement. Let's see if others think that same! ;-) > > Oh, it definitely does! ;) Thank you for taking the time to improve it. > > > > With than in mind, it makes sense to > > > give the x-axis vertical position in terms of a y-coordinate (rather than a few > > > limited options like top, middle and bottom). This would be more flexible. > > > > This sounds good and I agree that it is more flexible. > > This is my preferred option. > > > Indeed that options like "top, middle, bottom" are limited but still the > > scaling is done automatically and the user does not have to know in what range > > are his/her values are and what are the minimum and maximum and what axis > > position matches all the graphs which he/she wants to generate. > > > > I am sure that this can be done better than I did it. > > By allowing the position of the axis to take any value within the data range, > this still allows 'top', 'middle' and 'bottom' to be defined as functions of > the data with, e.g. 
> > x_axis_pos = min(data) # bottom > x_axis_pos = max(data) # middle > x_axis_pos = median(data) # top > > and also allows for explicit placing of the axis at specified points on the > y-axis, or as other points that depend on the data (e.g. mean, quartiles, etc.) > It looks a little bit confusing to me now because I see that there are two sides of the problem (or two bugs?), as follows: 1) drawing a line orthogonal to the y-axis at any position which represents the x-axis (this does not affect how the values are plotted and in what interval) 2) in the case of bar plotting (partially affects also linear plotting), the values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and y=-inf...+inf) unless the user specifies something else and not to be drawn by default from some arbitrary point, e.g. median, mean, etc., as it is done now. I have the feeling that the solution presented here affects only the point 1) and not 2). Please, could you elaborate more so that I could implement your suggestion? BR, Daniel > Cheers, > > L. > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Dec 8 08:49:59 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Dec 2009 03:49:59 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912080849.nB88nx00030750@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #11 from lpritc at scri.sari.ac.uk 2009-12-08 03:49 EST ------- (In reply to comment #10) > It looks a little bit confusing too me now because I see that there are two > sides of the problem (or two bugs?), as following: > 1) drawing a line orthogonal on y-axis at any position which represents the > x-axis (this does not affect how the values are plotted and in what interval) > 2) in the case of bar plotting (partially affects also linear plotting), the > values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and > y=-inf...+inf) unless the user specify something else and not to be drawn by > default from some arbitrary point, e.g. median, mean, etc., as it is done now. > > I have the feeling that the solution presented here affects only the point 1) > and not 2). > > Please, could you elaborate more such that maybe I could implement your > suggestion? I see why you've distinguished between the two cases, but I think they can be handled by the earlier suggestion to implement the location of the x-axis in the context of also allowing the user to set y-axis limits (see comment #5). It's the combination of allowing y-axis limits and the location of x-axis crossing that gives the greatest flexibility. For example, if y-limit selection and x-axis crossing point were under user control... ...if you wanted to continue with the current behaviour, you'd not set any y-limits, and not specify the location of the x-axis. 
...if you wanted to draw short read coverage, you'd set the lower y-limit to 0, and set the location of the x-axis to zero (if that was not the default). This should draw bars with their bases on the bottom/inner of the track, and the scale running along the bottom/inner of the track. ...if you wanted to represent some data as a bar graph, with a special meaning for the mean (or median) value, you could optionally set y-limits, but have the x-axis cross at mean(data) or median(data). This should draw bars with their bases on the x-axis, and the axis located at the mean/median value for the data. Does this help clarify what I meant, above? L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Tue Dec 8 13:33:12 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 8 Dec 2009 08:33:12 -0500 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> Message-ID: <20091208133312.GE74538@sobchak.mgh.harvard.edu> Peter and Michiel; Thanks for the thoughts. Tried to combine these below: Michiel: > I didn't realize that the GFF parser returns SeqRecords. I agree with > Peter that a parser returning SeqRecords should be accessed through > Bio.SeqIO, while a lower-level parser can live in Bio.GFF. Peter: > My point is the moment you include GFF -> SeqRecord > code (even if not explicitly via the Bio.SeqIO namespace) > it opens us up to people giving these SeqRecord objects > to SeqIO for output (e.g. as GenBank). 
[...] > Worth goals, but if by "Produce Biopython objects from > GFF3/GTF/GFF2 files" you mean SeqRecords with > SeqFeatures, (as I said above) we are opening up the > GFF to GenBank can of worms. There is no "later" :( We seem to have a very different view of SeqRecords/SeqFeatures. To me, they are a convenient well thought out object model to capture annotations and features associated with a sequence. They have the advantage that people who have used Biopython will be familiar with the object model. That's why I chose to use them for representing GFF, as opposed to a GFF specific class. You are adding on two extra conditions: - If something produces SeqRecords, it needs to come from SeqIO. - If you have a SeqRecord, it has to be compatible with GenBank output. This quickly ties us up to the not-that-great GenBank way of representing features and locations, and makes it hard to add on more flexible formats like GFF. Converting between very different feature representations is going to be complex and a whole new problem; why do you have to support that to use a SeqRecord in your code? Overall, I'd like to see it be simpler for people to contribute and add parsers to Biopython. > I still think it would be useful to have Bio/GFF/Parser.py (or > similar) as the low level parser, and Bio/SeqIO/GffIO.py (or > similar) to turn this into SeqRecord and SeqFeature objects. This appears to be about where the code lives. Personally, I prefer having things under the GFF namespace and then building thin wrappers around if in SeqIO if desired. Practically, I want to leave SeqIO inclusion out right now and try to argue only for getting the GFF specific parser in. > The nested features that worry me. Perhaps the existing > location operator (e.g. "join") could be set to something > like "parent/child" if the subfeatures is used to hold child > features rather than the elements of a join? We need > the GenBank output code etc to be able to tell these > apart reliably. 
Right now I don't set the location operator at all. The parent/child model is much more flexible than the GenBank operator stuff, so maybe the right way to go is to phase out using the operator at all. If it is set to nothing than parent/child is assumed, and GenBank output can add in all of the operators at output time. Brad From chapmanb at 50mail.com Tue Dec 8 14:03:54 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 8 Dec 2009 09:03:54 -0500 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> Message-ID: <20091208140354.GG74538@sobchak.mgh.harvard.edu> Hi Eric; > I'll chime in about the status of the Summer of Code stuff. > > For Bio.TreeIO, I've borrowed the Newick tree parsing code from Nexus.Trees > and changed it to construct Bio.Tree objects via Bio.TreeIO.NewickIO -- so > the TreeIO API will work independently of file formats. For Bio.Tree, I'm > about halfway done porting the Nexus tree methods, though it'll go faster > now that the semester's over. (I'll post the details and ask for a code > review soon.) > > My phyloxml branch won't be ready to land in time for a December release, > but merging it into the trunk right after that is feasible. That would > everyone time to try it out and suggest changes before Biopython 1.54 > cements the API. This sounds awesome. Thanks for keeping up with the code; looking forward to seeing it get in to the main branch. > Separately: GitHub says Nick Matzke's BioGeography branch hasn't been > touched since Aug. 19. It will need some love before it can be merged into > the trunk. Is there a plan for this, Peter or Brad? If not, should I try to > rescue it after TreeIO lands? No plan from my end; hopefully Nick will chime in. 
If Nick doesn't have time, it would be beyond great if you could finalize and merge the most useful parts. Thanks for volunteering on this. Brad From biopython at maubp.freeserve.co.uk Tue Dec 8 14:15:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 14:15:30 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <20091208133312.GE74538@sobchak.mgh.harvard.edu> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> <20091208133312.GE74538@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> On Tue, Dec 8, 2009 at 1:33 PM, Brad Chapman wrote: > > We seem to have a very different view of SeqRecords/SeqFeatures. To > me, they are a convenient well thought out object model to capture > annotations and features associated with a sequence. They have the > advantage that people who have used Biopython will be familiar with > the object model. That's why I chose to use them for representing GFF, > as opposed to a GFF specific class. OK, but (as I expand on below), your planned use of the SeqFeature (while legitimate) appears to risk being inconsistent with existing parts of the Biopython code base (in particular, GenBank output, and maybe GenomeDiagram). > You are adding on two extra conditions: > > - If something produces SeqRecords, it needs to come from SeqIO. It was more of an aim than a rule. Isn't true of all the existing code for historical reasons, e.g. Bio.SeqIO "genbank" support acts as a thin wrapper to Bio.GenBank which does offer SeqRecord objects. For a user perspective, if you want a SeqRecord from a sequence file, the first point of call should be Bio.SeqIO. 
> - If you have a SeqRecord, it has to be compatible with GenBank > output. > > This quickly ties us up to the not-that-great GenBank way of > representing features and locations, and makes it hard to add on more > flexible formats like GFF. Converting between very different feature > representations is going to be complex and a whole new problem; > why do you have to support that to use a SeqRecord in your code? The big aim of Bio.SeqIO was to allow using many different file formats with the same object representation. Implicitly (assuming the required data is present), input from one file format could be output in another format. The problem is that lots of current code in Biopython uses SeqRecord/SeqFeatures in a particular way (GenBank/EMBL parsers, GenomeDiagram, GenBank output). Unfortunately, for GFF files it seems this isn't the most natural way to use SeqFeature objects (where you need real nesting). > Overall, I'd like to see it be simpler for people to contribute and > add parsers to Biopython. I hope that for simple file formats this is already the case. But for annotation rich file formats, if we want SeqIO to continue to be useful for conversion, this by necessity requires some awareness of how the other parsers/writers will represent the same data. One option for contributions is to offer a "low level" parser using basic Python datatypes or simple file-type specific records. Then someone more familiar with SeqIO and the other file formats can write a SeqRecord converter in order to integrate it into Bio.SeqIO. This is basically how Ace, Phred, SwissProt (and probably others) were done. >> I still think it would be useful to have Bio/GFF/Parser.py (or >> similar) as the low level parser, and Bio/SeqIO/GffIO.py (or >> similar) to turn this into SeqRecord and SeqFeature objects. > > This appears to be about where the code lives. Personally, I prefer > having things under the GFF namespace and then building thin > wrappers around if in SeqIO if desired.
Practically, I want to leave > SeqIO inclusion out right now and try to argue only for getting the > GFF specific parser in. Where the code lives isn't a big issue. You can do a thin wrapper in Bio.SeqIO calling Bio.GFF (where Bio.GFF makes SeqRecords), or a fat wrapper (where Bio.GFF does not make SeqRecords). The problem (as I see it) is SeqIO integration and how your desired use of SeqFeatures will impact this. >> The nested features that worry me. Perhaps the existing >> location operator (e.g. "join") could be set to something >> like "parent/child" if the subfeatures is used to hold child >> features rather than the elements of a join? We need >> the GenBank output code etc to be able to tell these >> apart reliably. > > Right now I don't set the location operator at all. The parent/child > model is much more flexible than the GenBank operator stuff, so > maybe the right way to go is to phase out using the operator at all. > If it is set to nothing than parent/child is assumed, and GenBank > output can add in all of the operators at output time. I agree that using SeqFeature sub-features for parent/child relationships makes a lot of sense. BUT, we have a lot of existing code which follows the GenBank/EMBL parser route of using this for joins (and a few other corner cases). There are other annoyances with the current SeqFeature and FeatureLocation model - the strand and location operator are part of the SeqFeature not the FeatureLocation. It would make more sense to me to move them to the FeatureLocation (and have that handle joins itself). Or, move everything to the SeqFeature (and get rid of the FeatureLocation object). I think the best route forward is to plan a transition of the SeqFeature object to allow nice handling of real nested relationships, and a reworking of complex location handling. Then (hopefully) we can have the GenBank/EMBL/GFF3 parsers all using the SeqFeature in a consistent way. 
Peter From bugzilla-daemon at portal.open-bio.org Tue Dec 8 16:56:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Dec 2009 11:56:17 -0500 Subject: [Biopython-dev] [Bug 2965] New: Updating Bio.Restriction with latest REBASE data Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2965 Summary: Updating Bio.Restriction with latest REBASE data Product: Biopython Version: Not Applicable Platform: PC OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk The Bio/Restriction/Restriction_Dictionary.py file hasn't been updated since 2004. The latest REBASE restriction digest files seem to be from Nov 29 2009, ftp://ftp.neb.com/pub/rebase/ This bug is to update Restriction_Dictionary.py to use the Nov 2009 data. I have tried and failed as described below: ---------------------------------------------------------------------------- I manually downloaded these files to the Scripts/Restriction directory: ftp://ftp.neb.com/pub/rebase/emboss_e.912 ftp://ftp.neb.com/pub/rebase/emboss_r.912 ftp://ftp.neb.com/pub/rebase/emboss_s.912 And then ran ranacompiler.py which generated a new Restriction_Dictionary.py As an aside, the sre module is deprecated, and re is suggested instead. Other interesting output: WARNING : HaeIV cut twice with different overhang length each time. Unable to deal with this behaviour. This enzyme will not be included in the database. Sorry. Checking : Anyway, HaeIV is not commercially available. WARNING : TaqII has two different sites. The new database contains 753 enzymes.
So far so good, but using the new Restriction_Dictionary.py the unit tests fail:

$ python test_Restriction.py
Traceback (most recent call last):
  File "test_Restriction.py", line 6, in <module>
    from Bio.Restriction import *
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/__init__.py", line 61, in <module>
    from Bio.Restriction.Restriction import *
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py", line 2358, in <module>
    newenz = T(k, bases, enzymedict[k])
  File "/Users/myusername/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Restriction/Restriction.py", line 216, in __init__
    cls.compsite = re.compile(cls.compsite)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 188, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/re.py", line 241, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character in group name

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Dec 8 17:02:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Dec 2009 12:02:42 -0500 Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest REBASE data In-Reply-To: Message-ID: <200912081702.nB8H2g4b014553@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2965 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-08 12:02 EST ------- To be more precise, running Bio/Restriction/Restriction.py in IDLE and looking at the stack trace, the regular expression failing is for enzyme CviKI-1, (?P[AG]GC[CT])|(?P[AG]GC[CT]) The problem seems to be the hyphen/minus sign in the enzyme name which is being used as a group name in the regular expression. I think this is the only Enzyme with this name. Since it can't be used as a python name either, we should probably map it to an underscore: >>> import re >>> re.compile('(?P[AG]GC[CT])|(?P[AG]GC[CT])') ... error: bad character in group name >>> re.compile('(?P[AG]GC[CT])|(?P[AG]GC[CT])') <_sre.SRE_Pattern object at 0xe8d700> -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
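The group names in the patterns quoted above did not survive in this archive copy, so the failure is reproduced here as a small self-contained sketch. It assumes only what the comment states: the enzyme name is used as a regular expression group name, and a hyphen is not allowed there. The helper `group_pattern` and the exact pattern layout are hypothetical and are not taken from Bio.Restriction.

```python
import re

SITE = "[AG]GC[CT]"  # recognition site quoted in the comment above


def group_pattern(name, site=SITE):
    """Wrap a site in a named group (hypothetical helper for illustration)."""
    return "(?P<%s>%s)" % (name, site)


# A hyphen is not valid in a group name, so compiling the pattern fails.
try:
    re.compile(group_pattern("CviKI-1"))
    hyphen_ok = True
except re.error:
    hyphen_ok = False

# Mapping the hyphen to an underscore gives a valid group name.
safe_name = "CviKI-1".replace("-", "_")
pattern = re.compile(group_pattern(safe_name))
match = pattern.search("TTAGCTAA")
```

The underscore form is also a valid Python identifier, which matters because the enzyme names are used as Python names as well as group names.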
From bugzilla-daemon at portal.open-bio.org Tue Dec 8 17:50:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 8 Dec 2009 12:50:29 -0500 Subject: [Biopython-dev] [Bug 2965] Updating Bio.Restriction with latest REBASE data In-Reply-To: Message-ID: <200912081750.nB8HoTDW016476@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2965 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-08 12:50 EST ------- Fixed by mapping hyphen to an underscore. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From kellrott at gmail.com Tue Dec 8 22:00:11 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Tue, 8 Dec 2009 14:00:11 -0800 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <20091208140354.GG74538@sobchak.mgh.harvard.edu> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> Message-ID: Speaking of stuff that may not be ready for 1.53, but should start speeding up for 1.54, I've translated a bunch of HMMER3 / PfamScan code in the Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting in the jython branch, but I can spin it into a separate branch). Right now it's missing the code to parse HMMER2, there needs to be more extensive unit testing, and the API needs to be nailed down with some documentation. Is there anybody else that needs HMMER and Pfam support? 
Kyle From biopython at maubp.freeserve.co.uk Tue Dec 8 22:18:03 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 22:18:03 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> On Tue, Dec 8, 2009 at 10:00 PM, Kyle Ellrott wrote: > > Speaking of stuff that may not be ready for 1.53, but should start speeding > up for 1.54, I've translated a bunch of HMMER3 / PfamScan code in the > Bio.HMMER and Bio.Pfam modules in my github branch (right now it's sitting > in the jython branch, but I can spin it into a separate branch). > Right now it's missing the code to parse HMMER2, there needs to be more > extensive unit testing, and the API needs to be nailed down with some > documentation. > Is there anybody else that needs HMMER and Pfam support? > > Kyle That had caught my eye, and it is potentially of direct interest to me personally. I will probably skip HMMER2 and go straight to HMMER3 though ;) On a related point, I am reasonably confident we can get most of Biopython running on Jython 2.5.1 in time for the release. Other than things that Jython doesn't support at all, i.e. the C code, DTD parsing (needed for Bio.Entrez), and the lack of a buffer function (not important, only used in deprecated code now), the only remaining hurdle is Bio.Restriction, and I think I have solved that. I will be testing this tomorrow (time permitting). Your groundwork has been very useful here Kyle. 
Thanks, Peter From biopython at maubp.freeserve.co.uk Tue Dec 8 22:30:20 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 22:30:20 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> <20091208133312.GE74538@sobchak.mgh.harvard.edu> <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> Message-ID: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com> On Tue, Dec 8, 2009 at 2:15 PM, Peter wrote: > > I agree that using SeqFeature sub-features for parent/child > relationships makes a lot of sense. BUT, we have a lot of > existing code which follows the GenBank/EMBL parser > route of using this for joins (and a few other corner cases). > > There are other annoyances with the current SeqFeature > and FeatureLocation model - the strand and location operator > are part of the SeqFeature not the FeatureLocation. It would > make more sense to me to move them to the FeatureLocation > (and have that handle joins itself). Or, move everything to > the SeqFeature (and get rid of the FeatureLocation object). > > I think the best route forward is to plan a transition of the > SeqFeature object to allow nice handling of real nested > relationships, and a reworking of complex location handling. > Then (hopefully) we can have the GenBank/EMBL/GFF3 > parsers all using the SeqFeature in a consistent way. > Just to add some ideas to this thread for discussion, on possible ways forward without breaking backwards compatibility... 
hopefully this is clear, I did have a glass of wine with dinner ;) Given the way the existing SeqFeature list property subfeatures is used (by the GenBank/EMBL parser etc), would it make sense for the GFF needs to add a new list for child features (say property "children"), and perhaps another property (maybe "parent") which can point back at the parent SeqFeature. i.e. A sort of tree, allowing us to represent genes, exons, etc. Note we may want to use weak references in the above (children/parent references) to assist the python GC. Given the above, potentially the GenBank/EMBL parser could be enhanced to use these new properties (e.g. for linking gene and CDS features in bacteria, or CDS and mat_peptide features in viruses etc). [This still leaves the ontology issues - which might be best dealt with by the GenBank output code] Peter From kellrott at gmail.com Tue Dec 8 22:42:54 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Tue, 8 Dec 2009 14:42:54 -0800 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> Message-ID: > > On a related point, I am reasonably confident we can get most > of Biopython running on Jython 2.5.1 in time for the release. > Other than things that Jython doesn't support at all, i.e. the C > code, DTD parsing (needed for Bio.Entrez), and the lack of a > buffer function (not important, only used in deprecated code > now), the only remaining hurdle is Bio.Restriction, and I think > I have solved that. I will be testing this tomorrow (time > permitting). The last bit for 'full' jython support is getting BioSQL working. Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in Jython. 
My jython port also has work that moves the BioSQL interface from the internal ORM to a SqlAlchemy interface. Of course that is a little controversial because it introduces a dependency on another python package. Of course it takes care of sqlite and Java MySql connector support at the same time, so it does have some pluses. Kyle From biopython at maubp.freeserve.co.uk Tue Dec 8 22:46:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 22:46:19 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> Message-ID: <320fb6e00912081446w303edd73qe3a5dad964314487@mail.gmail.com> On Tue, Dec 8, 2009 at 10:42 PM, Kyle Ellrott wrote: > > The last bit for 'full' jython support is getting BioSQL working. > Unfortunately MySQLdb links directly to the C mysql API, and doesn't work in > Jython. My jython port also has work that moves the BioSQL interface from > the internal ORM to a SqlAlchemy interface. Of course that is a little > controversial because it introduces a dependency on another python package. > Of course it takes care of sqlite and Java MySql connector support at the > same time, so it does have some pluses. Fair point w.r.t. "full" jython support ;) I would be more comfortable with BioSQL on Jython working directly with sqlite (once we add that to BioSQL) and the Java MySql connector directly (without the extra dependency on SQLAlchemy). 
Peter From biopython at maubp.freeserve.co.uk Tue Dec 8 23:38:04 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 8 Dec 2009 23:38:04 +0000 Subject: [Biopython-dev] Bio.GFF and Brad's code In-Reply-To: <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com> References: <20091202125744.GA46415@sobchak.mgh.harvard.edu> <317375.58712.qm@web62401.mail.re1.yahoo.com> <20091203142534.GF51407@sobchak.mgh.harvard.edu> <320fb6e00912030653k276f49a6x3e1eade3f0ef04e0@mail.gmail.com> <20091204134010.GK51407@sobchak.mgh.harvard.edu> <320fb6e00912040625j7e2c4d03m4f2d595e9288fdb6@mail.gmail.com> <20091208133312.GE74538@sobchak.mgh.harvard.edu> <320fb6e00912080615k641cfc15v1c80b26132de83eb@mail.gmail.com> <320fb6e00912081430q6db93d55l6de4a02baefd6c12@mail.gmail.com> Message-ID: <320fb6e00912081538o635347ceh8e10aa4863e538e9@mail.gmail.com> On Tue, Dec 8, 2009 at 2:15 PM, Peter wrote: >> >> There are other annoyances with the current SeqFeature >> and FeatureLocation model - the strand and location operator >> are part of the SeqFeature not the FeatureLocation. It would >> make more sense to me to move them to the FeatureLocation >> (and have that handle joins itself). Or, move everything to >> the SeqFeature (and get rid of the FeatureLocation object). >> In addition to the strand and location operator, there is also (sometimes) a database cross reference (properties ref and db_ref, e.g. in contig files). Again, this is conceptually part of the feature location (and stored that way in BioSQL if I recall correctly). One example of where it would make sense to move things like the database, operator and strand to the FeatureLocation is the coded_by information in some GenPept file annotation, a use case very recently raised on the main mailing list: http://lists.open-bio.org/pipermail/biopython/2009-December/005910.html The current FeatureLocation simply can't be used here - although a full SeqFeature could be. 
Peter From bugzilla-daemon at portal.open-bio.org Wed Dec 9 09:56:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 04:56:34 -0500 Subject: [Biopython-dev] [Bug 2966] New: Primer3Commandline does not use EMBOSS 6.1.0 arguments Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2966 Summary: Primer3Commandline does not use EMBOSS 6.1.0 arguments Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk Several arguments for EMBOSS eprimer3 are different in version 6.1.0 from those used in Primer3Commandline. I have updated Primer3Commandline locally (and added documentation strings), and will make this available via github with some other proposed changes shortly, after talking to Peter. This revealed another bug, which I will submit separately. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 10:07:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 05:07:14 -0500 Subject: [Biopython-dev] [Bug 2967] New: AbstractCommandline silently accepts invalid parameter options Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2967 Summary: AbstractCommandline silently accepts invalid parameter options Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk While investigating Bug 2996 I noticed that AbstractCommandline was silently accepting invalid parameter options when passed by setting attributes. 
For example: cline = Primer3Commandline(bogus=True) cline.sequence = filename raises the appropriate ValueError, as the parameter name 'bogus' is being compared to the self.parameters list when setting, and is found not to be valid. However, the following code: cline = Primer3Commandline() cline.sequence = filename cline.bogus = True # Invalid argument not flagged up cline.sequnce = True # Mistyped argument not flagged up silently sets the invalid cline.bogus and cline.sequnce attributes without warning. Parameters set via attribute are not validated with the setter/getters defined for the properties in AbstractCommandline.__init__ This could (did!) lead the user to think that parameters are set when they are not, under at least two circumstances: 1) Typos in the parameter name 2) Using a parameter unsupported by the interface (see Bug 2996). L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 10:08:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 05:08:12 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091008.nB9A8Cc5008147@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 ------- Comment #1 from lpritc at scri.sari.ac.uk 2009-12-09 05:08 EST ------- Sorry, I'm referring to bug 2966 in the post above. My bad. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Dec 9 10:46:11 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 05:46:11 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091046.nB9AkBXi009268@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 05:46 EST ------- (In reply to comment #0) > While investigating Bug 2996 I noticed that AbstractCommandline was silently > accepting invalid parameter options when passed by setting attributes. For > example: > > cline = Primer3Commandline(bogus=True) > cline.sequence = filename > > raises the appropriate ValueError, as the parameter name 'bogus' is being > compared to the self.parameters list when setting, and is found not to be > valid. However, the following code: > > cline = Primer3Commandline() > cline.sequence = filename > cline.bogus = True # Invalid argument not flagged up > cline.sequnce = True # Mistyped argument not flagged up > > > silently sets the invalid cline.bogus and cline.sequnce attributes without > warning. Parameters set via attribute are not validated with the > setter/getters defined for the properties in AbstractCommandline.__init__ > This could (did!) lead the user to think that parameters are set when they > are not, under at least two circumstances: > > 1) Typos in the parameter name > 2) Using a parameter unsupported by the interface This is normal Python object behaviour - you can add any "property" like this at run time, >>> class Dummy(object) : ... pass ... 
>>> d = Dummy() >>> d.name = "Fred" >>> dir(d) ['__class__', '__delattr__', '__dict__', '__doc__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__str__', '__weakref__', 'name'] >>> d.name 'Fred' We might still be able to block this via __setattr__, this needs some experimentation. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 12:23:34 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 07:23:34 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091223.nB9CNYtT012354@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 ------- Comment #3 from lpritc at scri.sari.ac.uk 2009-12-09 07:23 EST ------- (In reply to comment #2) > This is normal Python object behaviour - you can add any "property" like this > at run time, [...] Oddly enough, I was already aware of that... ;) The issue is that the setting of parameters via attributes fails silently, but is demonstrated in the tutorial and is in any case often rather more convenient than declaring the parameters on instantiation, so is very likely to be used in anger. This potentially (and *actually* in my case, when attempting to use EMBOSS 6.1.0 parameter names with eprimer3) leads to cases where the user might expect that command-line options have been set, when they in fact haven't. > We might still be able to block this via __setattr__, this needs some > experimentation. That seems to be the best route to me, initially. 
It might be worth removing the property magic in the AbstractCommandline.__init__(), and instead use __setattr__, __getattr__, and __delattr__, having them behave appropriately for known parameter names. I'll have a go at doing that and put it in with the EMBOSS stuff I'm working on, just now. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 12:28:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 07:28:07 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091228.nB9CS7vS012457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 ------- Comment #4 from lpritc at scri.sari.ac.uk 2009-12-09 07:28 EST ------- (In reply to comment #3) > (In reply to comment #2) > > > We might still be able to block this via __setattr__, this needs some > > experimentation. > > That seems to be the best route to me, initially. It might be worth removing > the property magic in the AbstractCommandline.__init__(), and instead use > __setattr__, __getattr__, and __delattr__, having them behave appropriately for > known parameter names. > > I'll have a go at doing that and put it in with the EMBOSS stuff I'm working > on, just now. Peter has pointed out that he'd like to retain discoverability, and so restrict the change to a validating __setattr__ - which seems reasonable. L. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Dec 9 12:53:00 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 07:53:00 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091253.nB9Cr0cP013048@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 ------- Comment #5 from lpritc at scri.sari.ac.uk 2009-12-09 07:53 EST ------- This works for me, at the moment:

    def __setattr__(self, name, value):
        """ Workaround for a user interface issue.

            Without this __setattr__ attribute-based assignment of parameters
            will silently accept invalid parameters, leading to known instances
            of the user assuming that parameters for the application are set,
            when they are not.

            This workaround uses a whitelist of object attributes, and sets the
            object attribute list as normal, for these. Other attributes are
            assumed to be parameters, and passed to the self.set_parameter
            method for validation and assignment.
        """
        attr_whitelist = ['parameters', 'program_name']  # Allowed attributes
        if name not in attr_whitelist:          # If not in whitelist, treat
            self.set_parameter(name, value)     # as parameter
        else:
            self.__dict__[name] = value

-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
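To make the effect of the workaround in comment #5 concrete, here is a small self-contained sketch of the same idea. ToyCommandline, its constructor arguments and the error message are all hypothetical stand-ins; this is not the AbstractCommandline code, only an illustration of a whitelist-based validating __setattr__ under those assumptions.

```python
class ToyCommandline(object):
    """Hypothetical stand-in for a command line wrapper (not Biopython code)."""

    # Attributes that belong to the object itself rather than to the tool.
    _attr_whitelist = ("program_name", "parameters")

    def __init__(self, program_name, valid_names):
        # Both assignments pass through __setattr__ and hit the whitelist.
        self.program_name = program_name
        self.parameters = dict((name, None) for name in valid_names)

    def set_parameter(self, name, value):
        """Store a value for a known parameter, or complain loudly."""
        if name not in self.parameters:
            raise ValueError("Option name %s was not found." % name)
        self.parameters[name] = value

    def __setattr__(self, name, value):
        if name in self._attr_whitelist:
            # Ordinary attribute: store it the normal way.
            object.__setattr__(self, name, value)
        else:
            # Anything else is treated as a tool parameter and validated.
            self.set_parameter(name, value)


cline = ToyCommandline("eprimer3", ["sequence", "outfile"])
cline.sequence = "input.fasta"      # known parameter, accepted

try:
    cline.sequnce = True            # mistyped name, now rejected
    typo_rejected = False
except ValueError:
    typo_rejected = True
```

In this toy version the values live only in the parameters dictionary, so reading cline.sequence back would need a matching property or __getattr__; the comments in this bug suggest the real class keeps its properties for that purpose.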
From biopython at maubp.freeserve.co.uk Wed Dec 9 13:21:50 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Dec 2009 13:21:50 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> Message-ID: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> On Tue, Dec 8, 2009 at 10:18 PM, Peter wrote: > > On a related point, I am reasonably confident we can get most > of Biopython running on Jython 2.5.1 in time for the release. > Other than things that Jython doesn't support at all, i.e. the C > code, DTD parsing (needed for Bio.Entrez), and the lack of a > buffer function (not important, only used in deprecated code > now), the only remaining hurdle is Bio.Restriction, and I think > I have solved that. I will be testing this tomorrow (time > permitting). Your groundwork has been very useful here Kyle. I'm stuck again with Bio.Restriction under Jython. I've got the Bio/Restriction/Restriction_Dictionary.py to load under Jython (just - the Nov 2009 update isn't helping to keep the code size down), but doing test_Restriction.py hits the JVM limit. Furthermore, there is a little bit of C code in Bio.Restriction (which I think we can replace with plain python). 
Peter From biopython at maubp.freeserve.co.uk Wed Dec 9 14:18:19 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Dec 2009 14:18:19 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> Message-ID: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> On Wed, Dec 9, 2009 at 1:21 PM, Peter wrote: > > Furthermore, there is a little bit of C code in Bio.Restriction > (which I think we can replace with plain python). > I've replaced the C module Bio.Restriction.DNAUtils with Python code, and deprecated it. I am surprised it was written in C in the first place! Peter From bugzilla-daemon at portal.open-bio.org Wed Dec 9 15:04:10 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 10:04:10 -0500 Subject: [Biopython-dev] [Bug 2967] AbstractCommandline silently accepts invalid parameter options In-Reply-To: Message-ID: <200912091504.nB9F4AUM017626@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2967 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #6 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 10:04 EST ------- Fix committed - almost as is, I also added a doctest. Thanks! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Wed Dec 9 15:57:20 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Dec 2009 15:57:20 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> Message-ID: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com> Good news: I've tweaked the RestrictionCompiler.py code to modify how it generates Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries incrementally. Together with the removal of the C code DNAUtils, this means (after a clean install) that Jython likes Bio.Restriction and that test_Restriction.py passes on Jython 2.5.1 (and C Python too). Bad news: I think I have broken test_CAPS.py (under both Jython and Python). It looks like it hits some bits of Bio.Restriction that are not covered by test_Restriction.py I'm working on it still ... 
Peter From biopython at maubp.freeserve.co.uk Wed Dec 9 16:25:28 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 9 Dec 2009 16:25:28 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com> Message-ID: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com> On Wed, Dec 9, 2009 at 3:57 PM, Peter wrote: > Good news: > > I've tweaked the RestrictionCompiler.py code to modify how it generates > Bio/Restriction/Restriction_Dictionary.py in order to build the dictionaries > incrementally. Together with the removal of the C code DNAUtils, this > means (after a clean install) that Jython likes Bio.Restriction and that > test_Restriction.py passes on Jython 2.5.1 (and C Python too). > > Bad news: > > I think I have broken test_CAPS.py (under both Jython and Python). > It looks like it hits some bits of Bio.Restriction that are not covered by > test_Restriction.py > > I'm working on it still ... Solved: the check_bases function in Bio.Restriction also used to make things uppercase (but the docstring didn't make this clear and the C code was non-obvious). I think this means the whole test suite passes on Jython 2.5.1 (barring those bits with C code dependencies, BioSQL, or the known Jython issues with DTD parsing or the missing buffer function). Kyle - could you confirm this on your machine please? 
Thanks, Peter From bugzilla-daemon at portal.open-bio.org Wed Dec 9 17:57:37 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 12:57:37 -0500 Subject: [Biopython-dev] [Bug 2968] New: Modifications to Emboss eprimer3 parser and associated files Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2968 Summary: Modifications to Emboss eprimer3 parser and associated files Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk The existing Emboss primer3/eprimer3 code has a couple of issues, and some scope for improvement: - The existing Primer3.py parser code can only parse output when eprimer3 is applied to a single sequence. When eprimer3 is applied to multiple sequence input, it groups all primers for all sequences into a single record, which may incorrectly associate primers with the wrong sequences in downstream analysis. - The current parser lacks an iterator for iterating over multiple sequence output - The current parser creates 'ghost' primers for all primer pairs, with length zero and sequence as an empty string; it does not do this for internal oligos. A more intuitive solution might be to return None for absent primers/oligos - The current data model stores all primer data as individual attributes. It might be more useful to group the attributes of individual primers into their natural associations I have written new code for Emboss/Primer3.py that adds iterator/multiple sequence parsing functionality to the parser, and extensively revises the object model for the data. The Record and Primers objects are retained, but each primer/oligo is now represented by a Primer object that collects the relevant data together. 
The Record object has a new attribute that allows the sequence to be recorded directly, rather than having to be parsed from the comments attribute. The new data model retains the old attribute-based access for compatibility, but adds direct access to the Primer objects (where present) by .forward, .reverse and .oligo attributes, and by keywords. One change was required to the unit test, to account for the reporting of absent primers as None, rather than having 'null' attributes. I've added two further test output files, which may be rather large for the distribution (60kb total), and doctests that use these. The code can be inspected at my GitHub repository: http://github.com/widdowquinn/biopython/commit/b4701079ced297d7af5aa75b46738280c8783fe0 This enhancement request also relates to bug 2966. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 17:59:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 12:59:14 -0500 Subject: [Biopython-dev] [Bug 2968] Modifications to Emboss eprimer3 parser and associated files In-Reply-To: Message-ID: <200912091759.nB9HxErQ022462@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2968 ------- Comment #1 from lpritc at scri.sari.ac.uk 2009-12-09 12:59 EST ------- I forgot to mention - the new code still passes the test_EmbossPrimer.py unit test. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
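The data model proposed in bug 2968 (one object per primer or oligo, with None standing in for anything absent) might look roughly like the sketch below. This is not the code from the GitHub commit: the class names, fields and the describe helper are hypothetical and only illustrate how grouped attributes and None for missing primers could be used by downstream code.

```python
class Primer(object):
    """Hypothetical container grouping the data for one primer or oligo."""

    def __init__(self, start, length, tm, gc, seq):
        self.start = start      # position on the template
        self.length = length    # primer length in bases
        self.tm = tm            # melting temperature
        self.gc = gc            # GC content in percent
        self.seq = seq          # primer sequence


class PrimerSet(object):
    """Hypothetical grouping of forward, reverse and internal oligo."""

    def __init__(self, forward=None, reverse=None, oligo=None):
        # Absent primers stay as None instead of zero-length 'ghost' entries.
        self.forward = forward
        self.reverse = reverse
        self.oligo = oligo


def describe(primer_set):
    """Summarise which primers are present in a set."""
    parts = []
    for label in ("forward", "reverse", "oligo"):
        primer = getattr(primer_set, label)
        if primer is None:
            parts.append("%s: absent" % label)
        else:
            parts.append("%s: %s (%d nt)" % (label, primer.seq, primer.length))
    return "; ".join(parts)


only_forward = PrimerSet(forward=Primer(10, 4, 60.0, 50.0, "ACGT"))
summary = describe(only_forward)
```

Testing for None is a single explicit check, whereas a zero-length placeholder has to be recognised by inspecting its fields.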
From bugzilla-daemon at portal.open-bio.org Wed Dec 9 18:01:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 13:01:13 -0500 Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS 6.1.0 arguments In-Reply-To: Message-ID: <200912091801.nB9I1DMe022568@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2966 ------- Comment #1 from lpritc at scri.sari.ac.uk 2009-12-09 13:01 EST ------- I have made changes to Primer3Commandline that involve adding the EMBOSS 6.1.0 arguments, and docstrings to each argument. I have also added doctests. The proposed code can be inspected at my GitHub repository: http://github.com/widdowquinn/biopython/commit/9c0643e333b0cafb4e356426fb4902e0e9d2385c -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 18:03:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 13:03:30 -0500 Subject: [Biopython-dev] [Bug 2969] New: Addition of SeqmatchallCommandline to Emboss/Applications.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2969 Summary: Addition of SeqmatchallCommandline to Emboss/Applications.py Product: Biopython Version: 1.52 Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk I thought it would be useful to have a command line wrapper to the EMBOSS seqmatchall application, and have added this to Emboss/Applications.py, with doctests. 
The proposed code can be inspected at my GitHub repository: http://github.com/widdowquinn/biopython/commit/ced72a34b2565b97f3ad2c77a66e1083375cff02 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From kellrott at gmail.com Wed Dec 9 19:22:01 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Wed, 9 Dec 2009 11:22:01 -0800 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> <3f6baf360912070833j15d0c36bs99f16669f22345b@mail.gmail.com> <20091208140354.GG74538@sobchak.mgh.harvard.edu> <320fb6e00912081418k7bfcd3b2g47cbd17dad693549@mail.gmail.com> <320fb6e00912090521ifb78246w79b45e71ed0a78c1@mail.gmail.com> <320fb6e00912090618y43add6f9v5cee8fb044b27eab@mail.gmail.com> <320fb6e00912090757s6efbd2acpcb197e8e77cd298f@mail.gmail.com> <320fb6e00912090825t45d2ac1atfaba7159d75aa6fc@mail.gmail.com> Message-ID: > Kyle - could you confirm this on your machine please? > It looks like the master branch is working well. I guess the next step will be looking into the zxJDBC to expand the BioSQL ORM. Intro can be found at: http://www.informit.com/articles/article.aspx?p=26143 Kyle From bugzilla-daemon at portal.open-bio.org Wed Dec 9 21:53:42 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 16:53:42 -0500 Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to Emboss/Applications.py In-Reply-To: Message-ID: <200912092153.nB9LrgYN027652@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2969 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 16:53 EST ------- A nice easy one to wrap at first glance. 
I would like to also include the "aformat" output to set the output alignment format (useful to set to pair or simple for AlignIO to parse it as the "emboss" alignment format - see the needle and water wrappers). You could then also add a run time test to test_Emboss.py piping this to AlignIO... ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Dec 9 22:42:26 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 17:42:26 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912092242.nB9MgQS9028588@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #8 from chapmanb at 50mail.com 2009-12-09 17:42 EST ------- Great idea Peter -- happy to get this in. It's now on a branch here: http://github.com/chapmanb/biopython/tree/biosql-sqlite It would be excellent if you, Cymon or anyone else interested could review and merge it in. This also includes a small typo fix on Bio/SeqIO/InsdcIO.py which isn't really related but came up when I was running the BioSQL tests. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Wed Dec 9 23:51:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 9 Dec 2009 18:51:14 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912092351.nB9NpESn030303@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-09 18:51 EST ------- Hi Brad, My only immediate comment is it might make sense to split the BioSQL tests in two, one for SQLite which we can try and make 100% automatic (at least on Python 2.5+), and one for a user-specified back end (MySQL, PostgreSQL etc) which requires a username and password. It's midnight here in the UK, so feel free to tweak things this evening your time and I'll take a full look tomorrow. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 11:12:36 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 06:12:36 -0500 Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to Emboss/Applications.py In-Reply-To: Message-ID: <200912101112.nBABCaRr015734@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2969 ------- Comment #2 from lpritc at scri.sari.ac.uk 2009-12-10 06:12 EST ------- (In reply to comment #1) > A nice easy one to wrap at first glance. I would like to also include the > "aformat" output to set the output alignment format (useful to set to pair or > simple for AlignIO to parse it as the "emboss" alignment format - see the > needle and water wrappers). You could then also add a run time test to > test_Emboss.py piping this to AlignIO... 
;) That shouldn't take too long to do (though probably won't get done by me this week). Do we want to set any particular policy for the sequence-associated and outfile-associated arguments? Their inclusion in the command-line wrappers is pretty inconsistent, which is why I left them out in the first place. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 11:15:09 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 06:15:09 -0500 Subject: [Biopython-dev] [Bug 2964] placing x-axis of graph track at the bottom or top of the track in GenomeDiagram In-Reply-To: Message-ID: <200912101115.nBABF90t015907@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2964 ------- Comment #12 from Daniel.Nicorici at gmail.com 2009-12-10 06:15 EST ------- (In reply to comment #11) > (In reply to comment #10) > > > It looks a little bit confusing too me now because I see that there are two > > sides of the problem (or two bugs?), as following: > > 1) drawing a line orthogonal on y-axis at any position which represents the > > x-axis (this does not affect how the values are plotted and in what interval) > > 2) in the case of bar plotting (partially affects also linear plotting), the > > values should be drawn automatically from zero (zero on y-axis, i.e. x=0 and > > y=-inf...+inf) unless the user specify something else and not to be drawn by > > default from some arbitrary point, e.g. median, mean, etc., as it is done now. > > > > I have the feeling that the solution presented here affects only the point 1) > > and not 2). > > > > Please, could you elaborate more such that maybe I could implement your > > suggestion? 
> > I see why you've distinguished between the two cases, but I think they can be > handled by the earlier suggestion to implement the location of the x-axis in > the context of also allowing the user to set y-axis limits (see comment #5). > It's the combination of allowing y-axis limits and the location of x-axis > crossing that gives the greatest flexibility. For example, if y-limit > selection and x-axis crossing point were under user control... > > ...if you wanted to continue with the current behaviour, you'd not set any > y-limits, and not specify the location of the x-axis. > > ...if you wanted to draw short read coverage, you'd set the lower y-limit to 0, > and set the location of the x-axis to zero (if that was not the default). This > should draw bars with their bases on the bottom/inner of the track, and the > scale running along the bottom/inner of the track. > > ...if you wanted to represent some data as a bar graph, with a special meaning > for the mean (or median) value, you could optionally set y-limits, but have the > x-axis cross at mean(data) or median(data). This should draw bars with their > bases on the x-axis, and the axis located at the mean/median value for the > data. I submitted the changes which do somehow what is described above, i.e. still by default the x-axis is drawn in the middle of the track (it is still left for now like this in order not to change the default behavior of GenomeDiagram). If the x-axis is specified to be drawn at the bottom or top of the track then the x-axis is drawn there and the values for bars/lines in the graph are drawn using zero-based (if the some values are positive and other are negative) or min (if all values are positive) or max (all values are negative). Hence only when specifying the x-axis to be drawn at the bottom or top for the track, the behavior of the graph and plotting are affected. The limits are computed automatically. > > Does this help clarify what I meant, above? It helped. Thanks! 
BR, Daniel -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Dec 10 12:20:55 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Dec 2009 12:20:55 +0000 Subject: [Biopython-dev] Removing C implementation of deprecated listfns, mathfns, stringfns Message-ID: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com> Hi all, The modules listfns, mathfns and stringfns are now all deprecated. They all have both a C implementation and a pure Python implementation. We could wait for the complete deprecation process, and remove the C code when the Python code gets removed. However, I would like to remove their C implementations for the next release, as this will simplify our code base. The only downside is anyone still using these modules will get a deprecation warning and a possible slowdown (as the C code wouldn't exist any more). Also anyone using the C code directly will be in trouble (but no-one should be doing that...). Any comments? Objections? Thanks, Peter From bugzilla-daemon at portal.open-bio.org Thu Dec 10 12:39:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:39:15 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101239.nBACdFtu018207@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #10 from chapmanb at 50mail.com 2009-12-10 07:39 EST ------- Thanks Peter. All of the tests will run on SQLite provided sqlite3 is installed, so there is no need to split them. I enabled SQLite by default, so they will run automatically if a user has sqlite3 and fail gracefully with a dependency error if not. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 12:43:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:43:28 -0500 Subject: [Biopython-dev] [Bug 2969] Addition of SeqmatchallCommandline to Emboss/Applications.py In-Reply-To: Message-ID: <200912101243.nBAChSHg018300@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2969 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 07:43 EST ------- (In reply to comment #2) > Do we want to set any particular policy for the sequence-associated and > outfile-associated arguments? Their inclusion in the command-line wrappers > is pretty inconsistent, which is why I left them out in the first place. In the long term, I'd like us to look at generating the wrappers automatically from the EMBOSS ACD files which define their tool options. For now, since some EMBOSS tools have so many options, they have been added on a somewhat ad hoc basis, according to what the coder thought most important or to user feedback. Fix checked in with addition of aformat option. Thanks! Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Dec 10 12:52:16 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:52:16 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101252.nBACqGp6018512@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 07:52 EST ------- (In reply to comment #10) > Thanks Peter. All of the tests will run on SQLite provided sqlite3 is > installed, so there is no need to split them. I enabled SQLite by default, so > they will run automatically if a user has sqlite3 and fail gracefully with a > dependency error if not. That's great as is. I was thinking about something more: What I meant was, I want to be able to run all the tests on SQLite (by default) AND on another back end (e.g. MySQL) if the user has configured it. Otherwise we (as developers) have to manually switch the BioSQL settings and rerun the BioSQL unit tests. I will be able to test the effect of your changes on MySQL; hopefully Cymon can do this on PostgreSQL - not that I anticipate any regressions, but best to be sure ;) Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 12:56:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 07:56:44 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101256.nBACuheQ018635@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 07:56 EST ------- (In reply to comment #11) > > That's great as is. 
I was thinking about something more: What I meant was, I > want to be able to run all the tests on SQLite (by default) AND on another > back end (e.g. MySQL) if the user has configured it. Otherwise we (as > developers) have to manually switch the BioSQL settings and rerun the BioSQL > unit tests. > On reflection, that kind of improvement can wait until after Biopython 1.53 is out. It would be great to make it completely general so that if you have all the backends installed the test suite could check on SQLite, MySQL, PostgreSQL etc. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 10 13:15:45 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 08:15:45 -0500 Subject: [Biopython-dev] [Bug 2495] parse element symbols for ATOM/HETATM records (Bio.PDB.PDBParser) In-Reply-To: Message-ID: <200912101315.nBADFj7O019533@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2495 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 08:15 EST ------- (In reply to comment #2) > > Leaving bug open to deal with the output as well. > Marking bug as fixed. I've just committed a change based on a patch from Frederik Gwinner via GitHub - Bio.PDB.PDBIO should now save the element on output. Please reopen this bug if there is any problem. Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Thu Dec 10 14:25:53 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 10 Dec 2009 14:25:53 +0000 Subject: [Biopython-dev] Removing C implementation of deprecated listfns, mathfns, stringfns In-Reply-To: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com> References: <320fb6e00912100420o74dc84efhe3af0aa278386ec8@mail.gmail.com> Message-ID: <320fb6e00912100625s48ba290cj1234d757da0b94f@mail.gmail.com> On Thu, Dec 10, 2009 at 12:20 PM, Peter wrote: > Hi all, > > The modules listfns, mathfns, stringfns are now all deprecated. They > all have both a C implementation and a pure Python implementation. > > We could wait for the complete deprecation process, and remove > the C code when the Python code gets removed. However, I would > like remove their C implementations for the next release, as this will > simplify our code base. > > The only downside is anyone still using these modules will get > a deprecation warning and a possible slow down (as the C code > wouldn't exist any more). Also anyone using the C code directly > will be in trouble (but no-one should be doing that...). > > Any comments? Objections? 
I hope there are no objections as I've just done this on the trunk ;) Peter From bugzilla-daemon at portal.open-bio.org Thu Dec 10 14:54:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 10 Dec 2009 09:54:17 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912101454.nBAEsHdi023376@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-10 09:54 EST ------- (In reply to comment #11) > > I will be able to test the effect of your changes on MySQL, hopefully Cymon > can do this on PostgreSQL - not that I anticipate and regressions, but best > to be sure ;) > The branch still merges cleanly onto the trunk (I had already manually applied the Bio/SeqIO/InsdcIO.py date fix to the trunk). Testing "as is" on Mac OS X 10.5 with Apple's Python 2.5.2 uses SQLite, and works. Changing setup_BioSQL.py to use MySQL also works fine :) I have not yet tried this on Windows. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 12 18:12:23 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 13:12:23 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121812.nBCICNWt003206@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #14 from cymon.cox at gmail.com 2009-12-12 13:12 EST ------- (In reply to comment #11) > I will be able to test the effect of your changes on MySQL, hopefully Cymon > can do this on PostgreSQL - not that I anticipate and regressions, but best > to be sure ;) Is SQLite ":memory:" TESTDB working for you on Brads branch? 
It fails for me, all else is fine (incl the SQLite file db). [cymon at spiro Tests]$ python test_BioSQL_SeqIO.py Connecting to database Removing existing sub-database 'biosql-seqio-test' (if exists) Traceback (most recent call last): File "test_BioSQL_SeqIO.py", line 134, in if db_name in server.keys(): File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 123, in keys return self.adaptor.list_biodatabase_names() File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 306, in list_biodatabase_names "SELECT name FROM biodatabase") File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 355, in execute_and_fetch_col0 self.execute(sql, args or ()) File "/home/cymon/git/biopython-github-master/BioSQL/BioSeqDatabase.py", line 336, in execute self.dbutils.execute(self.cursor, sql, args) File "/home/cymon/git/biopython-github-master/BioSQL/DBUtils.py", line 53, in execute cursor.execute(sql, args or ()) sqlite3.OperationalError: no such table: biodatabase Perhaps it's my sqlite installation - I'm not familiar with it: [cymon at spiro BioSQL]$ dpkg -l|egrep sqlite ii libmono-sqlite2.0-cil 2.4.2.3+dfsg-2 Mono Sqlite library (for CLI 2.0) ii libsqlite0 2.8.17-6build1 SQLite shared library ii libsqlite3-0 3.6.16-1ubuntu1 SQLite 3 shared library ii sqlite3 3.6.16-1ubuntu1 A command line interface for SQLite 3 C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 12 18:33:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 13:33:15 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121833.nBCIXFCH003747@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-12 13:33 EST ------- (In reply to comment #14) > (In reply to comment #11) > > I will be able to test the effect of your changes on MySQL, hopefully Cymon > > can do this on PostgreSQL - not that I anticipate any regressions, but best > > to be sure ;) > > Is SQLite ":memory:" TESTDB working for you on Brad's branch? I didn't try that specifically - just SQLite on disk. Brad? > > It fails for me, all else is fine (incl the SQLite file db) > But the good news is Brad's changes to BioSQL/*.py haven't caused any regressions on PostgreSQL :) Thanks Cymon, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 12 18:39:07 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 13:39:07 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121839.nBCId7U6003831@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #16 from cymon.cox at gmail.com 2009-12-12 13:39 EST ------- (In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #11) > > > I will be able to test the effect of your changes on MySQL, hopefully Cymon > > > can do this on PostgreSQL - not that I anticipate and regressions, but best > > > to be sure ;) > > > > Is SQLite ":memory:" TESTDB working for you on Brads branch? > > I didn't try that specifically - just SQLite on disk. Brad? > > > > > It fails for me, all else is fin (incl the SQLite file db) > > > > But the good news is Brad's changes to BioSQL/*.py haven't caused any > regressions on PostreSQL :) Yep, no problems, although I only tried the psycopg2 driver (with and without rules deletion). Psycopg version 1 support has had a deprecation warning since version 1.53 http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it? C. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 12 19:05:02 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 14:05:02 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121905.nBCJ52Nn004276@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #17 from chapmanb at 50mail.com 2009-12-12 14:05 EST ------- Thanks Cymon -- glad nothing is broken on Postgres. 
The in memory database (:memory:) doesn't work for the tests, because they assume a database created by previous test cases. Since the memory one keeps going away, they will get plenty of errors about non-existing tables. It would work in theory with some test re-writing, but it's not too necessary. Sorry, should have added a note about this. Thanks again for double checking that everything works. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 12 19:41:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 12 Dec 2009 14:41:12 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912121941.nBCJfCXr004756@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 ------- Comment #18 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-12 14:41 EST ------- (In reply to comment #16) > > Yep, no problems, although I only tried the psycopg2 driver (with and > without rules deletion). > > Psycopg version 1 support has had a deprecation warning since version 1.53 > http://bugzilla.open-bio.org/show_bug.cgi?id=2851#c4 - when can we drop it? > > C. > Minor typo - Psycopg v1 support was deprecated in Biopython 1.51 (August 2009). In line with the current deprecation policy, we aim for two releases with the warning (which has happened already, 1.51 and 1.52) plus at least one year - which means we can drop Psycopg v1 in summer 2010. Given in this case its a fairly simple task for someone to just install Psycopg v2, we might look at dropping the Psycopg v1 support a little quicker (say Biopython 1.54?). See: http://www.biopython.org/wiki/Deprecation_policy (In reply to comment #17) > Thanks Cymon -- glad nothing is broken on Postgres. 
> > The in memory database (:memory:) doesn't work for the tests, because they > assume a database created by previous test cases. Since the memory one keeps > going away, they will get plenty of errors about non-existing tables. It would > work in theory with some test re-writing, but it's not too necessary. > > Sorry, should have added a note about this. Thanks again for double checking > that everything works. OK then - Brad, would you like to merge this to the trunk now (or in the next few days), add a note about not using :memory: in Tests/setup_BioSQL.py, and something to the NEWS file (with a proviso about the SQLite schema not yet being official)? Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Dec 14 12:48:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 14 Dec 2009 07:48:28 -0500 Subject: [Biopython-dev] [Bug 2866] SQLite support for BioSQL In-Reply-To: Message-ID: <200912141248.nBECmS6b007714@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2866 chapmanb at 50mail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #19 from chapmanb at 50mail.com 2009-12-14 07:48 EST ------- Peter and Cymon -- thanks again for the help. Merged into the main trunk and marking this as resolved. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
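[Editorial note on the ":memory:" failure discussed in bug 2866 above.] The behaviour Brad describes can be reproduced with nothing but the standard library sqlite3 module. The following is only a minimal sketch, not the BioSQL test code: the one-column biodatabase table is a hypothetical stand-in for the real BioSQL schema, and the two connections stand in for two separate test cases. Each new connection to ":memory:" gets its own empty database, so tables created by an earlier connection are gone.

```python
import sqlite3


def table_names(conn):
    """Return the names of all tables visible on this connection."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    return [name for (name,) in rows]


# First "test case": open an in-memory database and create a table.
# The one-column table is a simplified, hypothetical stand-in for BioSQL's.
first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE biodatabase (name TEXT)")
first.execute("INSERT INTO biodatabase VALUES ('biosql-seqio-test')")
first.commit()
tables_first = table_names(first)
first.close()  # the in-memory database is discarded here

# Second "test case": a new connection to ":memory:" starts from scratch.
second = sqlite3.connect(":memory:")
tables_second = table_names(second)
try:
    second.execute("SELECT name FROM biodatabase")
    error_message = None
except sqlite3.OperationalError as err:
    error_message = str(err)
second.close()

print(tables_first)
print(tables_second)
print(error_message)
```

With a file-based database path in place of ":memory:", the second connection would see the table created by the first, which is consistent with the on-disk SQLite tests passing in the reports above.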
From biopython at maubp.freeserve.co.uk Mon Dec 14 16:24:44 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 14 Dec 2009 16:24:44 +0000 Subject: [Biopython-dev] Plans for Biopython 1.53 In-Reply-To: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> References: <320fb6e00912070528s79609056o198cc86169403bdb@mail.gmail.com> Message-ID: <320fb6e00912140824x3bfa58cfy8520142c0fea3a45@mail.gmail.com> On Mon, Dec 7, 2009 at 1:28 PM, Peter wrote: > > One good reason for doing Biopython 1.53 soon is the > NCBI said they plan to start using the new Jan 2010 DTD > files for MedLine/PubMed as early as mid December: > http://lists.open-bio.org/pipermail/biopython-dev/2009-November/007020.html I've just checked the PubMed XML from efetch, and the NCBI are still using the old 2009 DTD file. I guess it is only midday in the USA, so plenty of time for them to make the switch on 14 Dec as announced... Once that happens (hopefully within hours), and I've checked the Entrez parser is still happy, we can do the Biopython release. Until then, only documentation and unit tests fixes on the trunk please. Thanks, Peter From biopython at maubp.freeserve.co.uk Tue Dec 15 10:45:31 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 10:45:31 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 Message-ID: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> Hello all, I plan to do the Biopython 1.53 release this afternoon (in a few hours time). If there are any last minute changes anyone wants to make on the trunk, please email first. 
Ideally just documentation or additional unit tests at this point ;) Thanks Peter From biopython at maubp.freeserve.co.uk Tue Dec 15 15:29:48 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 15:29:48 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> Message-ID: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> On Tue, Dec 15, 2009 at 10:45 AM, Peter wrote: > Hello all, > > I plan to do the Biopython 1.53 release this afternoon (in a few hours time). > OK - Everything looks good on the code side, git has been tagged, source archives and windows installers uploaded. If anyone could double check the installers work on your machines that would be great. Brad - could you run a sanity test before uploading to pypi? David - did you manage to draft a release announcement? If not, don't worry, I'll make one up ;) Peter From biopython at maubp.freeserve.co.uk Tue Dec 15 16:28:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 16:28:13 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> Message-ID: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> On Tue, Dec 15, 2009 at 3:29 PM, Peter wrote: > On Tue, Dec 15, 2009 at 10:45 AM, Peter wrote: >> Hello all, >> >> I plan to do the Biopython 1.53 release this afternoon (in a few hours time). >> > > OK - Everything looks good on the code side, git has been tagged, source > archives and windows installers uploaded. If anyone could double check > the installers work on your machines that would be great. > > Brad - could you run a sanity test before uploading to pypi? 
> > David - did you manage to draft a release announcement? If not, don't > worry, I'll make one up ;) Draft text below - any comments? Thanks, Peter ---- We are pleased to announce the availability of Biopython 1.53, a new stable release of the Biopython library, three months after the release of Biopython 1.52. This is our first release since migrating from CVS to git for source code control. There have been some additions to our core objects: the Seq (and related UnknownSeq) objects gained upper and lower methods (like the string methods of the same name but alphabet aware) plus a new ungap method. The SeqFeature object now has an extract method to get the region of sequence it describes (useful for getting CDS nucleotide sequences from GenBank files). Also SeqRecord objects now support addition, giving a new SeqRecord with the combined sequence, all the SeqFeatures, and any common annotation. SQLite support (built into Python 2.5+) was added to our BioSQL interface. This is still a little experimental as we are using a draft BioSQL SQLite schema, but this should be merged into the next BioSQL release. Biopython now includes wrappers for the new NCBI BLAST C++ tools, which will be replacing the old NCBI "legacy" BLAST tools written in C. The plain text BLAST parser has been updated to cope as well. Nevertheless, we (and the NCBI) still recommend using the XML output for parsing. Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for parsing MedLine/PubMed data. The NCBI codon tables have been updated from version 3.4 to 3.9, which adds a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23). The restriction enzyme list in Bio.Restriction has been updated to the Nov 2009 release of REBASE. The Bio.PDB parser and output code has been updated to understand the element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been updated for recent changes to the PDB FTP site. 
Finally, support for running Biopython under Jython (using the Java Virtual Machine) has been much improved. Note that Jython does not support C code, and currently Jython does not parse DTD files (needed for the Bio.Entrez XML parser). However, most of the Biopython modules seem fine from testing Jython 2.5.0 and 2.5.1. Sources and Windows Installers are available from our downloads page. Thanks to the Biopython development team and to everyone who has reported bugs or contributed patches since our last release. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:32:28 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:28 -0500 Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary Jython Error Fix+Patch In-Reply-To: Message-ID: <200912151632.nBFGWS6a022173@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2895 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:32 EST ------- Fixed in Biopython 1.53, using a similar technique but complicated because this file is generated by a separate script. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:32:46 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:46 -0500 Subject: [Biopython-dev] [Bug 2892] Jython MatrixInfo.py fix+patch In-Reply-To: Message-ID: <200912151632.nBFGWkSA022203@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2892 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:32 EST ------- Fixed in Biopython 1.53 using a similar technique. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:32:48 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:48 -0500 Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary Jython Error Fix+Patch In-Reply-To: Message-ID: <200912151632.nBFGWm0Q022215@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2895 Bug 2895 depends on bug 2892, which changed state. Bug 2892 Summary: Jython MatrixInfo.py fix+patch http://bugzilla.open-bio.org/show_bug.cgi?id=2892 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:32:51 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:32:51 -0500 Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch In-Reply-To: Message-ID: <200912151632.nBFGWpCp022227@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2893 Bug 2893 depends on bug 2892, which changed state. Bug 2892 Summary: Jython MatrixInfo.py fix+patch http://bugzilla.open-bio.org/show_bug.cgi?id=2892 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:33:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:33:13 -0500 Subject: [Biopython-dev] [Bug 2893] Jython test_prosite fix+patch In-Reply-To: Message-ID: <200912151633.nBFGXD3Y022254@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2893 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:33 EST ------- Fixed in Biopython 1.53 using a similar technique -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:33:15 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:33:15 -0500 Subject: [Biopython-dev] [Bug 2895] Bio.Restriction.Restriction_Dictionary Jython Error Fix+Patch In-Reply-To: Message-ID: <200912151633.nBFGXFa7022266@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2895 Bug 2895 depends on bug 2893, which changed state. Bug 2893 Summary: Jython test_prosite fix+patch http://bugzilla.open-bio.org/show_bug.cgi?id=2893 What |Old Value |New Value ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:41:30 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:41:30 -0500 Subject: [Biopython-dev] [Bug 2807] Clustalw return codes In-Reply-To: Message-ID: <200912151641.nBFGfUpS022532@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2807 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |WONTFIX ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:41 EST ------- Bio.Clustalw was declared obsolete in Release 1.52, so there is no reason to add better support for return codes. With the new alignment wrappers and subprocess this is a non-issue. Marking as "won't fix". -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
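Editorial illustration of the point above that return codes are straightforward to check once a tool is run via subprocess. This is a minimal sketch only: it does not use Biopython's alignment wrappers, the helper name run_tool is hypothetical, and the child command is a stand-in (the Python interpreter exiting with status 3) rather than a real ClustalW invocation.

```python
import subprocess
import sys


def run_tool(cmd):
    """Run a command line tool, returning (return code, stdout, stderr)."""
    child = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    out, err = child.communicate()
    return child.returncode, out, err


# Stand-in for an alignment tool that fails: writes to stderr, exits with 3.
fake_tool = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('bad input\\n'); sys.exit(3)",
]
returncode, out, err = run_tool(fake_tool)
if returncode != 0:
    print("Tool failed with return code %i: %s" % (returncode, err.strip()))
```

The caller decides what a non-zero status means, so no special support is needed in a wrapper module.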
From bugzilla-daemon at portal.open-bio.org Tue Dec 15 16:46:17 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 15 Dec 2009 11:46:17 -0500 Subject: [Biopython-dev] [Bug 2820] Convert test_PDB.py to unittest In-Reply-To: Message-ID: <200912151646.nBFGkHAG022705@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2820 ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-15 11:46 EST ------- (In reply to comment #1) > > I've checked in a slightly modified version as test_PDB_unit.py - I think > having both this and the original test_PDB.py is sensible in the short term. > I've just removed old print-and-compare test_PDB.py, then renamed test_PDB_unit.py to test_PDB.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Dec 15 17:01:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 15 Dec 2009 17:01:38 +0000 Subject: [Biopython-dev] Biopython 1.53 released Message-ID: <320fb6e00912150901k138ae04bmc5d5af9c867340ec@mail.gmail.com> Dear Biopythoneers, We are pleased to announce the availability of Biopython 1.53, a new stable release of the Biopython library, three months after the release of Biopython 1.52. This is our first release since migrating from CVS to git for source code control. There have been some additions to our core objects - the Seq (and related UnknownSeq) objects gained upper and lower methods (like the string methods of the same name but alphabet aware) plus a new ungap method. The SeqFeature object now has an extract method to get the region of sequence it describes (useful for getting CDS nucleotide sequences from GenBank files).
Also SeqRecord objects now support addition, giving a new SeqRecord with the combined sequence, all the SeqFeatures, and any common annotation. SQLite support (built into Python 2.5+) was added to our BioSQL interface. This is still a little experimental as we are using a draft BioSQL SQLite schema, but this should be merged into the next BioSQL release. Biopython now includes wrappers for the new NCBI BLAST C++ tools, which will be replacing the old NCBI "legacy" BLAST tools written in C. The plain text BLAST parser has been updated to cope as well. Nevertheless, we (and the NCBI) still recommend using the XML output for parsing. Bio.Entrez includes the new (Jan 2010) DTD files from the NCBI for parsing MedLine/PubMed data. The NCBI codon tables have been updated from version 3.4 to 3.9, which adds a few extra start codons, and a few new tables (Tables 16, 21, 22 and 23). The restriction enzyme list in Bio.Restriction has been updated to the Nov 2009 release of REBASE. The Bio.PDB parser and output code has been updated to understand the element column in ATOM and HETATM lines, and Bio.PDB.PDBList has been updated for recent changes to the PDB FTP site. Finally, support for running Biopython under Jython (using the Java Virtual Machine) has been much improved. Note that Jython does not support C code, and currently Jython does not parse DTD files (needed for the Bio.Entrez XML parser). However, most of the Biopython modules seem fine from testing Jython 2.5.0 and 2.5.1. Sources and Windows Installers are available from our downloads page. Thanks to the Biopython development team and to everyone who has reported bugs or contributed patches since our last release. --Peter, on behalf of the Biopython developers P.S. This news post is online at http://news.open-bio.org/news/2009/12/biopython-release-153/ You may wish to subscribe to our news feed.
For RSS links etc, see: http://biopython.org/wiki/News Biopython news is also on twitter: http://twitter.com/biopython From chapmanb at 50mail.com Wed Dec 16 12:42:35 2009 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 16 Dec 2009 07:42:35 -0500 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> Message-ID: <20091216124235.GK78379@sobchak.mgh.harvard.edu> Hi Peter; > >> I plan to do the Biopython 1.53 release this afternoon (in a few hours time). Sorry I am too slow with your mails. Thanks for the hard work getting this together. Great stuff. > > Brad - could you run a sanity test before uploading to pypi? Looks good to me, and uploaded to pypi. > Draft text below - any comments? As a thought for next time, what do you think about adding the names of people who have worked on the items mentioned in the release? This would give a bit more public recognition for the contributions, especially to people who only look at the release notes and not mailing list traffic. Thanks again, Brad From biopython at maubp.freeserve.co.uk Wed Dec 16 22:43:16 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 16 Dec 2009 22:43:16 +0000 Subject: [Biopython-dev] Code freeze for Biopython 1.53 In-Reply-To: <20091216124235.GK78379@sobchak.mgh.harvard.edu> References: <320fb6e00912150245p34b40aabqd4f7f296cb7979a7@mail.gmail.com> <320fb6e00912150729g36fd5e8dp924f07c1eec0d1cb@mail.gmail.com> <320fb6e00912150828q5d3901deq162f14db458f980d@mail.gmail.com> <20091216124235.GK78379@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912161443q30f82120of1c98b073136c3f6@mail.gmail.com> Brad wrote: >> Brad - could you run a sanity test before uploading to pypi? > > Looks good to me, and uploaded to pypi. 
Great, thank you. >> Draft text below - any comments? > > As a thought for next time, what do you think about adding the > names of people who have worked on the items mentioned in the > release? This would give a bit more public recognition for the > contributions, especially to people who only look at the release > notes and not mailing list traffic. It's too late for the emails and the source code bundles, but the nice thing about the NEWS file (in the repository) and the OBF news server is we can update them even now. Of course, quite where to draw the line is debatable - a simple patch probably doesn't warrant it (or does it?), but solving a more complex bug or adding some new functionality does. If any existing core developers want more "recognition" we can do that too. For example, Kyle, would you have liked to be named with regards to the Jython work? I almost put you in anyway, but in the end just mentioned it on twitter: http://twitter.com/Biopython/statuses/6502469425 Another idea to showcase new features would be for the author(s) to prepare a (credited) blog post with some examples (to put on our news server). I have already done a few like this, and think it would also be a good thing in general. Peter From kellrott at gmail.com Thu Dec 17 01:39:49 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Wed, 16 Dec 2009 17:39:49 -0800 Subject: [Biopython-dev] zxJDBC support for BioSQL Message-ID: I've pushed a patch to the BioSQL code that enables zxJDBC support. This means that Jython can now run BioSQL through mysql. (SQLite hasn't been ported to Java yet) zxJDBC is a Jython module included in the standard distribution that provides a PythonDB interface through the java sql interfaces. I've only run the unit tests using the mysql-connector, but it should theoretically work with Oracle as well. The biggest issues for changing code: - Java expects ?
instead of %s, so sql strings have to be altered (I override the execute method in the DBUtils to run a regular expression before execution) - A Sql string with a=? works, one with a='?' does not (Loader.py had some examples of this) - Java returns unicode, not strings (recent patch to the mainline fixes this) Code can be found at http://github.com/kellrott/biopython Kyle From biopython at maubp.freeserve.co.uk Thu Dec 17 10:46:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 10:46:37 +0000 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: References: Message-ID: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott wrote: > > I've pushed a patch to the BioSQL code that enables zxJDBC support. > This means that Jython can now run BioSQL through mysql. (SQLite hasn't > been ported to Java yet) > zxJDBC is a Jython module included in the standard distribution that > provides a PythonDB interface through the java sql interfaces. I've only > run the unit tests using the mysql-connector, but it should theoretically > work with Oracle as well. Sounds good, and ought to work on PostgreSQL too in theory. I should be able to test it on MySQL. > The biggest issues for changing code: > - Java expects ? instead of %s, so sql strings have to be altered (I > override the execute method in the DBUtils to run a regular expression > before execution) > - A Sql string with a=? works, one with a='?' does not (Loader.py had some > examples of this) > - Java returns unicode, not strings (recent patch to the mainline fixes > this) Some of those issues applied to SQLite (hence the changes on the trunk from Brad). > Code can be found at http://github.com/kellrott/biopython Lovely. That's on your jython branch (along with lots of your other work)?
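Editorial illustration of the placeholder rewriting Kyle describes (format-style %s to qmark-style ?, including the quoted '%s' case). This is a sketch under stated assumptions, not the code from his branch: the helper name to_qmark and the toy table are hypothetical, and the standard library sqlite3 module (which also expects ?) stands in for a zxJDBC connection. It does not try to handle a literal %s inside SQL string constants or escaped %% sequences.

```python
import re
import sqlite3

# Matches a quoted placeholder ('%s') or a bare one (%s); both become a bare ?.
_FORMAT_PLACEHOLDER = re.compile(r"'%s'|%s")


def to_qmark(sql):
    """Rewrite format-style placeholders (%s) as qmark-style ones (?)."""
    return _FORMAT_PLACEHOLDER.sub("?", sql)


# Toy table for the demonstration only, not the real BioSQL schema.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE demo (name TEXT, description TEXT)")
cur.execute(
    to_qmark("INSERT INTO demo (name, description) VALUES (%s, '%s')"),
    ("test", "example entry"),
)
cur.execute(to_qmark("SELECT description FROM demo WHERE name = %s"), ("test",))
row = cur.fetchone()
print(row[0])
conn.close()
```

Doing the rewrite in one place (for example an overridden execute method, as described above) keeps the SQL strings in the rest of the code unchanged.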
Peter From biopython at maubp.freeserve.co.uk Thu Dec 17 13:31:30 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 13:31:30 +0000 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> References: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> Message-ID: <320fb6e00912170531j3f9c9b38n123e0464fa536e45@mail.gmail.com> On Thu, Dec 17, 2009 at 10:46 AM, Peter wrote: > On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott wrote: >> >> I've pushed a patch to the BioSQL code that enables zxJDBC support. >> This means that Jython can now run BioSQL through mysql. (SQLite hasn't >> been ported to Java yet) >> zxJDBC is a Jython module included in the standard distribution that >> provides a PythonDB interface through the java sql interfaces. I've only >> run the unit tests using the mysql-connector, but it should theoretically >> work with Oracle as well. > > Sounds good, and ought to work on PostgreSQL too in theory. > > I should be able to test it on MySQL. I worked out I needed to install MySQL Connector/J so that org.gjt.mm.mysql.Driver works in Jython, get it from here: http://dev.mysql.com/downloads/connector/j/ Installation seems to be just unzipping this and updating your CLASSPATH environment variable to point at the jar file. Peter From biopython at maubp.freeserve.co.uk Thu Dec 17 14:54:13 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 14:54:13 +0000 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> References: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> Message-ID: <320fb6e00912170654g41bc8c4eyce0f56b4472076f9@mail.gmail.com> On Thu, Dec 17, 2009 at 10:46 AM, Peter wrote: > On Thu, Dec 17, 2009 at 1:39 AM, Kyle Ellrott wrote: >> >> I've pushed a patch to the BioSQL code that enables zxJDBC support.
>> This means that Jython can now run BioSQL through mysql. (SQLite hasn't >> been ported to Java yet) Maybe one day Jython will have a Python sqlite3 like library built in: http://bugs.jython.org/issue1682864 For now it looks like we could probably use SQLite via zxJDBC instead (see links on that Jython issue). Peter From kellrott at gmail.com Thu Dec 17 18:03:38 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Thu, 17 Dec 2009 10:03:38 -0800 Subject: [Biopython-dev] zxJDBC support for BioSQL In-Reply-To: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> References: <320fb6e00912170246p64956c9ft85c0d288c078e097@mail.gmail.com> Message-ID: > > Code can be found at http://github.com/kellrott/biopython > > Lovely. That's on your jython branch (along with lots of your other work)? > Yes, but all of the zxJDBC work has been done in the past 2 weeks (just the last three commits), so it should be easy to cherry-pick out the relevant patches. Kyle From mhampton at d.umn.edu Thu Dec 17 18:42:33 2009 From: mhampton at d.umn.edu (Marshall Hampton) Date: Thu, 17 Dec 2009 12:42:33 -0600 (CST) Subject: [Biopython-dev] code credits In-Reply-To: References: Message-ID: I strongly encourage you to list anyone who has contributed a patch, no matter how small. This has worked very well for the Sage project (www.sagemath.org) where credit is given to all contributors and reviewers (every patch must be reviewed by at least one other person).
For example see: http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f Marshall Hampton Department of Mathematics and Statistics University of Minnesota, Duluth > Message: 1 > Date: Wed, 16 Dec 2009 22:43:16 +0000 > From: Peter > Subject: Re: [Biopython-dev] Code freeze for Biopython 1.53 > To: Brad Chapman , biopython-dev at biopython.org > Message-ID: > <320fb6e00912161443q30f82120of1c98b073136c3f6 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > Brad wrote: >>> Brad - could you run a sanity test before uploading to pypi? >> >> Looks good to me, and uploaded to pypi. > > Great, thank you. > >>> Draft text below - any comments? >> >> As a thought for next time, what do you think about adding the >> names of people who have worked on the items mentioned in the >> release? This would give a bit more public recognition for the >> contributions, especially to people who only look at the release >> notes and not mailing list traffic. > > Its too late for the emails and the source code bundles, but > the nice thing about the NEWS file (in the repository) and > the OBF news server is we can update them even now. > > Of course, quite where to draw the line is debatable - a simple > patch probably doesn't warrant it (or does it?), but solving a > more complex bug or adding some new functionality does. > If any existing core developers want more "recognition" we > can do that too. > > For example, Kyle, would you have like to be named with > regards to the Jython work? I almost put you in anyway, > but in the end just mentioned it on twitter: > http://twitter.com/Biopython/statuses/6502469425 > > Another idea to showcase new features would be for the > author(s) to prepare a (credited) blog post with some > examples (to put on our news server). I have already done > a few like this, and think it would also be a good thing in > general. 
> > Peter From kellrott at gmail.com Thu Dec 17 21:20:10 2009 From: kellrott at gmail.com (Kyle Ellrott) Date: Thu, 17 Dec 2009 13:20:10 -0800 Subject: [Biopython-dev] code credits In-Reply-To: References: Message-ID: I would agree with that. Drawing from broad stereotypes, I would think that a majority of contributors are academic and would be most interested in adding things to their CV. So acknowledgment would be of great value to them at no real cost to the Biopython project. Plus there's the old idea that the more authors a paper has the more important it must be. Kyle I strongly encourage you to list anyone who has contributed a patch, no > matter how small. This has worked very well for the Sage project ( > www.sagemath.org) where credit is given to all contributors and reviewers > (every patch must be reviewed by at least one other person). For example > see: > > http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f > > Marshall Hampton > Department of Mathematics and Statistics > University of Minnesota, Duluth > > From tallpaulinjax at yahoo.com Thu Dec 17 21:48:25 2009 From: tallpaulinjax at yahoo.com (Paul B) Date: Thu, 17 Dec 2009 13:48:25 -0800 (PST) Subject: [Biopython-dev] code credits In-Reply-To: Message-ID: <928490.72367.qm@web30708.mail.mud.yahoo.com> I also agree completely. Adding value to the code deserves some form of credit, if desired by the contributor. I fixed a bit of code in a couple of the modules and received no credit... that made me a good bit less gung ho about contributing more. --- On Thu, 12/17/09, Kyle Ellrott wrote: From: Kyle Ellrott Subject: Re: [Biopython-dev] code credits To: "Marshall Hampton" Cc: biopython-dev at lists.open-bio.org Date: Thursday, December 17, 2009, 4:20 PM I would agree with that. Drawing from broad stereotypes, I would think that a majority of contributors are academic and would be most interested in adding things to their CV.
So acknowledgment would be of great value to them at no real cost to the Biopython project. Plus there's the old idea that the more authors a paper has the more important it must be. Kyle I strongly encourage you to list anyone who has contributed a patch, no > matter how small. This has worked very well for the Sage project ( > www.sagemath.org) where credit is given to all contributors and reviewers > (every patch must be reviewed by at least one other person). For example > see: > > http://groups.google.com/group/sage-announce/msg/bcf5591837068b5f > > Marshall Hampton > Department of Mathematics and Statistics > University of Minnesota, Duluth > > From biopython at maubp.freeserve.co.uk Thu Dec 17 22:54:40 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 17 Dec 2009 22:54:40 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <928490.72367.qm@web30708.mail.mud.yahoo.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> Message-ID: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> Hi all, Marshall Hampton's description of how they do it on Sage sounds worth trying - if we keep track as things are checked in, it won't be too much work either. Do you (sage) have a list of guidelines for what qualifies for a credit? On Thu, Dec 17, 2009 at 9:48 PM, Paul B wrote: > > I also agree completely. Adding value to the code deserves > some form of credit, if desired by the contributor. I fixed a bit > of code in a couple of the modules and received no credit... > that made me a good bit less gung ho about contributing more.
> Sorry :( You did get some credit though, as you were named in the commit: http://github.com/biopython/biopython/commit/225fb0eb92c99018c3710c3ec5ac0b22e9706208 Also people who offer changes via github that can be merged cleanly onto the trunk, or cherry-picked, would automatically get a credit in the repository history. Would someone like to go through the git log for Biopython 1.53 for a full list? e.g. Hongbo Zhu and Frederik Gwinner contributed to a PDB enhancement (Bug 2495), and as he pointed out, so did Paul B (again, PDB stuff). These were the "border line" cases I had in mind here: http://lists.open-bio.org/pipermail/biopython-dev/2009-December/007161.html From personal experience contributing to other open source projects, getting a credit in release notes even for a small bug fix/enhancement as in Sage is rare. So while I thought I was following open source norms in writing the last release notes, we can certainly do this differently in future. Regards, Peter From mhampton at d.umn.edu Fri Dec 18 01:54:00 2009 From: mhampton at d.umn.edu (Marshall Hampton) Date: Thu, 17 Dec 2009 19:54:00 -0600 (CST) Subject: [Biopython-dev] code credits In-Reply-To: <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> Message-ID: On Thu, 17 Dec 2009, Peter wrote: > Marshall Hampton's description of how they do it on Sage > sounds worth trying - if we keep track as things are checked > in, it won't be too much work either. Do you (sage) have a > list of guidelines for what qualifies for a credit? I don't think we have formal guidelines, but the process is pretty simple. Whoever works on a patch in our bug/feature tracker has to flag it for review. Both the person who implements the patch and the reviewer get credit. It doesn't matter if it's a 1-character change to the documentation, they're listed in the release notes.
Basically, the idea is to err (if that's the right word) on the side of acknowledging any contribution. I think that Sage (really William Stein initially) adopting that philosophy is one of the reasons its gone from 1 to 150 or so developers. I'm cc'ing sage-devel in case anyone there wants to comment on this. Cheers, Marshall Hampton Department of Mathematics and Statistics University of Minnesota, Duluth From bugzilla-daemon at portal.open-bio.org Fri Dec 18 09:44:02 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 04:44:02 -0500 Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error) In-Reply-To: Message-ID: <200912180944.nBI9i22n007947@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2938 ------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 04:44 EST ------- The offending XML file (the one that does not start with Message-ID: <200912180946.nBI9kjFA008009@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #11 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 04:46 EST ------- Peter, are you still looking at this bug report? Otherwise I could have a look at it. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 10:00:50 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:00:50 -0500 Subject: [Biopython-dev] [Bug 2698] Attempt at a unit test for MaxEntrophy In-Reply-To: Message-ID: <200912181000.nBIA0opL008316@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2698 ------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 05:00 EST ------- Thanks for your test! I would like to simplify the code a bit though. 
How about replacing

    ix, iy = expand_count([0, 0, 1], 'C', 40)
    xm.extend(ix)
    ym.extend(iy)

by

    xm.extend([0,0,1] * 40)
    ym.extend(['C'] * 40)

Or, you could replace this whole section by

    xm = [0,0,1]*40 + [0,0,1]*60 + [0,1,0]*75 + [0,1,0]*25 + [1,0,0]*90 + [1,0,0]*10

and similarly for ym. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 10:08:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:08:24 -0500 Subject: [Biopython-dev] [Bug 2693] LogisticRegression convergence criterion is too lenient In-Reply-To: Message-ID: <200912181008.nBIA8Ogf008537@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2693 ------- Comment #4 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 05:08 EST ------- Sorry for not getting back to this bug report earlier. (In reply to comment #3) > > Also, it is not necessary to pass old_llik to update_fn; if needed, update_fn > > can store the value of llik on each call. > > I guess this is all how you define the purpose of the update_fn function. > Do you have an example of the update_fn function where old_llik is needed? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee.
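Editorial illustration of the remark in Bug 2693 that update_fn can store the value of llik itself rather than being passed old_llik. This is a minimal sketch: the class name LikelihoodTracker is hypothetical, the callback signature (iteration, llik) is an assumption made for the example, and the loop with made-up values stands in for a real training routine.

```python
class LikelihoodTracker(object):
    """Callback that remembers the log likelihood from the previous call."""

    def __init__(self):
        self.old_llik = None
        self.changes = []

    def __call__(self, iteration, llik):
        # Compare against the stored value before overwriting it.
        if self.old_llik is not None:
            self.changes.append(llik - self.old_llik)
        self.old_llik = llik


# Stand-in for a training loop reporting (iteration, log likelihood);
# the values are invented for the demonstration.
tracker = LikelihoodTracker()
for iteration, llik in enumerate([-10.0, -4.0, -3.5]):
    tracker(iteration, llik)
print(tracker.changes)
```

Because the callback keeps its own state, the training function only needs to report the current value.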
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 10:17:12 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:17:12 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200912181017.nBIAHCJN008837@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 ------- Comment #12 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 05:17 EST ------- One option is to store these variables inside the function. As an example, if this is a module mymodule.py:

    def f(x=None):
        # Fall back to the default stored as an attribute on the function
        if x is None:
            x = f.x
        print(x)

    f.x = 3

then we can do the following:

    >>> import mymodule
    >>> mymodule.f()
    3
    >>> mymodule.f(5)
    5
    >>> mymodule.f.x = 9
    >>> mymodule.f(5)
    5
    >>> mymodule.f()
    9

But personally, I think that having module-level defaults is not really necessary. We typically don't have that for other functions, and the only reason for having them here is that once upon a time this module had such module-level defaults. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 10:24:35 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:24:35 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912181024.nBIAOZj6009054@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:24 EST ------- (In reply to comment #11) > Peter, are you still looking at this bug report? > Otherwise I could have a look at it. Thanks Michiel - Please feel free.
I didn't feel we had time to get this into Biopython 1.53, as I think it is
going to be a lot of work to assess, but needs to be done.

I think there are two issues here, poor support for multiple models, and
re-writing the flex parser in pure python.

Given time (!) I would want to take Paul's python parser and use it to replace
the flex code (which is currently not compiled or installed by default,
Bug 2619) and verify it is backwards compatible, and then add in the model
support. If we have enough test coverage already, then doing it in one go
might be OK. Up to you.

Other relevant issues include Bug 2626 (files the current parser can't read -
it may turn out that these are also multi-model CIF files).

Also regarding the model support, for PDB files we currently index them
0,1,2,... as found in the file. There are also names given in the PDB file
itself, which need not be continuous etc. See Bug 2950 and Bug 2951 for this.

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Fri Dec 18 10:44:13 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:44:13 -0500
Subject: [Biopython-dev] [Bug 2938] Bio.Entrez.read() returns empty string for HTML (not an error)
In-Reply-To:
Message-ID: <200912181044.nBIAiDD6009554@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2938

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:44 EST -------
(In reply to comment #6)
> The offending XML file (the one that does not start with efetch from the journals database. Upon the EUtils documentation more
> carefully, it seems that XML output from the journals database is not
> officially supported; only text and html output are supported.
One option is
> to simply remove the offending XML file from the tests, and raise an error
> whenever Entrez.read is presented with data that do not start with Additionally, we could add a parser for the text output generated by efetch
> from the journals database.

Hmm - sounds like a plan, but maybe drop the Entrez team a query about this.
Does the current funny XML file have anything useful in it?

Peter

From bugzilla-daemon at portal.open-bio.org Fri Dec 18 10:50:03 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 05:50:03 -0500
Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded
In-Reply-To:
Message-ID: <200912181050.nBIAo39q009740@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2697

------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:50 EST -------
(In reply to comment #12)
>
> But personally, I think that having module-level defaults is not really
> necessary. We typically don't have that for other functions, and the only
> reason for having them here is that once upon a time this module had such
> module-level defaults.

I agree the module-level defaults are not necessary - but it would be "nice"
to have a transition where both can be used. In reality, I may be being overly
cautious - I doubt it would affect many (any?) users to just make a clean
switch (which would keep the code simple).

I'm happy to leave this to your judgement Michiel.

Peter
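As a rough sketch of the "transition where both can be used" idea from comment #13: an explicit keyword argument takes priority, and a legacy module-level value is consulted only when the argument is left out. The names MAX_ITERATIONS and resolve_iterations are hypothetical and chosen for this example; they are not taken from the module under discussion.

```python
# Hypothetical legacy module-level default, kept only for the transition.
MAX_ITERATIONS = 1000


def resolve_iterations(max_iterations=None):
    """Return the iteration limit, preferring the explicit argument."""
    if max_iterations is None:
        # Looked up at call time, so old code that rebinds the
        # module-level name still takes effect.
        max_iterations = MAX_ITERATIONS
    return max_iterations


print(resolve_iterations())     # falls back to the module-level value
print(resolve_iterations(50))   # explicit argument wins
```

A clean switch, as also suggested in the thread, would simply drop the module-level name and put the default value in the function signature.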
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 10:54:24 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 05:54:24 -0500 Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path In-Reply-To: Message-ID: <200912181054.nBIAsOIw009914@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2947 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-18 05:54 EST ------- (In reply to comment #0) > > Thus it appears to me that the viterbi algorithm is not robust enough > and biased towards the last letter of the state alphabet. Quite possibly. Might there be a bug in our code, or do you think this is just an inherent algorithm limitation? Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Dec 18 11:53:14 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 18 Dec 2009 06:53:14 -0500 Subject: [Biopython-dev] [Bug 2697] MaxEntropy calculate function assumes integer values for class and convergence criteria is hard coded In-Reply-To: Message-ID: <200912181153.nBIBrEQi011286@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2697 mdehoon at ims.u-tokyo.ac.jp changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp 2009-12-18 06:53 EST ------- Fixed in github. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Fri Dec 18 14:12:26 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 09:12:26 -0500
Subject: [Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path
In-Reply-To:
Message-ID: <200912181412.nBIECQ59014801@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2947

------- Comment #2 from georg.lipps at fhnw.ch 2009-12-18 09:12 EST -------
Hi Peter,

I am not an expert on the Viterbi algorithm. But as such the algorithm does
not do what it is expected to do. So I guess it is indeed an error in the
implementation. I would be very happy if it can be fixed.

Greetings,

Georg

From bugzilla-daemon at portal.open-bio.org Fri Dec 18 16:15:24 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 18 Dec 2009 11:15:24 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To:
Message-ID: <200912181615.nBIGFOD2017597@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943

------- Comment #13 from TallPaulInJax at yahoo.com 2009-12-18 11:15 EST -------
Michiel, if you have any questions please feel free to contact me!
From rjalves at igc.gulbenkian.pt Fri Dec 18 23:39:28 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Fri, 18 Dec 2009 23:39:28 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> Message-ID: <4B2C12B0.9060806@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Sorry to take this to the discussion list, took a bit longer than I expected to get the approval. Bringing now the subject to the right place. Leaving full quote history to help the reading. Quoting Peter on 12/18/2009 09:39 PM: > Hi Renato, > > I'm cooking dinner while writing this, so it won't be as in depth as > usual... > > On Fri, Dec 18, 2009 at 5:17 PM, Renato Alves wrote: >> [I tried submitting this message to the dev mailing list, but got >> rejected since I'm not yet authorized to post there, so here it goes] > > Have you definitely subscribed to the dev list? That should be all that > is required to post there, and this discussion would be better suited > there. > >> Hi everyone, >> >> I'm working on changes to the Bio.SeqIO.index() function to make it more >> consistent with the .read and .parse i.e. accept a filehandle instead of >> a filename and also to include a way to cache the index into a file to >> speed up the process. >> >> The reason why we are implementing these two is because we were going to >> implement our own index solution until we realized this was added to 1.52. >> >> However the implementation in 1.52 has a few limitations. > > Yes, this was designed to cover basic use cases in a general way, > but with the option in future to do other things - and in particular > saving the index to disk was kept in mind. 
> >> One limitation is that we are using a gzipped database for the sake of >> space and using gzip.open() to create the file-handle that would then be >> passed to .parse(). The same was not doable with .index(). >> This is already implemented in >> http://github.com/Unode/biopython/commit/6fc390151452e3ddf26a117269132125a3ffb3fe > > That was a deliberate choice in that the index code wants to "own" > the handle. If other code has access to the handle, there is a risky > of different bits of code moving the handle pointer etc. But, if you > are careful it could be done. The way I approached it was to reset the handle pointer to the first position, since we would like to index the full file. But I understand that if the user uses the same handle on different files weird results may happen. Something that could be a simple workaround would be to copy the filehandle object in such a way that it's properties are maintained (like being a gzip.open() filehandle) but it's use doesn't affect the use of the original handle. However I don't know if this is possible. > > There are also issues here in combination with saving the index. > With a filename, the code can easily reopen the file in the same > mode. With a handle, things are more tricky. You have non-file > handles to consider - such as the gzip example. There is also the > problem of recording the file mode (normal text, universal text, > or binary - which we will need for SFF files - code already written). > I see, only after your comment I realized handle.name and handle.mode are only available in normal filehandles. The gzip.open() example stores the filename in .filename while the .mode seems to have a different meaning. > If we do change the code to allow handles, it would have to be > to allow handles OR filenames to be compatible with Biopython > 1.52 and 1.53 (which take just filenames). This could be handled > as in Bio.SeqIO.convert(), which also allows both (which was the > subject of some discussion!). 
> I'll have to look more on the example and consider the fact that my current implementation breaks compatibility with previous code and that not everything needed (filename, mode,...) is accessible in filehandles. >> The second is that we are going to use this feature to quick search the >> database in a web application. Here we have the limitation that we don't >> have persistence across web requests, which means that we would need to >> recalculate the index on every web request. >> >> The details of how we plan to implement this are the following: >> >> cPickle the internal dictionary of offsets and save it on the database >> folder with the same name as the database + .index. The consistency >> check on whether the file has changed will be performed based on name >> and timestamp. By default .index() will search for this file, check the >> timestamp and use the cache if they match, otherwise they will be >> recalculated. The save function will be available like: >> >>>>> d = SeqIO.index(...) >>>>> d.save(filename) >> where filename is optional and defaults to "%s.index" % _handle.name >> >> We already have a solution like this implemented with subclasses of >> SeqIO._index, it's just a matter of reworking that and merge it into >> BioPython if you consider a good addition to the code. >> >> I would like to hear your comments and suggestions on this. > > Yes, saving indexes is an obvious addition. I have explored > using pickle via shelve, and also SQLite - there are > implementations of this on my github respository, plus > begun to look into the existing OBF Open Biological > Database Access (OBDA) specification for cross project > compatibility. Other potential benefits here are reduced > memory usage if we don't keep the dictionary > of offsets in RAM. I did try to use pickle directly on the dict like object that is returned from SeqIO.index() but pickle was not happy with it. 
The SQLite approach also crossed my mind and also BioSQL or just some custom SQL database, but the RAM approach seemed good enough, at least for our current uses. I can see though that some file formats will require a lot more RAM depending on what is indexed and their size. In the end it came out as cPickled dictionaries for faster access. > > http://github.com/peterjc/biopython/tree/index-shelve > http://github.com/peterjc/biopython/tree/index-sqlite > > There is a potential complication with index sub-classes > which do more specialised indexing (e.g. GenBank files, > and for a more extreme case, SFF files). See: > http://github.com/peterjc/biopython/tree/sff-seqio For these I would have to do it on a unittest base, I'm not familiar with the formats. Also the implementation I did was based on the current master branch of biopython. I now realize a lot more has been done outside of it that I should look into. > > Anyway - great to see you are finding the code useful, > and have some quite similar ideas for how to extend > it further. > > Peter Thanks for all that info, I have a lot to dig into and see if I can actually contribute with something. 
You seem to have pretty much everything sorted ;) Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.10 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkssEqkACgkQYh11EUYTX9QWHwCeOIuuaEGA3qLvB1EHamDohpZ3 bj0AnRAkP9jOGpvTnSc0W7YgFyX/Ard/ =S45W -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Sat Dec 19 09:57:25 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 19 Dec 2009 09:57:25 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <4B2C12B0.9060806@igc.gulbenkian.pt> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> Message-ID: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> On Fri, Dec 18, 2009 at 11:39 PM, Renato Alves wrote: > Sorry to take this to the discussion list, took a bit longer than I > expected to get the approval. > > Bringing now the subject to the right place. Leaving full quote history > to help the reading. Thanks. >> That was a deliberate choice in that the index code wants to "own" >> the handle. If other code has access to the handle, there is a risk >> of different bits of code moving the handle pointer etc. But, if you >> are careful it could be done. > > The way I approached it was to reset the handle pointer to the first > position, since we would like to index the full file. But I understand > that if the user uses the same handle on different files weird results > may happen. OK > Something that could be a simple workaround would be to copy the > filehandle object in such a way that it's properties are maintained > (like being a gzip.open() filehandle) but it's use doesn't affect the > use of the original handle. However I don't know if this is possible. That may work for some handles but not others. Worth trying. >> There are also issues here in combination with saving the index. 
>> With a filename, the code can easily reopen the file in the same >> mode. With a handle, things are more tricky. You have non-file >> handles to consider - such as the gzip example. There is also the >> problem of recording the file mode (normal text, universal text, >> or binary - which we will need for SFF files - code already written). > > I see, only after your comment I realized handle.name and handle.mode > are only available in normal filehandles. The gzip.open() example stores > the filename in .filename while the .mode seems to have a different > meaning. That would make finding out the filename from a handle tricky. >> If we do change the code to allow handles, it would have to be >> to allow handles OR filenames to be compatible with Biopython >> 1.52 and 1.53 (which take just filenames). This could be handled >> as in Bio.SeqIO.convert(), which also allows both (which was the >> subject of some discussion!). > > I'll have to look more on the example and consider the fact that my > current implementation breaks compatibility with previous code and that > not everything needed (filename, mode,...) is accessible in filehandles. OK. >> Yes, saving indexes is an obvious addition. I have explored >> using pickle via shelve, and also SQLite - there are >> implementations of this on my github respository, plus >> begun to look into the existing OBF Open Biological >> Database Access (OBDA) specification for cross project >> compatibility. Other potential benefits here are reduced >> memory usage if we don't keep the dictionary >> of offsets in RAM. > > I did try to use pickle directly on the dict like object that is > returned from SeqIO.index() but pickle was not happy with it. The SQLite > approach also crossed my mind and also BioSQL or just some custom SQL > database, but the RAM approach seemed good enough, at least for our > current uses. I can see though that some file formats will require a lot > more RAM depending on what is indexed and their size. 
In the end it came > out as cPickled dictionaries for faster access. I agree that an in RAM dictionary works pretty well, even for very large sequence files. In terms of speed, I would expect a two step build index in memory, then save to disk, to be faster than building the index database on disk (which was a bit slow). >> There is a potential complication with index sub-classes >> which do more specialised indexing (e.g. GenBank files, >> and for a more extreme case, SFF files). See: >> http://github.com/peterjc/biopython/tree/sff-seqio > > For these I would have to do it on a unittest base, I'm not familiar > with the formats. Also the implementation I did was based on > the current master branch of biopython. I now realize a lot more > has been done outside of it that I should look into. I'm sorry if the discussion on the (dev) mailing list wasn't clearer - but having a fresh set of eyes looking at the topic is very useful. >> Anyway - great to see you are finding the code useful, >> and have some quite similar ideas for how to extend >> it further. > > Thanks for all that info, I have a lot to dig into and see if I can > actually contribute with something. You seem to have pretty much > everything sorted ;) Well, i hadn't been thinking about gzipped files (or any archives). How does gzip behave with memory use? I assume it doesn't load everything into RAM, but does allow you random access (seek and tell). This is a vague idea (which I haven't tried yet), but maybe the Bio.SeqIO.index() function could take an optional argument (gzip=True, or something more general like archive=...) which would cause the file to be opened via the gzip module instead? 
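The two-step approach mentioned above (build the index in memory, then save it to disk), combined with the timestamp check proposed earlier in the thread, might look roughly like this. It is only a sketch: load_or_build_offsets and its build argument are hypothetical names, the cache is a plain pickled dictionary, and nothing here is part of Bio.SeqIO.

```python
import os
import pickle
import tempfile


def load_or_build_offsets(filename, build, cache_name=None):
    """Return a dict of offsets, reusing a pickled cache when it is fresh.

    `build` is any function mapping a filename to a dict of offsets.
    """
    if cache_name is None:
        cache_name = filename + ".index"
    mtime = os.path.getmtime(filename)
    if os.path.exists(cache_name):
        with open(cache_name, "rb") as handle:
            cached = pickle.load(handle)
        # Only trust the cache if the data file has not changed since.
        if cached.get("mtime") == mtime:
            return cached["offsets"]
    offsets = build(filename)
    with open(cache_name, "wb") as handle:
        pickle.dump({"mtime": mtime, "offsets": offsets}, handle,
                    pickle.HIGHEST_PROTOCOL)
    return offsets


# Demonstration with a stand-in builder that records how often it runs.
calls = []


def fake_build(filename):
    calls.append(filename)
    return {"alpha": 0, "beta": 18}


tmpdir = tempfile.mkdtemp()
data_file = os.path.join(tmpdir, "example.fasta")
with open(data_file, "wb") as out:
    out.write(b">alpha first\nACGT\n>beta\nGGCC\nTTAA\n")

first = load_or_build_offsets(data_file, fake_build)
second = load_or_build_offsets(data_file, fake_build)  # served from cache
print(len(calls))
```

The modification time is a cheap consistency check; it would not notice a file replaced by different content with the same timestamp.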
Regards, Peter From bugzilla-daemon at portal.open-bio.org Sat Dec 19 11:02:44 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Dec 2009 06:02:44 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912191102.nBJB2iOb014900@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 ------- Comment #6 from robfsouza at gmail.com 2009-12-19 06:02 EST ------- Created an attachment (id=1412) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1412&action=view) Testcase for NCBI's BLAST alignment with errors This is a sequence from Naegleria gruberi and blastpgp output which reproduces a reported bug in NCBI's blastpgp output at the first iteration (see hit against sequence gi|156552846|ref|XP_001600053.1). Search parameters were blastpgp -d nr -i Ngru1000013938.fa -o Ngru1000013938.fa.br -a 8 -j 1 -b 10000 -v 10000 -h 0.01 -I T -m 0 -M BLOSUM62 -F F -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Sat Dec 19 11:21:13 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Dec 2009 06:21:13 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912191121.nBJBLDax015457@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 robfsouza at gmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #1412 is|0 |1 obsolete| | ------- Comment #7 from robfsouza at gmail.com 2009-12-19 06:21 EST ------- Created an attachment (id=1413) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1413&action=view) Testcase for NCBI's BLAST alignment with errors Sending the right query sequence now (my mistake! :)) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 19 12:09:57 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 19 Dec 2009 07:09:57 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912191209.nBJC9vxr016459@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 ------- Comment #8 from ibdeno at gmail.com 2009-12-19 07:09 EST ------- (In reply to comment #7) Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22 using Robson's test case. Thanks to Robson for this and apologies for not having been able to send a test case. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From rjalves at igc.gulbenkian.pt Sat Dec 19 21:48:10 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Sat, 19 Dec 2009 21:48:10 +0000 Subject: [Biopython-dev] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> Message-ID: <4B2D4A1A.6@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 > Well, i hadn't been thinking about gzipped files (or any archives). > How does gzip behave with memory use? I assume it doesn't > load everything into RAM, but does allow you random access > (seek and tell). - From what I can tell, in terms of RAM it behaves the same way as a normal open() it only decompresses the segments as they are accessed but doesn't cache them. A reasonable trade-off between space and access time. > This is a vague idea (which I haven't tried yet), but maybe the > Bio.SeqIO.index() function could take an optional argument > (gzip=True, or something more general like archive=...) which > would cause the file to be opened via the gzip module instead? I thought about something similar but using a combination of extension of the file and magic (or actually python-magic[1]). The first one is potentially messy although it's how things are mostly done in Windows. The second one I couldn't confirm if is available for Windows but is widely present in Linux (and I suppose MacOS too). In the end I dislike the idea of 'having' to use one approach or the other depending on the OS the code is running on, however this would fit in without breaking any compatibility with current code. 
1 - http://pypi.python.org/pypi/python-magic/0.1 Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.11 (GNU/Linux) iEYEARECAAYFAkstShgACgkQYh11EUYTX9Tu3wCglh6d3rt/ANU5J45bsceqcQ78 TQ0AnjgIlNhYRMqdzl4jBGYOPdMKOY7D =rqsi -----END PGP SIGNATURE----- From eric.talevich at gmail.com Sat Dec 19 22:42:23 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Sat, 19 Dec 2009 14:42:23 -0800 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> Message-ID: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote: > This is a vague idea (which I haven't tried yet), but maybe the > Bio.SeqIO.index() function could take an optional argument > (gzip=True, or something more general like archive=...) which > would cause the file to be opened via the gzip module instead? > Or: open=open -- accept a function that opens the file; by default, the built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a user-defined function to open zip files (since that's less straightforward). Otherwise, since the variety of archive formats supported by the Python standard library is limited, archive='gzip'|'bz2'|'zip' sounds good. 
-Eric From rjalves at igc.gulbenkian.pt Sun Dec 20 00:08:42 2009 From: rjalves at igc.gulbenkian.pt (Renato Alves) Date: Sun, 20 Dec 2009 00:08:42 +0000 Subject: [Biopython-dev] SeqIO.index improvement suggestions In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> Message-ID: <4B2D6B0A.4040008@igc.gulbenkian.pt> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 - - From Eric Talevich on 12/19/2009 10:42 PM: > Or: open=open -- accept a function that opens the file; by default, the > built-in open function, but easily replaced by gzip.open, bz2.BZ2File, > or a user-defined function to open zip files (since that's less > straightforward). > > Otherwise, since the variety of archive formats supported by the Python > standard library is limited, archive='gzip'|'bz2'|'zip' sounds good. I prefer the first option. Flexible, backwards compatible, fits all mentioned cases so far and allows inclusion of other formats. Got my vote on that one. 
Renato -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.11 (GNU/Linux) iEYEARECAAYFAkstawUACgkQYh11EUYTX9TJbwCgi4fQGQcfaBdJNLbMRsubjz82 4LQAnRgY0IKjwznjtiQzRNd0k8SH4oMN =YNHc -----END PGP SIGNATURE----- From biopython at maubp.freeserve.co.uk Sun Dec 20 18:06:33 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sun, 20 Dec 2009 18:06:33 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> Message-ID: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote: > On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote: >> >> This is a vague idea (which I haven't tried yet), but maybe the >> Bio.SeqIO.index() function could take an optional argument >> (gzip=True, or something more general like archive=...) which >> would cause the file to be opened via the gzip module instead? > > Or: open=open -- accept a function that opens the file; by default, the > built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a > user-defined function to open zip files (since that's less straightforward). That's what I had in mind with the "archive=..." bit (I should have been clearer), but "open" is probably a better name for it (assuming it isn't going to become a reserved word in future versions of Python). > Otherwise, since the variety of archive formats supported by the Python > standard library is limited, archive='gzip'|'bz2'|'zip' sounds good. That would work, but as you say, it is rather limited. 
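To make the "open=open" suggestion concrete, here is a minimal sketch of an indexing function that accepts the opener as an argument. The function index_fasta_offsets is hypothetical and written only for this illustration; it is not the Bio.SeqIO.index() implementation, and it assumes the opener accepts (filename, mode) and returns a handle supporting tell() and readline().

```python
import gzip
import os
import tempfile


def index_fasta_offsets(filename, open=open):
    """Map each FASTA record id to the offset of its '>' line.

    `open` is any callable taking (filename, mode), for example the
    built-in open, gzip.open or bz2.BZ2File.
    """
    offsets = {}
    handle = open(filename, "rb")
    try:
        while True:
            offset = handle.tell()
            line = handle.readline()
            if not line:
                break
            if line.startswith(b">"):
                fields = line[1:].split()
                if fields:
                    offsets[fields[0].decode("ascii")] = offset
    finally:
        handle.close()
    return offsets


# Write the same records to a plain file and to a gzipped file.
records = b">alpha first\nACGT\n>beta\nGGCC\nTTAA\n"
tmpdir = tempfile.mkdtemp()
plain = os.path.join(tmpdir, "example.fasta")
zipped = plain + ".gz"
with open(plain, "wb") as out:
    out.write(records)
with gzip.open(zipped, "wb") as out:
    out.write(records)

plain_index = index_fasta_offsets(plain)
gzip_index = index_fasta_offsets(zipped, open=gzip.open)
print(plain_index == gzip_index)
```

For the gzipped file the offsets refer to positions in the uncompressed data, which is what a later seek() on a gzip handle expects.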
Peter

From biopython at maubp.freeserve.co.uk Mon Dec 21 11:57:51 2009
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 21 Dec 2009 11:57:51 +0000
Subject: [Biopython-dev] code credits
In-Reply-To:
References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com>
Message-ID: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>

Hello all,

This email has been sent to the Biopython developers list, where we are
proposing to include a list of contributors in the Biopython 1.53 and future
release notes. I have specifically CC'd Chris Lasher, Hongbo Zhu and Paul B
as "new contributors". I don't have an email address for Frederik Gwinner
but will send him this message via github instead.

On Fri, Dec 18, 2009 at 1:54 AM, Marshall Hampton wrote:
>
> On Thu, 17 Dec 2009, Peter wrote:
>>
>> Marshall Hampton's description of how they do it on Sage
>> sounds worth trying - if we keep track as things are checked
>> in, it won't be too much work either. Do you (sage) have a
>> list of guidelines for what qualifies for a credit?
>
> I don't think we have formal guidelines, but the process is pretty simple.
> Whoever works on a patch in our bug/feature tracker has to flag it for
> review. Both the person who implements the patch and the reviewer get
> credit. It doesn't matter if it's a 1-character change to the documentation,
> they're listed in the release notes. Basically, the idea is to err (if
> that's the right word) on the side of acknowledging any contribution.

...

On that basis, this is a (partial?) list for Biopython 1.53, given
alphabetically as done by Sage:

Bartek Wilczyns
Brad Chapman
Chris Lasher (first contribution?)
Cymon Cox
Frank Kauff
Frederik Gwinner (first contribution?)
Hongbo Zhu (first contribution?)
Kyle Ellrott
Leighton Pritchard
Michiel de Hoon
Paul B (first contribution?)
Peter Cock

Am I missing anyone? Have I spelt all the names right?
(Actually a serious question - I recently made a typo in a git commit
comment and mistyped Leighton's surname).

We can update the release note on the news server/blog to include this, and
send round another announcement email describing this plan.

For the source code, I have two suggestions:

(1) Include this in the NEWS file for each release, and continue adding
names to the single alphabetical list in the CONTRIBUTORS file.

(2) Don't include the list of names in the NEWS file, but instead put them
in the CONTRIBUTORS file. This can have a section for each future release,
with all the existing entries listed as contributors up to and including
Biopython 1.52.

I prefer the second option - the NEWS file is already quite long, and can
refer to the CONTRIBUTORS file (e.g. for Biopython 1.53 we can have a line
"(At least) 12 people contributed to this release, including 4 first time
contributors").

Peter

From chapmanb at 50mail.com Mon Dec 21 13:23:39 2009
From: chapmanb at 50mail.com (Brad Chapman)
Date: Mon, 21 Dec 2009 08:23:39 -0500
Subject: [Biopython-dev] code credits
In-Reply-To: <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com>
Message-ID: <20091221132339.GC21580@sobchak.mgh.harvard.edu>

Hi Peter;
Awesome. Nice to see all the new and familiar names from this latest
release.

> (1) Include this in the NEWS file for each release, and continue adding
> names to the single alphabetical list in the CONTRIBUTORS file.

I'd rather see it this way, which is a bit more informal and in
context. Something along the lines of:

Bob Jones added the FooBar module for parsing the latest NCBI
file format.

or:

Several bug fixes were committed to the PDB module. Thanks to Joe
Smith, Steve P and Jorge Garcia for their patches.
If people contributed to something that didn't make the new cut, then we could just list additional contributors near the end. The goal should be to recognize people if they contributed to a release by having their name somewhere in the release notes. For core contributors like yourself, you probably don't want your name next to everything so pick a couple of your favorites for attribution. Brad From biopython at maubp.freeserve.co.uk Mon Dec 21 14:34:38 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 21 Dec 2009 14:34:38 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <20091221132339.GC21580@sobchak.mgh.harvard.edu> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> <20091221132339.GC21580@sobchak.mgh.harvard.edu> Message-ID: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> On Mon, Dec 21, 2009 at 1:23 PM, Brad Chapman wrote: > > Hi Peter; > Awesome. Nice to see all the new and familiar names from this latest > release. > >> (1) Include this in the NEWS file for each release, and continue adding >> names to the single alphabetical list in the CONTRIBUTORS file. > > I'd rather see it this way, which is a bit more informal and in > context. Something along the lines of: > > Bob Jones added the FooBar module for parsing the latest NCBI > file format. > > or: > > Several bug fixes were committed to the PDB module. Thanks to Joe > Smith, Steve P and Jorge Garcia for their patches. > > If people contributed to something that didn't make the new cut, then we > could just list additional contributors near the end. The goal should > be to recognize people if they contributed to a release by having > their name somewhere in the release notes. For core contributors like > yourself, you probably don't want your name next to everything so pick a > couple of your favorites for attribution. 
OK - some under your option (3?), the CONTRIBOTORS file is kept in the existing style, and the NEWS file also continues in a similar *style* to before, but making a more concious effort to include names next to noteworthy features, and ensure any other contributors get included at the end (e.g. "Plus miscelaneous bug fixes from X, Y and Z"). That seems fine. Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 21 15:34:22 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Dec 2009 10:34:22 -0500 Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser In-Reply-To: Message-ID: <200912211534.nBLFYMKt002285@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2927 ------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2009-12-21 10:34 EST ------- (In reply to comment #8) > (In reply to comment #7) > Just to confirm that I can reproduce the 'Query: 0' with blastpgp 2.2.22 > using Robson's test case. Thanks to Robson for this and apologies for not > having been able to send a test case. I was also able to confirmed the problem is present in blastpgp 2.2.22, however it seems to have been fixed in the "new" BLAST+ suite, psiblast 2.2.22+ as described here: http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html Given this new information, this does look like an NCBI BLAST bug, and not a problem in Biopython itself. We *might* be able to get our parser to cope with the funny BLAST output, but it does look difficult and risky to me. Miguel - Is it possible the BLAST bug is relatively recent and first showed up when you updated blastpgp to 2.2.18? Regards, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Mon Dec 21 16:48:50 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 21 Dec 2009 16:48:50 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> <20091221132339.GC21580@sobchak.mgh.harvard.edu> <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> Message-ID: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com> Peter wrote this (with spelling fixed): > > OK - some under your option (3?), the CONTRIBUTORS file is kept > in the existing style, and the NEWS file also continues in a similar > *style* to before, but making a more concious effort to include names > next to noteworthy features, and ensure any other contributors get > included at the end (e.g. "Plus miscellaneous bug fixes from X, Y > and Z"). > Actually, looking over this again, if we want to include a "Sage style" list of names in the release notes (which looks good), it really would be easier if we kept this list of names in that format within the repository (updating it as needed when new code is checked in). The NEWS and CONTRIBUTORS files are the obvious places to do this. With Brad's outline (3), or at least how I understood it (and maybe I misunderstood you Brad), the NEWS file would have the contributor names for each release, but not in a format where they can be copy and pasted to put together a release notice. Meanwhile the CONTRIBUTORS file would continue as a single list of all contributions to date. This means whomever writes the release notice has to synthesise the contributor list by hand, which is tedious and risks omitting people. My earlier suggestions had the list of names in the NEWS file for each release (1), or in the CONTRIBUTORS file broken down by release (2). 
These options seem better to me just from a practical point of view - and we can still also credit people in the main text of the NEWS file as we do now if appropriate. So, how about a merger of (1) and (3)? i.e. * The CONTRIBUTORS file remains a single alphabetical list of all contributors to date (no change). * Entries in the NEWS file for new features etc may continue to credit authors as appropriate. * The NEWS file will include at the end of each release section an alphabetical list of contributors for that release (with new contributors flagged). This will be re-used in the release notice. Peter From bugzilla-daemon at portal.open-bio.org Mon Dec 21 16:49:29 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 21 Dec 2009 11:49:29 -0500 Subject: [Biopython-dev] [Bug 2966] Primer3Commandline does not use EMBOSS 6.1.0 arguments In-Reply-To: Message-ID: <200912211649.nBLGnTed003915@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2966 ------- Comment #2 from lpritc at scri.sari.ac.uk 2009-12-21 11:49 EST ------- I also found an issue with the PrimerSearchCommandline. The command line options -sequences and -primers do not appear to be used in EMBOSS6.1.0, having been replaced by -seqall and -infile, respectively. I changed the options accordingly, and the modified files are available at http://github.com/widdowquinn/biopython/tree/emboss-branch. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
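The option renaming reported in the Bug 2966 comment above can be illustrated with a small sketch. It does not use Bio.Emboss; the helper name `update_primersearch_args` and the file names are hypothetical, and only the four option names come from the bug comment.

```python
# Old-to-new option names, as reported in the bug comment for EMBOSS 6.1.0.
RENAMED_OPTIONS = {
    "-sequences": "-seqall",
    "-primers": "-infile",
}


def update_primersearch_args(args):
    """Return a copy of an argument list with the old option names replaced.

    Hypothetical helper for illustration; values and unknown options
    are passed through unchanged.
    """
    return [RENAMED_OPTIONS.get(arg, arg) for arg in args]


# Placeholder file names, not taken from the bug report.
old_args = ["primersearch", "-sequences", "seqs.fasta", "-primers", "primers.txt"]
new_args = update_primersearch_args(old_args)
print(" ".join(new_args))
```

A lookup table keeps the renaming in one place, so supporting both EMBOSS versions would only need a check on which table to apply.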
From p.j.a.cock at googlemail.com Tue Dec 22 09:25:27 2009
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 22 Dec 2009 09:25:27 +0000
Subject: [Biopython-dev] Fwd: Debian python-biopython packaging for Biopython 1.53
In-Reply-To:
References: <320fb6e00908110407x2c4132f8va17e19aaf2b224d0@mail.gmail.com> <48B3E023-F75A-4F50-90CE-6FDA7DDA9E4C@ini.phys.ethz.ch> <320fb6e00908120308w5077f598i428b6011912c6f37@mail.gmail.com> <783F8F61-58D6-4638-B2C7-5C206C321C13@ini.phys.ethz.ch> <320fb6e00908190305o3cb4523ct1645b98f4b284d43@mail.gmail.com> <4151f0acb1da52f12d3f08419d3171e9@ini.phys.ethz.ch> <320fb6e00908200748g78485c64kc19cea88c7c4cee@mail.gmail.com>
Message-ID: <320fb6e00912220125w50a600c1xcf5e4750d70b39ca@mail.gmail.com>

Hi all,

Do any of our C experts know if this compilation warning is important (under Debian Linux, query raised by Philipp Benner who kindly packages Biopython for Debian, which also gets used on Ubuntu)?

Thanks,

Peter

---------- Forwarded message ----------
From: Philipp Benner
Date: Mon, Dec 21, 2009 at 6:34 PM
Subject: Debian python-biopython packaging for Biopython 1.53
To: Peter Cock

Hey Peter,

I just uploaded the new release.
Just a minor question: dpkg-shlibdeps: warning: dependency on libpthread.so.0 could be avoided if "debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Cluster/cluster.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Motif/_pwm.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/KDTree/_CKDTree.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/trie.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/PDB/mmCIF/MMCIFlex.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Cluster/cluster.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/trie.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Motif/_pwm.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cMarkovModel.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/PDB/mmCIF/MMCIFlex.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/Nexus/cnexus.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cpairwise2.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/KDTree/_CKDTree.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/Nexus/cnexus.so debian/python-biopython/usr/lib/pyshared/python2.5/Bio/cpairwise2.so debian/python-biopython/usr/lib/pyshared/python2.4/Bio/cMarkovModel.so" were not uselessly linked against it (they use none of its symbols). is this true? It might also be an error of dpkg-shlibdeps. 
Regards, Philipp From biopython at maubp.freeserve.co.uk Tue Dec 22 12:14:32 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Dec 2009 12:14:32 +0000 Subject: [Biopython-dev] code credits In-Reply-To: <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com> References: <928490.72367.qm@web30708.mail.mud.yahoo.com> <320fb6e00912171454v2ce81fc5v93547951d7af84f8@mail.gmail.com> <320fb6e00912210357m32156fdax6639445cadd83217@mail.gmail.com> <20091221132339.GC21580@sobchak.mgh.harvard.edu> <320fb6e00912210634o77d9eb9ex21e4ec3630dd1ed6@mail.gmail.com> <320fb6e00912210848x449fd73al4e97d3c9e21cf4@mail.gmail.com> Message-ID: <320fb6e00912220414t6429f1e5n792e5feeecbe633f@mail.gmail.com> On Mon, Dec 21, 2009 at 4:48 PM, Peter wrote: > So, how about a merger of (1) and (3)? i.e. > > * The CONTRIBUTORS file remains a single alphabetical list > of all contributors to date (no change). > * Entries in the NEWS file for new features etc may continue > to credit authors as appropriate. > * The NEWS file will include at the end of each release section > an alphabetical list of contributors for that release (with new > contributors flagged). This will be re-used in the release notice. I've done that in github - how do the NEWS and CONTRIB file look? http://github.com/biopython/biopython/commit/86d8d99aab894ab5f32a0e7a0c45d63a441da645 I haven't automatically included email addresses for the new contributors since there is a risk of them being harvested for spam, so I figure that should be "opt in". 
Peter From biopython at maubp.freeserve.co.uk Tue Dec 22 15:34:37 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Dec 2009 15:34:37 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> Message-ID: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> On Sun, Dec 20, 2009 at 6:06 PM, Peter wrote: > On Sat, Dec 19, 2009 at 10:42 PM, Eric Talevich wrote: >> On Sat, Dec 19, 2009 at 1:57 AM, Peter wrote: >>> >>> This is a vague idea (which I haven't tried yet), but maybe the >>> Bio.SeqIO.index() function could take an optional argument >>> (gzip=True, or something more general like archive=...) which >>> would cause the file to be opened via the gzip module instead? >> >> Or: open=open -- accept a function that opens the file; by default, the >> built-in open function, but easily replaced by gzip.open, bz2.BZ2File, or a >> user-defined function to open zip files (since that's less straightforward). > > That's what I had in mind with the "archive=..." bit (I should have > been clearer), but "open" is probably a better name for it (assuming > it isn't going to become a reserved word in future versions of Python). Proof of concept on github: http://github.com/peterjc/biopython/tree/index-zip This is using open_function as the new argument name (to match the existing key_function and avoid any confusion with the built in name open). I'm open to debate on this. Points to note, this is untested on Windows. In particular we need to look at gzipped plain text files using DOS/Windows new lines (rare case?) 
plus gzipped plain text files using Unix new lines (likely to be the more common of the two I'd expect). From my initial checks, while gzip.open() does take a mode argument it doesn't seem to support the "rU" value for universal new line read mode. This spoils my plan to give the open_function both the filename and the desired mode (generally "rU", but for SFF files etc we will want to use "rb"). Peter From biopython at maubp.freeserve.co.uk Tue Dec 22 16:08:50 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 22 Dec 2009 16:08:50 +0000 Subject: [Biopython-dev] [Biopython] SeqIO.index improvement suggestions In-Reply-To: <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> References: <4B2BB938.5030709@igc.gulbenkian.pt> <320fb6e00912181339o1a5c4100w6f1957fd4d78d20d@mail.gmail.com> <4B2C12B0.9060806@igc.gulbenkian.pt> <320fb6e00912190157m151c1b49t59b776c5130dad22@mail.gmail.com> <3f6baf360912191442m1ceb36afw824437f703dfaad0@mail.gmail.com> <320fb6e00912201006k5fbfebe4rb61e0538578e6ad@mail.gmail.com> <320fb6e00912220734r197e4baanac78c9188a33ddce@mail.gmail.com> Message-ID: <320fb6e00912220808w53485af8s801e5a24666d9627@mail.gmail.com> On Tue, Dec 22, 2009 at 3:34 PM, Peter wrote: > > Points to note, this is untested on Windows. In particular we need > to look at gzipped plain text files using DOS/Windows new lines > (rare case?) plus gzipped plain text files using Unix new lines > (likely to be the more common of the two I'd expect). From my > initial checks, while gzip.open() does take a mode argument it > doesn't seem to support the "rU" value for universal new line > read mode. This spoils my plan to give the open_function both > the filename and the desired mode (generally "rU", but for SFF > files etc we will want to use "rb"). The gzip mode issue is interesting... 
running on the Mac, Leopard 10.5, using the Apple provided Python 2.5.2, looking at a gzipped QUAL file everything is fine:

Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> gzip.open("Quality/example.qual.gz", "r").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23 18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22 26 26 13 22 26 18\n24 18 18 18 18\n'
>>> gzip.open("Quality/example.qual.gz", "rb").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23 18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22 26 26 13 22 26 18\n24 18 18 18 18\n'
>>> gzip.open("Quality/example.qual.gz", "rU").read()
'>EAS54_6_R1_2_1_413_324\n26 26 18 26 26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26\n26 26 26 23 23\n>EAS54_6_R1_2_1_540_792\n26 26 26 26 26 26 26 26 26 26 26 22 26 26 26 26 26 12 26 26\n26 18 26 23 18\n>EAS54_6_R1_2_1_443_348\n26 26 26 26 26 26 26 26 26 26 26 24 26 22 26 26 13 22 26 18\n24 18 18 18 18\n'

Looking at a gzipped FASTA file everything is fine:

>>> gzip.open("Quality/example.fasta.gz", "r").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'
>>> gzip.open("Quality/example.fasta.gz", "rb").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'
>>> gzip.open("Quality/example.fasta.gz", "rU").read()
'>EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n>EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n>EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n'

But, there is a problem with my gzipped FASTQ file:

>>> gzip.open("Quality/example.fastq.gz", "r").read()
'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality/example.fastq.gz", "rb").read()
'@EAS54_6_R1_2_1_413_324\nCCCTTCTTGTCTTCAGCGTTTCTCC\n+\n;;3;;;;;;;;;;;;7;;;;;;;88\n@EAS54_6_R1_2_1_540_792\nTTGGCAGGCCAAGGCCGATGGATCA\n+\n;;;;;;;;;;;7;;;;;-;;;3;83\n@EAS54_6_R1_2_1_443_348\nGTTGCTTCTGGCGTGGGTGGGGGGG\n+\n;;;;;;;;;;;9;7;;.7;393333\n'
>>> gzip.open("Quality/example.fastq.gz", "rU").read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 220, in read
    self._read(readsize)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 292, in _read
    self._read_eof()
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/gzip.py", line 311, in _read_eof
    raise IOError, "CRC check failed"
IOError: CRC check failed

I may have stumbled on a bug in the Python gzip library :(

Peter

From bugzilla-daemon at portal.open-bio.org Thu Dec 24 12:00:56 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 07:00:56 -0500
Subject: [Biopython-dev] [Bug 2927] Problem parsing PSI-BLAST plain text output with NCBStandalone.PSIBlastParser
In-Reply-To:
Message-ID: <200912241200.nBOC0ukq031745@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2927

------- Comment #10 from ibdeno at gmail.com 2009-12-24 07:00 EST -------
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> I was
also able to confirmed the problem is present in blastpgp 2.2.22,
> however it seems to have been fixed in the "new" BLAST+ suite, psiblast
> 2.2.22+ as described here:
> http://lists.open-bio.org/pipermail/bioperl-l/2009-December/031811.html
>
> Given this new information, this does look like an NCBI BLAST bug, and not
> a problem in Biopython itself. We *might* be able to get our parser to cope
> with the funny BLAST output, but it does look difficult and risky to me.
>

I think the best strategy will be to use the BLAST+ suite, since the "old" programs will be abandoned, as I learnt from NCBI. Also, I think we should use XML output. I know I promised to work on testing that, but I don't think I will be able to do the test before February...

> Miguel - Is it possible the BLAST bug is relatively recent and first showed
> up when you updated blastpgp to 2.2.18?
>

I had been using 2.2.18 for quite a while (months) and never had a problem. I think I initially thought it might be a problem with the actual databases, more than with the program...

Best regards,

Miguel

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Thu Dec 24 15:25:15 2009
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 24 Dec 2009 10:25:15 -0500
Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model.
In-Reply-To:
Message-ID: <200912241525.nBOFPFxH003980@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2943

------- Comment #14 from mdehoon at ims.u-tokyo.ac.jp 2009-12-24 10:25 EST -------
From testing the current flex-based MMCIF parser, it seems that it is not quite complete. I don't think it is necessary to be backwards compatible with it.
I rather have a well-designed MMCIF parser written independently, like the one by Paul, and have it replace the current MMCIF parser over time. This also allows us to have the design of the new parser more consistent with other Biopython modules. To do so, I suggest to have the new MMCIF parser in a new module MMCIF.py under Bio.PDB, and let it coexist with the current MMCIF parser for the time being. Since the new MMCIF parser does not use flex, I would think that the previous division into MMCIF2Dict and MMCIFParser may not be needed for the new parser. Paul, do you agree? Can the new parser live in a single MMCIF.py module? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Dec 24 15:37:08 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 24 Dec 2009 10:37:08 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912241537.nBOFb83e004255@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #15 from TallPaulInJax at yahoo.com 2009-12-24 10:37 EST ------- Hi Michiel, "I have a well-designed MMCIF parser written independently": Very interesting! Is it written solely in Python as well? I will say the parser I wrote is slower than I would like, so if you have an alternative? "Since the new MMCIF parser does not use flex, I would think that the previous division into MMCIF2Dict and MMCIFParser may not be needed for the new parser." I'm not expert enough in Python and in BioPython to know the correct call here. Perhaps Peter could answer this? I personally like the separation of concerns so that if someone else wanted to write their own parser, the code is modular in nature and supports doing that. Thanks for your help, Michiel! 
Paul -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sat Dec 26 10:08:05 2009 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sat, 26 Dec 2009 05:08:05 -0500 Subject: [Biopython-dev] [Bug 2943] MMCIFParser only handling a single model. In-Reply-To: Message-ID: <200912261008.nBQA85So025649@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2943 ------- Comment #16 from mdehoon at ims.u-tokyo.ac.jp 2009-12-26 05:08 EST ------- (In reply to comment #15) > "I have a well-designed MMCIF parser written independently": Very interesting! Actually I wrote "I *rather* have....". I don't have an MMCIF parser myself; I was referring to your parser. Btw, could you add a test case for the MMCIF parser (using some small data file that we can include with the Biopython distribution)? Tests are not just important to make sure everything works; often they are a very good example of how the code works. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From eric.talevich at gmail.com Tue Dec 29 01:51:40 2009 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 28 Dec 2009 17:51:40 -0800 Subject: [Biopython-dev] Code review request for phyloxml branch In-Reply-To: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com> References: <3f6baf360909232048u54a63ce5q2adbd0e18ebd7036@mail.gmail.com> Message-ID: <3f6baf360912281751g5152a945p951dbbbcbffbddb1@mail.gmail.com> Hi folks, Here's an update on the status of Bio.Tree and TreeIO. I think I've taken care of most of the blockers since the last review in September. 
First, some links:

http://github.com/etal/biopython/tree/phyloxml/Bio/Tree/
http://github.com/etal/biopython/tree/phyloxml/Bio/TreeIO/
http://github.com/etal/biopython/tree/phyloxml/Tests/test_PhyloXML.py
http://github.com/etal/biopython/tree/phyloxml/Tests/test_Tree.py
http://biopython.org/wiki/PhyloXML

Discussion:

*TreeIO*

Conversion between Nexus, Newick and phyloXML tree file formats works; the read/parse/write functions for each IO format use the same object types. Neat!

The tree annotations (e.g. id) aren't preserved perfectly during conversions -- I'll keep working on this, but I don't think it's a blocker. The taxon names of terminal nodes are kept as "clade" names in phyloXML for round-tripping. Tree topology and branch lengths seem OK.

Under the hood:
-- PhyloXMLIO is from GSoC
-- NewickIO is ported from the Bio.Nexus.Trees parser. I think it works the same way.
-- NexusIO relies on Bio.Nexus.Nexus for parsing, then converts the resulting Nexus.Trees.Tree objects to Bio.Tree.Newick objects. One day, when Nexus.Trees is replaced by NewickIO in the main Nexus parser, then this conversion can be dropped and NexusIO will be very simple.

*Tree*

The BaseTree object structure looks like this:
-- BaseTree.Tree contains global tree information, like whether the tree is rooted, and a reference to the root clade. The phyloXML Phylogeny object inherits from this.
-- BaseTree.Subtree contains local (clade- or node-specific) information, and references to each of its direct descendents, recursively. The phyloXML Clade object inherits from this.

Nodes are implicit. I could add references to the ancestor of each sub-tree without too much difficulty, but I haven't needed them yet.

The same methods (get_terminals et al.) generally apply to both classes, so I created a separate TreeMixin class from which both BaseTree.Tree and BaseTree.Subtree inherit.
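
To make the mixin arrangement described above concrete, here is a minimal sketch using simplified stand-in classes. It is not the code in the phyloxml branch: the attribute names (`root`, `clades`, `name`), the `_start` hook and the body of `get_terminals` are all assumptions made for illustration.

```python
class TreeMixin(object):
    """Methods shared by whole trees and sub-trees (illustrative only)."""

    def _start(self):
        # Hypothetical hook: each class reports where traversal begins.
        raise NotImplementedError

    def get_terminals(self):
        """Return the terminal (leaf) sub-trees in depth-first order."""
        terminals = []
        stack = [self._start()]
        while stack:
            clade = stack.pop()
            if clade.clades:
                # Push children reversed so they are visited left to right.
                stack.extend(reversed(clade.clades))
            else:
                terminals.append(clade)
        return terminals


class Subtree(TreeMixin):
    """Local information plus references to direct descendants."""

    def __init__(self, name=None, clades=None):
        self.name = name
        self.clades = clades or []

    def _start(self):
        return self


class Tree(TreeMixin):
    """Global information plus a reference to the root clade."""

    def __init__(self, root, rooted=False):
        self.root = root
        self.rooted = rooted

    def _start(self):
        return self.root


tree = Tree(Subtree(clades=[
    Subtree(clades=[Subtree("A"), Subtree("B")]),
    Subtree("C"),
]))
names = [leaf.name for leaf in tree.get_terminals()]
sub_names = [leaf.name for leaf in tree.root.clades[0].get_terminals()]
print(names)
print(sub_names)
```

Because the traversal only depends on the `_start` hook, the same method works whether it is called on the whole tree or on any sub-tree.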
Bio.Tree.Newick contains simple subclasses of Tree and Subtree, and an incomplete set of shims that track Bio.Nexus.Trees.Tree (minus the I/O). This is to ease the deprecation and eventual replacement of Bio.Nexus.Trees, as I imagine it:

(1) Port methods from Nexus.Trees to Bio.Tree, simplifying arguments where reasonable (since the node IDs and adjacency list lookup are no longer needed)
(2) Implement methods in Bio.Tree.Newick with the original argument lists, but triggering a deprecation warning indicating the newer replacement method
(3) Replace Nexus.Trees with an import of Bio.Tree.Newick(IO) and a few more shims to duplicate the original API -- so test_Nexus.py should still pass, ideally (with deprecation warnings)
(4) In Nexus.Nexus, replace all usage of Nexus.Trees with proper usage of NexusIO and Bio.Tree methods.
(5) Eventually delete Nexus.Trees and the shims in Bio.Tree.Newick.

I'm currently doing (1) and (2), with more emphasis on getting (1) right. Not all of the important methods have been ported, but I'm happy with the tree traversal methods.

*Tests*

I created test_Tree.py to test the methods in Bio.Tree.BaseTree; test_PhyloXML.py tests Bio.Tree.PhyloXML objects and Bio.TreeIO.PhyloXMLIO parsing/writing.

I noticed that in Tests/Nexus/, the example file for internal node labels is actually in Newick/NH format, not Nexus. That was briefly confusing, so maybe that file should be renamed. What do you think?

All the best,
Eric
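
Step (2) of the plan above, keeping an old-style method as a shim that warns and forwards to its replacement, might look roughly like the following. This is only a sketch under stated assumptions: the class names, the old method name `get_taxa` with its `node_id` argument, and the warning text are placeholders chosen for illustration, not the actual contents of Bio.Tree.Newick.

```python
import warnings


class BaseTree(object):
    """Stand-in for the new-style tree class (hypothetical)."""

    def __init__(self, terminals):
        self._terminals = list(terminals)

    def get_terminals(self):
        # Stand-in for the new-style method with simplified arguments.
        return list(self._terminals)


class NewickTree(BaseTree):
    """Stand-in for a Newick subclass carrying compatibility shims."""

    def get_taxa(self, node_id=None):
        """Old-style entry point kept with its original argument list."""
        warnings.warn(
            "get_taxa() is deprecated; use get_terminals() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # node_id is accepted for compatibility but ignored in this sketch.
        return self.get_terminals()


# Record warnings so the shim's behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    taxa = NewickTree(["A", "B"]).get_taxa()

print(taxa)
print(caught[0].category.__name__)
```

Using `stacklevel=2` makes the warning point at the caller of the old method rather than at the shim itself, which is usually what someone updating their script needs to see.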