From thamelry at vub.ac.be Thu Feb 6 14:09:38 2003 From: thamelry at vub.ac.be (Thomas Hamelryck) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] CVS problems? In-Reply-To: <200301301928.h0UJS37E014052@pw600a.bioperl.org> References: <200301301928.h0UJS37E014052@pw600a.bioperl.org> Message-ID: <200302062009.38579.thamelry@vub.ac.be> Hi, Is anything wrong with the biopython CVS server? I keep getting "permission denied" when I try to checkout.... Regards, -Thomas --- Thomas Hamelryck COMO-ULTR Vrije Universiteit Brussel (VUB) Belgium http://homepages.vub.ac.be/~thamelry From jchang at smi.stanford.edu Thu Feb 6 15:37:38 2003 From: jchang at smi.stanford.edu (Jeffrey Chang) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] CVS problems? In-Reply-To: <200302062009.38579.thamelry@vub.ac.be> References: <200301301928.h0UJS37E014052@pw600a.bioperl.org> <200302062009.38579.thamelry@vub.ac.be> Message-ID: <20030206203738.GE30982@springfield.stanford.edu> It works for me. However, we have changed the machine hosting the read-write server to dev.open-bio.org. Is that where you are trying to checkout from? If it still doesn't work, let me know. Jeff On Thu, Feb 06, 2003 at 08:09:38PM +0100, Thomas Hamelryck wrote: > > Hi, > > Is anything wrong with the biopython CVS server? > I keep getting "permission denied" when I try to checkout.... 
> > Regards, > > -Thomas > > --- > Thomas Hamelryck > COMO-ULTR > Vrije Universiteit Brussel (VUB) > Belgium > http://homepages.vub.ac.be/~thamelry > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From Anthony.Metzidis at ktl.fi Fri Feb 7 03:43:29 2003 From: Anthony.Metzidis at ktl.fi (Anthony Metzidis) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Possible Contribution: UCSC Blat and Ensembl SSAHA Sequence Locator Message-ID: <3E4371B1.3080908@ktl.fi> Hello, We've developed a Python API for the UCSC BLAT (http://genome.ucsc.edu/cgi-bin/hgBlat?command=start) and Ensembl SSAHA (http://www.ensembl.org/Homo_sapiens/ssahaview) genome search tools. Using our tool, you can input a series of DNA sequences in FASTA format and get the results back as dictionaries, indexed by the FASTA title, of dictionaries indexed by the fields presented by the web interfaces. The HTTP connection and the parsing of the HTML results pages are handled by our tool. We use these tools, for example, to locate large numbers of SNPs that are not yet annotated by the public DBs, or to relocate annotations that reference an older genome build to a newer one. We would like to contribute this to BioPython, if you think there would be interest in it. If so, could you offer advice about other existing BioPython interfaces that we should model ours after? I would like the interface to be as consistent as possible with the rest of BioPython. Any other advice on making contributions would be greatly appreciated.
Thanks, Tony From jchang at smi.stanford.edu Fri Feb 7 15:42:29 2003 From: jchang at smi.stanford.edu (Jeffrey Chang) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Possible Contribution: UCSC Blat and Ensembl SSAHA Sequence Locator In-Reply-To: <3E4371B1.3080908@ktl.fi> References: <3E4371B1.3080908@ktl.fi> Message-ID: <20030207204229.GD36025@springfield.stanford.edu> On Fri, Feb 07, 2003 at 10:43:29AM +0200, Anthony Metzidis wrote: [I've reordered some paragraphs...] > Hello, > We've developed a Python API for the UCSC > BLAT(http://genome.ucsc.edu/cgi-bin/hgBlat?command=start) and Ensembl > SSAHA (http://www.ensembl.org/Homo_sapiens/ssahaview) genome search tools. > We would like to contribute this to BioPython, if you think there would > be an interest in it. Yes, there would definitely be interest in it! > If so, could you offer advise about other existing BioPython interfaces > that we should model ours after? I would like the interface to be as > consistent as possible with the rest of BioPython. There are a few data types that should be supported. More below... > Using our tool, you can input a series of dna sequences in Fasta format > and then get the results back as dictionaries, indexed by the Fasta > title, of dictionaries indexed by the fields presented by the web > interfaces. The DNA sequences should be Bio.Seq objects, and should not require FASTA format. Also, the results should be in defined and documented objects (for an example, see Bio.Blast.Record), rather than dictionaries. > The http connection and parsing of the HTML results pages are handled by > our tool. Also, make sure that these are decoupled. That is, it should be possible to use the tool to make HTTP connections and save the HTML results for processing later, and, separately, to take HTML results (saved to disk, a database, etc.) and parse them into an appropriate object. Be sure to check out the FAQ, which gives some guidelines on submitting code.
Basically, you have to agree to license your code under the Biopython license, and also that you can legally do that! Jeff From stefan at came.sbg.ac.at Mon Feb 10 06:19:40 2003 From: stefan at came.sbg.ac.at (Stefan Suhrer jun.) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] README Message-ID: <200302101219.40617.stefan@came.sbg.ac.at> You write in your README file that it is not required to install Numerical Python 15.3, but after two hours of installation and compilation failures I can tell you that it is important to install it. Without it I could not compile Biopython. Please change this message in the README file and save a lot of people time. ----------------------------------------------------------------------------------------- Stefan Suhrer jun. Stefan.Suhrer@came.sbg.ac.at Institute of Chemistry and Biochemistry University of Salzburg C.A.M.E. - Center Of Applied Molecular Engineering Jakob-Haringerstr. 3 A-5020 Salzburg +43 - 662 - 8044-5796 ----------------------------------------------------------------------------------------- From thamelry at vub.ac.be Mon Feb 10 06:55:50 2003 From: thamelry at vub.ac.be (Thomas Hamelryck) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] README In-Reply-To: <200302101219.40617.stefan@came.sbg.ac.at> References: <200302101219.40617.stefan@came.sbg.ac.at> Message-ID: <200302101255.50447.thamelry@vub.ac.be> On Monday 10 February 2003 12:19 pm, Stefan Suhrer jun. wrote: > You write in your README file that it is not required to install the > Numerical Python 15.3 > But after two hours of installation and compilation failures I can tell you > that it is important to install it. Without it I could not compile > biopython. > > Please change this message in the README file and save the time of a lot of > people In fact, if you are using an RPM-based Linux system (as many people are), you need the packages Numpy AND Numpy-devel.
Greetings, -Thomas --- Thomas Hamelryck COMO-ULTR Vrije Universiteit Brussel (VUB) Belgium http://homepages.vub.ac.be/~thamelry From katel at worldpath.net Tue Feb 11 22:59:55 2003 From: katel at worldpath.net (Cayte) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Artificial immune system Message-ID: <000701c2d24b$358b58a0$cadc85d0@pcklindner> I checked in a new version with a pseudorandom generator as the default. Cayte From biopython-bugs at bioperl.org Wed Feb 12 06:58:18 2003 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Notification: incoming/121 Message-ID: <200302121158.h1CBwI7E031355@pw600a.bioperl.org> JitterBug notification new message incoming/121 Message summary for PR#121 From: gebauer-jung@ice.mpg.de Subject: typing mistakes Date: Wed, 12 Feb 2003 06:58:17 -0500 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From gebauer-jung@ice.mpg.de Wed Feb 12 06:58:18 2003 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.12.6/8.12.6) with ESMTP id h1CBwH7E031348 for ; Wed, 12 Feb 2003 06:58:17 -0500 Date: Wed, 12 Feb 2003 06:58:17 -0500 Message-Id: <200302121158.h1CBwH7E031348@pw600a.bioperl.org> From: gebauer-jung@ice.mpg.de To: biopython-bugs@bioperl.org Subject: typing mistakes X-Spam-Status: No X-Scanned-By: MIMEDefang 2.26 (www . roaringpenguin . 
com / mimedefang) Full_Name: Module: Bio/Application/__init__.py and Bio/Blast/Applications.py Version: biopython 1.10 OS: Submission from: n61.ice.mpg.de (141.5.20.61) Hello, I found some typing errors: Bio/Application/__init__.py paramater instead of parameter Bio/Blast/Applications.py option '-p' appears twice with different meaning With best regards, Steffi From biopython-bugs at bioperl.org Wed Feb 12 06:58:32 2003 From: biopython-bugs at bioperl.org (biopython-bugs@bioperl.org) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Notification: incoming/122 Message-ID: <200302121158.h1CBwW7E031372@pw600a.bioperl.org> JitterBug notification new message incoming/122 Message summary for PR#122 From: gebauer-jung@ice.mpg.de Subject: typing mistakes Date: Wed, 12 Feb 2003 06:58:30 -0500 0 replies 0 followups ====> ORIGINAL MESSAGE FOLLOWS <==== >From gebauer-jung@ice.mpg.de Wed Feb 12 06:58:31 2003 Received: from localhost (localhost [127.0.0.1]) by pw600a.bioperl.org (8.12.6/8.12.6) with ESMTP id h1CBwU7E031366 for ; Wed, 12 Feb 2003 06:58:31 -0500 Date: Wed, 12 Feb 2003 06:58:30 -0500 Message-Id: <200302121158.h1CBwU7E031366@pw600a.bioperl.org> From: gebauer-jung@ice.mpg.de To: biopython-bugs@bioperl.org Subject: typing mistakes X-Spam-Status: No X-Scanned-By: MIMEDefang 2.26 (www . roaringpenguin . 
com / mimedefang) Full_Name: Module: Bio/Application/__init__.py and Bio/Blast/Applications.py Version: biopython 1.10 OS: Submission from: n61.ice.mpg.de (141.5.20.61) Hello, I found some typing errors: Bio/Application/__init__.py paramater instead of parameter Bio/Blast/Applications.py option '-p' appears twice with different meaning With best regards, Steffi From jchang at smi.stanford.edu Wed Feb 12 14:38:44 2003 From: jchang at smi.stanford.edu (Jeffrey Chang) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] README In-Reply-To: <200302101219.40617.stefan@came.sbg.ac.at> References: <200302101219.40617.stefan@came.sbg.ac.at> Message-ID: <20030212193844.GA63847@springfield.stanford.edu> Thanks for the bug report! I'll make the changes to the README file. Do you have a complete list of the RPMs that have to be installed? Jeff On Mon, Feb 10, 2003 at 12:19:40PM +0100, Stefan Suhrer jun. wrote: > > > You write in your README file that it is not required to install the Numerical > Python 15.3 > But after two hours of installation and compilation failures I can tell you > that it is important to install it. Without it I could not compile biopython. > > Please change this message in the README file and save the time of a lot of > people > > > > ----------------------------------------------------------------------------------------- > Stefan Suhrer jun. > Stefan.Suhrer@came.sbg.ac.at > > Institute of Chemistry and Biochemistry > University of Salzburg > C.A.M.E. - Center Of Applied Molecular Engineering > Jakob-Haringerstr. 
3 > A-5020 Salzburg > +43 - 662 - 8044-5796 > ----------------------------------------------------------------------------------------- > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From idoerg at burnham.org Thu Feb 20 15:55:07 2003 From: idoerg at burnham.org (Iddo Friedberg) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Re: [BioPython] Whitespace in sequences References: <3E551643.9070702@burnham.org> Message-ID: <3E5540AB.7020105@burnham.org> Hi, I replaced \n with os.linesep in Bio.FASTA and committed it to CVS. I can't find the contingency tests for the Bio.SeqIO modules, so I don't want to commit those changes before I test them. Could anybody point me to the right test_*.py script, please? Iddo Iddo Friedberg wrote: > > > Paul-Michael Agapow wrote: > >> >> Thanks Iddo, >> >>> I guess you were using biopython on a Mac/Windows box, where '\r' or >>> '\r\n' is a >>> newline. Also, it looks like you were using the Bio.Fasta package to >>> read... the bug shouldn't occur within Bio.SeqIO.FASTA.FastaReader >>> (although it will within SeqIO.FASTA.FastaWriter!) >> >> >> >> Picked it straight off - I'm using MacOS X. However, I'm using >> Bio.SeqIO.FASTA.FastaReader. > > > Well that's funny... Jeff, Yair, any comment on this? I believe you have > Macs too. > > Which IMHO does not preclude the `\n' --> os.linesep overhaul required. > I wonder why this bug was hit upon only now. Thanks very much for > reporting this. > > Cheers all, > > Iddo > > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 646 3171 http://bioinformatics.ljcrf.edu/~iddo
From jchang at smi.stanford.edu Thu Feb 20 04:39:54 2003 From: jchang at smi.stanford.edu (Jeffrey Chang) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] parsers status Message-ID: <442B8CF7-44B7-11D7-BE02-000A956845CE@smi.stanford.edu> The FormatIO system is starting to mature, so it is a good idea to start thinking about using Martel expressions for new parsers (and possibly porting old ones). The new system has a lot of advantages, such as exporting a unified interface, format auto-detection, and integration with data loading. I've gone through and taken an inventory of the current Biopython parsers. The ones under the PARSERS heading are the parsers that are not integrated with FormatIO, and EXPRESSIONS are the ones that are. I am sorry if I have missed any; it is unintentional and not indicative of any negative opinion of your code! The eventual goal is to get all of them under the EXPRESSIONS heading. The Martel-based parsers will be easier to port than the non-Martel ones. I imagine that there will be a slow migration path; new parsers will be developed with FormatIO in mind, and old ones will be ported over as needed. Jeff

PARSERS
  Blast.NCBIStandalone
  Blast.NCBIWWW
  Clustalw (Martel)
  Emboss.primer3_format (Martel)
  Emboss.primersearch_format (Martel)
  Enzyme
  FSSP
  Fasta
  GenBank (Martel)
  Geo (Martel)
  Gobase
  Intelligenetics (Martel)
  KEGG.Enzyme (Martel)
  KEGG.Compound (Martel)
  KEGG.Map (Martel)
  Kabat (Martel)
  LocusLink (Martel)
  Medline (Martel)
  MetaTool (Martel)
  NBRF (Martel)
  PDB
  Prosite
  Rebase
  SCOP
  Saf (Martel)
  SwissProt
  UniGene

EXPRESSIONS
  blocks
  fasta
  sprot38
  sprot40
  genbank
  embl65
  ncbiblast
  wublast

From jchang at jeffchang.com Thu Feb 20 04:51:26 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Bugzilla is default bug database now Message-ID: I've changed our web page so that it now points to the Bugzilla database at: http://cvs.open-bio.org/bugzilla/ rather than the old Jitterbug database.
We will now officially start using this database to manage bugs. Jeff From rob at spot.colorado.edu Thu Feb 20 11:12:24 2003 From: rob at spot.colorado.edu (Rob Knight) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Questions about code contributions Message-ID: Hi, We are setting up a fairly large database project here at CU Boulder (5-10 developers over the next 3 years), and have settled on Python, PostgreSQL and Zope as our primary development tools. The main goal of the project is to automate comparative analyses across many disparate taxa, and to incorporate multiple types of expression and structural data in a phylogenetic context. Unfortunately, this focus on phylogeny and expression means that the existing BioSQL project doesn't really meet our needs. I am currently deciding whether we should use the Biopython code base. As noted at the root of the API documentation, the existing code needs to be cleaned up extensively. Also, there seems to have been very little activity towards handling phylogeny, mass spec data, and RNA structure, which are three areas that are critical to us. On the other hand, there are many useful modules in the Biopython code, such as the GA and Graphics modules and the parsing framework, that we would like to use and extend. However, a lot of the code examples in the Biopython Tutorial/Cookbook seem not to work or are very brittle due to bugs in the underlying code (these may have been fixed in the cvs version -- I haven't had time to check). If we do use Biopython, we would definitely be interested in returning all our contributions to the community. However, it would only be worth the time it would take for us to do this (as opposed to starting fresh) if it's possible for us to reorganize the code significantly. So, the main questions I have are: 1. What is the process, if any, for suggesting and/or making large-scale changes that are not compatible with existing code (e.g. 
changing the module structure, changing inheritance patterns, introducing and using new top-level abstract data types)? How much support would there be for doing a significant reorganization for, say, a 2.0 release in 2004? From Jeff Chang's message to the list earlier today, I get the impression that this is already in progress, at least for the parsers. 2. To what extent is the current code base compatible with Jython? Is there any general interest in using Biopython with Jython? (This is important to us, since we have a Java framework for distributing tasks across a cluster, and we may also want to integrate with the Mesquite phylogeny package later on.) 3. How many developers are currently actively working on Biopython? (In other words, how much can we expect it will benefit us to participate in Biopython rather than just writing things on our own?) I'll definitely appreciate any thoughts you have: I'd certainly like to contribute to the Biopython project, but need to check that it will make sense for us to integrate our efforts. Thanks, Rob Knight MCD Biology University of Colorado, Boulder From katel at worldpath.net Thu Feb 20 19:06:25 2003 From: katel at worldpath.net (katel@worldpath.net) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Artificial immune experiments Message-ID: <4910-220032521062558@M2W062.mail2web.com> I'm setting up some experiments to see if my AIS tool will recognize members of a COG. I plan to pick a COG with a large number of sequences and pick a subset as the basis. Then I will test to see if the tool tends to recognize sequences from the same COG and reject sequences from other COGs. Timing will be tested too. The test will be run varying the number of recognizers, rate of mutation, etc. It will be a challenge to keep the number of cases from exploding. I would appreciate suggestions, especially since I know doodlysquat about experimental design. Thank you in advance. Cayte.
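On keeping the number of cases from exploding: a minimal sketch with made-up factor levels (the names recognizers, mutation_rate and basis_size and all their values are hypothetical, not taken from the AIS tool). A full factorial design multiplies the level counts of every factor; varying one factor at a time around a baseline grows only with their sum; a reproducible random subset of the full grid sits in between.

```python
import itertools
import random

# Hypothetical factor levels; substitute the values the real experiment needs.
LEVELS = {
    "recognizers": [50, 100, 200, 400],
    "mutation_rate": [0.001, 0.01, 0.05],
    "basis_size": [10, 20, 40],
}


def full_factorial(levels):
    """Return every combination of factor levels as a list of dicts."""
    names = sorted(levels)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(levels[n] for n in names))]


def one_at_a_time(levels, baseline):
    """Vary each factor alone while holding the others at the baseline."""
    cases = [dict(baseline)]
    for name, values in levels.items():
        for value in values:
            if value != baseline[name]:
                case = dict(baseline)
                case[name] = value
                cases.append(case)
    return cases


def random_subset(levels, k, seed=0):
    """Draw k distinct cases from the full grid, reproducibly."""
    cases = full_factorial(levels)
    return random.Random(seed).sample(cases, min(k, len(cases)))


if __name__ == "__main__":
    baseline = {"recognizers": 100, "mutation_rate": 0.01, "basis_size": 20}
    print(len(full_factorial(LEVELS)))           # 4 * 3 * 3 = 36
    print(len(one_at_a_time(LEVELS, baseline)))  # 1 + 3 + 2 + 2 = 8
    print(len(random_subset(LEVELS, 10)))        # 10
```

Adding a fourth factor with three levels would triple the full grid to 108 cases but add only two cases to the one-at-a-time design. The trade-off is that the one-at-a-time design cannot reveal interactions between factors.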
From jchang at jeffchang.com Fri Feb 21 02:54:06 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] new news system for Bioython Message-ID: As everyone here is painfully aware, the News section on the Biopython web site is sorely out of date and not well maintained. To help with this, open-bio has installed Movable Type, a blogging program, to make it easy for people to post news items. Take a look at: http://news.open-bio.org/ We are going to move over to this system relatively soon. We need 1-2 volunteers who can help contribute to the news. This entails following the mailing lists and posting to the site whenever something of general interest happens. Let me know if you are interested in helping out! Thanks, Jeff From chapmanb at arches.uga.edu Fri Feb 21 04:16:18 2003 From: chapmanb at arches.uga.edu (Brad Chapman) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Plans for Biopython documentation Message-ID: <22641F4C-457D-11D7-89A2-000393C10EB6@arches.uga.edu> Hey all; I know I've been hearing a lot of rumblings about the state of documentation in the current Tutorial and have been trying to think of ways to update the docs and also make them easier to keep up to date in the future. The problems with the current large Tutorial are two-fold: 1. It is huge, so trying to get into working on it is a serious mental exercise. 2. There is lots of useful code and docs in there that do work just fine, but some people might dismiss the whole thing as "out of date" if they run into something that doesn't work. To try and handle both problems, I've decided to trim the Tutorial back to its original intention (a 10-page or so getting-started doc) and start creating smaller documentation bits that cover specific topics.
I've already started doing that this week by splitting off the Installation documentation from the Tutorial and making the BioSQL documentation separate, and hope to continue this as I have time. What this means practically is that I think when we document new modules we should think about them as "standalone documentation bits" instead of parts of the huge Tutorial. My current plan is to organize these on the Wiki page: http://www.biopython.org/wiki/html/BioPython/BiopythonCode.html and to place the html/pdf/whatever pages in a directory (docs) on the biopython.org website so that anyone with CVS access can add their own stuff. This has the additional benefit that you can write your documentation bits in any format you like and aren't forced to use the LaTeX format that I like so much. Does this seem like a good plan to people to clear up some of the documentation messes and get more useful docs out there? Any other suggestions or random thoughts? Brad From jchang at jeffchang.com Fri Feb 21 06:01:55 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:20 2005 Subject: [Biopython-dev] Questions about code contributions In-Reply-To: Message-ID: On Friday, February 21, 2003, at 12:12 AM, Rob Knight wrote: [introduction cut] Hi Rob, > I am currently deciding whether we should use the Biopython code base. > As > noted at the root of the API documentation, the existing code needs to > be > cleaned up extensively. Also, there seems to have been very little > activity towards handling phylogeny, mass spec data, and RNA structure, > which are three areas that are critical to us. On the other hand, there > are many useful modules in the Biopython code, such as the GA and > Graphics > modules and the parsing framework, that we would like to use and > extend. Great! Biopython is a volunteer-driven project. Nobody is directly getting paid to work on it, so the most mature parts are the ones that are important for someone's day job.
You have identified some areas that are missing and waiting for someone to pick up. > However, a lot of the code examples in the Biopython Tutorial/Cookbook > seem not to work or are very brittle due to bugs in the underlying code > (these may have been fixed in the cvs version -- I haven't had time to > check). Yes, we know that the code in the documentation is quite often (like right now) out of date. There are code snippets in there that have been broken due to redesign. As the project matures, though, that will start to happen less, and less work will have to be done to maintain the docs. However, I think it is unfair to attribute the brittleness in the documentation to bugs in the code. Most of the code in the codebase is surprisingly robust, given the complexity of the problems. Nearly all of the bug reports that we get are attributable to either 1) underlying format changes that break our stuff, or 2) API changes in the code base that break documentation and scripts. There is nothing we can do to fix the first problem, and it will only be solved once people start moving to structured machine-readable representations such as XML. The second problem is our fault, but we simply don't have the resources to maintain the documentation. However, since this is an open source project, we will accept patches that fix the docs! We are aware that changing the API breaks people's code and the documentation, and makes the toolkit harder to learn. However, we do think about API changes quite carefully, and seriously weigh the benefits before we make them. Fortunately, because the toolkit is still relatively young, we have been able to make changes without causing too much unhappiness. However, Biopython is maturing, and the APIs are now starting to stabilize. > If we do use Biopython, we would definitely be interested in returning > all > our contributions to the community.
However, it would only be worth the > time it would take for us to do this (as opposed to starting fresh) if > it's possible for us to reorganize the code significantly. > > So, the main questions I have are: > > 1. What is the process, if any, for suggesting and/or making > large-scale > changes that are not compatible with existing code (e.g. changing > the > module structure, changing inheritance patterns, introducing and > using > new top-level abstract data types)? Generally, the process is to propose the change on the biopython-dev list for discussion, and then to form a consensus if anyone objects. Some changes are more disruptive than others. For example, adding new top-level data types is usually not a problem. The most important thing is actually contributing patches. There is a surplus of good ideas, and a deficit of resources to implement them. Having a great idea is a good thing, but it most likely won't get integrated into the toolkit unless you also contribute the patch. > How much support would there be for > doing a significant reorganization for, say, a 2.0 release in 2004? > From Jeff Chang's message to the list earlier today, I get the > impression that this is already in progress, at least for the > parsers. Parts of the core will not change any more, while other parts are currently undergoing significant reorganization. > 2. To what extent is the current code base compatible with Jython? Is > there any general interest in using Biopython with Jython? (This is > important to us, since we have a Java framework for distributing > tasks across a cluster, and we may also want to integrate with > the Mesquite phylogeny package later on). I know of several groups using Biopython with Jython. However, we haven't been officially supporting that kind of use, so we have no documentation on which parts will or won't work with it. Although we officially do not have a Jython compatibility requirement, much of the package does work with it.
One caveat is that some modules in Biopython use Python 2.2 constructs, which aren't supported in Jython yet (I think). One major part of Biopython that is incompatible with Jython is the parsing framework. A lot of it depends on mxTextTools, which is a C library. According to Andrew, though, there used to be a Python implementation of mxTextTools, but he does not know whether it is still around or supported. > 3. How many developers are currently actively working on Biopython? (In > other words, how much can we expect it will benefit us to > participate > in Biopython rather than just writing things on our own?) I have no idea how many developers are working on Biopython. It would be nice if people would let us know how they are using it, but that rarely happens. For example, the Biopython tutorial has been translated into other languages without us knowing. We are pleased when stuff like that happens, but we don't always hear about it. There are other ways to quantify usage, such as looking at web hits. Currently, we get about 1000 downloads of each release of the toolkit. I don't know how that translates into actual use, though. It is hard to say how much participating in Biopython will benefit you. If you want someone to write your code, I'm fairly confident that's not going to happen. However, people may be willing to help debug your code, if it is of enough general interest to the community. For example, I'm confident that we now have the most robust BLAST parser in the world (although it might break on the current version of BLAST :). There are also intangible benefits from participating in the community, such as getting feedback, generating goodwill for your project, taking advantage of our infrastructure (distribution, web page, bug database, etc.), and taking advantage of the goodwill we've generated in the community.
> I'll definitely appreciate any thoughts you have: I'd certainly like to > contribute to the Biopython project, but need to check that it will > make > sense for us to integrate our efforts. I understand your concerns about not having control over the toolkit and perhaps not having things organized in ways that are most useful for your project. I'm not sure I have an easy answer, other than that we are extremely open to new ideas and new structures; however, we have been working on this for quite a long time and have a good idea of what will and won't work. Thanks for the email. I do hope you continue to post your plans to this list. Jeff From rob at spot.colorado.edu Fri Feb 21 13:06:06 2003 From: rob at spot.colorado.edu (Rob Knight) Date: Sat Mar 5 14:43:21 2005 Subject: [Biopython-dev] Questions about code contributions In-Reply-To: Message-ID: Hi Jeff, Thanks for the thoughtful reply. As I said initially, we _are_ interested in contributing (especially in the areas that are currently missing), but need to make sure that the benefits outweigh giving up control over the code we'll need. > Parts of the core will not change any more, while other stuff is > currently undergoing significant reorganization. Which parts are now fixed, and which parts are in flux? I have read the last couple of months' posts on the mailing list, but it would be great to get the current status in one place. > I know of several groups using Biopython with Jython. Are any of them active on this list? I'd definitely be interested in hearing any experiences people have had with this. Is there anything major besides the parsing framework that depends on C libraries? How difficult would it be to translate into Java the parts of mxTextTools that Martel requires? > It is hard to say how much participating in Biopython will benefit you. > If you want someone to write your code, I'm fairly confident that's > not going to happen.
That's not what we're looking for, but discussion, debugging, and other support would definitely be useful. We'll come up with a more concrete plan for how we want to organize our code over the next couple of weeks, and post it to the list for discussion. At worst, it's easy to make up dummy modules that just translate between different naming conventions. I definitely do appreciate the amount of work that's gone into the Biopython project, and recognize that there are probably good reasons for a lot of the things that I don't currently understand. > However, I think it is unfair to attribute the brittleness in the > documentation to bugs in the code. One illustrative example: Page 32 of the Tutorial describes how to set up an NCBIDictionary with the default settings (for nucleotide sequences). We were trying to get some protein sequences. When we passed in peptide accession numbers, the error message indicated that the most likely problem was that the accession numbers were not in the database. However, they were present when we looked them up manually through NCBI's web site.

From the Tutorial:

>>> from Bio import GenBank
>>> ncbi_dict = GenBank.NCBIDictionary()
>>> print ncbi_dict['6273291']
LOCUS AF191665 902 bp DNA linear PLN 07-NOV-1999
DEFINITION Opuntia marenae rpl16 gene; chloroplast gene for chloroplast
...many more lines: works fine.
>>> print ncbi_dict['AAN12123']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1541, in __getitem__
    raise KeyError, x
KeyError: ERROR, possibly because id not available?
>>> new_ncbi_dict = GenBank.NCBIDictionary(database='protein')
>>> print new_ncbi_dict['AAN12123']
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1541, in __getitem__
    raise KeyError, x
KeyError: ERROR, possibly because id not available?
>>> new_ncbi_dict_2 = GenBank.NCBIDictionary(database='protein', format='gp')
>>> print new_ncbi_dict_2['AAN12123']
LOCUS AAN12123 438 aa linear INV 14-FEB-2003
DEFINITION CG5605-PF [Drosophila melanogaster].
...many more lines: works fine

In other words, initializing GenBank.NCBIDictionary with only a database does not work: the format must be specified as well, and the docstring doesn't say what the valid formats are. This is not fixed even in the CVS version. Looking at Bio/GenBank/__init__.py in CVS:

[first line is 1470]
    def __init__(self, database='sequences', format="gb", delay=5.0, parser=None):
        """NCBIDictionary([database][, delay][, parser])

        Create a new Dictionary to access GenBank. Valid values for
        database are 'genome', 'nucleotide', 'protein', 'popset', and
        'sequences'. delay is the number of seconds to wait between each
        query (5 default). parser is an optional parser object to change
        the results into another form. If unspecified, then the raw
        contents of the file will be returned.
        """
        from Bio.WWW import RequestLimiter
        self.parser = parser
        self.limiter = RequestLimiter(delay)
        self.database = database
        if format:
            self.format = format
        elif self.database == 'nucleotide':
            self.format = 'gb'
        elif self.database == 'protein' or self.database == 'popset':
            self.format = 'gp'
        else:
            self.format = 'native'

The code that chooses a format based on the database is never executed, because format defaults to 'gb', a non-empty (and therefore true) string, so the first branch always wins. The fix is trivial:

    def __init__(self, database='sequences', format=None, delay=5.0, parser=None):

This always returns a result, albeit in native format when the default database is used (which breaks the Tutorial example). To preserve the Tutorial example, set the default database to 'nucleotide' instead of 'sequences'. Another option would be to try to autodetect whether a particular accession number is protein or nucleotide and return it in the appropriate gb or gp format, but this would take somewhat more effort.
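To make the dead-code problem concrete, here is a minimal sketch in modern Python 3 (not the 2.2-era code quoted above). BuggyDictionary, FixedDictionary and pick_format are hypothetical stand-ins, not Biopython classes: they model only the format-selection branch of the quoted __init__, omit delay and parser, and never contact NCBI.

```python
# Hypothetical stand-ins that model only the format-selection branch of the
# quoted NCBIDictionary.__init__; they are not Biopython classes and make no
# network requests. delay and parser are omitted for brevity.


def pick_format(database, format):
    """Format-selection logic as it appears in the quoted __init__."""
    if format:
        return format
    elif database == 'nucleotide':
        return 'gb'
    elif database == 'protein' or database == 'popset':
        return 'gp'
    else:
        return 'native'


class BuggyDictionary:
    # "gb" is a non-empty string, hence true: unless a caller explicitly
    # passes a false value such as None, pick_format returns it and the
    # database-based branches never run.
    def __init__(self, database='sequences', format="gb"):
        self.database = database
        self.format = pick_format(database, format)


class FixedDictionary:
    # With None as the default, the database decides unless the caller
    # passes an explicit format.
    def __init__(self, database='sequences', format=None):
        self.database = database
        self.format = pick_format(database, format)


if __name__ == '__main__':
    print(BuggyDictionary(database='protein').format)     # gb (wrong for proteins)
    print(FixedDictionary(database='protein').format)     # gp
    print(FixedDictionary(database='nucleotide').format)  # gb
    print(FixedDictionary().format)                       # native
```

The last call shows the trade-off described above: with format=None, the default database 'sequences' falls through to 'native', which is why changing the default database to 'nucleotide' is also suggested.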
Our experiences trying to follow and modify the recipes in the Tutorial suggest that this kind of thing is fairly common. We will file bug reports if time permits, but it does take significant effort to write them up and verify that the patches work (especially given the state of the tests). Also, we were surprised to find all this code lurking in __init__.py in the first place. Is there a specific motivation for this design decision? I think that the proposal of breaking up the Tutorial into sections might help a lot with this sort of thing. It might also help to make the specific examples in the Tutorial into unit tests that can be conveniently run when the code is updated so that it's easy to see what breaks... Anyway, I will keep you posted as our specific plans mature. Rob From DJaeggi at imim.es Mon Feb 24 05:17:58 2003 From: DJaeggi at imim.es (Jaeggi, Daniel M.) Date: Sat Mar 5 14:43:21 2005 Subject: [Biopython-dev] Gentoo ebuild Message-ID: <3E59F156.2080902@imim.es> Hi Finally a Gentoo biopython ebuild has made it into the tree.
It's currently listed as an unstable package under app-sci/, so if there are any Gentoo users out there, please give it a try and report any installation bugs to http://bugs.gentoo.org. Thanks, Daniel From Yves.Bastide at irisa.fr Mon Feb 24 05:38:56 2003 From: Yves.Bastide at irisa.fr (Yves Bastide) Date: Sat Mar 5 14:43:21 2005 Subject: [Biopython-dev] parsers status In-Reply-To: <442B8CF7-44B7-11D7-BE02-000A956845CE@smi.stanford.edu> References: <442B8CF7-44B7-11D7-BE02-000A956845CE@smi.stanford.edu> Message-ID: <3E59F640.5090901@irisa.fr> Jeffrey Chang wrote: > The FormatIO system is starting to mature, so it is a good idea to start > thinking about using Martel expressions for new parsers (and possibly > porting old ones). The new system has a lot of advantages, such as > exporting a unified interface, format auto-detecting, integration with > data loading. > [...] How does one use FormatIO? There's no documentation on it, and I've been unable to grok it yet. (Too bad, didn't know wublast.py before hacking Blast.NCBIStandalone... :-/) Thanks, yves From jchang at jeffchang.com Mon Feb 24 20:57:23 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:21 2005 Subject: [Biopython-dev] Gentoo ebuild In-Reply-To: <3E59F156.2080902@imim.es> Message-ID: <7B782B88-4864-11D7-B28A-000A956845CE@jeffchang.com> Hey Daniel, Thanks a lot for doing this! Having Biopython easily distributable really helps people who want to use the package. Are there any special requirements or dependencies for this package that need to be documented? Jeff On Monday, February 24, 2003, at 02:17 AM, Jaeggi, Daniel M. wrote: > Hi > > Finally a Gentoo biopython ebuild has made it into the tree. It's > currently listed as an unstable package under app-sci/, so if there > are any Gentoo users out there, please give it a try and report any > installation bugs to http://bugs.gentoo.org. 
> > Thanks, > > Daniel > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From jchang at jeffchang.com Tue Feb 25 03:33:36 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Sat Mar 5 14:43:21 2005 Subject: [Biopython-dev] parsers status In-Reply-To: <3E59F640.5090901@irisa.fr> Message-ID: Yeah, unfortunately, there's no documentation on it. There should be... Brad's doing a reorg of the documentation now, so it will be easier to manage and integrate the stuff people write! ;) It's a cool system that Andrew Dalke developed, that does really fast parsing, automatic format detection, and makes it easier to write parsers. Jeff On Monday, February 24, 2003, at 02:38 AM, Yves Bastide wrote: > Jeffrey Chang wrote: >> The FormatIO system is starting to mature, so it is a good idea to >> start thinking about using Martel expressions for new parsers (and >> possibly porting old ones). The new system has a lot of advantages, >> such as exporting a unified interface, format auto-detecting, >> integration with data loading. > [...] > > > How does one use FormatIO? There's no documentation on it, and I've > been unable to grok it yet. (Too bad, didn't know wublast.py before > hacking Blast.NCBIStandalone... :-/) > > Thanks, > yves > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev@biopython.org > http://biopython.org/mailman/listinfo/biopython-dev From jchang at smi.stanford.edu Tue Feb 25 14:47:23 2003 From: jchang at smi.stanford.edu (Jeffrey Chang) Date: Sat Mar 5 14:43:21 2005 Subject: [Biopython-dev] Questions about code contributions In-Reply-To: Message-ID: On Friday, February 21, 2003, at 10:06 AM, Rob Knight wrote: > [Jeff] >> Parts of the core will not change any more, while other stuff is >> currently undergoing significant reorganization. > > Which parts are now fixed, and which parts are in flux? 
I have read the > last couple of months' posts on the mailing list, but it would be > great to > get the current status in one place. The core right now consists of the database access (Bio.db, Bio.config, Bio.dbdefs), parsing frameworks, and sequence objects. Those are nearing completion. Work on the code is still going on, but I don't expect that there will be any more major structural changes. Code that could use the core framework but doesn't yet (e.g. Bio.PubMed, Bio.GenBank) will be rewritten to work with it. Also, code that accesses NCBI databases will be rewritten to work with EUtils. There are also many individual contributions (e.g. Bio.PDB) whose authors know their status better than I do. >> I know of several groups using Biopython with Jython. > > Are any of them active on this list? I'd definitely be interested in > hearing any experiences people have had with this. The list is a good place to start. If you don't mind me passing your email on, I can see if one of my colleagues with some experience with this can help. > Is there anything major besides the parsing framework that depends on C > libraries? How difficult would it be to translate into Java the parts > of > mxTextTools that Martel requires? I suspect that it might be a lot of work. However, the source code is available, which would help. The best option would be to talk to Marc-Andre Lemburg. He may know of people who have pure Python or Java implementations. > Page 32 of the Tutorial describes how to set up an NCBIDictionary with > the > default settings (for nucleotide sequences). We were trying to get some > protein sequences. When we passed in peptide accession numbers, the > error > message indicated that the most likely problem was that the accession > numbers were not in the database. However, they were present when we > looked them up manually through NCBI's web site. [discussion of bug in Bio.GenBank.NCBIDictionary, Tutorial] Yep, you're right. It is broken.
This code will be reworked to use EUtils, which is more robust and has features such as better error checking and diagnosis. > Our experiences trying to follow and modify the recipes in the Tutorial > suggest that this kind of thing is fairly common. We will file bug > reports > if time permits, but it does take significant effort to write them up > and > verify that the patches work (especially given the state of the tests). Yes. > Also, we were surprised to find all this code lurking in __init__.py in > the first place. Is there a specific motivation for this design > decision? Yes, this was discussed on the mailing list (one year ago?). For many modules, this reduces the level of nesting, changing:

    from Bio.GenBank import GenBank

to:

    from Bio import GenBank

I believe that the consensus was that the second is cleaner and less confusing. One disadvantage of putting a bunch of code in __init__.py is that many people don't look for code there. However, that doesn't usually cause problems for people more than once! > I think that the proposal of breaking up the Tutorial into sections > might > help a lot with this sort of thing. It might also help to make the > specific examples in the Tutorial into unit tests that can be > conveniently > run when the code is updated so that it's easy to see what breaks... That's a great idea! Hey Brad, any thoughts? > Anyway, I will keep you posted as our specific plans mature. Great, thanks! Jeff
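Rob's suggestion of turning Tutorial examples into unit tests maps naturally onto Python's standard doctest module, which re-runs the >>> lines in a piece of text and compares the results with the recorded output. Below is a minimal sketch under stated assumptions: it uses the modern Python 3 doctest API (DocTestParser, DocTestRunner) rather than anything in Biopython, and TUTORIAL_SECTION, STALE_SECTION and run_tutorial_examples are hypothetical names for illustration. Examples that fetch records from NCBI, like the NCBIDictionary session above, would additionally need network access or canned responses.

```python
import doctest

# Hypothetical Tutorial section; real sections that query NCBI would need
# network access or canned responses to run reliably.
TUTORIAL_SECTION = '''
Reverse-complementing a short DNA string:

>>> complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
>>> ''.join(complement[base] for base in reversed('ATGCCG'))
'CGGCAT'
'''

# A deliberately stale example: the recorded output no longer matches.
STALE_SECTION = '''
>>> 'ATGCCG'.lower()
'ATGCCG'
'''


def run_tutorial_examples(text, name='tutorial', report=True):
    """Run every >>> example found in text; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # Fresh, empty globals so sections cannot leak names into each other.
    test = parser.get_doctest(text, {}, name, None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # out=None writes failure reports to stdout; a no-op writer hides them.
    results = runner.run(test, out=None if report else (lambda s: None))
    return results.failed, results.attempted


if __name__ == '__main__':
    print(run_tutorial_examples(TUTORIAL_SECTION))             # (0, 2)
    print(run_tutorial_examples(STALE_SECTION, report=False))  # (1, 1)
```

Pointed at each Tutorial section in turn, a helper along these lines would report a nonzero failure count as soon as an example stops producing its recorded output, which is the "easy to see what breaks" property Rob describes.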