From jchang at SMI.Stanford.EDU Thu Feb 1 01:45:50 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] major refactoring
Message-ID:

Hello everybody,

I just finished a major refactoring of many of the modules. Specifically, I moved all the instances of X/X to X/__init__. For example, code from Bio/Fasta/Fasta.py was moved to Bio/Fasta/__init__.py. Because of this, import code now looks a bit cleaner, i.e.

from Bio import Fasta

instead of

from Bio.Fasta import Fasta

Unfortunately, this breaks existing code that imports these modules. I've fixed all the occurrences of this in Biopython that I could find. However, please be on the lookout in case I missed something!

Jeff

From chapmanb at arches.uga.edu Sat Feb 3 16:15:00 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] MutableSeq and array problem
Message-ID: <14972.29908.739456.913825@taxus.athen1.ga.home.com>

Hello all;
Just a quick one. In the constructor for MutableSeq, it seems like array.array is called in a way that python doesn't like. I just picked this up recently after moving to the 2.1 alpha releases, so it might be that new versions are checking more carefully or something. At any rate, this is a proposed patch for Seq.py, which seems to fix this. As far as I can tell this is backward compatible with other versions and consistent with the documentation for the array constructor. Just a one-liner, but I wanted to post it here to make sure I was thinking about this right.

Brad

[Bio]$ diff -c Seq.py.orig Seq.py
*** Seq.py.orig  Sat Sep 9 15:19:18 2000
--- Seq.py  Sat Feb 3 15:42:39 2001
***************
*** 60,66 ****
  class MutableSeq:
      def __init__(self, data, alphabet = Alphabet.generic_alphabet):
          if type(data) == type(""):
!             self.data = array.array( ("c", data) )
          else:
              self.data = data # assumes the input is an array
          self.alphabet = alphabet
--- 60,66 ----
  class MutableSeq:
      def __init__(self, data, alphabet = Alphabet.generic_alphabet):
          if type(data) == type(""):
!             self.data = array.array("c", data)
          else:
              self.data = data # assumes the input is an array
          self.alphabet = alphabet

From dalke at acm.org Sat Feb 3 17:56:18 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] MutableSeq and array problem
Message-ID: <009901c08e34$86390020$8eac323f@josiah>

Brad:
>!             self.data = array.array( ("c", data) )
>--- 60,66 ----
>!             self.data = array.array("c", data)

Sorry, that's my fault. Instead of reading the documentation I played around with the parameters until they worked. Your change is correct.

Andrew

From chunnuan at netscape.com Wed Feb 14 14:42:45 2001
From: chunnuan at netscape.com (Chunnuan Chen)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] Is the www.biopython.org server down
Message-ID: <3A8ADFB5.FF880643@netscape.com>

Hi,

I have been trying to access the biopython website for the last two days but couldn't get there. Is the server hosting the site down? If so, when will it be back on again?

Thanks,

Chunnuan

From thomas at cbs.dtu.dk Thu Feb 15 06:50:42 2001
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] get CDS
In-Reply-To: Jeffrey Chang's message of "Wed, 24 Jan 2001 09:08:48 -0800 (PST)"
References:
Message-ID:

Hi,

Is there an easy way to retrieve the coordinates for all CDSs (not the translations) in a given DNA sequence?

Actually it is for the nextorf script - I'd like to include the bad-orf retrieving function where the user can choose if the ORF should be limited by start and stop codons (for good sequences) or if she wants to retrieve the longest possible coding regions within stop codons (as usually found in sequencing projects with raw sequence data).
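The two ORF modes described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the nextorf script and not a Biopython API. It assumes the standard genetic code (stops TAA, TAG, TGA), scans only the three forward frames, and reports 0-based, half-open coordinates rather than translations.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code (assumption)
START_CODON = "ATG"


def find_orfs(seq, min_codons=1, require_start=True):
    """Return sorted (start, end) ORF coordinates on the forward strand.

    Coordinates are 0-based and half-open, and include the stop codon
    when there is one.
    require_start=True:  ATG ... stop (first ATG after the previous stop).
    require_start=False: any stop-free stretch, bounded by stop codons or
                         by the sequence ends (the "raw sequence" mode).
    min_codons is the minimum number of codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        region_start = frame   # first codon after the previous in-frame stop
        orf_start = None       # first ATG seen since the previous stop
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in STOP_CODONS:
                begin = orf_start if require_start else region_start
                if begin is not None and (pos - begin) // 3 >= min_codons:
                    orfs.append((begin, pos + 3))
                region_start = pos + 3
                orf_start = None
            elif codon == START_CODON and orf_start is None:
                orf_start = pos
        if not require_start:
            # Trailing region with no closing stop: end at the last full codon.
            end = frame + max(len(seq) - frame, 0) // 3 * 3
            if (end - region_start) // 3 >= min_codons:
                orfs.append((region_start, end))
    return sorted(orfs)
```

For example, `find_orfs("ATGAAATAA")` gives `[(0, 9)]`, while `find_orfs("ATGAAATAA", require_start=False)` also reports the open stretches in the other two frames. The reverse strand could be handled by running the same function on the reverse complement and mapping the coordinates back.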
Am I reinventing the wheel? Does this function already exist?

cheers
-thomas

--
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas@biopython.org           The Technical University of Denmark
CBS: +45 45 252489             Building 208, DK-2800 Lyngby
Fax +45 45 931585              http://www.cbs.dtu.dk/thomas

De Chelonian Mobile ... The Turtle Moves ...

From dalke at acm.org Thu Feb 15 12:28:04 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] regression test code
Message-ID: <01b601c09774$a85c5dc0$f8ab323f@josiah>

According to the latest python-dev summary:

>The question of replacing Python's hoary old regrtest-driven test
>suite with something more modern came up again. Andrew Kuchling
>enquired whether the issue was to be decided by voting or BDFL fiat:
>
>
>
>Guido obliged:
>
>
>
>There was then some discussion of what changes people would like to
>see made in the standard-Python-unit-testing-framework-elect
>(PyUnit) before they would be happy with it.

How about we plan on merging the two different regression test code in biopython to standardize on PyUnit?

Andrew

From chapmanb at arches.uga.edu Thu Feb 15 13:21:36 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] regression test code
In-Reply-To: <01b601c09774$a85c5dc0$f8ab323f@josiah>
References: <01b601c09774$a85c5dc0$f8ab323f@josiah>
Message-ID: <14988.7728.195143.721847@taxus.athen1.ga.home.com>

Andrew Dalke writes:
[Python might move to PyUnit, since that's what Zope folks like]
>
> How about we plan on merging the two different regression
> test code in biopython to standardize on PyUnit?

I'm using PyUnit in biopython-corba right now, and am happy with it. I had to hack it a little to support CORBA better (it wasn't possible to pass in objects and I couldn't recreate every CORBA object in the setUp function), but that is probably just an idiosyncrasy with CORBA.
It really helps me organize tests, and makes me feel like I am testing things better; so I like it. Seeing it integrated with regression code would be especially nice. So +1 on this from me.

BTW, since we are talking about tests -- does anyone have any idea what happened to the Tests directory recently? I'm suddenly getting 7 tests failing, and most of the problems look like line ending problems. Just curious...

Brad

From chapmanb at arches.uga.edu Thu Feb 15 13:33:05 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] get CDS
In-Reply-To:
References:
Message-ID: <14988.8417.443723.544438@taxus.athen1.ga.home.com>

Hi Thomas;

> Is there an easy way to retrieve the coordinates for all CDSs (not the
> translations) in a given DNA sequence?

At least in my mind, finding putative CDSs is a hard job, so I don't know if I can think of any easy ways :-). In your NextOrf function, are you thinking of this in terms of finding an ORF within a cDNA, where you don't have to worry about introns, etc -- or are you focusing on bacterial stuff mostly? Just curious. This would at least help me get an idea of what exactly you are trying to do. Coming from a eukaryotic world, I guess I just see the problem of finding CDSs as "pretty damn hard."

> Actually it is for the nextorf script - I'd like to include the bad-orf
> retrieving function where the user can choose if the ORF should be limited
> by start and stop codons (for good sequences) or if she wants to retrieve
> the longest possible coding regions within stop codons (as usually found in
> sequencing projects with raw sequence data).

In terms of locations, have you thought about using the new Location model I put into CVS (it's in Bio/SeqFeature.py) to help deal with GenBank and BioCorba locations? I'd love to get more feedback on it, so people can decide if they think this does a good job of handling locations.

BTW, thanks for the new example code!
Brad

From dalke at acm.org Thu Feb 15 13:38:27 2001
From: dalke at acm.org (Andrew Dalke)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] regression test code
Message-ID: <001e01c0977e$816dbce0$81ac323f@josiah>

Brad Chapman:
>I'm using PyUnit in biopython-corba right now,
>and am happy with it. I had to hack it a little to support CORBA
>better (it wasn't possible to pass in objects and I couldn't
>recreate every CORBA object in the setUp function), but that is
>probably just an idiosyncrasy with CORBA.

You might want to bring it up on c.l.py or biopython-dev. Apparently they are reworking it a bit to make it easier to use for broader cases.

> I'm suddenly getting 7
>tests failing, and most of the problems look like line ending
>problems. Just curious...

Of course, the other reason to switch is then I don't need to fix the regrtest bug with different newlines :)

Andrew

From jchang at SMI.Stanford.EDU Thu Feb 15 14:12:38 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] regression test code
In-Reply-To: <01b601c09774$a85c5dc0$f8ab323f@josiah>
Message-ID:

On Thu, 15 Feb 2001, Andrew Dalke wrote:

> How about we plan on merging the two different regression
> test code in biopython to standardize on PyUnit?

I've not used PyUnit before. However, we'd want to move if it:
1. has some necessary feature the current framework doesn't support
2. facilitates better testing coverage
3. is integrable (whether through Python or in our distro)

In general, people don't write regression tests, so we want a framework that makes it as painless as possible.
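To illustrate what "painless" looks like in PyUnit style, here is a minimal sketch using Python's standard `unittest` module, the modern descendant of PyUnit. A test is a class whose `test_*` methods are discovered and run automatically, and `setUp` rebuilds a fixture before each one. The `reverse_complement` function is hypothetical and only stands in for code under test.

```python
import unittest


def reverse_complement(seq):
    """Hypothetical function under test: reverse complement of a DNA string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


class ReverseComplementTest(unittest.TestCase):
    def setUp(self):
        # Shared fixture, rebuilt before every test method.
        self.seq = "ATGCCGTA"

    def test_known_value(self):
        self.assertEqual(reverse_complement(self.seq), "TACGGCAT")

    def test_round_trip(self):
        self.assertEqual(reverse_complement(reverse_complement(self.seq)), self.seq)

    def test_rejects_non_dna(self):
        # Unknown bases are not in the lookup table, so a KeyError escapes.
        with self.assertRaises(KeyError):
            reverse_complement("ATGX")
```

Saved as, say, `test_revcomp.py`, it can be run with `python -m unittest test_revcomp`. No comparison against golden output files is needed, because each expectation is written as an assertion.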
Jeff

From jchang at SMI.Stanford.EDU Thu Feb 15 14:16:00 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] regression test code
In-Reply-To: <001e01c0977e$816dbce0$81ac323f@josiah>
Message-ID:

> > I'm suddenly getting 7
> >tests failing, and most of the problems look like line ending
> >problems. Just curious...
>
> Of course, the other reason to switch is then I don't need
> to fix the regrtest bug with different newlines :)

Yes, we definitely need this feature.

Another thing we need to address is how to test code that utilizes the internet. It only needs to be tested for people with a connection and has to be done in a bandwidth-friendly way.

Jeff

From Desai.Dinakar at mayo.edu Tue Feb 13 17:54:04 2001
From: Desai.Dinakar at mayo.edu (Dinakar Desai)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] is server down
References: <200102111457.f1BEvTp13921@pw600a.bioperl.org>
Message-ID: <3A89BB0C.C1F39F25@mayo.edu>

hello:

is the web server of biopython down? i am not able to connect to it since this afternoon.

hope to hear from someone.

Dinakar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: desai.dinakar.vcf
Type: text/x-vcard
Size: 197 bytes
Desc: Card for Dinakar Desai
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20010213/3d18f843/desai.dinakar.vcf

From chapmanb at arches.uga.edu Thu Feb 15 15:50:55 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:56 2005
Subject: [Biopython-dev] is server down
In-Reply-To: <3A89BB0C.C1F39F25@mayo.edu>
References: <200102111457.f1BEvTp13921@pw600a.bioperl.org> <3A89BB0C.C1F39F25@mayo.edu>
Message-ID: <14988.16687.605532.44288@taxus.athen1.ga.home.com>

Hello!

> is the web server of biopython down?

I think there are DNS problems (some DNS servers went down on the east coast of the US, I believe), but the biopython server itself is fine.
I think the problems are getting resolved now (since your mail got through to the list!).

> i am not able to connect to it since this afternoon.

Hopefully everything should be getting better now. Apologies for the problems.

Brad

From chapmanb at arches.uga.edu Thu Feb 15 16:06:33 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Newline fun (was Re: regression test code)
In-Reply-To:
References: <001e01c0977e$816dbce0$81ac323f@josiah>
Message-ID: <14988.17625.272177.28612@taxus.athen1.ga.home.com>

[me]
> > > I'm suddenly getting 7
> > >tests failing, and most of the problems look like line ending
> > >problems. Just curious...

[Andrew]
> > Of course, the other reason to switch is then I don't need
> > to fix the regrtest bug with different newlines :)

[Jeff]
> Yes, we definitely need this feature.

Aha! I figured out the problem -- it appears that python 2.1 displays newlines in the repr of a string differently:

[Tests]$ python
Python 2.1a2 (#1, Feb 3 2001, 15:37:56)
[GCC 2.95.2 19991024 (release/franzo)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> spam = "I have a newline\n"
>>> spam
'I have a newline\n'
>>>

[Tests]$ python2.0
Python 2.0 (#2, Jan 13 2001, 16:29:22)
[GCC 2.95.2 19991024 (release/franzo)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> spam = "I have a newline too\n"
>>> spam
'I have a newline too\012'
>>>

Blah, so every time we had a \012 in a string output before, python2.1 now generates a \n, which makes the regression comparisons fail. The fun of line breaks never stops.
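One way to make a golden-output comparison tolerant of the problems in this thread is to normalize both texts before comparing them. Those problems are the `\012` vs `\n` repr change and platform line endings. This is a minimal sketch; `normalize_output` and `outputs_match` are hypothetical helpers, not part of br_regrtest. The `\012` rewrite is a blunt textual substitution that assumes that escape only appears in repr'd output.

```python
def normalize_output(text):
    """Canonicalize test output before comparing it with golden output."""
    # Unify platform line endings: Windows "\r\n" and old Mac "\r" become "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Python 2.0 wrote a newline inside repr() as the octal escape \012,
    # while Python 2.1 writes \n; map the older spelling onto the newer one.
    text = text.replace("\\012", "\\n")
    return text


def outputs_match(expected, actual):
    """True if two outputs differ only in newline conventions."""
    return normalize_output(expected) == normalize_output(actual)
```

With this, a golden file written under Python 2.0 on Windows (`'a\012'` followed by CRLF) compares equal to Python 2.1 output on Unix (`'a\n'` followed by LF).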
Brad

From chapmanb at arches.uga.edu Mon Feb 19 19:31:02 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Biopython MutableSeq bug
Message-ID: <14993.47814.833375.698034@taxus.athen1.ga.home.com>

Hi Dan;
I was just checking the Biopython Bug listing for new bugs and noticed yours from a couple of weeks back about the MutableSeq problem. Sorry for the delay in getting back to you on it. Amazingly enough, I also noticed this bug on the same day, sent a message to Andrew, who wrote MutableSeq, and fixed it. We must be sharing some scary psychic connection to have both noticed it at the same time :-).

The fix is contained in the latest CVS version of biopython (instructions to get it are at cvs.biopython.org). The fix is just to remove the parentheses that make the arguments a tuple in the call to the initializer. The fix will also be in the next release, which should be out by the end of this month.

Sorry again for the late response. Thanks for the bug report and interest in Biopython.

Brad

From katel at worldpath.net Tue Feb 20 00:13:57 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Rebase
Message-ID: <003001c09afb$ee150e40$010a0a0a@cadence.com>

I just committed a redo of Rebase.py and test_rebase.py. The old output won't pass, but I'm not sure when to post the new output, because of the newline problem with Python 2.1.

Cayte

From jchang at SMI.Stanford.EDU Tue Feb 20 00:05:23 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Rebase
In-Reply-To: <003001c09afb$ee150e40$010a0a0a@cadence.com>
Message-ID:

Is the newline problem something that would be fixed if br_regrtest were more permissive in what it accepts as newlines?

Go ahead and commit the new output. The old output definitely doesn't work, and the new output will work when br_regrtest gets fixed.
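The MutableSeq fix is easy to check at the interpreter. `array.array` takes the typecode and the initializer as two separate arguments, so wrapping them in a tuple hands the constructor one invalid typecode argument. The sketch below uses modern Python 3, which has no `"c"` typecode. It therefore substitutes byte-sized `"B"` elements initialized from a bytes object; `make_mutable` and `broken_make_mutable` are hypothetical names.

```python
import array


def make_mutable(data):
    """Build a mutable byte array from a sequence string (the fixed call)."""
    return array.array("B", data.encode("ascii"))


def broken_make_mutable(data):
    # The original bug: one tuple argument instead of (typecode, initializer).
    return array.array(("B", data.encode("ascii")))


seq = make_mutable("GATC")
seq[0] = ord("C")                 # arrays can be changed in place
print(seq.tobytes().decode())     # CATC

try:
    broken_make_mutable("GATC")
except TypeError as err:
    print("TypeError:", err)
```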
Jeff

On Mon, 19 Feb 2001, Cayte wrote:

> I just committed a redo of Rebase.py and test_rebase.py. The old output
> won't pass, but I'm not sure when to post the new output, because of the
> newline problem with Python 2.1.
>
> Cayte
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev@biopython.org
> http://biopython.org/mailman/listinfo/biopython-dev
>

From thomas at cbs.dtu.dk Wed Feb 21 06:14:50 2001
From: thomas at cbs.dtu.dk (thomas@cbs.dtu.dk)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] python biocorba <-> ensembl
Message-ID: <14995.41770.128673.930495@delphinus.cbs.dtu.dk>

Hi,

I need to work with the ensembl data (www.ensembl.org) and the only ways of avoiding perl are to implement ensembl calls in python or to use (as Ewan Birney suggested) a Perl<->CORBA<->Python bridge.

... the question is how should I start with Corba ???
Is there anything I could use in the biocorba stuff or should I go directly for corba+python ???
Any pointers, quick tutorials etc. ???

thx
-thomas

--
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas@biopython.org           The Technical University of Denmark
CBS: +45 45 252489             Building 208, DK-2800 Lyngby
Fax +45 45 931585              http://www.cbs.dtu.dk/thomas

De Chelonian Mobile ... The Turtle Moves ...

From chapmanb at arches.uga.edu Wed Feb 21 14:53:33 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] python biocorba <-> ensembl
In-Reply-To: <14995.41770.128673.930495@delphinus.cbs.dtu.dk>
References: <14995.41770.128673.930495@delphinus.cbs.dtu.dk>
Message-ID: <14996.7357.209805.178924@taxus.athen1.ga.home.com>

Hi Thomas;

> I need to work with the ensembl data (www.ensembl.org) and the only ways
> of avoiding perl are to implement ensembl calls in python or to use
> (as Ewan Birney suggested) a Perl<->CORBA<->Python bridge.
I was following this on the ensembl-dev list, and I think it would be very cool to have an ensembl-based server that you could get to through biocorba (or expanded biocorba). Since the bioperl-corba server is nearly done (thanks to Jason!), it seems like we would only need to think up an IDL for some of the interfaces, and build this top level server.

I'm ccing this to Jason, who mentioned that he might want to help write a perl server for ensembl (and I'm holding him to it :-). If he or Ewan could provide this, then I would be happy to do the biopython-corba nuts and bolts stuff from the python side. Then Thomas would get to be the guinea pig to test it...

> ... the question is how should I start with Corba ???
> Is there anything I could use in the biocorba stuff or should I go directly
> for corba+python ???

Biocorba is definitely your best bet here. All of the lower level objects (like Sequences, Features, etc) are already accessible through biocorba. You would just need the high level layer to connect with ensembl.

> Any pointers, quick tutorials etc. ???

Of course! Alan Robinson just wrote a very nice tutorial on getting started writing CORBA clients at:

http://industry.ebi.ac.uk/~alan/CORBA/Tutorials/Introduction/Tutorial_Introduction.html

[Sorry about the wrapping on that URL]

There is a python section to the Tutorial. This is a big picture kind of introduction to CORBA. There are also quite a few docs on using biocorba from python. They are linked from the BiopythonCorba wiki page at:

http://biopython.org/wiki/html/BioPython/BiopythonCorba.html

I'm also willing to help with questions, etc. I'm very happy to see biocorba utilized, and definitely think it is the right way to go here. Convincing Jason or Ewan to provide an ensembl corba server is much easier than writing python wrappers around ensembl internals :-).
Brad

From jchang at SMI.Stanford.EDU Fri Feb 23 19:58:48 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Upcoming release
Message-ID:

Hello everybody!

The current plans are to have a release at the end of the month (next Wednesday!). Please prepare to have all your code checked in by early next week.

- Brad, is there anything special that needs to be done with the docs?

- I'm not sure if Andrew is going to get around to making br_regrtest accept different newline conventions. Does anyone want to have a go at fixing this, or should we punt until the next release? Or, should we just scrap it altogether for PyUnit (definitely next release)?

Let me know if there are other outstanding issues.

Thanks,
Jeff

From thomas at cbs.dtu.dk Sat Feb 24 11:11:01 2001
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Re: [BioPython] first steps into python
In-Reply-To: Ewan Birney's message of "Sat, 24 Feb 2001 13:44:46 +0000 (GMT)"
References:
Message-ID:

Ewan Birney writes:

> Ok. To help Ensembl-->python (probably via CORBA) integration. I have
> downloaded biopython and biopython-corba.
>
> I am, therefore, belatedly learning python. Don't expect any road-to-Damascus
> type conversions (yet) however...

Nice - I am currently working on understanding your perl scripts in order to make a lightweight python interface to ensembl. Maybe we could join efforts?

>
> Some questions
> (a) what is the difference between def __name__ and def name functions?

* _single_leading_underscore: weak "internal use" indicator
* single_trailing_underscore_: used by convention to avoid conflicts with Python keywords
* __double_leading_underscore: class-private names in Python 1.4.
* __double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces, e.g. __init__, __import__ or __file__.
  User code should generally refrain from using this convention.

* Attributes starting with two underscores, e.g. "__n", are renamed when byte-compiled to "_CLASS__VARNAME". Since the class's name is used as part of the variable name, the variable "__n" in a subclass would not be the same as in the superclass. This is probably the closest to 'private' you will get.

> (b) how is inheritance done in python

class ensembl:
    pass  # ... code something

class EnsemblSQL(ensembl):
    pass  # ... code something

> (c) is there any concept/files for interfaces either explicit (java) or
> "just documentation" like bioperl's "I" files
>
I am not sure I understood your question...

> (d) could someone sketch out an easy biopython script like:
> read embl file.
> test whether there is a cacaca repeat in there
> if yes, dump a genbank file.

I don't know if there is already a working embl or genbank parser in the biopython core, so I'll give you an example for a Fasta file.

==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
import sys
from Bio.Fasta import Fasta

# the filename as first command-line argument
file = sys.argv[1]
# open a Fasta parser
iter = Fasta.Iterator(handle = open(file), parser = Fasta.RecordParser())

repeat = 'CACACA'

# loop over all sequence entries in the fasta file
while 1:
    rec = iter.next()
    if not rec: break

    sequence = rec.sequence

    # test with a simple string count
    n = sequence.count(repeat)
    if n:
        print repeat, 'occurred', n, 'times in', rec.title
    else:
        print 'nope'
==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP

If you want to look at actual code, I can put last week's quick and dirty python script for the ensembl mysql db on www.cbs.dtu.dk/thomas/ensembl.py

Don't hesitate to ask/mail me if you run into problems with python or have other general questions.
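The naming-convention and inheritance answers above can be seen together in one short example, written for Python 3. The class names follow the `ensembl`/`EnsemblSQL` sketch. The attributes and methods (`_cache`, `__n`, `count`, `sql_count`) are made up for illustration.

```python
class ensembl:
    def __init__(self):
        self._cache = {}   # one leading underscore: "internal use" by convention only
        self.__n = 1       # two leading underscores: stored as self._ensembl__n

    def count(self):
        return self.__n    # compiled as self._ensembl__n


class EnsemblSQL(ensembl):         # base class(es) go in parentheses
    def __init__(self):
        super().__init__()         # run the base-class constructor first
        self.__n = 2               # stored as self._EnsemblSQL__n: a separate attribute

    def sql_count(self):
        return self.__n            # compiled as self._EnsemblSQL__n


obj = EnsemblSQL()
print(obj.count(), obj.sql_count())   # 1 2
print(sorted(vars(obj)))              # ['_EnsemblSQL__n', '_cache', '_ensembl__n']
```

Because the class name is folded into the mangled name, the subclass's `__n` cannot overwrite the base class's `__n`.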
Gotta run to a party :-)

cheers,
-thomas

--
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas@biopython.org           The Technical University of Denmark
CBS: +45 45 252489             Building 208, DK-2800 Lyngby
Fax +45 45 931585              http://www.cbs.dtu.dk/thomas

De Chelonian Mobile ... The Turtle Moves ...

From jason at chg.mc.duke.edu Sat Feb 24 14:05:03 2001
From: jason at chg.mc.duke.edu (Jason Stajich)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Re: [BioPython] first steps into python
In-Reply-To:
Message-ID:

[cross-posted]

Ewan & Thomas,

I am happy to work on the perl corba end and I am sure Brad Chapman would be happy to do the python corba end if we can get an IDL that can describe the ensembl data. The question is whether we can squeeze ensembl objects into the current bioperl and biopython objects. I think the Bioperl gene objects are robust enough; I'm not sure about biopython.

I think it would be a real advantage to build the CORBA bridge because it would give us the ability to write programs to access the data without having to install the mysql server locally. Otherwise Thomas is stuck learning the Ensembl table structure and writing SQL, which means when/if Ensembl decides to change the table structure his code stops working. IMHO that is bad. But Thomas may not want to wait for us to get this going....

I'm willing to do the coding for this, but I'd need some help with the IDL design as I am not sure how deep we want to go into the Ensembl data model.

-Jason

On 24 Feb 2001, Thomas Sicheritz-Ponten wrote:

> Ewan Birney writes:
>
> > Ok. To help Ensembl-->python (probably via CORBA) integration. I have
> > downloaded biopython and biopython-corba.
> >
> > I am, therefore, belatedly learning python. Don't expect any road-to-Damascus
> > type conversions (yet) however...
>
> Nice - I am currently working on understanding your perl scripts in order
> to make a lightweight python interface to ensembl. Maybe we could join
> efforts?
>
> >
> > Some questions
> > (a) what is the difference between def __name__ and def name functions?
>
> * _single_leading_underscore: weak "internal use" indicator
> * single_trailing_underscore_: used by convention to avoid conflicts with Python keywords
> * __double_leading_underscore: class-private names in Python 1.4.
> * __double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces, e.g. __init__, __import__ or __file__.
>   User code should generally refrain from using this convention.
>
> * Attributes starting with two underscores, e.g. "__n", are renamed
> when byte-compiled to "_CLASS__VARNAME". Since the class's name is
> used as part of the variable name, the variable "__n" in a subclass
> would not be the same as in the superclass. This is probably the
> closest to 'private' you will get.
>
> > (b) how is inheritance done in python
>
> class ensembl:
>     pass  # ... code something
>
> class EnsemblSQL(ensembl):
>     pass  # ... code something
>
> > (c) is there any concept/files for interfaces either explicit (java) or
> > "just documentation" like bioperl's "I" files
> >
> I am not sure I understood your question...
>
> > (d) could someone sketch out an easy biopython script like:
> > read embl file.
> > test whether there is a cacaca repeat in there
> > if yes, dump a genbank file.
>
> I don't know if there is already a working embl or genbank parser in the biopython
> core, so I'll give you an example for a Fasta file.
>
> ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
> import sys
> from Bio.Fasta import Fasta
>
> # the filename as first command-line argument
> file = sys.argv[1]
> # open a Fasta parser
> iter = Fasta.Iterator(handle = open(file), parser = Fasta.RecordParser())
>
> repeat = 'CACACA'
>
> # loop over all sequence entries in the fasta file
> while 1:
>     rec = iter.next()
>     if not rec: break
>
>     sequence = rec.sequence
>
>     # test with a simple string count
>     n = sequence.count(repeat)
>     if n:
>         print repeat, 'occurred', n, 'times in', rec.title
>     else:
>         print 'nope'
> ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
>
> If you want to look at actual code, I can put last week's quick and
> dirty python script for the ensembl mysql db on www.cbs.dtu.dk/thomas/ensembl.py
>
> Don't hesitate to ask/mail me if you run into problems with python or
> have other general questions.
>
> Gotta run to a party :-)
>
> cheers,
> -thomas
>
> --
> Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
> thomas@biopython.org           The Technical University of Denmark
> CBS: +45 45 252489             Building 208, DK-2800 Lyngby
> Fax +45 45 931585              http://www.cbs.dtu.dk/thomas
>
> De Chelonian Mobile ... The Turtle Moves ...
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/

From chapmanb at arches.uga.edu Sat Feb 24 17:02:34 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Upcoming release
In-Reply-To:
References:
Message-ID: <15000.12154.103965.403453@taxus.athen1.ga.home.com>

Hey Jeff!

> The current plans are to have a release at the end of the month (next
> Wednesday!) Please prepare to have all your code checked in by early next
> week.

Sounds good!

> - Brad, is there anything special that needs to be done with the docs?

I don't think so... unless people want to donate more documentation :-).
I think it would be nice to include a pdf version in the distribution, unless people are opposed to this.

> - I'm not sure if Andrew is going to get around to making br_regrtest
> accept different newline conventions. Does anyone want to have a go at
> fixing this, or should we punt until the next release? Or, should we just
> scrap it altogether for PyUnit (definitely next release)?

I tried to fix br_regrtest, but didn't have any luck :-<. So I implemented a top level test module using PyUnit that runs all of the tests. This doesn't do any comparisons with the golden output right now, but at least makes sure that the tests run. Right now I'm saving the output from the test into a string using StringIO, so maybe someone has an idea for expanding this to also do comparisons in a way that is easier to deal with different newlines. I'm stumped on that...

Anyways, let me know what you guys think about the script. BTW, you do need to download PyUnit from:

http://pyunit.sourceforge.net

and put unittest.py in the test directory. We can include this in the distribution without a problem, so it won't be an extra download once we do this.

> Let me know if there are other outstanding issues.

The only test that is failing for me with this script is Unigene -- it is looking for unigene_format.py, which isn't in CVS for me -- maybe this is something that still needs to be checked in. Cayte?

Brad
-------------- next part --------------
#!/usr/bin/env python
"""Run the biopython tests as a set of PyUnit tests.

This is the top level function for running all tests.
"""
# standard modules
import sys
import cStringIO
import os

# PyUnit
import unittest

def run_tests(argv):
    all_tests = findtests()
    test_suite = unittest.TestSuite()

    for test in all_tests:

        class BiopythonTest(unittest.TestCase):
            def __init__(self, test_name):
                unittest.TestCase.__init__(self)
                self.test_name = test_name

            def shortDescription(self):
                return self.test_name

            def runTest(self):
                output = cStringIO.StringIO()
                # remember standard out so we can reset it after we are done
                save_stdout = sys.stdout
                try:
                    # write the output from the test into a string
                    sys.stdout = output
                    __import__(self.test_name)
                finally:
                    # return standard out to its normal setting
                    sys.stdout = save_stdout

        # run the test as a PyUnit test
        test_suite.addTest(BiopythonTest(test))

    runner = unittest.TextTestRunner()
    runner.run(test_suite)

def findtests():
    """Return a list of all applicable test modules."""
    testdir = findtestdir()
    names = os.listdir(testdir)
    tests = []
    for name in names:
        if name[:5] == "test_" and name[-3:] == ".py":
            tests.append(name[:-3])
    tests.sort()
    return tests

def findtestdir():
    if __name__ == '__main__':
        file = sys.argv[0]
    else:
        file = __file__
    testdir = os.path.dirname(file) or os.curdir
    return testdir

if __name__ == "__main__":
    sys.exit(run_tests(sys.argv))

From jchang at SMI.Stanford.EDU Sat Feb 24 17:47:33 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Bio.Tools.MultiProc code
Message-ID:

Hello everyone,

I just checked in a new package Bio.Tools.MultiProc that contains some code for rudimentary multiprocessing. There are 3 modules:

- copen.py
  Code that wraps a child process in a file-like object. This basically gives you a file handle to a fork'd system command or python function.

- Task.py
  Partially implements the threading.Thread interface around copen, to give a common interface for multi-threaded and fork'd subprocesses.

- Scheduler.py
  Helps run multiple Thread (or Task) objects.
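The core idea behind wrapping a forked Python function (run it in a child process and read the result back through a file handle) can be sketched with `os.fork` and `os.pipe`. This is a hypothetical illustration of the technique, not the Bio.Tools.MultiProc implementation. It pickles the return value rather than exposing raw text, and it requires a Unix-like OS where `os.fork` exists. `run_in_child` is a made-up name.

```python
import os
import pickle


def run_in_child(fn, *args):
    """Run fn(*args) in a forked child process and return its (pickled) result."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, send the pickled result down the pipe, exit at once.
        os.close(read_fd)
        status = 1
        try:
            with os.fdopen(write_fd, "wb") as pipe_out:
                pickle.dump(fn(*args), pipe_out)
            status = 0
        finally:
            # os._exit skips cleanup handlers that belong to the parent.
            os._exit(status)
    # Parent: read everything before waiting, so a full pipe cannot deadlock.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe_in:
        data = pipe_in.read()
    _, status = os.waitpid(pid, 0)
    if status != 0:
        raise RuntimeError("child process failed with wait status %d" % status)
    return pickle.loads(data)


def add1(x):
    return x + 1


print(run_in_child(add1, 5))   # 6
```

If the function raises in the child, the child exits with status 1 and the parent raises `RuntimeError` instead of returning a partial result.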
Here's a trivial example:

>>> from Bio.Tools.MultiProc import copen
>>> def add1(x):
...     return x+1
...
>>> h = copen.copen_fn(add1, 5)
>>> print h.read()
6
>>>

add1 was actually computed as a separate process.

I've been using this code to help speed up a trivially parallelizable computation. Hopefully someone else may find this useful as well!

Jeff

From jchang at SMI.Stanford.EDU Sat Feb 24 17:50:20 2001
From: jchang at SMI.Stanford.EDU (Jeffrey Chang)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Upcoming release
In-Reply-To: <15000.12154.103965.403453@taxus.athen1.ga.home.com>
Message-ID:

On Sat, 24 Feb 2001, Brad Chapman wrote:

> I think it would be nice to include a pdf version in the distribution,
> unless people are opposed to this.

I think it'd be really good.

> > - I'm not sure if Andrew is going to get around to making br_regrtest
> > accept different newline conventions. Does anyone want to have a go at
> > fixing this, or should we punt until the next release? Or, should we just
> > scrap it altogether for PyUnit (definitely next release)?
>
> I tried to fix br_regrtest, but didn't have any luck :-<.

Yeah, I had looked into this once, and it wasn't clear that the fix would be really simple. I guess we'll have to let Andrew deal with it...

> So I implemented a top level test module using PyUnit that runs all of
> the tests. This doesn't do any comparisons with the golden output
> right now, but at least makes sure that the tests run. Right now I'm
> saving the output from the test into a string using StringIO, so maybe
> someone has an idea for expanding this to also do comparisons in a way
> that is easier to deal with different newlines. I'm stumped on that...

I'm glad someone's looking seriously into this! It sounds like something for the next release, though...
Jeff

From chapmanb at arches.uga.edu Sun Feb 25 07:13:43 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Upcoming release
In-Reply-To: 
References: <15000.12154.103965.403453@taxus.athen1.ga.home.com>
Message-ID: <15000.63223.754238.992950@taxus.athen1.ga.home.com>

[I was working on a PyUnit framework for integrating the tests]
Jeff:
> I'm glad someone's looking seriously into this! It sounds like
> something for the next release, though...

Okay, well I completely ignored your message and worked more on this
:-). In the shower this morning I thought of some ways to fix the
problems we've been having, using the PyUnit framework I posted
yesterday. It seems like I've got the regression comparisons working
now, so I implemented a "replacement" for br_regrtest.py that uses
PyUnit. The only downside of the comparisons now is that it reads the
entire output into a string and then does the comparison, but I can't
ever imagine that an output would be so incredibly huge that this would
be a problem (otherwise the test should probably be split up!).

I used the fancy pyunit GUI stuff, so now the tests run by default with
a little Tk GUI (should be nicer on Windows, and especially nicer for
Macs).

This all works for me okay on both Unix and Windows. What do people
think? Does anyone have time to look at this before next week's
release, or do you all want to put it off until after?

BTW, I noticed some problems with the tests while doing this, which I
can now attribute to actual problems:

o test_NCBIWWW is failing right now, due to problems in comparing the
output (and these are not due to newline problems). I looked in the
logs to see what had changed, and it looks like Thomas checked in an
output change, but there wasn't a corresponding change to the tests.

o test_SubsMat -- this seems to be failing on Windows due to the fact
that Windows prints -0.00 and the expected output is 0.00. I guess this
is a Windows/UNIX difference.
It's probably not worth worrying about since -0.00 and 0.00 are the
same thing (as far as I know :-).

Brad
-------------- next part --------------
#!/usr/bin/env python
"""Run the biopython tests as a set of PyUnit-based regression tests.

This will find all modules whose name is "test_*" in the test
directory, and run them. Various command line options provide
additional facilities.

Command line options:

-g;--generate    -- write the output file for a test instead of comparing it.
                    A test to write the output for must be specified.
--no-gui         -- do not use a GUI to run the tests
--help           -- show usage info
"""
# standard modules
import sys
import cStringIO
import os
import string
import sys
import getopt

# PyUnit
import unittest
import unittestgui

def main(argv):
    # start off using the GUI
    use_gui = 1
    if use_gui:
        try:
            import Tkinter as tk
        except ImportError:
            use_gui = 0

    # get the command line options
    try:
        opts, args = getopt.getopt(argv[1:], 'g',
                                   ["generate", "no-gui", "help"])
    except getopt.error, msg:
        print msg
        print __doc__
        return 2

    # deal with the options
    for o, a in opts:
        if o == "--help":
            print __doc__
            return 0
        if o == "--no-gui":
            use_gui = 0
        if o == "-g" or o == "--generate":
            if len(args) > 1:
                print "Only one argument (the test name) needed for generate"
                print __doc__
                return 2
            elif len(args) == 0:
                print "No test name specified to generate output for."
                print __doc__
                return 2

            # strip off .py if it was included
            if args[0][-3:] == ".py":
                args[0] = args[0][:-3]

            generate_output(args[0])
            return 0

    # run the tests
    if use_gui:
        root = tk.Tk()
        root.title("PyUnit")
        runner = unittestgui.TkTestRunner(root, "pyunit_testing.biopython_suite")
        root.protocol('WM_DELETE_WINDOW', root.quit)
        root.mainloop()
    else:
        test_suite = biopython_suite()
        runner = unittest.TextTestRunner()
        runner.run(test_suite)

def biopython_suite():
    all_tests = findtests()

    test_suite = unittest.TestSuite()

    # all_tests = ["test_File"]
    for test in all_tests:
        class BiopythonTest(unittest.TestCase):
            def __init__(self, test_name):
                unittest.TestCase.__init__(self)
                self.test_name = test_name

            def __str__(self):
                return self.shortDescription()

            def shortDescription(self):
                return self.test_name

            def runTest(self):
                generated_output = ''
                output = cStringIO.StringIO()
                # remember standard out so we can reset it after we are done
                save_stdout = sys.stdout
                try:
                    # write the output from the test into a string
                    sys.stdout = output
                    __import__(self.test_name)
                    generated_output = output.getvalue()
                finally:
                    # return standard out to its normal setting
                    sys.stdout = save_stdout

                # get the expected output
                testdir = findtestdir()
                outputdir = os.path.join(testdir, "output")
                outputfile = os.path.join(outputdir, self.test_name)
                try:
                    expected_handle = open(outputfile, 'r')
                    output_handle = cStringIO.StringIO(generated_output)
                    # check the expected output to be consistent with what
                    # we generated
                    compare_output(self.test_name, output_handle,
                                   expected_handle)
                except IOError:
                    raise IOError, "Warning: Can't open %s for test %s" % \
                          (outputfile, self.test_name)

        # add the test to the test suite
        test_suite.addTest(BiopythonTest(test))

    return test_suite

def findtests():
    """Return a list of all applicable test modules."""
    testdir = findtestdir()
    names = os.listdir(testdir)
    tests = []
    for name in names:
        if name[:5] == "test_" and name[-3:] == ".py":
            tests.append(name[:-3])
    tests.sort()
    return tests

def findtestdir():
    if __name__ == '__main__':
        file = sys.argv[0]
    else:
        file = __file__
    testdir = os.path.dirname(file) or os.curdir
    return testdir

def generate_output(test_name):
    """Generate the golden output for the specified test.
    """
    testdir = findtestdir()
    outputdir = os.path.join(testdir, "output")
    outputfile = os.path.join(outputdir, test_name)

    output_handle = open(outputfile, 'w')

    # write the test name as the first line of the output
    output_handle.write(test_name + "\n")

    # remember standard out so we can reset it after we are done
    save_stdout = sys.stdout
    try:
        # write the output from the test into a string
        sys.stdout = output_handle
        __import__(test_name)
    finally:
        output_handle.close()
        # return standard out to its normal setting
        sys.stdout = save_stdout

def compare_output(test_name, output_handle, expected_handle):
    """Compare output from a test to the expected output.

    Arguments:

    o test_name - The name of the test we are running.

    o output_handle - A handle to all of the output generated by running
    a test.

    o expected_handle - A handle to the expected output for a test.
    """
    # first check that we are dealing with the right output
    # the first line of the output file is the test name
    expected_test = string.strip(expected_handle.readline())

    assert expected_test == test_name, "\nOutput: %s\nExpected: %s" % \
           (test_name, expected_test)

    # now loop through the output and compare it to the expected file
    while 1:
        expected_line = expected_handle.readline()
        output_line = output_handle.readline()

        # stop looping if either of the info handles reach the end
        if not(expected_line) or not(output_line):
            # make sure both have no information left
            assert expected_line == '', "Unread: %s" % expected_line
            assert output_line == '', "Extra output: %s" % output_line
            break

        # normalize the newlines in the two lines
        expected_line = string.strip(expected_line)
        output_line = string.strip(output_line)
        expected_line = convert_newlines(expected_line)
        output_line = convert_newlines(output_line)

        # make sure the two lines are the same
        assert expected_line == output_line, "\nOutput : %s\nExpected: %s" % \
               (output_line, expected_line)

def convert_newlines(line):
    """Convert all newlines in the given line into '\\n'.

    This helps deal with the problem between python2.1 and older pythons
    in how line breaks are generated inside strings. Older python versions
    used '\012', and 2.1 uses '\n'.
    """
    # two slashes are used since we are dealing with newlines inside strings
    # where they are "escaped" with the extra \
    newlines = ["\\012"]

    for newline_to_replace in newlines:
        line = line.replace(newline_to_replace, "\\n")

    return line

if __name__ == "__main__":
    sys.exit(main(sys.argv))

From chapmanb at arches.uga.edu Sun Feb 25 20:25:51 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Biopython clustalw bug
Message-ID: <15001.45215.138131.400298@taxus.athen1.ga.home.com>

Hey Dan;
  Thanks for the bug report on clustalw. I actually ran into this bug
a month or so ago and fixed it in CVS.
Wow, I am so efficient -- good thing I can see into the future and
predict bug reports :-).

Seriously, the fix should be out in the new release (due out this
Wednesday) or you can get the current version from CVS. If you could
make sure this works with your example, that would be super.

I'm really glad you submitted this, because it gave me a reason to go
back in and fix up the clustalw format to be a lot more permissive on
what it allows in titles, and also to support cross-platform line
breaks.

Thanks again for the report.
Brad

From Desai.Dinakar at mayo.edu Sun Feb 25 20:41:01 2001
From: Desai.Dinakar at mayo.edu (Dinakar)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] problem with happydoc and biopython
References: <15001.45215.138131.400298@taxus.athen1.ga.home.com>
Message-ID: <3A99B42C.B5FD0454@mayo.edu>

Hello Everyone:

I downloaded HappyDoc and tried to generate biopython doc. It seems to
go into some kind of infinite loop when it is generating doc for
SubsMat modules. It goes through other modules pretty fast. I don't
know where the problem is. I tested happydoc with my modules and it
works. If I move SubsMat to some other location then happydoc
generates documents without problem. I cannot figure out where exactly
the problem is. I hope someone familiar with the code can take a look
at it.

thank you.
dinakar
-------------- next part --------------
A non-text attachment was scrubbed...
Name: desai.dinakar.vcf
Type: text/x-vcard
Size: 242 bytes
Desc: Card for Dinakar
Url : http://portal.open-bio.org/pipermail/biopython-dev/attachments/20010225/548b2a8f/desai.dinakar.vcf

From chapmanb at arches.uga.edu Sun Feb 25 20:57:05 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] problem with happydoc and biopython
In-Reply-To: <3A99B42C.B5FD0454@mayo.edu>
References: <15001.45215.138131.400298@taxus.athen1.ga.home.com>
    <3A99B42C.B5FD0454@mayo.edu>
Message-ID: <15001.47089.136149.697197@taxus.athen1.ga.home.com>

Hi Dinakar;

> I downloaded HappyDoc and tried to generate biopython doc. It seems to go
> into some kind of infinite loop when it is generating doc for SubsMat
> modules.

Happydoc is just slow on the SubsMat directory because MatrixInfo.py
is a very long file (it is just a ton of different substitution
matrices as python dictionaries), so it takes a while for Happydoc to
get through it. If you just wait it out a bit, it should finish
without a problem. My solution is to go get myself a beer while
waiting for happydoc to finish :-).

Brad

From katel at worldpath.net Mon Feb 26 02:31:09 2001
From: katel at worldpath.net (Cayte)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Upcoming release
References: <15000.12154.103965.403453@taxus.athen1.ga.home.com>
Message-ID: <002701c09fc6$17a4ef40$010a0a0a@cadence.com>

> The only test that is failing for me with this script is Unigene -- it
> is looking for unigene_format.py, which isn't in CVS for me -- maybe
> this is something that still needs to be checked in. Cayte?
>
> Brad
>

I couldn't find a reference to unigene_format in my latest version.
unigene_format.py is a fossil from my attempt to mesh it with Martel.
Andrew and I agreed that sgmllib would be a better choice for data
that makes heavy use of html.

I committed Unigene.py again in case the old version was still in the
database. Let me know if you still have a problem.
Cayte

From chapmanb at arches.uga.edu Sun Feb 25 23:46:13 2001
From: chapmanb at arches.uga.edu (Brad Chapman)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Upcoming release
In-Reply-To: <002701c09fc6$17a4ef40$010a0a0a@cadence.com>
References: <15000.12154.103965.403453@taxus.athen1.ga.home.com>
    <002701c09fc6$17a4ef40$010a0a0a@cadence.com>
Message-ID: <15001.57237.430446.843956@taxus.athen1.ga.home.com>

[Problems with unigene tests]

> I couldn't find a reference to unigene_format in my latest version.
> unigene_format.py is a fossil from my attempt to mesh it with Martel.
> Andrew and I agreed that sgmllib would be a better choice for data that
> makes heavy use of html.
>
> I committed Unigene.py again in case the old version was still in the
> database. Let me know if you still have a problem.

Ah ha! I think I know the problem. Jeff moved UniGene.py code to
UniGene/__init__.py, since we decided to do that across all of the
modules to make imports easier. UniGene/UniGene.py is officially
deleted in CVS, so checkouts won't get it -- if you could copy your
current code base to UniGene/__init__.py and commit that, then
hopefully things will work again.

Brad

From thomas at cbs.dtu.dk Mon Feb 26 06:30:32 2001
From: thomas at cbs.dtu.dk (thomas@cbs.dtu.dk)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Re: Ensembl<->Biocorba bridge - sequence gets working.
Message-ID: <15002.15960.782090.828683@genome.cbs.dtu.dk>

Ewan Birney writes:
> I don't have the combination of a large, accessible Ensembl database and
> CORBA::ORBit provided. There is also the problem of the Sanger Centre
> firewall.
> It may be that Thomas is the first person to test out the code ;)

Great, this morning I dived into CORBA and theoretically everything
seems to work. I have a locally running Ensembl database at CBS and at
my linuxbox at home. But ...
I am not a perl'er, and trying to install ALL the necessary perl
modules in order to get the server running via
servers/ensembl_server.pl is a $#%# pain in the $#%#. This morning I
promised myself to get it up and running before I go for breakfast and
coffee ... puuuuhhhhh

How easy/realistic would it be to implement the server in python ...
- I feel a one hour CORBA crash-course (without coffee) is maybe not
enough to play around with the server code ...

kaffedarra'ly y'rs
-thomas

--
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas@biopython.org           The Technical University of Denmark
CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas

       De Chelonian Mobile ... The Turtle Moves ...

From thomas at cbs.dtu.dk Mon Feb 26 10:49:11 2001
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Sat Mar 5 14:42:57 2005
Subject: [Biopython-dev] Re: Ensembl<->Biocorba bridge - sequence gets working.
In-Reply-To: Ewan Birney's message of "Mon, 26 Feb 2001 13:43:16 +0000 (GMT)"
References: 
Message-ID: 

Ewan Birney writes:
> > Ewan Birney writes:
> >
> > > > I think I have everything, but I still get following error:
> > > > Can't locate object method "new" via package "Bio::EnsEMBL::CORBA::Base" at
> > > > /home/thomas/cbs/python/biopython/biopython-corba/ensembl-corba-server/modules/Bio/EnsEMBL/CORBA/Base.pm
> > > > line 54.
> > >
> > > Aha. This is because I wrote the ensembl-corba-server too quickly without
> > > a Makefile.PL
> > >
> > > You need to point the PERL5LIB environment variable to
> > > ensembl-corba-server/modules and/or use the -I argument to perl to put the
> > > corba-server objects in your path.
> >
> > I had tried that before, but it won't work.
> >
> > The problem occurs in:
> > @ISA = qw(Bio::Root::RootI);
> >
> > sub new {
> >     my ($class, @args) = @_;
> >     my $self = $class->SUPER::new(@args);   <===
> >
> > Whats SUPER is that a general perl OO qualifier or something bioperl
> > special ?
>
> It is not a bioperl special but (hmmmm... Ewan thinks). I think I know why
> this is happening. Do the following: Either
>
> (a) check out the main trunk of bioperl (bioperl-live) from the cvs
> server. This should work.
>
> (b) replace the above SUPER::new lines with
>
>     $self = {};
>     bless $self,$class;
>
> This was an expected 0.6.2<->Main trunk bioperl difference.

Nope - bioperl-live didn't solve the problem, and I don't feel like
solution b) is the right way to go ...
I'll wait until the administrators have installed perl's ORBit at CBS
so I can test the script on our IRIX'es. In the meantime, I'm using the
mysql logs to see which SQL queries perl executes for a virtualcontig.
I am going to implement these in python.

waiting-for-root-to-install-things'ly y'rs
-thomas

--
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas@biopython.org           The Technical University of Denmark
CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas

       De Chelonian Mobile ... The Turtle Moves ...