From anunberg at oriongenomics.com Wed Oct 1 15:02:08 2003
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Wed Oct 1 14:59:56 2003
Subject: [BioPython] help with mx installation on redhat
In-Reply-To: <9AE1D738-F396-11D7-B71D-000A95A001EE@oriongenomics.com>
Message-ID:

Never mind, I fixed it by installing again. A reminder that RPMs for Python seem to end up in /usr/lib/python..., so make sure your PYTHONPATH includes that path.

Andy

On Tuesday, September 30, 2003, at 05:36 PM, Andrew Nunberg wrote:

> Hi,
> I installed the egenix mx module on a Red Hat 7.2 machine.
> I can import mx but cannot import TextTools.
> When I try "from mx import TextTools" I get:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>   File "/usr/local/lib/python2.3/site-packages/mx/TextTools/__init__.py", line 8, in ?
>   File "/usr/local/lib/python2.3/site-packages/mx/TextTools/TextTools.py", line 13, in ?
>   File "/usr/local/lib/python2.3/site-packages/mx/TextTools/mxTextTools/__init__.py", line 8, in ?
> ImportError: No module named mxTextTools
>
> I take it I did not install the module correctly, or is it some
> permission issue?
> Thanks
>
> ---------------------------------------------------
> Andrew Nunberg Ph.D
> Bioinfomagician
> Orion Genomics
> 4041 Forest Park
> St Louis, MO
> 314-615-6989
> anunberg@oriongenomics.com
> www.oriongenomics.com
>
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
>

---------------------------------------------------
Andrew Nunberg Ph.D
Bioinfomagician
Orion Genomics
4041 Forest Park
St Louis, MO
314-615-6989
anunberg@oriongenomics.com
www.oriongenomics.com

From anunberg at oriongenomics.com Thu Oct 2 15:33:39 2003
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Thu Oct 2 15:31:28 2003
Subject: [BioPython] retrieving fasta records from a Fasta Dictionary
Message-ID: <52ADD154-F50F-11D7-8996-000A95A001EE@oriongenomics.com>

Hi,
I created a fasta index and passed this subroutine in the statement:

    import string  # needed for string.split below

    def get_id(fasta_record):
        # use the first whitespace-separated word of the title as the key
        id = string.split(fasta_record.title)
        return id[0]

When I retrieve sequences from the dictionary using Fasta.SequenceParser, the SeqRecord objects have no id or name, but they do have the sequence. Is this normal behavior?

Andy

---------------------------------------------------
Andrew Nunberg Ph.D
Bioinfomagician
Orion Genomics
4041 Forest Park
St Louis, MO
314-615-6989
anunberg@oriongenomics.com
www.oriongenomics.com

From daegelen at genoscope.cns.fr Mon Oct 6 04:33:52 2003
From: daegelen at genoscope.cns.fr (Patrick Daegelen)
Date: Mon Oct 6 04:31:32 2003
Subject: [BioPython] python 2.3
Message-ID: <20031006103352.A1474@etna.genoscope.cns.fr>

Hi,

I work under Python 2.3.
The current Biopython release cannot be installed under that Python version.

Could you tell me if a Python 2.3 compatible Biopython will be available in the near future?

Thanks,
P.
--
Patrick Daegelen
Projet "Eaux Douces"
UMR CNRS 8030 "Structure et évolution des génomes"
GENOSCOPE Centre National de Séquençage
2, rue Gaston Crémieux, CP 5706, 91000 ÉVRY Cedex, FRANCE
Tél : 33 (0) 1 60 87 25 08
FAX : 33 (0) 1 60 87 25 14
Email : daegelen@genoscope.cns.fr
http://www.genoscope.fr/

From mdehoon at ims.u-tokyo.ac.jp Mon Oct 6 04:55:58 2003
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Mon Oct 6 04:51:15 2003
Subject: [BioPython] python 2.3
In-Reply-To: <20031006103352.A1474@etna.genoscope.cns.fr>
References: <20031006103352.A1474@etna.genoscope.cns.fr>
Message-ID: <3F812E1E.2050808@ims.u-tokyo.ac.jp>

I have installed biopython 1.21 with Python 2.3 and didn't run into problems. Which system are you running on, and which problem did you encounter when installing biopython?

--Michiel.

Patrick Daegelen wrote:
> Hi,
>
> I work under python 2.3.
> The current biopython release cannot be installed
> under that python version.
>
> Cd you tell me if a python 2. 2.3 compatible biopython
> will be available in a near future ?
>
> Thanks,
> P.
>

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From jefftc at stanford.edu Thu Oct 9 23:18:44 2003
From: jefftc at stanford.edu (Jeffrey Chang)
Date: Thu Oct 9 23:16:23 2003
Subject: [BioPython] Biopython 1.22 available
Message-ID: <7445CC8C-FAD0-11D7-A5AC-000A956845CE@stanford.edu>

Hello Everybody,

Biopython 1.22 is now available from the website at:
http://www.biopython.org/

This is mostly a maintenance release. The installation process is improved, and now distributes Martel and DTD files correctly.

The changes made in this release are:
Added Peter Slicker's patches for speeding up modules under Python 2.3
Fixed Martel installation.
Does not install Bio.Cluster without Numeric.
Distribute EUtils DTDs.
Yves Bastide patched NCBIStandalone.Iterator to be Python 2.0 iterator
Ashleigh's string coercion fixes in Clustalw.
Yair Benita added precision to the protein molecular weights.
Bartek updated AlignAce.Parser and added Motif.sim method
bug fixes in Michiel De Hoon's clustering library
Iddo's bug fixes to Bio.Enzyme and new RecordConsumer
Guido Draheim added patches for fixing import path to xbb scripts
regression tests updated to be Python 2.3 compatible
GenBank.NCBIDictionary is smarter about guessing the format

As usual, please report bugs to biopython-dev@biopython.org, or the bug database also available from the website.

Jeff

From thamelry at vub.ac.be Fri Oct 10 08:14:30 2003
From: thamelry at vub.ac.be (Thomas Hamelryck)
Date: Fri Oct 10 08:12:43 2003
Subject: [BioPython] mmCIF parser added to Bio.PDB
In-Reply-To:
References:
Message-ID: <200310101414.30793.thamelry@vub.ac.be>

Hi everybody,

Due to popular demand (by Cath Lawrence :-), I've added mmCIF support to Bio.PDB. mmCIF in short is a file format that is used to describe crystal structures. The mmCIF format solves many problems that are associated with the older PDB format (or at least that's what I'm told :-).

Usage:

>>> from Bio.PDB.MMCIFParser import MMCIFParser
>>> parser=MMCIFParser()
>>> structure=parser.get_structure("test", "1FAT.cif")

In addition, there is also MMCIF2Dict, which makes the contents of an mmCIF file available as a Python dictionary (with the data tags as keys), so you can easily address all data in the mmCIF file.

Usage:

>>> from Bio.PDB.MMCIF2Dict import MMCIF2Dict
>>> d=MMCIF2Dict("1FAT.cif")
>>> print d["_database_PDB_matrix.entry_id"]
1FAT
>>> print d["_struct_site.id"]
['CAA', 'MNA', 'CAB', 'MNB', 'CAC', 'MNC', 'CAD', 'MND']
>>> d["_computing.structure_solution"]
"'X-PLOR 3.1'"

The modules use C/Lex code to parse the file, so it's reasonably fast. Note that compilation requires C and GNU Lex (i.e. Flex).
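
To make the "data tags as keys" idea concrete, here is a toy sketch in pure Python. It is NOT the Bio.PDB implementation (which uses C/Flex as described above); the function name toy_mmcif_to_dict and the sample text are made up for illustration, and only a small subset of mmCIF is handled (simple tag/value pairs and loop_ blocks, no multi-line values). Unlike the MMCIF2Dict output shown above, this sketch strips the quotes around quoted values.

```python
import shlex

def toy_mmcif_to_dict(text):
    """Map mmCIF data tags to values (toy subset, hypothetical helper)."""
    # shlex splits on whitespace, keeps quoted strings together,
    # and drops '#' comments.
    tokens = shlex.split(text, comments=True)
    result = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "loop_":
            # A loop_ block: a run of tags followed by row-major values.
            i += 1
            tags = []
            while i < len(tokens) and tokens[i].startswith("_"):
                tags.append(tokens[i])
                i += 1
            columns = [[] for _ in tags]
            n = 0
            while i < len(tokens) and not (
                tokens[i].startswith("_")
                or tokens[i] == "loop_"
                or tokens[i].startswith("data_")
            ):
                columns[n % len(tags)].append(tokens[i])
                n += 1
                i += 1
            for tag, column in zip(tags, columns):
                result[tag] = column  # looped tags map to lists
        elif tok.startswith("_"):
            result[tok] = tokens[i + 1]  # single tag/value pair
            i += 2
        else:
            i += 1  # e.g. the data_XXXX header
    return result

# Made-up sample text, not the real contents of 1FAT.cif.
SAMPLE = """
data_1FAT
_entry.id 1FAT
_computing.structure_solution 'X-PLOR 3.1'
loop_
_struct_site.id
_struct_site.details
CAA 'site one'
MNA 'site two'
"""

d = toy_mmcif_to_dict(SAMPLE)
```

With this sample, d["_struct_site.id"] is a list and d["_entry.id"] is a plain string, which mirrors the list-versus-string behaviour in the session above.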
There is no support for writing mmCIF files, and I'm not planning to work on that either.

I'd be interested to hear about possible bugs, requested features etc., but it should work reasonably as is.

Cheers,

---
Thomas Hamelryck
COMO-ULTR
Vrije Universiteit Brussel (VUB)
Belgium
http://homepages.vub.ac.be/~thamelry

From ssheu at post.harvard.edu Fri Oct 10 14:51:57 2003
From: ssheu at post.harvard.edu (Shu-Hsien Sheu)
Date: Fri Oct 10 14:56:18 2003
Subject: [BioPython] Spatial clustering
In-Reply-To: <7445CC8C-FAD0-11D7-A5AC-000A956845CE@stanford.edu>
References: <7445CC8C-FAD0-11D7-A5AC-000A956845CE@stanford.edu>
Message-ID: <3F86FFCD.8030506@post.harvard.edu>

Hi,

I am now working on a protein binding site mapping project which generates thousands of small organic molecules in Cartesian coordinates. The next step is to cluster these small molecules. Are there any modules available for this kind of task? PyCluster seems to work with 2D gene expression data only, though with some modifications I could use it as well. I am thinking of using an RMSD matrix and then a density-based algorithm. The following paper gave me some general ideas about the approaches I can take:

http://prlab.ee.memphis.edu/frigui/CLUSTER_PAPERS/Ericasurvey.pdf

Any comments here?

thanks!

-shuhsien

From idoerg at burnham.org Fri Oct 10 15:13:28 2003
From: idoerg at burnham.org (Iddo Friedberg)
Date: Fri Oct 10 15:11:57 2003
Subject: [BioPython] Spatial clustering
In-Reply-To: <3F86FFCD.8030506@post.harvard.edu>
References: <7445CC8C-FAD0-11D7-A5AC-000A956845CE@stanford.edu> <3F86FFCD.8030506@post.harvard.edu>
Message-ID: <3F8704D8.8070605@burnham.org>

Shu-Hsien,

I haven't read the paper you pointed to (yet), but here is a little inventory of the machine learning tools that we have:

NaiveBayes, SVM, kMeans, MaxEntropy ... any of those good?

Off biopython, ever considered cluto?
http://www-users.cs.umn.edu/~karypis/cluto/ Best, Iddo Shu-Hsien Sheu wrote: > Hi, > > I am now working on a mapping protein binding site project which would > generate thousands of small organic molecules in cartesian coordinates. > Next step would be to cluster these small molecules. Is there any > modules available for this kind of task? PyCluster seems to work with 2D > gene expression data only, thoug through some modifications I can use it > as well. I am thinking of using RMSD matrix and then a density-based > algorithym. The following paper gave me some general ideas about the > approaches I can take: > > http://prlab.ee.memphis.edu/frigui/CLUSTER_PAPERS/Ericasurvey.pdf > > Any comments here? > > thanks! > > -shuhsien > > > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython > > -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 646 3171 http://ffas.ljcrf.edu/~iddo From dalke at dalkescientific.com Fri Oct 10 17:57:17 2003 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri Oct 10 17:53:33 2003 Subject: [BioPython] Spatial clustering In-Reply-To: <3F86FFCD.8030506@post.harvard.edu> Message-ID: Shu-Hsien Sheu wrote: > Next step would be to cluster these small molecules. I don't know much about spatial clustering like what you're talking about -- are you attempting to classify binding affinities for different parts of the pocket? Have you considered fingerprints (Daylight-style or MACCS keys) or shape fitting (as from OpenEye)? There's some fingerprint generation code in frowns, or if you have Daylight or OEChem it's easy to create on your own. Another option you might consider is pharmacophore points instead of RMSD. 
Andrew dalke@dalkescientific.com From mdehoon at ims.u-tokyo.ac.jp Sat Oct 11 04:28:32 2003 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Sat Oct 11 04:23:29 2003 Subject: [BioPython] Spatial clustering In-Reply-To: <3F86FFCD.8030506@post.harvard.edu> References: <7445CC8C-FAD0-11D7-A5AC-000A956845CE@stanford.edu> <3F86FFCD.8030506@post.harvard.edu> Message-ID: <3F87BF30.5090209@ims.u-tokyo.ac.jp> An easy solution would be to cluster the molecules using the treecluster routine in PyCluster (== Bio.Cluster in Biopython). This routine implements pairwise single-, maximum-, average-, and centroid-linkage hierarchical clustering. You will need to calculate the distances between the molecules and pass these distances as the "distancematrix" argument to the treecluster routine. This lets you define the distance measure as suitable for your problem. Unfortunately I am not familiar with density-based algorithm, but if you use a sensible definition for the distance between molecules then hierarchical clustering should give you a sensible clustering result. You can also use the k-means routine in PyCluster, but that one won't let you specify the distance matrix yourself. This means that you can only use the distance measures built in to the k-means routine (e.g. the Euclidean distance), which may or may not be suitable for your problem. In my experience, choosing the right distance measure is often more important than the clustering algorithm, so I would go with hierarchical clustering if the distance measures in k-means clustering are not suitable for your task. --Michiel. Shu-Hsien Sheu wrote: > Hi, > > I am now working on a mapping protein binding site project which would > generate thousands of small organic molecules in cartesian coordinates. > Next step would be to cluster these small molecules. Is there any > modules available for this kind of task? 
PyCluster seems to work with 2D > gene expression data only, thoug through some modifications I can use it > as well. I am thinking of using RMSD matrix and then a density-based > algorithym. The following paper gave me some general ideas about the > approaches I can take: > > http://prlab.ee.memphis.edu/frigui/CLUSTER_PAPERS/Ericasurvey.pdf > > Any comments here? > > thanks! > > -shuhsien > > > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython > > -- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From ssheu at post.harvard.edu Tue Oct 14 11:16:00 2003 From: ssheu at post.harvard.edu (Shu-Hsien Sheu) Date: Tue Oct 14 11:16:20 2003 Subject: [BioPython] Spatial clustering In-Reply-To: <3F8704D8.8070605@burnham.org> References: <7445CC8C-FAD0-11D7-A5AC-000A956845CE@stanford.edu> <3F86FFCD.8030506@post.harvard.edu> <3F8704D8.8070605@burnham.org> Message-ID: <3F8C1330.1000703@post.harvard.edu> Dear all, thanks for all the inputs! I am new to this field and came from a bio background so I am not that familiar with computer sciences. The project, however, was there for 1 year and had shown great results for some enzymes we tested: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=14499612&dopt=Abstract The basic idea is to use organic solvents as "probes" and use energy function to find the favorable minimums. We first used a simplex method with Van der Waals cancellation and then do the further minimization using CHARMm. Through some testing we've found a 6660 positions of the probes would bring the best results. By clustering those molecules and calculating the average free energy for each we can come up with top 5 energy favorable clusters. 
It was shown that the "consensus" site of the clusters of different probes is the binding site of the protein.

Actually the cluster code is already there and is written in C. The person who wrote both the mapping program and the clustering program has already left this lab. Originally I was working on the consensus-site finding part, which was done by manual inspection in RasMol or PyMol in the past, but I later thought that it might be more efficient if I wrapped these two parts together.

To me creating a valid RMSD matrix seems to be as important as the clustering algorithm. For instance, the small molecules we use range from methanol to t-butanol, and for the latter, two reference points might be needed. Finding the consensus site might have more problems, since you are then dealing with different kinds of molecules. Any comments here?

Clustering seems to be an important issue in molecular modelling. People working on protein-protein docking in this lab have all put some effort into this, though no collaboration or uniform method has been developed yet.

I have a naive question about arrays/matrices. Pairwise RMSD doesn't have direction, e.g. RMSD(1,2) == RMSD(2,1). Therefore, the distance matrix would look like this:

       1     2     3     4     5
  1    X    .2    .1   1.2   3.4
  2   .2     X    .5    .2    .4
  3   .1    .5     X    .6    .7
  4  1.2    .2    .6     X    .2
  5  3.4    .4    .7    .2     X

I've read the Numarray tutorial and there seem to be no special functions for matrices that are symmetric about the diagonal. Are there any more efficient approaches?

An algorithm I have in mind is: starting with the RMSD matrix, first find the molecule with the most neighbors, make it the hub of a cluster and take it out along with its members, then do the same thing recursively.

Dear Iddo,
I just checked cluto and will try to find out if it's good for my purpose. thanks!

Dear Andrew,
I am not familiar with fingerprints or shape fitting. Can you give me a place to start? I will search through Google as well. I am not familiar with pharmacophores and will check that as well.
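
On the symmetric-matrix question and the hub idea raised earlier in this message, the following is a minimal sketch, not the lab's existing C code. It stores only the upper triangle of the matrix in a flat list and applies the "most neighbors first" rule. The names condensed_index and greedy_hub_clusters, and especially the neighbor cutoff of 0.3, are assumptions made for illustration; the message does not say how a neighbor is defined. Molecules 1..5 of the table are numbered 0..4 here.

```python
def condensed_index(i, j, n):
    """Index of the unordered pair (i, j) in a row-major upper-triangle list."""
    if i == j:
        raise ValueError("the diagonal is not stored")
    if i > j:
        i, j = j, i  # RMSD(i, j) == RMSD(j, i), so order does not matter
    return i * n - i * (i + 1) // 2 + (j - i - 1)

def greedy_hub_clusters(dist, n, cutoff):
    """Repeatedly pick the item with the most neighbors within cutoff as hub."""
    remaining = set(range(n))
    clusters = []
    while remaining:
        # Neighbor lists are recomputed among the items still unassigned.
        neighbors = {
            i: [j for j in sorted(remaining)
                if j != i and dist[condensed_index(i, j, n)] <= cutoff]
            for i in remaining
        }
        # Ties are broken in favour of the lowest index.
        hub = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        members = [hub] + neighbors[hub]
        clusters.append(members)
        remaining -= set(members)
    return clusters

# Upper triangle of the 5 x 5 example matrix: 10 numbers instead of 25.
RMSD = [0.2, 0.1, 1.2, 3.4,   # pairs (1,2) (1,3) (1,4) (1,5)
        0.5, 0.2, 0.4,        # pairs (2,3) (2,4) (2,5)
        0.6, 0.7,             # pairs (3,4) (3,5)
        0.2]                  # pair  (4,5)

clusters = greedy_hub_clusters(RMSD, 5, cutoff=0.3)
```

A flat list of n*(n-1)/2 values halves the memory of the full matrix; whether that matters for a few thousand molecules depends on the machine, and a full array may still be simpler to index.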
Dear Michiel,
I've read the PyCluster document and it seems that I had missed the point that treecluster lets me specify the distance matrix myself. It might be the easiest solution. Thanks!

-shuhsien

From ssheu at post.harvard.edu Tue Oct 14 11:23:22 2003
From: ssheu at post.harvard.edu (Shu-Hsien Sheu)
Date: Tue Oct 14 11:30:28 2003
Subject: [BioPython] Spatial clustering
In-Reply-To: <3F8C1330.1000703@post.harvard.edu>
References: <7445CC8C-FAD0-11D7-A5AC-000A956845CE@stanford.edu> <3F86FFCD.8030506@post.harvard.edu> <3F8704D8.8070605@burnham.org> <3F8C1330.1000703@post.harvard.edu>
Message-ID: <3F8C14EA.7080004@post.harvard.edu>

Sorry, but I forgot to mention this. The clustering code uses an algorithm that creates an appropriate number of clusters such that the maximum distance between the hub of a cluster and any of its members (the cluster radius) is smaller than half of the average distance among all of the existing hubs. Also, an upper limit on the cluster radius was introduced to account for the physical dimensions of the molecule.

From Jose.Sepulveda.Sanchis at cgb.ki.se Tue Oct 14 16:02:25 2003
From: Jose.Sepulveda.Sanchis at cgb.ki.se (Jose Sepulveda Sanchis)
Date: Tue Oct 14 15:59:53 2003
Subject: [BioPython] DAS client
Message-ID: <5573f53f35.53f355573f@ki.se>

hi,
I have seen that there is a DAS client for Python but I cannot get it from the link. Is this client included in Biopython? If not, how can I get it?
Is it possible to see the source code?
thanks

[BioPython] DAS client
Andrew Dalke dalke@dalkescientific.com
Fri, 23 Aug 2002 11:40:37 -0600

I've got a first draft, as it were, of a DAS client. You can download and try it out from

http://www.biopython.org/~dalke/das.tar.gz

Here's some documentation for it.

- ungzip / untar it
- cd das
- start python

>>> import das
>>> server = das.Server("http://servlet.sanger.ac.uk:8080/das")

The 'server' object acts like a Python dictionary for the different data sources available

>>> print len(server)
>>> print server.keys()
>>> dsn = server["mouse73"]
>>> sheet = dsn.stylesheet()

This class is built given the DTD. Here are some examples of how you can manipulate it.

Dump the whole data structure as XML
>>> print sheet

See what it contains
>>> sheet.get_children()

Work with subelements
>>> print len(sheet["STYLESHEET"])
>>> category = sheet["STYLESHEET"][0]

Get attributes of a node
>>> print category["TYPE"].id
>>> glyph = category["TYPE"]["GLYPH"]

Any subelement can also be dumped to XML
>>> print glyph

Get the text inside a node (.text() returns a list of unicode strings)
>>> print "".join(map(str, glyph[0]["COLOR"].text()))

Besides 'stylesheet' the following methods are also supported on a DSN
>>> [s for s in dir(dsn) if not s.startswith("_")]
['dna', 'dsn', 'entry_points', 'features', 'link', 'sequence', 'server', 'stylesheet', 'types']

(Actually, 'dsn' and 'server' are attributes.)

>>> eps = dsn.entry_points()
>>> for segment in eps["ENTRY_POINTS"]:
...     print segment.id, segment.size
...
X 147161770
19 61356199
18 91189200
17 94132929
16 99184200
15 104633288
14 116006794
13 117115093
12 114251360
11 122883361
10 131187037
9 125583845
8 129321983
7 135793178
6 150316567
5 151006098
4 151730910
3 160564582
2 180335396
1 196842934
>>> print dsn.dna( segments = [ ("X", 147161760, 147161770), ("1", 100, 20) ] )
Calling url 'http://servlet.sanger.ac.uk:8080/das/mouse73/dna' with query 'segment=X:147161760,147161770;segment=1:100,20'
agagagagagt
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
nnnnnnnnnnnnnnnnnnnnn
>>> print dsn.types( segments = [("6", None, None)] )
151 23 23 304
>>>

It also converts DAS errors into Python exceptions

>>> print dsn.sequence( [ ("X", 147161760, 147161770), ("1", 100, 20) ] )
Calling url 'http://servlet.sanger.ac.uk:8080/das/mouse73/sequence' with query 'segment=X:147161760,147161770;segment=1:100,20'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "das.py", line 255, in sequence
    return self._call("sequence", s, "dassequence")
  File "das.py", line 243, in _call
    return self.server._call(self.dsn + "/" + command, query, dtd_name)
  File "das.py", line 180, in _call
    raise DASError(status_code)
das.DASError: 400: Bad command (command not recognized)
>>>

Note that debugging is turned on in that it shows the URL called and the query string, which gives you the chance to see what it's doing and (if needed) reproduce a problem by hand.

Speaking of which, many of the publicly available servers listed at http://www.tigr.org/tdb/DAS/das_server_list.html have problems. I've tweaked the DTDs to support things like 'COLOR' and 'OUTLINECOLOR' which are pre-1.0 spec, and added a few other things as I found them. Another has an 'X-DAS-Status' line with descriptive text after the 3 digit status number. Still others return syntactically invalid XML or simply implement the wrong function (a couple return the DASDSN for 'entry_points'!).

So enjoy, but be wary!
:) Andrew dalke@dalkescientific.com * Previous message: [BioPython] Better blasting with XML * Next message: [BioPython] clustaw * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] From Jose.Sepulveda.Sanchis at cgb.ki.se Tue Oct 14 18:17:24 2003 From: Jose.Sepulveda.Sanchis at cgb.ki.se (Jose Sepulveda Sanchis) Date: Tue Oct 14 18:16:40 2003 Subject: [BioPython] Installation problems Message-ID: <3188e2fbfc.2fbfc3188e@ki.se> hi I have had some installation problems. I'm trying to install biopython in Mac os X. When I'm installing egenix the TextTools don't work does anyone know if there is a particular way to install biopython in mac os x From jchang at jeffchang.com Tue Oct 14 23:49:48 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Tue Oct 14 23:47:16 2003 Subject: [BioPython] Installation problems In-Reply-To: <3188e2fbfc.2fbfc3188e@ki.se> Message-ID: <9F60F702-FEC2-11D7-A8B2-000A956845CE@jeffchang.com> How are you trying to install it? Are you using the unix command line version of python? I have installed python from source, and have success treating it like any other unix system. > python setup.py install should work for both TextTools and Biopython. Jeff On Tuesday, October 14, 2003, at 06:17 PM, Jose Sepulveda Sanchis wrote: > hi I have had some installation problems. > I'm trying to install biopython in Mac os X. 
When I'm installing > egenix the TextTools don't work does anyone know if there is a > particular way to install biopython in mac os x > > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython From pbouige at pasteur.fr Wed Oct 15 03:04:50 2003 From: pbouige at pasteur.fr (Philippe Bouige) Date: Wed Oct 15 03:02:13 2003 Subject: [cgb.ki] [BioPython] DAS client In-Reply-To: <5573f53f35.53f355573f@ki.se>; from Jose.Sepulveda.Sanchis@cgb.ki.se on Tue, Oct 14, 2003 at 03:02:25PM -0500 References: <5573f53f35.53f355573f@ki.se> Message-ID: <20031015090450.A368768@electre.pasteur.fr> Jose Sepulveda Sanchis ?crit : > hi, > I have seen that there is a DAS client for python but I cannot get it from the link. Is this client included in the biopython? if not how can I get it? > is it posible to see the source code ? > thanks > > [BioPython] DAS client > Andrew Dalke dalke@dalkescientific.com > Fri, 23 Aug 2002 11:40:37 -0600 > > * Previous message: [BioPython] Better blasting with XML > * Next message: [BioPython] clustaw > * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] > > I've got a first draft, as it were, of a DAS client. You can download > and try it out from > > http://www.biopython.org/~dalke/das.tar.gz Yes, but I have a problem :-( with : http://www.biopython.org/~dalke/das.tar.gz "The requested link does not exist on this site." "Page not found: /~dalke/das.tar.gz" From dalke at dalkescientific.com Wed Oct 15 05:30:36 2003 From: dalke at dalkescientific.com (Andrew Dalke) Date: Wed Oct 15 05:26:38 2003 Subject: [BioPython] DAS client In-Reply-To: <5573f53f35.53f355573f@ki.se> Message-ID: <3BABD739-FEF2-11D7-A0AA-000393C92466@dalkescientific.com> Hi Jose, > I have seen that there is a DAS client for python but I cannot > get it from the link. Is this client included in the biopython? > if not how can I get it? > is it posible to see the source code ? 
The accounts got moved around and it appears my biopython.org/~dalke/ is no longer visible through biopython.org. For that matter, it appears that I've forgotten my password since I can't seem to log in to anything there. Hmmm....

Okay, I got the most recent code *temporarily* at
http://www.dalkescientific.com/PyDAS-0.8.tar.gz

I'll eventually list it somewhere under
http://www.dalkescientific.com/Python/

I have not touched the DAS code in over a year nor have I received comments about it. As I recall, there were problems in that none of the DAS servers actually put out correct DAS, and some didn't even put out valid XML. PyDAS attempts to work around some of that, but it has not been tested on real data for doing real work. Please plan accordingly.

In addition, that code used a DTD->Python parser generator. I no longer believe that that's the right way to handle this sort of problem. (Were I to do it again, I would look at libraries like Fredrik Lundh's ElementTree. Ditto for the EUtils code.)

For the good news, I'm listed as a contractor for the DAS2 project, and we should hear if it got funded Real Soon Now. (It got a good review.) So I'll be doing more DAS programming starting in the next couple of months.

(In the meanwhile, I'm available for consulting, and while I've been to Göteborg many times -- next time will be December -- I've yet to be to Stockholm. ;)

Andrew
dalke@dalkescientific.com

From sbassi at asalup.org Wed Oct 15 09:21:46 2003
From: sbassi at asalup.org (Sebastian Bassi)
Date: Wed Oct 15 09:30:13 2003
Subject: [BioPython] check for primer in primer3 or biopython?
Message-ID: <3F8D49EA.2070906@asalup.org>

Hello,

Is there a way to check if a specific primer is good for a PCR reaction? Primer3 can pick primers given a target sequence, but what I want is to test how "good" a specific primer pair is when it was chosen previously (by hand or by another program).
By good I mean that the two primers won't anneal to each other, that neither anneals to itself, plus stability, mismatch percent, and so on.

--
Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center - \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=//

http://Bioinformatica.info

From anunberg at oriongenomics.com Wed Oct 15 11:33:33 2003
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Wed Oct 15 11:30:56 2003
Subject: [BioPython] check for primer in primer3 or biopython?
In-Reply-To: <3F8D49EA.2070906@asalup.org>
Message-ID:

Primer3 does that. You can set parameters for how much primer overlap, %GC, etc...

--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

> From: Sebastian Bassi
> Organization: ASALUP
> Reply-To: sbassi@asalup.org
> Date: Wed, 15 Oct 2003 10:21:46 -0300
> To: biopython@biopython.org
> Subject: [BioPython] check for primer in primer3 or biopython?
>
> Hello,
>
> Is there a way to check if a specific primer is good for a PCR reaction?
> Primer3 can pick primers given a target sequence, but what I want is to
> test how "good" a specific primer pair is when it was chosen previously
> (by hand or by another program).
> By good I mean that the two primers won't anneal to each other, that
> neither anneals to itself, plus stability, mismatch percent, and so on.
> > -- > Best regards, > > //=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\ > \=// IT Manager Advanta Seeds - Balcarce Research Center - \=// > //=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\ > \=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=// > > http://Bioinformatica.info > > > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython > From sbassi at asalup.org Wed Oct 15 13:49:55 2003 From: sbassi at asalup.org (Sebastian Bassi) Date: Wed Oct 15 13:42:59 2003 Subject: [BioPython] check for primer in primer3 or biopython? In-Reply-To: <20031015184318.A570@asx.cgb.ki.se> References: <3F8D49EA.2070906@asalup.org> <20031015184318.A570@asx.cgb.ki.se> Message-ID: <3F8D88C3.8010906@asalup.org> Hello Anders, Anders Lundmark wrote: > I belive you can use the tags 'PRIMER_LEFT_INPUT' and > 'PRIMER_RIGHT_INPUT' in your primer3 input file to specify a primer > pair to check versus a given sequence. According to primer3 readme file: PRIMER_LEFT_INPUT (nucleotide sequence, default empty) The sequence of a left primer to check and around which to design right primers and optional internal oligos. Must be a substring of SEQUENCE. PRIMER_RIGHT_INPUT (nucleotide sequence, default empty) The sequence of a right primer to check and around which to design left primers and optional internal oligos. Must be a substring of the reverse strand of SEQUENCE. I think this won't be useful since it doesn't allow mismatches ("Must be a substring of SEQUENCE.") and real PCR primers could have mismatches. Anyway I will try it. 
-- Best regards, //=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\ \=// IT Manager Advanta Seeds - Balcarce Research Center - \=// //=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\ \=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=// http://Bioinformatica.info From lee at epigenomix.com Wed Oct 15 15:29:49 2003 From: lee at epigenomix.com (Lee R. Shekter) Date: Wed Oct 15 15:29:49 2003 Subject: [BioPython] GenBank parsing Message-ID: <1066246347.656.8.camel@precision2.epivax-lan.com> Hi all - I am trying to have BioPython become a standard at my new job (we're a tiny biotech start-up in Providence, RI). I am new to BioPython, not so new to python. I have version 1.22 of BioPython installed on top of python 2.3.2 and all is working correctly (as per tests, sample code, etc.). I have read the documentation but am still somewhat puzzled. I have to parse GenBank records for the following information (the sequences have already been downloaded and so are local): 1) Accession number 2) protein sequence 3) Date of deposit 4) Country of origin I'm sure it's just a slight modification of what is described in the manual and there seems to be some snippets of code on the net that are related to this, but if anyone could help me, i.e. point me to some code which I can use to extract the information I need and put it into a file, that would be greatly appreciated. Lee Shekter From sbassi at asalup.org Thu Oct 16 07:06:00 2003 From: sbassi at asalup.org (Sebastian Bassi) Date: Thu Oct 16 09:18:43 2003 Subject: [BioPython] check for primer in primer3 or biopython? In-Reply-To: References: Message-ID: <3F8E7B98.3040204@asalup.org> Andrew Nunberg wrote: > Primer3 does that. You can set parameters for how much primer-overlap, %GC, > etc... > AFAIK, primer3 will let you choose primers parameters but the program chooses the primers (based on your parameters). I want to "validate" a previoulsy choosen primer. 
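
As a rough illustration of what "validating" a previously chosen pair could involve, the sketch below computes two crude first-pass numbers in plain Python: GC content, and the longest run of consecutive complementary bases between two primers over all ungapped antiparallel alignments. The function names and the choice of checks are assumptions made for this example; this is not a thermodynamic model and not what primer3 computes.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def max_complementary_run(primer_a, primer_b):
    """Longest run of consecutive Watson-Crick pairs between two primers.

    Both primers are given 5'->3'. primer_b is reversed so that it lies
    antiparallel to primer_a, then slid along it without gaps.
    """
    a = primer_a.upper()
    b = primer_b.upper()[::-1]
    best = 0
    for offset in range(-(len(b) - 1), len(a)):
        run = 0
        for i in range(len(a)):
            j = i - offset
            if 0 <= j < len(b) and COMPLEMENT[a[i]] == b[j]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Self-annealing check: compare a primer with itself.
print(gc_percent("GAATTC"))
print(max_complementary_run("GAATTC", "GAATTC"))
```

Passing the same primer twice gives a self-annealing estimate; passing the left and right primers gives a primer-dimer estimate. Any threshold for "too long a run" would have to be chosen by the user.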
-- Best regards, //=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\ \=// IT Manager Advanta Seeds - Balcarce Research Center - \=// //=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\ \=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=// http://Bioinformatica.info From Sean.Maceachern at dpi.vic.gov.au Thu Oct 16 21:52:26 2003 From: Sean.Maceachern at dpi.vic.gov.au (Sean.Maceachern@dpi.vic.gov.au) Date: Thu Oct 16 21:51:32 2003 Subject: [BioPython] Calculating synonymous sites Message-ID: Hello, I was hoping that someone may have run across an existing method for calculating the number of synonymous and non-synonymous sites. I am new at programming and I imagine it would take me a long time to code all of the possibilities etc.. so I was hoping someone might be able to offer some help or pass me on to an existing Biopython script before I begin, it seems like a lengthy process that someone must have done before. Cheers Sean MacEachern From jason at cgt.duhs.duke.edu Thu Oct 16 22:12:52 2003 From: jason at cgt.duhs.duke.edu (Jason Stajich) Date: Thu Oct 16 22:10:01 2003 Subject: [BioPython] Calculating synonymous sites In-Reply-To: References: Message-ID: Of course there are already a lot of existing programs out there to do this... We have a Nei-Gojoburi implementation in Bioperl Bio::Align::DNAStatistics if you want to adapt this to BioPython feel free. Do you really need to code it yourself if you are new to programming or can you rely on other programs? HY-PHY and PAML have a ML calculation of Ka and Ks and can be scripted with some work. Kevin Thornton has a bunch of C++ code which does some of this as well http://www.molpopgen.org/software/ -jason On Fri, 17 Oct 2003 Sean.Maceachern@dpi.vic.gov.au wrote: > Hello, I was hoping that someone may have run across an existing method > for calculating the number of synonymous and non-synonymous sites. 
I am
> new at programming and I imagine it would take me a long time to code
> all of the possibilities etc., so I was hoping someone might be able to
> offer some help or pass me on to an existing Biopython script before I
> begin; it seems like a lengthy process that someone must have done
> before.
>
> Cheers
>
> Sean MacEachern

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

From anunberg at oriongenomics.com Fri Oct 17 11:39:25 2003
From: anunberg at oriongenomics.com (Andrew Nunberg)
Date: Fri Oct 17 11:43:37 2003
Subject: [BioPython] Question about FeatureLocation objects
Message-ID:

I noticed that the FeatureLocation start and end attributes are not type
'int' but type 'instance', which makes operations like this break:

    loc.end - loc.start

And int(loc.start) also fails.

One needs to do

    int(str(location.start))

Now I am new to Python; is there a reason that start and end are not int
objects?

--
Andrew Nunberg
Bioinformagician
Orion Genomics
(314)-615-6989
www.oriongenomics.com

From chapmanb at uga.edu Fri Oct 17 11:49:17 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Oct 17 11:51:43 2003
Subject: [BioPython] Question about FeatureLocation objects
In-Reply-To:
References:
Message-ID: <20031017154917.GI76808@evostick.agtec.uga.edu>

Hey Andy;

> I noticed that the FeatureLocation start and end attributes are not type
> 'int' but type 'instance', which makes operations like this break:
>
>     loc.end - loc.start
>
> And int(loc.start) also fails.
>
> One needs to do
>     int(str(location.start))
>
> Now I am new to Python; is there a reason that start and end are not int
> objects?

It's because of the mess of fuzzy locations. The start and end can be
single integers, or any of the various GenBank/EMBL-style fuzzy
locations ((2.3), (2^3), <2, >2).
If you want to ignore the fuzzy locations and just deal with integers
(which is what I read that you want to do from your mail), there is
already a hook to do this. Just use loc.nofuzzy_start and
loc.nofuzzy_end instead of loc.start and loc.end.

Hopefully this helps.
Brad

From chapmanb at uga.edu Fri Oct 17 12:01:56 2003
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Oct 17 12:38:17 2003
Subject: [BioPython] Calculating synonymous sites
In-Reply-To:
References:
Message-ID: <20031017160155.GJ76808@evostick.agtec.uga.edu>

Sean:
> > Hello, I was hoping that someone may have run across an existing method
> > for calculating the number of synonymous and non-synonymous sites.

Jason:
> Do you really need to code it yourself if you are new to programming or
> can you rely on other programs?
>
> HY-PHY and PAML have a ML calculation of Ka and Ks and can be scripted
> with some work.

If you want to go the route of using an existing program, I've
attached a script I wrote for one of my lab-mates which takes files
of duplicated protein pairs and duplicated nucleotide pairs, aligns
them using mrtrans, and then calculates and extracts Ks and Ka values
using PAML. This has some case-specific stuff in it (like parsing the
info I need from ugly FASTA titles), but could be a useful starting
point nevertheless.

Brad
-------------- next part --------------
#!/usr/bin/env python
"""Calculate synonymous mutation rates for duplicated gene pairs.

This does the following:
    1. Fetches a duplicated protein pair.
    2. Aligns the protein pair with clustalw
    3. Convert the output to Fasta format.
    4. Use this alignment info to align gene sequences using mrtrans
    5. Convert mrtrans output to sequential input looking like phy
    6. Run yn00 to calculate synonymous mutation rates.

This requires that mrtrans:

ftp://ftp.virginia.edu/pub/fasta/other/mrtrans.shar

and PAML (specifically yn00):

http://abacus.gene.ucl.ac.uk/software/paml.html

be installed (to do the nucleotide alignments and synonymous calculations,
respectively).

Usage:
    dup_synonymous.py <protein_file> <dna_file> <output_file>

<protein_file> is a file containing duplicated protein pairs two at a time.

<dna_file> is a file containing duplicated nucleotide pairs corresponding
to the protein pairs in <protein_file>. These will be aligned based on
alignments of the corresponding proteins.

<output_file> is the name of a file to write the results to. The results
are output as a simple comma separated value file of pair name,
synonymous rate, non-synonymous rate.
"""
# set to 1 to delete intermediate files after each step
CLEAN_UP = 0

import sys
import os

from Bio import Application
from Bio import Fasta
from Bio import Clustalw
from Bio.Align.FormatConvert import FormatConverter

def main(protein_file, dna_file, output_file):
    output_h = open(output_file, "w")
    output_h.write("name,dS,dN\n")
    work_dir = os.path.join(os.getcwd(), "syn_analysis")
    if not(os.path.exists(work_dir)):
        os.makedirs(work_dir)

    prot_iterator = Fasta.Iterator(open(protein_file), Fasta.RecordParser())
    dna_iterator = Fasta.Iterator(open(dna_file), Fasta.RecordParser())
    # read the records two at a time; stop when either file runs out
    while 1:
        p_rec_1 = prot_iterator.next()
        if p_rec_1 is None:
            break
        p_rec_2 = prot_iterator.next()
        if p_rec_2 is None:
            break
        n_rec_1 = dna_iterator.next()
        if n_rec_1 is None:
            break
        n_rec_2 = dna_iterator.next()
        if n_rec_2 is None:
            break

        print "--------", p_rec_1.title, p_rec_2.title
        align_fasta = clustal_align_protein(p_rec_1, p_rec_2, work_dir)
        mrtrans_fasta = run_mrtrans(align_fasta, n_rec_1, n_rec_2, work_dir)
        if mrtrans_fasta:
            fix_mrtrans = fix_file(mrtrans_fasta)
            ds_subs, dn_subs = find_synonymous(fix_mrtrans, work_dir)
            if ds_subs is not None:
                pair_name = "%s;%s" % (p_rec_1.title, p_rec_2.title)
                output_h.write("%s,%s,%s\n" % (pair_name, ds_subs, dn_subs))
                output_h.flush()

def find_synonymous(input_file, work_dir):
    """Run yn00 to find the synonymous substitution rate for the alignment.
    """
    # create the .ctl file
    ctl_file = os.path.join(work_dir, "yn-input.ctl")
    output_file = os.path.join(work_dir, "nuc-subs.yn")
    ctl_h = open(ctl_file, "w")
    ctl_h.write("seqfile = %s\noutfile = %s\nverbose = 0\n" %
                (input_file, output_file))
    ctl_h.write("icode = 0\nweighting = 0\ncommonf3x4 = 0\n")
    ctl_h.close()

    cl = YnCommandline(ctl_file)
    print "\tyn00:", cl
    result, r, e = Application.generic_run(cl)
    # errors = e.read()
    ds_value = None
    dn_value = None
    #while 1:
    #    line = r.readline()
    #    if not(line):
    #        break
    #    if line.find("kappa") == 0:
    #        line = line.strip()
    #        name, subs_string = line.split(" = ")
    #        sub_value = float(subs_string)
    output_h = open(output_file)
    for line in output_h.xreadlines():
        # the result line has "+-" but is not the header line (which has "dS")
        if line.find("+-") >= 0 and line.find("dS") == -1:
            parts = line.split(" +-")
            ds_value = extract_subs_value(parts[1])
            dn_value = extract_subs_value(parts[0])

    if ds_value is None or dn_value is None:
        h = open(output_file)
        print "yn00 didn't work: \n%s" % h.read()

    return ds_value, dn_value

def extract_subs_value(text):
    """Extract a substitution value from a line of text.

    This is just a friendly function to grab a float value for Ks and Kn
    values from the junk I get from the last line of the yn00 file.

    Line:
    2 1 52.7 193.3 2.0452 0.8979 0.0193 0.0573 +- 0.0177 2.9732 +- 3.2002

    Parts:
    [' 2 1 52.7 193.3 2.0452 0.8979 0.0193 0.0573',
     ' 0.0177 2.9732', ' 3.2002\n']

    So we want 0.0573 for Kn and 2.9732 for Ks.
    """
    parts = text.split()
    value = float(parts[-1])
    return value

class YnCommandline:
    """Little commandline for yn00.
    """
    def __init__(self, ctl_file):
        self.ctl_file = ctl_file
        self.parameters = []

    def __str__(self):
        return "yn00 %s" % self.ctl_file

def run_mrtrans(align_fasta, rec_1, rec_2, work_dir):
    """Align two nucleotide sequences with mrtrans and the protein alignment.
    """
    try:
        align_file = os.path.join(work_dir, "prot-align.fasta")
        nuc_file = os.path.join(work_dir, "nuc.fasta")
        output_file = os.path.join(work_dir, "nuc-align.mrtrans")

        # make the protein alignment file
        align_h = open(align_file, "w")
        align_h.write(str(align_fasta))
        align_h.close()
        # make the nucleotide file
        nuc_h = open(nuc_file, "w")
        nuc_h.write(str(rec_1) + "\n")
        nuc_h.write(str(rec_2) + "\n")
        nuc_h.close()

        # run the program
        cl = MrTransCommandline(align_file, nuc_file, output_file)
        result, r, e = Application.generic_run(cl)
        errors = e.read()
        if errors.find("could not translate") >= 0:
            print "***mrtrans could not translate"
            return None
        else:
            print "\tmrtrans:", cl
            return output_file
    finally:
        if CLEAN_UP:
            if os.path.exists(nuc_file):
                os.remove(nuc_file)
            if os.path.exists(align_file):
                os.remove(align_file)

class MrTransCommandline:
    """Simple commandline faker.
    """
    def __init__(self, prot_align_file, nuc_file, output_file):
        self.prot_align_file = prot_align_file
        self.nuc_file = nuc_file
        self.output_file = output_file
        self.parameters = []

    def __str__(self):
        return "mrtrans %s %s > %s" % (self.prot_align_file,
                                       self.nuc_file,
                                       self.output_file)

def clustal_align_protein(rec_1, rec_2, work_dir):
    """Align the two given proteins with clustalw.
    """
    try:
        fasta_file = os.path.join(work_dir, "prot-start.fasta")
        align_file = os.path.join(work_dir, "prot.aln")
        fasta_h = open(fasta_file, "w")
        fasta_h.write(str(rec_1) + "\n")
        fasta_h.write(str(rec_2) + "\n")
        fasta_h.close()
        clustal_cl = Clustalw.MultipleAlignCL(fasta_file)
        clustal_cl.set_output(align_file,
                              #output_type = 'CLUSTAL',
                              output_order = 'INPUT')
        clustal_cl.set_type('PROTEIN')
        alignment = Clustalw.do_alignment(clustal_cl)
        print "\tDoing clustalw alignment: %s" % clustal_cl
        converter = FormatConverter(alignment)
        return converter.to_fasta()
    finally:
        if CLEAN_UP:
            if os.path.exists(align_file):
                os.remove(align_file)
            if os.path.exists(fasta_file):
                os.remove(fasta_file)

def process_john_title(title):
    """Process the crappily encoded fasta titles into info we need from them.
    """
    # A01N001a&At1g01010
    john_part, real_part = title.split("&")
    seg_name, dup_name = john_part.split("N")
    dup_num = dup_name[:-1] # get rid of the letter
    return seg_name, dup_num, real_part

def fix_file(filename):
    """Rewrite mrtrans output as sequential input for yn00.
    """
    new_file = filename + ".fix"
    input_h = open(filename)
    output_h = open(new_file, "w")
    it = Fasta.Iterator(input_h, Fasta.RecordParser())
    fixed_rec_1 = fix_record(it.next())
    fixed_rec_2 = fix_record(it.next())
    assert len(fixed_rec_1.sequence) == len(fixed_rec_2.sequence)
    # header line: number of sequences and alignment length
    output_h.write("\t2 %s\n" % (len(fixed_rec_1.sequence)))
    output_h.write(str(fixed_rec_1) + "\n")
    output_h.write(str(fixed_rec_2) + "\n")
    # close explicitly so everything is on disk before yn00 reads the file
    output_h.close()
    input_h.close()
    return new_file

def fix_record(rec):
    rec.sequence = rec.sequence.strip()
    # get rid of all '.'s in the sequence -- we want '-'s
    rec.sequence = rec.sequence.replace(".", "-")
    # a codon alignment must be a whole number of codons long
    assert len(rec.sequence) % 3.0 == 0
    return rec

if __name__ == "__main__":
    if len(sys.argv) != 4:
        print "Incorrect arguments"
        print __doc__
        sys.exit()
    main(sys.argv[1], sys.argv[2], sys.argv[3])

From jchang at jeffchang.com Sun Oct 19 11:37:12 2003
From: jchang at jeffchang.com (Jeffrey Chang)
Date: Sun Oct 19 11:34:32 2003
Subject: [BioPython] Biopython 1.23 available
Message-ID:
<1BAC9E6A-024A-11D8-8924-000A956845CE@jeffchang.com> Hello Everybody, Biopython 1.23 is now available from the website at: http://www.biopython.org/ This is mostly a maintenance release, which fixes some problems in the installation. You do not need to update from 1.22 unless you are using the Bio.Cluster, Bio.KDTree, or Bio.PDB.mmCIF packages. The changes made in this release are: Fixed distribution of files in Bio/Cluster Now distributing Bio/KDTree/_KDTree.swig.C minor updates in installation code added mmCIF support for PDB files As usual, please report bugs to biopython-dev@biopython.org, or the bug database also available from the website. Jeff From madrulz at singnet.com.sg Mon Oct 20 02:43:49 2003 From: madrulz at singnet.com.sg (Maddie Wong) Date: Mon Oct 20 02:41:04 2003 Subject: [BioPython] License terms for Biopython v1.22 Message-ID: <1066632229.3f9384252248b@flounder.singnet.com.sg> Hi, What are the license agreement for Biopython version 1.22? Where can I find this information? Regards, Maddie From jchang at jeffchang.com Mon Oct 20 07:35:41 2003 From: jchang at jeffchang.com (Jeffrey Chang) Date: Mon Oct 20 07:33:00 2003 Subject: [BioPython] License terms for Biopython v1.22 In-Reply-To: <1066632229.3f9384252248b@flounder.singnet.com.sg> Message-ID: <892325B0-02F1-11D8-B670-000A956845CE@jeffchang.com> Biopython is released under the Biopython license, which is essentially the same as the Python one. A copy of the license is included with every distribution. It used to be online as well, but that got lost somewhere when we changed the website. Jeff On Monday, October 20, 2003, at 02:43 AM, Maddie Wong wrote: > Hi, > > What are the license agreement for Biopython version 1.22? Where can I > find this information? 
>
> Regards,
> Maddie

From djeffares at zi.ku.dk Mon Oct 20 10:28:08 2003
From: djeffares at zi.ku.dk (Daniel Jeffares)
Date: Mon Oct 20 17:04:42 2003
Subject: [BioPython] biopython course
Message-ID: <9FF3F90C-0309-11D8-B94A-003065DBEAF4@zi.ku.dk>

Does anyone know of an introductory Bioinformatics course that includes
BioPython - that would suit a molecular biologist with no experience in
programming? The most ideal I've seen yet is the course in
informatics for biology 2004 offered at the Pasteur Institute
but I don't speak French!

Daniel Jeffares
Department of Evolutionary Biology
University of Copenhagen
15 Universitetsparken
2100 Copenhagen Ø
DENMARK

ph +45 3532-1279 (W)
fax +45 3532-1300 (W)
ph +45 3819-3924 (H)
djeffares@zi.ku.dk
http://www.zi.ku.dk/evolbiology/

From Ravinder.Singh at colorado.edu Mon Oct 20 18:34:24 2003
From: Ravinder.Singh at colorado.edu (Ravinder Singh)
Date: Mon Oct 20 18:31:37 2003
Subject: [BioPython] biopython course
References: <9FF3F90C-0309-11D8-B94A-003065DBEAF4@zi.ku.dk>
Message-ID: <3F9462EF.3030908@colorado.edu>

This is in American English (e.g., Center not Centre). I hope it is useful.

http://mcdb.colorado.edu/courses/6440/

Good luck.

Ravinder
----

Daniel Jeffares wrote:

> Does anyone know of an introductory Bioinformatics course that
> includes BioPython - that would suit a molecular biologist with no
> experience in programming? The most ideal I've seen yet is the
> course in informatics for biology 2004 offered at the Pasteur
> Institute
> but I don't speak French!
-- ******************************************************************************** Dr. Ravinder Singh Assistant Professor MCD Biology 347 UCB University of Colorado Boulder, CO 80309-0347 (303)492-8886 (voice) (303)492-7744 (fax) From madrulz at singnet.com.sg Tue Oct 21 00:25:32 2003 From: madrulz at singnet.com.sg (Maddie Wong) Date: Tue Oct 21 00:22:49 2003 Subject: [BioPython] License terms for Biopython v1.22 In-Reply-To: <892325B0-02F1-11D8-B670-000A956845CE@jeffchang.com> References: <892325B0-02F1-11D8-B670-000A956845CE@jeffchang.com> Message-ID: <1066710332.3f94b53c06abf@herring.singnet.com.sg> Hi, I searched at http://www.python.org/doc/Copyright.html and found this copyright notice: Copyright 1991-1995 by Stichting Mathematisch Centrum, Amsterdam, The Netherlands. All Rights Reserved Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the names of Stichting Mathematisch Centrum or CWI or Corporation for National Research Initiatives or CNRI not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. While CWI is the initial source for this software, a modified version is made available by the Corporation for National Research Initiatives (CNRI) at the Internet address ftp://ftp.python.org. 
STICHTING MATHEMATISCH CENTRUM AND CNRI DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM OR CNRI BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. Is this the one that you are referring to? Thanks. Regards, Maddie --- Jeffrey Chang wrote: > Biopython is released under the Biopython license, which is > essentially > the same as the Python one. A copy of the license is included with > > every distribution. It used to be online as well, but that got lost > > somewhere when we changed the website. > > Jeff > > > > On Monday, October 20, 2003, at 02:43 AM, Maddie Wong wrote: > > > Hi, > > > > What are the license agreement for Biopython version 1.22? Where > can I > > find this information? > > > > Regards, > > Maddie > > > > > > _______________________________________________ > > BioPython mailing list - BioPython@biopython.org > > http://biopython.org/mailman/listinfo/biopython > > > > From dalke at dalkescientific.com Tue Oct 21 00:59:12 2003 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue Oct 21 00:56:26 2003 Subject: [BioPython] License terms for Biopython v1.22 In-Reply-To: <1066710332.3f94b53c06abf@herring.singnet.com.sg> Message-ID: <4FF0CB2F-0383-11D8-86DA-000393C92466@dalkescientific.com> Maddie Wong wrote: > I searched at http://www.python.org/doc/Copyright.html and found this > copyright notice: That's the old Python license. 
The Biopython license is

                 Biopython License Agreement

Permission to use, copy, modify, and distribute this software and its
documentation with or without modifications and for any purpose and
without fee is hereby granted, provided that any copyright notices
appear in all copies and that both those copyright notices and this
permission notice appear in supporting documentation, and that the
names of the contributors or copyright holders not be used in
advertising or publicity pertaining to distribution of the software
without specific prior permission.

THE CONTRIBUTORS AND COPYRIGHT HOLDERS OF THIS SOFTWARE DISCLAIM ALL
WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL THE
CONTRIBUTORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY SPECIAL, INDIRECT
OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS
OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE
OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE
OR PERFORMANCE OF THIS SOFTWARE.

It's a rewrite of the old Python license, which is itself a form of
the MIT/new-style BSD license.

Andrew
dalke@dalkescientific.com

From letondal at pasteur.fr Tue Oct 21 02:20:57 2003
From: letondal at pasteur.fr (Catherine Letondal)
Date: Tue Oct 21 02:18:06 2003
Subject: [BioPython] biopython course
In-Reply-To: Your message of "Mon, 20 Oct 2003 16:28:08 +0200." <9FF3F90C-0309-11D8-B94A-003065DBEAF4@zi.ku.dk>
Message-ID: <200310210620.h9L6Kvbs359337@electre.pasteur.fr>

Daniel Jeffares wrote:
>
> Does anyone know of an introductory Bioinformatics course that includes
> BioPython - that would suit a molecular biologist with no experience in
> programming? The most ideal I've seen yet is the course in
> informatics for biology 2004 offered at the Pasteur Institute
> but I don't
> speak French!
The online Python course is in English (well, we hope :-)). It's intended
for biologists without any experience in programming. Click on the
'Python' link from the page above or directly:
http://www.pasteur.fr/formation/infobio/python/

--
Catherine Letondal -- Pasteur Institute Computing Center

From thamelry at vub.ac.be Wed Oct 22 08:36:09 2003
From: thamelry at vub.ac.be (Thomas Hamelryck)
Date: Wed Oct 22 08:44:18 2003
Subject: [BioPython] Re: [Biopython-dev] Updates on PDB entries
In-Reply-To: <200310201216.04502.kristian.rother@charite.de>
References: <200310201216.04502.kristian.rother@charite.de>
Message-ID: <200310221436.09650.thamelry@vub.ac.be>

Hi everybody,

Kristian Rother donated code to retrieve the weekly distributed files of
new or modified protein structures from the PDB server or its mirrors.
You can also use the code in a weekly cronjob to keep your local version
of the PDB up-to-date. The code has been added to Bio.PDB in CVS.

Best regards,

---
Thomas Hamelryck
Computational modeling lab (COMO)
Vrije Universiteit Brussel (VUB)
Belgium
http://homepages.vub.ac.be/~thamelry

From ale at telethon.bio.unipd.it Wed Oct 22 09:06:02 2003
From: ale at telethon.bio.unipd.it (Alessandro)
Date: Wed Oct 22 09:03:57 2003
Subject: [BioPython] help for searching overlapping occurrences
Message-ID: <20031022130602.GA26142@telethon>

I am looking for a method like finditer from module re, but one that
returns all the occurrences of a pattern in a string even if they
overlap each other.
Thanks a lot.
Alessandro Coppe

From dalke at dalkescientific.com Wed Oct 22 10:51:25 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed Oct 22 10:48:34 2003
Subject: [BioPython] help for searching overlapping occurrences
In-Reply-To: <20031022130602.GA26142@telethon>
Message-ID: <35B108A0-049F-11D8-86DA-000393C92466@dalkescientific.com>

Alessandro:
> I am looking for a method like finditer from module re, but one that
> returns all the occurrences of a pattern in a string even
> if they overlap each other.

There is no such function.

Such a search may be highly exponential as every ambiguous
branch must be taken.

If you want, I have a very experimental pure Python regular
expression engine. The NFA portion is decent, but doesn't
handle all regexp syntax (e.g., non-greedy matches). You
could modify that to explore all paths rather than just
take one. It also includes the ability to turn simple
regexps into a DFA, which is much faster but limited to
a smaller number of patterns.

You can also use Perl's regexp engine. That has the
ability to call arbitrary Perl code to decide if a match
occurs. You would just need to write a hook which saves
the current match information and rejects it, forcing the
engine to backtrack and find another possibility.

Andrew
dalke@dalkescientific.com

From jchang at jeffchang.com Wed Oct 22 11:23:34 2003
From: jchang at jeffchang.com (Jeffrey Chang)
Date: Wed Oct 22 11:20:48 2003
Subject: [BioPython] help for searching overlapping occurrences
In-Reply-To: <35B108A0-049F-11D8-86DA-000393C92466@dalkescientific.com>
Message-ID:

On Wednesday, October 22, 2003, at 10:51 AM, Andrew Dalke wrote:

> Alessandro:
>> I am looking for a method like finditer from module re, but one that
>> returns all the occurrences of a pattern in a string even
>> if they overlap each other.
>
> There is no such function.
>
> Such a search may be highly exponential as every ambiguous
> branch must be taken.

Would the following (untested) code do what Alessandro wants?

import re

def finditer_overlapped(pattern, string):
    # try an anchored match at every start position in turn
    for i in range(len(string)):
        m = re.match(pattern, string[i:])
        if m:
            yield m

Jeff

From dalke at dalkescientific.com Wed Oct 22 11:58:36 2003
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Wed Oct 22 11:55:46 2003
Subject: [BioPython] help for searching overlapping occurrences
In-Reply-To:
Message-ID: <98346B10-04A8-11D8-86DA-000393C92466@dalkescientific.com>

Jeff:
> Would the following (untested) code do what Alessandro wants?
>
> def finditer_overlapped(pattern, string):
>     for i in range(len(string)):
>         m = re.match(pattern, string[i:])
>         if m:
>             yield m

Consider the pattern

    a(bc|bcd)

when searched against

    abcd

Alessandro wanted
> all the occurrences of a pattern in a string even if they overlap
> each other.

which I take to mean he wants the "abc" AND "abcd" matches.
Python uses the left-first approach so only finds the "abc"
(compared to the POSIX left-longest one which finds "abcd").
The scanner code you wrote won't yield both of those
possibilities.

Andrew
dalke@dalkescientific.com

From jchang at jeffchang.com Wed Oct 22 13:47:07 2003
From: jchang at jeffchang.com (Jeffrey Chang)
Date: Wed Oct 22 13:44:20 2003
Subject: [BioPython] help for searching overlapping occurrences
In-Reply-To: <98346B10-04A8-11D8-86DA-000393C92466@dalkescientific.com>
Message-ID:

On Wednesday, October 22, 2003, at 11:58 AM, Andrew Dalke wrote:

> Jeff:
>> Would the following (untested) code do what Alessandro wants?
>>
>> def finditer_overlapped(pattern, string):
>>     for i in range(len(string)):
>>         m = re.match(pattern, string[i:])
>>         if m:
>>             yield m
>
> Consider the pattern
>
>     a(bc|bcd)
>
> when searched against
>
>     abcd

Yes, that would indeed fail.

Alessandro, can you tell us more about your problem? I suspect you may
be searching for occurrences of a motif, such as GA.CC, within a DNA
sequence. If so, you may not need exactly what you said you needed...
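
For the motif case, one option is to wrap the pattern in a zero-width lookahead, so that the regex engine advances one character at a time instead of jumping past each hit. A minimal sketch follows; the function name find_overlapping and the example sequences are made up for illustration. It reports at most one match per start position, so for a(bc|bcd) against abcd it still returns only "abc" and does not solve the general problem Andrew describes.

```python
import re


def find_overlapping(motif, sequence):
    """Return (start, matched_text) for every start position that matches.

    The motif is placed inside a capturing group within a lookahead.
    The lookahead consumes no characters, so overlapping hits are found;
    group(1) holds the text the motif would have matched.
    """
    pattern = re.compile("(?=(%s))" % motif)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence)]


# A plain finditer("A.A", "ATATA") reports only the hit at position 0.
print(find_overlapping("A.A", "ATATA"))
print(find_overlapping("GA.CC", "TTGATCCGAGCC"))
```

Because the motif's own groups are nested inside the added outer group, their numbers shift up by one in the returned match objects.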

Jeff

From sbassi at asalup.org Thu Oct 23 06:59:45 2003
From: sbassi at asalup.org (Sebastian Bassi)
Date: Thu Oct 23 07:31:04 2003
Subject: [BioPython] help for searching overlapping occurrences
In-Reply-To: <20031022130602.GA26142@telethon>
References: <20031022130602.GA26142@telethon>
Message-ID: <3F97B4A1.6090808@asalup.org>

Alessandro wrote:
> I am looking for a method like finditer from module re, but one that
> returns all the occurrences of a pattern in a string even if they
> overlap each other. Thanks a lot.
> Alessandro Coppe

You may try this:

e-Paper:
tacg, a grep for DNA (Harry J Mangalam).
BMC Bioinformatics 2002, 3:8.
http://www.biomedcentral.com/1471-2105/3/8
Software:
http://tacg.sourceforge.net

I never used this program but it seems it could be useful for this kind
of job.

--
Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center - \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=//

http://Bioinformatica.info

From hjm at tacgi.com Thu Oct 23 12:26:33 2003
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu Oct 23 12:23:41 2003
Subject: [BioPython] help for searching overlapping occurrences
In-Reply-To: <3F97B4A1.6090808@asalup.org>
References: <20031022130602.GA26142@telethon> <3F97B4A1.6090808@asalup.org>
Message-ID: <3F980139.80101@tacgi.com>

I thought you were looking for a Python lib, so I didn't respond. If
interested, you're welcome to a newer version.

It depends on what kind of thing you're looking for.
If a Python module for doing this, then no.
If very long patterns, probably not - tacg only supports ~50 bp patterns
as supplied (although you could recompile to extend this arbitrarily, I
guess).
If you're looking for a standalone app to do it, yes.
It supports IUPAC patterns, regexes, TRANSFAC matrices, and rules
(logical combinations of the previous patterns, as in

    (pattern_A AND pattern_B AND pattern_C) NOT
    ((pattern_D AND pattern_E) XOR (pattern_F OR pattern_G))

in a sliding window of XX bases; useful for finding things like
SARS/MARS).

Both strands at the same time, auto sequence conversion on input,
no limit to sequence length, 30-50x faster than the GCG equivs,
100x more memory efficient than the EMBOSS equivs
(but overall less functionality - only nucleic acid searches supported
currently).

Sebastian Bassi wrote:
> Alessandro wrote:
>
>> I am looking for a method like finditer from module re, but one that
>> returns all the occurrences of a pattern in a string even if they
>> overlap each other. Thanks a lot.
>> Alessandro Coppe
>
> You may try this:
>
> e-Paper:
> tacg, a grep for DNA (Harry J Mangalam).
> BMC Bioinformatics 2002, 3:8.
> http://www.biomedcentral.com/1471-2105/3/8
> Software:
> http://tacg.sourceforge.net
>
> I never used this program but it seems it could be useful for this kind
> of job.

--
Cheers,
Harry
Harry J Mangalam - 949 856 2847 (v&f) - hjm@tacgi.com

From fkauff at duke.edu Fri Oct 24 15:30:31 2003
From: fkauff at duke.edu (Frank Kauff)
Date: Fri Oct 24 15:27:41 2003
Subject: [BioPython] Bug in SeqUtils.quicker_apply_on_multi_fasta()
Message-ID: <1067023831.4762.5.camel@osiris.biology.duke.edu>

Hi there,

there seems to be a bug in SeqUtils.quicker_apply_on_multi_fasta()

Instead of

    if result:
        results.append('>%s\n%s' % (record.title, result))

it should be

    if result:
        results.append('>%s\n%s' % (name, result))

Probably a copy-paste error from apply_on_multi_fasta()

Frank

--
Frank Kauff
Dept.
of Biology
Duke University
Box 90338
Durham, NC 27708
USA

Phone 919-660-7382
Fax 919-660-7293