From Cariaso at yahoo.com Thu Apr 1 19:00:31 2004
From: Cariaso at yahoo.com (Michael Cariaso)
Date: Thu Apr 1 19:05:37 2004
Subject: [BioPython] Problems building biopython-based script with py2exe
In-Reply-To: <20040204234841.GI907@evostick.agtec.uga.edu>
References: <6.0.1.1.0.20040203095808.01e675e8@exchange1.scri.sari.ac.uk>
	<20040204234841.GI907@evostick.agtec.uga.edu>
Message-ID: <406CAD1F.6000901@yahoo.com>

perhaps one more minor issue was overlooked.

Bio/__init__.py : 93
    zipfiles = __import__("Bio.config", {}, {}, ["Bio"]).__loader__.files
doesn't work for me. But (with the '_' before files) it does
    zipfiles = __import__("Bio.config", {}, {}, ["Bio"]).__loader__._files

Brad Chapman wrote:

>Hi Leighton, Andy;
>
>Me:
>
>>>>Could you please verify that I did everything right and didn't mess
>>>>anything up on typing it up? Ugh, all those list comprehensions and
>>>>import hacks and my head is a bit tired right now; so hopefully I
>>>>did a decent job.
>
>Leighton:
>
>>>Just off by a couple of typos:
>
>Thanks -- I just got these all fixed. You also get yourself added to
>the contributors file with happy lines of code. Thanks again for
>your work on this.
>
>Andy:
>
>>Please commit the changes ASAP,
>>I just updated my biopython and tried to install and got an error
>>at line 54 in the Bio.__init__.py
>
>Yeah, I'm a mess. Sorry about that. The fixes were checked in
>(actually about 15 minutes before I received your mail -- how's that
>for service :-). It just might take a bit to propagate over to
>anonymous CVS.
>
>Brad
>_______________________________________________
>BioPython mailing list - BioPython@biopython.org
>http://biopython.org/mailman/listinfo/biopython

From lpritc at scri.sari.ac.uk Fri Apr 2 03:57:23 2004
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Fri Apr 2 04:02:32 2004
Subject: [BioPython] Problems building biopython-based script with py2exe
In-Reply-To: <406CAD1F.6000901@yahoo.com>
References: <6.0.1.1.0.20040203095808.01e675e8@exchange1.scri.sari.ac.uk>
	<20040204234841.GI907@evostick.agtec.uga.edu>
	<406CAD1F.6000901@yahoo.com>
Message-ID: <406D2AF3.9060606@scri.sari.ac.uk>

Well spotted. I see from the CVS that Brad has already fixed it, too -
now /that/ is service :)

(My Windows install is behind the CVS and still has my hack, which may be
the reason besides my poor proofreading that I hadn't noticed it - :oops: )

Michael Cariaso wrote:

> perhaps one more minor issue was overlooked.
>
> Bio/__init__.py : 93
>     zipfiles = __import__("Bio.config", {}, {},
>         ["Bio"]).__loader__.files
> doesn't work for me. But (with the '_' before files) it does
>     zipfiles = __import__("Bio.config", {}, {},
>         ["Bio"]).__loader__._files
>
> Brad Chapman wrote:
>
>> Hi Leighton, Andy;
>>
>> Me:
>>
>>>>> Could you please verify that I did everything right and didn't mess
>>>>> anything up on typing it up? Ugh, all those list comprehensions and
>>>>> import hacks and my head is a bit tired right now; so hopefully I
>>>>> did a decent job.
>>
>> Leighton:
>>
>>>> Just off by a couple of typos:
>>
>> Thanks -- I just got these all fixed. You also get yourself added to
>> the contributors file with happy lines of code. Thanks again for
>> your work on this.
>>
>> Andy:
>>
>>> Please commit the changes ASAP,
>>> I just updated my biopython and tried to install and got an error at
>>> line 54 in the Bio.__init__.py
>>
>> Yeah, I'm a mess. Sorry about that. The fixes were checked in
>> (actually about 15 minutes before I received your mail -- how's that
>> for service :-). It just might take a bit to propagate over to
>> anonymous CVS.
>>
>> Brad
>> _______________________________________________
>> BioPython mailing list - BioPython@biopython.org
>> http://biopython.org/mailman/listinfo/biopython
>
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

-- 
Dr Leighton Pritchard AMRSC
D104, PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
E: lpritc@scri.sari.ac.uk
W: http://bioinf.scri.sari.ac.uk/index.shtml
T: +44 (0)1382 568579
F: +44 (0)1382 568578
PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net

From chapmanb at uga.edu Fri Apr 2 11:20:52 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Apr 2 11:33:54 2004
Subject: [BioPython] error with Fasta.Record?
In-Reply-To: <20040331164354.GA9655@uracil.uio.no>
References: <20040331164354.GA9655@uracil.uio.no>
Message-ID: <20040402162052.GB45713@evostick.agtec.uga.edu>

Hi Karin;

> I use the following code to read in a fasta file:
[...]
> I do this with a test file:
>
> adenine:18:38> cat /med/adenine/u2/projects/locator/gard/testfile
> >1_dapB_to_carA_29196_29650
> gtctataagtgccaaaaattacatgttttgtcttctgtttttgttgttttaatgtaaatt
> ttgaccatttggtccacttttttctgctcgtttttatttcatgcaatc
[...]
> And the files I get look like this:
>
> adenine:18:37> cat /med/adenine/u2/projects/locator/gard/singles/10001
> >1_dapB_to_carA_29196_29650
> GTCTATAAGTGCCAAAAATTACATGTTTTGTCTTCTGTTTTTGTTGTTTTAATGTAAATT

Unfortunately, I'm not able to reproduce this error. I've attached a
test script which uses the quick_FASTA_reader and works with the
f002 file from Tests/Fasta (so you can check it yourself on the same
file and make sure everything works on your platform). If you run
this script on your test file, do you see the same problem?

Without knowing more, I have a couple of guesses about the problem:

1. There is some kind of newline problem. The quick_FASTA_reader is
a pretty simple implementation which probably won't work properly if
fed a file with lots of different newlines (or newlines different
from the platform they are being run on). The best solution here is
to use the full Fasta.RecordParser() for parsing.

2. Your code is somewhere modifying the sequences. It seems like you
have at least a bit of other code in there which is doing things
with the entries. Perhaps they are modified somehow there.

Just guesses though. I'd like to fix the problem but need to distill
this down to a test case so that I can reproduce it. Hopefully my
attached test code helps do this.

Thanks for the report and checking into this.
Brad
-------------- next part --------------
from Bio.SeqUtils import quick_FASTA_reader
from Bio import Fasta

fasta_file = "f002"
outfile = "test-writing.fasta"

outhandle = open(outfile, "w")
entries = quick_FASTA_reader(fasta_file)
for name, seq in entries:
    rec = Fasta.Record()
    rec.title = name
    rec.sequence = seq
    print rec
    outhandle.write(str(rec) + "\n")

outhandle.close()

From chapmanb at uga.edu Fri Apr 2 11:28:24 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Apr 2 11:39:10 2004
Subject: [BioPython] Deprecated Bio.sequtils
Message-ID: <20040402162824.GC45713@evostick.agtec.uga.edu>

Hello everyone;
In checking into a bug report, I suddenly realized for the first
time that Bio/sequtils.py and Bio/SeqUtils/__init__.py duplicate the
same code. It looks like sequtils.py was expanded to an entire
directory at some point but the duplication was never removed. Even
worse, there have been different fixes and additions to both parts.

To fix this problem, I've merged all the changes into
Bio/SeqUtils/__init__.py and officially deprecated Bio/sequtils.py.
The SeqUtils directory contains other useful manipulation code so
this seems the right way to do things without being too confusing.

For now Bio/sequtils.py raises a DeprecationWarning when used, but
still works (by importing the code from Bio/SeqUtils.py into its
namespace). This will give people time to update their scripts
without breaking anything. To fix scripts that reference
Bio/sequtils.py you need to change:

from Bio.sequtils import whatever

to:

from Bio.SeqUtils import whatever

Sorry about any difficulties this may cause. Bio/sequtils.py will
remain around for a while to maintain back compatibility with the
warning. Please let me know if I've done anything which does not
leave it back compatible.

Thanks!
Brad

From fkauff at duke.edu Fri Apr 2 14:42:11 2004
From: fkauff at duke.edu (Frank Kauff)
Date: Fri Apr 2 14:47:19 2004
Subject: [BioPython] Deprecated Bio.sequtils
In-Reply-To: <20040402162824.GC45713@evostick.agtec.uga.edu>
References: <20040402162824.GC45713@evostick.agtec.uga.edu>
Message-ID: <1080934930.2059.5.camel@osiris.biology.duke.edu>

Hi Brad,

while you're at it... :-)

there seems to be a little bug in
SeqUtils.quicker_apply_on_multi_fasta()

Instead of

    if result:
        results.append('>%s\n%s' % (record.title, result))

it should be

    if result:
        results.append('>%s\n%s' % (name, result))

Probably a copy-paste error from apply_on_multi_fasta()

Cheers,
Frank

On Fri, 2004-04-02 at 11:28, Brad Chapman wrote:
> Hello everyone;
> In checking into a bug report, I suddenly realized for the first
> time that Bio/sequtils.py and Bio/SeqUtils/__init__.py duplicate the
> same code. It looks like sequtils.py was expanded to an entire
> directory at some point but the duplication was never removed. Even
> worse, there have been different fixes and additions to both parts.
>
> To fix this problem, I've merged all the changes into
> Bio/SeqUtils/__init__.py and officially deprecated Bio/sequtils.py.
> The SeqUtils directory contains other useful manipulation code so
> this seems the right way to do things without being too confusing.
>
> For now Bio/sequtils.py raises a DeprecationWarning when used, but
> still works (by importing the code from Bio/SeqUtils.py into it's
> namespace. This will give people time to update their scripts
> without breaking anything. To fix Scripts that reference
> Bio/sequtils.py you need to change:
>
> from Bio.sequtils import whatever
>
> to:
>
> from Bio.SeqUtils import whatever
>
> Sorry about any difficulties this may cause. Bio/sequtils.py will
> remain around for a while to maintain back compatibility with the
> warning. Please let me know if I've done anything which does not
> leave it back compatible.
>
> Thanks!
> Brad
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
-- 
Frank Kauff
Dept. of Biology
Duke University Box 90338
Durham, NC 27708
USA

Phone 919-660-7382
Fax 919-660-7293

From chapmanb at uga.edu Fri Apr 2 15:41:24 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Fri Apr 2 15:52:11 2004
Subject: [BioPython] Deprecated Bio.sequtils
In-Reply-To: <1080934930.2059.5.camel@osiris.biology.duke.edu>
References: <20040402162824.GC45713@evostick.agtec.uga.edu>
	<1080934930.2059.5.camel@osiris.biology.duke.edu>
Message-ID: <20040402204124.GD45713@evostick.agtec.uga.edu>

Hi Frank;

> there seems to be a little bug in
> SeqUtils.quicker_apply_on_multi_fasta()

Thanks! All checked into CVS -- just drop a line is you spot
anything else.

Brad

From mdehoon at ims.u-tokyo.ac.jp Sun Apr 4 23:37:54 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Sun Apr 4 23:43:23 2004
Subject: [BioPython] kMeans.py -> Bio.Cluster
Message-ID: <4070D492.20107@ims.u-tokyo.ac.jp>

Brad:
> 1. Starting to raise a Deprecation Warning for the kMeans module.
> 2. Trying to write some kind of short document on how to switch from
> using kMeans to using Bio.Cluster.kcluster. BioPerl has a document
> called DEPRECATED with this kind of info -- that seems like a
> reasonable step to follow. Jeff and Michiel, would it be possible to
> write something up quick.
> 3. Thomas needs to decide if he wants to rewrite xkMeans or deprecate
> it as well.

1. I have added a Deprecation Warning to the kMeans module and the xkMeans
module. Btw, it seems that the import statement in xkMeans.py is no longer
valid (from Bio.Clustering import kMeans).
2. Below is my attempt at the DEPRECATED doc. Jeff, could you have a look
at this to see if there are any mistakes? Thanks!

--Michiel.

Moving from kMeans.py to Bio.Cluster
====================================

The k-means algorithm is an algorithm for unsupervised clustering of data.
Biopython includes an implementation of the k-means clustering algorithm in
kMeans.py. Recently, a larger set of clustering algorithms entered Biopython
as Bio.Cluster. As the kcluster routine in Bio.Cluster also implements the
k-means clustering algorithm, the kMeans.py module has been deprecated. This
document describes how to switch from kMeans.py to Bio.Cluster's kcluster.

The function kcluster in Bio.Cluster performs k-means or k-medians
clustering. The corresponding function in kMeans.py is called cluster. This
function takes the following arguments:
o data
o k
o distance_fn
o init_centroids_fn
o calc_centroid_fn
o max_iterations
o update_fn

The function kcluster in Bio.Cluster takes the following arguments:
o data
o nclusters
o mask
o weight
o transpose
o npass
o method
o dist
o initialid

Arguments for kMeans.py's cluster, and their equivalents in Bio.Cluster
=======================================================================

data
----
In kMeans.py, data is a list of vectors, each containing the same number of
data points. Within the context of clustering genes based on their gene
expression values, each vector would correspond to the gene expression data
of one particular gene, and the values in the vector would correspond to the
gene expression values measured by the different microarrays. The cluster
routine in kMeans.py always performs a row-wise clustering by grouping
vectors.

The argument data to Bio.Cluster's kcluster has the same structure as in
kMeans.py. However, Bio.Cluster allows row-wise and column-wise clustering by
the transpose argument. If transpose==0 (the default value), kcluster
performs row-wise clustering, consistent with kMeans.py. If transpose==1,
kcluster performs column-wise clustering. The same behavior can be obtained,
of course, by transposing the data array before calling kcluster.

k
-
The desired number of clusters is specified by the input argument k in
kMeans.py. The corresponding argument in Bio.Cluster's kcluster is nclusters.

distance_fn
-----------
In kMeans.py, the argument distance_fn represents the distance function to
calculate the distances between items and cluster centroids. This argument
corresponds to a true Python function. The default value is the Euclidean
distance, implemented as distance.euclidean in distance.py. User-defined
distance functions can also be used.

The k-means routine in Bio.Cluster does not allow user-specified distance
functions. Instead, it provides the following nine built-in distance
functions, depending on the argument dist:
dist=='e': Euclidean distance
dist=='h': Harmonically summed Euclidean distance
dist=='b': City-block distance
dist=='c': Pearson correlation
dist=='a': absolute value of the Pearson correlation
dist=='u': uncentered correlation
dist=='x': absolute uncentered correlation
dist=='s': Spearman's rank correlation
dist=='k': Kendall's tau
User-defined distance functions are possible only by modifying the C code in
cluster.c (which may not be as hard as it sounds). The default distance
function is the Euclidean distance (dist=='e').

Note that in Bio.Cluster the Euclidean distance is defined as the sum of
squared differences, whereas in kMeans.py the square root of this quantity is
taken. This does not affect the clustering result.

init_centroids_fn
-----------------
This function specifies the initial choice for the cluster centroids. By
default, cluster in kMeans.py uses a random initial choice of cluster
centroids by randomly choosing k data vectors from the input vectors in the
data input argument. Alternatively, the user can specify a user-defined
function to choose the initial cluster centroids.

In Bio.Cluster, the k-means algorithm in kcluster starts from an initial
cluster assignment instead of an initial choice of cluster centroids. As far
as I know, these two initialization methods are equivalent in practice.
Similar to the cluster routine in kMeans.py, Bio.Cluster's kcluster performs
a random initial assignment of items to clusters. Alternatively, users can
specify a (deterministic) initial clustering via the initialid argument. This
argument is None by default. If not None, it should be a 1D array (or list)
containing the number (between 0 and nclusters-1) of the cluster to which
each item is assigned initially.

Note that the k-means routine in Bio.Cluster performs automatic repeats of
the algorithm, each time starting from a different random initial clustering.
See the comment for the npass argument below.

calc_centroid_fn
----------------
This argument specifies how to calculate the cluster centroids, given the
data vectors of the items that belong to each cluster. By default, the mean
over the vectors is calculated. A user-defined function can also be used.

Bio.Cluster's kcluster does not allow user-defined functions. Instead, the
method to calculate the cluster centroid is determined by the argument
method, which can be either 'a' (arithmetic mean) or 'm' (median). The
default is to calculate the mean ('a').

max_iterations
--------------
The cluster routine in kMeans.py has an argument max_iterations, which is
used to stop the iteration if the routine does not converge after the given
number of iterations. The kcluster routine in Bio.Cluster does not have such
an argument. The failure of a k-means algorithm to converge is due to the
occurrence of periodic clustering solutions during the course of the k-means
algorithm. The kcluster routine in Bio.Cluster automatically checks for the
occurrence of such a periodicity in the solutions. If a periodic behavior is
detected, the algorithm is interrupted and the last clustering solution is
returned. Accordingly, the kcluster routine is guaranteed to return a
clustering solution. Also see the discussion of the npass argument below.

update_fn
---------
The argument update_fn to cluster in kMeans.py is a hook function that is
called at the beginning of every iteration and passed the iteration number,
cluster centroids, and current cluster assignments. It is used by xkMeans.py,
which provides a visualization of k-means clustering. Currently there is no
equivalent in Bio.Cluster.

Other arguments for Bio.Cluster's kcluster
==========================================

Three arguments in Bio.Cluster's kcluster do not have a direct equivalent in
kMeans.py's cluster.

mask
----
Microarray experiments tend to suffer from a large number of missing data
points. The argument mask to Bio.Cluster's kcluster lets the user specify
which data are missing. This argument is an array with the same shape as
data, and contains a 1 for each data point that is present, and a 0 for a
missing data point:
mask[i,j]==1: data[i,j] is valid
mask[i,j]==0: data[i,j] is a missing data point
Missing data points are ignored by the clustering algorithm. By default, mask
is an array containing 1's everywhere.

weight
------
The weight argument is used to put different weights on different data
points. For example, when clustering genes based on their gene expression
profile, we may want to attach a bigger weight to some microarrays compared
to others. By default, the weight argument contains equal weights of 1.0 for
all data points. Note that for row-wise clustering, the weight argument is a
1D vector whose length is equal to the number of columns. For column-wise
clustering, the length of this argument is equal to the number of rows.

npass
-----
Typical implementations of the k-means clustering algorithm rely on a random
initialization. Unlike Self-Organizing Maps, however, the k-means algorithm
has a clearly defined goal, which is to minimize the within-cluster sum of
distances. Different k-means clustering solutions (based on different initial
clusterings) can therefore be compared to each other directly. In order to
increase the chance of finding the optimal k-means clustering solution, the
k-means routine in Bio.Cluster automatically repeats the algorithm npass
times, each time starting from a different initial random clustering. The
best clustering solution, as well as in how many of the npass attempts it was
found, is returned to the user. For more information, see the output variable
nfound below. The default value of npass is 1.

Return values
=============

The cluster routine in kMeans.py returns two values, centroids and clusters.
The kcluster routine in Bio.Cluster returns four values: clusterid,
centroids, error, and nfound.

centroids
---------
The centroids return value contains the centroids of the k clusters that were
found, and corresponds to the centroids return value from Bio.Cluster's
kcluster routine.

clusters
--------
The clusters return value contains the number of the cluster to which each
vector was assigned. The corresponding return value in Bio.Cluster's kcluster
is clusterid.

error
-----
The error return value from Bio.Cluster's kcluster is the within-cluster sum
of distances for the optimal clustering solution that was found. This value
can be used to compare different clustering solutions to each other.

nfound
------
The nfound return value from Bio.Cluster's kcluster shows in how many of the
npass runs the optimal clustering solution was found. Accordingly, nfound is
at least 1 and at most equal to npass. A large value for nfound is an
indication that the clustering solution that was found is optimal. On the
other hand, if nfound is equal to 1, it is very well possible that a better
clustering solution exists than the one found by kcluster.

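
[Illustration] The interplay of npass, error and nfound described above can
be sketched in plain Python. This is only a toy under stated assumptions, not
the C implementation behind Bio.Cluster: the names kmeans_npass, single_pass,
canonical and the DATA points are hypothetical, the iteration cap stands in
for the periodicity check, and the reseeding of an emptied cluster is a
choice made for this sketch. It starts each pass from a random initial
assignment, uses the sum of squared differences as distance, and counts how
many passes reproduce the best partition. It assumes at least as many items
as clusters.

```python
import random

# Toy data: two well-separated groups of three 2-D points (made up for illustration).
DATA = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
        [8.0, 8.0], [8.2, 7.9], [7.9, 8.1]]


def squared_distance(a, b):
    """Sum of squared differences (no square root is taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(rows):
    """Arithmetic mean of a non-empty list of equal-length vectors."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def canonical(clusterid):
    """Relabel clusters in order of first appearance, so that two assignments
    describing the same partition compare equal."""
    mapping = {}
    return [mapping.setdefault(label, len(mapping)) for label in clusterid]


def within_cluster_error(data, clusterid, nclusters):
    """Within-cluster sum of squared distances to each cluster's mean."""
    total = 0.0
    for c in range(nclusters):
        members = [row for row, label in zip(data, clusterid) if label == c]
        if members:
            centre = mean_vector(members)
            total += sum(squared_distance(row, centre) for row in members)
    return total


def single_pass(data, nclusters, rng, max_iterations=100):
    """One k-means run from a random initial assignment of items to clusters."""
    # Every cluster starts non-empty because labels cycle through 0..nclusters-1.
    clusterid = [i % nclusters for i in range(len(data))]
    rng.shuffle(clusterid)
    for _ in range(max_iterations):
        centroids = []
        for c in range(nclusters):
            members = [row for row, label in zip(data, clusterid) if label == c]
            # An emptied cluster is reseeded from a random item (sketch-only choice).
            centroids.append(mean_vector(members) if members else list(rng.choice(data)))
        new_ids = [min(range(nclusters), key=lambda c: squared_distance(row, centroids[c]))
                   for row in data]
        if new_ids == clusterid:  # assignments stopped changing: converged
            break
        clusterid = new_ids
    return clusterid, within_cluster_error(data, clusterid, nclusters)


def kmeans_npass(data, nclusters, npass=1, seed=None):
    """Repeat single_pass npass times; return (clusterid, error, nfound)."""
    rng = random.Random(seed)
    best_ids, best_error, nfound = None, None, 0
    for _ in range(npass):
        clusterid, error = single_pass(data, nclusters, rng)
        if best_ids is None or error < best_error - 1e-9:
            best_ids, best_error, nfound = clusterid, error, 1
        elif canonical(clusterid) == canonical(best_ids):
            nfound += 1  # this pass reproduced the best partition found so far
    return best_ids, best_error, nfound


clusterid, error, nfound = kmeans_npass(DATA, nclusters=2, npass=5, seed=0)
print(canonical(clusterid), error, nfound)
```

Comparing partitions through canonical() rather than raw labels matters here
because two passes can find the same grouping while numbering the clusters
differently.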
-- 
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From jeffrey_chang at stanfordalumni.org Mon Apr 5 10:13:10 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Mon Apr 5 10:18:11 2004
Subject: [BioPython] kMeans.py -> Bio.Cluster
In-Reply-To: <4070D492.20107@ims.u-tokyo.ac.jp>
References: <4070D492.20107@ims.u-tokyo.ac.jp>
Message-ID: <5E359854-870B-11D8-8D4C-000A956845CE@stanfordalumni.org>

On Apr 4, 2004, at 11:37 PM, Michiel Jan Laurens de Hoon wrote:

> 1. I have added a Deprecation Warning to the kMeans module and the
> xkMeans module. Btw, it seems that the import statement in xkMeans.py
> is no longer valid (from Bio.Clustering import kMeans).
> 2. Below is my attempt at the DEPRECATED doc. Jeff, could you have a
> look at this to see if there are any mistakes? Thanks!

This looks fantastic, Michiel! It seems accurate and complete to me.

Jeff

From pieter at kotnet.org Mon Apr 5 16:58:31 2004
From: pieter at kotnet.org (pieter@kotnet.org)
Date: Mon Apr 5 17:16:10 2004
Subject: [BioPython] Non blocking blast.
Message-ID: <87y8pagn0o.fsf@hades.kotnet.org>

Hello,

Is there a way to use biopython to submit let say 100 jobs to blast.
Without waiting for them, storing the request ids. And than afterwards
reading the ids and checking which results are available?

Thanks in advanvd,

Pieter

From jeffrey_chang at stanfordalumni.org Mon Apr 5 18:26:49 2004
From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang)
Date: Mon Apr 5 18:32:21 2004
Subject: [BioPython] Non blocking blast.
In-Reply-To: <87y8pagn0o.fsf@hades.kotnet.org>
References: <87y8pagn0o.fsf@hades.kotnet.org>
Message-ID: <54AB7104-8750-11D8-B659-000A956845CE@stanfordalumni.org>

You may be able to use Bio.MultiProc.copen to do something like that.
But you do realize that NCBI BLAST is a shared resource, right?

Jeff

On Apr 5, 2004, at 4:58 PM, pieter@kotnet.org wrote:

> Hello,
>
> Is there a way to use biopython to submit let say 100 jobs to blast.
> Without waiting for them, storing the request ids. And than afterwards
> reading the ids and checking which results are available?
>
> Thanks in advanvd,
>
> Pieter
>
> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython

From dalke at dalkescientific.com Tue Apr 6 01:00:49 2004
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Tue Apr 6 01:07:18 2004
Subject: [BioPython] Non blocking blast.
In-Reply-To: <87y8pagn0o.fsf@hades.kotnet.org>
References: <87y8pagn0o.fsf@hades.kotnet.org>
Message-ID: <5F054E10-8787-11D8-B94E-000393C92466@dalkescientific.com>

pieter@kotnet.org:
> Is there a way to use biopython to submit let say 100 jobs to blast.
> Without waiting for them, storing the request ids. And than afterwards
> reading the ids and checking which results are available?

NCBI won't like it if you do 100 BLASTs at once, but let's suppose
it's a hypothetical.

Biopython's BLAST looks like a function call. That is, it hides
that it's doing network I/O. The standard way to parallelize it
is to use threads, and for this the standard idiom is boss/worker.
One thread creates two Queue.Queue instances, one for job
requests and the other for job results. It then starts up N
other threads, each of which knows about the Queues. The boss
thread submits the jobs (as a simple data structure) to the
queue. Each worker thread does a get on the queue to get the
next job and does the Biopython BLAST request. When done, the
worker thread returns the information in the results Queue.
While waiting the boss thread can do whatever else is needed.

Aahz wrote some documentation about this idiom ... probably
http://starship.python.net/crew/aahz/OSCON2001/

Andrew
dalke@dalkescientific.com

From fkauff at duke.edu Tue Apr 6 10:32:56 2004
From: fkauff at duke.edu (Frank Kauff)
Date: Tue Apr 6 10:37:53 2004
Subject: [BioPython] Non blocking blast.
In-Reply-To: <5F054E10-8787-11D8-B94E-000393C92466@dalkescientific.com>
References: <87y8pagn0o.fsf@hades.kotnet.org>
	<5F054E10-8787-11D8-B94E-000393C92466@dalkescientific.com>
Message-ID: <1081261976.2059.13.camel@osiris.biology.duke.edu>

Folks,

On Tue, 2004-04-06 at 01:00, Andrew Dalke wrote:
> pieter@kotnet.org:
> > Is there a way to use biopython to submit let say 100 jobs to blast.
> > Without waiting for them, storing the request ids. And than afterwards
> > reading the ids and checking which results are available?
>
> NCBI won't like it if you do 100 BLASTs at once, but let's suppose
> it's a hypothetical.
>
> Biopython's BLAST looks like a function call. That is, it hides
> that it's doing network I/O. The standard way to parallelize it
> is to use threads, and for this the standard idiom is boss/worker.
> One thread creates two Queue.Queue instances, one for job
> requests and the other for job results. It then starts up N
> other threads, each of which knows about the Queues. The boss
> thread submits the jobs (as a simple data structure) to the
> queue. Each worker thread does a get on the queue to get the
> next job and does the Biopython BLAST request. When done, the
> worker thread returns the information in the results Queue.
> While waiting the boss thread can do whatever else is needed. > I've a little (crude) script ready that does that, blasting a fasta file of sequences using threads. It can be useful for blasting a 96 plate of sequences overnight. But be careful - as Jeff mentioned, blast is a shared resource: - for each additional request in the blast queue, you'll get a 60 (or so) seconds penalty from NCBI: 60s for the second, 120s for the third, etc. Makes to many threads quite unattractive... - If you start too many blasts in a short time, after hitting some limit the only response will be a nice page saying 'Access denied due to possible misuse', and your IP will be blocked from further access to ncbi blast... You'll then have to write them a nice email and beg for grace. Happend to me while testing some automated blast feature :-) But the limit seems to be several 100 requests in like 24h, which is quite a lot. If you're interested in the script, send me an email. Frank > Aahz wrote some documentation about this idiom ... probably > http://starship.python.net/crew/aahz/OSCON2001/ > > Andrew > dalke@dalkescientific.com > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython -- Frank Kauff Dept. of Biology Duke University Box 90338 Durham, NC 27708 USA Phone 919-660-7382 Fax 919-660-7293 From Neal5Mcguire at rock.com Tue Apr 6 18:10:16 2004 From: Neal5Mcguire at rock.com (Neal Mcguire) Date: Tue Apr 6 14:13:08 2004 Subject: [BioPython] 14 Date Features Message-ID: Hello, Friends have sent you an invitation for a surprise date. Hurry! http://needtolookforlove.com/confirm/?oc=52212355 This commun-ication is privileged and contains confi.dential information - intended on.ly for the person(s) to whom it is addressed. Any unauthorized disclosure, copying, other distribution of this communi.cation or taking any action .on its contents is strictly prohibited. If you have received. 
From cook_jim at yahoo.com Tue Apr 6 16:07:47 2004 From: cook_jim at yahoo.com (J Cook) Date: Tue Apr 6 16:12:44 2004 Subject: [BioPython] numeric Message-ID: <20040406200747.71663.qmail@web10507.mail.yahoo.com> Hi, I'm pulling together the dependencies for biopython and need to know whether I should download "numeric" or "numarray". Thanks, Jim Cook ===== Email from: Jim Cook Reply to : cook_jim@yahoo.com cookjim@ieee.org __________________________________ Do you Yahoo!? Yahoo! Small Business $15K Web Design Giveaway http://promotions.yahoo.com/design_giveaway/ From cook_jim at yahoo.com Tue Apr 6 17:07:36 2004 From: cook_jim at yahoo.com (J Cook) Date: Tue Apr 6 17:12:32 2004 Subject: [BioPython] Installation help Message-ID: <20040406210736.56677.qmail@web10509.mail.yahoo.com> Hi, I've completed the installation of the biopython dependencies and would like to run the rigorous tests mentioned in the documentation. However, I don't see a "Tests" directory in the installation. What do I need to do to run the tests? Thanks, Jim Cook ===== Email from: Jim Cook Reply to : cook_jim@yahoo.com cookjim@ieee.org __________________________________ Do you Yahoo!? Yahoo!
Small Business $15K Web Design Giveaway http://promotions.yahoo.com/design_giveaway/ From chapmanb at uga.edu Tue Apr 6 17:44:26 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Tue Apr 6 17:54:59 2004 Subject: [BioPython] kMeans.py -> Bio.Cluster In-Reply-To: <4070D492.20107@ims.u-tokyo.ac.jp> References: <4070D492.20107@ims.u-tokyo.ac.jp> Message-ID: <20040406214426.GA25784@evostick.agtec.uga.edu> Hi Michiel; > 1. I have added a Deprecation Warning to the kMeans module and the xkMeans > module. Btw, it seems that the import statement in xkMeans.py is no longer > valid (from Bio.Clustering import kMeans). > 2. Below is my attempt at the DEPRECATED doc. Jeff, could you have a look > at this to see if there are any mistakes? Thanks! Thanks for doing this! The deprecated documentation is excellent and I definitely support the deprecation of xkMeans.py as well. From the invalid import statement it is likely not it use much and if it doesn't fit with the new system we might as well let it fall to the side. Glad to have things coming together nicely. I think I am going to push for another release soonish (probably the week after next), so if there is more work you want to do before then on the clustering stuff please feel free to go ahead. Thanks again. Brad From chapmanb at uga.edu Tue Apr 6 18:39:02 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Tue Apr 6 18:49:36 2004 Subject: [BioPython] Installation help In-Reply-To: <20040406210736.56677.qmail@web10509.mail.yahoo.com> References: <20040406210736.56677.qmail@web10509.mail.yahoo.com> Message-ID: <20040406223902.GE25784@evostick.agtec.uga.edu> Hi Jim; [Condensing both answers into a single mail] > I'm pulling together the dependencies for biopython > and need to know whether I should download "numeric" > or "numarray". Numeric. Although Numarray is supposed to be API compatible to some extent with Numeric, I don't believe that anyone has fully tested that out. 
Numeric should work for now -- at some point in the future someone will likely go through and ensure everything works and we'll move on to Numarray. > I've completed the installation of the biopython > dependencies and would like to run the rigorous tests > mentioned in the documentation. However, I don't see > a "Tests" directory in the installation. What do I > need to do to run the tests? If you've installed it from source, then inside the biopython-1.24 directory there should be a Tests directory. You can either change to the Tests directory and do python run_tests.py (add --no-gui if you don't want a Tk GUI) or just do python setup.py test from the main installation directory. If you are on Windows and installed using a Windows installer, you need to download the source and unpack it to find the Tests. Hope this helps. Brad From sbassi at asalup.org Tue Apr 6 18:52:06 2004 From: sbassi at asalup.org (Sebastian Bassi) Date: Tue Apr 6 18:58:12 2004 Subject: [BioPython] kMeans.py -> Bio.Cluster In-Reply-To: <20040406214426.GA25784@evostick.agtec.uga.edu> References: <4070D492.20107@ims.u-tokyo.ac.jp> <20040406214426.GA25784@evostick.agtec.uga.edu> Message-ID: <40733496.2060609@asalup.org> Brad Chapman wrote: > Glad to have things coming together nicely. I think I am going to > push for another release soonish (probably the week after next), so > if there is more work you want to do before then on the clustering > stuff please feel free to go ahead. Please wait for another update of Tm calculation. I received a buf report (seems to be a bad value on the deltaS table and some other minor stuff). 
-- Best regards, //=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\ \=// IT Manager Advanta Seeds - Balcarce Research Center - \=// //=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\ \=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=// http://Bioinformatica.info From dalke at dalkescientific.com Tue Apr 6 23:14:13 2004 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue Apr 6 23:20:31 2004 Subject: [BioPython] BOSC Message-ID: Hi all, Just a reminder -- only a bit over a week to submit a talk proposal for BOSC. In addition to the talks there will also be lightning talks (5-7 minutes max) and a demo session for showing off your software to others in a small group rather than to everyone at once. Hope to see many of you there! Andrew dalke@dalkescientific.com From mdehoon at ims.u-tokyo.ac.jp Wed Apr 7 08:19:27 2004 From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon) Date: Wed Apr 7 08:24:26 2004 Subject: [BioPython] Questions & suggestions In-Reply-To: References: <401AC30A.E639BD2E@ebc.uu.se> <200403191023.01597.thamelry@binf.ku.dk> <20040319173703.GC95219@evostick.agtec.uga.edu> <200403192151.05004.thamelry@binf.ku.dk> <20040321174605.GA18818@evostick.agtec.uga.edu> Message-ID: <4073F1CF.8050809@ims.u-tokyo.ac.jp> Jeffrey Chang wrote: > SVM is superceded by libsvm. It should be deprecated. Is libsvm in Biopython? I couldn't find it there. I am now using Bio.SVM, which seems to have a problem with large data sets, so I'd like to try libsvm. --Michiel. 
-- Michiel de Hoon, Assistant Professor University of Tokyo, Institute of Medical Science Human Genome Center 4-6-1 Shirokane-dai, Minato-ku Tokyo 108-8639 Japan http://bonsai.ims.u-tokyo.ac.jp/~mdehoon From jeffrey_chang at stanfordalumni.org Wed Apr 7 09:25:51 2004 From: jeffrey_chang at stanfordalumni.org (Jeffrey Chang) Date: Wed Apr 7 09:30:49 2004 Subject: [BioPython] Questions & suggestions In-Reply-To: <4073F1CF.8050809@ims.u-tokyo.ac.jp> References: <401AC30A.E639BD2E@ebc.uu.se> <200403191023.01597.thamelry@binf.ku.dk> <20040319173703.GC95219@evostick.agtec.uga.edu> <200403192151.05004.thamelry@binf.ku.dk> <20040321174605.GA18818@evostick.agtec.uga.edu> <4073F1CF.8050809@ims.u-tokyo.ac.jp> Message-ID: <17306090-8897-11D8-91D5-000A956845CE@stanfordalumni.org> libsvm is a general SVM library, not related to Biopython. http://www.csie.ntu.edu.tw/~cjlin/libsvm/ It has a Python interface. Jeff On Apr 7, 2004, at 8:19 AM, Michiel Jan Laurens de Hoon wrote: > Jeffrey Chang wrote: >> SVM is superceded by libsvm. It should be deprecated. > > Is libsvm in Biopython? I couldn't find it there. I am now using > Bio.SVM, which seems to have a problem with large data sets, so I'd > like to try libsvm. > > --Michiel. > > > -- > Michiel de Hoon, Assistant Professor > University of Tokyo, Institute of Medical Science > Human Genome Center > 4-6-1 Shirokane-dai, Minato-ku > Tokyo 108-8639 > Japan > http://bonsai.ims.u-tokyo.ac.jp/~mdehoon > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython From jgreena at emory.edu Wed Apr 7 07:34:50 2004 From: jgreena at emory.edu (jgreena@emory.edu) Date: Wed Apr 7 09:31:14 2004 Subject: [BioPython] Re: Question Message-ID: <200404071155.i37Btgg2010751@portal.open-bio.org> WARNING: This e-mail has been altered by MIMEDefang. Following this paragraph are indications of the actual changes made. 
For more information about your site's MIMEDefang policy, contact MIMEDefang Administrator's . For more information about MIMEDefang, see: http://www.roaringpenguin.com/mimedefang/enduser.php3 An attachment named word_doc_biopython.pif was removed from this document as it constituted a security hazard. If you require this document, please contact the sender and arrange an alternate means of receiving it. -------------- next part -------------- I have attached the sample. From cook_jim at yahoo.com Wed Apr 7 16:35:11 2004 From: cook_jim at yahoo.com (J Cook) Date: Wed Apr 7 16:40:04 2004 Subject: [BioPython] Testing of installation Message-ID: <20040407203511.64895.qmail@web10503.mail.yahoo.com> Hi, I can't find the file "run_tests.py" referred to in the biopython documentation. Is there another file that will test the entire installation? Thanks, Jim Cook ===== Email from: Jim Cook Reply to : cook_jim@yahoo.com cookjim@ieee.org __________________________________ Do you Yahoo!? Yahoo! Small Business $15K Web Design Giveaway http://promotions.yahoo.com/design_giveaway/ From cook_jim at yahoo.com Wed Apr 7 16:43:52 2004 From: cook_jim at yahoo.com (J Cook) Date: Wed Apr 7 16:48:47 2004 Subject: [BioPython] Installation help Message-ID: <20040407204352.29555.qmail@web10505.mail.yahoo.com> Hi Brad, That answers my questions. I am on a Windows machine and I used a Windows installer, so I'll download the source to get the tests. Thanks, Jim Cook ===== Email from: Jim Cook Reply to : cook_jim@yahoo.com cookjim@ieee.org __________________________________ Do you Yahoo!? Yahoo! Small Business $15K Web Design Giveaway http://promotions.yahoo.com/design_giveaway/ From pieter at laeremans.org Thu Apr 8 08:37:17 2004 From: pieter at laeremans.org (Pieter Laeremans) Date: Thu Apr 8 08:42:36 2004 Subject: [BioPython] Non blocking blast. 
References: <87y8pagn0o.fsf@hades.kotnet.org> <5F054E10-8787-11D8-B94E-000393C92466@dalkescientific.com> <1081261976.2059.13.camel@osiris.biology.duke.edu> Message-ID: <87vfkaabnm.fsf@hades.kotnet.org> Frank Kauff writes: > > I've a little (crude) script ready that does that, blasting a fasta file > of sequences using threads. It can be useful for blasting a 96 plate of > sequences overnight. > But be careful - as Jeff mentioned, blast is a shared resource: > - for each additional request in the blast queue, you'll get a 60 (or > so) seconds penalty from NCBI: 60s for the second, 120s for the third, > etc. Makes to many threads quite unattractive... > - If you start too many blasts in a short time, after hitting some limit > the only response will be a nice page saying 'Access denied due to > possible misuse', and your IP will be blocked from further access to > ncbi blast... You'll then have to write them a nice email and beg for > grace. Happend to me while testing some automated blast feature :-) But > the limit seems to be several 100 requests in like 24h, which is quite a > lot. > > If you're interested in the script, send me an email. > > Frank Thank you all very much for the input. But I think I have no other option than submitting one job at a time. Since it is of utmost importance that I do not get blocked. kind regards, Pieter From biosql at hotmail.com Thu Apr 8 16:31:14 2004 From: biosql at hotmail.com (Jonathan Boulais) Date: Thu Apr 8 17:03:47 2004 Subject: [BioPython] Need help to get Fasta sequence of Gis ! Message-ID: An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/biopython/attachments/20040408/bd741961/attachment.htm From idoerg at burnham.org Thu Apr 8 17:07:08 2004 From: idoerg at burnham.org (Iddo Friedberg) Date: Thu Apr 8 17:12:31 2004 Subject: [BioPython] Need help to get Fasta sequence of Gis ! In-Reply-To: References: Message-ID: <4075BEFC.5000303@burnham.org> Welcome! 
Since the list is huge, I guess you should do it standalone, rather than via the net. How about downloading nr or np as the case may be from NCBI. The gi numbers should be in the fasta headers. Then use the fasta parser (see tutorial) to parse through the file, and retrieve those sequences which you need. email me if you need more info, Iddo Jonathan Boulais wrote: > Hi everyone ! > > I'm a newbie to Biopython and I would like to get the fasta sequences of > a huge list of Gis. Any suggestions ? > > Thanks ! > > ------------------------------------------------------------------------ > MSN Search, le moteur de recherche qui pense comme vous ! Cliquez-ici > > > > ------------------------------------------------------------------------ > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython -- Iddo Friedberg, Ph.D. The Burnham Institute 10901 N. Torrey Pines Rd. La Jolla, CA 92037 USA Tel: +1 (858) 646 3100 x3516 Fax: +1 (858) 713 9930 http://ffas.ljcrf.edu/~iddo From dalke at dalkescientific.com Thu Apr 8 17:15:37 2004 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu Apr 8 17:21:53 2004 Subject: [BioPython] Need help to get Fasta sequence of Gis ! In-Reply-To: References: Message-ID: Jonathan Boulais: > Hi everyone ! > I'm a newbie to Biopython Welcome! > and I would like to get the fasta sequences of a huge list of Gis. Any > suggestions ? How huge? At some point it's better to just download GenBank and get the data straight from there. If it's small enough (10,000 or fewer records?), then look at the Bio.EUtils client. 
>>> from Bio import EUtils
>>> from Bio.EUtils import ThinClient
>>> client = ThinClient.ThinClient()
>>> dbids = EUtils.DBIds("protein", ["914034", "5263173", "1769808", "1060883"])
>>> f = client.efetch_using_dbids(dbids, retmode = "text", rettype = "fasta")
>>> print f.read()
>gi|914034|gb|AAB32951.1| cruxrhodopsin-2 [Haloarcula]
MLQSGMSTYVPGGESIFLWVGTAGMFLGMLYFIARGWSVSDQRRQKFYIATIMIAAIAFVNYLSMALGFG
VTTIELGGEERAIYWARYTDWLFTTPLLLYDLALLAGADRNTIYSLVGLDVLMIGTGALATLSAGSGVLP
AGAERLVWWGISTGFLLVLLYFLFSNLTDRASELSGDLQSKFSTLRNLVLVLWLVYPVLWLVGTEGLGLV
GLPIETAAFMVLDLTAKIGFGIILLQSHAVLDEGQTASEGAAVAD
>gi|5263173|dbj|BAA81816.1| cruxrhodopsin [Haloarcula japonica]
MPEPGSEAIWLWLGTAGMFLGMLYFIGRGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEFA
GEEHPIYWARYSDWLFTTPLLLYDLGLLAGADRNTIASLVSLDVLMIGTGLVATLSAGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLIGTEGLGLVGIGIETA
GFMVIDLTAKVGFGIILLRSHGVLDGAAETTGAGATATAD
>gi|1769808|dbj|BAA06680.1| cruxrhodopsin-3 [Haloarcula vallismortis]
MPAPEGEAIWLWLGTAGMFLGMLYFIARGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEIA
GEQRPIYWARYSDWLFTTPLLLYDLGLLAGADRNTISSLVSLDVLMIGTGLVATLSAGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLVGTEGIGLVGIGIETA
GFMVIDLVAKVGFGIILLRSHGVLDGAAETTGAGATATAD
>gi|1060883|dbj|BAA06678.1| cruxrhodopsin-1 [Haloarcula argentinensis]
MPEPGSEAIWLWLGTAGMFLGMLYFIARGWGETDSRRQKFYIATILITAIAFVNYLAMALGFGLTIVEFA
GEEHPIYWARYSDWLFTTPLLLYDLGLLAGADRNTITSLVSLDVLMIGTGLVATLSPGSGVLSAGAERLV
WWGISTAFLLVLLYFLFSSLSGRVADLPSDTRSTFKTLRNLVTVVWLVYPVWWLIGTEGIGLVGIGIETA
GFMVIDLTAKVGFGIILLRSHGVLDGAAETTGTGATPADD

I'm working on a cleanup of EUtils to make some of the machinery
disappear. I expect the result will let you do

import EUtils
f = EUtils.efetch("protein", ["914034", "5263173", "1769808", "1060883"], format = "fasta")
print f.read()

Is anyone here using EUtils? I would like to see some code which
uses it, to make sure I don't break things and to see if I can
improve the API.
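
For the standalone route suggested in this thread (download the database as FASTA and pull the wanted records out locally), a minimal sketch in plain Python 3 might look like the following. It does not use any Biopython API; the function names `read_fasta`, `gi_of` and `filter_by_gi` are made up for illustration, and it assumes NCBI-style headers of the form `>gi|914034|gb|...` as in the output above.

```python
def read_fasta(handle):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def gi_of(header):
    """Return the GI number from a 'gi|NNN|...' header, or None."""
    fields = header.split("|")
    if len(fields) >= 2 and fields[0] == "gi":
        return fields[1]
    return None


def filter_by_gi(handle, wanted):
    """Yield only the records whose GI number is in `wanted`."""
    wanted = set(str(gi) for gi in wanted)  # set lookup stays fast for huge lists
    for header, seq in read_fasta(handle):
        if gi_of(header) in wanted:
            yield header, seq


# Hypothetical usage (file names are placeholders):
#   with open("nr.fasta") as src, open("my_output.fasta", "w") as out:
#       for header, seq in filter_by_gi(src, my_gis):
#           out.write(">%s\n%s\n" % (header, seq))
```

Because the file is streamed one record at a time, memory use does not depend on the size of the database, only on the size of the GI list.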
Andrew
dalke@dalkescientific.com

From chapmanb at uga.edu Thu Apr 8 17:15:42 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Thu Apr 8 17:26:08 2004
Subject: [BioPython] Need help to get Fasta sequence of Gis !
In-Reply-To: <4075BEFC.5000303@burnham.org>
References: <4075BEFC.5000303@burnham.org>
Message-ID: <20040408211542.GD63800@evostick.agtec.uga.edu>

Hey Jonathan, Iddo;

Jonathan:
> >I'm a newbie to Biopython and I would like to get the fasta sequences of
> >a huge list of Gis. Any suggestions ?

Iddo:
> Since the list is huge, I guess you should do it standalone, rather than
> via the net.

That's the best idea. But if you want to do it by the web and it is
feasible (depends a lot on your definition of huge), you can use the
Biopython EUtils interface. If your list of gis is in a variable
called my_gis, you could do this like:

from Bio.EUtils import DBIds
from Bio.EUtils import DBIdsClient
# assuming they are GIs for DNA sequence
db_ids = DBIds("nucleotide", my_gis)
# build a client for these ids (DBIdsClient, matching the import above)
eutils_client = DBIdsClient.from_dbids(db_ids)
# fetch all records as FASTA text and save them to a file
fasta_handle = eutils_client.efetch(retmode = "text", rettype = "fasta")
output_handle = open("my_output.fasta", "w")
output_handle.write(fasta_handle.read())
output_handle.close()

EUtils is pretty nice at giving you back a lot of sequences, so that
might work for you.

Best of luck.
Brad

From chapmanb at uga.edu Thu Apr 8 17:27:48 2004
From: chapmanb at uga.edu (Brad Chapman)
Date: Thu Apr 8 17:38:15 2004
Subject: [BioPython] BOSC
In-Reply-To:
References:
Message-ID: <20040408212748.GE63800@evostick.agtec.uga.edu>

Hey all;

Andrew:
> Just a reminder -- only a bit over a week to submit a talk
> proposal for BOSC. In addition to the talks there will also
> be lightning talks (5-7 minutes max) and a demo session for
> showing off your software to others in a small group rather
> than to everyone at once.

Thanks for the reminder. I'd definitely like to encourage people to
give Python/Biopython related talks.
If you wrote (or are nearly done writing :-) a module for Biopython the lightning talks are a great chance to show it off. We'd also like to try and get an official Biopython talk (a longer talk with extensive examples of Biopython and it's usage) as BOSC is a great chance to let people know about the kind of things Biopython does well. I'd like to offer up the talk to any of the regular contributors to Biopython. Although I'm now the "official" coordinator, I've also given the talk the past two years and would like to have a fresh perspective on it from someone. Additionally, BOSC falls right after graduation deadlines for me this year so I will be quite busy. Is anyone interested? Basically, it would involve writing up an abstract by the deadline (May 5th, from the BOSC website: http://open-bio.org/bosc2004/) and then getting together the talk. There are several sample talks on the Biopython documentation page to give an idea of what they are like: http://www.biopython.org/documentation/ Well, that's my plea for good Biopython representation at the BOSC conference this year. Pretty persuasive, eh? Brad From Nicolas.Chauvat at logilab.fr Sat Apr 10 16:59:17 2004 From: Nicolas.Chauvat at logilab.fr (Nicolas Chauvat) Date: Sat Apr 10 17:04:16 2004 Subject: [BioPython] Europython: Registration open - Talk submission deadline is Apr 15th Message-ID: <20040410205917.GA10699@logilab.fr> Dear Scientific Pythonistas, I'm forwarding this update about the EuroPython conference. Please note that the talk submission deadline is April 15th but will probably be extended a bit. I suppose most of you will be interested in the Science Track at least. For previous years' program, please refer to : http://www.europython.org/2002/sessions/talks http://www.europython.org/2003/sessions/talks --------------------------------------------------------------------- Subject: [EuroPython] Europython Update: Registration open Europython Update ================= - Registration is now open. 
We apologise for the delay, but we have had some technical problems. - Due to this, we have decided to keep the submission of abstracts for the refereed track open for one more day. Last submission time is now on Sunday 11 April at 23.59 CET. - We have a limited number of beds available in very affordable accomodation near the conference venue. Book early before it runs out. - We are still receiving submissions for regular talks and tutorials. Closing date is 15 April. - There is now a wiki at the Europython website for sprint organising. Start planning! About the conference ==================== EuroPython 2004 will be held 7-9 June in G?teborg, Sweden. The EuroPython conference will have tracks for Science, Business, Education, Applications, Frameworks, Zope and the Python language itself. Lightning talks, Open Space and BOF sessions are also planned. There will be tutorials as well, both for newcomers to Python and Python users interested in special subjects. In the days before and after the conference, programming sprints will be arranged. Important dates =============== Refereed paper proposals: until 11 April. Submission of talks: 1 March - 15 April. Early Bird registration: 9 April - 1 May. Accomodation booking: 9 April - 1 May (or until space runs out) More information at http://www.europython.org. --------------------------------------------------------------------- Hope to see you there. -- Nicolas Chauvat logilab.fr - services en informatique avanc?e et gestion de connaissances From dalke at dalkescientific.com Sun Apr 11 03:36:35 2004 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun Apr 11 03:42:39 2004 Subject: [BioPython] Europython: Registration open - Talk submission deadline is Apr 15th In-Reply-To: <20040410205917.GA10699@logilab.fr> References: <20040410205917.GA10699@logilab.fr> Message-ID: EuroPython is in G?teborg, Sweden this year, which just happens to be where one of my clients is located. 
I'm planning to be at EuroPython and am planning to submit a talk along
the lines of "Python Libraries for Chemistry and Biology" which will
cover Biopython, PyDaylight, OEChem, and probably a few others (MMTK,
PyQuante). If I make it "Python Software for ..." then I'll mention
PyMol and a few more applications. As such, I think it will be a
45 minute talk.

However, if someone else here is planning to be at EuroPython and
wants to talk about one or more of these then let me know so I can
adjust my proposal accordingly.

Andrew
dalke@dalkescientific.com

From yong27 at bioinfo.sarang.net Mon Apr 12 03:34:43 2004
From: yong27 at bioinfo.sarang.net (Hyung-Yong Kim)
Date: Mon Apr 12 03:39:35 2004
Subject: [BioPython] parsing error in Bio.Sequencing.Ace
Message-ID:

Hi,

I'm parsing the ACE file made by phrap. But there is some parsing error.

$ cat aceToAbs.py
#!/home/yong27/bin/python

import sys
from Bio.Sequencing import Ace

for contig in Ace.Iterator(sys.stdin, Ace.RecordParser()):
    sys.stdout.write(contig.contig_name+'-----------\n')
    for af in contig.af:
        sys.stdout.write(af.name+' '+str(af.padded_start)+'\n')

$ ./aceToAbs.py < 040407.fasta.screen.ace.1 > out
Traceback (most recent call last):
  File "./aceToAbs.py", line 6, in ?
    for contig in Ace.Iterator(sys.stdin,Ace.RecordParser()):
  File "/home/yong27/python/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 206, in next
    return self._parser.parse(File.StringHandle(data))
  File "/home/yong27/python/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 225, in parse
    self._scanner.feed(uhandle, self._consumer)
  File "/home/yong27/python/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 315, in feed
    self._scan_record(handle, consumer)
  File "/home/yong27/python/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 363, in _scan_record
    read_and_call(uhandle,consumer.ct_start,start='CT')
  File "/home/yong27/python/lib/python2.3/site-packages/Bio/ParserSupport.py", line 301, in read_and_call
    method(line)
  File "/home/yong27/python/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 502, in ct_start
    raise SyntaxError, 'CT tag does not start with CT{'
SyntaxError: CT tag does not start with CT{

The problem seems to be that the parser doesn't distinguish between a
'CT' record and a nucleic acid line that starts with 'CT' ('^CT').
A temporary solution is to modify line 362 of Ace.py:

if line.startswith('CT'): --> if line.startswith('CT{'):

It needs to be modified.

Hyung-Yong Kim
---------------------------------
National Livestock Research Institute (Korea)
Division of Animal Genomics & Bioinformatics
http://bioinfo.sarang.net

From fkauff at duke.edu Mon Apr 12 09:01:26 2004
From: fkauff at duke.edu (Frank Kauff)
Date: Mon Apr 12 09:06:15 2004
Subject: [BioPython] parsing error in Bio.Sequencing.Ace
In-Reply-To:
References:
Message-ID: <1081774886.2059.4.camel@osiris.biology.duke.edu>

Hi Hyung-Yong Kim,

On Mon, 2004-04-12 at 03:34, Hyung-Yong Kim wrote:
...
> The problem seems to be that the parser doesn't distinguish between a
> 'CT' record and a nucleic acid line that starts with 'CT' ('^CT').
> A temporary solution is to modify line 362 of Ace.py:
>
> if line.startswith('CT'): --> if line.startswith('CT{'):
>
> It needs to be modified.
>

It does!
In theory, the parser should not run into this problem - are you using the newest ace.py from cvs? The ace parser has been updated after the recent 'official' release of biopython. Anyway, please send me your input file so I can have a closer look why this happens and I can fix it. Frank > Hyung-Yong Kim > --------------------------------- > National Livestock Research Institute (Korea) > Division of Animal Genomics & Bioinformatics > http://bioinfo.sarang.net > > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython -- Frank Kauff Dept. of Biology Duke University Box 90338 Durham, NC 27708 USA Phone 919-660-7382 Fax 919-660-7293
From dalke at dalkescientific.com Mon Apr 12 19:13:42 2004 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon Apr 12 19:19:44 2004 Subject: [BioPython] BOSC In-Reply-To: <20040408212748.GE63800@evostick.agtec.uga.edu> References: <20040408212748.GE63800@evostick.agtec.uga.edu> Message-ID: <0A6B4443-8CD7-11D8-B418-000393C92466@dalkescientific.com> Me: > Just a reminder -- only a bit over a week to submit a talk > proposal for BOSC. Oops! As a couple people pointed out to me, the deadline for abstract submission is May 5, not this week. (I was looking at last year's announcement for the date. *chagrin*) So another few weeks to go. Andrew dalke@dalkescientific.com From yong27 at bioinfo.sarang.net Mon Apr 12 21:44:41 2004 From: yong27 at bioinfo.sarang.net (Hyung-Yong Kim) Date: Mon Apr 12 21:49:30 2004 Subject: [BioPython] parsing error in Bio.Sequencing.Ace In-Reply-To: <1081774886.2059.4.camel@osiris.biology.duke.edu> Message-ID: Hi Frank Kauff Thanks for your concern. > It does! In theory, the parser should not run into this problem - are > you using the newest ace.py from cvs?
The ace parser has been updated > after the recent 'official' release of biopython. Anyway, please send me > your input file so I can have a closer look why this happens and I can > fix it. There are no problems parsing them when I use newest ace.py from cvs. I used v1.3(in CVS) and the newest is v1.5. I could see their diffs in http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Sequencing/Ace.py.diff?r1=1.3&r2=1.5&cvsroot=biopython But, I also modified ace.py because Iterator class does not support __iter__. I want to know why __iter__ is depreciated although it is very convenient. Hyung-Yong Kim > Frank Kauff > Dept. of Biology > Duke University > Box 90338 > Durham, NC 27708 > USA > Phone 919-660-7382 > Fax 919-660-7293 Hyung-Yong Kim --------------------------------- National Livestock Research Institute (Korea) Division of Animal Genomics & Bioinformatics http://bioinfo.sarang.net From pieter at laeremans.org Tue Apr 13 09:59:59 2004 From: pieter at laeremans.org (Pieter Laeremans) Date: Tue Apr 13 10:04:58 2004 Subject: [BioPython] Parsing error when parsing a blast report Message-ID: <874qrokmg0.fsf@hades.kotnet.org> Hello, I've tried to parse some output I got through NCBIWWW.Blast. But I receive the following error: >>> ## working on region in file /tmp/python-8118EHE... Traceback (most recent call last): File "", line 1, in ? File "/tmp/python-8118EHE", line 14, in ? record = parser.parse(b_results) File "/usr/lib/python2.3/site-packages/Bio/Blast/NCBIWWW.py", line 48, in parse self._scanner.feed(handle, self._consumer) File "/usr/lib/python2.3/site-packages/Bio/Blast/NCBIWWW.py", line 97, in feed has_re=re.compile(r'.?BLAST')) File "/usr/lib/python2.3/site-packages/Bio/ParserSupport.py", line 335, in read_and_call_until line = safe_readline(uhandle) File "/usr/lib/python2.3/site-packages/Bio/ParserSupport.py", line 411, in safe_readline raise SyntaxError, "Unexpected end of stream." SyntaxError: Unexpected end of stream. 
>>> --------------------------------------- This is the script I used to get this result. I have no idea what 's wrong. Has anyone a clue ? thanks in advance, Pieter --------------------------------------- from Bio.Blast import * from Bio.Blast.NCBIWWW import * # here used to be a longer sequence sequence="MKIPNIGNVMNKFEILGVVGEGAYGVVLKCRHKETHEIV" b_results = NCBIWWW.blast(program='tblastn', database='nr', query=sequence, expect=0.001, entrez_query='Canis familiaris[ORGN]') parser = NCBIWWW.BlastParser() record = parser.parse(b_results) From bertrand.frottier at free.fr Tue Apr 13 14:46:24 2004 From: bertrand.frottier at free.fr (Bertrand FROTTIER) Date: Tue Apr 13 14:51:08 2004 Subject: [BioPython] Parsing error when parsing a blast report In-Reply-To: <874qrokmg0.fsf@hades.kotnet.org> References: <874qrokmg0.fsf@hades.kotnet.org> Message-ID: <407C3580.6020703@free.fr> I had the same problem last week. I suppose it's because the NCBI is using BLAST 2.2.8 now, and the parser is not up-to-date yet. So I started working on a parser for the XML output. It's not completely over yet (missing multiple_alignment, support of PSI-BLAST, and a lot of testing) but I can send you the file if you want. Pieter Laeremans a ?crit : > Hello, > > I've tried to parse some output I got through NCBIWWW.Blast. But I > receive the following error: > > > >>>>## working on region in file /tmp/python-8118EHE... > > Traceback (most recent call last): > File "", line 1, in ? > File "/tmp/python-8118EHE", line 14, in ? 
> record = parser.parse(b_results) > File "/usr/lib/python2.3/site-packages/Bio/Blast/NCBIWWW.py", line 48, in parse > self._scanner.feed(handle, self._consumer) > File "/usr/lib/python2.3/site-packages/Bio/Blast/NCBIWWW.py", line 97, in feed > has_re=re.compile(r'.?BLAST')) > File "/usr/lib/python2.3/site-packages/Bio/ParserSupport.py", line 335, in read_and_call_until > line = safe_readline(uhandle) > File "/usr/lib/python2.3/site-packages/Bio/ParserSupport.py", line 411, in safe_readline > raise SyntaxError, "Unexpected end of stream." > SyntaxError: Unexpected end of stream. > > > --------------------------------------- > > This is the script I used to get this result. I have no idea what 's > wrong. Has anyone a clue ? thanks in advance, > > Pieter > > > --------------------------------------- > > from Bio.Blast import * > from Bio.Blast.NCBIWWW import * > > # here used to be a longer sequence > sequence="MKIPNIGNVMNKFEILGVVGEGAYGVVLKCRHKETHEIV" > > b_results = NCBIWWW.blast(program='tblastn', database='nr', query=sequence, expect=0.001, entrez_query='Canis familiaris[ORGN]') > parser = NCBIWWW.BlastParser() > record = parser.parse(b_results) > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython > > -- () (o_ //\ <-V_/_ Demonic penguin Win98 is called Win98 because 98 is the number of bugs occurring right after inserting the CD. From pieter at laeremans.org Wed Apr 14 06:19:41 2004 From: pieter at laeremans.org (Pieter Laeremans) Date: Wed Apr 14 06:24:29 2004 Subject: [BioPython] Transforming XML output in HTML ? Message-ID: <871xmqdfpe.fsf@laeremans.org> Hello, I find it very usefull to inspect the output from blast by hand from time to time. Therefore it would be nice if the xml output could be converted to html. Is there an XSLT or so available somewhere which makes this possible ? 
thanks in advance, Pieter From letondal at pasteur.fr Wed Apr 14 12:39:16 2004 From: letondal at pasteur.fr (Catherine Letondal) Date: Wed Apr 14 12:43:59 2004 Subject: [BioPython] Martel not installed in MacOSX biopython distribution ? Message-ID: <200404141639.i3EGdGRw206939@electre.pasteur.fr> Hi, We have a recent biopython installation (biopython-1.24.tar.gz) on a MacOsX platform. When using the parse function from module Bio.Clustalw, we get an error message saying that Martel is missing. Otherwise, all the biopython modules work well. Also, when pointing to the Martel link (http://www.biopython.org/~dalke/Martel/ and http://www.bioinformatics.org/bradstuff/bp/api/Martel/index.html), we get a 'not found' message. Is there a known problem of Martel module missing when installing biopython? Thanks in advance! Thanks also to include my colleague edeveaud@pasteur.fr in the reply :-) Best, -- Catherine Letondal -- Pasteur Institute Computing Center From dalke at dalkescientific.com Thu Apr 15 06:08:01 2004 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu Apr 15 06:12:43 2004 Subject: [BioPython] NBN tutorial slides Message-ID: Two months ago I taught an intro. to programming class as part of a bioinformatics course organized by the National Bioinformatics Network in South Africa. It was two weeks long, two hours a day, six days a week, and all in Python. More information about the class and my slides are now available at http://www.dalkescientific.com/writings/NBN/ Andrew dalke@dalkescientific.com From chapmanb at uga.edu Sun Apr 18 09:47:13 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sun Apr 18 13:55:24 2004 Subject: [BioPython] Parsing error when parsing a blast report In-Reply-To: <407C3580.6020703@free.fr> References: <874qrokmg0.fsf@hades.kotnet.org> <407C3580.6020703@free.fr> Message-ID: <20040418134713.GA8067@misterbd.agtec.uga.edu> Hello Pieter and Bertrand; Pieter: > >I've tried to parse some output I got through NCBIWWW.Blast. 
But I > >receive the following error: [...] > >SyntaxError: Unexpected end of stream. [...] > >This is the script I used to get this result. I have no idea what 's > >wrong. Has anyone a clue ? thanks in advance, The problem isn't the parser, but rather BLASTing against NCBI. This problem is fixed in CVS and will be in the next release -- Catherine reported it back in March: http://portal.open-bio.org/pipermail/biopython/2004-March/001903.html So you can either use CVS, or the quick fix is to change: b_results = NCBIWWW.blast(program='tblastn', database='nr', query=sequence, expect=0.001, entrez_query='Canis familiaris[ORGN]') to: b_results = NCBIWWW.blast(program='tblastn', database='nr', query=sequence, expect=0.001, entrez_query='Canis familiaris[ORGN]', format_type = "HTML") Bertrand: > So I started working on a parser for the XML > output. It's not completely over yet (missing multiple_alignment, > support of PSI-BLAST, and a lot of testing) but I can send you the file > if you want. If you get this finished and working, we would definitely be willing to accept it into Biopython -- a parser for XML blast output is currently missing but desired. Hope this helps -- let us know if that doesn't fix the problem or there are any other questions. Brad From chapmanb at uga.edu Sun Apr 18 09:53:39 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sun Apr 18 14:01:49 2004 Subject: [BioPython] parsing error in Bio.Sequencing.Ace In-Reply-To: <1082038875.2059.11.camel@osiris.biology.duke.edu> References: <1082038875.2059.11.camel@osiris.biology.duke.edu> Message-ID: <20040418135339.GB8067@misterbd.agtec.uga.edu> Hi Hyung-Yong and Frank; Thanks for handling the Ace parser questions -- glad the ol' parser is still cooking along for everyone. > Oh, there's also a typo in line 576, where it should read 'Missing > header line in WA tag' instead of '... CT tag' :-( Fixed in CVS. > > But, I also modified ace.py because Iterator class does not support __iter__. 
> > I want to know why __iter__ is depreciated although it is very convenient. > > > Good question. I can't remember having removed __iter__ by myself on > purpose? Maybe I deleted it accidentially? Brad, do you know why > __iter__ is out of the Iterator class? I'm not sure -- I guess it must have been left out accidentally. Either way, I just added it back into CVS, so it should be there for the next release. Just let me know if there are any other changes or patches. Thanks again. Brad From chapmanb at uga.edu Sun Apr 18 10:24:25 2004 From: chapmanb at uga.edu (Brad Chapman) Date: Sun Apr 18 14:33:15 2004 Subject: [BioPython] Martel not installed in MacOSX biopython distribution ? In-Reply-To: <200404141639.i3EGdGRw206939@electre.pasteur.fr> References: <200404141639.i3EGdGRw206939@electre.pasteur.fr> Message-ID: <20040418142425.GD8067@misterbd.agtec.uga.edu> Hi Catherine; > We have a recent biopython installation (biopython-1.24.tar.gz) on a MacOsX platform. > > When using the parse function from module Bio.Clustalw, we get an error message saying > that Martel is missing. Otherwise, all the biopython modules work well. > Also, when pointing to the Martel link (http://www.biopython.org/~dalke/Martel/ > and http://www.bioinformatics.org/bradstuff/bp/api/Martel/index.html), we > get a 'not found' message. > > Is there a known problem of Martel module missing when installing biopython? Yes, I did clean up some bad code I wrote a while back in the Clustalw module -- as of revision 1.8 of Bio/Clustalw/clustaw_format.py in CVS (since March). I had an import check there from way back in the days when Martel was distributed separately from Biopython, hence the bad URL as well. So this should behave better in future releases -- sorry for the confusion with that. As to the symptoms of your problems, Martel should be installed by default with Biopython -- you can check this using: >>> import Martel at a python prompt. 
If all other Biopython modules are working and Clustalw is not, then it is very likely that Martel is installed but something else may be the problem. So I guess the best way to proceed and check out the problem is: 1. Does 'import Martel' work? If not, what kind of error message are you getting? The only current known problems with 1.24 and Martel installation are that it can be a little strange if there is an old Martel instance already installed into site-packages. This should be fixed in the next release, but the quick fix is to remove site-packages/Martel and install again. 2. Try and get the latest changes in Clustalw/clustal_format.py and install this in place of the 1.24 version. Hopefully then you should at least see the problem with the import Martel call and we can diagnose the problem further. I hope this helps. Sorry about any problems and be sure to write again if this doesn't clear things up. Brad From lpritc at scri.sari.ac.uk Mon Apr 19 08:34:56 2004 From: lpritc at scri.sari.ac.uk (Leighton Pritchard) Date: Mon Apr 19 08:39:50 2004 Subject: [BioPython] Different parse error in ace.py Message-ID: <4083C770.6010306@scri.sari.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi all, The Ace.py parser assumes that the DS tag will always be present in Phrap output.
Unfortunately, it isn't always present in my output, and so the parser gave me this traceback: uncaught: (, ) Traceback (most recent call last): ~ File "/usr/local/lib/python2.3/site-packages/ScriptFramework/script.py", line 241, in main ~ self.run() ~ File "/home/lpritc/Data/scripts/draw_ace.py", line 71, in run ~ self.ace_file_record = self.load_record() # Load ACE file ~ File "/home/lpritc/Data/scripts/draw_ace.py", line 97, in load_record ~ ace_file_record = parser.parse(self.input) ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 295, in parse ~ rec=iter.next() ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 218, in next ~ return self._parser.parse(File.StringHandle(data)) ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 234, in parse ~ self._scanner.feed(uhandle, self._consumer) ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 331, in feed ~ self._scan_record(handle, consumer) ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", line 352, in _scan_record ~ read_and_call(uhandle,consumer.ds,start='DS ') ~ File "/usr/local/lib/python2.3/site-packages/Bio/ParserSupport.py", line 300, in read_and_call ~ raise SyntaxError, errmsg SyntaxError: Line does not start with 'DS ': RT{ I've fixed it locally by changing line 352 in Ace.py to ### if attempt_read_and_call(uhandle,consumer.ds,start='DS '): ~ read_and_call(uhandle,consumer.ds,start='DS ') ### so as to take the possible absence of a DS tag into account, though you may want to do something more elegant with the real code ;) - -- Dr Leighton Pritchard AMRSC D104, PPI, Scottish Crop Research Institute Invergowrie, Dundee, DD2 5DA, Scotland, UK E: lpritc@scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/index.shtml T: +44 (0)1382 568579 F: +44 (0)1382 568578 PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.3 (GNU/Linux) Comment: 
Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFAg8dvL1gZ+OWLpBsRAllhAJ4kgHrygUcTVozoyKH2XlLTBE+07ACfRWZ1 5T0Ztu63f1flGaxHJSs0QZA= =zTh4 -----END PGP SIGNATURE----- From fkauff at duke.edu Mon Apr 19 09:05:47 2004 From: fkauff at duke.edu (Frank Kauff) Date: Mon Apr 19 09:11:53 2004 Subject: [BioPython] Different parse error in ace.py In-Reply-To: <4083C770.6010306@scri.sari.ac.uk> References: <4083C770.6010306@scri.sari.ac.uk> Message-ID: <1082379947.2059.3.camel@osiris.biology.duke.edu> Hi Leighton, On Mon, 2004-04-19 at 08:34, Leighton Pritchard wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi all, > > The Ace.py parser assumes that the DS tag will always be present in > Phrap output. Unfortunately, it isn't always present in my output, and > so the parser gave me this traceback: > Thanks for pointing this out. Ace parsing is learning by doing - unfortunately there's no proper description of which tags appear when and where... I indeed assumed that there's always a DS. I'll update the code; so far your workaround seems to be the right thing to do.
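Treating a tag such as DS as optional comes down to peeking at the next line and consuming it only when it matches. A minimal, self-contained sketch of that idea follows. PushbackReader and read_optional are illustrative names invented here, not the Bio.ParserSupport functions, and the sample ACE lines are made up.

```python
import io


class PushbackReader:
    """Minimal line reader with pushback (a hypothetical stand-in for an undo handle)."""

    def __init__(self, handle):
        self._handle = handle
        self._saved = []

    def readline(self):
        # Hand back a pushed-back line first, if there is one.
        if self._saved:
            return self._saved.pop()
        return self._handle.readline()

    def saveline(self, line):
        self._saved.append(line)


def read_optional(reader, callback, start):
    """Consume the next line only if it starts with `start`.

    Calls callback(line) and returns True when the tag is present;
    otherwise pushes the line back untouched and returns False.
    """
    line = reader.readline()
    if line.startswith(start):
        callback(line)
        return True
    if line:  # do not push back the empty string that marks end of file
        reader.saveline(line)
    return False


seen = []
reader = PushbackReader(io.StringIO("QA 1 10 1 10\nRT{\n"))
reader.readline()                                  # consume the QA line
found = read_optional(reader, seen.append, "DS ")  # no DS line in this record
print(found, repr(reader.readline()))              # the RT{ line is still available
```

The point of the pushback is that a missing tag leaves the stream exactly where it was, so whatever reads the next tag does not need to know that DS was absent.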
Frank > uncaught: (, > ) > Traceback (most recent call last): > ~ File > "/usr/local/lib/python2.3/site-packages/ScriptFramework/script.py", line > 241, in main > ~ self.run() > ~ File "/home/lpritc/Data/scripts/draw_ace.py", line 71, in run > ~ self.ace_file_record = self.load_record() # Load ACE file > ~ File "/home/lpritc/Data/scripts/draw_ace.py", line 97, in load_record > ~ ace_file_record = parser.parse(self.input) > ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", > line 295, in parse > ~ rec=iter.next() > ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", > line 218, in next > ~ return self._parser.parse(File.StringHandle(data)) > ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", > line 234, in parse > ~ self._scanner.feed(uhandle, self._consumer) > ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", > line 331, in feed > ~ self._scan_record(handle, consumer) > ~ File "/usr/local/lib/python2.3/site-packages/Bio/Sequencing/Ace.py", > line 352, in _scan_record > ~ read_and_call(uhandle,consumer.ds,start='DS ') > ~ File "/usr/local/lib/python2.3/site-packages/Bio/ParserSupport.py", > line 300, in read_and_call > ~ raise SyntaxError, errmsg > SyntaxError: Line does not start with 'DS ': > RT{ > > I've fixed it locally by changing line 352 in Ace.py to > > ### > > if attempt_read_and_call(uhandle,consumer.ds,start='DS '): > ~ read_and_call(uhandle,consumer.ds,start='DS ') > > ### > > so as to take the possible absence of a DS tag into account, though you > may want to do something more elegant with the real code ;) > > - -- > Dr Leighton Pritchard AMRSC > D104, PPI, Scottish Crop Research Institute > Invergowrie, Dundee, DD2 5DA, Scotland, UK > E: lpritc@scri.sari.ac.uk W: http://bioinf.scri.sari.ac.uk/index.shtml > T: +44 (0)1382 568579 F: +44 (0)1382 568578 > PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.2.3 
(GNU/Linux) > Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org > > iD8DBQFAg8dvL1gZ+OWLpBsRAllhAJ4kgHrygUcTVozoyKH2XlLTBE+07ACfRWZ1 > 5T0Ztu63f1flGaxHJSs0QZA= > =zTh4 > -----END PGP SIGNATURE----- > > _______________________________________________ > BioPython mailing list - BioPython@biopython.org > http://biopython.org/mailman/listinfo/biopython -- Frank Kauff Dept. of Biology Duke University Box 90338 Durham, NC 27708 USA Phone 919-660-7382 Fax 919-660-7293 From ciccio at unical.it Mon Apr 19 09:43:09 2004 From: ciccio at unical.it (ciccio@unical.it) Date: Mon Apr 19 09:49:02 2004 Subject: [BioPython] random numbers Message-ID: <1082382189.4083d76d575a0@webmail.unical.it> Hi all, is there the possibility to generate random numbers according to a gamma distribution with both mean and shape parameters fixed? (in python or biopython) Thank you ernesto ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From fkauff at duke.edu Mon Apr 19 09:54:28 2004 From: fkauff at duke.edu (Frank Kauff) Date: Mon Apr 19 09:59:01 2004 Subject: [BioPython] Different parse error in ace.py In-Reply-To: <1082379947.2059.3.camel@osiris.biology.duke.edu> References: <4083C770.6010306@scri.sari.ac.uk> <1082379947.2059.3.camel@osiris.biology.duke.edu> Message-ID: <1082382868.2059.11.camel@osiris.biology.duke.edu> Leighton, On Mon, 2004-04-19 at 09:05, Frank Kauff wrote: > Hi Leighton, > > On Mon, 2004-04-19 at 08:34, Leighton Pritchard wrote: > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > > > Hi all, > > > > The Ace.py parser assumes that the DS tag will always be present in > > Phrap output. Unfortunately, it isn't always present in my output, and > > so the parser gave me this traceback: > > > > Thanks for pointing this out. Ace parsing is learning by doing - > unfortunately there's no proper description of which tags appear when > and where... I indeed assumed that there's always a DS. 
I'll update the > code, so far your workaround seems to the right thing to do. > Just had a look at the parser and it seems that it is unfortunately a bigger problem. The way the read data (rd, qa, ds) is currently stored assumes that there's the same number of rd, qa and ds records for each read. Storing an empty DS would be a possibility, but is pretty unelegant. Anyway, I was feeling more and more uncomfortable with the way ace.py is storing data for record - now I have a reason for restructuring it :-) F. -- Frank Kauff Dept. of Biology Duke University Box 90338 Durham, NC 27708 USA Phone 919-660-7382 Fax 919-660-7293 From anewgene at hotpop.com Mon Apr 19 10:14:28 2004 From: anewgene at hotpop.com (CL Wu) Date: Mon Apr 19 10:17:56 2004 Subject: [BioPython] random numbers In-Reply-To: <1082382189.4083d76d575a0@webmail.unical.it> References: <1082382189.4083d76d575a0@webmail.unical.it> Message-ID: <4083DEC4.9060909@hotpop.com> Does it work for you? >>> import random >>> random.gammavariate(1,2) 1.5041091518144103 Chunlei ciccio@unical.it wrote: > >Hi all, >is there the possibility to generate random numbers according to a gamma >distribution with both mean and shape parameters fixed? (in python or >biopython) > >Thank you > >ernesto > > >------------------------------------------------- >This mail sent through IMP: http://horde.org/imp/ > >_______________________________________________ >BioPython mailing list - BioPython@biopython.org >http://biopython.org/mailman/listinfo/biopython
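On the question as asked (mean and shape both fixed): random.gammavariate(alpha, beta) takes a shape and a scale rather than a mean, and the mean of that distribution is alpha * beta. So one way to pin the mean is to derive the scale as mean / shape. The sketch below assumes only that parameterisation; gamma_with_mean is a made-up helper name, not something in Python or Biopython.

```python
import random


def gamma_with_mean(mean, shape, rng=random):
    """Draw one gamma variate with the requested mean and shape.

    random.gammavariate(alpha, beta) uses alpha as the shape and beta as
    the scale, with mean alpha * beta, so the scale is mean / shape.
    """
    if mean <= 0 or shape <= 0:
        raise ValueError("mean and shape must both be positive")
    return rng.gammavariate(shape, mean / shape)


# Quick check: the sample mean should come out close to the requested mean.
rng = random.Random(42)
draws = [gamma_with_mean(3.0, 2.0, rng) for _ in range(10000)]
print(sum(draws) / len(draws))
```

Passing a seeded random.Random instance, as in the check above, keeps a simulation reproducible; leaving rng at its default uses the module-level generator.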
From anewgene at hotpop.com Mon Apr 19 14:25:53 2004 From: anewgene at hotpop.com (CL Wu) Date: Mon Apr 19 14:37:35 2004 Subject: [BioPython] random numbers In-Reply-To: <1082389538.4083f4227715f@webmail.unical.it> References: <1082382189.4083d76d575a0@webmail.unical.it> <4083DEC4.9060909@hotpop.com> <1082389538.4083f4227715f@webmail.unical.it> Message-ID: <408419B1.8040709@hotpop.com> An HTML attachment was scrubbed... URL: http://portal.open-bio.org/pipermail/biopython/attachments/20040419/57df7087/attachment.htm From crocha at dc.uba.ar Mon Apr 19 17:28:52 2004 From: crocha at dc.uba.ar (Cristian S. Rocha) Date: Mon Apr 19 17:37:26 2004 Subject: [BioPython] BioPython and Zope. Message-ID: <1082410131.16662.12.camel@numero2> Hello everybody, I'm trying to parse fasta string input using Zope. I made a Zope external method to interface with BioPython modules. This method work fine on command line, but It don't work in Zope. The code is: ---------------- import StringIO from Bio import SeqRecord from Bio.SeqIO import FASTA def SequenceRead(SequenceString): input = StringIO.StringIO(SequenceString) reader = FASTA.FastaReader(input) # Error line seq = reader.next() while seq: print "> %s" % seq.id seq = reader.next() return "!"
if __name__ == "__main__": print BioTools( """>123 ATAGGGGATGATAGGAT >456 GGATGAGGAGCGATGCG """ ) ---------------- The error is: AttributeError: 'module' object has no attribute 'FastaReader' The traceback is: """ Traceback (innermost last): Module ZPublisher.Publish, line 98, in publish Module ZPublisher.mapply, line 88, in mapply Module ZPublisher.Publish, line 39, in call_object Module OFS.DTMLMethod, line 127, in __call__ Module DocumentTemplate.DT_String, line 474, in __call__ Module DocumentTemplate.DT_Try, line 140, in render Module DocumentTemplate.DT_Try, line 183, in render_try_except Module DocumentTemplate.DT_Util, line 201, in eval - __traceback_info__: SequenceText Module , line 1, in Module Shared.DC.Scripts.Bindings, line 306, in __call__ Module Shared.DC.Scripts.Bindings, line 343, in _bindAndExec Module Products.PythonScripts.PythonScript, line 307, in _exec Module None, line 15, in Translate - - Line 15 Module Products.ExternalMethod.ExternalMethod, line 224, in __call__ - __traceback_info__: (('> Input.\r\nATGATGA\r\n> Output\r\nATGGGGAT',), {}, None) Module /usr/lib/zope/Extensions/BioTools.py, line 9, in SequenceRead AttributeError: 'module' object has no attribute 'FastaReader' """ Do you have any idea why it's happening? Thanks, Cristian. -- Lic. Cristian S. Rocha. Departamento de Computacin. FCEyN. UBA. Pabellon I. Cuarto 9. Ciudad Universitaria. (1428) Buenos Aires. Argentina. Tel: +54-11-4576-3390/96 int 714 Tel/Fax: +54-11-4576-3359 Cel: 15-5-607-9192 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Esta parte del mensaje =?ISO-8859-1?Q?est=E1?= firmada digitalmente Url : http://portal.open-bio.org/pipermail/biopython/attachments/20040419/cbcc018c/attachment.bin From thamelry at binf.ku.dk Wed Apr 7 11:03:19 2004 From: thamelry at binf.ku.dk (Thomas Hamelryck) Date: Tue Apr 20 10:22:38 2004 Subject: [BioPython] Bio.PDB news In-Reply-To: <20040331003137.GH29401@evostick.agtec.uga.edu> References: <406450B7.6060401@mitre.org> <20040331003137.GH29401@evostick.agtec.uga.edu> Message-ID: <200404071703.19364.thamelry@binf.ku.dk> Hi everybody, Bio.PDB now has a class that deals with superimposing crystal structures (Superimposer - see CVS). Brad, I am working fiercely on the Bio.PDB manual, class documentation and a FAQ. So keep the documentation police away for a while! Cheers, -Thomas From cymon at duke.edu Tue Apr 20 11:06:07 2004 From: cymon at duke.edu (Cymon Cox) Date: Tue Apr 20 11:10:39 2004 Subject: [BioPython] BioPython and Zope. In-Reply-To: <1082410131.16662.12.camel@numero2> References: <1082410131.16662.12.camel@numero2> Message-ID: <1082473567.16022.30.camel@isis.biology.duke.edu> Hi Cristian, When you add an External Method to your Zope application you assign it directly to a function contained in a module. I don't think the imports at the module level are never made. In short try: def SequenceRead(SequenceString): import StringIO from Bio import SeqRecord from Bio import FASTA input = StringIO.StringIO(SequenceString) reader = FASTA.FastaReader(input) # Error line seq = reader.next() etc... Cheers, Cymon On Mon, 2004-04-19 at 17:28, Cristian S. Rocha wrote: > Hello everybody, > > I'm trying to parse fasta string input using Zope. I made a Zope > external method to interface with BioPython modules. This method work > fine on command line, but It don't work in Zope. 
> > The code is: > > ---------------- > import StringIO > from Bio import SeqRecord > from Bio.SeqIO import FASTA > > def SequenceRead(SequenceString): > input = StringIO.StringIO(SequenceString) > reader = FASTA.FastaReader(input) # Error line > seq = reader.next() > while seq: > print "> %s" % seq.id > seq = reader.next() > return "!" > > if __name__ == "__main__": > print BioTools( > """>123 > ATAGGGGATGATAGGAT > >456 > GGATGAGGAGCGATGCG > """ > ) > ---------------- > > The error is: > > AttributeError: 'module' object has no attribute 'FastaReader' > > The traceback is: > > """ > Traceback (innermost last): > Module ZPublisher.Publish, line 98, in publish > Module ZPublisher.mapply, line 88, in mapply > Module ZPublisher.Publish, line 39, in call_object > Module OFS.DTMLMethod, line 127, in __call__ > Module DocumentTemplate.DT_String, line 474, in __call__ > Module DocumentTemplate.DT_Try, line 140, in render > Module DocumentTemplate.DT_Try, line 183, in render_try_except > Module DocumentTemplate.DT_Util, line 201, in eval > - __traceback_info__: SequenceText > Module , line 1, in > Module Shared.DC.Scripts.Bindings, line 306, in __call__ > Module Shared.DC.Scripts.Bindings, line 343, in _bindAndExec > Module Products.PythonScripts.PythonScript, line 307, in _exec > Module None, line 15, in Translate > - > - Line 15 > Module Products.ExternalMethod.ExternalMethod, line 224, in __call__ > - __traceback_info__: (('> Input.\r\nATGATGA\r\n> Output\r\nATGGGGAT',), {}, None) > Module /usr/lib/zope/Extensions/BioTools.py, line 9, in SequenceRead > AttributeError: 'module' object has no attribute 'FastaReader' > """ > > Do you have any idea why it's happening? > > Thanks, > Cristian. -- Cymon Cox Duke University
From jchuang8 at itsa.ucsf.edu Wed Apr 21 21:00:07 2004 From: jchuang8 at itsa.ucsf.edu (Jer-Yee John Chuang) Date: Wed Apr 21 21:04:37 2004 Subject: [BioPython] PSIBlastParser behavior Message-ID: <200404220100.i3M107fI026284@itsa.ucsf.edu> Hi, I am observing an unexpected behavior using the PSIBlastParser. I am doing a simple PSI-Blast run: Code: -------------------------------------------------------- blastOut, errorInfo= NCBIStandalone.blastpgp(myBlastExe,myBlastDB, myBlastFile, expectation=myEValue, npasses=myNPasses) myParser= NCBIStandalone.PSIBlastParser() myRecord= myParser.parse(blastOut) Error message: -------------------------------------------------------- Traceback (most recent call last): File "locateHomologuesPsi_post.py", line 49, in ? myRecord= myParser.parse(myFile) File "c:\python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 557, in parse self._scanner.feed(handle, self._consumer) File "c:\python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 98, in feed self._scan_database_report(uhandle, consumer) File "c:\python23\lib\site-packages\Bio\Blast\NCBIStandalone.py", line 413, in _scan_database_report read_and_call(uhandle, consumer.database, start=' Database') File "c:\python23\lib\site-packages\Bio\ParserSupport.py", line 300, in read_a nd_call raise SyntaxError, errmsg SyntaxError: Line does not start with ' Database': Results from round 1 I wrote the PSI-Blast report to file, then tried deleting lines then calling the PSIBlastParser on the remaining. I found that the parser is expecting a line beginning with the work "Searching" before the line "Results from round" (this is defined in NCBIStandalone _Scanner class in def _scan_descriptions).
Once I correct this (manually adding the "Searching" line in the PSI-Blast report or commenting out the relevant lines in NCBIStandalone.py), the PSIBlastParser works fine. However, the blastpgp report doesn't contain this "Searching" line in the report it generates. Is there something that I am missing here?? Thank you for your time. Cheers, John ----------- John Chuang UCSF, MB S516JJ 500 16th St. San Francisco, CA 94143 jchuang8@itsa.ucsf.edu From crocha at dc.uba.ar Thu Apr 22 17:46:05 2004 From: crocha at dc.uba.ar (Cristian S. Rocha) Date: Thu Apr 22 18:00:19 2004 Subject: [BioPython] Problems with Zope. Message-ID: <1082670364.22677.6.camel@numero2> Hello, I'm trying to use BioPython in an External Method in Zope but it return "ImportError: cannot import name FastaReader" (and others functions). I check some "tricks" but nothing change. Then I modify my external method to know what happening with the symbol table having the following results: The program: """ def SequenceRead(SequenceString): import StringIO import Bio.SeqIO.FASTA return vars(Bio.SeqIO.FASTA).keys() """ When I run it at command line the function return: """ ['Bio', 'string', 'Seq', 'FastaReader', '__builtins__', '__file__', 'SeqRecord', 'FastaWriter', '__name__', 'os', '__doc__'] """ That's ok. But When I run it in Zope the function return: """ ['Bio', 'string', 'Seq', '__builtins__', '__name__', '__file__', 'os', '__doc__'] """ The function lost the "FastaReader", "SeqRecord" and "FastaWriter" in the symbol table!!! What's wrong? Thxs, Cristian. -- Lic. Cristian S. Rocha. Departamento de Computacin. FCEyN. UBA. Pabellon I. Cuarto 9. Ciudad Universitaria. (1428) Buenos Aires. Argentina. Tel: +54-11-4576-3390/96 int 714 Tel/Fax: +54-11-4576-3359 Cel: 15-5-607-9192 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: Esta parte del mensaje =?ISO-8859-1?Q?est=E1?= firmada digitalmente Url : http://portal.open-bio.org/pipermail/biopython/attachments/20040422/a0f13d68/attachment.bin From ciccio at unical.it Fri Apr 23 04:51:17 2004 From: ciccio at unical.it (ciccio@unical.it) Date: Fri Apr 23 04:57:54 2004 Subject: [BioPython] parametric bootstrap Message-ID: <1082710277.4088d905b6916@webmail.unical.it> Hi all, do you know how is possible to analyze parametric bootstrap results to obtain a P value (by means of Biopython or python) ? Thanks ernesto ------------------------------------------------- This mail sent through IMP: http://horde.org/imp/ From chunlei.wu at uth.tmc.edu Fri Apr 23 01:03:33 2004 From: chunlei.wu at uth.tmc.edu (Chunlei Wu) Date: Fri Apr 23 08:30:51 2004 Subject: [BioPython] RecordFile.py Message-ID: <4088A3A5.8010901@uth.tmc.edu> Hi, group, I just tried "RecordFile.py", but it failed for both fasta file and genbank file I tested. >>> rec_h=RecordFile.RecordFile(open(r"gb_test.txt" ),'LOCUS','\\') or >>> rec_h=RecordFile.RecordFile(open(r"gb_test.txt" ),'>','') both returned the same error: >>> rec_h.read() Traceback (most recent call last): File "", line 1, in ? File "C:\Python23\Lib\site-packages\Bio\RecordFile.py", line 83, in read text = self._in_record_state( args, keywds ) File "C:\Python23\Lib\site-packages\Bio\RecordFile.py", line 120, in _in_record_state requested_text = text UnboundLocalError: local variable 'text' referenced before assignment I checked the code, but the code is not obvious for me to fix it. Actually, I wrote a simply script before using Bio.File's UndoHandle for the same purpose. It looks much simpler, maybe not as powerful as RecordFile.py, but it does works for me. I post it here and hope it is worth sharing with you. 
Best,
Chunlei Wu
-------------- next part --------------
#Chunlei Wu 07/30/2003
'''
FlatRecHandle is a class simulating a file handle for flat-file format
record files, using a record as the reading unit instead of a line.
FlatRecHandle.readrecord() returns one record per call.
'''
from Bio import File, Fasta

class FlatRecHandle:
    '''A file handle for flat-file format record files, using a record
    as the reading unit instead of a line.

    start_marker is the marker at the start of each record:
        ">" for Fasta format records, "LOCUS" for GenBank format
        records, etc.
    stop_marker is the marker at the end of each record; if None, the
        record runs until the next start_marker or the end of the file.
        e.g.: None for Fasta format records, "//" for GenBank format
        records, etc.
    readrecord() returns '' on reaching EOF.'''

    def __init__(self, handle, start_marker=None, stop_marker=None):
        # NOTE: start_marker is needed in practice; readrecord() calls
        # len(self.start_marker), which fails for None.
        self._handle = File.UndoHandle(handle)
        self.start_marker = start_marker
        self.stop_marker = stop_marker

    def readrecord(self):
        '''Return one record at a time, just like readline().
        Return '' on reaching EOF.'''
        is_record = 0
        saved_record = ''
        while 1:
            line = self._handle.readline()
            if line == '':    # reached EOF
                if self.stop_marker is not None and is_record:
                    print('Warning: this record may be incomplete. '
                          'No stop marker ("%s") found, but reached EOF!'
                          % self.stop_marker)
                break
            if line[:len(self.start_marker)] == self.start_marker:
                is_record = 1
            if is_record:
                saved_record += line
                if self.stop_marker is None:
                    # No stop marker: the record ends just before the
                    # next start marker, or at EOF.
                    next_line = self._handle.peekline()
                    if next_line[:len(self.start_marker)] == self.start_marker \
                       or next_line == '':
                        break
                else:
                    if line[:len(self.stop_marker)] == self.stop_marker:
                        break
        return saved_record

    def rewind(self):
        '''Rewind the handle to the beginning.'''
        return self._handle.seek(0)

    def tell(self):
        return self._handle.tell()

    def close(self):
        return self._handle.close()

    def closed(self):
        # 'closed' is an attribute of file objects, not a method.
        return self._handle.closed

    def readrecords(self):
        '''Return a list of records, just like readlines().'''
        rec_list = []
        while 1:
            rec = self.readrecord()
            if rec == '':
                break
            rec_list.append(rec)
        return rec_list

def fasta_handle(in_f_handle):
    '''Return a FlatRecHandle for Fasta format.
    Input is a Fasta format file handle.'''
    return FlatRecHandle(in_f_handle, ">")

def fasta_iterator(fastafile_handle):
    '''Return a Fasta file iterator using Bio.Fasta.
    Input is a Fasta format file handle.'''
    parser = Fasta.RecordParser()
    return Fasta.Iterator(fastafile_handle, parser)

def gb_handle(in_f_handle):
    '''Return a FlatRecHandle for GenBank format.'''
    return FlatRecHandle(in_f_handle, "LOCUS", "//")

From crocha at dc.uba.ar Fri Apr 23 12:16:39 2004
From: crocha at dc.uba.ar (Cristian S. Rocha)
Date: Fri Apr 23 12:22:10 2004
Subject: [BioPython] Solved BioPython & Zope problem and a Bug?
Message-ID: <1082736999.23447.67.camel@numero2>

Hello,

I have solved my problem with BioPython in Zope. The main problem was
that Zope can't import some names defined in some modules.
These modules import other modules with relative imports, and Zope
can't resolve those. For example, Bio/config/FormatRegistry.py has the
following relative import:

    import _support

I changed it to:

    from Bio.config import _support

and Zope started to import the FormatRegistry.py functions. I did the
same in FormatIO.py for the ReseekFile import.

The second problem is Bio.MultiProc.copen. I really don't know why,
but Zope enters an infinite loop when I ask for the symbol table of
that module. To work around this problem I commented out the import in
the Bio.config._support module.

Thanks,
Cristian.

From crocha at dc.uba.ar Mon Apr 26 09:35:24 2004
From: crocha at dc.uba.ar (Cristian S. Rocha)
Date: Mon Apr 26 09:46:29 2004
Subject: [BioPython] Standard API to sequence parser?
Message-ID: <1082986523.31151.5.camel@numero2>

Simple question:
Is there a standard API for parsing sequence files in different
formats?

Thanks,
Cristian.

From sbassi at asalup.org Mon Apr 26 12:43:59 2004
From: sbassi at asalup.org (Sebastian Bassi)
Date: Mon Apr 26 12:49:25 2004
Subject: [BioPython] Standard API to sequence parser?
In-Reply-To: <1082986523.31151.5.camel@numero2>
References: <1082986523.31151.5.camel@numero2>
Message-ID: <408D3C4F.8060702@asalup.org>

Cristian S. Rocha wrote:
> Simple question:
> Is there a standard API for parsing sequence files in different
> formats?

In the BioPython cookbook there is an example of what you are looking
for: a 3-line converter from one format to another (GenBank to FASTA,
but you can adapt it to your needs).

--
Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ
//=\ http://Bioinformatica.info

From Myriam.Vezain at loria.fr Tue Apr 27 08:58:30 2004
From: Myriam.Vezain at loria.fr (Myriam Vezain)
Date: Tue Apr 27 09:02:53 2004
Subject: [BioPython] Looking for functions
In-Reply-To: <408E587A.5060209@loria.fr>
References: <1082986523.31151.5.camel@numero2> <408E587A.5060209@loria.fr>
Message-ID: <408E58F6.5010604@loria.fr>

Myriam Vezain wrote:

> Hello,
>
> I am looking for several functions:
> - convert a SwissProt file to a FASTA file,
> - find EcoRI restriction sites in a FASTA sequence.
>
> Can you help me find these functions?
> I tried this one:
>
> from Bio.SeqIO import FASTA
> from Bio.SwissProt import SProt
> from sys import *
>
> def convert_sp_fasta(infile,outfile):
>     """
>     convert a SwissProt file into a Fasta formatted file
>     """
>     in_h = open(infile)
>     sp = SProt.Iterator(in_h, SProt.SequenceParser())
>     out_h = FASTA.FastaWriter(outfile)
>     sequence = sp.next()
>     out_h.write(sequence)
>     in_h.close()
>     out_h.close()
>
> But it was not a success.
>
> Thank you.
>
> Myriam.
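
For the EcoRI part of the question, a minimal sketch in plain Python
may help. It assumes the sequence has already been read into a string
(for example from a FASTA record); find_sites is a hypothetical helper
name, not a Biopython function. EcoRI recognises GAATTC, which is its
own reverse complement, so searching one strand is enough.

```python
def find_sites(sequence, site="GAATTC"):
    """Return the 1-based start position of every occurrence of `site`.

    Hypothetical helper for illustration; not part of Biopython.
    The comparison is case-insensitive and overlapping hits are reported.
    """
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)          # convert to 1-based
        start = sequence.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Toy sequence with two EcoRI sites.
    print(find_sites("ttgaattcaaGAATTCgg"))
```

Positions are reported 1-based because that is how sequence
coordinates are usually quoted; subtract one for Python string
indices.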
From dlondon at ebi.ac.uk Wed Apr 28 06:06:37 2004
From: dlondon at ebi.ac.uk (Darin London)
Date: Wed Apr 28 06:11:00 2004
Subject: [BioPython] BOSC 2nd Call For Papers
Message-ID:

{Please pass the word!}

SECOND CALL FOR SPEAKERS

BOSC PROGRAM & CONTACT INFO
 * Web: http://www.open-bio.org/bosc2004/
 * Email: bosc@open-bio.org
 * Online registration: https://www.cteusa.com/iscb3/

The program committee is currently seeking abstracts for talks at BOSC
2004. BOSC is a great opportunity for you to tell the community about
your use, development, or philosophy of open source software
development in bioinformatics. The committee will select several
submitted abstracts for 25-minute talks and others for shorter
"lightning" talks. Accepted abstracts will be published on the BOSC
web site.

If you are interested in speaking at BOSC 2004, please send us:
 * an abstract (no more than a few paragraphs)
 * a URL for the project page, if applicable
 * information about the open source license used for your software
   or your release plans.

*** Abstracts for formal presentations must be received by 5-May-2004. ***

LIGHTNING-TALK SPEAKERS WANTED!

The program committee is currently seeking speakers for the lightning
talks at BOSC 2004. Lightning talks are quick - only five minutes
long - and a great opportunity for you to give people a quick summary
of your open source project, code, idea, or vision of the future.

If you are interested in giving a lightning talk at BOSC 2004, please
send us:
 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * information about the open source license used for your software
   or your release plans.

We will accept entries on-line until BOSC starts, but space for demos
and lightning talks is limited.
SOFTWARE DEMONSTRATIONS WANTED!

If you are involved in the development of Open Source Bioinformatics
Software, you are invited to provide a short demonstration to
attendees of BOSC 2004.

If you are interested in giving a software demonstration at BOSC 2004,
please send us:
 * a brief title and summary (one or two lines)
 * a URL for the project page, if applicable
 * Internet connectivity requirements (e.g. website application served
   on the world wide web, or web-based client application).

We will accept entries on-line until BOSC starts, but space for demos
and lightning talks is limited.

** Because the mission of the OBF is to promote Open Source software,
we will favor submissions for projects that apply a recognized Open
Source License, or adhere to the general Open Source Philosophy. See
the following websites for further details:

http://www.opensource.org/licenses/
http://www.opensource.org/docs/definition.php

cheers,

--
Darin London
dlondon@ebi.ac.uk
European Bioinformatics Institute,      +44 (0)1223 49 2566
Wellcome Trust Genome Campus, Hinxton   +44 (0)1223 49 4468 (fax)
Cambridgeshire CB10 1SD, UK

From stephandamen at hotmail.com Thu Apr 29 05:43:56 2004
From: stephandamen at hotmail.com (Stephan Damen)
Date: Thu Apr 29 09:07:09 2004
Subject: [BioPython] Robustness of parsing
Message-ID:

Dear sir / madam,

I have a question about the robustness of parsing, in my case of
UniGene at NCBI. I have been assigned to write a parser that loads the
UniGene flat files into an existing database structure. Before
starting to write code, I thought I had better search the web for
existing solutions. That is how I came across your site and found a
parser for UniGene.

At my work we already have a parser for these flat files, written in
C. The only problem with this parser is that it will no longer run if
the structure of UniGene changes, for instance if a new field is added
or if a relation changes from 1-to-1 to 1-to-many.
My question about Biopython: does it have the same problems? If so,
how quickly do updates become available? If Biopython also suffers
from these problems, I want to write a more generic solution, perhaps
in Python.

Kind regards,

Stephan Damen
Project leader at UL

From lpritc at scri.sari.ac.uk Thu Apr 29 10:03:14 2004
From: lpritc at scri.sari.ac.uk (Leighton Pritchard)
Date: Thu Apr 29 10:07:31 2004
Subject: [BioPython] GenBank parser
Message-ID: <40910B22.6040705@scri.sari.ac.uk>

Hi,

I've noticed an oddity in the GenBank FeatureParser (CVS installation
19/4). While parsing the Salmonella typhi file NC_003198.gbk, my way
of dealing with 'gene' tags fell over. This turned out to be because
the GenBank file contains entries with valueless tags such as /partial
and /pseudo. The current parser concatenates these tags with the
following tag, e.g. for:

     CDS             1449249..1450391
                     /partial
                     /gene="fdnG"
                     /note="Similar to part of Escherichia coli formate
                     dehydrogenase, nitrate-inducible, major subunit fdnG
                     SW:FDNG_ECOLI (P24183; P78261) (1015 aa) fasta scores:
                     E(): 0, 94.4% id in 376 aa"
                     /pseudo
                     /codon_start=1
                     /transl_table=11

it returns a set of qualifiers which include the tags "partial gene"
and "pseudo codon_start". This probably isn't what was intended by
the authors ;)

I haven't got a fix for the parser, but my workaround in the code was:

##################
qualifiers = cds.qualifiers    # Shorthand for qualifiers
# We need to account for use of qualifiers, e.g. in
# NC_003198.gbk, the /partial and /pseudo tags often have no
# associated value - the BioPython GenBank feature parser lumps the
# two together into a single tag, e.g. 'partial gene' and
# 'pseudo codon_start'. This buggers up our processing below,
# so the solution is to split tags by the ' ' space character,
# and add a qualifier comprising only the last item in the
# resulting list
for key in list(qualifiers.keys()):    # copy: entries are added in the loop
    if key.count(' '):
        qualifiers[key.split(' ')[-1]] = qualifiers[key]
###################

...I wasn't bothered about the partial or pseudo tags for my script

--
Dr Leighton Pritchard AMRSC
D104, PPI, Scottish Crop Research Institute
Invergowrie, Dundee, DD2 5DA, Scotland, UK
E: lpritc@scri.sari.ac.uk    W: http://bioinf.scri.sari.ac.uk/index.shtml
T: +44 (0)1382 568579        F: +44 (0)1382 568578
PGP key FEFC205C: GPG key E58BA41B: http://www.keyserver.net

From crocha at dc.uba.ar Thu Apr 29 17:11:19 2004
From: crocha at dc.uba.ar (Cristian S. Rocha)
Date: Thu Apr 29 17:16:54 2004
Subject: [BioPython] FormatIO + Fasta parser + BioDB.
Message-ID: <1083273079.14904.37.camel@numero2>

Hello,

I'm writing a procedure to store files in a BioDB, but I get the
following error:

"""
...
  File "/usr/lib/python2.2/site-packages/BioSQL/Loader.py", line 209, in _load_bioentry_table
    if record.id.find('.') >= 0: # try to get a version from the id
AttributeError: 'NoneType' object has no attribute 'find'
"""

and the procedure is:

"""
def SequenceStoreFile(SeqFile, database, format='genbank'):
    server = BioSeqDatabase.open_database(driver='MySQLdb', user='bio',
                                          passwd='bio', host='localhost',
                                          db='bio')
    if server[database]:
        db = server[database]
    else:
        db = server.new_database(database)
    formatter = FormatIO.FormatIO("SeqRecord", formats[format])
    itr = formatter.readFile(SeqFile, format=formats[format])
    db.load(itr)
    return

if __name__ == "__main__":
    SequenceStoreFile(open('example.fasta'), 'estC', 'fasta')
"""

I suspect this is because I don't define the title2ids function for
the Fasta parser. If I'm right, how can I tell the FormatIO module to
use a title2ids function?

Thanks,
Cristian.

From mdehoon at ims.u-tokyo.ac.jp Fri Apr 30 02:58:29 2004
From: mdehoon at ims.u-tokyo.ac.jp (Michiel Jan Laurens de Hoon)
Date: Fri Apr 30 03:02:51 2004
Subject: [BioPython] Lowess function for nonparametric regression
Message-ID: <4091F915.2060101@ims.u-tokyo.ac.jp>

Dear Biopythoneers,

Recently I wrote a pure Python implementation of the Lowess function
for nonparametric regression. For an example, see

http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/lowess.png

The red line is a nonparametric regression curve fitted to the
scatterplot. The value on the x-axis is the mean of replicated gene
expression measurements; the y-axis is the associated measurement
error.
Such plots are used to find out how large the measurement error
typically is for a given magnitude of a measured gene expression
level.

If this function is useful for other computational biologists, I can
submit it to Biopython. In that case, which module would this fall
under? Or can I just make a Lowess.py under Bio? The code is very
short, about 25 lines of Python for the actual function.

--Michiel

--
Michiel de Hoon, Assistant Professor
University of Tokyo, Institute of Medical Science
Human Genome Center
4-6-1 Shirokane-dai, Minato-ku
Tokyo 108-8639
Japan
http://bonsai.ims.u-tokyo.ac.jp/~mdehoon

From nomy2020 at yahoo.com Fri Apr 30 00:26:00 2004
From: nomy2020 at yahoo.com (Bzy Bee)
Date: Fri Apr 30 07:46:13 2004
Subject: [BioPython] HSPs in Blast parser
Message-ID: <20040430042600.6570.qmail@web90001.mail.scd.yahoo.com>

Hi,

I am stuck on parsing a BlastN output and would appreciate some help.
I am working on multiple HSPs for a single hit. For example, if there
are two HSPs found for one hit, I need to find where the query and
subject end for one HSP and then compare that with the query and
subject start for the next HSP, e.g.
in the following example:

>test_seq1
          Length = 424

 Score = 841 bits (424), Expect = 0.0
 Identities = 424/424 (100%)
 Strand = Plus / Plus

Query: 1  ggactggttcgtcgtttacaagctgccggcccacacagggtcgggagatgcgacgcagaa 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1  ggactggttcgtcgtttacaagctgccggcccacacagggtcgggagatgcgacgcagaa 60

Query: 61 cggcctgcggtacaagtactttgacgaacactcagaagactggagcgacggcgtggggtt 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cggcctgcggtacaagtactttgacgaacactcagaagactggagcgacggcgtggggtt 120

 Score = 226 bits (114), Expect = 2e-58
 Identities = 141/150 (94%)
 Strand = Plus / Plus

Query: 275 ccagctcgcctttgtgctctacaatgaccaaccgcctaaatgcagcgagtgtaaggactc 334
           ||||||||||||||||||||||||||||||||||||||||| |||||||| |||||||||
Sbjct: 513 ccagctcgcctttgtgctctacaatgaccaaccgcctaaatccagcgagtctaaggactc 572

Query: 335 ttgcagtcgtgggcacacgaagggtgtgctgctcctggaccaagaagggggcttgtggtt 394
           || ||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Sbjct: 573 ttccagtcgtgggcacacgaagggtgtgctgctcctggaccaagaagggggcttctggtt 632

I am interested in where the query and subject ended in the first HSP
(i.e. 120, 120) and where they started in the second HSP (i.e. 275,
513). I have noticed that in the BLAST parser one can iterate through
each HSP for every single hit, but I am not sure how to treat two HSPs
of a single hit as related, i.e. how to iterate through them so as to
find the query (and subject) end of one and the query (and subject)
start of the other.

Any help would be highly appreciated.

Thanks,
Jawad Ali
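
One way to pair up consecutive HSPs of a hit can be sketched in plain
Python, under stated assumptions: each HSP has already been reduced to
its four coordinates (query start/end, subject start/end). The Hsp
tuple and the hsp_junctions helper are hypothetical names for
illustration, not part of the Biopython BLAST parser, and how the end
coordinates are obtained from the parser's HSP objects is left open.

```python
from collections import namedtuple

# Hypothetical container for illustration; not a Biopython class.
Hsp = namedtuple("Hsp", "query_start query_end sbjct_start sbjct_end")


def hsp_junctions(hsps):
    """Pair each HSP of one hit with the next one along the query.

    Returns a list of ((query_end, sbjct_end), (query_start, sbjct_start))
    tuples: where one HSP ended and where the following HSP started.
    """
    ordered = sorted(hsps, key=lambda h: h.query_start)
    junctions = []
    for first, second in zip(ordered, ordered[1:]):
        junctions.append(((first.query_end, first.sbjct_end),
                          (second.query_start, second.sbjct_start)))
    return junctions


if __name__ == "__main__":
    # The two HSPs from the report above, deliberately given out of order.
    hsps = [Hsp(275, 394, 513, 632), Hsp(1, 120, 1, 120)]
    for ended, started in hsp_junctions(hsps):
        print("HSP ended at", ended, "and the next HSP started at", started)
```

Sorting by query start first means the HSPs need not arrive in query
order; a hit with a single HSP simply yields an empty list.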