From lzhou at ufscc.ufl.edu Wed Jun 1 17:53:11 2005
From: lzhou at ufscc.ufl.edu (Lei Zhou)
Date: Wed Jun 1 17:45:58 2005
Subject: [BioPython] blast output file size limit
Message-ID:

Does anyone know whether there is a size (number) limit for the NCBIStandalone.BlastParser and NCBIStandalone.Iterator modules?

I have a program that works well with BLAST output files under 100 kB. However, it quits with files of about 8 MB.

Any help is appreciated.

-Lei

From mike at maibaum.org Wed Jun 1 17:58:13 2005
From: mike at maibaum.org (Michael Maibaum)
Date: Wed Jun 1 17:55:27 2005
Subject: [BioPython] blast output file size limit
In-Reply-To:
References:
Message-ID: <08B58ACA-E414-4AD6-B5BE-673692D857A4@maibaum.org>

On 1 Jun 2005, at 22:53, Lei Zhou wrote:

> Does anyone know whether there is a size (number) limit for the
> NCBIStandalone.BlastParser and NCBIStandalone.Iterator module?
>
> I have a program that works well with blast output files <100kb.
> however it quit with files of about 8Mb.
>
> Any help is appreciated.

I've used both on very large files. I usually limit the number of scores/alignments to around 200, so each individual record is not that large even if the file is very large. On rarer occasions I've returned 5,000-10,000 hits without major problems. I wouldn't expect it to quit unless the machine ran out of memory somehow. I don't recall seeing it use more than a few hundred megabytes of RAM.

Michael

From dalke at dalkescientific.com Thu Jun 2 02:52:05 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu Jun 2 02:44:24 2005
Subject: [BioPython] qblast through a proxy
In-Reply-To: <429CDA92.7080704@mitre.org>
References: <429CDA92.7080704@mitre.org>
Message-ID: <7e8362a6d8d90af4780d251e7930ca5c@dalkescientific.com>

Alexander A. Morgan wrote:
> However, Blast.NCBIWWW uses the socket library in '_send_to_qblast()'.
> There doesn't seem to be an easy way to get through a proxy using the
> low level socket library.
> Does anyone have a quick fix/workaround for
> this?

I ran into that a couple of months ago, but didn't have the time to fix it then or now. Something like this should work. I've picked values ("User-Agent", 1024 bytes per copy block) to exactly match the existing code, though the 1024 seems limiting.

    import urllib2, shutil

    def _send_to_blasturl(query, outhandle):
        # POST the query to the BLAST report CGI. urllib2's default opener
        # picks up proxy settings from the environment (e.g. http_proxy).
        req = urllib2.Request(
            "http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast_report",
            query,
            {"User-Agent": "BiopythonClient"})
        inhandle = urllib2.urlopen(req)
        # Copy the response into outhandle in 1024-byte blocks.
        shutil.copyfileobj(inhandle, outhandle, 1024)
        inhandle.close()

However, the upstream code might also be changed. It's currently

    outhandle = cStringIO.StringIO()
    _send_to_blasturl(message, outhandle)
    outhandle.seek(0)  # Reset the handle to the beginning.
    return outhandle

and a urllib2.urlopen result is already a file-like object, so the cStringIO is only needed if the code needs the handle to be reseekable. I don't know whether it does.

Another difference is that the urllib2 code parses the headers while the existing code doesn't. I don't know how that affects the actual parser; it looks like it doesn't.

Andrew
dalke@dalkescientific.com

From aurelie.bornot at free.fr Thu Jun 2 10:00:29 2005
From: aurelie.bornot at free.fr (aurelie.bornot@free.fr)
Date: Thu Jun 2 09:52:28 2005
Subject: [BioPython] Difference between URLAPI Qblast and blastcl3
Message-ID: <1117720829.429f10fdec0b2@imp2-q.free.fr>

Hi everybody!

I don't know much about client/server questions, but I have a question. In my program I used the Biopython qblast function, which connects to NCBI through the QBlast URLAPI, and I am very happy with it ;) But I recently learned that the blastcl3 client exists, and I wonder why Biopython chose the URLAPI.

Could someone be very, very nice and explain the differences between the two, with their pros and cons?

I am looking forward to becoming less ignorant ;)

Thank you very much!
Aurélie

From michael.fieseler at math.uni-muenster.de Fri Jun 3 11:55:26 2005
From: michael.fieseler at math.uni-muenster.de (Michael Fieseler)
Date: Fri Jun 3 11:47:41 2005
Subject: [BioPython] 'Unexpected end of stream' while parsing blast results
Message-ID: <200506031755.26485.michael.fieseler@math.uni-muenster.de>

Hi,

while trying to parse output from a local BLAST run I encountered the following error:

    Traceback (most recent call last):
      File "./extracthits.py", line 34, in ?
        b_record = b_parser.parse(blast_out);
      File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 610, in parse
        self._scanner.feed(handle, self._consumer)
      File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 93, in feed
        read_and_call_until(uhandle, consumer.noevent, contains='BLAST')
      File "/usr/lib/python2.4/site-packages/Bio/ParserSupport.py", line 335, in read_and_call_until
        line = safe_readline(uhandle)
      File "/usr/lib/python2.4/site-packages/Bio/ParserSupport.py", line 411, in safe_readline
        raise SyntaxError, "Unexpected end of stream."
    SyntaxError: Unexpected end of stream.

The code I used is from the Biopython Tutorial and Cookbook:

    from Bio.Blast import NCBIStandalone
    blast_out = open('18.blast', 'r')
    b_parser = NCBIStandalone.BlastParser()
    b_record = b_parser.parse(blast_out)

The software I am using is:

    python 2.4.1
    biopython 1.40b
    blastall 2.2.10

Is there any solution to this?

Regards,
Michael

From t.zito at biologie.hu-berlin.de Mon Jun 13 08:29:09 2005
From: t.zito at biologie.hu-berlin.de (Tiziano Zito)
Date: Mon Jun 13 08:21:19 2005
Subject: [BioPython] ANN: MDP 1.1.0
Message-ID: <20050613122909.GC28566@itb.biologie.hu-berlin.de>

We are posting the following announcement on this list because we found the Biopython project very interesting and thought that MDP may be useful for Biopython users and developers.

MDP 1.1.0
---------
http://mdp-toolkit.sourceforge.net/

Modular toolkit for Data Processing (MDP) is a Python library to perform data processing.
Already implemented algorithms include: Principal Component Analysis (PCA), Independent Component Analysis (ICA), Slow Feature Analysis (SFA), and Growing Neural Gas (GNG).

MDP allows you to combine different algorithms and other data processing elements (nodes) into data processing sequences (flows). Moreover, it provides a framework that makes the implementation of new algorithms easy and intuitive.

MDP supports the most common numerical extensions to Python, currently Numeric, Numarray, and SciPy. When used together with SciPy and the symeig package, MDP gives the scientific programmer the full power of well-known C and FORTRAN data processing libraries. MDP helps the programmer to combine Python's object-oriented design with the efficiency of C and FORTRAN.

MDP has been written for research in neuroscience, but it has been designed to be helpful in any context where trainable data processing algorithms are used. Its simplicity on the user side, together with the reusability of the implemented nodes, could also make it a valid educational tool.

Requirements:
* Python >= 2.3
* one of the following Python numerical extensions: Numeric, Numarray, or SciPy.

For optimal performance we recommend using SciPy with the LAPACK and ATLAS libraries, and installing the symeig module.

(sorry for multiple posting)

--
Tiziano Zito
Institute for Theoretical Biology
Humboldt-Universitaet zu Berlin
Invalidenstrasse, 43
D-10115 Berlin, Germany
http://itb.biologie.hu-berlin.de/~zito/

From mdehoon at c2b2.columbia.edu Fri Jun 17 15:42:22 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Jun 17 15:36:57 2005
Subject: [BioPython] Re: Rethinking Seq objects
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE7AC199@cgcmail.cgc.cpmc.columbia.edu>

Dear biopythoneers,

A couple of weeks ago there was a discussion on the Biopython mailing lists about the Seq and MutableSeq classes in Bio.Seq.
Whereas opinions were divided on most of my proposals (which I therefore did not implement), most people agreed that there was a need for a more user-friendly way to transcribe and translate sequences. So I added transcribe, back_transcribe, and translate functions to Bio.Seq. I wrote these as functions rather than methods so that they can take both Seq objects and Python string objects as input. These functions work approximately the same as the corresponding methods in Bio.Transcribe and Bio.Translate.

The example in the Biopython tutorial would look like this:

Using strings:

    >>> from Bio.Seq import transcribe, back_transcribe, translate
    >>> my_seq = 'GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA'
    >>> transcribe(my_seq)
    'GCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA'
    >>> back_transcribe(_)
    'GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA'
    >>> translate(my_seq)
    'AIVMGR*KGAR'
    >>> translate(my_seq, table="Vertebrate Mitochondrial")
    'AIVMGRWKGAR'
    >>> translate(my_seq, table=1)
    'AIVMGR*KGAR'
    >>> translate(my_seq, table=2)
    'AIVMGRWKGAR'

Using Seq objects:

    >>> from Bio.Alphabet import IUPAC
    >>> my_alpha = IUPAC.unambiguous_dna
    >>> from Bio.Seq import *
    >>> my_seq = Seq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', my_alpha)
    >>> transcribe(my_seq)
    Seq('GCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA', IUPACUnambiguousRNA())
    >>> back_transcribe(_)
    Seq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
    >>> translate(my_seq)
    Seq('AIVMGR*KGAR', HasStopCodon(IUPACProtein(), '*'))
    >>> translate(my_seq, table="Vertebrate Mitochondrial")
    Seq('AIVMGRWKGAR', HasStopCodon(IUPACProtein(), '*'))
    >>> translate(my_seq, table=1)
    Seq('AIVMGR*KGAR', HasStopCodon(IUPACProtein(), '*'))
    >>> translate(my_seq, table=2)
    Seq('AIVMGRWKGAR', HasStopCodon(IUPACProtein(), '*'))

The original methods in Bio.Transcribe and Bio.Translate of course still work (for Seq objects).

Thanks, everybody, for contributing to this discussion. I hope these functions will prove to be useful.

--Michiel.
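For readers without Biopython at hand, the string-level behaviour of these three functions can be approximated with the standard library alone. This is a minimal sketch, not Biopython's code: it is a hypothetical re-implementation that only knows NCBI translation tables 1 (Standard) and 2 (Vertebrate Mitochondrial), and the `_TABLES` lookup and the use of 'X' for unrecognised codons are assumptions of this sketch. Run on the example sequence, it gives the same strings as the session above.

```python
# Hypothetical stdlib-only sketch of string transcription/translation.
# Not Biopython's implementation; only NCBI tables 1 and 2 are included.

_BASES = "TCAG"
# All 64 codons in NCBI order: the first base varies slowest (TTT, TTC, TTA, ...).
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]

# Amino acids for each codon, in the same order as _CODONS ('*' = stop).
_STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

# Assumed lookup: tables can be selected by NCBI id or by name.
_TABLES = {
    1: dict(zip(_CODONS, _STANDARD)),
    2: dict(zip(_CODONS, _VERT_MITO)),
}
_TABLES["Standard"] = _TABLES[1]
_TABLES["Vertebrate Mitochondrial"] = _TABLES[2]


def transcribe(dna):
    """DNA -> RNA: replace every T (or t) with U (or u)."""
    return dna.replace("T", "U").replace("t", "u")


def back_transcribe(rna):
    """RNA -> DNA: replace every U (or u) with T (or t)."""
    return rna.replace("U", "T").replace("u", "t")


def translate(seq, table=1):
    """Translate DNA or RNA codon by codon.

    Codons missing from the table (e.g. ambiguous bases) become 'X' in this
    sketch, and a trailing partial codon is ignored.
    """
    codon_table = _TABLES[table]
    dna = back_transcribe(seq).upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete final codon
    return "".join(codon_table.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


if __name__ == "__main__":
    my_seq = "GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"
    print(transcribe(my_seq))
    print(translate(my_seq))
    print(translate(my_seq, table="Vertebrate Mitochondrial"))
```

The only difference between the two tables for this sequence is the TGA codon, which is a stop ('*') in the standard code but tryptophan ('W') in vertebrate mitochondria.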
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032

From boehme at mpiib-berlin.mpg.de Mon Jun 27 08:38:18 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 27 08:30:15 2005
Subject: [BioPython] Options for qblast
Message-ID: <42BFF33A.5020004@mpiib-berlin.mpg.de>

Hi,

I'm looking for an option to pass the word_size to qblast (for short, nearly exact matches). Is that possible? "expect" seems to be OK. I can't find it in Biopython 1.40b with Python 2.4.

Martina

From dr.yu.wang at gmail.com Mon Jun 27 10:34:51 2005
From: dr.yu.wang at gmail.com (yu wang)
Date: Mon Jun 27 10:27:29 2005
Subject: [BioPython] blast
Message-ID:

Hi there,

I have two questions related to BLAST:

1. Is there a straightforward way to print out a BLAST record? E.g., I just want to print out the full record of the best hit.

2. How can I print a full query sequence? I searched the documentation. The only way I can do it now is to build a FASTA dictionary for my query FASTA file and use the query name to get the sequence. Is there a better way to do it?

Thank you very much
Yu

From eirik.sonneland at student.umb.no Thu Jun 30 07:53:34 2005
From: eirik.sonneland at student.umb.no (Eirik Sønneland)
Date: Thu Jun 30 07:44:57 2005
Subject: [BioPython] Blast
Message-ID: <42C3DD3E.7060208@student.umb.no>

Hi!

I use the following Biopython code with great success:

    b_results = NCBIWWW.qblast('blastn', 'bta_genome/all_contig', f_record)

Now I want to BLAST against the Bos taurus trace archive. On the NCBI site I found a database called Trace/Bos taurus_other, but it doesn't work when I put it into the code. Can someone please recommend what to use as the database instead of 'bta_genome/all_contig'?

I only need a BLAST output file, which I will later use to retrieve the TI numbers and then the specific Bos taurus traces (.scf) I need.

I really appreciate any help you can offer!

Have a nice day!
Regards,
Eirik Sønneland
Master student
Norwegian University of Life Sciences, Centre of Integrative Genetics/ Bos taurus SNP project.