From lzhou at ufscc.ufl.edu  Wed Jun  1 17:53:11 2005
From: lzhou at ufscc.ufl.edu (Lei Zhou)
Date: Wed Jun  1 17:45:58 2005
Subject: [BioPython] blast output file size limit
Message-ID: <s29df628.078@GWIA.ITC.HEALTH.UFL.EDU>

Does anyone know whether there is a size (or record count) limit for the
NCBIStandalone.BlastParser and NCBIStandalone.Iterator classes?

I have a program that works well with BLAST output files under 100 KB;
however, it quits on files of about 8 MB.

Any help is appreciated.

-Lei 
From mike at maibaum.org  Wed Jun  1 17:58:13 2005
From: mike at maibaum.org (Michael Maibaum)
Date: Wed Jun  1 17:55:27 2005
Subject: [BioPython] blast output file size limit
In-Reply-To: <s29df628.078@GWIA.ITC.HEALTH.UFL.EDU>
References: <s29df628.078@GWIA.ITC.HEALTH.UFL.EDU>
Message-ID: <08B58ACA-E414-4AD6-B5BE-673692D857A4@maibaum.org>


On 1 Jun 2005, at 22:53, Lei Zhou wrote:

> Does anyone know whether there is a size (or record count) limit for the
> NCBIStandalone.BlastParser and NCBIStandalone.Iterator classes?
>
> I have a program that works well with BLAST output files under 100 KB;
> however, it quits on files of about 8 MB.
>
> Any help is appreciated.


I've used both on very large files. I usually limit the number of
scores/alignments to around 200, so each individual record is not that
large even if the file is very large. On rarer occasions I've
returned 5,000-10,000 hits without major problems. I wouldn't expect it
to quit unless the machine ran out of memory somehow. I don't recall
seeing it use more than a few hundred megabytes of RAM.
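
A minimal sketch of the record-at-a-time idea, in plain Python 3 rather
than through Biopython. It assumes that each report in a concatenated
plain-text output file starts with a line beginning with a program name
such as BLASTN; the name iter_blast_reports is made up for illustration
and is not part of NCBIStandalone.

```python
import io

# Assumption: every report begins with a header line such as "BLASTN 2.2.10".
# Adjust the prefixes to match your own output files.
REPORT_PREFIXES = ("BLASTN", "BLASTP", "BLASTX", "TBLASTN", "TBLASTX")


def iter_blast_reports(handle):
    """Yield one report at a time as a string, never holding the whole file.

    Any text before the first header line is yielded as its own chunk.
    """
    lines = []
    for line in handle:
        if line.startswith(REPORT_PREFIXES) and lines:
            yield "".join(lines)
            lines = []
        lines.append(line)
    if lines:
        yield "".join(lines)


# Tiny in-memory stand-in for a large multi-query output file.
sample = io.StringIO(
    "BLASTN 2.2.10\nQuery= seq1\nhit A\n"
    "BLASTN 2.2.10\nQuery= seq2\nhit B\n"
)
reports = list(iter_blast_reports(sample))
print(len(reports))
```

With this shape, memory use follows the size of the largest single
report rather than the size of the file.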

Michael
From dalke at dalkescientific.com  Thu Jun  2 02:52:05 2005
From: dalke at dalkescientific.com (Andrew Dalke)
Date: Thu Jun  2 02:44:24 2005
Subject: [BioPython] qblast through a proxy
In-Reply-To: <429CDA92.7080704@mitre.org>
References: <429CDA92.7080704@mitre.org>
Message-ID: <7e8362a6d8d90af4780d251e7930ca5c@dalkescientific.com>

Alexander A. Morgan wrote:
> However, Blast.NCBIWWW uses the socket library in '_send_to_qblast()'. 
>  There doesn't seem to be an easy way to get through a proxy using the 
> low level socket library.  Does anyone have a quick fix/workaround for 
> this?

I ran into that a couple months ago, but didn't have the time to
fix it then or now.

Something like this should work.  I've picked values ("User-Agent",
1024 bytes per copy block) to exactly equal the existing code, though
the 1024 seems limiting.

import urllib2, shutil

def _send_to_blasturl(query, outhandle):
   req = urllib2.Request(
      "http://www.ncbi.nlm.nih.gov/cgi-bin/BLAST/nph-blast_report",
      query, {"User-Agent": "BiopythonClient"})
   inhandle = urllib2.urlopen(req)
   shutil.copyfileobj(inhandle, outhandle, 1024)
   inhandle.close()


However, the upstream code might be fixed.  It's currently

     outhandle = cStringIO.StringIO()
     _send_to_blasturl(message, outhandle)
     outhandle.seek(0)   # Reset the handle to the beginning.
     return outhandle


and since the result of urllib2.urlopen is already a file-like
object, the cStringIO is only needed if the handle has to be
seekable.  I don't know whether it does.

Another difference is that the urllib2 code parses the HTTP
headers while the existing code doesn't.  I don't know how that
affects the actual parser; it looks like it doesn't.
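
For readers on Python 3, where urllib2 became urllib.request, the same
approach with an explicit proxy might look roughly like the sketch
below. The proxy address is a placeholder, and the names
build_proxy_opener and send_to_url are hypothetical, not Biopython
functions.

```python
import shutil
import urllib.request


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def send_to_url(opener, url, query, outhandle, blocksize=1024):
    """POST query (bytes) to url and copy the response body to outhandle."""
    req = urllib.request.Request(
        url, data=query, headers={"User-Agent": "BiopythonClient"}
    )
    inhandle = opener.open(req)
    try:
        shutil.copyfileobj(inhandle, outhandle, blocksize)
    finally:
        inhandle.close()


# Placeholder proxy address (an assumption); nothing is sent over the
# network until send_to_url is called.
opener = build_proxy_opener("http://proxy.example.org:8080")
```

Without an explicit ProxyHandler, urllib also picks up proxy settings
from environment variables such as http_proxy, which may be enough on
its own.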

					Andrew
					dalke@dalkescientific.com

From aurelie.bornot at free.fr  Thu Jun  2 10:00:29 2005
From: aurelie.bornot at free.fr (aurelie.bornot@free.fr)
Date: Thu Jun  2 09:52:28 2005
Subject: [BioPython] Difference between URLAPI Qblast and blastcl3
Message-ID: <1117720829.429f10fdec0b2@imp2-q.free.fr>

Hi everybody !

I don't know much about client/server matters and so on,
but I have a question.
In my program I use Biopython's qblast function, which connects to NCBI
through the Qblast URL API, and I am very happy with it ;)
But I recently learned of the existence of the blastcl3 client,
and I wonder why Biopython chose the URL API.

Could someone be very, very nice and explain to me the differences between
the two, with their pros and cons?
I am looking forward to becoming less ignorant ;)

Thank you very much !
Aurélie
From michael.fieseler at math.uni-muenster.de  Fri Jun  3 11:55:26 2005
From: michael.fieseler at math.uni-muenster.de (Michael Fieseler)
Date: Fri Jun  3 11:47:41 2005
Subject: [BioPython] 'Unexpected end of stream' while parsing blast results
Message-ID: <200506031755.26485.michael.fieseler@math.uni-muenster.de>

Hi,

while trying to parse output from local blast I encountered the following 
error:

Traceback (most recent call last):
  File "./extracthits.py", line 34, in ?
    b_record = b_parser.parse(blast_out);
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 610, in parse
    self._scanner.feed(handle, self._consumer)
  File "/usr/lib/python2.4/site-packages/Bio/Blast/NCBIStandalone.py", line 93, in feed
    read_and_call_until(uhandle, consumer.noevent, contains='BLAST')
  File "/usr/lib/python2.4/site-packages/Bio/ParserSupport.py", line 335, in read_and_call_until
    line = safe_readline(uhandle)
  File "/usr/lib/python2.4/site-packages/Bio/ParserSupport.py", line 411, in safe_readline
    raise SyntaxError, "Unexpected end of stream."
SyntaxError: Unexpected end of stream.
 

The code I used is from the biopython tutorial and cookbook:

from Bio.Blast import NCBIStandalone
blast_out = open('18.blast','r')
b_parser = NCBIStandalone.BlastParser()
b_record = b_parser.parse(blast_out)

The software I am using is:
python 2.4.1
biopython 1.40b
blastall 2.2.10

Is there any solution to this? 
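
One possible reading of the traceback: the scanner was still looking
for a line containing 'BLAST' (the read_and_call_until call) when the
file ended, which is what would happen with an empty file or with
output that is not plain-text BLAST. A quick check in plain Python 3,
offered only as a sketch; the function name find_blast_header is
hypothetical and the temporary files stand in for real blastall output.

```python
import os
import tempfile


def find_blast_header(path):
    """Return the first line containing 'BLAST', or None if there is none."""
    with open(path) as handle:
        for line in handle:
            if "BLAST" in line:
                return line.rstrip("\n")
    return None


# Demonstration on two temporary files standing in for real output.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.blast")
    open(empty, "w").close()
    good = os.path.join(tmp, "good.blast")
    with open(good, "w") as handle:
        handle.write("BLASTN 2.2.10\n\nQuery= seq1\n")
    empty_header = find_blast_header(empty)
    good_header = find_blast_header(good)

# A None result means a parser scanning for 'BLAST' would reach the end
# of the stream without finding a report header.
print(empty_header)
print(good_header)
```

If such a check returns None for 18.blast, the problem would lie in how
the file was produced rather than in the parser.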

Regards,

Michael
From t.zito at biologie.hu-berlin.de  Mon Jun 13 08:29:09 2005
From: t.zito at biologie.hu-berlin.de (Tiziano Zito)
Date: Mon Jun 13 08:21:19 2005
Subject: [BioPython] ANN: MDP 1.1.0
Message-ID: <20050613122909.GC28566@itb.biologie.hu-berlin.de>

We post the following announcement on this list since we found 
the Biopython project very interesting and thought that it may be 
useful for Biopython users and developers.

MDP 1.1.0
---------
http://mdp-toolkit.sourceforge.net/

Modular toolkit for Data Processing (MDP) is a Python library to
perform data processing. Already implemented algorithms include:
Principal Component Analysis (PCA), Independent Component Analysis
(ICA), Slow Feature Analysis (SFA), and Growing Neural Gas (GNG).

MDP allows combining different algorithms and other data processing
elements (nodes) into data processing sequences (flows). Moreover, it
provides a framework that makes the implementation of new algorithms
easy and intuitive.

MDP supports the most common numerical extensions to Python, currently
Numeric, Numarray, and SciPy. When used together with SciPy and the symeig
package, MDP gives the scientific programmer the full power of
well-known C and FORTRAN data processing libraries. MDP helps the
programmer combine Python's object-oriented design with C and FORTRAN
efficiency.

MDP has been written for research in neuroscience, but it has been
designed to be helpful in any context where trainable data processing
algorithms are used. Its simplicity on the user side, together with the
reusability of the implemented nodes, could also make it a valid
educational tool.

Requirements:

    * Python >= 2.3
    * one of the following Python numerical extensions: 
      Numeric, Numarray, or SciPy.

    For optimal performance we recommend using SciPy with the LAPACK
    and ATLAS libraries, and installing the symeig module.

(sorry for multiple posting)

--

 Tiziano Zito
 Institute for Theoretical Biology
 Humboldt-Universitaet zu Berlin  
 Invalidenstrasse, 43
 D-10115 Berlin, Germany
 
 http://itb.biologie.hu-berlin.de/~zito/

From mdehoon at c2b2.columbia.edu  Fri Jun 17 15:42:22 2005
From: mdehoon at c2b2.columbia.edu (Michiel De Hoon)
Date: Fri Jun 17 15:36:57 2005
Subject: [BioPython] Re; Rethinking Seq objects
Message-ID: <6CA15ADD82E5724F88CB53D50E61C9AE7AC199@cgcmail.cgc.cpmc.columbia.edu>

Dear biopythoneers,

A couple of weeks ago there was a discussion on the Biopython mailing lists
about the Seq and MutableSeq classes in Bio.Seq. Whereas opinions were
divided on most of my proposals (which I therefore did not implement), most
people agreed that there was a need for a more user-friendly way to
transcribe and translate sequences.

So I added transcribe, back_transcribe, and translate functions to Bio.Seq.
I wrote these as functions rather than methods so that they can take both Seq
objects and Python string objects as input. These functions work
approximately the same way as the corresponding methods in Bio.Transcribe and
Bio.Translate.

The example in the Biopython tutorial would look like this:

Using strings:

>>> my_seq = 'GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA'
>>> transcribe(my_seq)
'GCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA'
>>> back_transcribe(_)
'GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA'
>>> translate(my_seq)
'AIVMGR*KGAR'
>>> translate(my_seq,table="Vertebrate Mitochondrial")
'AIVMGRWKGAR'
>>> translate(my_seq,table=1)
'AIVMGR*KGAR'
>>> translate(my_seq,table=2)
'AIVMGRWKGAR'

Using Seq objects:

>>> from Bio.Alphabet import IUPAC
>>> my_alpha = IUPAC.unambiguous_dna
>>> from Bio.Seq import *
>>> my_seq = Seq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPAC.unambiguous_dna)
>>> transcribe(my_seq)
Seq('GCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGA', IUPACUnambiguousRNA())
>>> back_transcribe(_)
Seq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
>>> translate(my_seq)
Seq('AIVMGR*KGAR', HasStopCodon(IUPACProtein(), '*'))
>>> translate(my_seq,table="Vertebrate Mitochondrial")
Seq('AIVMGRWKGAR', HasStopCodon(IUPACProtein(), '*'))
>>> translate(my_seq,table=1)
Seq('AIVMGR*KGAR', HasStopCodon(IUPACProtein(), '*'))
>>> translate(my_seq,table=2)
Seq('AIVMGRWKGAR', HasStopCodon(IUPACProtein(), '*'))
>>>

The original methods in Bio.Transcribe and Bio.Translate of course still work
(for Seq objects).
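
To illustrate what the string versions do conceptually, here is a small
stand-alone sketch in plain Python 3. It is not the Bio.Seq code: the
names transcribe_str, back_transcribe_str, and translate_str are made
up, only uppercase unambiguous DNA is handled, and only genetic code
tables 1 (standard) and 2 (vertebrate mitochondrial) are included.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERTEBRATE_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODONS = ["".join(codon) for codon in product(BASES, repeat=3)]
TABLES = {
    1: dict(zip(CODONS, STANDARD)),
    2: dict(zip(CODONS, VERTEBRATE_MITO)),
}


def transcribe_str(dna):
    """DNA coding strand to RNA: every T becomes U."""
    return dna.replace("T", "U")


def back_transcribe_str(rna):
    """RNA back to DNA: every U becomes T."""
    return rna.replace("U", "T")


def translate_str(dna, table=1):
    """Translate codon by codon; trailing bases that do not fill a codon are ignored."""
    codon_table = TABLES[table]
    usable = len(dna) - len(dna) % 3
    return "".join(codon_table[dna[i:i + 3]] for i in range(0, usable, 3))


my_seq = "GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"
print(transcribe_str(my_seq))
print(translate_str(my_seq))
print(translate_str(my_seq, table=2))
```

The two tables differ at the TGA codon in this sequence, which is a stop
in table 1 and tryptophan in table 2, hence the '*' versus 'W' in the
outputs shown earlier.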


Thanks, everybody, for contributing to this discussion. I hope these
functions will prove to be useful.

--Michiel.


Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1150 St Nicholas Avenue
New York, NY 10032


From boehme at mpiib-berlin.mpg.de  Mon Jun 27 08:38:18 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 27 08:30:15 2005
Subject: [BioPython] Options for qblast
Message-ID: <42BFF33A.5020004@mpiib-berlin.mpg.de>

Hi,

I'm looking for an option to pass word_size to qblast (for short,
nearly exact matches). Is that possible? "expect" seems to be OK.
I can't find it in biopython 1.40b py2.4.
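
For context, the NCBI Qblast URL API itself lists WORD_SIZE and EXPECT
among the parameters of a CMD=Put request. Whether qblast in Biopython
1.40b forwards a word size is not something this sketch answers; it
only shows, in plain Python 3, how such a request body could be
encoded. The helper name build_put_query, its defaults, and the example
values are assumptions.

```python
from urllib.parse import urlencode, parse_qs


def build_put_query(sequence, program="blastn", database="nt",
                    word_size=None, expect=None):
    """Encode a Qblast 'Put' request body; optional settings are added only if given."""
    params = [
        ("CMD", "Put"),
        ("PROGRAM", program),
        ("DATABASE", database),
        ("QUERY", sequence),
    ]
    if word_size is not None:
        params.append(("WORD_SIZE", word_size))
    if expect is not None:
        params.append(("EXPECT", expect))
    return urlencode(params)


# Illustrative values for a short, nearly exact match search.
body = build_put_query("ACGTACGTACGT", word_size=7, expect=1000)
print(body)
```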

Martina
From dr.yu.wang at gmail.com  Mon Jun 27 10:34:51 2005
From: dr.yu.wang at gmail.com (yu wang)
Date: Mon Jun 27 10:27:29 2005
Subject: [BioPython] blast
Message-ID: <dfd2be6050627073444d9356d@mail.gmail.com>

Hi there, I have two questions related to blast:
 1. Is there a straightforward way to print out a blast record? E.g. I
just want to print out the full record of the best hit.
 2. How can I print the full query sequence? I searched the documentation.
The only way I can do it now is to build a FASTA dictionary for my query
FASTA file and use the query name to get the sequence. Is there a better
way to do it?
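
The dictionary approach described in question 2 can be sketched in
plain Python 3 as follows. This is only an illustration under the
assumption that the query name reported by BLAST is the first word of
the FASTA header line; fasta_to_dict is a hypothetical helper, not a
Biopython function.

```python
import io


def fasta_to_dict(handle):
    """Map each record name (first word after '>') to its full sequence."""
    parts_by_name = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            parts_by_name[name] = []
        elif name is not None:
            parts_by_name[name].append(line)
    return {key: "".join(parts) for key, parts in parts_by_name.items()}


# In-memory stand-in for the query FASTA file.
queries = io.StringIO(
    ">seq1 first query\nACGTAC\nGTTT\n"
    ">seq2 second query\nGGGCCC\n"
)
by_name = fasta_to_dict(queries)
print(by_name["seq1"])
```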

Thank you very much

Yu

From eirik.sonneland at student.umb.no  Thu Jun 30 07:53:34 2005
From: eirik.sonneland at student.umb.no (Eirik Sønneland)
Date: Thu Jun 30 07:44:57 2005
Subject: [BioPython] Blast
Message-ID: <42C3DD3E.7060208@student.umb.no>

Hi!
I use the following Biopython code with great success:

 b_results = NCBIWWW.qblast('blastn', 'bta_genome/all_contig', f_record)

Now I want to BLAST the Bos taurus trace archive. I looked on the NCBI
site, which has a DB called Trace/Bos taurus_other. This doesn't work when
I put it into the code. Can someone please recommend what to
put in as the database instead of 'bta_genome/all_contig'? I only need a
BLAST output file, which I will later use to retrieve the TI numbers and
then the specific Bos taurus traces (.scf) I need.

I really appreciate the help you can offer!

Have a nice day!

Regards,

Eirik Sønneland
Master student
Norwegian University of Life Sciences,
Centre of Integrative Genetics/ Bos taurus SNP project.