From eric.talevich at gmail.com Fri Jul 1 11:27:42 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Fri, 1 Jul 2011 11:27:42 -0400
Subject: [Biopython] Having a hard time getting a handle on handles
In-Reply-To: <4E0D441C.5020600@gmail.com>
References: <4E0D212B.4040206@gmail.com> <4E0D441C.5020600@gmail.com>
Message-ID:

OK, the "iteration" is where you're trying to construct a list. Where it
says:

blast_record = list(NCBIXML.read(result_handle))

Try:

blast_record = NCBIXML.read(result_handle)
print(blast_record)

And see what's in the blast_record object. That should make it clearer
how the rest of your code should navigate it.

-E

On Thu, Jun 30, 2011 at 11:50 PM, Dilara Ally wrote:

> Thanks, I tried that and now the error is
>
> Traceback (most recent call last):
>   File "", line 12, in
> TypeError: iteration over non-sequence
>
> Dilara
>
> On 6/30/11 8:24 PM, Eric Talevich wrote:
>
>> from Bio.Blast import NCBIXML
>>
>

From p.j.a.cock at googlemail.com Sat Jul 2 05:18:45 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Sat, 2 Jul 2011 10:18:45 +0100
Subject: [Biopython] multiple sequence blast
In-Reply-To: <20110630104227.GA2883@sobchak>
References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak>
Message-ID:

On Thu, Jun 30, 2011 at 11:42 AM, Brad Chapman wrote:
> Dilara;
> Thanks for the message. It would be helpful if you'd include the
> error message traceback that you got stuck on; this will help
> pinpoint the problem.
>
> From reading your code, my guess is that you are getting an IOError
> about files not existing. When you do os.listdir, it only includes
> the name of the files, not the full path to where they are located.

I would have suggested the same thing.

In addition, are you really trying to run 100,000 contigs through
the NCBI online BLAST service? If it works it will take a long time,
but they might not like that and block your access.
Big BLAST jobs like this are better done by installing BLAST+ (and in
this case the NR database) locally. Biopython has wrappers to help call
standalone BLAST too.

Peter

From dilara.ally at gmail.com Sun Jul 3 14:27:12 2011
From: dilara.ally at gmail.com (Dilara Ally)
Date: Sun, 03 Jul 2011 11:27:12 -0700
Subject: [Biopython] multiple sequence blast
In-Reply-To:
References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak>
Message-ID: <4E10B480.1030600@gmail.com>

Thanks Peter.

From dilara.ally at gmail.com Sun Jul 3 15:27:53 2011
From: dilara.ally at gmail.com (Dilara Ally)
Date: Sun, 03 Jul 2011 12:27:53 -0700
Subject: [Biopython] multiple sequence blast
In-Reply-To:
References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak>
Message-ID: <4E10C2B9.2000909@gmail.com>

Hi Peter

How long will it take then to do a big BLAST job that has over 600,000
contigs? Wouldn't downloading the databases and doing a standalone BLAST
take a lot of cpu memory? Should I be doing this on a cluster?
Dilara

From p.j.a.cock at googlemail.com Sun Jul 3 19:52:34 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 4 Jul 2011 00:52:34 +0100
Subject: [Biopython] multiple sequence blast
In-Reply-To: <4E10C2B9.2000909@gmail.com>
References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak> <4E10C2B9.2000909@gmail.com>
Message-ID:

On Sun, Jul 3, 2011 at 8:27 PM, Dilara Ally wrote:
> Hi Peter
>
> How long will it take then to do a big BLAST job that has over
> 600,000 contigs?

How long is a piece of string? ;)

What I mean is this is hard to say without looking at your data.
Do you know the total sequence length of the contigs? Try doing
60 representative contigs on your machine for an estimate (note
their lengths are important - shorter contigs should be faster
to run as BLAST queries).

Remember that standalone BLAST+ can be run multi-threaded.
It will depend on the number of CPUs and how much RAM you have.

> Wouldn't downloading the databases and doing a standalone
> BLAST take a lot of cpu memory?
Yes, it will take a lot of CPU time, and a moderate amount of RAM
(if you are doing genome assembly to get the contigs, that will
probably have needed far more RAM than running BLAST will).

> Should I be doing this on a cluster?

It would probably be worthwhile. You *might* manage with a powerful
multicore desktop (like a recent Mac Pro or similar) or powerful server.

Peter

From kokomutai at gmail.com Mon Jul 4 04:25:35 2011
From: kokomutai at gmail.com (Koko Mutai)
Date: Mon, 4 Jul 2011 11:25:35 +0300
Subject: [Biopython] (no subject)
Message-ID:

Hello, I would like to ask how to import, sequence matrix metalloproteinases
and DNA annotation using Biopython.

From from.d.putto at gmail.com Thu Jul 7 06:26:48 2011
From: from.d.putto at gmail.com (Sheila the angel)
Date: Thu, 7 Jul 2011 12:26:48 +0200
Subject: [Biopython] Passing sequence to local BLAST
Message-ID:

Hi All,

I want to download a GenBank file from NCBI and pass the protein sequence
directly to the local BLAST, but I am getting an error in the BLAST step.

#-------------------------------------------------------------------------------------------
from Bio import SeqIO
from Bio import Entrez
from Bio.Blast.Applications import NcbiblastpCommandline
id='200203'
handle = Entrez.efetch(db="protein", id=id, rettype="gp")
seq_record = SeqIO.read(handle, "gb")
x=seq_record.seq  #getting the sequence in a variable x
blastp_cline = NcbiblastpCommandline(query=x, db="protein_database", evalue=0.001)  # My BLAST command
result_handle, stderr = blastp_cline()  #Running BLAST and getting error :(
#-------------------------------------------------------------------------------------------

At this last step I am getting the error.
I sort of understand the problem: it is taking the value of x as a file name,
while it is a variable which contains the sequence.
Is there any way around this problem without making a temporary file?
Thanks in Advance

--
Cheers
Sheila

From p.j.a.cock at googlemail.com Thu Jul 7 06:43:22 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Jul 2011 11:43:22 +0100
Subject: [Biopython] Passing sequence to local BLAST
In-Reply-To:
References:
Message-ID:

On Thu, Jul 7, 2011 at 11:26 AM, Sheila the angel wrote:
> Hi All,
>
> I want to download genbank file from NCBI and pass the protein sequence
> directly to the local BLAST. But I am getting error in BLAST step
> #-------------------------------------------------------------------------------------------
> from Bio import SeqIO
> from Bio import Entrez
> from Bio.Blast.Applications import NcbiblastpCommandline
> id='200203'
> handle = Entrez.efetch(db="protein", id=id, rettype="gp")
> seq_record = SeqIO.read(handle, "gb")
> x=seq_record.seq  #getting the sequence in a variable x
> blastp_cline = NcbiblastpCommandline(query=x, db="protein_database",
> evalue=0.001)  # My BLAST command
> result_handle, stderr = blastp_cline()  #Running BLAST and getting error :(
>
> #-------------------------------------------------------------------------------------------
>
> At this last step I am getting error.....
> I sort-of understand the problem.....it is taking value of x as a file name
> while its a variable which contains the sequence.
> Is there any way out to this problem without making temporary file.

With the standalone blast tools you generally need to prepare an input
FASTA file with your query sequence(s).

However, in principle you can give the input filename as - (default),
and instead pipe the query FASTA record in as stdin (standard input).
Try something like this (untested):

...
blastp_cline = NcbiblastpCommandline(query="-", db="protein_database", evalue=0.001)
stdout, stderr = blastp_cline(stdin=seq_record.format("fasta"))

Peter

From from.d.putto at gmail.com Thu Jul 7 06:53:28 2011
From: from.d.putto at gmail.com (Sheila the angel)
Date: Thu, 7 Jul 2011 12:53:28 +0200
Subject: [Biopython] Passing sequence to local BLAST
In-Reply-To:
References:
Message-ID:

Great !!!! It works
Thanks a lot :)

From p.j.a.cock at googlemail.com Thu Jul 7 06:55:29 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Jul 2011 11:55:29 +0100
Subject: [Biopython] Passing sequence to local BLAST
In-Reply-To:
References:
Message-ID:

On Thu, Jul 7, 2011 at 11:53 AM, Sheila the angel wrote:
> Great !!!! It works
> Thanks a lot :)

Thanks for letting us know. Another solution would be to just download the
file from the NCBI in FASTA format rather than GenBank format - but I
expect you had other reasons for doing that.

Peter

From jessecolangelolillis at googlemail.com Thu Jul 7 07:30:58 2011
From: jessecolangelolillis at googlemail.com (Jesse Colangelo-Lillis)
Date: Thu, 7 Jul 2011 04:30:58 -0700
Subject: [Biopython] Bio.Blast; entrez_query= multiple organisms
Message-ID:

Can someone tell me the format for specifying multiple organisms within the
blast parameters?

I have this:

result_handle = NCBIWWW.qblast("blastp", "nr", gene_seq, expect=100, hitlist_size=1, entrez_query="unclassified Caudovirales[orgn]")

but I actually want to blast against both 'Caudovirales' and 'unclassified
Caudovirales'.
Thanks for any help.

--
Jesse Colangelo-Lillis
--
Australian National University
Research School Earth Sciences
Bldg 61. Mills Road
Canberra, ACT 0200
Australia
cell: 0415 380 105
--

From p.j.a.cock at googlemail.com Thu Jul 7 07:42:45 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 7 Jul 2011 12:42:45 +0100
Subject: [Biopython] Bio.Blast; entrez_query= multiple organisms
In-Reply-To:
References:
Message-ID:

On Thu, Jul 7, 2011 at 12:30 PM, Jesse Colangelo-Lillis wrote:
> Can someone tell me the format for specifying multiple organisms
> within the blast parameters?
>
> I have this:
>
> result_handle = NCBIWWW.qblast("blastp", "nr", gene_seq, expect=100,
> hitlist_size=1, entrez_query="unclassified Caudovirales[orgn]")
>
> but I actually want to blast against both 'Caudovirales' and
> 'unclassified Caudovirales'.
> Thanks for any help.

You would probably need explicit quotes round unclassified Caudovirales
in the Entrez query, otherwise I think it will do this:

unclassified AND Caudovirales[orgn]

I would use the taxid rather than the name to avoid the space problem.
Is there a taxid that covers both your clades of interest? Otherwise
combine fields with OR (or AND as appropriate). Play with the web
interface to build the right query:

http://www.ncbi.nlm.nih.gov/protein/advanced

Peter

From mnemonico at posthocergopropterhoc.net Fri Jul 8 00:19:20 2011
From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo)
Date: Fri, 8 Jul 2011 01:19:20 -0300
Subject: [Biopython] error writing fasta file using SeqIO
Message-ID:

Hi. Can someone spot why I can't create a fasta file here? I tried following
the cookbook tutorial but something goes wrong when I try to write the
sequence from a SeqRecord object to a fasta file:

Paste #9205 (http://paste.pound-python.org/show/9205/), posted on Jul 8, 2011 4:12:16 AM

import abifpy
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Emboss.Applications import NeedleCommandline
import os  # uses the listdir function
from Bio import SeqIO

def acessa_ab1(arquivo, trim=True):  # generalize later
    """Open an ab1 file and return a SeqRecord object"""
    dado = abifpy.Trace(arquivo)
    if trim:
        cortado = dado.trim(dado.seq(ambig=True))
        return SeqRecord(cortado, id=arquivo, description='dado cortado')
    else:
        return dado.seqrecord()

def abre_ref(arquivo):
    """Open a file containing a reference sequence and
    return a SeqRecord object"""
    with open(arquivo, 'rUb') as dado:
        referencia = SeqIO.read(dado, 'genbank')
    return referencia

def salva_fasta(obj_SeqRecord):
    """Take a SeqRecord object and create a FASTA file with its sequence"""
    SeqIO.write([obj_SeqRecord], obj_SeqRecord + '.fasta','fasta')

def processar_lote(diretorio, ref):
    """Open the ab1 files in a folder, trim them, save them as FASTA,
    align them against the reference FASTA and save the alignment to a
    file for later analysis.
    diretorio --> a string with the path of the folder containing the files
    ref --> a string with the absolute path of the GenBank file holding
    the reference sequence.
    """
    referencia = abre_ref(ref)
    referencia.id = 'sequencia de referencia'
    salva_fasta(referencia)
    ab1files = [x for x in os.listdir(diretorio) if x.endswith('.ab1')]
    for file in ab1files:
        dado = acessa_ab1(diretorio + file)
        salva_fasta(dado)
        needle_cline = NeedleCommandline(asequence='referencia.fasta',
                                         bsequence=file + '.fasta',
                                         gapopen=10, gapextend=0.5,
                                         outfile=file + "_aligned.txt")
        stdout, stderr = needle_cline()

#pasta = '/home/mercutio22/Dropbox/My scripts/Fabi/vs/Seq_placa273 analisada/'
#referencia = '/home/mercutio22/Dropbox/My scripts/Fabi/vs/Seq_placa273 analisada/BRCA1 (total) - Frag 3450.gb'
#processar_lote(pasta, referencia)

dado = acessa_ab1('/home/mercutio22/Dropbox/My scripts/Fabi/vs/Seq_placa273 analisada/1174411_3450F_A01.ab1')
print(type(dado))
salva_fasta(dado)

===============================error msg==============
Traceback (most recent call last):
  File "louise.py", line 59, in
    salva_fasta(dado)
  File "louise.py", line 26, in salva_fasta
    SeqIO.write([obj_SeqRecord], obj_SeqRecord + '.fasta','fasta')
  File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 412, in write
    count = writer_class(handle).write_file(sequences)
  File "/usr/lib/pymodules/python2.6/Bio/SeqIO/Interfaces.py", line 271, in write_file
    count = self.write_records(records)
  File "/usr/lib/pymodules/python2.6/Bio/SeqIO/Interfaces.py", line 256, in write_records
    self.write_record(record)
  File "/usr/lib/pymodules/python2.6/Bio/SeqIO/FastaIO.py", line 134, in write_record
    self.handle.write(">%s\n" % title)
AttributeError: 'SeqRecord' object has no attribute 'write'

From w.arindrarto at gmail.com Fri Jul 8 00:46:11 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Fri, 8 Jul 2011 06:46:11 +0200
Subject: [Biopython] error writing fasta file using SeqIO
In-Reply-To:
References:
Message-ID:

Hi Hugo,

I think the problem is that you tried to concatenate a SeqRecord object and
a string object.
Do this in 'salva_fasta' instead:

SeqIO.write([obj_SeqRecord], obj_SeqRecord.id + '.fasta', 'fasta')

And just as an additional input, in the 'processar_lote' function, you can
use this to generate a list of absolute file paths (import the os and
glob modules beforehand).

files = [os.path.abspath(x) for x in glob.glob('*.ab1')]

os.path.abspath() returns the absolute file path for a given file, and
glob.glob() returns a list of names that match the given pattern.

Hope that helps!
---
Wibowo Arindrarto (bow)
http://bow.web.id

From cmccoy at fhcrc.org Fri Jul 8 18:33:35 2011
From: cmccoy at fhcrc.org (Connor McCoy)
Date: Fri, 8 Jul 2011 15:33:35 -0700
Subject: [Biopython] seqmagick
Message-ID:

Hi all,

We wrote (and use) seqmagick, a little tool to conveniently access
BioPython's Sequence I/O and manipulation capabilities from the command
line. Seqmagick allows one to extract summary information on sequence
files, convert between formats based on file extension, modify sequences,
and much more.

For example:

# Convert from fasta to stockholm
> seqmagick convert seqfile.fasta seqfile.sto

# Reverse complement the first 5 sequences in file
> seqmagick convert --reverse-complement --head 5 seqfile.fasta seqfile.revcomp.fasta

# Remove all columns containing > 5% gaps, in place
> seqmagick mogrify --squeeze-threshold 0.95 seqfile.fasta

For more info, see:
http://fhcrc.github.com/seqmagick/

Or to install:
pip install seqmagick

Comments / contributions welcome.
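For anyone who would rather stay inside Python, the second command above
corresponds roughly to a few Bio.SeqIO calls. This is only a sketch of the
idea and not seqmagick's own code: the helper name reverse_complement_head
is made up for illustration, and the file names are placeholders.

```python
from itertools import islice

from Bio import SeqIO


def reverse_complement_head(in_path, out_path, n=5):
    """Write the reverse complement of the first n FASTA records.

    Hypothetical helper for illustration; not part of seqmagick or
    Biopython. Returns the number of records written.
    """
    records = SeqIO.parse(in_path, "fasta")
    # Generator expression: records are streamed one at a time and are
    # never collected into a list.
    flipped = (
        rec.reverse_complement(id=rec.id, description="")
        for rec in islice(records, n)
    )
    return SeqIO.write(flipped, out_path, "fasta")
```

With the placeholder names used above, the call would be
reverse_complement_head("seqfile.fasta", "seqfile.revcomp.fasta").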
Cheers,
Connor

From mnemonico at posthocergopropterhoc.net Mon Jul 11 02:22:10 2011
From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo)
Date: Mon, 11 Jul 2011 03:22:10 -0300
Subject: [Biopython] error writing fasta file using SeqIO
In-Reply-To:
References:
Message-ID:

Hi folks.

Thanks once again for helping me out. That problem is solved. I took a look
at the glob module. It is really neat!

A new problem has arisen when I try to call the process that should run the
sequence alignment. I try to run 'needle' as a subprocess but it fails with
this error: http://paste.pound-python.org/show/9395/.

Here is how the code looks now: http://paste.pound-python.org/show/9394/.

The weird thing is that needle executes OK when I run needle_cline as it is
in a bash shell. I took care to write full paths when pointing to individual
files and the needle binary, but that hasn't solved the error.

Any clues?

From gori at cs.ru.nl Mon Jul 11 12:07:59 2011
From: gori at cs.ru.nl (Fabio Gori)
Date: Mon, 11 Jul 2011 18:07:59 +0200
Subject: [Biopython] Parsing FASTA records based on headers
Message-ID: <201107111808.00013.gori@cs.ru.nl>

Hi all,

I tried to parse a FASTA file to select the sequences whose headers satisfy
a condition.
The condition is that the first word of the header belongs to a list
named SelectedSequencesId.

On the page http://biopython.org/wiki/SeqIO, I found this example, where the
condition is that sequence length <300:

1 from Bio import SeqIO
2
3 input_seq_iterator = SeqIO.parse(open("cor6_6.gb", "rU"), "genbank")
4 short_seq_iterator = (record for record in input_seq_iterator \
5                       if len(record.seq) < 300)
6
7 output_handle = open("short_seqs.fasta", "w")
8 SeqIO.write(short_seq_iterator, output_handle, "fasta")
9 output_handle.close()

so I tried to substitute line 5 with

5 record.id.split()[0] in SelectedSequencesId)

But it did not work.
I was able to get what I wanted by generating a list with all the records
and then parsing it, but I'd like to find a solution that uses a generator
expression.

Thanks in advance,

Fabio

--

F. Gori, PhD student
Intelligent Systems
ICIS (Institute for Computing and Information Sciences)
Radboud University Nijmegen

Home Page: http://www.cs.ru.nl/~gori/

From surykartka at gmail.com Mon Jul 11 13:02:51 2011
From: surykartka at gmail.com (Dorota Matelska)
Date: Mon, 11 Jul 2011 19:02:51 +0200
Subject: [Biopython] Parsing FASTA records based on headers
In-Reply-To: <201107111808.00013.gori@cs.ru.nl>
References: <201107111808.00013.gori@cs.ru.nl>
Message-ID: <5EECA7AF-1767-4BFB-9ECC-D8EACFCEED54@gmail.com>

Hi Fabio,

You forgot to also change the format name of your input file when using
SeqIO.parse(). Your input is in FASTA format, so put "fasta" there instead
of "genbank", and it should work.

Hope this will help you :-)

Dorota

From w.arindrarto at gmail.com Mon Jul 11 13:02:36 2011
From: w.arindrarto at gmail.com (Wibowo Arindrarto)
Date: Mon, 11 Jul 2011 19:02:36 +0200
Subject: [Biopython] error writing fasta file using SeqIO
In-Reply-To:
References:
Message-ID:

Hi Hugo,

I think you should pass 'alinhar' as the argument for subprocess.call()
instead of 'needle_cline'. You can use 'needle_cline' as the argument,
but then you should also set shell to True, so the command is
subprocess.call(needle_cline, shell=True).

Hope that helps.
---
Wibowo Arindrarto (bow)
http://bow.web.id

From devaniranjan at gmail.com Mon Jul 11 15:43:56 2011
From: devaniranjan at gmail.com (George Devaniranjan)
Date: Mon, 11 Jul 2011 15:43:56 -0400
Subject: [Biopython] comparision of alignment scores
Message-ID:

I have several statistical comparisons of alignment scores for a list of
proteins, generated using Biopython with BLOSUM and a matrix generated by
me.

All matrix methods (including BLOSUM) correctly identify a protein's own
sequence in a collection of sequences (its high score is set apart from the
other scores), but I want to see if my own matrix performs better than
BLOSUM, i.e. is my matrix more sensitive than BLOSUM?

Is there a way using statistics to find this out?
I know that this might not be the most appropriate forum to ask this
question, but since many of you work in this area I thought I would try.

Thank you,
George

From p.j.a.cock at googlemail.com Mon Jul 11 15:51:09 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 11 Jul 2011 20:51:09 +0100
Subject: [Biopython] Parsing FASTA records based on headers
In-Reply-To: <201107111808.00013.gori@cs.ru.nl>
References: <201107111808.00013.gori@cs.ru.nl>
Message-ID:

On Mon, Jul 11, 2011 at 5:07 PM, Fabio Gori wrote:
> Hi all,
>
> I tried to parse a FASTA file to select the sequences whose headers
> satisfy a condition.
>
> The condition is that the first word of the header belongs to a list
> named SelectedSequencesId.
>
> In the page http://biopython.org/wiki/SeqIO, I found this example, where the
> condition is that sequence length <300:
>
> ...
>
> so I tried to substitute line 5 with
> 5 record.id.split()[0] in SelectedSequencesId)

The SeqIO parser uses the first word of the ">" line as the id, so
all you need is this:

record.id in SelectedSequencesId

rather than:

len(record.seq) < 300

> But it did not work.

In what way? Did you also change the format to "fasta" as Dorota
pointed out?

Peter

From mnemonico at posthocergopropterhoc.net Tue Jul 12 03:59:54 2011
From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo)
Date: Tue, 12 Jul 2011 04:59:54 -0300
Subject: [Biopython] error writing fasta file using SeqIO
In-Reply-To:
References:
Message-ID:

Yes of course! I should've known. Thanks a bunch, this one was killing me.

From lawson.jones at gmail.com Wed Jul 13 14:40:18 2011
From: lawson.jones at gmail.com (Daniel Jones)
Date: Wed, 13 Jul 2011 14:40:18 -0400
Subject: [Biopython] clustalw align multiple sequences to reference.
Message-ID: Hi Biopython users, I have a file with many (~50,000) 200 bp sequences, each of which I would like to align to a fixed reference sequence. I *don't* care about aligning all 50,000 sequences with each other; I only care about aligning each one with the reference sequence. I can't figure out a way to do this without generating 50,000 files, which seems like ridiculous unnecessary overhead. It seems like ClustalW's interface is quite inflexible in demanding separate input and output files for each alignment, but I don't have much experience using it so maybe I'm completely missing something. Incidentally, I'm not wedded to the idea of using ClustalW, so if there's an alternate alignment program that would make this easier, I'd certainly be open to trying it. Thanks, Daniel Jones From eric.talevich at gmail.com Wed Jul 13 16:38:42 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 13 Jul 2011 16:38:42 -0400 Subject: [Biopython] clustalw align multiple sequences to reference. In-Reply-To: References: Message-ID: On Wed, Jul 13, 2011 at 2:40 PM, Daniel Jones wrote: > Hi Biopython users, > I have a file with many (~50,000) 200 bp sequences, each of which I would > like to align to a fixed reference sequence. I *don't* care about aligning > all 50,000 sequences with each other; I only care about aligning each one > with the reference sequence. I can't figure out a way to do this without > generating 50,000 files, which seems like ridiculous unnecessary overhead. > It seems like ClustalW's interface is quite inflexible in demanding > separate > input and output files for each alignment, but I don't have much experience > using it so maybe I'm completely missing something. > > Incidentally, I'm not wedded to the idea of using ClustalW, so if there's > an > alternate alignment program that would make this easier, I'd certainly be > open to trying it. > > Are these reads from sequencing? 
If so, then BWA or Bowtie might be what you want: http://bio-bwa.sourceforge.net/ http://bowtie-bio.sourceforge.net/index.shtml If not, then you could try BLAST with your reference sequence as the query and the short sequences as your database. Cheers, Eric From p.j.a.cock at googlemail.com Wed Jul 13 17:44:55 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 13 Jul 2011 22:44:55 +0100 Subject: [Biopython] clustalw align multiple sequences to reference. In-Reply-To: References: Message-ID: On Wednesday, July 13, 2011, Daniel Jones wrote: > Hi Biopython users, > I have a file with many (~50,000) 200 bp sequences, each of which I would > like to align to a fixed reference sequence. I *don't* care about aligning > all 50,000 sequences with each other; I only care about aligning each one > with the reference sequence. I can't figure out a way to do this without > generating 50,000 files, which seems like ridiculous unnecessary overhead. > It seems like ClustalW's interface is quite inflexible in demanding separate > input and output files for each alignment, but I don't have much experience > using it so maybe I'm completely missing something. > > Incidentally, I'm not wedded to the idea of using ClustalW, so if there's an > alternate alignment program that would make this easier, I'd certainly be > open to trying it. > > Thanks, > Daniel Jones You need a pairwise alignment tool. Perhaps needle or water from the EMBOSS suite, or Biopython's pairwise2 module would be suitable (not in the tutorial, read the API docs). However, as Eric suggested, an NGS alignment tool might be more appropriate. Peter From mnemonico at posthocergopropterhoc.net Thu Jul 14 00:19:22 2011 From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo) Date: Thu, 14 Jul 2011 01:19:22 -0300 Subject: [Biopython] clustalw align multiple sequences to reference. In-Reply-To: References: Message-ID: I am having the same problem. 
First I was running "needle" on each sequencing data and its reference then I thought of generating a big fasta containing all sequences and then running MUSCLE. Like Daniel Jones I don't mind that sequences are aligned against each other but they should be aligned properly against the reference sequence. On a side note: I have sequenced DNA data which I need to align to a reference sequence and look for mutations. I am confused. Should I use a global alignment algorithm or a local alignment algorithm? I have data for forward and reverse strands, so it could be useful if I had both aligned with the reference in a single file. I don't mean to hijack the thread, but this seems exactly like my problem. Excuse me for any inconvenience. Hugo Torres On Wed, Jul 13, 2011 at 6:44 PM, Peter Cock wrote: > On Wednesday, July 13, 2011, Daniel Jones wrote: > > Hi Biopython users, > > I have a file with many (~50,000) 200 bp sequences, each of which I would > > like to align to a fixed reference sequence. I *don't* care about > aligning > > all 50,000 sequences with each other; I only care about aligning each one > > with the reference sequence. I can't figure out a way to do this without > > generating 50,000 files, which seems like ridiculous unnecessary > overhead. > > It seems like ClustalW's interface is quite inflexible in demanding > separate > > input and output files for each alignment, but I don't have much > experience > > using it so maybe I'm completely missing something. > > > > Incidentally, I'm not wedded to the idea of using ClustalW, so if there's > an > > alternate alignment program that would make this easier, I'd certainly be > > open to trying it. > > > > Thanks, > > Daniel Jones > > You need a pairwise alignment tool. Perhaps needle or water from the > EMBOSS suite, or Biopython's pairwise2 module would be suitable (not > in the tutorial, read the API docs). > > However, as Eric suggested, an NGS alignment tool might be more > appropriate. 
> > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From gori at cs.ru.nl Thu Jul 14 07:21:31 2011 From: gori at cs.ru.nl (Fabio Gori) Date: Thu, 14 Jul 2011 13:21:31 +0200 Subject: [Biopython] Parsing FASTA records based on headers In-Reply-To: References: <201107111808.00013.gori@cs.ru.nl> Message-ID: <201107141321.31160.gori@cs.ru.nl> The condition "if record.id in SelectedSequencesId " works fine now, thank you. Fabio On Monday, July 11, 2011 09:51:09 pm Peter Cock wrote: > On Mon, Jul 11, 2011 at 5:07 PM, Fabio Gori wrote: > > Hi all, > > > > I tried to parse a FASTA file to select the sequences whose headers > > satisfy a condition. > > > > The condition is that the first word of the header belongs to a list > > named SelectedSequencesId. > > > > In the page http://biopython.org/wiki/SeqIO, I found this example, where > > the condition is that sequence length <300: > > > > ... > > > > so I tried to substitute line 5 with > > 5 record.id.split()[0] in SelectedSequencesId) > > The SeqIO parse uses the first word of the ">" line as the id, > so all you need is this: record.id in SelectedSequencesId > rather than: len(record.seq) < 300 > > > But it did not work. > > In what way? Did you also change the format to "fasta" > as Dorota pointed out? > > Peter -- F. 
Gori, PhD student
Intelligent Systems
ICIS (Institute for Computing and Information Sciences)
Radboud University Nijmegen

Post Address:
Intelligent Systems
Postbus 9010
6500 GL Nijmegen
The Netherlands

Visiting Address:
Room HG02.517
Faculty of Science
Heyendaalseweg 135
6525 AJ Nijmegen

Tel.: +31 (0)24 36 52703
E-mail: gori at cs.ru.nl
Home Page: http://www.cs.ru.nl/~gori/

From srivastavaisha.06 at gmail.com Fri Jul 15 03:29:42 2011
From: srivastavaisha.06 at gmail.com (isha srivastava)
Date: Fri, 15 Jul 2011 12:59:42 +0530
Subject: [Biopython] Query
Message-ID:

Hello,

I am a new user of BioPython.
I have downloaded the latest version of BioPython, i.e. BioPython 1.57,
and followed all the instructions in the tutorial.
But when I run this program:

>>> from Bio import SeqIO
>>> for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
...     print seq_record.id
...     print repr(seq_record.seq)
...     print len(seq_record)
...

it shows the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 424, in parse
    raise TypeError("Need a file handle, not a string (i.e. not a filename)")
TypeError: Need a file handle, not a string (i.e. not a filename)

Sir, how can I solve this error?
Kindly reply soon.

With due regards,
Isha

From anaryin at gmail.com Fri Jul 15 03:44:02 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Fri, 15 Jul 2011 09:44:02 +0200
Subject: [Biopython] Query
In-Reply-To:
References:
Message-ID:

Hello Isha,

As the error message says, you should not provide the filename to the
function, but instead a file handle. In other words, you need to open the
file first and then provide this to the function.

    handle = open("ls_orchid.fasta")
    for seq_record in SeqIO.parse(handle, "fasta"):
        blablabla

Regards,

João [...]
Rodrigues
http://nmr.chem.uu.nl/~joao

On Fri, Jul 15, 2011 at 9:29 AM, isha srivastava <
srivastavaisha.06 at gmail.com> wrote:

> Hello,
>
> I am a new user of BioPython.
> I have downloaded the latest version of BioPython, i.e. BioPython 1.57,
> and followed all the instructions in the tutorial.
> But when I run this program:
>
> >>> from Bio import SeqIO
> >>> for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
> ...     print seq_record.id
> ...     print repr(seq_record.seq)
> ...     print len(seq_record)
> ...
>
> it shows the following error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 424, in parse
>     raise TypeError("Need a file handle, not a string (i.e. not a filename)")
> TypeError: Need a file handle, not a string (i.e. not a filename)
>
> Sir, how can I solve this error?
> Kindly reply soon.
>
> With due regards,
> Isha
>

From p.j.a.cock at googlemail.com Fri Jul 15 05:37:03 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 15 Jul 2011 10:37:03 +0100
Subject: [Biopython] Query
In-Reply-To:
References:
Message-ID:

On Fri, Jul 15, 2011 at 8:29 AM, isha srivastava wrote:
> Hello,
>
> I am a new user of BioPython.
> I have downloaded the latest version of BioPython, i.e. BioPython 1.57,
> and followed all the instructions in the tutorial.
> But when I run this program:
>
> >>> from Bio import SeqIO
> >>> for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"):
> ...     print seq_record.id
> ...     print repr(seq_record.seq)
> ...     print len(seq_record)
> ...
>
> it shows the following error:
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 424, in parse
>     raise TypeError("Need a file handle, not a string (i.e. not a filename)")
> TypeError: Need a file handle, not a string (i.e. not a filename)
>
> Sir, how to solve this error?

That happens on older versions of Biopython - you can check
what is being used within python with:

import Bio
print Bio.__version__

My guess is your install of Biopython 1.57 hasn't worked properly,
and an older version is still being used.

Peter

From p.j.a.cock at googlemail.com Fri Jul 15 08:48:36 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 15 Jul 2011 13:48:36 +0100
Subject: [Biopython] Query
In-Reply-To:
References:
Message-ID:

On Fri, Jul 15, 2011 at 10:37 AM, Peter Cock wrote:
>>
>> That happens on older versions of Biopython - you can check
>> what is being used within python with:
>>
>> import Bio
>> print Bio.__version__
>>
>> My guess is your install of Biopython 1.57 hasn't worked properly,
>> and an older version is still being used.
>>
>> Peter
>>

On Fri, Jul 15, 2011 at 12:56 PM, isha srivastava wrote:
> Hi Peter,
>
> you were right. i checked the version of biopython and it is showing
> version 1.53.
> how to get rid of this problem?
>
> hope for your soon reply.
>
> Thanx so much
> isha
>

Hi again,

Please CC the mailing list rather than emailing me directly
with things like this.

How was Biopython installed? What OS are you using?

Peter

From srivastavaisha.06 at gmail.com Fri Jul 15 14:03:18 2011
From: srivastavaisha.06 at gmail.com (isha srivastava)
Date: Fri, 15 Jul 2011 23:33:18 +0530
Subject: [Biopython] Hi
Message-ID:

Hello Peter,

I am sorry. In future I will not mail you directly.

I am using Ubuntu. Yes, I will reinstall BioPython 1.57 again.
Regards, Isha From p.j.a.cock at googlemail.com Sun Jul 17 16:19:02 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 17 Jul 2011 21:19:02 +0100 Subject: [Biopython] Hi In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 7:03 PM, isha srivastava wrote: > Hello Peter, > > I am sorry . Further i will not mail on your email directly. One reason I asked was I am at a conference at the moment, so it was possible someone else might have been able to try to help first. The main reason is the mailing list is public, and people can search it for solutions to this kind of problem in future. > > I am using UBUNTU. ya I ll reinstall BioPython 1.57 again. > Was Biopython 1.53 installed using the Ubuntu package system? If so, please use the package manager to uninstall this old version before trying to reinstall Biopython 1.57. Peter From malvikasharan at gmail.com Sun Jul 17 17:53:03 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Sun, 17 Jul 2011 23:53:03 +0200 Subject: [Biopython] PSI-BLAST Message-ID: Hi All, I am trying to code WWW version of PSI-BLAST. My intention is to integrate PSI-Blast in my tool in order to find distant homologs. i some how can not run the commandline version of psi blast (from Bio.Blast.Applications import *NcbipsiblastCommandline*) as it states the error *" **Python* error: ImportError: *cannot* import *name **NcbipsiblastCommandline "* * * *There are not much information available about it. Can somebody figure out why?* * * *Malvika* * * From senthil.debian at gmail.com Sun Jul 17 18:23:50 2011 From: senthil.debian at gmail.com (Senthil Kumar M) Date: Sun, 17 Jul 2011 15:23:50 -0700 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: On Sun, Jul 17, 2011 at 2:53 PM, malvika sharan wrote: > Hi All, > > I am trying to code WWW version of PSI-BLAST. My intention is to integrate > PSI-Blast in my tool in order to find distant homologs. 
> > i some how can not run the commandline version of psi blast (from > Bio.Blast.Applications import *NcbipsiblastCommandline*) as it states the > error > *" **Python* error: ImportError: *cannot* import *name > **NcbipsiblastCommandline > "* > * > * > *There are not much information available about it. Can somebody figure out > why?* > * > * > *Malvika* > * > * Hi, According to the Biopython tutorial http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc93 (2 April 2011 version): "You can run the standalone verion of PSI-BLAST (the legacy NCBI command line tool blastpgp, or its replacement psiblast) using the wrappers in Bio.Blast.Applications module. At the time of writing, the NCBI do not appear to support tools running a PSI-BLAST search via the internet." HTH, Senthil -/ "You know, it's at times like this when I'm trapped in a Vogon airlock with a man from Betelgeuse and about to die of asphyxiation in deep space that I really wish I'd listened to what my mother told me when I was young!" "Why, what did she tell you?" "I don't know, I didn't listen." -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy" From malvikasharan at gmail.com Sun Jul 17 18:58:57 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Mon, 18 Jul 2011 00:58:57 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Thanks Senthil, but that is what i mentioned that i am trying to use wrappers in Bio.Blast.Applications module. It just does not seem to work. I have the whole Biopython installed and i am using Python 2.7. the problem occurs while importing itself, i mentioned the error as well. here is the whole error description: Traceback (most recent call last): File "psiBlast.py", line 6, in from Bio.Blast.Applications import NcbipsiblastCommandline ImportError: cannot importname NcbipsiblastCommandline i just cant figure out why. 
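
An ImportError of the form "cannot import name ..." means Python found the module but the module does not define that name, which often points to an older installation being picked up. A minimal diagnostic sketch follows, using only the standard library. The helper names describe_module and has_name are hypothetical, and the demonstration calls use standard-library modules so that the snippet runs even without Biopython.

```python
import importlib


def describe_module(module_name):
    """Return (version, file) for an importable module, or None if absent."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    # Not every module defines __version__ or __file__, hence the defaults.
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location


def has_name(module_name, name):
    """Report whether module_name can be imported and defines name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, name)


if __name__ == "__main__":
    # For the case in this thread one would pass "Bio" and
    # ("Bio.Blast.Applications", "NcbipsiblastCommandline") instead.
    print(describe_module("json"))
    print(has_name("os.path", "join"))
```

If the reported location is an unexpected directory, or the version is older than the one just installed, a second copy of the package is probably shadowing the new one.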
On Mon, Jul 18, 2011 at 12:23 AM, Senthil Kumar M wrote: > On Sun, Jul 17, 2011 at 2:53 PM, malvika sharan > wrote: > > Hi All, > > > > I am trying to code WWW version of PSI-BLAST. My intention is to > integrate > > PSI-Blast in my tool in order to find distant homologs. > > > > i some how can not run the commandline version of psi blast (from > > Bio.Blast.Applications import *NcbipsiblastCommandline*) as it states the > > error > > *" **Python* error: ImportError: *cannot* import *name > > **NcbipsiblastCommandline > > "* > > * > > * > > *There are not much information available about it. Can somebody figure > out > > why?* > > * > > * > > *Malvika* > > * > > * > > Hi, > > According to the Biopython tutorial > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc93 (2 April > 2011 version): > > "You can run the standalone verion of PSI-BLAST (the legacy NCBI > command line tool blastpgp, or its replacement psiblast) using the > wrappers in Bio.Blast.Applications module. > At the time of writing, the NCBI do not appear to support tools > running a PSI-BLAST search via the internet." > > HTH, > > Senthil > > -/ > "You know, it's at times like this when I'm trapped in a Vogon > airlock with a man from Betelgeuse and about to die of asphyxiation in > deep space that I really wish I'd listened to what my mother told me > when I was young!" > "Why, what did she tell you?" > "I don't know, I didn't listen." > -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy" > From senthil.debian at gmail.com Sun Jul 17 19:20:17 2011 From: senthil.debian at gmail.com (Senthil Kumar M) Date: Sun, 17 Jul 2011 16:20:17 -0700 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan wrote: > Thanks Senthil, but that is what i mentioned that i am trying to use > wrappers in Bio.Blast.Applications module. It just does not seem to work. I > have the whole Biopython installed and i am using Python 2.7. 
>
> the problem occurs while importing itself, i mentioned the error as well.
> here is the whole error description:
> Traceback (most recent call last):
>     File "psiBlast.py", line 6, in <module>
>         from Bio.Blast.Applications import NcbipsiblastCommandline
> ImportError: cannot import name NcbipsiblastCommandline
>
> i just cant figure out why.

Hi,

Could you please provide more details such as your operating system,
python/biopython versions and a minimal example of your script?

On my system for example, 'from Bio.Blast.Applications import
NcbipsiblastCommandline' does not raise an error.

senthil at deepthought ~ $ python
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Bio
>>> print Bio.__version__
1.57
>>> from Bio.Blast.Applications import NcbipsiblastCommandline
>>>

senthil at deepthought ~ $ uname -srm
Linux 2.6.35-22-generic x86_64

HTH

Senthil

-/
Time is an illusion, lunchtime doubly so.
-- The Hitchhiker's Guide to the Galaxy

From malvikasharan at gmail.com Sun Jul 17 19:27:33 2011
From: malvikasharan at gmail.com (malvika sharan)
Date: Mon, 18 Jul 2011 01:27:33 +0200
Subject: [Biopython] PSI-BLAST
In-Reply-To:
References:
Message-ID:

My OS is mac, Python is 2.7 as mentioned, Biopython 1.50.

And well as i said that the error shows at the import.

import os, sys
from Bio import SeqIO
from Bio import Entrez
from Bio.Blast import NCBIWWW*
*from Bio.Blast.NCBIStandalone import PSIBlastParser
*from Bio.Blast.Applications import NcbipsiblastCommandline*
from Bio.Blast import NCBIXML

its irrelevant with my programming cos the program dies at line 6.

On Mon, Jul 18, 2011 at 1:20 AM, Senthil Kumar M wrote:

> On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan
> wrote:
> > Thanks Senthil, but that is what i mentioned that i am trying to use
> > wrappers in Bio.Blast.Applications module. It just does not seem to work.
> I > > have the whole Biopython installed and i am using Python 2.7. > > > > the problem occurs while importing itself, i mentioned the error as well. > > here is the whole error description: > > Traceback (most recent call last): > > File "psiBlast.py", line 6, in > > from Bio.Blast.Applications import NcbipsiblastCommandline > > ImportError: cannot importname NcbipsiblastCommandline > > > > i just cant figure out why. > > Hi, > > Could you please provide more details such as your operating system, > python/biopython versions and a minimal example of your script? > > On my system for example, 'from Bio.Blast.Applications import > NcbipsiblastCommandline' does not raise an error. > > senthil at deepthought ~ $ python > Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) > [GCC 4.4.5] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import Bio > >>> print Bio.__version__ > 1.57 > >>> from Bio.Blast.Applications import NcbipsiblastCommandline > >>> > > senthil at deepthought ~ $ uname -srm > Linux 2.6.35-22-generic x86_64 > > HTH > > Senthil > > -/ > Time is an illusion, lunchtime doubly so. > -- The Hitchhiker's Guide to the Galaxy > From malvikasharan at gmail.com Sun Jul 17 19:48:20 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Mon, 18 Jul 2011 01:48:20 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: please ignore the asterisk * , it appeared due to ctrl b. On Mon, Jul 18, 2011 at 1:27 AM, malvika sharan wrote: > My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. > > And well as i said that the the error shows at the import. > import os, sys > from Bio import SeqIO > from Bio import Entrez > from Bio.Blast import NCBIWWW* > *from Bio.Blast.NCBIStandalone import PSIBlastParser > > *from Bio.Blast.Applications import NcbipsiblastCommandline* > from Bio.Blast import NCBIXML > > its irrelevant with my programming cos the program dies at line 6. 
> > > On Mon, Jul 18, 2011 at 1:20 AM, Senthil Kumar M > wrote: > >> On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan >> wrote: >> > Thanks Senthil, but that is what i mentioned that i am trying to use >> > wrappers in Bio.Blast.Applications module. It just does not seem to >> work. I >> > have the whole Biopython installed and i am using Python 2.7. >> > >> > the problem occurs while importing itself, i mentioned the error as >> well. >> > here is the whole error description: >> > Traceback (most recent call last): >> > File "psiBlast.py", line 6, in >> > from Bio.Blast.Applications import NcbipsiblastCommandline >> > ImportError: cannot importname NcbipsiblastCommandline >> > >> > i just cant figure out why. >> >> Hi, >> >> Could you please provide more details such as your operating system, >> python/biopython versions and a minimal example of your script? >> >> On my system for example, 'from Bio.Blast.Applications import >> NcbipsiblastCommandline' does not raise an error. >> >> senthil at deepthought ~ $ python >> Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) >> [GCC 4.4.5] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import Bio >> >>> print Bio.__version__ >> 1.57 >> >>> from Bio.Blast.Applications import NcbipsiblastCommandline >> >>> >> >> senthil at deepthought ~ $ uname -srm >> Linux 2.6.35-22-generic x86_64 >> >> HTH >> >> Senthil >> >> -/ >> Time is an illusion, lunchtime doubly so. >> -- The Hitchhiker's Guide to the Galaxy >> > > From alrakib at hotmail.com Mon Jul 18 03:54:23 2011 From: alrakib at hotmail.com (L. Zarel) Date: Mon, 18 Jul 2011 09:54:23 +0200 Subject: [Biopython] translating a FASTA file of CDS entries Message-ID: Good morning, I'm Luis Zarel, from Spain. I'm a new user. My question is: How I can make a translation into FASTA from CCDS entries? Any examples? Thank you very much. 
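
Assuming the question is about turning a FASTA file of nucleotide coding sequences into a FASTA file of proteins, a minimal sketch with Bio.SeqIO could look like the following. The function name translate_cds_fasta, the "translated CDS" description, and the choice of to_stop=True (stop at the first in-frame stop codon, standard genetic code by default) are assumptions for illustration and are not taken from the thread.

```python
def translate_cds_fasta(in_handle, out_handle, table=1):
    """Translate each nucleotide record read from in_handle (FASTA) and
    write the proteins to out_handle (FASTA). Returns the record count."""
    # Imported inside the function so that defining the helper does not
    # require Biopython until it is actually called.
    from Bio import SeqIO
    from Bio.SeqRecord import SeqRecord

    def proteins():
        for nuc_record in SeqIO.parse(in_handle, "fasta"):
            # to_stop=True ends the protein at the first in-frame stop codon.
            protein_seq = nuc_record.seq.translate(table=table, to_stop=True)
            yield SeqRecord(protein_seq, id=nuc_record.id,
                            description="translated CDS")

    # SeqIO.write() accepts any iterable of records and returns how many
    # records it wrote.
    return SeqIO.write(proteins(), out_handle, "fasta")
```

A call might look like translate_cds_fasta(open("cds.fasta"), open("proteins.fasta", "w")), where both file names are placeholders. For strict checking of complete coding sequences, translate() also has a cds=True option that raises an error on sequences without a proper start codon, a single terminal stop codon, or a length divisible by three.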
From p.j.a.cock at googlemail.com Mon Jul 18 05:14:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 18 Jul 2011 10:14:30 +0100 Subject: [Biopython] translating a FASTA file of CDS entries In-Reply-To: References: Message-ID: On Monday, July 18, 2011, L. Zarel wrote: > > Good morning, > > I'm Luis Zarel, from Spain. I'm a new user. > > My question is: How I can make a translation into FASTA from CCDS entries? Any examples? > > Thank you very much. > Hi Luis, Could you clarify your question? Did you mean CDS (coding sequences), for example from GenBank files? Peter From alrakib at hotmail.com Mon Jul 18 06:54:41 2011 From: alrakib at hotmail.com (L. Zarel) Date: Mon, 18 Jul 2011 12:54:41 +0200 Subject: [Biopython] translating a FASTA file of CDS entries In-Reply-To: References: , Message-ID: Yes, I mean CDS > Hi Luis, > > Could you clarify your question? Did you mean CDS (coding sequences), > for example from GenBank files? > > Peter From p.j.a.cock at googlemail.com Mon Jul 18 08:07:37 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 18 Jul 2011 13:07:37 +0100 Subject: [Biopython] translating a FASTA file of CDS entries In-Reply-To: References: Message-ID: On Monday, July 18, 2011, L. Zarel wrote: > > Yes, I mean CDS > Hi Luis, So you have a FASTA file containing CDS nucleotide sequences, and you want to turn this into a FASTA file containing translated protein sequences? Have you looked at the documentation? http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf In particular, in the Cookbook chapter there is a section "Translating a FASTA file of CDS entries". If not, what form do you have your CDS data in? For example, do you have an annotated GenBank file, or a tabular file of co-ordinates and strands, or a GFF3 file, or something else? 
Peter

From srivastavaisha.06 at gmail.com Mon Jul 18 08:48:24 2011
From: srivastavaisha.06 at gmail.com (isha srivastava)
Date: Mon, 18 Jul 2011 18:18:24 +0530
Subject: [Biopython] Biopyhton 1.57 version problem
Message-ID:

Hello Sir / Mam

Sir, a few days ago I asked about a problem with Biopython 1.57: I have
downloaded the latest version of Biopython, that is 1.57.
When I give the print Bio.__version__ command within the BioPython
directory, it shows the correct version, 1.57.
But when I give this command outside the Biopython directory, it shows
Biopython version 1.53.

I searched about the problem and found that my python 2.6.5 already has
some Biopython packages, such as ---

1) python-biopython 1.53-1 Python library for bioinformatics
2) python-biopython-doc 1.53-1 Documentation for the Biopython library
3) python-biopython-sql 1.53-1 Biopython support for the BioSQL database sc

I think the problem is caused by these already existing Biopython 1.53
files. Am I right?
If yes, then how do I get rid of this problem?
If no, then can anyone suggest what the problem and the solution are?

Kindly reply to me soon.

Thank you very much.

regards
isha

From anaryin at gmail.com Mon Jul 18 09:00:53 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Mon, 18 Jul 2011 15:00:53 +0200
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To:
References:
Message-ID:

Dear Isha,

The problem most likely is that you didn't completely remove 1.53, and your
PYTHONPATH variable is likely still pointing to it. When you start python
from the Biopython directory and import Bio, it will not load system-wide
packages but instead the one it has at hand.

I'd suggest you:

1) Do sudo aptitude remove/purge python-biopython

At this point, if you do:

cd $HOME
python
>>> import Bio

It should give an ImportError. If not, there is still some version somewhere
that you need to remove.
2) Install Biopython 1.57 from the directory you downloaded, using sudo
python setup.py install and not giving any --home option during setup. This
will ensure it is installed system-wide and accessible from any python call.

Best,

João

From eric.talevich at gmail.com Mon Jul 18 11:13:54 2011
From: eric.talevich at gmail.com (Eric Talevich)
Date: Mon, 18 Jul 2011 11:13:54 -0400
Subject: [Biopython] PSI-BLAST
In-Reply-To:
References:
Message-ID:

On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan wrote:

> My OS is mac, Python is 2.7 as mentioned, Biopython 1.50.
>

That's a very old version of Biopython. Are you able to install a more
recent version?

> And well as i said that the error shows at the import.
> import os, sys
> from Bio import SeqIO
> from Bio import Entrez
> from Bio.Blast import NCBIWWW*
> *from Bio.Blast.NCBIStandalone import PSIBlastParser
> *from Bio.Blast.Applications import NcbipsiblastCommandline*
> from Bio.Blast import NCBIXML
>

If the earlier imports of SeqIO and Entrez work, then
NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I would
guess.

> On Mon, Jul 18, 2011 at 1:20 AM, Senthil Kumar M wrote:
>
> > On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan wrote:
> > > Thanks Senthil, but that is what i mentioned that i am trying to use
> > > wrappers in Bio.Blast.Applications module. It just does not seem to
> > > work. I have the whole Biopython installed and i am using Python 2.7.
> > >
> > > the problem occurs while importing itself, i mentioned the error as
> > > well.
> > > here is the whole error description:
> > > Traceback (most recent call last):
> > >     File "psiBlast.py", line 6, in <module>
> > >         from Bio.Blast.Applications import NcbipsiblastCommandline
> > > ImportError: cannot import name NcbipsiblastCommandline
> > >
> > > i just cant figure out why.
> >
> > Hi,
> >
> > Could you please provide more details such as your operating system,
> > python/biopython versions and a minimal example of your script?
> >
> > On my system for example, 'from Bio.Blast.Applications import
> > NcbipsiblastCommandline' does not raise an error.
> >
> > senthil at deepthought ~ $ python
> > Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
> > [GCC 4.4.5] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import Bio
> > >>> print Bio.__version__
> > 1.57
> > >>> from Bio.Blast.Applications import NcbipsiblastCommandline
> > >>>
> >
> > senthil at deepthought ~ $ uname -srm
> > Linux 2.6.35-22-generic x86_64
> >
> > HTH
> >
> > Senthil
> >
> > -/
> > Time is an illusion, lunchtime doubly so.
> > -- The Hitchhiker's Guide to the Galaxy
> >
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>

From akooser at unm.edu Mon Jul 18 11:15:39 2011
From: akooser at unm.edu (Ara Kooser)
Date: Mon, 18 Jul 2011 09:15:39 -0600
Subject: [Biopython] NCBIWWW genbank files
In-Reply-To:
References:
Message-ID:

Good morning all,

I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual:

from Bio.Blast import NCBIWWW

def query():
    file_query = raw_input("Please enter the name of your sequence file: ")
    fasta_seq = open(file_query).read()
    result_handle = NCBIWWW.qblast("blastn", "nr", fasta_seq, expect=1e-30, hitlist_size=20000)
    save_file = open("blast_results.xml", "w")
    save_file.write(result_handle.read())
    save_file.close()
    result_handle.close()

query()

Everything works fine. But I was wondering, is there a way to pull down the Genbank files using this method? I used the help(NCBIWWW.qblast) to look at all the options but didn't see the Genbank file format. Downstream in the program I use information extracted from both the .xml and genbank files since they contain different information that we need. I was hoping to combine everything into one program.
Currently we use the web interface to pull down the xml and genbank files.

Thanks!
Ara

From srivastavaisha.06 at gmail.com Tue Jul 19 01:34:43 2011
From: srivastavaisha.06 at gmail.com (isha srivastava)
Date: Tue, 19 Jul 2011 11:04:43 +0530
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To:
References:
Message-ID:

Hi,

Thanks for your answer, sir. Yes, I have already tried sudo apt-get remove python-biopython. It worked properly. I even removed python-biopython-doc and python-biopython-sql too. Now when I give the commands

>>> import Bio
>>> print Bio.__version__

this shows an error, which is good. But import Bio is still working, which is not good, and it means Biopython is not completely removed. That is why I gave the following command to find all Biopython packages on my PC:

$ dpkg --list | grep 'biopython'

This command shows no results, which means there is no Biopython package. If there is no Biopython directory, then why is import Bio still working?

Hoping for your reply soon.

Thanks,
isha

From p.j.a.cock at googlemail.com Tue Jul 19 01:54:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Jul 2011 06:54:12 +0100
Subject: [Biopython] NCBIWWW genbank files
In-Reply-To:
References:
Message-ID:

On Monday, July 18, 2011, Ara Kooser wrote:
> Good morning all,
>
> I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual:
>
> from Bio.Blast import NCBIWWW
>
> def query():
>     file_query = raw_input("Please enter the name of your sequence file: ")
>     fasta_seq = open(file_query).read()
>     result_handle = NCBIWWW.qblast("blastn","nr", fasta_seq, expect=1e-30, hitlist_size=20000)
>     save_file = open("blast_results.xml","w")
>     save_file.write(result_handle.read())
>     save_file.close()
>     result_handle.close()
>
> query()
>
> Everything works fine. But I was wondering is there a way to pull down the Genbank files using this method.
I used the help(NCBIWWW.qblast) to look at all the options but didn't see the Genbank file format. Downstream in the program I use information extracted from both the .xml and genbank files since they contain different information that we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the xml and genbank files.
>
> Thanks!
> Ara
>

Hi Ara,

BLAST does not offer GenBank as an output format.

Assuming I have understood your aim, this can be done as a multi-step process: run BLAST, extract a list of matching record accessions, then download these records in GenBank format from the NCBI.

You may find it useful to request tabular output from BLAST and extract the match names (column two). This should be faster as the XML version of the data is much larger.

Also, to avoid trying to download the same GenBank record more than once, I would use a Python set rather than a Python list object when recording this information from the BLAST file.

You can use the NCBI Entrez utilities API to download GenBank files, see Bio.Entrez in the Biopython tutorial, function efetch.

Peter

From p.j.a.cock at googlemail.com Tue Jul 19 02:03:44 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Jul 2011 07:03:44 +0100
Subject: [Biopython] PSI-BLAST
In-Reply-To:
References:
Message-ID:

Senthil wrote:
>
> Hi,
>
> According to the Biopython tutorial
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc93 (2 April
> 2011 version):
>
> "You can run the standalone version of PSI-BLAST (the legacy NCBI
> command line tool blastpgp, or its replacement psiblast) using the
> wrappers in Bio.Blast.Applications module.
> At the time of writing, the NCBI do not appear to support tools
> running a PSI-BLAST search via the internet."
>
> HTH,
>
> Senthil
>

That might be possible after all, see this information reported recently by Fábio Madeira:

https://github.com/biopython/biopython/pull/12

It should be possible with any recent Biopython; Fábio's change was to the documentation for the blast function only. We'd want to update the tutorial too, ideally.

Peter

From p.j.a.cock at googlemail.com Tue Jul 19 02:08:59 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Jul 2011 07:08:59 +0100
Subject: [Biopython] PSI-BLAST
In-Reply-To:
References:
Message-ID:

On Monday, July 18, 2011, Eric Talevich wrote:
> On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan wrote:
>
>> My OS is mac, Python is 2.7 as mentioned, Biopython 1.50.
>>
>
> That's a very old version of Biopython. Are you able to install a more
> recent version?
>
>
> And well as i said that the error shows at the import.
>> import os, sys
>> from Bio import SeqIO
>> from Bio import Entrez
>> from Bio.Blast import NCBIWWW
>> from Bio.Blast.NCBIStandalone import PSIBlastParser
>> from Bio.Blast.Applications import NcbipsiblastCommandline
>> from Bio.Blast import NCBIXML
>>
>
> If the earlier imports of SeqIO and Entrez work, then
> NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I would
> guess.
>

Correct. The BLAST+ wrappers were added in Biopython 1.53. You will need to update it, ideally to the current release, 1.57.

In general, if some imports work and others fail, your library is either too old (and what you want didn't exist in the old version), or too new (the code you want to use was obsolete and removed).

Peter

From anaryin at gmail.com Tue Jul 19 02:48:29 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 19 Jul 2011 08:48:29 +0200
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To:
References:
Message-ID:

What error does print Bio.__version__ give? In which directory are you executing those python commands?
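
Both questions can usually be answered from inside the interpreter by asking Python where it found the package. The following is only a sketch and not part of the original thread: it uses nothing beyond the standard library, and the helper name describe_module is made up for illustration. It reports the file a module was loaded from, plus its __version__ attribute when one exists (None otherwise).

```python
import importlib
import os


def describe_module(name):
    """Return (path, version) for module `name`, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # __file__ tells which copy on sys.path was picked up.
    # __version__ is optional, so fall back to None when it is missing.
    path = getattr(module, "__file__", None)
    version = getattr(module, "__version__", None)
    return path, version


if __name__ == "__main__":
    # When python is started interactively, the current directory is normally
    # searched before system-wide packages, so print it next to the result.
    print("working directory: %s" % os.getcwd())
    print("Bio: %r" % (describe_module("Bio"),))
```

Running it once from inside the unpacked biopython-1.57 directory and once from the home directory should show whether the two imports pick up different copies of Bio.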
From anaryin at gmail.com Tue Jul 19 05:28:25 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 19 Jul 2011 11:28:25 +0200
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To:
References:
Message-ID:

Hello Isha,

Where did you run the Python interpreter (i.e. in which directory were you when you typed "python" and then "import Bio")?

Did you install the new version of Biopython correctly? Go to that ~/Downloads/python/biopython-1.57/ directory and run

sudo python setup.py install

(assuming you have super user privileges).

Regards,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, Jul 19, 2011 at 11:24 AM, isha srivastava < srivastavaisha.06 at gmail.com> wrote:

> error is -->
>
>
> Traceback (most recent call last):
> File "", line 1, in
> AttributeError: 'module' object has no attribute '__version__'
>
> in python directory i made a biopython directory .
>
> isha at desktop:~/Downloads/python$ ls
>
> biopython-1.57        numpy-1.6.1rc3  python.mk      Tutorial_files
> debian_defaults       ori_fasta       pyversions.py  Tutorial.html
> egenix-mx-base-3.2.0  prog            reportlab-2.5
> fetch.py              pyProgram.py    runtime.d
>
> isha at -desktop:~/Downloads/python$ cd biopython-1.57/
> isha at -desktop:~/Downloads/python/biopython-1.57$ ls
>
> Bio     build    DEPRECATED  LICENSE      NEWS      README   setup.py
> BioSQL  CONTRIB  Doc         MANIFEST.in  PKG-INFO  Scripts  Tests
>
>

From malvikasharan at gmail.com Tue Jul 19 05:49:53 2011
From: malvikasharan at gmail.com (malvika sharan)
Date: Tue, 19 Jul 2011 11:49:53 +0200
Subject: [Biopython] PSI-BLAST
In-Reply-To:
References:
Message-ID:

Thank you Peter and Eric.

You are right, and I think I should have known this :(
I updated Biopython and am revising my code. It should work now.

Malvika

On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote:

> On Monday, July 18, 2011, Eric Talevich wrote:
> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan wrote:
> >
> >> My OS is mac, Python is 2.7 as mentioned, Biopython 1.50.
> >>
> >
> > That's a very old version of Biopython. Are you able to install a more
> > recent version?
> >
> >
> > And well as i said that the error shows at the import.
> >> import os, sys
> >> from Bio import SeqIO
> >> from Bio import Entrez
> >> from Bio.Blast import NCBIWWW
> >> from Bio.Blast.NCBIStandalone import PSIBlastParser
> >> from Bio.Blast.Applications import NcbipsiblastCommandline
> >> from Bio.Blast import NCBIXML
> >>
> >
> > If the earlier imports of SeqIO and Entrez work, then
> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I would
> > guess.
> >
>
> Correct. The BLAST+ wrappers were added in Biopython 1.53.
> You will need to update it, ideally to the current release, 1.57
>
> In general if some imports work and others fail, your library
> is either too old (and what you want didn't exist in the old
> version), or too new (the code you want to use was obsolete
> and removed).
>
> Peter
>

From anaryin at gmail.com Tue Jul 19 05:54:29 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Tue, 19 Jul 2011 11:54:29 +0200
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To:
References:
Message-ID:

Dear Isha,

If you run the interpreter in the biopython-1.57 directory, it will load the modules from that folder, even if Biopython is not installed in the system.

You have to do "sudo python setup.py install" in the biopython-1.57 directory, wait for the installation to complete, and then move to another directory, for example your home directory or the Desktop, and there type "python" and try "import Bio".

You don't have to remove anything.

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, Jul 19, 2011 at 11:40 AM, isha srivastava < srivastavaisha.06 at gmail.com> wrote:

> Hello Sir
>
> I run in the terminal.
> I typed that in Python directory.
> Now i m removing the Biopyhton -1.57 and ll again download and install
> that.
> > Regards > isha > From akooser at unm.edu Tue Jul 19 11:07:23 2011 From: akooser at unm.edu (Ara Kooser) Date: Tue, 19 Jul 2011 09:07:23 -0600 Subject: [Biopython] NCBIWWW genbank files In-Reply-To: References: Message-ID: Peter, Thanks for the clarification there. I was a little confused. I'll give this a try. Regards, Ara On Jul 18, 2011, at 11:54 PM, Peter Cock wrote: > On Monday, July 18, 2011, Ara Kooser wrote: >> Good morning all, >> >> >> I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual: >> >> from Bio.Blast import NCBIWWW >> >> def query(): >> file_query = raw_input("Please enter the name of your sequence file: ") >> fasta_seq = open(file_query).read() >> result_handle = NCBIWWW.qblast("blastn","nr", fasta_seq, expect=1e-30, hitlist_size=20000) >> save_file = open("blast_results.xml","w") >> save_file.write(result_handle.read()) >> save_file.close() >> result_handle.close() >> >> >> query() >> >> Everything works fine. But I was wondering is there a way to pull down the Genbank files using this method. I used the help(NCBIWWW.qblast) to look at all the options but didn't see the Genbank file format. Downstream in the program I use information extracted from both the .xml and genbank files since they contain different information to we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the xml and genbank files. >> >> Thanks! >> Ara >> > > Hi Ara, > > BLAST does not offer GenBank as an output format. > > Assuming I have understood your aim, this can be done as a multi step > process: Run BLAST, extract a list of matching record accessions, > download these records in GenBank format from the NCBI. > > You may find it useful to request tabular output from BLAST and > extract the match names (column two). This should be faster as the XML > version of the data is much larger. 
> > Also to avoid trying to download the same GenBank record more than > once, I would use a Python set rather than a Python list object when > recording this information from the BLAST file. > > You can use the NCBI Entrez utilities API to download GenBank files, > see Bio.Entrez in the Biopython tutorial, function efetch. > > Peter From mollymutant at googlemail.com Wed Jul 20 09:26:33 2011 From: mollymutant at googlemail.com (molly mutant) Date: Wed, 20 Jul 2011 15:26:33 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: hello all, I was also trying to run my program using the same wrapper from Bio.Blast.Application for psiblast commandline. i use the following code but this is not generating XML file. psi_cline = NcbipsiblastCommandline('psiblast', db = 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") p = subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) blastParser(p.stdout) i have defined blastParser for parsing XML files which works perfectly with other xml files. i get the following error : Traceback (most recent call last): File "psiBlast.py", line 110, in blastParser(p.stdout) File "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", line 617, in parse raise ValueError("Your XML file was empty") ValueError: Your XML file was empty you can see that i am using python 2.6 and Biopython 1.57. Do you know where am i going incorrect? Molly On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan wrote: > > > ---------- Forwarded message ---------- > > > Thank you Peter and Eric. > > you are right and i think i should have known this :( > I updated Biopython. and revising my codes. it should work now. 
> > Malvika > > > On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: > >> On Monday, July 18, 2011, Eric Talevich wrote: >> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >> malvikasharan at gmail.com>wrote: >> > >> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >> >> >> > >> > That's a very old version of Biopython. Are you able to install a more >> > recent version? >> > >> > >> > And well as i said that the the error shows at the import. >> >> import os, sys >> >> from Bio import SeqIO >> >> from Bio import Entrez >> >> from Bio.Blast import NCBIWWW* >> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser >> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* >> >> from Bio.Blast import NCBIXML >> >> >> > >> > If the earlier imports of SeqIO and Entrez work, then >> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I >> would >> > guess. >> > >> >> Correct. The BLAST+ wrappers were added in Biopython 1.53. >> You will need to update it, ideally to the current release, 1.57 >> >> In general if some imports work and others fail, your library >> Is either too old (and what you want didn't exist in the old >> version), or too new (the code you want to use was obsolete >> and removed). >> >> Peter >> > > > From mollymutant at googlemail.com Wed Jul 20 09:33:04 2011 From: mollymutant at googlemail.com (molly mutant) Date: Wed, 20 Jul 2011 15:33:04 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Oh, i forgot to mention : queryID here is a protein ID for example NP_010247.1 ' query = queryID+".fasta" ' is a fasta file for this protein. i want to get the XML output from the psi blast. Regards Molly On Wed, Jul 20, 2011 at 3:26 PM, molly mutant wrote: > hello all, > > I was also trying to run my program using the same wrapper from > Bio.Blast.Application for psiblast commandline. > > i use the following code but this is not generating XML file. 
> psi_cline = NcbipsiblastCommandline('psiblast', db = > 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = > queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") > p = > subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) > blastParser(p.stdout) > > i have defined blastParser for parsing XML files which works perfectly with > other xml files. > > i get the following error : > > Traceback (most recent call last): > File "psiBlast.py", line 110, in > blastParser(p.stdout) > > File > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", > line 617, in parse > raise ValueError("Your XML file was empty") > ValueError: Your XML file was empty > > > you can see that i am using python 2.6 and Biopython 1.57. Do you know > where am i going incorrect? > > Molly > > On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan wrote: > > >> >> ---------- Forwarded message ---------- >> >> >> Thank you Peter and Eric. >> >> you are right and i think i should have known this :( >> I updated Biopython. and revising my codes. it should work now. >> >> Malvika >> >> >> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: >> >>> On Monday, July 18, 2011, Eric Talevich wrote: >>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >>> malvikasharan at gmail.com>wrote: >>> > >>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >>> >> >>> > >>> > That's a very old version of Biopython. Are you able to install a more >>> > recent version? >>> > >>> > >>> > And well as i said that the the error shows at the import. 
>>> >> import os, sys >>> >> from Bio import SeqIO >>> >> from Bio import Entrez >>> >> from Bio.Blast import NCBIWWW* >>> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser >>> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* >>> >> from Bio.Blast import NCBIXML >>> >> >>> > >>> > If the earlier imports of SeqIO and Entrez work, then >>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I >>> would >>> > guess. >>> > >>> >>> Correct. The BLAST+ wrappers were added in Biopython 1.53. >>> You will need to update it, ideally to the current release, 1.57 >>> >>> In general if some imports work and others fail, your library >>> Is either too old (and what you want didn't exist in the old >>> version), or too new (the code you want to use was obsolete >>> and removed). >>> >>> Peter >>> >> >> >> > From mollymutant at googlemail.com Wed Jul 20 11:30:13 2011 From: mollymutant at googlemail.com (molly mutant) Date: Wed, 20 Jul 2011 17:30:13 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Can anyone please share the functioning codes for PSI-BLAST using NCBI commandline or suggest me the source from where i can get it?? i am in an urgent need of it :( and i can not find the problem with my command/code. 
Regards, Molly On Wed, Jul 20, 2011 at 4:19 PM, molly mutant wrote: > if it use cline() command: > > psi_cline = NcbipsiblastCommandline('psiblast', db = > 'refseq_protein',\ > query = queryID+".fasta", > evalue = 10 , \ > out = queryID+"_psi.xml", > outfmt = 7, \ > out_pssm = queryID+"_pssm") > str(psi_cline) > psi_cline() > > the following error occurs : > Traceback (most recent call last): > File "psiBlast.py", line 113, in > psi_cline() > File > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/__init__.py", > line 432, in __call__ > stdout_str, stderr_str) > Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml > -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm > NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not > found' > > I think this error stands for that the command is not found, which means > that my command is incorrect, am i right?? > > > > On Wed, Jul 20, 2011 at 3:33 PM, molly mutant wrote: > >> Oh, i forgot to mention : >> >> queryID here is a protein ID for example NP_010247.1 >> ' query = queryID+".fasta" ' is a fasta file for this protein. >> i want to get the XML output from the psi blast. >> >> Regards >> Molly >> >> >> On Wed, Jul 20, 2011 at 3:26 PM, molly mutant > > wrote: >> >>> hello all, >>> >>> I was also trying to run my program using the same wrapper from >>> Bio.Blast.Application for psiblast commandline. >>> >>> i use the following code but this is not generating XML file. >>> psi_cline = NcbipsiblastCommandline('psiblast', db = >>> 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = >>> queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") >>> p = >>> subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) >>> blastParser(p.stdout) >>> >>> i have defined blastParser for parsing XML files which works perfectly >>> with other xml files. 
>>> >>> i get the following error : >>> >>> Traceback (most recent call last): >>> File "psiBlast.py", line 110, in >>> blastParser(p.stdout) >>> >>> File >>> "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", >>> line 617, in parse >>> raise ValueError("Your XML file was empty") >>> ValueError: Your XML file was empty >>> >>> >>> you can see that i am using python 2.6 and Biopython 1.57. Do you know >>> where am i going incorrect? >>> >>> Molly >>> >>> On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan >> > wrote: >>> >>> >>>> >>>> ---------- Forwarded message ---------- >>>> >>>> >>>> Thank you Peter and Eric. >>>> >>>> you are right and i think i should have known this :( >>>> I updated Biopython. and revising my codes. it should work now. >>>> >>>> Malvika >>>> >>>> >>>> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: >>>> >>>>> On Monday, July 18, 2011, Eric Talevich >>>>> wrote: >>>>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >>>>> malvikasharan at gmail.com>wrote: >>>>> > >>>>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >>>>> >> >>>>> > >>>>> > That's a very old version of Biopython. Are you able to install a >>>>> more >>>>> > recent version? >>>>> > >>>>> > >>>>> > And well as i said that the the error shows at the import. >>>>> >> import os, sys >>>>> >> from Bio import SeqIO >>>>> >> from Bio import Entrez >>>>> >> from Bio.Blast import NCBIWWW* >>>>> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser >>>>> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* >>>>> >> from Bio.Blast import NCBIXML >>>>> >> >>>>> > >>>>> > If the earlier imports of SeqIO and Entrez work, then >>>>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I >>>>> would >>>>> > guess. >>>>> > >>>>> >>>>> Correct. The BLAST+ wrappers were added in Biopython 1.53. 
>>>>> You will need to update it, ideally to the current release, 1.57
>>>>>
>>>>> In general if some imports work and others fail, your library
>>>>> is either too old (and what you want didn't exist in the old
>>>>> version), or too new (the code you want to use was obsolete
>>>>> and removed).
>>>>>
>>>>> Peter
>>>>>
>>>>

From from.d.putto at gmail.com Wed Jul 20 12:22:37 2011
From: from.d.putto at gmail.com (Sheila the angel)
Date: Wed, 20 Jul 2011 18:22:37 +0200
Subject: [Biopython] Passing sequence to local BLAST
In-Reply-To:
References:
Message-ID:

This method does not work with Bio.Emboss.Applications!!!
I am trying to do the same with 'Bio.Emboss.Applications' as

seq_record = SeqIO.read(open("alpha.fasta"), "fasta")    #or a sequence object
water_cline = WaterCommandline(asequence="-", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt")
stdout, stderr = water_cline(stdin=seq_record.format("fasta"))

but it is displaying an error. How can I specify a file handle or sequence only in Emboss Applications???

On Thu, Jul 7, 2011 at 12:43 PM, Peter Cock wrote:

> On Thu, Jul 7, 2011 at 11:26 AM, Sheila the angel wrote:
> > Hi All,
> >
> > I want to download genbank file from NCBI and pass the protein sequence
> > directly to the local BLAST.
But I am getting error in BLAST step
> >
> > #-------------------------------------------------------------------------------------------
> > from Bio import SeqIO
> > from Bio import Entrez
> > from Bio.Blast.Applications import NcbiblastpCommandline
> > id='200203'
> > handle = Entrez.efetch(db="protein", id=id, rettype="gp")
> > seq_record = SeqIO.read(handle, "gb")
> > x=seq_record.seq    #getting the sequence in a variable x
> > blastp_cline = NcbiblastpCommandline(query=x, db="protein_database", evalue=0.001)    # My BLAST command
> > result_handle, stderr = blastp_cline()    #Running BLAST and getting error :(
> >
> > #-------------------------------------------------------------------------------------------
> >
> > At this last step I am getting error.....
> > I sort-of understand the problem.....it is taking value of x as a file name
> > while its a variable which contains the sequence.
> > Is there any way out to this problem without making temporary file.
>
> With the standalone blast tools you generally need to prepare an input
> FASTA file with your query sequence(s).
>
> However, in principle you can give the input filename as - (default),
> and instead pipe the query FASTA record in as stdin (standard input).
> Try something like this (untested):
>
> ...
> blastp_cline = NcbiblastpCommandline(query="-", db="protein_database", evalue=0.001)
> stdout, stderr = blastp_cline(stdin=seq_record.format("fasta"))
>
> Peter
>

From mollymutant at googlemail.com Wed Jul 20 10:19:35 2011
From: mollymutant at googlemail.com (molly mutant)
Date: Wed, 20 Jul 2011 16:19:35 +0200
Subject: [Biopython] PSI-BLAST
In-Reply-To:
References:
Message-ID:

If I use the cline() command:

psi_cline = NcbipsiblastCommandline('psiblast', db = 'refseq_protein',\
                                    query = queryID+".fasta", evalue = 10 , \
                                    out = queryID+"_psi.xml", outfmt = 7, \
                                    out_pssm = queryID+"_pssm")
str(psi_cline)
psi_cline()

the following error occurs:

Traceback (most recent call last):
  File "psiBlast.py", line 113, in
    psi_cline()
  File "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/__init__.py", line 432, in __call__
    stdout_str, stderr_str)
Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not found'

I think this error means that the command is not found, which means that my command is incorrect, am I right??

On Wed, Jul 20, 2011 at 3:33 PM, molly mutant wrote:

> Oh, i forgot to mention :
>
> queryID here is a protein ID for example NP_010247.1
> ' query = queryID+".fasta" ' is a fasta file for this protein.
> i want to get the XML output from the psi blast.
>
> Regards
> Molly
>
>
> On Wed, Jul 20, 2011 at 3:26 PM, molly mutant wrote:
>
>> hello all,
>>
>> I was also trying to run my program using the same wrapper from
>> Bio.Blast.Application for psiblast commandline.
>>
>> i use the following code but this is not generating XML file.
>> psi_cline = NcbipsiblastCommandline('psiblast', db = >> 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = >> queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") >> p = >> subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) >> blastParser(p.stdout) >> >> i have defined blastParser for parsing XML files which works perfectly >> with other xml files. >> >> i get the following error : >> >> Traceback (most recent call last): >> File "psiBlast.py", line 110, in >> blastParser(p.stdout) >> >> File >> "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", >> line 617, in parse >> raise ValueError("Your XML file was empty") >> ValueError: Your XML file was empty >> >> >> you can see that i am using python 2.6 and Biopython 1.57. Do you know >> where am i going incorrect? >> >> Molly >> >> On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan wrote: >> >> >>> >>> ---------- Forwarded message ---------- >>> >>> >>> Thank you Peter and Eric. >>> >>> you are right and i think i should have known this :( >>> I updated Biopython. and revising my codes. it should work now. >>> >>> Malvika >>> >>> >>> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: >>> >>>> On Monday, July 18, 2011, Eric Talevich >>>> wrote: >>>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >>>> malvikasharan at gmail.com>wrote: >>>> > >>>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >>>> >> >>>> > >>>> > That's a very old version of Biopython. Are you able to install a more >>>> > recent version? >>>> > >>>> > >>>> > And well as i said that the the error shows at the import. 
>>>> >> import os, sys
>>>> >> from Bio import SeqIO
>>>> >> from Bio import Entrez
>>>> >> from Bio.Blast import NCBIWWW
>>>> >> from Bio.Blast.NCBIStandalone import PSIBlastParser
>>>> >> from Bio.Blast.Applications import NcbipsiblastCommandline
>>>> >> from Bio.Blast import NCBIXML
>>>> >>
>>>> >
>>>> > If the earlier imports of SeqIO and Entrez work, then
>>>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I would
>>>> > guess.
>>>> >
>>>>
>>>> Correct. The BLAST+ wrappers were added in Biopython 1.53.
>>>> You will need to update it, ideally to the current release, 1.57
>>>>
>>>> In general if some imports work and others fail, your library
>>>> is either too old (and what you want didn't exist in the old
>>>> version), or too new (the code you want to use was obsolete
>>>> and removed).
>>>>
>>>> Peter
>>>>
>>>
>>>
>>>
>>
>
>
>

--
Regards,
Molly

From anaryin at gmail.com Wed Jul 20 12:48:14 2011
From: anaryin at gmail.com (João Rodrigues)
Date: Wed, 20 Jul 2011 18:48:14 +0200
Subject: [Biopython] PSI-BLAST
In-Reply-To:
References:
Message-ID:

Dear Molly,

I never worked with that module so this might be wrong. But from my experience, that error is common when you have aliased the executable, which doesn't work. Try adding the directory where the executable 'psiblast' is (usually something /bin) to your PATH variable:

export PATH="${PATH}:/my/blast/directory/bin/"

Troubleshooting and debugging are part of coding, so I'd recommend you spend half an hour on this and I'm sure you'll get through it.

Best,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Wed, Jul 20, 2011 at 5:30 PM, molly mutant wrote:

> Can anyone please share the functioning codes for PSI-BLAST using NCBI
> commandline or suggest me the source from where i can get it?? i am in an
> urgent need of it :( and i can not find the problem with my command/code.
> > Regards, > Molly > > On Wed, Jul 20, 2011 at 4:19 PM, molly mutant >wrote: > > > if it use cline() command: > > > > psi_cline = NcbipsiblastCommandline('psiblast', db = > > 'refseq_protein',\ > > query = queryID+".fasta", > > evalue = 10 , \ > > out = queryID+"_psi.xml", > > outfmt = 7, \ > > out_pssm = queryID+"_pssm") > > str(psi_cline) > > psi_cline() > > > > the following error occurs : > > Traceback (most recent call last): > > File "psiBlast.py", line 113, in > > psi_cline() > > File > > > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/__init__.py", > > line 432, in __call__ > > stdout_str, stderr_str) > > Bio.Application.ApplicationError: Command 'psiblast -out > NP_012649_psi.xml > > -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm > > NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: > not > > found' > > > > I think this error stands for that the command is not found, which means > > that my command is incorrect, am i right?? > > > > > > > > On Wed, Jul 20, 2011 at 3:33 PM, molly mutant < > mollymutant at googlemail.com>wrote: > > > >> Oh, i forgot to mention : > >> > >> queryID here is a protein ID for example NP_010247.1 > >> ' query = queryID+".fasta" ' is a fasta file for this protein. > >> i want to get the XML output from the psi blast. > >> > >> Regards > >> Molly > >> > >> > >> On Wed, Jul 20, 2011 at 3:26 PM, molly mutant < > mollymutant at googlemail.com > >> > wrote: > >> > >>> hello all, > >>> > >>> I was also trying to run my program using the same wrapper from > >>> Bio.Blast.Application for psiblast commandline. > >>> > >>> i use the following code but this is not generating XML file. 
> >>> psi_cline = NcbipsiblastCommandline('psiblast', db = > >>> 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = > >>> queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") > >>> p = > >>> > subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) > >>> blastParser(p.stdout) > >>> > >>> i have defined blastParser for parsing XML files which works perfectly > >>> with other xml files. > >>> > >>> i get the following error : > >>> > >>> Traceback (most recent call last): > >>> File "psiBlast.py", line 110, in > >>> blastParser(p.stdout) > >>> > >>> File > >>> > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", > >>> line 617, in parse > >>> raise ValueError("Your XML file was empty") > >>> ValueError: Your XML file was empty > >>> > >>> > >>> you can see that i am using python 2.6 and Biopython 1.57. Do you know > >>> where am i going incorrect? > >>> > >>> Molly > >>> > >>> On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan < > malvikasharan at gmail.com > >>> > wrote: > >>> > >>> > >>>> > >>>> ---------- Forwarded message ---------- > >>>> > >>>> > >>>> Thank you Peter and Eric. > >>>> > >>>> you are right and i think i should have known this :( > >>>> I updated Biopython. and revising my codes. it should work now. > >>>> > >>>> Malvika > >>>> > >>>> > >>>> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock < > p.j.a.cock at googlemail.com>wrote: > >>>> > >>>>> On Monday, July 18, 2011, Eric Talevich > >>>>> wrote: > >>>>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < > >>>>> malvikasharan at gmail.com>wrote: > >>>>> > > >>>>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. > >>>>> >> > >>>>> > > >>>>> > That's a very old version of Biopython. Are you able to install a > >>>>> more > >>>>> > recent version? > >>>>> > > >>>>> > > >>>>> > And well as i said that the the error shows at the import. 
> >>>>> >> import os, sys > >>>>> >> from Bio import SeqIO > >>>>> >> from Bio import Entrez > >>>>> >> from Bio.Blast import NCBIWWW* > >>>>> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser > >>>>> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* > >>>>> >> from Bio.Blast import NCBIXML > >>>>> >> > >>>>> > > >>>>> > If the earlier imports of SeqIO and Entrez work, then > >>>>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, > I > >>>>> would > >>>>> > guess. > >>>>> > > >>>>> > >>>>> Correct. The BLAST+ wrappers were added in Biopython 1.53. > >>>>> You will need to update it, ideally to the current release, 1.57 > >>>>> > >>>>> In general if some imports work and others fail, your library > >>>>> Is either too old (and what you want didn't exist in the old > >>>>> version), or too new (the code you want to use was obsolete > >>>>> and removed). > >>>>> > >>>>> Peter > >>>>> > >>>> > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From eric.talevich at gmail.com Wed Jul 20 12:54:18 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 20 Jul 2011 12:54:18 -0400 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: On Wed, Jul 20, 2011 at 11:30 AM, molly mutant wrote: > Can anyone please share the functioning codes for PSI-BLAST using NCBI > commandline or suggest me the source from where i can get it?? i am in an > urgent need of it :( and i can not find the problem with my command/code. 
>
> Regards,
> Molly
>
> On Wed, Jul 20, 2011 at 4:19 PM, molly mutant wrote:
>
> > the following error occurs :
> > Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml
> > -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm
> > NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not
> > found'
> >
> > I think this error stands for that the command is not found, which means
> > that my command is incorrect, am i right??
>

To follow up on what João wrote, the important error is:

    '/bin/sh: psiblast: not found'

This means that psiblast is not installed correctly on your system, or at least the shell cannot find it. If you try running just that command on the command line:

    psiblast -out NP_012649_psi.xml -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm NP_012649_pssm

it will also report an error, even without using Python.

So, try making sure psiblast is available on your system path ($PATH). The command "which psiblast" should print where the executable is, if it can be found.

Or, if you don't want to mess with $PATH, you can include the complete path to the psiblast executable:

    psi_cline = NcbipsiblastCommandline('/usr/local/bin/psiblast', db = ...
Cheers,
Eric

From p.j.a.cock at googlemail.com Wed Jul 20 13:27:00 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 20 Jul 2011 18:27:00 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID:

On Wednesday, July 20, 2011, Sheila the angel wrote:
> This method do not work with Bio.Emboss.Applications!!! I am trying to do the same with 'Bio.Emboss.Applications' as
> seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object
> water_cline = WaterCommandline(asequence="-", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt")
> stdout, stderr =water_cline(stdin=seq_record.format("fasta"))
> but it is displaying error. How can I specify file handle or sequence only in Emboss Applications???
>

What is the error message?

I think you need asequence="stdin" not "-" (although the latter is a widely used convention in command line tools). See:
http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#stdin

Also try including auto=True in the command line. This tells EMBOSS not to ask for user input (by default it will prompt the user for any missing arguments; with the auto setting it uses its defaults).

Peter

From anaryin at gmail.com Wed Jul 20 12:58:37 2011 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 20 Jul 2011 18:58:37 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID:

Adding to what Eric said, sometimes (depending on your config) *which* won't tell you where the command is aliased to. Instead, try *locate*.
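
The "psiblast: not found" check discussed in this thread can also be done from Python before building the command line. The following is only a sketch under assumptions: it uses shutil.which, which needs Python 3.3 or later (the thread itself uses Python 2.6), and the helper name find_executable and its extra_dirs parameter are made up for illustration, not part of Biopython.

```python
import os
import shutil


def find_executable(name, extra_dirs=()):
    """Return the full path to `name`, or None if it cannot be found.

    `extra_dirs` is a hypothetical list of directories to search in
    addition to (and before) the directories already on PATH.
    """
    search_path = os.pathsep.join(
        list(extra_dirs) + [os.environ.get("PATH", "")]
    )
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    exe = find_executable("psiblast")
    if exe is None:
        print("psiblast not found; install BLAST+ or pass its full path")
    else:
        print("psiblast found at", exe)
```

If a path is returned, it could be passed as the first argument to the command line wrapper in place of the bare 'psiblast' name, as Eric suggests.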
From from.d.putto at gmail.com Thu Jul 21 05:14:06 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Thu, 21 Jul 2011 11:14:06 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: yes replacing '-' by 'stdin' works :) seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline(stdin=seq_record.format("fasta")) but I tried to replace bsequence also by 'stdin' and it shows error #---------------------------------------------------------------------------------------------------------------------- seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object seq_record2 = SeqIO.read(open("beta.fasta"), "fasta") water_cline = WaterCommandline(asequence="stdin", bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt", auto=True) stdout, stderr =water_cline(stdin=seq_record.format("fasta"), stdin=seq_record2.format("fasta")) # File "", line 1 # wrote: > On Wednesday, July 20, 2011, Sheila the angel > wrote: > > This method do not work with Bio.Emboss.Applications!!!I am trying to do > the same with 'Bio.Emboss.Applications' as > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > > water_cline = WaterCommandline(asequence="-", bsequence="beta.fasta", > gapopen=10, gapextend=0.5, outfile="water.txt") > > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > but it is displaying error.How can I specify file handle or sequence only > in Emboss Applications??? > > > > What is the error message? > > I think you need asequence="stdin" not "-" (although the later is a > widely used convention in command line tools). 
See:
>
> http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#stdin
>
> Also try including auto=True in the command line, this tells EMBOSS
> not to try and ask for user input (by default it will try and prompt
> the user for any missing arguments, with the auto setting it uses its
> defaults).
>
> Peter
>

From p.j.a.cock at googlemail.com Thu Jul 21 05:40:36 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 10:40:36 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID:

On Thu, Jul 21, 2011 at 10:14 AM, Sheila the angel wrote:
> yes replacing '-' by 'stdin' works :)
>
> seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object
> water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta",
> gapopen=10, gapextend=0.5, outfile="water.txt")
> stdout, stderr =water_cline(stdin=seq_record.format("fasta"))

Good - thanks for letting us know.

I can probably make the tutorial text a little clearer here.

> but I tried to replace bsequence also by 'stdin' and it shows error
> #----------------------------------------------------------------------------------------------------------------------
> seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object
> seq_record2 = SeqIO.read(open("beta.fasta"), "fasta")
> water_cline = WaterCommandline(asequence="stdin", bsequence="stdin",
> gapopen=10, gapextend=0.5, outfile="water.txt", auto=True)

That won't work - there is only one stdin pipe in a Unix style command line environment. You have to use either two input files, or stdin and one file, (or one file and stdin).

> stdout, stderr =water_cline(stdin=seq_record.format("fasta"),
> stdin=seq_record2.format("fasta"))
> # File "", line 1 #
> #SyntaxError: keyword argument repeated

That is a Python syntax error message because you tried to use the stdin argument twice. Named Python function arguments can only be used once.

I hope that makes sense.
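
Both points above (one stdin per process, and no repeated keyword arguments) can be seen without EMBOSS installed. This is a rough sketch only: the child process is a hypothetical stand-in written for this example, not an EMBOSS tool, and it assumes Python 3.7 or later for subprocess.run with capture_output.

```python
import subprocess
import sys

# A tiny stand-in for a command line tool: it reads all of stdin and
# prints how many FASTA header lines it saw. (Hypothetical child
# program, used so the example runs without EMBOSS.)
CHILD = "import sys; print(sys.stdin.read().count('>'))"

fasta_a = ">alpha\nACGT\n"
fasta_b = ">beta\nTTGA\n"

# A process has exactly one stdin, so anything sent there arrives as a
# single stream; two records can only share it, not use one each.
result = subprocess.run(
    [sys.executable, "-c", CHILD],
    input=fasta_a + fasta_b,
    capture_output=True,
    text=True,
    check=True,
)
records_seen = int(result.stdout.strip())
print(records_seen)

# Repeating a keyword argument is rejected when the code is compiled,
# before any function is called.
repeated_keyword_error = None
try:
    compile("f(stdin=1, stdin=2)", "<demo>", "eval")
except SyntaxError as err:
    repeated_keyword_error = err.msg
print(repeated_keyword_error)
```

The child sees one concatenated stream, which is why a tool expecting two separate inputs cannot take both from stdin.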
Peter From from.d.putto at gmail.com Thu Jul 21 05:48:28 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Thu, 21 Jul 2011 11:48:28 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: Thanks Peter, Yes I understand that I can't use 'stdin' twice so this is not going to work >>> water_cline = WaterCommandline(asequence="stdin", bsequence="stdin",gapopen=10, gapextend=0.5, outfile="water.txt") >>> stdout, stderr =water_cline(stdin=seq_record.format("fasta"), stdin=seq_record2.format("fasta")) >You have to use either two input files, or stdin and one file, (or one >file and stdin). But how can I specify two input files, or stdin On Thu, Jul 21, 2011 at 11:40 AM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 10:14 AM, Sheila the angel > wrote: > > yes replacing '-' by 'stdin' works :) > > > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > > water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta", > > gapopen=10, gapextend=0.5, outfile="water.txt") > > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Good - thanks for letting us know. > > I can probably make the tutorial text a little clearer here. > > > > but I tried to replace bsequence also by 'stdin' and it shows error > > > #---------------------------------------------------------------------------------------------------------------------- > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > > seq_record2 = SeqIO.read(open("beta.fasta"), "fasta") > > water_cline = WaterCommandline(asequence="stdin", bsequence="stdin", > > gapopen=10, gapextend=0.5, outfile="water.txt", auto=True) > > That won't work - there is only one stdin pipe in a Unix style command > line environment. > You have to use either two input files, or stdin and one file, (or one > file and stdin). 
> > > stdout, stderr =water_cline(stdin=seq_record.format("fasta"), > > stdin=seq_record2.format("fasta")) > > # File "", line 1 # > #SyntaxError: keyword argument repeated > > That is a python syntax error message because you tried to use the stdin > argument twice. Named Python function arguments can only be used once. > > I hope that makes sense. > > Peter > From p.j.a.cock at googlemail.com Thu Jul 21 05:56:17 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 10:56:17 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 10:48 AM, Sheila the angel wrote: > Thanks Peter, > Yes I understand that I can't use 'stdin' twice so this is not going to work > >>You have to use either two input files, or stdin and one file, (or one >>file and stdin). > > But how can I specify two input files, or stdin > Two files is covered in the Tutorial example you originally started with, isn't it?: water_cline = WaterCommandline(asequence="alpha.fasta", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline() You've already done stdin and file for a and b, and said that works: seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline(stdin=seq_record.format("fasta")) Doing it the other way round with a file and stdin for a and b would be just: seq_record = SeqIO.read(open("beta.fasta"), "fasta") #or a sequence object water_cline = WaterCommandline(asequence="alpha.fasta", bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline(stdin=seq_record.format("fasta")) Obviously if you already have the sequences in FASTA files, then just give the filenames to the EMBOSS tool (rather than needlessly loading them into python just to write them to stdin). 
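
For sequences that exist only in memory, the "two input files" case can be reached by writing each record to a temporary file first. A minimal sketch using only the standard library follows; the helper name write_temp_fasta and the example sequences are invented for illustration, and Python 3 is assumed.

```python
import os
import tempfile


def write_temp_fasta(name, sequence):
    """Write one record to a temporary FASTA file and return its path."""
    handle = tempfile.NamedTemporaryFile(
        mode="w", suffix=".fasta", delete=False
    )
    with handle:
        handle.write(">%s\n%s\n" % (name, sequence))
    return handle.name


path_a = write_temp_fasta("alpha", "ACGTACGT")
path_b = write_temp_fasta("beta", "ACGTTCGT")
try:
    # The two paths could now be given as asequence= and bsequence=.
    with open(path_a) as fh:
        first_line = fh.readline().strip()
    print(first_line)
finally:
    # Clean up, since delete=False leaves the files on disk.
    os.remove(path_a)
    os.remove(path_b)
```

This costs one write and one delete per sequence, so it is slower than passing existing files directly.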
Peter From malvikasharan at gmail.com Thu Jul 21 11:51:35 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Thu, 21 Jul 2011 17:51:35 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: Hi, i have tried this tool for aligning 1 sequence ( 'aseq.fasta ') against 5 sequence present in other file ('bseq.fasta'). as expected aseq.fasta gives pairwise alignment with every fasta sequence present in bseq.fasta. the alignment works perfectly and saves the alignment output as well. 1> the question is if there is anyway to extraxt consensus out of all pairwise alignments? I know its is possible with clustalw or muscle or other alignment tool. but i do not want pairwise alignment between all the sequence with each other. In this case Emboss seemed the better tool where i can align all sequence only with 1 query. but the crucial part is to extract the conserved residue from the alignment. 2> The best would be to align all the sequence together against 1 sequence (aseq.fasta) like it happens in COBALT. and find the conserved residue directly. but i did not find any commandline tool like that unfortunately. It would be great if you can suggest a tool if you know any. thank you ! Malvika On Thu, Jul 21, 2011 at 11:56 AM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 10:48 AM, Sheila the angel > wrote: > > Thanks Peter, > > Yes I understand that I can't use 'stdin' twice so this is not going to > work > > > >>You have to use either two input files, or stdin and one file, (or one > >>file and stdin). 
> > > > But how can I specify two input files, or stdin > > > > Two files is covered in the Tutorial example you originally started > with, isn't it?: > > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline() > > You've already done stdin and file for a and b, and said that works: > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > water_cline = WaterCommandline(asequence="stdin", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Doing it the other way round with a file and stdin for a and b would be > just: > > seq_record = SeqIO.read(open("beta.fasta"), "fasta") #or a sequence object > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Obviously if you already have the sequences in FASTA files, then just give > the filenames to the EMBOSS tool (rather than needlessly loading them into > python just to write them to stdin). > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Jul 21 12:09:18 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 17:09:18 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 4:51 PM, malvika sharan wrote: > Hi, > > i have tried this tool for aligning 1 sequence ( 'aseq.fasta ') against 5 > sequence present in other file ('bseq.fasta'). as expected aseq.fasta gives > pairwise alignment with every fasta sequence present in bseq.fasta. > > the alignment works perfectly and saves the alignment output as well. 
> > 1> the question is if there is anyway to extraxt consensus out of all > pairwise alignments? I know its is possible with clustalw or muscle or other > alignment tool. but i do not want pairwise alignment between all the > sequence with each other. In this case Emboss seemed the better tool where i > can align all sequence only with 1 query. but the crucial part is to extract > the conserved residue from the alignment. > > 2> The best would be to align all the sequence together against 1 sequence > (aseq.fasta) like it happens in COBALT. and find the conserved residue > directly. but i did not find any commandline tool like that unfortunately. > It would be great if you can suggest a tool if you know any. > > thank you ! > Malvika I don't understand what you are asking for - it is hard to define a consensus from a pairwise alignment. Could you give a short example? Also this thread really is going off topic, perhaps a new email thread (with a more relevant title) would be better? Peter From malvikasharan at gmail.com Thu Jul 21 12:32:52 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Thu, 21 Jul 2011 18:32:52 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: yup sure!! On Thu, Jul 21, 2011 at 6:09 PM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 4:51 PM, malvika sharan > wrote: > > Hi, > > > > i have tried this tool for aligning 1 sequence ( 'aseq.fasta ') against 5 > > sequence present in other file ('bseq.fasta'). as expected aseq.fasta > gives > > pairwise alignment with every fasta sequence present in bseq.fasta. > > > > the alignment works perfectly and saves the alignment output as well. > > > > 1> the question is if there is anyway to extraxt consensus out of all > > pairwise alignments? I know its is possible with clustalw or muscle or > other > > alignment tool. but i do not want pairwise alignment between all the > > sequence with each other. 
In this case Emboss seemed the better tool > where i > > can align all sequence only with 1 query. but the crucial part is to > extract > > the conserved residue from the alignment. > > > > 2> The best would be to align all the sequence together against 1 > sequence > > (aseq.fasta) like it happens in COBALT. and find the conserved residue > > directly. but i did not find any commandline tool like that > unfortunately. > > It would be great if you can suggest a tool if you know any. > > > > thank you ! > > Malvika > > I don't understand what you are asking for - it is hard to define a > consensus from a pairwise alignment. Could you give a short example? > > Also this thread really is going off topic, perhaps a new email thread > (with a more relevant title) would be better? > > Peter > From from.d.putto at gmail.com Fri Jul 22 06:58:53 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Fri, 22 Jul 2011 12:58:53 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: Oh I am sorry for the line seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object This was meant to show that I have a sequence record object. In my actual problem I have open 2 different genbank files which contains many sequence record. I want to run EMBOSS to each sequence. from Bio import SeqIO for seq_record in SeqIO.parse("ls_orchid.gbk", "genbank"): seq1=seq_record.seq for seq_record2 in SeqIO.parse("other_file.gbk", "genbank"): #NOTE- this is a different genbank file seq2=seq_record2.seq EMBOSS_OUT= RUN_EMBOSS_WATER_on_seq1_and_seq2(seq1,seq2) #Analyse 'EMBOSS_OUT' for further analysis One possible way to do this - - convert genbank files to fasta and then use one file as 'beta.fasta'. But in such case I can't analysis EMBOSS result directly. I have to wait till EMBOSS is done with all the sequences. Another way is to create 2 temporary files temp1 and temp2 and pass them to EMBOSS. Though it makes code little slow. 
(Because every time you first have to write the sequences to temporary files, and delete them after the analysis.)

I tried another solution... it may be a little dirty but may be useful for someone.

#---------------------------------------------------------------------------------------
def RUN_EMBOSS_WATER_on_seq1_and_seq2 (seq1,seq2,out_file='stdout'):
    water_cline = WaterCommandline()
    water_cline.asequence='asis:'+str(seq1)
    water_cline.bsequence='asis:'+str(seq2)
    water_cline.gapopen=10
    water_cline.gapextend=0.5
    water_cline.outfile=out_file
    stdout, stderr = water_cline()
    return (stdout)
#---------------------------------------------------------------------------------------

You can call the function as

    EMBOSS_OUT=RUN_EMBOSS_WATER_on_seq1_and_seq2 (seq1,seq2)

and then do the analysis with 'EMBOSS_OUT'. Here you neither have to wait until EMBOSS is done with all the sequences, nor create temporary files :)

BUT the only problem is that the names of the sequences are not present in the output; it writes only

# Aligned_sequences: 2
# 1: asis
# 2: asis

So if you don't care which sequence is which, then it works :)

On Thu, Jul 21, 2011 at 11:56 AM, Peter Cock wrote:

> On Thu, Jul 21, 2011 at 10:48 AM, Sheila the angel wrote:
> > Thanks Peter,
> > Yes I understand that I can't use 'stdin' twice so this is not going to work
> >
> >>You have to use either two input files, or stdin and one file, (or one
> >>file and stdin).
> > > > But how can I specify two input files, or stdin > > > > Two files is covered in the Tutorial example you originally started > with, isn't it?: > > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline() > > You've already done stdin and file for a and b, and said that works: > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > water_cline = WaterCommandline(asequence="stdin", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Doing it the other way round with a file and stdin for a and b would be > just: > > seq_record = SeqIO.read(open("beta.fasta"), "fasta") #or a sequence object > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Obviously if you already have the sequences in FASTA files, then just give > the filenames to the EMBOSS tool (rather than needlessly loading them into > python just to write them to stdin). > > Peter > From p.j.a.cock at googlemail.com Fri Jul 22 07:18:08 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 22 Jul 2011 12:18:08 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 11:58 AM, Sheila the angel wrote: > Oh I am sorry for the line > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") ?#or a sequence object > This was meant to show that I have a sequence record object. I understood. > In my actual problem I have open 2 different genbank files which contains > many sequence record. I want to run EMBOSS to each sequence. The EMBOSS tools can read GenBank files too. 
Try something like this using their Uniform Sequence Address (USA) convention:

http://emboss.sourceforge.net/docs/#Usa
http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html

    water_cline = WaterCommandline(asequence="genbank::ls_orchid.gbk", bsequence="genbank::other_file.gbk", gapopen=10, gapextend=0.5, outfile="water.txt", auto=True)

Alternatively, some (all?) of the EMBOSS tools provide an explicit separate argument for the input file format. In this case it wasn't very obvious, as it wasn't listed in the "water --help" text. You could use -sformat1 and -sformat2, but it looks like our needle/water wrappers don't include them. So go with the USA approach.

Peter

From gowencm at vcu.edu Thu Jul 28 12:28:40 2011 From: gowencm at vcu.edu (Chris Gowen) Date: Thu, 28 Jul 2011 12:28:40 -0400 Subject: [Biopython] PWM using gapped alignments Message-ID:

Hello all,

We are trying to perform pwm calculations using the Motif.pwm() function, and many of our alignments have gaps, which raise KeyError when it tries the key '-'. I am fairly inexperienced with this analysis technique, but from looking at the source, it seems the error itself may be avoided by adding a line before line 97 to skip that letter in the calculation. Would this mess up the calculation for the pwm scores? Has anyone dealt with this problem in a more clever way?

Thanks for any advice you can offer.

Best,
Chris Gowen

 82  def pwm(self,laplace=True):
 83      """
 84      returns the PWM computed for the set of instances
 85
 86      if laplace=True (default), pseudocounts equal to self.background multiplied by self.beta are added to all positions.
 87      """
 88
 89      if self._pwm_is_current:
 90          return self._pwm
 91      #we need to compute new pwm
 92      self._pwm = []
 93      for i in xrange(self.length):
 94          dict = {}
 95          #filling the dict with 0's
 96          for letter in self.alphabet.letters:
 97              if laplace:
 98                  dict[letter]=self.beta*self.background[letter]
 99              else:
100                  dict[letter]=0.0
101          if self.has_counts:
102              #taking the raw counts
103              for letter in self.alphabet.letters:
104                  dict[letter]+=self.counts[letter][i]
105          elif self.has_instances:
106              #counting the occurences of letters in instances
107              for seq in self.instances:
108                  #dict[seq[i]]=dict[seq[i]]+1
109                  dict[seq[i]]+=1
110          self._pwm.append(FreqTable.FreqTable(dict,FreqTable.COUNT,self.alphabet))
111      self._pwm_is_current=1
112      return self._pwm

From p.j.a.cock at googlemail.com Thu Jul 28 12:54:40 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 28 Jul 2011 17:54:40 +0100 Subject: [Biopython] PWM using gapped alignments In-Reply-To: References: Message-ID:

On Thu, Jul 28, 2011 at 5:28 PM, Chris Gowen wrote:
> Hello all,
>
> We are trying to perform pwm calculations using the Motif.pwm() function,
> and many of our alignments have gaps, which raise KeyError when it tries the
> key '-'. I am fairly inexperienced with this analysis technique, but from
> looking at the source, it seems the error itself may be avoided by adding a
> line before line 97 to skip that letter in the calculation. Would this mess
> up the calculation for the pwm scores? Has anyone dealt with this problem in
> a more clever way?
>
> Thanks for any advice you can offer.
>
> Best,
> Chris Gowen

Which alphabet are you using? My guess is you didn't have a gapped alphabet.

As an aside, making the Seq object test this has a certain appeal but would impose a performance penalty.

Peter

From gowencm at vcu.edu Thu Jul 28 13:14:03 2011 From: gowencm at vcu.edu (Chris Gowen) Date: Thu, 28 Jul 2011 13:14:03 -0400 Subject: [Biopython] PWM using gapped alignments In-Reply-To: References: Message-ID:

Hi Peter,

Thanks for the response. We are initiating the alignments and motif as Gapped(IUPAC.unambiguous_dna), so our letters are 'GATC-'.
As far as I can tell, there is no means for pwm() to know to skip the gaps, if that's even the 'right' thing for it to do. On Thu, Jul 28, 2011 at 12:54 PM, Peter Cock wrote: > On Thu, Jul 28, 2011 at 5:28 PM, Chris Gowen wrote: > > Hello all, > > > > We are trying to perform pwm calculations using the Motif.pwm() function, > > and many of our alignments have gaps, which raise KeyError when it tries > the > > key '-'. I am fairly inexperienced with this analysis technique, but from > > looking at the source, it seems the error itself may be avoided by adding > a > > line before line 97 to skip that letter in the calculation. Would this > mess > > up the calculation for the pwm scores? Has anyone dealt with this problem > in > > a more clever way? > > > > Thanks for any advise you can offer. > > > > Best, > > Chris Gowen > > Which alphabet are you using? My guess is you didn't have a gapped > alphabet. > > As an aside, making the Seq object test this has certain appeal but > would impose a performance penalty. > > Peter > From dilara.ally at gmail.com Fri Jul 1 01:21:47 2011 From: dilara.ally at gmail.com (Dilara Ally) Date: Thu, 30 Jun 2011 18:21:47 -0700 Subject: [Biopython] Having a hard time getting a handle on handles Message-ID: <4E0D212B.4040206@gmail.com> Hi All I have ~700,000 contigs that I would like to blast search and then from the blast record parse out particular pieces of information from the BLAST report. I can get my code to pull in files and then loop over seq_records, blast, and then write a BLAST report. But since I don't want to have 700,000 BLAST reports, I'd like to parse particular pieces of information from the report and store it in a table. 
This is the error I get from the code I have pasted below: /Users/dally/Desktop/NextGenData/Python_Scripts/batchedfastafiles/group_1.fasta 1 0 GTCTTCGGCGTTGCACCGGCGATGAAGAACCAGTACGAGGCGTCTGGCGAGAGTAACAACGCTG Traceback (most recent call last): File "", line 13, in NameError: name 'NCBIXML' is not defined Do i have to close the result_handle and then reopen it? If so why? Thanks in advance for your help. > > from Bio import SeqIO > from Bio.Blast import NCBIWWW > import time > import os > import os.path > > dirname1="/Users/dally/Desktop/NextGenData/Python_Scripts/batchedfastafiles/" > allfiles=os.listdir(dirname1) > fanddir=[os.path.join(dirname1,fname) for fname in allfiles] > i = 0 > for f in fanddir: > print f > handle = open(f, "rU") > contigs =list(SeqIO.parse(handle,"fasta")) > handle.close() > start=time.time() > for seq_record in contigs: > i=i+1 > print seq_record.id > print seq_record.seq > result_handle=NCBIWWW.qblast("blastn", "nr", > seq_record.format("fasta"),hitlist_size=10) > blast_record=list(NCBIXML.read(result_handle)) <== HERE IS THE > PROBLEM > E_VALUE_THRESH = 0.000004 > countr=0 > for alignment in blast_record.alignments: > countr=countr+1 > for hsp in alignment.hsps: > if hsp.expect < E_VALUE_THRESH: > print '****Alignment****' > print 'sequence:', alignment.title > print 'length:', alignment.length > print 'e value:', hsp.expect > print hsp.query[0:75] + '...' > print hsp.match[0:75] + '...' > print hsp.sbjct[0:75] + '...' 
From eric.talevich at gmail.com Fri Jul 1 03:24:04 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Thu, 30 Jun 2011 23:24:04 -0400 Subject: [Biopython] Having a hard time getting a handle on handles In-Reply-To: <4E0D212B.4040206@gmail.com> References: <4E0D212B.4040206@gmail.com> Message-ID: On Thu, Jun 30, 2011 at 9:21 PM, Dilara Ally wrote: > Hi All > > I have ~700,000 contigs that I would like to blast search and then from the > blast record parse out particular pieces of information from the BLAST > report. I can get my code to pull in files and then loop over seq_records, > blast, and then write a BLAST report. But since I don't want to have > 700,000 BLAST reports, I'd like to parse particular pieces of information > from the report and store it in a table. This is the error I get from the > code I have pasted below: > > /Users/dally/Desktop/**NextGenData/Python_Scripts/** > batchedfastafiles/group_1.**fasta > 1 > 0 > GTCTTCGGCGTTGCACCGGCGATGAAGAAC**CAGTACGAGGCGTCTGGCGAGAGTAACAAC**GCTG > Traceback (most recent call last): > File "", line 13, in > NameError: name 'NCBIXML' is not defined > > Do i have to close the result_handle and then reopen it? If so why? > Thanks in advance for your help. > Try adding this import to the top of your script: from Bio.Blast import NCBIXML Does it work now? In general, whenever you see a NameError you should check for (a) missing imports and (b) mis-typed variable names. The problem is usually one of those. 
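To illustrate the missing-import case, here is a small sketch that is not from the original thread. It uses the standard-library math module as a stand-in for NCBIXML, and the helper name name_error_message is made up for illustration. It runs a snippet in an empty namespace and reports the NameError text, first without the import and then with it:

```python
def name_error_message(snippet):
    """Run snippet in a fresh namespace; return the NameError text, or None."""
    try:
        exec(snippet, {})
    except NameError as err:
        return str(err)
    return None


# The module is used but never imported, as with NCBIXML in the script above.
without_import = name_error_message("value = math.sqrt(16)")

# The same code with the import added at the top.
with_import = name_error_message("import math\nvalue = math.sqrt(16)")

print(without_import)
print(with_import)
```

The first call should report that the name math is not defined; the second should report nothing, because the import makes the name available.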
Cheers, Eric From dilara.ally at gmail.com Fri Jul 1 03:50:52 2011 From: dilara.ally at gmail.com (Dilara Ally) Date: Thu, 30 Jun 2011 20:50:52 -0700 Subject: [Biopython] Having a hard time getting a handle on handles In-Reply-To: References: <4E0D212B.4040206@gmail.com> Message-ID: <4E0D441C.5020600@gmail.com> Thanks, I tried that and now the error is Traceback (most recent call last): File "", line 12, in TypeError: iteration over non-sequence Dilara On 6/30/11 8:24 PM, Eric Talevich wrote: > from Bio.Blast import NCBIXML From eric.talevich at gmail.com Fri Jul 1 15:27:42 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Fri, 1 Jul 2011 11:27:42 -0400 Subject: [Biopython] Having a hard time getting a handle on handles In-Reply-To: <4E0D441C.5020600@gmail.com> References: <4E0D212B.4040206@gmail.com> <4E0D441C.5020600@gmail.com> Message-ID: OK, the "iteration" is where you're trying to construct a list. Where it says: blast_record = list(NCBIXML.read(result_handle)) Try: blast_record = NCBIXML.read(result_handle) print blast_record And see what's in the blast_record object. That should make it more clear how the rest of your code should navigate it. -E On Thu, Jun 30, 2011 at 11:50 PM, Dilara Ally wrote: > Thanks, I tried that and now the error is > > > Traceback (most recent call last): > File "", line 12, in > TypeError: iteration over non-sequence > > Dilara > > > On 6/30/11 8:24 PM, Eric Talevich wrote: > >> from Bio.Blast import NCBIXML >> > From p.j.a.cock at googlemail.com Sat Jul 2 09:18:45 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 2 Jul 2011 10:18:45 +0100 Subject: [Biopython] multiple sequence blast In-Reply-To: <20110630104227.GA2883@sobchak> References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak> Message-ID: On Thu, Jun 30, 2011 at 11:42 AM, Brad Chapman wrote: > Dilara; > Thanks for the message. 
It would be helpful if you'd include the > error message traceback that you got stuck on; this will help > pinpoint the problem. > > From reading your code, my guess is that you are getting and IOError > about files not existing. When you do os.listdir, it only includes > the name of the files, not the full path to where they are located. I would have suggested the same thing. In addition, are you really trying to run 100,000 contigs though the NCBI online BLAST service? If it works it will take a long time, but they might not like that and block your access. Big BLAST jobs like this are better done by installing BLAST+ (and in this case the NR database) locally. Biopython has wrappers to help call standalone BLAST too. Peter From dilara.ally at gmail.com Sun Jul 3 18:27:12 2011 From: dilara.ally at gmail.com (Dilara Ally) Date: Sun, 03 Jul 2011 11:27:12 -0700 Subject: [Biopython] multiple sequence blast In-Reply-To: References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak> Message-ID: <4E10B480.1030600@gmail.com> Thanks Peter. On 7/2/11 2:18 AM, Peter Cock wrote: > On Thu, Jun 30, 2011 at 11:42 AM, Brad Chapman wrote: >> Dilara; >> Thanks for the message. It would be helpful if you'd include the >> error message traceback that you got stuck on; this will help >> pinpoint the problem. >> >> From reading your code, my guess is that you are getting and IOError >> about files not existing. When you do os.listdir, it only includes >> the name of the files, not the full path to where they are located. > I would have suggested the same thing. > > In addition, are you really trying to run 100,000 contigs though > the NCBI online BLAST service? If it works it will take a long time, > but they might not like that and block your access. Big BLAST > jobs like this are better done by installing BLAST+ (and in this case > the NR database) locally. Biopython has wrappers to help call > standalone BLAST too. 
> > Peter > From dilara.ally at gmail.com Sun Jul 3 19:27:53 2011 From: dilara.ally at gmail.com (Dilara Ally) Date: Sun, 03 Jul 2011 12:27:53 -0700 Subject: [Biopython] multiple sequence blast In-Reply-To: References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak> Message-ID: <4E10C2B9.2000909@gmail.com> Hi Peter How long will it take then to do a big BLAST job that has over 600,000 contigs. Wouldn't downloading the databasese and doing a standalone BLAST take a lot of cpu memory? Should I be doing this on a cluster? Dilara On 7/2/11 2:18 AM, Peter Cock wrote: > On Thu, Jun 30, 2011 at 11:42 AM, Brad Chapman wrote: >> Dilara; >> Thanks for the message. It would be helpful if you'd include the >> error message traceback that you got stuck on; this will help >> pinpoint the problem. >> >> From reading your code, my guess is that you are getting and IOError >> about files not existing. When you do os.listdir, it only includes >> the name of the files, not the full path to where they are located. > I would have suggested the same thing. > > In addition, are you really trying to run 100,000 contigs though > the NCBI online BLAST service? If it works it will take a long time, > but they might not like that and block your access. Big BLAST > jobs like this are better done by installing BLAST+ (and in this case > the NR database) locally. Biopython has wrappers to help call > standalone BLAST too. > > Peter > From p.j.a.cock at googlemail.com Sun Jul 3 23:52:34 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 4 Jul 2011 00:52:34 +0100 Subject: [Biopython] multiple sequence blast In-Reply-To: <4E10C2B9.2000909@gmail.com> References: <4E0BAD67.70305@gmail.com> <20110630104227.GA2883@sobchak> <4E10C2B9.2000909@gmail.com> Message-ID: On Sun, Jul 3, 2011 at 8:27 PM, Dilara Ally wrote: > Hi Peter > > How long will it take then to do a big BLAST job that has over > 600,000 contigs. How long is a piece of string? 
;) What I mean is this is hard to say without looking at your data. Do you know the total sequence length of the contigs? Try doing 60 representative contigs on your machine for an estimate (note their lengths are important - shorter contigs should be faster to run as BLAST queries). Remember that standalone BLAST+ can be run multi-threaded. It will depend on the number of CPUs and how much RAM you have. > Wouldn't downloading the databases and doing a standalone > BLAST take a lot of cpu memory? Yes, it will take a lot of CPU time, and a moderate amount of RAM (if you are doing genome assembly to get the contigs, that will probably have needed far more RAM than running BLAST will). > Should I be doing this on a cluster? It would probably be worthwhile. You *might* manage with a powerful multicore desktop (like a recent MacPro or similar) or powerful server. Peter From kokomutai at gmail.com Mon Jul 4 08:25:35 2011 From: kokomutai at gmail.com (Koko Mutai) Date: Mon, 4 Jul 2011 11:25:35 +0300 Subject: [Biopython] (no subject) Message-ID: Hello, I would like to ask how to import, sequence matrix metalloproteinases and DNA annotation using Biopython. From from.d.putto at gmail.com Thu Jul 7 10:26:48 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Thu, 7 Jul 2011 12:26:48 +0200 Subject: [Biopython] Passing sequence to local BLAST Message-ID: Hi All, I want to download genbank file from NCBI and pass the protein sequence directly to the local BLAST.
But I am getting error in BLAST step #------------------------------------------------------------------------------------------- from Bio import SeqIO from Bio import Entrez from Bio.Blast.Applications import NcbiblastpCommandline id='200203' handle = Entrez.efetch(db="protein", id=id, rettype="gp") seq_record = SeqIO.read(handle, "gb") x=seq_record.seq #getting the sequence in a variable x blastp_cline = NcbiblastpCommandline(query=x, db="protein_database", evalue=0.001) # My BLAST command result_handle, stderr = blastp_cline() #Running BLAST and getting error :( #------------------------------------------------------------------------------------------- At this last step I am getting error..... I sort-of understand the problem.....it is taking value of x as a file name while its a variable which contains the sequence. Is there any way out to this problem without making temporary file. Thanks in Advance -- Cheers Sheila From p.j.a.cock at googlemail.com Thu Jul 7 10:43:22 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 7 Jul 2011 11:43:22 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 11:26 AM, Sheila the angel wrote: > Hi All, > > I want to download genbank file from NCBI and pass the protein sequence > directly to the local BLAST. But I am getting error in BLAST step > #------------------------------------------------------------------------------------------- > from Bio import SeqIO > from Bio import Entrez > from Bio.Blast.Applications import NcbiblastpCommandline > id='200203' > handle = Entrez.efetch(db="protein", id=id, rettype="gp") > seq_record = SeqIO.read(handle, "gb") > x=seq_record.seq #getting the > sequence in a variable x > blastp_cline = NcbiblastpCommandline(query=x, db="protein_database", > evalue=0.001) # My BLAST command > result_handle, stderr = blastp_cline()
#Running BLAST and > getting error :( > > #------------------------------------------------------------------------------------------- > > At this last step I am getting error..... > I sort-of understand the problem.....it is taking value of x as a file name > while its a variable which contains the sequence. > Is there any way out to this problem without making temporary file. With the standalone blast tools you generally need to prepare an input FASTA file with your query sequence(s). However, in principle you can give the input filename as - (default), and instead pipe the query FASTA record in as stdin (standard input). Try something like this (untested): ... blastp_cline = NcbiblastpCommandline(query="-", db="protein_database", evalue=0.001) stdout, stderr = blastp_cline(stdin=seq_record.format("fasta")) Peter From from.d.putto at gmail.com Thu Jul 7 10:53:28 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Thu, 7 Jul 2011 12:53:28 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: Great !!!! It works Thanks a lot :) On Thu, Jul 7, 2011 at 12:43 PM, Peter Cock wrote: > On Thu, Jul 7, 2011 at 11:26 AM, Sheila the angel > wrote: > > Hi All, > > > > I want to download genbank file from NCBI and pass the protein sequence > > directly to the local BLAST.
But I am getting error in BLAST step > > > #------------------------------------------------------------------------------------------- > > from Bio import SeqIO > > from Bio import Entrez > > from Bio.Blast.Applications import NcbiblastpCommandline > > id='200203' > > handle = Entrez.efetch(db="protein", id=id, rettype="gp") > > seq_record = SeqIO.read(handle, "gb") > > x=seq_record.seq #getting the > > sequence in a variable x > > blastp_cline = NcbiblastpCommandline(query=x, db="protein_database", > > evalue=0.001) # My BLAST command > > result_handle, stderr = blastp_cline() #Running BLAST > and > > getting error :( > > > > > #------------------------------------------------------------------------------------------- > > > > At this last step I am getting error..... > > I sort-of understand the problem.....it is taking value of x as a file > name > > while its a variable which contains the sequence. > > Is there any way out to this problem without making temporary file. > > With the standalone blast tools you generally need to prepare an input > FASTA file with your query sequence(s). > > However, in principle you can give the input filename as - (default), > and instead pipe the query FASTA record in as stdin (standard input). > Try something like this (untested): > > ... > blastp_cline = NcbiblastpCommandline(query="-", db="protein_database", > evalue=0.001) > stdout, stderr = blastp_cline(stdin=seq_record.format("fasta")) > > Peter > From p.j.a.cock at googlemail.com Thu Jul 7 10:55:29 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 7 Jul 2011 11:55:29 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 11:53 AM, Sheila the angel wrote: > Great !!!!?It works > Thanks a lot :) Thanks for letting us know. Another solution would be to just download the file from the NCBI in FASTA format rather than GenBank format - but I expect you had other reasons for doing that. 
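The stdin approach from earlier in this thread can be tried without BLAST installed. The sketch below is only an illustration under stated assumptions: the child process is a one-line Python stand-in for a tool that reads its query from standard input (the role blastp plays with query="-"), and to_fasta and count_records_via_stdin are hypothetical helpers, not Biopython functions.

```python
import subprocess
import sys


def to_fasta(identifier, sequence, width=60):
    """Format one record as FASTA text (hypothetical helper)."""
    lines = [">" + identifier]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Stand-in for a command-line tool that reads FASTA from stdin and
# prints how many '>' header markers it saw.
CHILD_CODE = "import sys; print(sys.stdin.read().count('>'))"


def count_records_via_stdin(fasta_text):
    """Pipe FASTA text to the child process; no temporary file is written."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        input=fasta_text,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return int(result.stdout.strip())


fasta_text = to_fasta("example_protein", "MKTAYIAKQRQISFVKSHFSRQ")
n_records = count_records_via_stdin(fasta_text)
print(n_records)
```

The point is only that the query text travels over the pipe; with a real BLAST+ install the child command would be the blastp command line instead of the stand-in.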
Peter From jessecolangelolillis at googlemail.com Thu Jul 7 11:30:58 2011 From: jessecolangelolillis at googlemail.com (Jesse Colangelo-Lillis) Date: Thu, 7 Jul 2011 04:30:58 -0700 Subject: [Biopython] Bio.Blast; entrez_query= multiple organisms Message-ID: Can someone tell me the format for specifying multiple organisms within the blast parameters? I have this: result_handle = NCBIWWW.qblast("blastp", "nr", gene_seq, expect=100, hitlist_size=1, entrez_query="unclassified Caudovirales[orgn]") but I actually want to blast against both 'Caudovirales' and 'unclassified Caudovirales'. Thanks for any help. -- Jesse Colangelo-Lillis -- Australian National University Research School Earth Sciences Bldg 61. Mills Road Canberra, ACT 0200 Australia cell: 0415 380 105 -- From p.j.a.cock at googlemail.com Thu Jul 7 11:42:45 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 7 Jul 2011 12:42:45 +0100 Subject: [Biopython] Bio.Blast; entrez_query= multiple organisms In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 12:30 PM, Jesse Colangelo-Lillis wrote: > Can someone tell me the format for specifying multiple organisms > within the blast parameters? > > I have this: > > result_handle = NCBIWWW.qblast("blastp", "nr", gene_seq, expect=100, > hitlist_size=1, entrez_query="unclassified Caudovirales[orgn]") > > but I actually want to blast against both 'Caudovirales' and > 'unclassified Caudovirales'. > Thanks for any help. You would probably need explicit quotes round unclassified Caudovirales in the Entrez query, otherwise I think it will do this: unclassified AND Caudovirales[orgn] I would use the taxid rather than the name to avoid the space problem. Is there a taxid that covers both your clades of interest? Otherwise combine fields with OR (or AND as appropriate).
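As a rough sketch of the quoting and OR advice above (the helper name organism_query is made up for illustration, and whether NCBI interprets the resulting string as intended is an assumption that should be checked against the Entrez web interface), the query string could be built like this and then passed as entrez_query= in the qblast call:

```python
def organism_query(names):
    """Join organism names into one Entrez query, quoting each name."""
    terms = ['"%s"[orgn]' % name for name in names]
    return " OR ".join(terms)


entrez_query = organism_query(["Caudovirales", "unclassified Caudovirales"])
print(entrez_query)
```

Quoting every name, even the single-word one, keeps the helper simple and avoids the space problem for multi-word names.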
Play with the web interface to build the right query: http://www.ncbi.nlm.nih.gov/protein/advanced Peter From mnemonico at posthocergopropterhoc.net Fri Jul 8 04:19:20 2011 From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo) Date: Fri, 8 Jul 2011 01:19:20 -0300 Subject: [Biopython] error writing fasta file using SeqIO Message-ID: Hi. Can someone spot why I can't create a fasta file here? I tried following the cookbook tutorial but something goes wrong when I try to write the sequence from a SeqRecord object to a fasta file: import abifpy from Bio.Seq import Seq from Bio.SeqRecord import SeqRecord from Bio.Emboss.Applications import NeedleCommandline import os #uso a funcao listdir from Bio import SeqIO def acessa_ab1(arquivo,trim=True): #generalizar depois """acessa um arquivo ab1 e retorna um objeto SeqRecord""" dado = abifpy.Trace(arquivo) if trim: cortado = dado.trim(dado.seq(ambig=True)) return SeqRecord(cortado, id=arquivo, description='dado cortado') else: return dado.seqrecord() def abre_ref(arquivo): """acessa um arquivo contendo uma sequencia de referencia retorna um objeto SeqRecord""" with open(arquivo, 'rUb') as dado: referencia = SeqIO.read(dado, 'genbank') return referencia def salva_fasta(obj_SeqRecord): """Pega um objeto SeqRecord e cria um fasta com a sua sequencia""" SeqIO.write([obj_SeqRecord], obj_SeqRecord + '.fasta','fasta') def processar_lote(diretorio, ref): """abre os arquivos ab1 de
uma pasta, apara, salva em fasta, faz o alinhamento com o fasta de referencia e salva o alinhamento em um arquivo para analise posterior. diretorio --> uma string representando o caminho da pasta contendo os arquivos ref --> uma string representando o caminho absoluto + genbank com a sequencia de referencia. """ referencia = abre_ref(ref) referencia.id = 'sequencia de referencia' salva_fasta(referencia) ab1files = [x for x in os.listdir(diretorio) if x.endswith('.ab1')] for file in ab1files: dado = acessa_ab1(diretorio + file) salva_fasta(dado) needle_cline = NeedleCommandline(asequence='referencia.fasta', bsequence= file + '.fasta', gapopen=10, gapextend=0.5, outfile=file + "_aligned.txt") stdout, stderr = needle_cline() #pasta = '/home/mercutio22/Dropbox/My scripts/Fabi/vs/Seq_placa273 analisada/' #referencia = '/home/mercutio22/Dropbox/My scripts/Fabi/vs/Seq_placa273 analisada/BRCA1 (total) - Frag 3450.gb' #processar_lote(pasta, referencia) dado = acessa_ab1('/home/mercutio22/Dropbox/My scripts/Fabi/vs/Seq_placa273 analisada/1174411_3450F_A01.ab1') print type(dado) salva_fasta(dado) ===============================error msg============== Traceback (most recent call last): File "louise.py", line 59, in salva_fasta(dado) File "louise.py", line 26, in salva_fasta SeqIO.write([obj_SeqRecord], obj_SeqRecord + '.fasta','fasta') File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 412, in write count = writer_class(handle).write_file(sequences) File "/usr/lib/pymodules/python2.6/Bio/SeqIO/Interfaces.py", line 271, in write_file count = self.write_records(records) File "/usr/lib/pymodules/python2.6/Bio/SeqIO/Interfaces.py", line 256, in write_records self.write_record(record) File "/usr/lib/pymodules/python2.6/Bio/SeqIO/FastaIO.py", line 134, in write_record self.handle.write(">%s\n" % title) AttributeError: 'SeqRecord' object has no attribute 'write' From w.arindrarto at gmail.com Fri Jul 8 04:46:11 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) 
Date: Fri, 8 Jul 2011 06:46:11 +0200 Subject: [Biopython] error writing fasta file using SeqIO In-Reply-To: References: Message-ID: Hi Hugo, I think the problem is you tried to concatenate a SeqRecord object and a string object. Do this in 'salva_fasta' instead: SeqIO.write([obj_SeqRecord], obj_SeqRecord.id + '.fasta', 'fasta') And just as an additional input, in the 'processar_lote' method, you can use this to generate a list of absolute file name paths (import the os and glob module beforehand). files = [os.path.abspath(x) for x in glob.glob('*.ab1')] os.path.abspath() returns the absolute file path for a given file, and glob.glob() returns a list of names that matches the given pattern. Hope that helps! --- Wibowo Arindrarto (bow) http://bow.web.id On Fri, Jul 8, 2011 at 06:19, A M Torres, Hugo < mnemonico at posthocergopropterhoc.net> wrote: > Hi. Can someone spot why I can't create a fasta file here? I tried > following > the cookbook tutorial but something goes wrong when I try to write the > sequence from a SeqRecord object to a fasta file: >
> > import abifpy > from Bio.Seq import Seq > from Bio.SeqRecord import SeqRecord > from Bio.Emboss.Applications import NeedleCommandline > import os #uso a funcao listdir > from Bio import SeqIO > > def acessa_ab1(arquivo,trim=True): #generalizar depois > """acessa um arquivo ab1 e retorna um objeto SeqRecord""" > dado = abifpy.Trace(arquivo) > if trim: > cortado = dado.trim(dado.seq(ambig=True)) > return SeqRecord(cortado, id=arquivo, description='dado cortado') > else: > return dado.seqrecord() > > def abre_ref(arquivo): > """acessa um arquivo contendo uma sequencia de referencia > retorna um objeto SeqRecord""" > with open(arquivo, 'rUb') as dado: > referencia = SeqIO.read(dado, 'genbank') > return referencia > > def salva_fasta(obj_SeqRecord): > """Pega um objeto SeqRecord e cria um fasta com a sua sequencia""" > > SeqIO.write([obj_SeqRecord], obj_SeqRecord + '.fasta','fasta') > > def processar_lote(diretorio, ref): > """abre os arquivos ab1 de uma pasta, apara, salva em fasta, faz o > alinhamento com > o fasta de referencia e salva o alinhamento em um arquivo para analise > posterior.
> diretorio --> uma string representando o caminho da pasta contendo > os arquivos > ref --> uma string representando o caminho absoluto + genbank com a > sequencia de referencia. > """ > > referencia = abre_ref(ref) > referencia.id = 'sequencia de referencia' > salva_fasta(referencia) > ab1files = [x for x in os.listdir(diretorio) if x.endswith('.ab1')] > for file in ab1files: > dado = acessa_ab1(diretorio + file) > salva_fasta(dado) > needle_cline = NeedleCommandline(asequence='referencia.fasta', > bsequence= file + '.fasta', > gapopen=10, gapextend=0.5, > outfile=file + "_aligned.txt") > stdout, stderr = needle_cline() > > > > > #pasta = '/home/mercutio22/Dropbox/My scripts/Fabi/vs/Seq_placa273 > analisada/' > #referencia = '/home/mercutio22/Dropbox/My > scripts/Fabi/vs/Seq_placa273 analisada/BRCA1 (total) - Frag 3450.gb' > > #processar_lote(pasta, referencia) > > dado = acessa_ab1('/home/mercutio22/Dropbox/My > scripts/Fabi/vs/Seq_placa273 analisada/1174411_3450F_A01.ab1') > print type(dado) > salva_fasta(dado) > > > ===============================error msg============== > > Traceback (most recent call last): > File "louise.py", line 59, in > salva_fasta(dado) > File "louise.py", line 26, in salva_fasta > SeqIO.write([obj_SeqRecord], obj_SeqRecord + '.fasta','fasta') > File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 412, in > write > count = writer_class(handle).write_file(sequences) > File "/usr/lib/pymodules/python2.6/Bio/SeqIO/Interfaces.py", line > 271, in write_file > count = self.write_records(records) > File "/usr/lib/pymodules/python2.6/Bio/SeqIO/Interfaces.py", line > 256, in write_records > self.write_record(record) > File "/usr/lib/pymodules/python2.6/Bio/SeqIO/FastaIO.py", line 134, > in write_record > self.handle.write(">%s\n" % title) > AttributeError: 'SeqRecord' object has no attribute 'write' > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > 
http://lists.open-bio.org/mailman/listinfo/biopython > From cmccoy at fhcrc.org Fri Jul 8 22:33:35 2011 From: cmccoy at fhcrc.org (Connor McCoy) Date: Fri, 8 Jul 2011 15:33:35 -0700 Subject: [Biopython] seqmagick Message-ID: Hi all, We wrote (and use) seqmagick, a little tool to conveniently access BioPython's Sequence I/O and manipulation capabilities from the command line. Seqmagick allows one to extract summary information on sequence files, convert between formats based on file extension, modify sequences, and much more. For example: # Convert from fasta to stockholm > seqmagick convert seqfile.fasta seqfile.sto # Reverse complement the first 5 sequences in file > seqmagick convert --reverse-complement --head 5 seqfile.fasta seqfile.revcomp.fasta # Remove all columns containing > 5% gaps, in place > seqmagick mogrify --squeeze-threshold 0.95 seqfile.fasta For more info, see: http://fhcrc.github.com/seqmagick/ Or to install: pip install seqmagick Comments / contributions welcome. Cheers, Connor From mnemonico at posthocergopropterhoc.net Mon Jul 11 06:22:10 2011 From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo) Date: Mon, 11 Jul 2011 03:22:10 -0300 Subject: [Biopython] error writing fasta file using SeqIO In-Reply-To: References: Message-ID: Hi folks. Thanks once again for helping me out. That problem is solved. I took a look at the glob module. It is really just neat! A new problem has arisen when I try to call the process that should run the sequence alignment. I try to use the 'needle' subprocess but it fails with this error: http://paste.pound-python.org/show/9395/. Here is how the code looks now: http://paste.pound-python.org/show/9394/. The weird thing is needle executes ok when I use needle_cline as it is in a bash shell. I took care to write full paths when pointing to individual files and the needle binary but that hasn't solved the error. Any clues?
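A command line that works when typed into bash but fails from Python is often a sign that the whole command was handed to subprocess as a single string without shell=True. The sketch below rests on that assumption about the cause, and it uses the Python interpreter itself as a stand-in for the needle binary so that it runs without EMBOSS installed.

```python
import shlex
import subprocess
import sys

# Stand-in command line; in the thread this would be the needle command.
command_line = "%s -c pass" % shlex.quote(sys.executable)

# Option 1: split the string into an argument list (no shell involved).
status_from_list = subprocess.call(shlex.split(command_line))

# Option 2: keep the single string and let the shell parse it.
status_from_shell = subprocess.call(command_line, shell=True)

# On POSIX systems, a multi-word string without shell=True is treated as
# the name of one program, which usually does not exist.
try:
    subprocess.call(command_line)
    plain_string_failed = False
except OSError:
    plain_string_failed = True

print(status_from_list, status_from_shell, plain_string_failed)
```

The argument-list form is usually the safer of the two, since file names with spaces then need no extra quoting.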
On Fri, Jul 8, 2011 at 1:46 AM, Wibowo Arindrarto wrote: > Hi Hugo, > > I think the problem is you tried to concatenate a SeqRecord object and a > string object. Do this in 'salva_fasta' instead: > > SeqIO.write([obj_SeqRecord], obj_SeqRecord.id + '.fasta', 'fasta') > > And just as an additional input, in the 'processar_lote' method, you can > use this to generate a list of absolute file name paths (import the os and > glob module beforehand). > > files = [os.path.abspath(x) for x in glob.glob('*.ab1')] > > os.path.abspath() returns the absolute file path for a given file, and > glob.glob() returns a list of names that matches the given pattern. > > Hope that helps! > --- > Wibowo Arindrarto (bow) > http://bow.web.id > > > > From gori at cs.ru.nl Mon Jul 11 16:07:59 2011 From: gori at cs.ru.nl (Fabio Gori) Date: Mon, 11 Jul 2011 18:07:59 +0200 Subject: [Biopython] Parsing FASTA records based on headers Message-ID: <201107111808.00013.gori@cs.ru.nl> Hi all, I tried to parse a FASTA file to select the sequences whose headers satisfy a condition. The condition is that the first word of the header belongs to a list named SelectedSequencesId. In the page http://biopython.org/wiki/SeqIO, I found this example, where the condition is that sequence length <300: 1 from Bio import SeqIO 2 3 input_seq_iterator = SeqIO.parse(open("cor6_6.gb", "rU"), "genbank") 4 short_seq_iterator = (record for record in input_seq_iterator \ 5 if len(record.seq) < 300) 6 7 output_handle = open("short_seqs.fasta", "w") 8 SeqIO.write(short_seq_iterator, output_handle, "fasta") 9 output_handle.close() so I tried to substitute line 5 with 5 record.id.split()[0] in SelectedSequencesId) But it did not work. I was able to get what I wanted generating a list with all the records and then parsing it, but I'd like to find a solution that uses a generating expression. Thanks in advance, Fabio -- F. 
Gori, PhD student Intelligent Systems ICIS (Institute for Computing and Information Sciences) Radboud University Nijmegen Home Page: http://www.cs.ru.nl/~gori/ From surykartka at gmail.com Mon Jul 11 17:02:51 2011 From: surykartka at gmail.com (Dorota Matelska) Date: Mon, 11 Jul 2011 19:02:51 +0200 Subject: [Biopython] Parsing FASTA records based on headers In-Reply-To: <201107111808.00013.gori@cs.ru.nl> References: <201107111808.00013.gori@cs.ru.nl> Message-ID: <5EECA7AF-1767-4BFB-9ECC-D8EACFCEED54@gmail.com> Hi Fabio, You forgot to change also the format name of your input file while using SeqIO.parse(). Your input is of fasta format, so instead of "genbank" put there "fasta", and it should work. Hope this will help you :-) Dorota On Jul 11, 2011, at 6:07 PM, Fabio Gori wrote: > Hi all, > > I tried to parse a FASTA file to select the sequences whose headers satisfy a > condition. The condition is that the first word of the header belongs to a list > named SelectedSequencesId. > In the page http://biopython.org/wiki/SeqIO, I found this example, where the > condition is that sequence length <300: > > 1 from Bio import SeqIO > 2 > 3 input_seq_iterator = SeqIO.parse(open("cor6_6.gb", "rU"), "genbank") > 4 short_seq_iterator = (record for record in input_seq_iterator \ > 5 if len(record.seq) < 300) > 6 > 7 output_handle = open("short_seqs.fasta", "w") > 8 SeqIO.write(short_seq_iterator, output_handle, "fasta") > 9 output_handle.close() > > so I tried to substitute line 5 with > 5 record.id.split()[0] in SelectedSequencesId) > > But it did not work. > I was able to get what I wanted generating a list with all the records and > then parsing it, but I'd like to find a solution that uses a generating > expression. > > Thanks in advance, > > Fabio > > -- > > F. 
Gori, PhD student > Intelligent Systems > ICIS (Institute for Computing and Information Sciences) > Radboud University Nijmegen > > Home Page: http://www.cs.ru.nl/~gori/ > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython From w.arindrarto at gmail.com Mon Jul 11 17:02:36 2011 From: w.arindrarto at gmail.com (Wibowo Arindrarto) Date: Mon, 11 Jul 2011 19:02:36 +0200 Subject: [Biopython] error writing fasta file using SeqIO In-Reply-To: References: Message-ID: Hi Hugo, I think you should pass 'alinhar' as the argument for subprocess.call() instead of 'needle_cline'. You can use the 'needle_cline' for the argument, but you should also set shell to true, so the command is subprocess.call(needle_cline, shell=True). Hope that helps. --- Wibowo Arindrarto (bow) http://bow.web.id On Mon, Jul 11, 2011 at 08:22, A M Torres, Hugo < mnemonico at posthocergopropterhoc.net> wrote: > Hi folks. > > Thanks once again for helping me out. That problem is solved. I took a look > at the glob module. It is really just neat! > > A new problem has arised when I try to call the process that should run the > sequence alignment. I try to use the 'needle' subprocess but it fails with > this error: http://paste.pound-python.org/show/9395/. > > Here is how the code looks now: http://paste.pound-python.org/show/9394/. > > The weird thing is needle executes ok when I use needle_cline as it is in a > bash shell. I took care to write full paths when pointing to individual > files and the needle binary but that hasn't solved the error. > > Any clues? > > > On Fri, Jul 8, 2011 at 1:46 AM, Wibowo Arindrarto wrote: > >> Hi Hugo, >> >> I think the problem is you tried to concatenate a SeqRecord object and a >> string object. 
Do this in 'salva_fasta' instead: >> >> SeqIO.write([obj_SeqRecord], obj_SeqRecord.id + '.fasta', 'fasta') >> >> And just as an additional input, in the 'processar_lote' method, you can >> use this to generate a list of absolute file name paths (import the os and >> glob module beforehand). >> >> files = [os.path.abspath(x) for x in glob.glob('*.ab1')] >> >> os.path.abspath() returns the absolute file path for a given file, and >> glob.glob() returns a list of names that matches the given pattern. >> >> Hope that helps! >> --- >> Wibowo Arindrarto (bow) >> http://bow.web.id >> >> >> >> > From devaniranjan at gmail.com Mon Jul 11 19:43:56 2011 From: devaniranjan at gmail.com (George Devaniranjan) Date: Mon, 11 Jul 2011 15:43:56 -0400 Subject: [Biopython] comparision of alignment scores Message-ID: I have several statistical comparison of alignment scores for a list of proteins--generated using biopython with the use of BLOSUM and other matrix generated by me. All matrix methods (inc BLOSUM) correctly identies it's own sequence in a collection of seq (high score set apart from the other scores) but I want to see if my own matrix is better performing than the BLOSUM. i.e --is my matrix more sensitive than BLOSUM. Is there a way using statistics to find this out? I know that this might not be the most appropriate forum to ask this question but since many of you work in this area I thought I will try. Thank you, George From p.j.a.cock at googlemail.com Mon Jul 11 19:51:09 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 11 Jul 2011 20:51:09 +0100 Subject: [Biopython] Parsing FASTA records based on headers In-Reply-To: <201107111808.00013.gori@cs.ru.nl> References: <201107111808.00013.gori@cs.ru.nl> Message-ID: On Mon, Jul 11, 2011 at 5:07 PM, Fabio Gori wrote: > Hi all, > > I tried to parse a FASTA file to select the sequences whose headers > satisfy a condition. 
> > The condition is that the first word of the header belongs to a list > named SelectedSequencesId. > > In the page http://biopython.org/wiki/SeqIO, I found this example, where the > condition is that sequence length <300: > > ... > > so I tried to substitute line 5 with > 5 record.id.split()[0] in SelectedSequencesId) The SeqIO parse uses the first word of the ">" line as the id, so all you need is this: record.id in SelectedSequencesId rather than: len(record.seq) < 300 > But it did not work. In what way? Did you also change the format to "fasta" as Dorota pointed out? Peter From mnemonico at posthocergopropterhoc.net Tue Jul 12 07:59:54 2011 From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo) Date: Tue, 12 Jul 2011 04:59:54 -0300 Subject: [Biopython] error writing fasta file using SeqIO In-Reply-To: References: Message-ID: Yes of course! I should've known. Thanks a bunch, this one was killing me. On Mon, Jul 11, 2011 at 2:02 PM, Wibowo Arindrarto wrote: > Hi Hugo, > > I think you should pass 'alinhar' as the argument for subprocess.call() > instead of 'needle_cline'. You can use the 'needle_cline' for the argument, > but you should also set shell to true, so the command > is subprocess.call(needle_cline, shell=True). > > Hope that helps. > --- > Wibowo Arindrarto (bow) > http://bow.web.id > > > > On Mon, Jul 11, 2011 at 08:22, A M Torres, Hugo < > mnemonico at posthocergopropterhoc.net> wrote: > >> Hi folks. >> >> Thanks once again for helping me out. That problem is solved. I took a >> look at the glob module. It is really just neat! >> >> A new problem has arised when I try to call the process that should run >> the sequence alignment. I try to use the 'needle' subprocess but it fails >> with this error: http://paste.pound-python.org/show/9395/. >> >> Here is how the code looks now: http://paste.pound-python.org/show/9394/. >> >> The weird thing is needle executes ok when I use needle_cline as it is in >> a bash shell. 
I took care to write full paths when pointing to individual >> files and the needle binary but that hasn't solved the error. >> >> Any clues? >> >> >> On Fri, Jul 8, 2011 at 1:46 AM, Wibowo Arindrarto > > wrote: >> >>> Hi Hugo, >>> >>> I think the problem is you tried to concatenate a SeqRecord object and a >>> string object. Do this in 'salva_fasta' instead: >>> >>> SeqIO.write([obj_SeqRecord], obj_SeqRecord.id + '.fasta', 'fasta') >>> >>> And just as an additional input, in the 'processar_lote' method, you can >>> use this to generate a list of absolute file name paths (import the os and >>> glob module beforehand). >>> >>> files = [os.path.abspath(x) for x in glob.glob('*.ab1')] >>> >>> os.path.abspath() returns the absolute file path for a given file, and >>> glob.glob() returns a list of names that matches the given pattern. >>> >>> Hope that helps! >>> --- >>> Wibowo Arindrarto (bow) >>> http://bow.web.id >>> >>> >>> >>> >> > From lawson.jones at gmail.com Wed Jul 13 18:40:18 2011 From: lawson.jones at gmail.com (Daniel Jones) Date: Wed, 13 Jul 2011 14:40:18 -0400 Subject: [Biopython] clustalw align multiple sequences to reference. Message-ID: Hi Biopython users, I have a file with many (~50,000) 200 bp sequences, each of which I would like to align to a fixed reference sequence. I *don't* care about aligning all 50,000 sequences with each other; I only care about aligning each one with the reference sequence. I can't figure out a way to do this without generating 50,000 files, which seems like ridiculous unnecessary overhead. It seems like ClustalW's interface is quite inflexible in demanding separate input and output files for each alignment, but I don't have much experience using it so maybe I'm completely missing something. Incidentally, I'm not wedded to the idea of using ClustalW, so if there's an alternate alignment program that would make this easier, I'd certainly be open to trying it. 
Thanks, Daniel Jones From eric.talevich at gmail.com Wed Jul 13 20:38:42 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 13 Jul 2011 16:38:42 -0400 Subject: [Biopython] clustalw align multiple sequences to reference. In-Reply-To: References: Message-ID: On Wed, Jul 13, 2011 at 2:40 PM, Daniel Jones wrote: > Hi Biopython users, > I have a file with many (~50,000) 200 bp sequences, each of which I would > like to align to a fixed reference sequence. I *don't* care about aligning > all 50,000 sequences with each other; I only care about aligning each one > with the reference sequence. I can't figure out a way to do this without > generating 50,000 files, which seems like ridiculous unnecessary overhead. > It seems like ClustalW's interface is quite inflexible in demanding > separate > input and output files for each alignment, but I don't have much experience > using it so maybe I'm completely missing something. > > Incidentally, I'm not wedded to the idea of using ClustalW, so if there's > an > alternate alignment program that would make this easier, I'd certainly be > open to trying it. > > Are these reads from sequencing? If so, then BWA or Bowtie might be what you want: http://bio-bwa.sourceforge.net/ http://bowtie-bio.sourceforge.net/index.shtml If not, then you could try BLAST with your reference sequence as the query and the short sequences as your database. Cheers, Eric From p.j.a.cock at googlemail.com Wed Jul 13 21:44:55 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 13 Jul 2011 22:44:55 +0100 Subject: [Biopython] clustalw align multiple sequences to reference. In-Reply-To: References: Message-ID: On Wednesday, July 13, 2011, Daniel Jones wrote: > Hi Biopython users, > I have a file with many (~50,000) 200 bp sequences, each of which I would > like to align to a fixed reference sequence. I *don't* care about aligning > all 50,000 sequences with each other; I only care about aligning each one > with the reference sequence. 
I can't figure out a way to do this without > generating 50,000 files, which seems like ridiculous unnecessary overhead. > It seems like ClustalW's interface is quite inflexible in demanding separate > input and output files for each alignment, but I don't have much experience > using it so maybe I'm completely missing something. > > Incidentally, I'm not wedded to the idea of using ClustalW, so if there's an > alternate alignment program that would make this easier, I'd certainly be > open to trying it. > > Thanks, > Daniel Jones You need a pairwise alignment tool. Perhaps needle or water from the EMBOSS suite, or Biopython's pairwise2 module would be suitable (not in the tutorial, read the API docs). However, as Eric suggested, an NGS alignment tool might be more appropriate. Peter From mnemonico at posthocergopropterhoc.net Thu Jul 14 04:19:22 2011 From: mnemonico at posthocergopropterhoc.net (A M Torres, Hugo) Date: Thu, 14 Jul 2011 01:19:22 -0300 Subject: [Biopython] clustalw align multiple sequences to reference. In-Reply-To: References: Message-ID: I am having the same problem. First I was running "needle" on each sequencing data and its reference then I thought of generating a big fasta containing all sequences and then running MUSCLE. Like Daniel Jones I don't mind that sequences are aligned against each other but they should be aligned properly against the reference sequence. On a side note: I have sequenced DNA data which I need to align to a reference sequence and look for mutations. I am confused. Should I use a global alignment algorithm or a local alignment algorithm? I have data for forward and reverse strands, so it could be useful if I had both aligned with the reference in a single file. I don't mean to hijack the thread, but this seems exactly like my problem. Excuse me for any inconvenience. 
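For anyone reading this thread later, here is a rough sketch of the "one pairwise alignment per sequence against the reference" idea, done entirely in memory so that no per-sequence files are needed. It is a plain-Python, score-only Needleman-Wunsch with made-up scoring values (match=1, mismatch=-1, gap=-2); it is not what needle, water or Biopython's pairwise2 do internally, and the function name and example reads are hypothetical.

```python
def global_alignment_score(ref, read, match=1, mismatch=-1, gap=-2):
    """Score-only Needleman-Wunsch with a linear gap penalty (illustrative values)."""
    # At the start of row i, previous[j] is the best score for ref[:i-1] vs read[:j].
    previous = [j * gap for j in range(len(read) + 1)]
    for i in range(1, len(ref) + 1):
        current = [i * gap]
        for j in range(1, len(read) + 1):
            pair = match if ref[i - 1] == read[j - 1] else mismatch
            current.append(max(previous[j - 1] + pair,  # align the two letters
                               previous[j] + gap,       # gap in the read
                               current[j - 1] + gap))   # gap in the reference
        previous = current
    return previous[-1]


# Hypothetical data: one fixed reference and a handful of short reads.
reference = "ACGTACGT"
reads = {"read1": "ACGTACGT", "read2": "ACGTTCGT", "read3": "ACGACGT"}

# Each read is compared only with the reference, never with the other reads.
scores = dict((name, global_alignment_score(reference, seq))
              for name, seq in reads.items())
for name in sorted(scores):
    print("%s\t%d" % (name, scores[name]))
```

For real data a dedicated tool is likely to be much faster; the point here is only that the loop over sequences can live in one script with the reference held in memory.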
Hugo Torres On Wed, Jul 13, 2011 at 6:44 PM, Peter Cock wrote: > On Wednesday, July 13, 2011, Daniel Jones wrote: > > Hi Biopython users, > > I have a file with many (~50,000) 200 bp sequences, each of which I would > > like to align to a fixed reference sequence. I *don't* care about > aligning > > all 50,000 sequences with each other; I only care about aligning each one > > with the reference sequence. I can't figure out a way to do this without > > generating 50,000 files, which seems like ridiculous unnecessary > overhead. > > It seems like ClustalW's interface is quite inflexible in demanding > separate > > input and output files for each alignment, but I don't have much > experience > > using it so maybe I'm completely missing something. > > > > Incidentally, I'm not wedded to the idea of using ClustalW, so if there's > an > > alternate alignment program that would make this easier, I'd certainly be > > open to trying it. > > > > Thanks, > > Daniel Jones > > You need a pairwise alignment tool. Perhaps needle or water from the > EMBOSS suite, or Biopython's pairwise2 module would be suitable (not > in the tutorial, read the API docs). > > However, as Eric suggested, an NGS alignment tool might be more > appropriate. > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From gori at cs.ru.nl Thu Jul 14 11:21:31 2011 From: gori at cs.ru.nl (Fabio Gori) Date: Thu, 14 Jul 2011 13:21:31 +0200 Subject: [Biopython] Parsing FASTA records based on headers In-Reply-To: References: <201107111808.00013.gori@cs.ru.nl> Message-ID: <201107141321.31160.gori@cs.ru.nl> The condition "if record.id in SelectedSequencesId " works fine now, thank you. 
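For the archive, a minimal sketch of the filter discussed in this thread. To keep it runnable without Biopython installed, it uses a small stand-in FASTA reader (read_fasta is a hypothetical helper, not part of Bio.SeqIO) that takes the first word of each ">" line as the identifier, which is how record.id is described above. The in-memory example file and the IDs are invented.

```python
import io


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA-formatted handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            # The identifier is the first whitespace-delimited word of the header.
            words = line[1:].split()
            record_id = words[0] if words else ""
            chunks = []
        elif line and record_id is not None:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


# A set makes the membership test cheap when there are many IDs.
selected_ids = set(["seq1", "seq3"])
example = io.StringIO(u">seq1 first\nACGT\nAC\n>seq2 second\nGGGG\n>seq3 third\nTTTT\n")

# Generator expression: records are filtered one at a time, not loaded into a list first.
selected = ((rid, seq) for rid, seq in read_fasta(example) if rid in selected_ids)
result = list(selected)
print(result)
```

With Biopython itself, the same generator expression would iterate over SeqIO.parse(handle, "fasta") and test record.id in SelectedSequencesId, as Peter describes.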
Fabio On Monday, July 11, 2011 09:51:09 pm Peter Cock wrote: > On Mon, Jul 11, 2011 at 5:07 PM, Fabio Gori wrote: > > Hi all, > > > > I tried to parse a FASTA file to select the sequences whose headers > > satisfy a condition. > > > > The condition is that the first word of the header belongs to a list > > named SelectedSequencesId. > > > > In the page http://biopython.org/wiki/SeqIO, I found this example, where > > the condition is that sequence length <300: > > > > ... > > > > so I tried to substitute line 5 with > > 5 record.id.split()[0] in SelectedSequencesId) > > The SeqIO parse uses the first word of the ">" line as the id, > so all you need is this: record.id in SelectedSequencesId > rather than: len(record.seq) < 300 > > > But it did not work. > > In what way? Did you also change the format to "fasta" > as Dorota pointed out? > > Peter -- F. Gori, PhD student Intelligent Systems ICIS (Institute for Computing and Information Sciences) Radboud University Nijmegen Post Address: Intelligent Systems Postbus 9010 6500 GL Nijmegen The Netherlands Visiting Address: Room HG02.517 Faculty of Science Heyendaalseweg 135 6525 AJ Nijmegen Tel.: +31 (0)24 36 52703 E-mail: gori at cs.ru.nl Home Page: http://www.cs.ru.nl/~gori/ From srivastavaisha.06 at gmail.com Fri Jul 15 07:29:42 2011 From: srivastavaisha.06 at gmail.com (isha srivastava) Date: Fri, 15 Jul 2011 12:59:42 +0530 Subject: [Biopython] Query Message-ID: Hello, I am new user of BioPython. I have Downloaded the latest version of BioPython i.e. BioPython 1.57. fallowed all the instructions according to tutorial. but wen i am running the program : >>> from Bio import SeqIO >>> for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"): ... print seq_record.id ... print repr(seq_record.seq) ... print len(seq_record) ... 
it is showing error as fallows :- Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 424, in parse raise TypeError("Need a file handle, not a string (i.e. not a filename)") TypeError: Need a file handle, not a string (i.e. not a filename) Sir, how to solve this error? Kindly make a soon reply. With due regards, Isha From anaryin at gmail.com Fri Jul 15 07:44:02 2011 From: anaryin at gmail.com (João Rodrigues) Date: Fri, 15 Jul 2011 09:44:02 +0200 Subject: [Biopython] Query In-Reply-To: References: Message-ID: Hello Isha, As the error message says, you should not provide the filename to the function, but instead a file handle. In other words, you need to open the file first and then provide this to the function.
handle = open("ls_orchid.fasta")
for seq_record in SeqIO.parse(handle, "fasta"):
    blablabla
Regards, João [...] Rodrigues http://nmr.chem.uu.nl/~joao On Fri, Jul 15, 2011 at 9:29 AM, isha srivastava < srivastavaisha.06 at gmail.com> wrote: > Hello, > > I am new user of BioPython. > I have Downloaded the latest version of BioPython i.e. BioPython 1.57. > fallowed all the instructions according to tutorial. > but wen i am running the program : > > >>> from Bio import SeqIO > >>> for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"): > ... print seq_record.id > ... print repr(seq_record.seq) > ... print len(seq_record) > ... > > > it is showing error as fallows :- > > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 424, in > parse > raise TypeError("Need a file handle, not a string (i.e. not a > filename)") > TypeError: Need a file handle, not a string (i.e. not a filename) > > Sir, how to solve this error? > Kindly make a soon reply.
> > > With due regards, > Isha > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Fri Jul 15 09:37:03 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 15 Jul 2011 10:37:03 +0100 Subject: [Biopython] Query In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 8:29 AM, isha srivastava wrote: > Hello, > > I am new user of BioPython. > I have Downloaded the latest version of BioPython i.e. BioPython 1.57. > fallowed all the instructions according to tutorial. > but wen i am running the program : > > >>> from Bio import SeqIO > >>> for seq_record in SeqIO.parse("ls_orchid.fasta", "fasta"): > ... print seq_record.id > ... print repr(seq_record.seq) > ... print len(seq_record) > ... > > > it is showing error as fallows :- > > Traceback (most recent call last): > File "<stdin>", line 1, in <module> > File "/usr/lib/pymodules/python2.6/Bio/SeqIO/__init__.py", line 424, in > parse > raise TypeError("Need a file handle, not a string (i.e. not a > filename)") > TypeError: Need a file handle, not a string (i.e. not a filename) > > Sir, how to solve this error? That happens on older versions of Biopython - you can check what is being used within python with:
import Bio
print Bio.__version__
My guess is your install of Biopython 1.57 hasn't worked properly, and an older version is still being used.
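A slightly fuller sketch of the same check, for the archive: besides the version, it also reports which directory the package was loaded from, which can help when two installs are competing. describe_package is a hypothetical helper name; it only relies on the standard __version__ and __file__ module attributes and returns None if the package cannot be imported at all.

```python
import importlib


def describe_package(name):
    """Return (version, location) for an importable package, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    # Not every package defines __version__, so fall back to a placeholder.
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location


info = describe_package("Bio")
if info is None:
    print("Bio cannot be imported by this interpreter")
else:
    print("Bio version %s loaded from %s" % info)
```

Running this once from inside the unpacked Biopython source directory and once from somewhere else (for example the home directory) should show whether Python is picking up different copies in each place.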
Peter From p.j.a.cock at googlemail.com Fri Jul 15 12:48:36 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 15 Jul 2011 13:48:36 +0100 Subject: [Biopython] Query In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 10:37 AM, Peter Cock wrote: >> >> That happens on older versions of Biopython - you can check >> what is being used within python with: >> >> import Bio >> print Bio.__version__ >> >> My guess is your install of Biopython 1.57 hasn't worked properly, >> and an older version is still being used. >> >> Peter >> On Fri, Jul 15, 2011 at 12:56 PM, isha srivastava wrote: > Hi Peter, > > you were right. i checked the version of biopython and it is showing version > 1.53. > how to get rid of this problem? > > hope for your soon reply. > > Thanx so much > isha > Hi again, Please CC the mailing list rather than emailing me directly with things like this. How was Biopython installed? What OS are you using? Peter From srivastavaisha.06 at gmail.com Fri Jul 15 18:03:18 2011 From: srivastavaisha.06 at gmail.com (isha srivastava) Date: Fri, 15 Jul 2011 23:33:18 +0530 Subject: [Biopython] Hi Message-ID: Hello Peter, I am sorry . Further i will not mail on your email directly. I am using UBUNTU. ya I ll reinstall BioPython 1.57 again. Regards, Isha From p.j.a.cock at googlemail.com Sun Jul 17 20:19:02 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sun, 17 Jul 2011 21:19:02 +0100 Subject: [Biopython] Hi In-Reply-To: References: Message-ID: On Fri, Jul 15, 2011 at 7:03 PM, isha srivastava wrote: > Hello Peter, > > I am sorry . Further i will not mail on your email directly. One reason I asked was I am at a conference at the moment, so it was possible someone else might have been able to try to help first. The main reason is the mailing list is public, and people can search it for solutions to this kind of problem in future. > > I am using UBUNTU. ya I ll reinstall BioPython 1.57 again. 
> Was Biopython 1.53 installed using the Ubuntu package system? If so, please use the package manager to uninstall this old version before trying to reinstall Biopython 1.57. Peter From malvikasharan at gmail.com Sun Jul 17 21:53:03 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Sun, 17 Jul 2011 23:53:03 +0200 Subject: [Biopython] PSI-BLAST Message-ID: Hi All, I am trying to code WWW version of PSI-BLAST. My intention is to integrate PSI-Blast in my tool in order to find distant homologs. i some how can not run the commandline version of psi blast (from Bio.Blast.Applications import *NcbipsiblastCommandline*) as it states the error *" **Python* error: ImportError: *cannot* import *name **NcbipsiblastCommandline "* * * *There are not much information available about it. Can somebody figure out why?* * * *Malvika* * * From senthil.debian at gmail.com Sun Jul 17 22:23:50 2011 From: senthil.debian at gmail.com (Senthil Kumar M) Date: Sun, 17 Jul 2011 15:23:50 -0700 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: On Sun, Jul 17, 2011 at 2:53 PM, malvika sharan wrote: > Hi All, > > I am trying to code WWW version of PSI-BLAST. My intention is to integrate > PSI-Blast in my tool in order to find distant homologs. > > i some how can not run the commandline version of psi blast (from > Bio.Blast.Applications import *NcbipsiblastCommandline*) as it states the > error > *" **Python* error: ImportError: *cannot* import *name > **NcbipsiblastCommandline > "* > * > * > *There are not much information available about it. Can somebody figure out > why?* > * > * > *Malvika* > * > * Hi, According to the Biopython tutorial http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc93 (2 April 2011 version): "You can run the standalone verion of PSI-BLAST (the legacy NCBI command line tool blastpgp, or its replacement psiblast) using the wrappers in Bio.Blast.Applications module. 
At the time of writing, the NCBI do not appear to support tools running a PSI-BLAST search via the internet." HTH, Senthil -/ "You know, it's at times like this when I'm trapped in a Vogon airlock with a man from Betelgeuse and about to die of asphyxiation in deep space that I really wish I'd listened to what my mother told me when I was young!" "Why, what did she tell you?" "I don't know, I didn't listen." -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy" From malvikasharan at gmail.com Sun Jul 17 22:58:57 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Mon, 18 Jul 2011 00:58:57 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Thanks Senthil, but that is what i mentioned that i am trying to use wrappers in Bio.Blast.Applications module. It just does not seem to work. I have the whole Biopython installed and i am using Python 2.7. the problem occurs while importing itself, i mentioned the error as well. here is the whole error description: Traceback (most recent call last): File "psiBlast.py", line 6, in from Bio.Blast.Applications import NcbipsiblastCommandline ImportError: cannot importname NcbipsiblastCommandline i just cant figure out why. On Mon, Jul 18, 2011 at 12:23 AM, Senthil Kumar M wrote: > On Sun, Jul 17, 2011 at 2:53 PM, malvika sharan > wrote: > > Hi All, > > > > I am trying to code WWW version of PSI-BLAST. My intention is to > integrate > > PSI-Blast in my tool in order to find distant homologs. > > > > i some how can not run the commandline version of psi blast (from > > Bio.Blast.Applications import *NcbipsiblastCommandline*) as it states the > > error > > *" **Python* error: ImportError: *cannot* import *name > > **NcbipsiblastCommandline > > "* > > * > > * > > *There are not much information available about it. 
Can somebody figure > out > > why?* > > * > > * > > *Malvika* > > * > > * > > Hi, > > According to the Biopython tutorial > http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc93 (2 April > 2011 version): > > "You can run the standalone verion of PSI-BLAST (the legacy NCBI > command line tool blastpgp, or its replacement psiblast) using the > wrappers in Bio.Blast.Applications module. > At the time of writing, the NCBI do not appear to support tools > running a PSI-BLAST search via the internet." > > HTH, > > Senthil > > -/ > "You know, it's at times like this when I'm trapped in a Vogon > airlock with a man from Betelgeuse and about to die of asphyxiation in > deep space that I really wish I'd listened to what my mother told me > when I was young!" > "Why, what did she tell you?" > "I don't know, I didn't listen." > -- Douglas Adams, "The Hitchhiker's Guide to the Galaxy" > From senthil.debian at gmail.com Sun Jul 17 23:20:17 2011 From: senthil.debian at gmail.com (Senthil Kumar M) Date: Sun, 17 Jul 2011 16:20:17 -0700 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan wrote: > Thanks Senthil, but that is what i mentioned that i am trying to use > wrappers in Bio.Blast.Applications module. It just does not seem to work. I > have the whole Biopython installed and i am using Python 2.7. > > the problem occurs while importing itself, i mentioned the error as well. > here is the whole error description: > Traceback (most recent call last): > File "psiBlast.py", line 6, in > from Bio.Blast.Applications import NcbipsiblastCommandline > ImportError: cannot importname NcbipsiblastCommandline > > i just cant figure out why. Hi, Could you please provide more details such as your operating system, python/biopython versions and a minimal example of your script? On my system for example, 'from Bio.Blast.Applications import NcbipsiblastCommandline' does not raise an error.
senthil at deepthought ~ $ python Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) [GCC 4.4.5] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import Bio >>> print Bio.__version__ 1.57 >>> from Bio.Blast.Applications import NcbipsiblastCommandline >>> senthil at deepthought ~ $ uname -srm Linux 2.6.35-22-generic x86_64 HTH Senthil -/ Time is an illusion, lunchtime doubly so. -- The Hitchhiker's Guide to the Galaxy From malvikasharan at gmail.com Sun Jul 17 23:27:33 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Mon, 18 Jul 2011 01:27:33 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. And well as i said that the the error shows at the import. import os, sys from Bio import SeqIO from Bio import Entrez from Bio.Blast import NCBIWWW* *from Bio.Blast.NCBIStandalone import PSIBlastParser *from Bio.Blast.Applications import NcbipsiblastCommandline* from Bio.Blast import NCBIXML its irrelevant with my programming cos the program dies at line 6. On Mon, Jul 18, 2011 at 1:20 AM, Senthil Kumar M wrote: > On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan > wrote: > > Thanks Senthil, but that is what i mentioned that i am trying to use > > wrappers in Bio.Blast.Applications module. It just does not seem to work. > I > > have the whole Biopython installed and i am using Python 2.7. > > > > the problem occurs while importing itself, i mentioned the error as well. > > here is the whole error description: > > Traceback (most recent call last): > > File "psiBlast.py", line 6, in > > from Bio.Blast.Applications import NcbipsiblastCommandline > > ImportError: cannot importname NcbipsiblastCommandline > > > > i just cant figure out why. > > Hi, > > Could you please provide more details such as your operating system, > python/biopython versions and a minimal example of your script? 
> > On my system for example, 'from Bio.Blast.Applications import > NcbipsiblastCommandline' does not raise an error. > > senthil at deepthought ~ $ python > Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) > [GCC 4.4.5] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import Bio > >>> print Bio.__version__ > 1.57 > >>> from Bio.Blast.Applications import NcbipsiblastCommandline > >>> > > senthil at deepthought ~ $ uname -srm > Linux 2.6.35-22-generic x86_64 > > HTH > > Senthil > > -/ > Time is an illusion, lunchtime doubly so. > -- The Hitchhiker's Guide to the Galaxy > From malvikasharan at gmail.com Sun Jul 17 23:48:20 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Mon, 18 Jul 2011 01:48:20 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: please ignore the asterisk * , it appeared due to ctrl b. On Mon, Jul 18, 2011 at 1:27 AM, malvika sharan wrote: > My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. > > And well as i said that the the error shows at the import. > import os, sys > from Bio import SeqIO > from Bio import Entrez > from Bio.Blast import NCBIWWW* > *from Bio.Blast.NCBIStandalone import PSIBlastParser > > *from Bio.Blast.Applications import NcbipsiblastCommandline* > from Bio.Blast import NCBIXML > > its irrelevant with my programming cos the program dies at line 6. > > > On Mon, Jul 18, 2011 at 1:20 AM, Senthil Kumar M > wrote: > >> On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan >> wrote: >> > Thanks Senthil, but that is what i mentioned that i am trying to use >> > wrappers in Bio.Blast.Applications module. It just does not seem to >> work. I >> > have the whole Biopython installed and i am using Python 2.7. >> > >> > the problem occurs while importing itself, i mentioned the error as >> well. 
>> > here is the whole error description: >> > Traceback (most recent call last): >> > File "psiBlast.py", line 6, in >> > from Bio.Blast.Applications import NcbipsiblastCommandline >> > ImportError: cannot importname NcbipsiblastCommandline >> > >> > i just cant figure out why. >> >> Hi, >> >> Could you please provide more details such as your operating system, >> python/biopython versions and a minimal example of your script? >> >> On my system for example, 'from Bio.Blast.Applications import >> NcbipsiblastCommandline' does not raise an error. >> >> senthil at deepthought ~ $ python >> Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) >> [GCC 4.4.5] on linux2 >> Type "help", "copyright", "credits" or "license" for more information. >> >>> import Bio >> >>> print Bio.__version__ >> 1.57 >> >>> from Bio.Blast.Applications import NcbipsiblastCommandline >> >>> >> >> senthil at deepthought ~ $ uname -srm >> Linux 2.6.35-22-generic x86_64 >> >> HTH >> >> Senthil >> >> -/ >> Time is an illusion, lunchtime doubly so. >> -- The Hitchhiker's Guide to the Galaxy >> > > From alrakib at hotmail.com Mon Jul 18 07:54:23 2011 From: alrakib at hotmail.com (L. Zarel) Date: Mon, 18 Jul 2011 09:54:23 +0200 Subject: [Biopython] translating a FASTA file of CDS entries Message-ID: Good morning, I'm Luis Zarel, from Spain. I'm a new user. My question is: How I can make a translation into FASTA from CCDS entries? Any examples? Thank you very much. From p.j.a.cock at googlemail.com Mon Jul 18 09:14:30 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 18 Jul 2011 10:14:30 +0100 Subject: [Biopython] translating a FASTA file of CDS entries In-Reply-To: References: Message-ID: On Monday, July 18, 2011, L. Zarel wrote: > > Good morning, > > I'm Luis Zarel, from Spain. I'm a new user. > > My question is: How I can make a translation into FASTA from CCDS entries? Any examples? > > Thank you very much. > Hi Luis, Could you clarify your question? 
Did you mean CDS (coding sequences), for example from GenBank files? Peter From alrakib at hotmail.com Mon Jul 18 10:54:41 2011 From: alrakib at hotmail.com (L. Zarel) Date: Mon, 18 Jul 2011 12:54:41 +0200 Subject: [Biopython] translating a FASTA file of CDS entries In-Reply-To: References: , Message-ID: Yes, I mean CDS > Hi Luis, > > Could you clarify your question? Did you mean CDS (coding sequences), > for example from GenBank files? > > Peter From p.j.a.cock at googlemail.com Mon Jul 18 12:07:37 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 18 Jul 2011 13:07:37 +0100 Subject: [Biopython] translating a FASTA file of CDS entries In-Reply-To: References: Message-ID: On Monday, July 18, 2011, L. Zarel wrote: > > Yes, I mean CDS > Hi Luis, So you have a FASTA file containing CDS nucleotide sequences, and you want to turn this into a FASTA file containing translated protein sequences? Have you looked at the documentation? http://biopython.org/DIST/docs/tutorial/Tutorial.html http://biopython.org/DIST/docs/tutorial/Tutorial.pdf In particular, in the Cookbook chapter there is a section "Translating a FASTA file of CDS entries". If not, what form do you have your CDS data in? For example, do you have an annotated GenBank file, or a tabular file of co-ordinates and strands, or a GFF3 file, or something else? Peter From srivastavaisha.06 at gmail.com Mon Jul 18 12:48:24 2011 From: srivastavaisha.06 at gmail.com (isha srivastava) Date: Mon, 18 Jul 2011 18:18:24 +0530 Subject: [Biopython] Biopyhton 1.57 version problem Message-ID: Hello Sir / Mam Sir , last days i asked problem about biopython 1.57 version that i have downloaded the latest version that is 1.57 of biopython. when i am giving the print Bio.__version__ command within the BioPython directory this is showing correct version 1.57. But when i am giving this command outside the biopyton directory then this is showing biopython version 1.53. 
I searched about the problem and found that my python 2.6.5 has already some files of biopython such as --- 1) python-biopython 1.53-1 Python library for bioinformatics 2) python-biopython-doc 1.53-1 Documentation for the Biopython library 3) python-biopython-sql 1.53-1 Biopython support for the BioSQL database sc I think the problem is because of these already existing biopython 1.53 files. Am i right? If yes then how to get rid off this proble,? If no then can anyone suggest me whats the problem and solution? Kindly reply me soon. Thanku very much. regards isha From anaryin at gmail.com Mon Jul 18 13:00:53 2011 From: anaryin at gmail.com (João Rodrigues) Date: Mon, 18 Jul 2011 15:00:53 +0200 Subject: [Biopython] Biopyhton 1.57 version problem In-Reply-To: References: Message-ID: Dear Isha, The problem most likely is that you didn't completely remove 1.53, and your PYTHONPATH variable is likely still pointing to it. When you start python from the Biopython directory and import Bio, it will not load system-wide packages but instead the one it has at hand. I'd suggest you:
1) Do sudo aptitude remove python-biopython (or purge instead of remove). At this point, if you do:
cd $HOME
python
>>> import Bio
It should give an ImportError. If not, there is still some version somewhere that you need to remove.
2) Install Biopython 1.57 from the directory you downloaded, using sudo python setup.py install and not giving any --home option during setup. This will ensure it is installed system-wide and accessible from any python call.
Best, João From eric.talevich at gmail.com Mon Jul 18 15:13:54 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Mon, 18 Jul 2011 11:13:54 -0400 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan wrote: > My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. > That's a very old version of Biopython. Are you able to install a more recent version?
And well as i said that the the error shows at the import. > import os, sys > from Bio import SeqIO > from Bio import Entrez > from Bio.Blast import NCBIWWW* > *from Bio.Blast.NCBIStandalone import PSIBlastParser > *from Bio.Blast.Applications import NcbipsiblastCommandline* > from Bio.Blast import NCBIXML > If the earlier imports of SeqIO and Entrez work, then NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I would guess. On Mon, Jul 18, 2011 at 1:20 AM, Senthil Kumar M > wrote: > > > On Sun, Jul 17, 2011 at 3:58 PM, malvika sharan > > > wrote: > > > Thanks Senthil, but that is what i mentioned that i am trying to use > > > wrappers in Bio.Blast.Applications module. It just does not seem to > work. > > I > > > have the whole Biopython installed and i am using Python 2.7. > > > > > > the problem occurs while importing itself, i mentioned the error as > well. > > > here is the whole error description: > > > Traceback (most recent call last): > > > File "psiBlast.py", line 6, in > > > from Bio.Blast.Applications import NcbipsiblastCommandline > > > ImportError: cannot importname NcbipsiblastCommandline > > > > > > i just cant figure out why. > > > > Hi, > > > > Could you please provide more details such as your operating system, > > python/biopython versions and a minimal example of your script? > > > > On my system for example, 'from Bio.Blast.Applications import > > NcbipsiblastCommandline' does not raise an error. > > > > senthil at deepthought ~ $ python > > Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) > > [GCC 4.4.5] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. > > >>> import Bio > > >>> print Bio.__version__ > > 1.57 > > >>> from Bio.Blast.Applications import NcbipsiblastCommandline > > >>> > > > > senthil at deepthought ~ $ uname -srm > > Linux 2.6.35-22-generic x86_64 > > > > HTH > > > > Senthil > > > > -/ > > Time is an illusion, lunchtime doubly so. 
> > -- The Hitchhiker's Guide to the Galaxy > > > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From akooser at unm.edu Mon Jul 18 15:15:39 2011 From: akooser at unm.edu (Ara Kooser) Date: Mon, 18 Jul 2011 09:15:39 -0600 Subject: [Biopython] NCBIWWW genbank files In-Reply-To: References: Message-ID: Good morning all, I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual: from Bio.Blast import NCBIWWW def query(): file_query = raw_input("Please enter the name of your sequence file: ") fasta_seq = open(file_query).read() result_handle = NCBIWWW.qblast("blastn","nr", fasta_seq, expect=1e-30, hitlist_size=20000) save_file = open("blast_results.xml","w") save_file.write(result_handle.read()) save_file.close() result_handle.close() query() Everything works fine. But I was wondering is there a way to pull down the Genbank files using this method. I used the help(NCBIWWW.qblast) to look at all the options but didn't see the Genbank file format. Downstream in the program I use information extracted from both the .xml and genbank files since they contain different information to we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the xml and genbank files. Thanks! Ara From srivastavaisha.06 at gmail.com Tue Jul 19 05:34:43 2011 From: srivastavaisha.06 at gmail.com (isha srivastava) Date: Tue, 19 Jul 2011 11:04:43 +0530 Subject: [Biopython] Biopyhton 1.57 version problem In-Reply-To: References: Message-ID: Hi Thanx for ur answer sir. ya have already tried sudo apt-get remove python-biopython.it was worked properly. even i removes python-biopython-doc and sql too. Now when i m giving the command --> >>>import Bio >>>print Bio.__version__ this is showing error . which is good. but import Bio is still working . 
which is not good, and it means Biopython is not completely removed. That is why I ran the following command to find all Biopython packages on my PC:

$ dpkg --list | grep 'biopython'

This command shows no results, which means there is no Biopython package. If there is no Biopython directory, then why is import Bio still working?

Hoping for your reply soon.

Thanks
isha

From p.j.a.cock at googlemail.com Tue Jul 19 05:54:12 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Jul 2011 06:54:12 +0100
Subject: [Biopython] NCBIWWW genbank files
In-Reply-To: 
References: 
Message-ID: 

On Monday, July 18, 2011, Ara Kooser wrote:
> Good morning all,
>
>
> I am in the process of writing some code for pulling down files from NCBI. I wrote this based on the Biopython manual:
>
> from Bio.Blast import NCBIWWW
>
> def query():
>     file_query = raw_input("Please enter the name of your sequence file: ")
>     fasta_seq = open(file_query).read()
>     result_handle = NCBIWWW.qblast("blastn","nr", fasta_seq, expect=1e-30, hitlist_size=20000)
>     save_file = open("blast_results.xml","w")
>     save_file.write(result_handle.read())
>     save_file.close()
>     result_handle.close()
>
>
> query()
>
> Everything works fine. But I was wondering is there a way to pull down the Genbank files using this method. I used the help(NCBIWWW.qblast) to look at all the options but didn't see the Genbank file format. Downstream in the program I use information extracted from both the .xml and genbank files since they contain different information to we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the xml and genbank files.
>
> Thanks!
> Ara
>

Hi Ara,

BLAST does not offer GenBank as an output format.

Assuming I have understood your aim, this can be done as a multi step process: Run BLAST, extract a list of matching record accessions, download these records in GenBank format from the NCBI.
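
A rough sketch of the last two steps, under some assumptions: the BLAST hits are available as tab-separated lines with the match name in column two, the records live in the Entrez "nucleotide" database, and the helper names, output file name and example identifiers are all made up for illustration. The Bio.Entrez import sits inside the download function so the parsing helper can be tried without Biopython or network access.

```python
def read_hit_ids(lines):
    """Collect the unique match names (column two) from tabular BLAST lines."""
    hit_ids = set()  # a set, so each record is only requested once
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        columns = line.split("\t")
        if len(columns) > 1:
            hit_ids.add(columns[1])
    return hit_ids


def fetch_genbank(hit_ids, email, out_name="hits.gb"):
    """Download the records in GenBank format via NCBI Entrez (needs network)."""
    from Bio import Entrez  # imported here so read_hit_ids works on its own
    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.efetch(db="nucleotide", id=",".join(sorted(hit_ids)),
                           rettype="gb", retmode="text")
    out_file = open(out_name, "w")
    out_file.write(handle.read())
    out_file.close()
    handle.close()


# Made-up example lines; the identifiers are placeholders, not real records.
example = [
    "# Fields: query id, subject id, % identity",
    "query1\tACC0001.1\t99.0",
    "query1\tACC0002.1\t97.5",
    "query2\tACC0001.1\t95.2",
]
hit_ids = read_hit_ids(example)
print(sorted(hit_ids))
```

With real results, something like fetch_genbank(hit_ids, "your.name at example.org") would write the records to hits.gb. For long hit lists it is probably safer to fetch in smaller batches, and if column two holds compound names rather than plain accessions they would need to be split first.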
You may find it useful to request tabular output from BLAST and extract the match names (column two). This should be faster as the XML version of the data is much larger.

Also, to avoid trying to download the same GenBank record more than once, I would use a Python set rather than a Python list object when recording this information from the BLAST file.

You can use the NCBI Entrez utilities API to download GenBank files, see Bio.Entrez in the Biopython tutorial, function efetch.

Peter

From p.j.a.cock at googlemail.com Tue Jul 19 06:03:44 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Jul 2011 07:03:44 +0100
Subject: [Biopython] PSI-BLAST
In-Reply-To: 
References: 
Message-ID: 

Senthil wrote:
>
> Hi,
>
> According to the Biopython tutorial
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc93 (2 April
> 2011 version):
>
> "You can run the standalone version of PSI-BLAST (the legacy NCBI
> command line tool blastpgp, or its replacement psiblast) using the
> wrappers in Bio.Blast.Applications module.
> At the time of writing, the NCBI do not appear to support tools
> running a PSI-BLAST search via the internet."
>
> HTH,
>
> Senthil
>

That might be possible after all, see this information reported recently by Fábio Madeira:

https://github.com/biopython/biopython/pull/12

It should be possible with any recent Biopython, Fábio's change was to the documentation for the blast function only. We'd want to update the tutorial too ideally.

Peter

From p.j.a.cock at googlemail.com Tue Jul 19 06:08:59 2011
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Tue, 19 Jul 2011 07:08:59 +0100
Subject: [Biopython] PSI-BLAST
In-Reply-To: 
References: 
Message-ID: 

On Monday, July 18, 2011, Eric Talevich wrote:
> On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan wrote:
>
>> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50.
>>
>
> That's a very old version of Biopython. Are you able to install a more
> recent version?
>
>
> And well as i said that the error shows at the import.
>> import os, sys
>> from Bio import SeqIO
>> from Bio import Entrez
>> from Bio.Blast import NCBIWWW
>> from Bio.Blast.NCBIStandalone import PSIBlastParser
>> from Bio.Blast.Applications import NcbipsiblastCommandline
>> from Bio.Blast import NCBIXML
>>
>
> If the earlier imports of SeqIO and Entrez work, then
> NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I would
> guess.
>

Correct. The BLAST+ wrappers were added in Biopython 1.53. You will need to update it, ideally to the current release, 1.57.

In general if some imports work and others fail, your library is either too old (and what you want didn't exist in the old version), or too new (the code you want to use was obsolete and removed).

Peter

From anaryin at gmail.com Tue Jul 19 06:48:29 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 19 Jul 2011 08:48:29 +0200
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To: 
References: 
Message-ID: 

What error does print Bio.__version__ give? In which directory are you executing those python commands?

From anaryin at gmail.com Tue Jul 19 09:28:25 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 19 Jul 2011 11:28:25 +0200
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To: 
References: 
Message-ID: 

Hello Isha,

Where did you run the Python interpreter (i.e. in which directory were you when you typed "python" and then "import Bio")?

Did you install the new version of Biopython correctly? Go to that ~/Downloads/python/biopython-1.57/ directory and run sudo python setup.py install (assuming you have super user privileges).

Regards,

João [...]
Rodrigues http://nmr.chem.uu.nl/~joao On Tue, Jul 19, 2011 at 11:24 AM, isha srivastava < srivastavaisha.06 at gmail.com> wrote: > error is --> > > > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'module' object has no attribute '__version__' > > in python directory i made a biopython directory . > > isha at desktop:~/Downloads/python$ ls > > biopython-1.57 numpy-1.6.1rc3 python.mk Tutorial_files > debian_defaults ori_fasta pyversions.py > Tutorial.html > egenix-mx-base-3.2.0 prog reportlab-2.5 > fetch.py pyProgram.py runtime.d > > isha at -desktop:~/Downloads/python$ cd biopython-1.57/ > isha at -desktop:~/Downloads/python/biopython-1.57$ ls > > Bio build DEPRECATED LICENSE NEWS README setup.py > BioSQL CONTRIB Doc MANIFEST.in PKG-INFO Scripts Tests > > From malvikasharan at gmail.com Tue Jul 19 09:49:53 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Tue, 19 Jul 2011 11:49:53 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Thank you Peter and Eric. you are right and i think i should have known this :( I updated Biopython. and revising my codes. it should work now. Malvika On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: > On Monday, July 18, 2011, Eric Talevich wrote: > > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan >wrote: > > > >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. > >> > > > > That's a very old version of Biopython. Are you able to install a more > > recent version? > > > > > > And well as i said that the the error shows at the import. > >> import os, sys > >> from Bio import SeqIO > >> from Bio import Entrez > >> from Bio.Blast import NCBIWWW* > >> *from Bio.Blast.NCBIStandalone import PSIBlastParser > >> *from Bio.Blast.Applications import NcbipsiblastCommandline* > >> from Bio.Blast import NCBIXML > >> > > > > If the earlier imports of SeqIO and Entrez work, then > > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I > would > > guess. 
> >
>
> Correct. The BLAST+ wrappers were added in Biopython 1.53.
> You will need to update it, ideally to the current release, 1.57
>
> In general if some imports work and others fail, your library
> is either too old (and what you want didn't exist in the old
> version), or too new (the code you want to use was obsolete
> and removed).
>
> Peter
>

From anaryin at gmail.com Tue Jul 19 09:54:29 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Tue, 19 Jul 2011 11:54:29 +0200
Subject: [Biopython] Biopyhton 1.57 version problem
In-Reply-To: 
References: 
Message-ID: 

Dear Isha,

If you run the interpreter in the biopython-1.57 directory, it will load the modules from that folder, even if Biopython is not installed in the system.

You have to do "sudo python setup.py install" in the biopython-1.57 directory, wait for the installation to complete, and then move to another directory, for example your home directory or the Desktop, and there type "python" and try "import Bio".

You don't have to remove anything.

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Tue, Jul 19, 2011 at 11:40 AM, isha srivastava < srivastavaisha.06 at gmail.com> wrote:

> Hello Sir
>
> I run in the terminal.
> I typed that in Python directory.
> Now i m removing the Biopyhton -1.57 and ll again download and install
> that.
>
> Regards
> isha
>

From akooser at unm.edu Tue Jul 19 15:07:23 2011
From: akooser at unm.edu (Ara Kooser)
Date: Tue, 19 Jul 2011 09:07:23 -0600
Subject: [Biopython] NCBIWWW genbank files
In-Reply-To: 
References: 
Message-ID: 

Peter,

Thanks for the clarification there. I was a little confused. I'll give this a try.

Regards,
Ara

On Jul 18, 2011, at 11:54 PM, Peter Cock wrote:

> On Monday, July 18, 2011, Ara Kooser wrote:
>> Good morning all,
>>
>>
>> I am in the process of writing some code for pulling down files from NCBI.
I wrote this based on the Biopython manual: >> >> from Bio.Blast import NCBIWWW >> >> def query(): >> file_query = raw_input("Please enter the name of your sequence file: ") >> fasta_seq = open(file_query).read() >> result_handle = NCBIWWW.qblast("blastn","nr", fasta_seq, expect=1e-30, hitlist_size=20000) >> save_file = open("blast_results.xml","w") >> save_file.write(result_handle.read()) >> save_file.close() >> result_handle.close() >> >> >> query() >> >> Everything works fine. But I was wondering is there a way to pull down the Genbank files using this method. I used the help(NCBIWWW.qblast) to look at all the options but didn't see the Genbank file format. Downstream in the program I use information extracted from both the .xml and genbank files since they contain different information to we need. I was hoping to combine everything into one program. Currently we use the web interface to pull down the xml and genbank files. >> >> Thanks! >> Ara >> > > Hi Ara, > > BLAST does not offer GenBank as an output format. > > Assuming I have understood your aim, this can be done as a multi step > process: Run BLAST, extract a list of matching record accessions, > download these records in GenBank format from the NCBI. > > You may find it useful to request tabular output from BLAST and > extract the match names (column two). This should be faster as the XML > version of the data is much larger. > > Also to avoid trying to download the same GenBank record more than > once, I would use a Python set rather than a Python list object when > recording this information from the BLAST file. > > You can use the NCBI Entrez utilities API to download GenBank files, > see Bio.Entrez in the Biopython tutorial, function efetch. 
> > Peter From mollymutant at googlemail.com Wed Jul 20 13:26:33 2011 From: mollymutant at googlemail.com (molly mutant) Date: Wed, 20 Jul 2011 15:26:33 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: hello all, I was also trying to run my program using the same wrapper from Bio.Blast.Application for psiblast commandline. i use the following code but this is not generating XML file. psi_cline = NcbipsiblastCommandline('psiblast', db = 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") p = subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) blastParser(p.stdout) i have defined blastParser for parsing XML files which works perfectly with other xml files. i get the following error : Traceback (most recent call last): File "psiBlast.py", line 110, in blastParser(p.stdout) File "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", line 617, in parse raise ValueError("Your XML file was empty") ValueError: Your XML file was empty you can see that i am using python 2.6 and Biopython 1.57. Do you know where am i going incorrect? Molly On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan wrote: > > > ---------- Forwarded message ---------- > > > Thank you Peter and Eric. > > you are right and i think i should have known this :( > I updated Biopython. and revising my codes. it should work now. > > Malvika > > > On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: > >> On Monday, July 18, 2011, Eric Talevich wrote: >> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >> malvikasharan at gmail.com>wrote: >> > >> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >> >> >> > >> > That's a very old version of Biopython. Are you able to install a more >> > recent version? >> > >> > >> > And well as i said that the the error shows at the import. 
>> >> import os, sys >> >> from Bio import SeqIO >> >> from Bio import Entrez >> >> from Bio.Blast import NCBIWWW* >> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser >> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* >> >> from Bio.Blast import NCBIXML >> >> >> > >> > If the earlier imports of SeqIO and Entrez work, then >> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I >> would >> > guess. >> > >> >> Correct. The BLAST+ wrappers were added in Biopython 1.53. >> You will need to update it, ideally to the current release, 1.57 >> >> In general if some imports work and others fail, your library >> Is either too old (and what you want didn't exist in the old >> version), or too new (the code you want to use was obsolete >> and removed). >> >> Peter >> > > > From mollymutant at googlemail.com Wed Jul 20 13:33:04 2011 From: mollymutant at googlemail.com (molly mutant) Date: Wed, 20 Jul 2011 15:33:04 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Oh, i forgot to mention : queryID here is a protein ID for example NP_010247.1 ' query = queryID+".fasta" ' is a fasta file for this protein. i want to get the XML output from the psi blast. Regards Molly On Wed, Jul 20, 2011 at 3:26 PM, molly mutant wrote: > hello all, > > I was also trying to run my program using the same wrapper from > Bio.Blast.Application for psiblast commandline. > > i use the following code but this is not generating XML file. > psi_cline = NcbipsiblastCommandline('psiblast', db = > 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = > queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") > p = > subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) > blastParser(p.stdout) > > i have defined blastParser for parsing XML files which works perfectly with > other xml files. 
> > i get the following error : > > Traceback (most recent call last): > File "psiBlast.py", line 110, in > blastParser(p.stdout) > > File > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", > line 617, in parse > raise ValueError("Your XML file was empty") > ValueError: Your XML file was empty > > > you can see that i am using python 2.6 and Biopython 1.57. Do you know > where am i going incorrect? > > Molly > > On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan wrote: > > >> >> ---------- Forwarded message ---------- >> >> >> Thank you Peter and Eric. >> >> you are right and i think i should have known this :( >> I updated Biopython. and revising my codes. it should work now. >> >> Malvika >> >> >> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: >> >>> On Monday, July 18, 2011, Eric Talevich wrote: >>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >>> malvikasharan at gmail.com>wrote: >>> > >>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >>> >> >>> > >>> > That's a very old version of Biopython. Are you able to install a more >>> > recent version? >>> > >>> > >>> > And well as i said that the the error shows at the import. >>> >> import os, sys >>> >> from Bio import SeqIO >>> >> from Bio import Entrez >>> >> from Bio.Blast import NCBIWWW* >>> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser >>> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* >>> >> from Bio.Blast import NCBIXML >>> >> >>> > >>> > If the earlier imports of SeqIO and Entrez work, then >>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I >>> would >>> > guess. >>> > >>> >>> Correct. The BLAST+ wrappers were added in Biopython 1.53. 
>>> You will need to update it, ideally to the current release, 1.57 >>> >>> In general if some imports work and others fail, your library >>> Is either too old (and what you want didn't exist in the old >>> version), or too new (the code you want to use was obsolete >>> and removed). >>> >>> Peter >>> >> >> >> > From mollymutant at googlemail.com Wed Jul 20 15:30:13 2011 From: mollymutant at googlemail.com (molly mutant) Date: Wed, 20 Jul 2011 17:30:13 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Can anyone please share the functioning codes for PSI-BLAST using NCBI commandline or suggest me the source from where i can get it?? i am in an urgent need of it :( and i can not find the problem with my command/code. Regards, Molly On Wed, Jul 20, 2011 at 4:19 PM, molly mutant wrote: > if it use cline() command: > > psi_cline = NcbipsiblastCommandline('psiblast', db = > 'refseq_protein',\ > query = queryID+".fasta", > evalue = 10 , \ > out = queryID+"_psi.xml", > outfmt = 7, \ > out_pssm = queryID+"_pssm") > str(psi_cline) > psi_cline() > > the following error occurs : > Traceback (most recent call last): > File "psiBlast.py", line 113, in > psi_cline() > File > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/__init__.py", > line 432, in __call__ > stdout_str, stderr_str) > Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml > -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm > NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not > found' > > I think this error stands for that the command is not found, which means > that my command is incorrect, am i right?? > > > > On Wed, Jul 20, 2011 at 3:33 PM, molly mutant wrote: > >> Oh, i forgot to mention : >> >> queryID here is a protein ID for example NP_010247.1 >> ' query = queryID+".fasta" ' is a fasta file for this protein. >> i want to get the XML output from the psi blast. 
>> >> Regards >> Molly >> >> >> On Wed, Jul 20, 2011 at 3:26 PM, molly mutant > > wrote: >> >>> hello all, >>> >>> I was also trying to run my program using the same wrapper from >>> Bio.Blast.Application for psiblast commandline. >>> >>> i use the following code but this is not generating XML file. >>> psi_cline = NcbipsiblastCommandline('psiblast', db = >>> 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = >>> queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") >>> p = >>> subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) >>> blastParser(p.stdout) >>> >>> i have defined blastParser for parsing XML files which works perfectly >>> with other xml files. >>> >>> i get the following error : >>> >>> Traceback (most recent call last): >>> File "psiBlast.py", line 110, in >>> blastParser(p.stdout) >>> >>> File >>> "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", >>> line 617, in parse >>> raise ValueError("Your XML file was empty") >>> ValueError: Your XML file was empty >>> >>> >>> you can see that i am using python 2.6 and Biopython 1.57. Do you know >>> where am i going incorrect? >>> >>> Molly >>> >>> On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan >> > wrote: >>> >>> >>>> >>>> ---------- Forwarded message ---------- >>>> >>>> >>>> Thank you Peter and Eric. >>>> >>>> you are right and i think i should have known this :( >>>> I updated Biopython. and revising my codes. it should work now. >>>> >>>> Malvika >>>> >>>> >>>> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: >>>> >>>>> On Monday, July 18, 2011, Eric Talevich >>>>> wrote: >>>>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >>>>> malvikasharan at gmail.com>wrote: >>>>> > >>>>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >>>>> >> >>>>> > >>>>> > That's a very old version of Biopython. Are you able to install a >>>>> more >>>>> > recent version? 
>>>>> > >>>>> > >>>>> > And well as i said that the the error shows at the import. >>>>> >> import os, sys >>>>> >> from Bio import SeqIO >>>>> >> from Bio import Entrez >>>>> >> from Bio.Blast import NCBIWWW* >>>>> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser >>>>> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* >>>>> >> from Bio.Blast import NCBIXML >>>>> >> >>>>> > >>>>> > If the earlier imports of SeqIO and Entrez work, then >>>>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I >>>>> would >>>>> > guess. >>>>> > >>>>> >>>>> Correct. The BLAST+ wrappers were added in Biopython 1.53. >>>>> You will need to update it, ideally to the current release, 1.57 >>>>> >>>>> In general if some imports work and others fail, your library >>>>> Is either too old (and what you want didn't exist in the old >>>>> version), or too new (the code you want to use was obsolete >>>>> and removed). >>>>> >>>>> Peter >>>>> >>>> From from.d.putto at gmail.com Wed Jul 20 16:22:37 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Wed, 20 Jul 2011 18:22:37 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: This method do not work with Bio.Emboss.Applications!!! I am trying to do the same with 'Bio.Emboss.Applications' as seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object water_cline = WaterCommandline(asequence="-", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline(stdin=seq_record.format("fasta")) but it is displaying error. How can I specify file handle or sequence only in Emboss Applications??? On Thu, Jul 7, 2011 at 12:43 PM, Peter Cock wrote: > On Thu, Jul 7, 2011 at 11:26 AM, Sheila the angel > wrote: > > Hi All, > > > > I want to download genbank file from NCBI and pass the protein sequence > > directly to the local BLAST. 
But I am getting error in BLAST step > > > #------------------------------------------------------------------------------------------- > > from Bio import SeqIO > > from Bio import Entrez > > from Bio.Blast.Applications import NcbiblastpCommandline > > id='200203' > > handle = Entrez.efetch(db="protein", id=id, rettype="gp") > > seq_record = SeqIO.read(handle, "gb") > > x=seq_record.seq #getting the > > sequence in a variable x > > blastp_cline = NcbiblastpCommandline(query=x, db="protein_database", > > evalue=0.001) # My BLAST command > > result_handle, stderr = blastp_cline() #Running BLAST > and > > getting error :( > > > > > #------------------------------------------------------------------------------------------- > > > > At this last step I am getting error..... > > I sort-of understand the problem.....it is taking value of x as a file > name > > while its a variable which contains the sequence. > > Is there any way out to this problem without making temporary file. > > With the standalone blast tools you generally need to prepare an input > FASTA file with your query sequence(s). > > However, in principle you can give the input filename as - (default), > and instead pipe the query FASTA record in as stdin (standard input). > Try something like this (untested): > > ... 
> blastp_cline = NcbiblastpCommandline(query="-", db="protein_database", > evalue=0.001) > stdout, stderr = blastp_cline(stdin=seq_record.format("fasta")) > > Peter > From mollymutant at googlemail.com Wed Jul 20 14:19:35 2011 From: mollymutant at googlemail.com (molly mutant) Date: Wed, 20 Jul 2011 16:19:35 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: if it use cline() command: psi_cline = NcbipsiblastCommandline('psiblast', db = 'refseq_protein',\ query = queryID+".fasta", evalue = 10 , \ out = queryID+"_psi.xml", outfmt = 7, \ out_pssm = queryID+"_pssm") str(psi_cline) psi_cline() the following error occurs : Traceback (most recent call last): File "psiBlast.py", line 113, in psi_cline() File "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/__init__.py", line 432, in __call__ stdout_str, stderr_str) Bio.Application.ApplicationError: Command 'psiblast -out NP_012649_psi.xml -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: not found' I think this error stands for that the command is not found, which means that my command is incorrect, am i right?? On Wed, Jul 20, 2011 at 3:33 PM, molly mutant wrote: > Oh, i forgot to mention : > > queryID here is a protein ID for example NP_010247.1 > ' query = queryID+".fasta" ' is a fasta file for this protein. > i want to get the XML output from the psi blast. > > Regards > Molly > > > On Wed, Jul 20, 2011 at 3:26 PM, molly mutant wrote: > >> hello all, >> >> I was also trying to run my program using the same wrapper from >> Bio.Blast.Application for psiblast commandline. >> >> i use the following code but this is not generating XML file. 
>> psi_cline = NcbipsiblastCommandline('psiblast', db = >> 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = >> queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") >> p = >> subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) >> blastParser(p.stdout) >> >> i have defined blastParser for parsing XML files which works perfectly >> with other xml files. >> >> i get the following error : >> >> Traceback (most recent call last): >> File "psiBlast.py", line 110, in >> blastParser(p.stdout) >> >> File >> "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", >> line 617, in parse >> raise ValueError("Your XML file was empty") >> ValueError: Your XML file was empty >> >> >> you can see that i am using python 2.6 and Biopython 1.57. Do you know >> where am i going incorrect? >> >> Molly >> >> On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan wrote: >> >> >>> >>> ---------- Forwarded message ---------- >>> >>> >>> Thank you Peter and Eric. >>> >>> you are right and i think i should have known this :( >>> I updated Biopython. and revising my codes. it should work now. >>> >>> Malvika >>> >>> >>> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock wrote: >>> >>>> On Monday, July 18, 2011, Eric Talevich >>>> wrote: >>>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < >>>> malvikasharan at gmail.com>wrote: >>>> > >>>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. >>>> >> >>>> > >>>> > That's a very old version of Biopython. Are you able to install a more >>>> > recent version? >>>> > >>>> > >>>> > And well as i said that the the error shows at the import. 
>>>> >> import os, sys
>>>> >> from Bio import SeqIO
>>>> >> from Bio import Entrez
>>>> >> from Bio.Blast import NCBIWWW
>>>> >> from Bio.Blast.NCBIStandalone import PSIBlastParser
>>>> >> from Bio.Blast.Applications import NcbipsiblastCommandline
>>>> >> from Bio.Blast import NCBIXML
>>>> >>
>>>> >
>>>> > If the earlier imports of SeqIO and Entrez work, then
>>>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, I
>>>> would
>>>> > guess.
>>>> >
>>>>
>>>> Correct. The BLAST+ wrappers were added in Biopython 1.53.
>>>> You will need to update it, ideally to the current release, 1.57
>>>>
>>>> In general if some imports work and others fail, your library
>>>> is either too old (and what you want didn't exist in the old
>>>> version), or too new (the code you want to use was obsolete
>>>> and removed).
>>>>
>>>> Peter
>>>>
>>>
>>>
>>>
>>
>
>
>
-- 
Regards,
Molly

From anaryin at gmail.com Wed Jul 20 16:48:14 2011
From: anaryin at gmail.com (=?UTF-8?Q?Jo=C3=A3o_Rodrigues?=)
Date: Wed, 20 Jul 2011 18:48:14 +0200
Subject: [Biopython] PSI-BLAST
In-Reply-To: 
References: 
Message-ID: 

Dear Molly,

I never worked with that module so this might be wrong. But from my experience, that error is common when you have aliased the executable, which doesn't work. Try adding the directory where the executable 'psiblast' is (usually something ending in /bin) to your PATH variable:

export PATH="${PATH}:/my/blast/directory/bin/"

Troubleshooting and debugging are part of coding, so I'd recommend you spend half an hour on this and I'm sure you'll get through it.

Best,

João [...] Rodrigues
http://nmr.chem.uu.nl/~joao

On Wed, Jul 20, 2011 at 5:30 PM, molly mutant wrote:

> Can anyone please share the functioning codes for PSI-BLAST using NCBI
> commandline or suggest me the source from where i can get it?? i am in an
> urgent need of it :( and i can not find the problem with my command/code.
> > Regards, > Molly > > On Wed, Jul 20, 2011 at 4:19 PM, molly mutant >wrote: > > > if it use cline() command: > > > > psi_cline = NcbipsiblastCommandline('psiblast', db = > > 'refseq_protein',\ > > query = queryID+".fasta", > > evalue = 10 , \ > > out = queryID+"_psi.xml", > > outfmt = 7, \ > > out_pssm = queryID+"_pssm") > > str(psi_cline) > > psi_cline() > > > > the following error occurs : > > Traceback (most recent call last): > > File "psiBlast.py", line 113, in > > psi_cline() > > File > > > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Application/__init__.py", > > line 432, in __call__ > > stdout_str, stderr_str) > > Bio.Application.ApplicationError: Command 'psiblast -out > NP_012649_psi.xml > > -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm > > NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: > not > > found' > > > > I think this error stands for that the command is not found, which means > > that my command is incorrect, am i right?? > > > > > > > > On Wed, Jul 20, 2011 at 3:33 PM, molly mutant < > mollymutant at googlemail.com>wrote: > > > >> Oh, i forgot to mention : > >> > >> queryID here is a protein ID for example NP_010247.1 > >> ' query = queryID+".fasta" ' is a fasta file for this protein. > >> i want to get the XML output from the psi blast. > >> > >> Regards > >> Molly > >> > >> > >> On Wed, Jul 20, 2011 at 3:26 PM, molly mutant < > mollymutant at googlemail.com > >> > wrote: > >> > >>> hello all, > >>> > >>> I was also trying to run my program using the same wrapper from > >>> Bio.Blast.Application for psiblast commandline. > >>> > >>> i use the following code but this is not generating XML file. 
> >>> psi_cline = NcbipsiblastCommandline('psiblast', db = > >>> 'refseq_protein', query = queryID+".fasta", evalue = 10 , out = > >>> queryID+"_psi.xml", outfmt = 7, out_pssm = queryID+"_pssm") > >>> p = > >>> > subprocess.Popen(str(psi_cline),stdout=subprocess.PIPE,stderr=subprocess.PIPE,shell=(sys.platform!="win32")) > >>> blastParser(p.stdout) > >>> > >>> i have defined blastParser for parsing XML files which works perfectly > >>> with other xml files. > >>> > >>> i get the following error : > >>> > >>> Traceback (most recent call last): > >>> File "psiBlast.py", line 110, in > >>> blastParser(p.stdout) > >>> > >>> File > >>> > "/usr/local/lib/python2.6/dist-packages/biopython-1.57-py2.6-linux-x86_64.egg/Bio/Blast/NCBIXML.py", > >>> line 617, in parse > >>> raise ValueError("Your XML file was empty") > >>> ValueError: Your XML file was empty > >>> > >>> > >>> you can see that i am using python 2.6 and Biopython 1.57. Do you know > >>> where am i going incorrect? > >>> > >>> Molly > >>> > >>> On Wed, Jul 20, 2011 at 3:07 PM, malvika sharan < > malvikasharan at gmail.com > >>> > wrote: > >>> > >>> > >>>> > >>>> ---------- Forwarded message ---------- > >>>> > >>>> > >>>> Thank you Peter and Eric. > >>>> > >>>> you are right and i think i should have known this :( > >>>> I updated Biopython. and revising my codes. it should work now. > >>>> > >>>> Malvika > >>>> > >>>> > >>>> On Tue, Jul 19, 2011 at 8:08 AM, Peter Cock < > p.j.a.cock at googlemail.com>wrote: > >>>> > >>>>> On Monday, July 18, 2011, Eric Talevich > >>>>> wrote: > >>>>> > On Sun, Jul 17, 2011 at 7:27 PM, malvika sharan < > >>>>> malvikasharan at gmail.com>wrote: > >>>>> > > >>>>> >> My OS is mac, Python is 2.7 as mentioned,Biopython 1.50. > >>>>> >> > >>>>> > > >>>>> > That's a very old version of Biopython. Are you able to install a > >>>>> more > >>>>> > recent version? > >>>>> > > >>>>> > > >>>>> > And well as i said that the the error shows at the import. 
> >>>>> >> import os, sys > >>>>> >> from Bio import SeqIO > >>>>> >> from Bio import Entrez > >>>>> >> from Bio.Blast import NCBIWWW* > >>>>> >> *from Bio.Blast.NCBIStandalone import PSIBlastParser > >>>>> >> *from Bio.Blast.Applications import NcbipsiblastCommandline* > >>>>> >> from Bio.Blast import NCBIXML > >>>>> >> > >>>>> > > >>>>> > If the earlier imports of SeqIO and Entrez work, then > >>>>> > NcbipsiblastCommandline probably wasn't included in Biopython 1.50, > I > >>>>> would > >>>>> > guess. > >>>>> > > >>>>> > >>>>> Correct. The BLAST+ wrappers were added in Biopython 1.53. > >>>>> You will need to update it, ideally to the current release, 1.57 > >>>>> > >>>>> In general if some imports work and others fail, your library > >>>>> Is either too old (and what you want didn't exist in the old > >>>>> version), or too new (the code you want to use was obsolete > >>>>> and removed). > >>>>> > >>>>> Peter > >>>>> > >>>> > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From eric.talevich at gmail.com Wed Jul 20 16:54:18 2011 From: eric.talevich at gmail.com (Eric Talevich) Date: Wed, 20 Jul 2011 12:54:18 -0400 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: On Wed, Jul 20, 2011 at 11:30 AM, molly mutant wrote: > Can anyone please share the functioning codes for PSI-BLAST using NCBI > commandline or suggest me the source from where i can get it?? i am in an > urgent need of it :( and i can not find the problem with my command/code. 
> > Regards, > Molly > > On Wed, Jul 20, 2011 at 4:19 PM, molly mutant >wrote: > > the following error occurs : > > Bio.Application.ApplicationError: Command 'psiblast -out > NP_012649_psi.xml > > -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm > > NP_012649_pssm' returned non-zero exit status 127, '/bin/sh: psiblast: > not > > found' > > > > I think this error stands for that the command is not found, which means > > that my command is incorrect, am i right?? > To follow up on what João wrote, the important error is: '/bin/sh: psiblast: not found' This means that the shell cannot find psiblast: either it is not installed, or it is not on your PATH. If you try running just that command on the command line: psiblast -out NP_012649_psi.xml -outfmt 7 -query NP_012649.fasta -db refseq_protein -evalue 10 -out_pssm NP_012649_pssm it will also report an error, even without using Python. So, try making sure psiblast is available on your system path ($PATH). The command "which psiblast" should print where the executable is, if it can be found. Or, if you don't want to mess with $PATH, you can include the complete path to the psiblast executable: psi_cline = NcbipsiblastCommandline('/usr/local/bin/psiblast', db = ...
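As a rough sketch of the same PATH check done from inside Python: the helper name find_executable below is made up for illustration, and it assumes Python 3.3 or later (shutil.which does not exist in the Python 2.6 used earlier in this thread). It only looks the program up; it does not run BLAST.

```python
import shutil


def find_executable(name, fallback=None):
    """Return the full path to `name` if it is on PATH, else `fallback`.

    Hypothetical helper for illustration; not part of Biopython.
    """
    path = shutil.which(name)
    return path if path is not None else fallback


if __name__ == "__main__":
    exe = find_executable("psiblast")
    if exe is None:
        # Same situation as the '/bin/sh: psiblast: not found' error above.
        print("psiblast not found on PATH; install BLAST+ or use the full path")
    else:
        print("psiblast found at", exe)
```

If the lookup returns a path, that string could be passed as the first argument to the command line wrapper instead of the bare name 'psiblast'.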
Cheers, Eric From p.j.a.cock at googlemail.com Wed Jul 20 17:27:00 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Wed, 20 Jul 2011 18:27:00 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Wednesday, July 20, 2011, Sheila the angel wrote: > This method do not work with Bio.Emboss.Applications!!! I am trying to do the same with 'Bio.Emboss.Applications' as > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object > water_cline = WaterCommandline(asequence="-", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > but it is displaying error. How can I specify file handle or sequence only in Emboss Applications??? > What is the error message? I think you need asequence="stdin" not "-" (although the latter is a widely used convention in command line tools). See: http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#stdin Also try including auto=True in the command line; this tells EMBOSS not to ask for user input (by default it will prompt the user for any missing arguments; with the auto setting it uses its defaults). Peter From anaryin at gmail.com Wed Jul 20 16:58:37 2011 From: anaryin at gmail.com (João Rodrigues) Date: Wed, 20 Jul 2011 18:58:37 +0200 Subject: [Biopython] PSI-BLAST In-Reply-To: References: Message-ID: Adding to what Eric said, sometimes (depending on your config) *which* won't tell you where the command is aliased to. Instead, try *locate*.
From from.d.putto at gmail.com Thu Jul 21 09:14:06 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Thu, 21 Jul 2011 11:14:06 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: yes replacing '-' by 'stdin' works :) seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline(stdin=seq_record.format("fasta")) but I tried to replace bsequence also by 'stdin' and it shows error #---------------------------------------------------------------------------------------------------------------------- seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object seq_record2 = SeqIO.read(open("beta.fasta"), "fasta") water_cline = WaterCommandline(asequence="stdin", bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt", auto=True) stdout, stderr =water_cline(stdin=seq_record.format("fasta"), stdin=seq_record2.format("fasta")) # File "", line 1 # wrote: > On Wednesday, July 20, 2011, Sheila the angel > wrote: > > This method do not work with Bio.Emboss.Applications!!!I am trying to do > the same with 'Bio.Emboss.Applications' as > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > > water_cline = WaterCommandline(asequence="-", bsequence="beta.fasta", > gapopen=10, gapextend=0.5, outfile="water.txt") > > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > but it is displaying error.How can I specify file handle or sequence only > in Emboss Applications??? > > > > What is the error message? > > I think you need asequence="stdin" not "-" (although the later is a > widely used convention in command line tools). 
See: > > http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html#stdin > > Also try including auto=True in the command line, this tells EMBOSS > not to try and ask for user input (by default it will try and prompt > the user for any missing arguments, with the auto setting it uses it's > defaults). > > Peter > From p.j.a.cock at googlemail.com Thu Jul 21 09:40:36 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 10:40:36 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 10:14 AM, Sheila the angel wrote: > yes replacing '-' by 'stdin' works :) > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object > water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta", > gapopen=10, gapextend=0.5, outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) Good - thanks for letting us know. I can probably make the tutorial text a little clearer here. > but I tried to replace bsequence also by 'stdin' and it shows error > #---------------------------------------------------------------------------------------------------------------------- > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object > seq_record2 = SeqIO.read(open("beta.fasta"), "fasta") > water_cline = WaterCommandline(asequence="stdin", bsequence="stdin", > gapopen=10, gapextend=0.5, outfile="water.txt", auto=True) That won't work - there is only one stdin pipe in a Unix style command line environment. You have to use either two input files, or stdin and one file (or one file and stdin). > stdout, stderr =water_cline(stdin=seq_record.format("fasta"), > stdin=seq_record2.format("fasta")) > # File "", line 1 # #SyntaxError: keyword argument repeated That is a Python syntax error message because you tried to use the stdin argument twice. A keyword argument can only be given once in a function call. I hope that makes sense.
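To illustrate the syntax error on its own, here is a small sketch that needs neither EMBOSS nor Biopython. The name water_cline is only text inside a string here and is never called, and the compiles helper is made up for this example. Python rejects a repeated keyword argument when the code is compiled, before anything runs.

```python
def compiles(source):
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        # compile() only parses the text; nothing in it is executed.
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


# Passing stdin twice is rejected before the call could ever happen.
print(compiles('water_cline(stdin="first", stdin="second")'))
# A single stdin keyword is fine as far as the syntax goes.
print(compiles('water_cline(stdin="first")'))
```

So the failure has nothing to do with EMBOSS itself; the call never reaches the wrapper.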
Peter From from.d.putto at gmail.com Thu Jul 21 09:48:28 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Thu, 21 Jul 2011 11:48:28 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: Thanks Peter, Yes I understand that I can't use 'stdin' twice so this is not going to work >>> water_cline = WaterCommandline(asequence="stdin", bsequence="stdin",gapopen=10, gapextend=0.5, outfile="water.txt") >>> stdout, stderr =water_cline(stdin=seq_record.format("fasta"), stdin=seq_record2.format("fasta")) >You have to use either two input files, or stdin and one file, (or one >file and stdin). But how can I specify two input files, or stdin On Thu, Jul 21, 2011 at 11:40 AM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 10:14 AM, Sheila the angel > wrote: > > yes replacing '-' by 'stdin' works :) > > > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > > water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta", > > gapopen=10, gapextend=0.5, outfile="water.txt") > > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Good - thanks for letting us know. > > I can probably make the tutorial text a little clearer here. > > > > but I tried to replace bsequence also by 'stdin' and it shows error > > > #---------------------------------------------------------------------------------------------------------------------- > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > > seq_record2 = SeqIO.read(open("beta.fasta"), "fasta") > > water_cline = WaterCommandline(asequence="stdin", bsequence="stdin", > > gapopen=10, gapextend=0.5, outfile="water.txt", auto=True) > > That won't work - there is only one stdin pipe in a Unix style command > line environment. > You have to use either two input files, or stdin and one file, (or one > file and stdin). 
> > > stdout, stderr =water_cline(stdin=seq_record.format("fasta"), > > stdin=seq_record2.format("fasta")) > > # File "", line 1 # > #SyntaxError: keyword argument repeated > > That is a python syntax error message because you tried to use the stdin > argument twice. Named Python function arguments can only be used once. > > I hope that makes sense. > > Peter > From p.j.a.cock at googlemail.com Thu Jul 21 09:56:17 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 10:56:17 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 10:48 AM, Sheila the angel wrote: > Thanks Peter, > Yes I understand that I can't use 'stdin' twice so this is not going to work > >>You have to use either two input files, or stdin and one file, (or one >>file and stdin). > > But how can I specify two input files, or stdin > Two files is covered in the Tutorial example you originally started with, isn't it?: water_cline = WaterCommandline(asequence="alpha.fasta", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline() You've already done stdin and file for a and b, and said that works: seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object water_cline = WaterCommandline(asequence="stdin", bsequence="beta.fasta", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline(stdin=seq_record.format("fasta")) Doing it the other way round with a file and stdin for a and b would be just: seq_record = SeqIO.read(open("beta.fasta"), "fasta") #or a sequence object water_cline = WaterCommandline(asequence="alpha.fasta", bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt") stdout, stderr =water_cline(stdin=seq_record.format("fasta")) Obviously if you already have the sequences in FASTA files, then just give the filenames to the EMBOSS tool (rather than needlessly loading them into python just to write them to stdin). 
Peter From malvikasharan at gmail.com Thu Jul 21 15:51:35 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Thu, 21 Jul 2011 17:51:35 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: Hi, i have tried this tool for aligning 1 sequence ( 'aseq.fasta ') against 5 sequence present in other file ('bseq.fasta'). as expected aseq.fasta gives pairwise alignment with every fasta sequence present in bseq.fasta. the alignment works perfectly and saves the alignment output as well. 1> the question is if there is anyway to extraxt consensus out of all pairwise alignments? I know its is possible with clustalw or muscle or other alignment tool. but i do not want pairwise alignment between all the sequence with each other. In this case Emboss seemed the better tool where i can align all sequence only with 1 query. but the crucial part is to extract the conserved residue from the alignment. 2> The best would be to align all the sequence together against 1 sequence (aseq.fasta) like it happens in COBALT. and find the conserved residue directly. but i did not find any commandline tool like that unfortunately. It would be great if you can suggest a tool if you know any. thank you ! Malvika On Thu, Jul 21, 2011 at 11:56 AM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 10:48 AM, Sheila the angel > wrote: > > Thanks Peter, > > Yes I understand that I can't use 'stdin' twice so this is not going to > work > > > >>You have to use either two input files, or stdin and one file, (or one > >>file and stdin). 
> > > > But how can I specify two input files, or stdin > > > > Two files is covered in the Tutorial example you originally started > with, isn't it?: > > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline() > > You've already done stdin and file for a and b, and said that works: > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > water_cline = WaterCommandline(asequence="stdin", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Doing it the other way round with a file and stdin for a and b would be > just: > > seq_record = SeqIO.read(open("beta.fasta"), "fasta") #or a sequence object > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Obviously if you already have the sequences in FASTA files, then just give > the filenames to the EMBOSS tool (rather than needlessly loading them into > python just to write them to stdin). > > Peter > _______________________________________________ > Biopython mailing list - Biopython at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython > From p.j.a.cock at googlemail.com Thu Jul 21 16:09:18 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 21 Jul 2011 17:09:18 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Thu, Jul 21, 2011 at 4:51 PM, malvika sharan wrote: > Hi, > > i have tried this tool for aligning 1 sequence ( 'aseq.fasta ') against 5 > sequence present in other file ('bseq.fasta'). as expected aseq.fasta gives > pairwise alignment with every fasta sequence present in bseq.fasta. > > the alignment works perfectly and saves the alignment output as well. 
> > 1> the question is if there is anyway to extraxt consensus out of all > pairwise alignments? I know its is possible with clustalw or muscle or other > alignment tool. but i do not want pairwise alignment between all the > sequence with each other. In this case Emboss seemed the better tool where i > can align all sequence only with 1 query. but the crucial part is to extract > the conserved residue from the alignment. > > 2> The best would be to align all the sequence together against 1 sequence > (aseq.fasta) like it happens in COBALT. and find the conserved residue > directly. but i did not find any commandline tool like that unfortunately. > It would be great if you can suggest a tool if you know any. > > thank you ! > Malvika I don't understand what you are asking for - it is hard to define a consensus from a pairwise alignment. Could you give a short example? Also this thread really is going off topic, perhaps a new email thread (with a more relevant title) would be better? Peter From malvikasharan at gmail.com Thu Jul 21 16:32:52 2011 From: malvikasharan at gmail.com (malvika sharan) Date: Thu, 21 Jul 2011 18:32:52 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: yup sure!! On Thu, Jul 21, 2011 at 6:09 PM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 4:51 PM, malvika sharan > wrote: > > Hi, > > > > i have tried this tool for aligning 1 sequence ( 'aseq.fasta ') against 5 > > sequence present in other file ('bseq.fasta'). as expected aseq.fasta > gives > > pairwise alignment with every fasta sequence present in bseq.fasta. > > > > the alignment works perfectly and saves the alignment output as well. > > > > 1> the question is if there is anyway to extraxt consensus out of all > > pairwise alignments? I know its is possible with clustalw or muscle or > other > > alignment tool. but i do not want pairwise alignment between all the > > sequence with each other. 
In this case Emboss seemed the better tool > where i > > can align all sequence only with 1 query. but the crucial part is to > extract > > the conserved residue from the alignment. > > > > 2> The best would be to align all the sequence together against 1 > sequence > > (aseq.fasta) like it happens in COBALT. and find the conserved residue > > directly. but i did not find any commandline tool like that > unfortunately. > > It would be great if you can suggest a tool if you know any. > > > > thank you ! > > Malvika > > I don't understand what you are asking for - it is hard to define a > consensus from a pairwise alignment. Could you give a short example? > > Also this thread really is going off topic, perhaps a new email thread > (with a more relevant title) would be better? > > Peter > From from.d.putto at gmail.com Fri Jul 22 10:58:53 2011 From: from.d.putto at gmail.com (Sheila the angel) Date: Fri, 22 Jul 2011 12:58:53 +0200 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: Oh I am sorry for the line seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence object This was meant to show that I have a sequence record object. In my actual problem I have open 2 different genbank files which contains many sequence record. I want to run EMBOSS to each sequence. from Bio import SeqIO for seq_record in SeqIO.parse("ls_orchid.gbk", "genbank"): seq1=seq_record.seq for seq_record2 in SeqIO.parse("other_file.gbk", "genbank"): #NOTE- this is a different genbank file seq2=seq_record2.seq EMBOSS_OUT= RUN_EMBOSS_WATER_on_seq1_and_seq2(seq1,seq2) #Analyse 'EMBOSS_OUT' for further analysis One possible way to do this - - convert genbank files to fasta and then use one file as 'beta.fasta'. But in such case I can't analysis EMBOSS result directly. I have to wait till EMBOSS is done with all the sequences. Another way is to create 2 temporary files temp1 and temp2 and pass them to EMBOSS. Though it makes code little slow. 
(Because each time you first have to write the sequences to temporary files and then delete them after the analysis.) I tried another solution. It may be a little dirty, but it may be useful for someone.
#---------------------------------------------------------------------------------------
# import needed by the function below
from Bio.Emboss.Applications import WaterCommandline

def RUN_EMBOSS_WATER_on_seq1_and_seq2(seq1, seq2, out_file='stdout'):
    # 'asis:' passes the sequence text itself on the command line
    water_cline = WaterCommandline()
    water_cline.asequence = 'asis:' + str(seq1)
    water_cline.bsequence = 'asis:' + str(seq2)
    water_cline.gapopen = 10
    water_cline.gapextend = 0.5
    water_cline.outfile = out_file
    stdout, stderr = water_cline()
    return stdout
#---------------------------------------------------------------------------------------
You can call the function as
EMBOSS_OUT = RUN_EMBOSS_WATER_on_seq1_and_seq2(seq1, seq2)
and then do the analysis with 'EMBOSS_OUT'. Here you neither have to wait until EMBOSS is done with all the sequences nor have to create temporary files :) BUT the only problem is that the names of the sequences are not present in the output; it writes only
# Aligned_sequences: 2
# 1: asis
# 2: asis
So if you don't care which sequence is which, then it works :)
On Thu, Jul 21, 2011 at 11:56 AM, Peter Cock wrote: > On Thu, Jul 21, 2011 at 10:48 AM, Sheila the angel > wrote: > > Thanks Peter, > > Yes I understand that I can't use 'stdin' twice so this is not going to > work > > > >>You have to use either two input files, or stdin and one file, (or one > >>file and stdin).
> > > > But how can I specify two input files, or stdin > > > > Two files is covered in the Tutorial example you originally started > with, isn't it?: > > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline() > > You've already done stdin and file for a and b, and said that works: > > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") #or a sequence > object > water_cline = WaterCommandline(asequence="stdin", > bsequence="beta.fasta", gapopen=10, gapextend=0.5, > outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Doing it the other way round with a file and stdin for a and b would be > just: > > seq_record = SeqIO.read(open("beta.fasta"), "fasta") #or a sequence object > water_cline = WaterCommandline(asequence="alpha.fasta", > bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt") > stdout, stderr =water_cline(stdin=seq_record.format("fasta")) > > Obviously if you already have the sequences in FASTA files, then just give > the filenames to the EMBOSS tool (rather than needlessly loading them into > python just to write them to stdin). > > Peter > From p.j.a.cock at googlemail.com Fri Jul 22 11:18:08 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 22 Jul 2011 12:18:08 +0100 Subject: [Biopython] Passing sequence to local BLAST In-Reply-To: References: Message-ID: On Fri, Jul 22, 2011 at 11:58 AM, Sheila the angel wrote: > Oh I am sorry for the line > seq_record = SeqIO.read(open("alpha.fasta"), "fasta") ?#or a sequence object > This was meant to show that I have a sequence record object. I understood. > In my actual problem I have open 2 different genbank files which contains > many sequence record. I want to run EMBOSS to each sequence. The EMBOSS tools can read GenBank files too. 
Try something like this using their Uniform Sequence Address (USA) convention, http://emboss.sourceforge.net/docs/#Usa http://emboss.sourceforge.net/docs/themes/UniformSequenceAddress.html
water_cline = WaterCommandline(asequence="genbank::ls_orchid.gbk", bsequence="genbank::other_file.gbk", gapopen=10, gapextend=0.5, outfile="water.txt", auto=True)
Alternatively, some (all?) of the EMBOSS tools provide an explicit separate argument for the input file format. In this case it wasn't very obvious, as it wasn't listed in the "water --help" text. You could use -sformat1 and -sformat2, but it looks like our needle/water wrappers don't include them. So go with the USA approach. Peter From gowencm at vcu.edu Thu Jul 28 16:28:40 2011 From: gowencm at vcu.edu (Chris Gowen) Date: Thu, 28 Jul 2011 12:28:40 -0400 Subject: [Biopython] PWM using gapped alignments Message-ID: Hello all, We are trying to perform pwm calculations using the Motif.pwm() function, and many of our alignments have gaps, which raise KeyError when it tries the key '-'. I am fairly inexperienced with this analysis technique, but from looking at the source, it seems the error itself may be avoided by adding a line before line 97 to skip that letter in the calculation. Would this mess up the calculation for the pwm scores? Has anyone dealt with this problem in a more clever way? Thanks for any advice you can offer. Best, Chris Gowen
 82  def pwm(self, laplace=True):
 83      """
 84      returns the PWM computed for the set of instances
 85
 86      if laplace=True (default), pseudocounts equal to self.background multiplied by self.beta are added to all positions.
 87      """
 88
 89      if self._pwm_is_current:
 90          return self._pwm
 91      #we need to compute new pwm
 92      self._pwm = []
 93      for i in xrange(self.length):
 94          dict = {}
 95          #filling the dict with 0's
 96          for letter in self.alphabet.letters:
 97              if laplace:
 98                  dict[letter]=self.beta*self.background[letter]
 99              else:
100                  dict[letter]=0.0
101          if self.has_counts:
102              #taking the raw counts
103              for letter in self.alphabet.letters:
104                  dict[letter]+=self.counts[letter][i]
105          elif self.has_instances:
106              #counting the occurences of letters in instances
107              for seq in self.instances:
108                  #dict[seq[i]]=dict[seq[i]]+1
109                  dict[seq[i]]+=1
110          self._pwm.append(FreqTable.FreqTable(dict,FreqTable.COUNT,self.alphabet))
111      self._pwm_is_current=1
112      return self._pwm
From p.j.a.cock at googlemail.com Thu Jul 28 16:54:40 2011 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 28 Jul 2011 17:54:40 +0100 Subject: [Biopython] PWM using gapped alignments In-Reply-To: References: Message-ID: On Thu, Jul 28, 2011 at 5:28 PM, Chris Gowen wrote: > Hello all, > > We are trying to perform pwm calculations using the Motif.pwm() function, > and many of our alignments have gaps, which raise KeyError when it tries the > key '-'. I am fairly inexperienced with this analysis technique, but from > looking at the source, it seems the error itself may be avoided by adding a > line before line 97 to skip that letter in the calculation. Would this mess > up the calculation for the pwm scores? Has anyone dealt with this problem in > a more clever way? > > Thanks for any advise you can offer. > > Best, > Chris Gowen Which alphabet are you using? My guess is you didn't have a gapped alphabet. As an aside, making the Seq object test this has certain appeal but would impose a performance penalty. Peter From gowencm at vcu.edu Thu Jul 28 17:14:03 2011 From: gowencm at vcu.edu (Chris Gowen) Date: Thu, 28 Jul 2011 13:14:03 -0400 Subject: [Biopython] PWM using gapped alignments In-Reply-To: References: Message-ID: Hi Peter, Thanks for the response. We are initializing the alignments and motif as Gapped(IUPAC.unambiguous_dna), so our letters are 'GATC-'.
As far as I can tell, there is no means for pwm() to know to skip the gaps, if that's even the 'right' thing for it to do. On Thu, Jul 28, 2011 at 12:54 PM, Peter Cock wrote: > On Thu, Jul 28, 2011 at 5:28 PM, Chris Gowen wrote: > > Hello all, > > > > We are trying to perform pwm calculations using the Motif.pwm() function, > > and many of our alignments have gaps, which raise KeyError when it tries > the > > key '-'. I am fairly inexperienced with this analysis technique, but from > > looking at the source, it seems the error itself may be avoided by adding > a > > line before line 97 to skip that letter in the calculation. Would this > mess > > up the calculation for the pwm scores? Has anyone dealt with this problem > in > > a more clever way? > > > > Thanks for any advise you can offer. > > > > Best, > > Chris Gowen > > Which alphabet are you using? My guess is you didn't have a gapped > alphabet. > > As an aside, making the Seq object test this has certain appeal but > would impose a performance penalty. > > Peter >
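On the open question of skipping gaps, here is a minimal standalone sketch of one option: count letters per column and simply ignore the gap character. It is plain Python 3, not Biopython's Motif.pwm(), and the function name and parameters are hypothetical. Whether ignoring gaps is the "right" treatment is left open in this thread; note that under this choice each column is normalised over its non-gap letters only, so columns with many gaps rest on fewer observations.

```python
def column_frequencies(instances, letters="ACGT", gap="-", pseudocount=0.0):
    """Per-column letter frequencies for equal-length strings, ignoring gaps.

    Illustrative only; this is not the Biopython implementation.
    """
    length = len(instances[0])
    if any(len(seq) != length for seq in instances):
        raise ValueError("all instances must have the same length")
    pwm = []
    for i in range(length):
        # Start every letter at the pseudocount (0.0 means raw counts).
        counts = {letter: pseudocount for letter in letters}
        for seq in instances:
            symbol = seq[i]
            if symbol == gap:
                continue  # skip gaps instead of raising KeyError
            counts[symbol] += 1
        total = sum(counts.values())
        if total == 0:
            # Column made only of gaps and no pseudocounts: nothing to normalise.
            pwm.append({letter: 0.0 for letter in letters})
        else:
            pwm.append({letter: counts[letter] / total for letter in letters})
    return pwm


if __name__ == "__main__":
    for column in column_frequencies(["AC-", "A-G", "AT-"]):
        print(column)
```

An alternative design would treat '-' as a fifth symbol with its own background frequency, which keeps every column on the same number of observations at the cost of modelling gaps explicitly.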