[Biopython-dev] 7/5 biopython Questions - BioStar
Feed My Inbox
updates at feedmyinbox.com
Tue Jul 5 06:56:46 EDT 2011
// GenBank to Fasta failing with CONTIG fields
// July 5, 2011 at 6:31 AM
http://biostar.stackexchange.com/questions/9892/genbank-to-fasta-failing-with-contig-fields
I used to generate FASTA out of my GenBank source files using a simple conversion script:
#!/usr/bin/env python
import sys, signal
from Bio import SeqIO

def wrap( text, width=80 ):
    # Yield successive fixed-width chunks of the sequence
    for i in xrange( 0, len( text ), width ):
        yield text[i:i+width]

if __name__ == "__main__":
    # progress() is not defined in the posted snippet
    status = progress()
    for record in SeqIO.parse( sys.stdin, "genbank"):
        # Not every record carries a GI number
        try:
            gi = record.annotations["gi"]
        except KeyError:
            gi = None
        accession = record.id
        desc = record.description
        seq = record.seq
        locus = record.name
        print ">gi|%s|emb|%s|%s| %s" % (gi, accession, locus, desc)
        for block in wrap( seq ):
            print block
When I changed the sequence files to newer versions some of the resulting FASTA file sequences were just filled with Ns. After closer inspection of the GenBank source files, it turns out that they have replaced the ORIGIN block
ORIGIN
sequence...
with a CONTIG block, something like
CONTIG join(BX640437.1:1..347356,BX640438.1:51..347786,...)
Is there a way to resolve this using BioPython?
I was working with BioPython 1.52 and 1.57 (latest).
Thanks for your suggestions.
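A CONTIG line only lists which pieces of other records make up the sequence, so the residues themselves have to come from somewhere else (for example, by requesting the record with its sequence included, which NCBI's efetch offers as rettype "gbwithparts", or by fetching each piece). As a first step, here is a minimal sketch that splits a CONTIG join string into its pieces. It uses only the standard library; the name parse_contig and the regular expression are my own assumptions, not part of Biopython, and gap(...) entries are simply skipped.

```python
import re

# One piece such as "BX640437.1:1..347356", optionally wrapped in complement(...).
# This pattern is an assumption based on the example in the question.
SEGMENT = re.compile(r"(complement\()?([A-Za-z_0-9]+\.\d+):(\d+)\.\.(\d+)\)?")

def parse_contig(contig):
    """Return (accession, start, end, strand) tuples from a CONTIG join string.

    Hypothetical helper: gap(...) entries and anything else that does not
    look like ACCESSION.VERSION:start..end are ignored.
    """
    parts = []
    for match in SEGMENT.finditer(contig):
        strand = -1 if match.group(1) else 1
        parts.append((match.group(2), int(match.group(3)),
                      int(match.group(4)), strand))
    return parts

example = "join(BX640437.1:1..347356,complement(BX640438.1:51..347786))"
for part in parse_contig(example):
    print(part)
```

Each tuple gives the accession to fetch and the 1-based, inclusive coordinates to cut out; pieces on the minus strand would need to be reverse-complemented before joining.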
// Parsing BLAST output BioPython Error
// July 5, 2011 at 2:25 AM
http://biostar.stackexchange.com/questions/9882/parsing-blast-output-biopython-error
Hi,
I have the following code
def runBLAST(self):
    print "Running BLAST .........."
    cmd=subprocess.Popen("blastp -db nr -query repeat.txt -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 5",shell=True)
    cmd.communicate()[0]
    f1=open("out.faa")
    blast_records = NCBIXML.parse(f1)
    save_file = open("my_fasta_seq.fasta", 'w')
    for blast_record in blast_records[:10]:
        for alignment in blast_record.alignments:
            for hsp in alignment.hsps:
                save_file.write('>%s\n' % (alignment.hseq,))
    save_file.close()
    f1.close()
    f2=open("my_fasta_seq.fasta")
    for record in SeqIO.parse(f2,"fasta"):
        f=open("tempBLAST1.txt","w")
        f.write(">"+"\n"+str(record.name)+"\n"+str(record.seq)+"\n")
        f.close()
I get a TypeError on the line "for blast_record in blast_records[:10]:" saying 'generator' object is not subscriptable.
I am looking to get the top 10 BLAST hits (sequences).
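The error arises because NCBIXML.parse() returns a generator, which yields records one at a time and cannot be sliced with [:10]. One common way to take the first few items from any generator is itertools.islice. The sketch below uses a stand-in generator (fake_records, a made-up name) so that it runs without a BLAST result file; with real data the argument would be NCBIXML.parse(handle) instead.

```python
from itertools import islice

def fake_records():
    # Stand-in for NCBIXML.parse(handle): yields items lazily, one at a time
    for i in range(25):
        yield "record_%d" % i

# islice stops after 10 items without reading the rest of the generator
first_ten = list(islice(fake_records(), 10))
print(len(first_ten))
```

Note that this limits the number of query records, not the number of hits per query. Since blast_record.alignments is an ordinary list, the top hits of one query can be taken with blast_record.alignments[:10].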
// Getting top 10 sequences of BLAST results Bio Python
// July 5, 2011 at 12:29 AM
http://biostar.stackexchange.com/questions/9880/getting-top-10-sequences-of-blast-results-bio-python
Hi,
I want to get the top 10 sequences from my BLAST results (just the sequences, no alignment, score, e-value, etc.). My input is a text file containing 5 FASTA sequences, and the output should be the top 10 BLAST hits for each of them, so the output file will have 50 sequences.
I am reading each of my input FASTA sequences through Bio.SeqIO, writing it as temp.faa and then passing it to command line BLAST through subprocess as
blastp -db nr -query temp.faa -out out.faa -evalue 0.001 -gapopen 11 -gapextend 1 -matrix BLOSUM62 -remote -outfmt 2
The output has lots of other information. Should I parse this output now, or is there a better way?
Thanks
P.S. XML might be the way, but I didn't find the relevant NCBIXML parser syntax.
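If BLAST is run with -outfmt 5 (XML), the records from Bio.Blast.NCBIXML expose each query's hits as record.alignments and each hit's aligned subject sequence as hsp.sbjct. A minimal sketch of collecting the top hits per query follows. The function name top_hit_sequences is made up for illustration, and the demo objects are stand-ins built with SimpleNamespace so the block runs without Biopython or a result file. Keep in mind that hsp.sbjct is only the aligned part of the hit, not the full-length database sequence.

```python
from types import SimpleNamespace

def top_hit_sequences(blast_records, n=10):
    """Yield (query, hit title, ungapped subject sequence) for the first n hits of each query.

    Hypothetical helper: takes only the first HSP of each hit and strips
    the alignment gap characters from its subject sequence.
    """
    for record in blast_records:
        for alignment in record.alignments[:n]:
            hsp = alignment.hsps[0]
            yield record.query, alignment.title, hsp.sbjct.replace("-", "")

# Stand-in data shaped like NCBIXML records (one query with 12 hits)
demo = [SimpleNamespace(
    query="q1",
    alignments=[SimpleNamespace(title="hit%d" % i,
                                hsps=[SimpleNamespace(sbjct="MK-LV")])
                for i in range(12)])]

for query, title, seq in top_hit_sequences(demo, n=2):
    print(">%s\n%s" % (title, seq))

# With real data (assumes Biopython is installed and out.xml came from -outfmt 5):
# from Bio.Blast import NCBIXML
# with open("out.xml") as handle:
#     for query, title, seq in top_hit_sequences(NCBIXML.parse(handle)):
#         print(">%s\n%s" % (title, seq))
```

With 5 queries in a single input file, one BLAST run produces 5 records, so this would give up to 50 sequences without writing a temporary file per query.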