[Biopython] Passing sequence to local BLAST

Sheila the angel from.d.putto at gmail.com
Fri Jul 22 10:58:53 UTC 2011


Oh I am sorry for the line
seq_record = SeqIO.read(open("alpha.fasta"), "fasta")  #or a sequence object

This was meant to show that I have a sequence record object.
In my actual problem I have open 2 different genbank files which contains
many sequence record. I want to run EMBOSS to each sequence.

from Bio import SeqIO
for seq_record in SeqIO.parse("ls_orchid.gbk", "genbank"):
  seq1=seq_record.seq
  for seq_record2 in SeqIO.parse("other_file.gbk", "genbank"): #NOTE- this
is a different genbank file
       seq2=seq_record2.seq
       EMBOSS_OUT= RUN_EMBOSS_WATER_on_seq1_and_seq2(seq1,seq2)
       #Analyse 'EMBOSS_OUT' for further analysis

One possible way to do this - - convert genbank files to fasta and then use
one file as 'beta.fasta'. But in such case I can't analysis EMBOSS result
directly. I have to wait till EMBOSS is done with all the sequences.
Another way is to create 2 temporary files temp1 and temp2 and pass them
to EMBOSS. Though it makes code little slow. (Because every time 1st you
have write sequence in temporary files and after analysis delete it)
I tried another solution.....it may be little dirty but may be useful for
someone.
#---------------------------------------------------------------------------------------
def RUN_EMBOSS_WATER_on_seq1_and_seq2 (seq1,seq2,out_file='stdout'):
water_cline = WaterCommandline()
water_cline.asequence='asis:'+str(seq1)
water_cline.bsequence='asis:'+str(seq2)
water_cline.gapopen=10
water_cline.gapextend=0.5
water_cline.outfile=out_file
stdout, stderr = water_cline()
return (stdout)
#---------------------------------------------------------------------------------------

you can call the function as

EMBOSS_OUT=RUN_EMBOSS_WATER_on_seq1_and_seq2 (seq1,seq2)

and then can do analysis with 'EMBOSS_OUT'. Here neither you have to wait
till EMBOSS is done with all the sequences nor you have to
create temporary files :)
BUT the only problem is the name of the sequence are not present in output
and it writes only
# Aligned_sequences: 2
# 1: asis
# 2: asis

So if you don't care which sequence is what then it works :)


On Thu, Jul 21, 2011 at 11:56 AM, Peter Cock <p.j.a.cock at googlemail.com>wrote:

> On Thu, Jul 21, 2011 at 10:48 AM, Sheila the angel
> <from.d.putto at gmail.com> wrote:
> > Thanks Peter,
> > Yes I understand that I can't use 'stdin' twice so this is not going to
> work
> >
> >>You have to use either two input files, or stdin and one file, (or one
> >>file and stdin).
> >
> > But how can I specify two input files, or stdin
> >
>
> Two files is covered in the Tutorial example you originally started
> with, isn't it?:
>
> water_cline = WaterCommandline(asequence="alpha.fasta",
> bsequence="beta.fasta", gapopen=10, gapextend=0.5,
> outfile="water.txt")
> stdout, stderr =water_cline()
>
> You've already done stdin and file for a and b, and said that works:
>
> seq_record = SeqIO.read(open("alpha.fasta"), "fasta")  #or a sequence
> object
> water_cline = WaterCommandline(asequence="stdin",
> bsequence="beta.fasta", gapopen=10, gapextend=0.5,
> outfile="water.txt")
> stdout, stderr =water_cline(stdin=seq_record.format("fasta"))
>
> Doing it the other way round with a file and stdin for a and b would be
> just:
>
> seq_record = SeqIO.read(open("beta.fasta"), "fasta")  #or a sequence object
> water_cline = WaterCommandline(asequence="alpha.fasta",
> bsequence="stdin", gapopen=10, gapextend=0.5, outfile="water.txt")
> stdout, stderr =water_cline(stdin=seq_record.format("fasta"))
>
> Obviously if you already have the sequences in FASTA files, then just give
> the filenames to the EMBOSS tool (rather than needlessly loading them into
> python just to write them to stdin).
>
> Peter
>



More information about the Biopython mailing list