[Biopython] muscle alignment using stdin and stdout - broken pipe error
Brian Osborne
bosborne11 at verizon.net
Wed Oct 28 17:52:41 UTC 2015
Jakub,
This is how I run muscle, without using files:
seqs = [seq1, seq2]
muscle_cmd = ['muscle', "-quiet", "-maxiters", "1", "-diags"]
muscle = subprocess.Popen(muscle_cmd, stdin=subprocess.PIPE,
stdout=subprocess.PIPE, universal_newlines=True)
SeqIO.write(seqs, muscle.stdin, "fasta")
muscle.stdin.close()
align = AlignIO.read(muscle.stdout, 'fasta')
muscle.stdout.close()
“import subprocess”, SeqIO, AlignIO, of course.
Brian O.
> On Oct 28, 2015, at 12:23 PM, Jakub Nowak <jakub.nowak at ed.ac.uk> wrote:
>
> Dear biopython users,
>
> I am trying to perform consensus motif analysis for my CLIP data. I have list of motifs that I can load to the data frame using pandas module. I can successfully go through the motifs in the column and write them into external fasta file using biopython tools.
> However I wanted to bypass external file and use stdin and stdout to perform alignment with muscle command line. For some reason I am getting pipe error [Errno 32] Broken pipe.
>
> Any ideas what is the problem and how to fix it?
>
> Below is the code I used Ipython
>
> # In[1]:
> import pandas as pd
> import sys
> import subprocess
> from Bio import SeqIO
> from Bio.Seq import Seq
> from Bio.SeqRecord import SeqRecord
> from Bio.Alphabet import generic_dna
> from Bio.Align.Applications import MuscleCommandline
>
>
> # In[2]:
> muscle_cline = MuscleCommandline(clwstrict=True)
> print muscle_cline
>
>
> # In[3]:
> df = pd.read_csv('Trim25_motifs_k-mers.txt', sep='\t', comment="#", header=None)
> df.columns = ['Motif','Count']
>
>
> # In[4]:
> df.head(5)
> Motif Count
> 0 GAGGAG 1321
> 1 AGGAGG 1168
> 2 GGAGGA 1167
> 3 GGGAGG 1056
> 4 CTGGAG 1038
>
> # In[5]:
> child = subprocess.Popen(str(muscle_cline),
> stdin = subprocess.PIPE,
> stdout = subprocess.PIPE,
> stderr = subprocess.PIPE,
> universal_newlines = True,
> shell = (sys.platform!='Darwin'))
>
>
> # In[6]:
> Record = []
> for x in range(0, len(df['Motif'])):
> Record.append(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)))
> SeqIO.write(Record, child.stdin, "fasta")
>
>
> # In[7]:
> for x in range(0, len(df['Motif'])):
> SeqIO.write(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)), child.stdin, 'fasta')
>
>
> # In[8]: #this works fine
> Record = []
> for x in range(0, len(df['Motif'])):
> Record.append(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)))
> SeqIO.write(Record, 'motif.fasta', "fasta")
>
>
> In[6] and In[7] generate the same error message as below:
>
>> IOError
>> Traceback (most recent call last)
>>
>> <ipython-input-6-90b52142f481> in <module>()
>> 2 for x in range(0, len(df['Motif'])):
>> 3 Record.append(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)))
>> ----> 4 SeqIO.write(Record, child.stdin, "fasta")
>>
>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.pyc in write(sequences, handle, format)
>> 465 if format in _FormatToWriter:
>> 466 writer_class = _FormatToWriter[format]
>> --> 467 count = writer_class(fp).write_file(sequences)
>> 468 elif format in AlignIO._FormatToWriter:
>> 469 #Try and turn all the records into a single alignment,
>>
>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.pyc in write_file(self, records)
>> 205
>> """
>> 206 self.write_header()
>> --> 207 count = self.write_records(records)
>> 208 self.write_footer()
>> 209 return count
>>
>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.pyc in write_records(self, records)
>> 190 count = 0
>> 191 for record in records:
>> --> 192 self.write_record(record)
>> 193 count += 1
>> 194 #Mark as true, even if there where no records
>>
>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.pyc in write_record(self, record)
>> 198 assert "\n" not in title
>> 199 assert "\r" not in title
>> --> 200 self.handle.write(">%s\n" % title)
>> 201
>> 202 data = self._get_seq_string(record) # Catches sequence being None
>>
>> IOError: [Errno 32] Broken pipe
>
> In[8] Successfully writes to the fasta file all sequences. I guess I can use this file as input but as I said I wanted to bypass external file.
>
> If you have any suggestion what could have caused that I will be grateful.
>
> Follow up problem I might run in the future is regarding the occurrence number. If you have some idea how to include occurrence number in alignment calculation as weight. I can populate fasta file with multiple same sequences matching with the number of occurrence events. But this approach sounds quite crude and redundant.
> If you can think of any other solution I will be grateful
>
> Thanks very much for your help,
>
> Jakub
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
> _______________________________________________
> Biopython mailing list - Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
More information about the Biopython
mailing list