[Biopython] muscle alignment using stdin and stdout - broken pipe error

Jakub Nowak jakub.nowak at ed.ac.uk
Wed Oct 28 16:23:38 UTC 2015


Dear biopython users,

I am trying to perform consensus motif analysis for my CLIP data. I have list of motifs that I can load to the data frame using pandas module. I can successfully go through the motifs in the column and write them into external fasta file using biopython tools.
However I wanted to bypass external file and use stdin and stdout to perform alignment with muscle command line. For some reason I am getting pipe error [Errno 32] Broken pipe.

Any ideas what is the problem and how to fix it?

Below is the code I used Ipython

# In[1]:
import pandas as pd
import sys
import subprocess
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import generic_dna
from Bio.Align.Applications import MuscleCommandline


# In[2]:
muscle_cline = MuscleCommandline(clwstrict=True)
print muscle_cline


# In[3]:
df = pd.read_csv('Trim25_motifs_k-mers.txt', sep='\t', comment="#", header=None)
df.columns = ['Motif','Count']


# In[4]:
df.head(5)
Motif	Count
0	GAGGAG	1321
1	AGGAGG	1168
2	GGAGGA	1167
3	GGGAGG	1056
4	CTGGAG	1038

# In[5]:
child = subprocess.Popen(str(muscle_cline),
                        stdin = subprocess.PIPE,
                        stdout = subprocess.PIPE,
                        stderr = subprocess.PIPE,
                        universal_newlines = True,
                        shell = (sys.platform!='Darwin'))


# In[6]:
Record = []
for x in range(0, len(df['Motif'])):
    Record.append(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)))
SeqIO.write(Record, child.stdin, "fasta")


# In[7]:
for x in range(0, len(df['Motif'])):
    SeqIO.write(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)), child.stdin, 'fasta')


# In[8]: #this works fine
Record = []
for x in range(0, len(df['Motif'])):
    Record.append(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)))
SeqIO.write(Record, 'motif.fasta', "fasta")


In[6] and In[7] generate the same error message as below:

> IOError
>                                    Traceback (most recent call last)
> 
> <ipython-input-6-90b52142f481> in <module>()
>       2 for x in range(0, len(df['Motif'])):
>       3     Record.append(SeqRecord(Seq(df['Motif'].get_value(x), generic_dna), id = "motif_" + str(x+1)))
> ----> 4 SeqIO.write(Record, child.stdin, "fasta")
> 
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/__init__.pyc in write(sequences, handle, format)
>     465         if format in _FormatToWriter:
>     466             writer_class = _FormatToWriter[format]
> --> 467             count = writer_class(fp).write_file(sequences)
>     468         elif format in AlignIO._FormatToWriter:
>     469             #Try and turn all the records into a single alignment,
> 
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.pyc in write_file(self, records)
>     205
>          """
>     206         self.write_header()
> --> 207         count = self.write_records(records)
>     208         self.write_footer()
>     209         return count
> 
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/Interfaces.pyc in write_records(self, records)
>     190         count = 0
>     191         for record in records:
> --> 192             self.write_record(record)
>     193             count += 1
>     194         #Mark as true, even if there where no records
> 
> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/Bio/SeqIO/FastaIO.pyc in write_record(self, record)
>     198         assert "\n" not in title
>     199         assert "\r" not in title
> --> 200         self.handle.write(">%s\n" % title)
>     201 
>     202         data = self._get_seq_string(record)  # Catches sequence being None
> 
> IOError: [Errno 32] Broken pipe

In[8] Successfully writes to the fasta file all sequences. I guess I can use this file as input but as I said I wanted to bypass external file.

If you have any suggestion what could have caused that I will be grateful.

Follow up problem I might run in the future is regarding the occurrence number. If you have some idea how to include occurrence number in alignment calculation as weight. I can populate fasta file with multiple same sequences matching with the number of occurrence events. But this approach sounds quite crude and redundant.
If you can think of any other solution I will be grateful

Thanks very much for your help,

Jakub
-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




More information about the Biopython mailing list