[Biopython] MultiProcess SeqIO objects
Willis, Jordan R
jordan.r.willis at Vanderbilt.Edu
Tue Mar 6 03:37:30 UTC 2012
Hello BioPython,
I was wondering if anyone has used the multiprocessing module in conjunction with Biopython objects? Here is my problem: I have 60 million sequences in FASTQ format, and I want to process them in parallel without having to iterate through the file multiple times.
So I have something like this:
from multiprocessing import Pool
from Bio import SeqIO
input_handle = open("huge_fastq_file.fastq")
def convert_to_fasta(input):
    return [[record.id, record.seq.reverse_complement()] for record in SeqIO.parse(input, 'fastq')]
p = Pool(processes=4)
g = p.map(convert_to_fasta, input_handle)
for i in g:
    print i[0], i[1]
Unfortunately, iterating over the handle yields individual lines, so p.map splits the file line by line and convert_to_fasta receives single lines of text rather than anything it can parse. What I want is to divide the file up into FASTQ records and run my function on them across 4 processors.
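What I'm imagining is something along these lines (a Python 3 sketch, not working code from my script: it uses plain (id, sequence) tuples as a stand-in for SeqRecord objects, since full SeqRecords are relatively expensive to pickle between processes, and a hypothetical batched() helper to group records into chunks):

```python
from itertools import islice
from multiprocessing import Pool

def batched(iterator, size):
    """Yield lists of up to `size` items drawn from an iterator."""
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            break
        yield batch

# Simple string-based reverse complement as a stand-in for
# record.seq.reverse_complement().
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp_batch(batch):
    # Each item is an (id, sequence-string) pair; with Biopython you
    # would build these from SeqIO.parse, e.g.
    # ((rec.id, str(rec.seq)) for rec in SeqIO.parse(handle, "fastq")).
    return [(rid, seq.translate(COMPLEMENT)[::-1]) for rid, seq in batch]

if __name__ == "__main__":
    # Stand-in for the real record stream from SeqIO.parse.
    records = (("read%d" % i, "AACGT") for i in range(10))
    with Pool(processes=4) as pool:
        # imap keeps only a few batches in flight instead of
        # materializing all 60 million records at once.
        for out in pool.imap(revcomp_batch, batched(records, 3)):
            for rid, seq in out:
                print(rid, seq)
```

The idea being that only small, picklable batches cross the process boundary, and the parent never holds the whole file in memory.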
Thanks,
jordan