[Biopython] Python getting stuck reading fastq file
Philipp Schiffer
philipp.schiffer at gmail.com
Sat Dec 21 15:46:40 UTC 2013
Hi!
I am experiencing a problem when reading a FASTQ file (qualities in Sanger encoding). Whatever I do, at some point partway through the file the reading gets stuck (as do the writing and comparing steps that come later in my script and are excluded here). The following output is from an IPython session (Python 2.7.5).
Biopython was installed through pip on a Scientific Linux 6.2 system.
Is this an error with the SeqIO parser? Or am I doing something wrong?
Any help with this would be highly appreciated.
Kind regards
Philipp
import string
from pprint import pprint
import os
from Bio import SeqIO
from subprocess import call
import sys
import re
fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w')
my_abundantheads=set()
my_onetwo = re.compile('\/[1-2]')
abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU')
for record in SeqIO.parse(abundant, "fastq"):
    ids = my_onetwo.split(record.id)
    my_abundantheads.update([ids[0]])
….
^C---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-14-d87560a5b18b> in <module>()
----> 1 for record in SeqIO.parse(abundant, "fastq"):
2 ids = my_onetwo.split(record.id)
3 my_abundantheads.update([ids[0]])
4
/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet)
539 raise ValueError("Unknown format '%s'" % format)
540 #This imposes some overhead... wait until we drop Python 2.4 to fix it
--> 541 for r in i:
542 yield r
543
/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids)
1034 for letter in range(0, 255):
1035 q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET
-> 1036 for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
1037 if title2ids:
1038 id, name, descr = title2ids(title_line)
/usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle)
934 #There may now be more quality data, or another sequence, or EOF
935 while True:
--> 936 line = handle_readline()
937 if not line:
938 break # end of file
KeyboardInterrupt:
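
To make the intent of the loop clearer, here is a minimal sketch of the same name collection without Biopython. It is only an illustration under a stated assumption: every record occupies exactly four lines, which SeqIO itself does not require. The function name collect_read_names and the three example reads are hypothetical and not part of my script.

```python
import io
import re

# Matches the mate suffix "/1" or "/2", like my_onetwo in the session above.
MATE_SUFFIX = re.compile(r'/[12]')


def collect_read_names(handle):
    """Return the set of read names with the mate suffix removed.

    Assumption (hypothetical simplification): strict four-line FASTQ
    records, i.e. title, sequence, '+' line, quality string.
    """
    names = set()
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # skip the sequence, '+' and quality lines
        if not line.startswith('@'):
            raise ValueError("line %d is not a FASTQ title line"
                             % (line_number + 1))
        # Like record.id, use only the first word of the title line.
        fields = line[1:].split(None, 1)
        read_id = fields[0] if fields else ''
        names.add(MATE_SUFFIX.split(read_id)[0])
    return names


# Made-up example data, not taken from the real file.
example = io.StringIO(
    u"@read1/1\nACGT\n+\nIIII\n"
    u"@read1/2\nTTTT\n+\nIIII\n"
    u"@read2/1\nGGGG\n+\nIIII\n"
)
names = collect_read_names(example)
print(sorted(names))
```

Because the sketch raises a ValueError as soon as a title line does not start with '@', running it on the real file could also show whether a malformed record sits near the point where the parser stalls. That is a guess on my part, not a confirmed cause.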
--
Philipp Schiffer