[Biopython] Python getting stuck reading fastq file
Philipp Schiffer
philipp.schiffer at gmail.com
Sat Dec 21 15:57:35 UTC 2013
Hi again!
It seems to be working now, after I pulled the latest Biopython from GitHub.
Very sorry for the Saturday evening disturbance.
Have a great Christmas season everybody.
Philipp
--
Philipp Schiffer
On Saturday, 21 December 2013 at 16:46, Philipp Schiffer wrote:
> Hi!
>
> I am experiencing a problem when reading from a FASTQ file (qualities in Sanger scoring). Whatever I do, at some point partway through the file the reading (or the writing or comparing, which come later in my script and are excluded here) gets stuck. The following output is from an IPython session (Python 2.7.5).
> Biopython was installed through pip on a Scientific Linux 6.2 system.
> Is this an error with the SeqIO parser? Or am I doing something wrong?
>
> Any help with this would be highly appreciated.
>
> Kind regards
>
> Philipp
>
> import string
> from pprint import pprint
> import os
> from Bio import SeqIO
> from subprocess import call
> import sys
> import re
>
> fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w')
> my_abundantheads=set()
> my_onetwo = re.compile('\/[1-2]')
>
> abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU')
> for record in SeqIO.parse(abundant, "fastq"):
>     ids = my_onetwo.split(record.id)
>     my_abundantheads.update([ids[0]])
>
>
>
> ….
>
> ^C---------------------------------------------------------------------------
> KeyboardInterrupt Traceback (most recent call last)
> <ipython-input-14-d87560a5b18b> in <module>()
> ----> 1 for record in SeqIO.parse(abundant, "fastq"):
> 2 ids = my_onetwo.split(record.id)
> 3 my_abundantheads.update([ids[0]])
> 4
>
> /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet)
> 539 raise ValueError("Unknown format '%s'" % format)
> 540 #This imposes some overhead... wait until we drop Python 2.4 to fix it
> --> 541 for r in i:
> 542 yield r
> 543
>
> /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids)
> 1034 for letter in range(0, 255):
> 1035 q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET
> -> 1036 for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
> 1037 if title2ids:
> 1038 id, name, descr = title2ids(title_line)
>
> /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle)
> 934 #There may now be more quality data, or another sequence, or EOF
> 935 while True:
> --> 936 line = handle_readline()
> 937 if not line:
> 938 break # end of file
>
> KeyboardInterrupt:
>
>
>
>
> --
> Philipp Schiffer
>
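
For anyone who wants to try the header-collection step from the quoted script without Biopython, here is a minimal sketch. It is not the SeqIO parser and does not diagnose the hang above. It assumes plain four-line FASTQ records (no wrapped sequence or quality lines), and it strips the "/1" or "/2" suffix only at the end of the read ID, whereas the quoted script splits on the first occurrence. The names MATE_SUFFIX and collect_read_names are made up for this illustration.

```python
import io
import re

# Matches the "/1" or "/2" mate suffix at the end of a read ID.
MATE_SUFFIX = re.compile(r"/[12]$")


def collect_read_names(handle):
    """Return the set of read names in a four-line FASTQ stream.

    Assumes every record is exactly four lines: header, sequence,
    '+' separator and quality string. The mate suffix is removed.
    """
    names = set()
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # skip sequence, '+' and quality lines
        if not line.startswith("@"):
            raise ValueError(
                "line %d is not a FASTQ header: %r" % (line_number + 1, line)
            )
        # The read ID is the first whitespace-delimited word after '@'.
        fields = line[1:].split(None, 1)
        read_id = fields[0] if fields else ""
        names.add(MATE_SUFFIX.sub("", read_id))
    return names


# Small in-memory example instead of a real file.
demo = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read1/2\nTTGA\n+\nIIII\n"
    "@read2/1 extra\nGGCC\n+\nIIII\n"
)
print(sorted(collect_read_names(demo)))  # ['read1', 'read2']
```

With a real file, the same function can be given an open file handle in place of the StringIO object.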