[Biopython] Python getting stuck reading fastq file

Sat Dec 21 15:57:35 UTC 2013

Hi again!

It seems to be working now, after I pulled the latest Biopython from GitHub.
Very sorry for the Saturday evening disturbance.

Have a great Christmas season everybody.

Philipp  

--  
Philipp Schiffer

On Saturday, 21 December 2013 at 16:46, Philipp Schiffer wrote:

> Hi!
>  
> I am experiencing a problem when reading from a FASTQ file (qualities in Sanger scoring). Whatever I do, at some point partway through my file the reading gets stuck (the same happens with the writing and comparing steps, which come later in my script and are excluded here). The following output is from an IPython session (Python 2.7.5).
> Biopython was installed through pip on a Scientific Linux 6.2 system.
> Is this an error with the SeqIO parser? Or am I doing something wrong?  
>  
> Any help with this would be highly appreciated.
>  
> Kind regards
>  
> Philipp
>  
> import string
> from pprint import pprint
> import os
> from Bio import SeqIO
> from subprocess import call
> import sys
> import re
>  
> fqoutfile = open('/data2/PS1159/reads_Feb_12/fqoutfile.fq', 'w')
> my_abundantheads=set()
> my_onetwo = re.compile('\/[1-2]')
>  
> abundant = open('/data2/PS1159/reads_Feb_12/120126_0281_AD088PACXX_5_SA-PE-023_shuf.clean.fq.gz.keep.gz.abundfilt', 'rU')
> for record in SeqIO.parse(abundant, "fastq"):
>         ids = my_onetwo.split(record.id)
>         my_abundantheads.update([ids[0]])
>  
>  
>  
> ….
>  
> ^C---------------------------------------------------------------------------
> KeyboardInterrupt                         Traceback (most recent call last)
> <ipython-input-14-d87560a5b18b> in <module>()
> ----> 1 for record in SeqIO.parse(abundant, "fastq"):
>       2         ids = my_onetwo.split(record.id)
>       3         my_abundantheads.update([ids[0]])
>       4
>  
> /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/__init__.pyc in parse(handle, format, alphabet)
>     539             raise ValueError("Unknown format '%s'" % format)
>     540         #This imposes some overhead... wait until we drop Python 2.4 to fix it
> --> 541         for r in i:
>     542             yield r
>     543
>  
> /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqPhredIterator(handle, alphabet, title2ids)
>    1034     for letter in range(0, 255):
>    1035         q_mapping[chr(letter)] = letter - SANGER_SCORE_OFFSET
> -> 1036     for title_line, seq_string, quality_string in FastqGeneralIterator(handle):
>    1037         if title2ids:
>    1038             id, name, descr = title2ids(title_line)
>  
> /usr/local/lib/python2.7/site-packages/biopython-1.61-py2.7-linux-x86_64.egg/Bio/SeqIO/QualityIO.pyc in FastqGeneralIterator(handle)
>     934         #There may now be more quality data, or another sequence, or EOF
>     935         while True:
> --> 936             line = handle_readline()
>     937             if not line:
>     938                 break  # end of file
>  
> KeyboardInterrupt:
>  
>  
>  
>  
> --  
> Philipp Schiffer
>
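
For reference, the ID-collecting step in the quoted loop can be sketched on its own, without Biopython, which may help to check whether a hang comes from the parser or from the input file. This is only a minimal sketch: it assumes strict four-line FASTQ records (no wrapped sequence or quality lines), the function name collect_read_heads is hypothetical, and the example reads are made up. It is not how SeqIO parses FASTQ.

```python
import io
import re

# Same idea as the pattern in the quoted script: a slash followed by 1 or 2.
PAIR_SUFFIX = re.compile(r"/[12]")


def collect_read_heads(handle):
    """Return the set of read IDs with the /1 or /2 mate suffix removed.

    Assumption: every record is exactly four lines (title, sequence, '+', quality).
    """
    heads = set()
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0:
            continue  # skip the sequence, '+' and quality lines
        if not line.startswith("@"):
            raise ValueError("expected '@' title at line %d" % (line_number + 1))
        # The record ID is the first whitespace-delimited word of the title line.
        read_id = line[1:].split(None, 1)[0]
        # Keep only the part before the first /1 or /2, as ids[0] does above.
        heads.add(PAIR_SUFFIX.split(read_id)[0])
    return heads


# Made-up example records, for illustration only.
example = io.StringIO(
    "@read1/1\nACGT\n+\nIIII\n"
    "@read1/2\nTTGA\n+\nIIII\n"
    "@read2/1 extra\nGGCC\n+\nIIII\n"
)
heads = collect_read_heads(example)
print(sorted(heads))  # ['read1', 'read2']
```

Because it iterates over the handle line by line, the function always stops at the end of the file, and it raises a ValueError at the first title line that does not start with '@', which points to the record where the file departs from the four-line layout.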