[Biopython-dev] [Bug 3161] New: MEME Parser fails for large MEME files
bugzilla-daemon at portal.open-bio.org
Tue Dec 21 14:16:06 UTC 2010
http://bugzilla.open-bio.org/show_bug.cgi?id=3161
Summary: MEME Parser fails for large MEME files
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: lpritc at scri.sari.ac.uk
When using the MEME parser on MEME (4.5.0) text output containing more than 99
motifs, the parser fails to read the motif header lines for motifs 100 and above:
In [1]: from Bio import Motif
In [2]: data = list(Motif.parse(open('meme.txt'), 'MEME'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/Volumes/RAID_Mirror/Organisms/Phytophthora infestans/RXLR/rxlr_meme/purge_clustering/rxlr_full/<ipython console> in <module>()
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Motif/__init__.pyc in parse(handle, format)
     76         yield reader(handle)
     77     else: # we have a proper reader
---> 78         for m in parser(handle).motifs:
     79             yield m
     80
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Motif/Parsers/MEME.pyc in read(handle)
     39         raise ValueError('Unexpected end of stream')
     40     while True:
---> 41         motif = __create_motif(line)
     42         motif.alphabet = record.alphabet
     43         record.motifs.append(motif)
/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Motif/Parsers/MEME.pyc in __create_motif(line)
    260     ls = line.split()
    261     motif = MEMEMotif()
--> 262     motif.length = int(ls[4])
    263     motif._numoccurrences(ls[7])
    264     motif._evalue(ls[13])
ValueError: invalid literal for int() with base 10: 'sites'
This happens because, for motif numbers greater than 99, there is no
whitespace between 'MOTIF' and the motif number in the motif header, e.g.:
********************************************************************************
MOTIF 99 width = 29 sites = 4 llr = 286 E-value = 4.0e-016
********************************************************************************
********************************************************************************
MOTIF100 width = 29 sites = 3 llr = 253 E-value = 1.4e-023
********************************************************************************
which shifts every token index used by the parser's __create_motif function
down by one, so ls[4] is 'sites' rather than the width. This can be fixed by
stripping the first five characters (the 'MOTIF' string) from the header line
and adjusting the indices accordingly:
    def __create_motif(line):
        line = line[5:].strip()
        ls = line.split()
        motif = MEMEMotif()
        motif.length = int(ls[3])
        motif._numoccurrences(ls[6])
        motif._evalue(ls[12])
        return motif
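
As an illustration of the same idea without fixed token positions, here is a
minimal standalone sketch that reads the header fields with a regular
expression. The names HEADER_RE and parse_motif_header are hypothetical and are
not part of Bio.Motif; the sketch assumes headers follow the layout of the two
examples above and is not a drop-in patch for the parser.

```python
import re

# Hypothetical helper, not part of Bio.Motif. "\s*" after MOTIF accepts both
# "MOTIF 99" and "MOTIF100", so the field positions no longer matter.
HEADER_RE = re.compile(
    r"^MOTIF\s*(?P<name>\S+)\s+"
    r"width\s*=\s*(?P<width>\d+)\s+"
    r"sites\s*=\s*(?P<sites>\d+)\s+"
    r"llr\s*=\s*(?P<llr>-?\d+)\s+"
    r"E-value\s*=\s*(?P<evalue>\S+)"
)


def parse_motif_header(line):
    """Return the fields of a MEME motif header line as a dict."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError("Not a MEME motif header: %r" % line)
    return {
        "name": match.group("name"),
        "width": int(match.group("width")),
        "sites": int(match.group("sites")),
        "llr": int(match.group("llr")),
        "evalue": float(match.group("evalue")),
    }


# The two header lines quoted in this report.
spaced = "MOTIF 99 width = 29 sites = 4 llr = 286 E-value = 4.0e-016"
fused = "MOTIF100 width = 29 sites = 3 llr = 253 E-value = 1.4e-023"

# Plain split() shows the one-token shift that triggers the ValueError.
token_spaced = spaced.split()[4]  # the width field
token_fused = fused.split()[4]    # the literal word 'sites'

header_spaced = parse_motif_header(spaced)
header_fused = parse_motif_header(fused)
```

Matching on the field labels rather than on token positions means the same
code handles both header variants, at the cost of being stricter about the
header layout than a plain split().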