[Biopython-dev] [Bug 3161] New: MEME Parser fails for large MEME files

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Dec 21 14:16:06 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3161

           Summary: MEME Parser fails for large MEME files
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: lpritc at scri.sari.ac.uk


When using the MEME parser on MEME (4.5.0) text output containing more than 99
motifs, the parser fails to read the motif header lines for motifs 100 and above:


In [1]: from Bio import Motif

In [2]: data = list(Motif.parse(open('meme.txt'), 'MEME'))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)

/Volumes/RAID_Mirror/Organisms/Phytophthora
infestans/RXLR/rxlr_meme/purge_clustering/rxlr_full/<ipython console> in
<module>()

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Motif/__init__.pyc
in parse(handle, format)
     76             yield reader(handle)
     77     else: # we have a proper reader
---> 78         for m in parser(handle).motifs:
     79             yield m
     80 

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Motif/Parsers/MEME.pyc
in read(handle)
     39         raise ValueError('Unexpected end of stream')
     40     while True:
---> 41         motif = __create_motif(line)
     42         motif.alphabet = record.alphabet
     43         record.motifs.append(motif)

/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/Bio/Motif/Parsers/MEME.pyc
in __create_motif(line)
    260     ls = line.split()
    261     motif = MEMEMotif()
--> 262     motif.length = int(ls[4])
    263     motif._numoccurrences(ls[7])
    264     motif._evalue(ls[13])

ValueError: invalid literal for int() with base 10: 'sites'

This happens because, for motif numbers greater than 99, there is no
whitespace between 'MOTIF' and the motif number in the motif header, e.g.:

********************************************************************************
MOTIF 99        width =   29   sites =   4   llr = 286   E-value = 4.0e-016
********************************************************************************

********************************************************************************
MOTIF100        width =   29   sites =   3   llr = 253   E-value = 1.4e-023
********************************************************************************

which shifts every token index in the parser's __create_motif function down by
one, so that ls[4] is 'sites' rather than the motif width.  This can be fixed by
stripping the first five characters of the header line (the MOTIF string) and
reducing each index by one:

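def __create_motif(line):
    # Drop the leading 'MOTIF' so the motif number is always a separate token,
    # whether or not whitespace follows the keyword.
    line = line[5:].strip()
    ls = line.split()
    motif = MEMEMotif()
    motif.length = int(ls[3])       # value after 'width ='
    motif._numoccurrences(ls[6])    # value after 'sites ='
    motif._evalue(ls[12])           # value after 'E-value ='
    return motif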
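
The index shift can be reproduced without Biopython installed. The following
is only a minimal sketch using the two header lines quoted above; the helper
name parse_motif_header and its tuple return value are illustrative
assumptions and not part of Bio.Motif.

```python
# Standalone sketch: no Biopython import. parse_motif_header is a
# hypothetical helper used only to illustrate the offset approach.
HEADER_SPACED = "MOTIF 99        width =   29   sites =   4   llr = 286   E-value = 4.0e-016"
HEADER_FUSED = "MOTIF100        width =   29   sites =   3   llr = 253   E-value = 1.4e-023"

# With a plain split(), the fused header has one token fewer,
# so index 4 no longer points at the width.
print(HEADER_SPACED.split()[4])  # 29
print(HEADER_FUSED.split()[4])   # sites


def parse_motif_header(line):
    """Return (width, sites, evalue) from a MEME motif header line."""
    # Drop the literal 'MOTIF' prefix so the motif number is always
    # its own token, with or without whitespace after the keyword.
    tokens = line[5:].split()
    return int(tokens[3]), int(tokens[6]), float(tokens[12])


print(parse_motif_header(HEADER_SPACED))
print(parse_motif_header(HEADER_FUSED))
```

Both headers yield the same token positions once the prefix is removed, which
is why a single set of indices works for motif numbers of any width.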

