[Biopython] Bio.motifs raising Exceptions using pypy

Marco Galardini marco.galardini at unifi.it
Fri Jul 12 09:40:59 UTC 2013


Hi,

I've put together a sample script and sample data that reproduce the issue:

python test.py test.fa test.txt
551 20.9172
-5389 21.0426

pypy test.py test.fa test.txt
551 20.9172
-5389 21.0426
Traceback (most recent call last):
   File "app_main.py", line 72, in run_toplevel
   File "test.py", line 20, in <module>
     for position, score in pssm.search(s.seq, threshold=score_t):
   File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line 354, in search
     score = self.calculate(s)
   File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line 331, in calculate
     score += self[letter][position]
   File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line 113, in __getitem__
     return dict.__getitem__(self, letter)
KeyError: 'N'

Hope this helps. My guess is that it is related to the implementation
of dictionaries in pypy, since the object raising the exception
inherits from dict.
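
Whatever the root cause turns out to be, the failing step is a plain
dictionary lookup of a letter ('N') that is not a key of the matrix. The
following is only a minimal sketch of one way to tolerate such letters,
assuming a PSSM stored as a dict that maps each letter to a list of
per-position scores. The names score_window, search and toy_pssm are
hypothetical and are not part of Bio.motifs; the sketch scans the forward
strand only, whereas the output above also reports negative (reverse
strand) positions.

```python
import math


def score_window(pssm, window):
    """Sum the per-position scores of `window`.

    Returns NaN when the window contains a letter that is not a key
    of the matrix (for example 'N'), instead of raising KeyError.
    """
    total = 0.0
    for position, letter in enumerate(window):
        column = pssm.get(letter)  # None for letters such as 'N'
        if column is None:
            return float("nan")
        total += column[position]
    return total


def search(pssm, sequence, threshold):
    """Yield (position, score) for forward-strand windows scoring >= threshold."""
    width = len(next(iter(pssm.values())))
    sequence = sequence.upper()
    for start in range(len(sequence) - width + 1):
        score = score_window(pssm, sequence[start:start + width])
        # Any comparison with NaN is False, so ambiguous windows are skipped.
        if score >= threshold:
            yield start, score


# Toy 3-column matrix, made up purely for illustration.
toy_pssm = {
    "A": [1.0, -1.0, -1.0],
    "C": [-1.0, 1.0, -1.0],
    "G": [-1.0, -1.0, 1.0],
    "T": [-1.0, -1.0, -1.0],
}

hits = list(search(toy_pssm, "ACGNACGT", threshold=2.0))
print(hits)
```

Windows overlapping the 'N' score NaN and are silently dropped, so only
the two clean ACG windows are reported. Whether skipping is the right
behaviour (rather than, say, scoring 'N' with a neutral value) is a design
choice that depends on the analysis.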

Thanks a lot for the help,
Marco


On 07/11/2013 01:26 PM, Peter Cock wrote:
> On Thu, Jul 11, 2013 at 12:05 PM, Marco Galardini
> <marco.galardini at unifi.it> wrote:
>> Dear Biopython team,
>>
>> I am using the Bio.motifs package to perform a motif search inside DNA
>> sequences; the motif is retrieved from a MEME file.
>>
>> When using python 2.7 the search works just fine (biopython 1.61), even
>> though a bit slow; when using pypy (2.0.2, biopython 1.61+) to speed things
>> up the same script raises an exception, complaining about the presence of
>> "N" chars inside the sequence.
>>
>> Here's the traceback:
>>
>> Traceback (most recent call last):
>>    File "app_main.py", line 72, in run_toplevel
>>    File "test.py", line 20, in <module>
>>      for position, score in pssm.search(s.seq, threshold=score_t):
>>    File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line
>> 354, in search
>>      score = self.calculate(s)
>>    File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line
>> 331, in calculate
>>      score += self[letter][position]
>>    File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line
>> 113, in __getitem__
>>      return dict.__getitem__(self, letter)
>> KeyError: 'N'
>>
>> If needed, I can provide you with the input files and a sample script.
>>
>> Thanks for the help, and keep up with the great work.
>>
>> Marco
> A short test script (which we maybe can turn into another unit
> test for this code) would be great to sort this out. Thanks!
>
> Peter


-- 
-------------------------------------------------
Marco Galardini, PhD
Dipartimento di Biologia
Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)

e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone:  +39 055 4574737
mobile: +39 340 2808041
-------------------------------------------------

-------------- next part --------------
>test
GCGCCGCCGGTCCCCGAAAAAGGCGCCGGACAGTCCGTCCCGCTCATCGGGGTCGCCGCC
TCGTGGGAATCGGATTTCGACACCGGCGAGCCGGTCGGTCTGGAAACGCTTGTCGCCAAG
CGCATGATCGTTCCGACGGAGCGCCCGAAGACAGGCGTGATCGGCACCGCAGTCGGCGCG
GTCGCAAGCGTCATCCCCGATTCGCTGAAGCCCGGAAAAACACCGACCAGCTCGCGGCCG
GAGCTTGACAGGCTGATCAAACATTATGCCGAGCTGAACGGTCTGCCGCTCGAGCTGGTG
CACCGGGTGGTCAGGCGCGAGAGCAACTACAACCCGCGAGCCTACAGCAAAGGCAATTAC
GGGTTGATGCAGATCCGCTACAACACGGCCAAGGGTCTCGGCTATGAGGGCCCGGCCGAA
GGTCTCTTCGACGCGGAAACCAACCTCAAATACGCGACGAAGTACCTGCGCGGAGCGTGG
ATGGTTGCCGACAACCAGCACGACGGCGCGGTAAGGCTCTATGCCAGCGGCTATTATTAC
CATGCCAAGCGTTGATCTGGATCAAAGCTGAATATGAGGTAAGCCGCGACCAGCGGCCGA
TGGCCTATCTGCCAGACATCATTCAATCGAGCGCGTCGATTATCCTCGAATTCAGCTTCT
GCACGTCGTAGCCGAGGCGCGACGGTGTCAGCCCCAGGCGGACGACCGCGAGGCGAAGCG
AGGGGACGATCATGATCGCCTGCCCGTCATGTCCAAGCATCCAGAACGTATCGGGCGGGA
AATTCGCCGTTCCGGCGCGGGTGCCGTTTTCCTGGAGCCAGACCTGGCCTGCCCCGTAGT
CGCCCCCGGAAGCCGCAGTCGGCGTGCGCATGAAGGACACGTAACCTTCCGGCAGGAGCC
GCCTCCCCTTCCAGCTTCCGTCCTGAAGCAGAAACTCGGCGAAGCGCGCCCAGTCCTGTG
CCGACGCATACATGTAGGAAGAGCCGACGAAGGTTCCGCTTGCATCCGTCTCCATAACGG
CGCTCGTCATCCCGAGCGGAGCGAAGAACGCCTCGCGCGGATAGGAAAGCGCTTCGGCCG
GATCGTCGAATGTCTGCATCCANNNCCGGGACAGAAGATTGCTCGTGCCGCTCGAATAGG
CGAATTTCGTGCCCGGAGCCGCCTCCAGCGGCTTCGAGGCGACGAAGCCGGCCATGTCGC
TTTCCCGATAGAGCATACGCGTCACGTCCGTGACGTCGCCGTAATCCTCGTTGAAATCGA
GCCCGCTCTGCATCGCGAGAAGGTCCGTCAGCTTGATGCGAGCCCGGTCATCGCCGTTCC
ATTCGGTCACCAGATTGGTCTGGGCCAGATCCATCCGCCCTTCGGCAATGCGCCGGCCGA
TGATCGCCGCCGTCACGGACTTCGTCATCGACCAGCCGAGCAGGGGCGTGTTCCGGTCGA
AGCCCGCCGCATAGGTCTCCGCGACCAGCCTGCCATCCCTGACGACCACGATTGCACGCA
TGCCCGGACCTGCCAGTGCCGGATCTTCGACAAGCTTTTGAATGGCCGGGTCGATGTCCG
GCTTGTCCCCGTCCGGCCAGTCGAGGCTCGGATCGGGGGCGAGCGGCGCCGTTGCCGACT
CGGTCCCGCGCATCCCCGCGATGGCCTCGGCGCTGCCTCCGCTCACATTGGCGCAACCGC
GGCCCGGACGGTAGACGGCGCGGCCTGGGGCAGCAAAGCCCAGGAGACGCGCCGTCACGC
TCTGCTCTTCCCGATCGACCGAAACGCGCACGAGCTTCAGGAGCGGGTGGCCAGGCGCCT
GCACGTCTTCCTCCAGCACTTCCTGCGGATCGCGTCCCGCGAGGAACACATTGGAGCAGA
CGATCTTGGCGGCATAGCCATCGCCCACCTTGAGGAGTTCAGGCGGGAACAGCGCCAGCC
AGCCAACGAGGCCCGCGAGCGTAGCCACAACCAGCCCGCCAAGCGTCTTCAGCAGACCCT
TCATTCTCGCCCTCCTGCCCTTTGTATAAAGTGCTACAGCGCTTTCGCCCGTCTGACCAG
TGTACATGACTATTGCGTCTTGTATCCGGCAGCAGAGGCTCAGGTGGTGAGGATGACCTC
TCCTCCGGTTTGCCCTTTCGTCGCAAAATGCCGTCACCGCAACCGCTTTGTCGGAAGGGC
CTGGTGGTCGCCGCGACTCTCCTTCGCACCGCTTGCGGGGAGAAGATGCCGGCAGGCAGA
TGAGAGGCAATACCCGAATCCCTGCAAGCCCCTGTGCGAAACCTCGTCATCAAAGTGTAG
CCGAGTCACCTTAGAAGCGGCTCAGTTTCAACTGGACGACAGGCAAGATGACCGACTTCG
CCCCGGATGCCGGCTTCGGCAAGAAGAATCCGAAACTGAAAAGCGCACTCCTGCAGCACA
AAGCTCTCTCCCCCGCCGGTCTCTCCGAACGCCTGTTCGGGCTGCTCTTTTCCGGACTCG
TCTACCCGCAGATCTGGGAGGACCCGATTGTCGACATGGAAGCGATGCAGATCCGTCCCG
GACATCGGATCGTGACGATCGGTTCCGGCGGCTGCAACATGCTGACCTATCTCTCCGCCG
AGCCTGCCCGGATAGACGTGGTCGATCTCAACCCCCATCACATCGCGCTCAACCGGCTGA
AGCTGTCTGCCTTTCGCCACCTGCCGAGCCACAAGGACGTGGTGCGGTTCCTCGCCGTCG
AAGGTACGCGCACGAATGGCCAGGCCTACGACGTGTTCCTCGCGCCGAAGCTCGATCCGG
CAACCCGCGCCTATTGGAACGGCCGAGATCTCACCGGCCGCCGGCGCATCGGCGTCTTCG
GGCGCAACGTTTATCGTACCGGCCTGCTTGGCCGTTTCATTTCCGCCAGCCATGCTCTCG
CACGGCTGCACGGCATCAATCCGGAAGATTTCGTCAAGGCGCGCTCCATGCGCGAGCAGC
GGCAGTTCTTCGACGACAAGCTCGCTCCGCTCTTCGAGCGTCCGGTCATCCGTTGGATCA
CCAGCCGCAAGAGCTCCCTTTTCGGCCTCGGCATCCCGCCGCAGCAGTTCGACGAACTCG
CGAGCCTGAGCCGGGAGAAATCCGTCGCCGCGGTGCTGCGCAATCGCCTGGAAAAGCTGA
CCTGTCATTTCCCCTTGCGCGATAACTACTTCGCCTGGCAGGCCTTTGCACGGCGCTACC
CGCGGCCGGACGAGGGCGAGTTGCCACCTTATCTTCAGGCATCGCGATACGAAGCGATTC
GCGACAATGCGGAGCGCGTCGAGGTCCACCATGCGAGCTTCACGGAGCTTCTCGCCGGCA
AGCCCGCCGCCTCAGTCGACCGCTACGTGCTCCTCGACGCACAGGACTGGATGACCGACC
AGCAGCTGAACGACCTCTGGACGGAGATCACCCGCACCGCCGACGCCGGCGCGGTCGTGA
TCTTCCGCACGGCGGCCGAAGCGAGCATCCTGCCGGGGCGCCTCTCCACCACCCTCCTCG
ATCAGTGGTACTATGATGCCGAGACTTCGATGAGGCTCGGCGCTGAAGACCGGTCGGCGA
TCTATGGCGGCTTCCACATCTACCGGAAGAAAGCATGAGCGCCGTGCAGACCGCGAATGA
AAGCCACGCTCATCTGATGGACCGCATGTATCGCTACCAGCGGTACATCTATGATTTCAC
TCGCAAATACTATCTCTTCGGCCGTGACACGCTGATCCGTGAACTGAACCCGCCGCCAGG
CGCATCGGTGCTGGAAGTCGGCTGCGGCACGGGCCGCAATCTCGCCGTGATCGGGGATCT
CTACCCCGGTGCGCGCCTCTTCGGCCTCGATATCTCGGCCGAAATGCTGGCGACCGCCAA
AGCCAAGCTCCGGCGCCAAAATCGGCCGGACGCAGTGTTGCGGGTCGCCGACGCGACGAA
TTTCACCGCCGCCTCATTCGATCAGGAAGGCTTCGACCGGATCGTCATTTCCTACGCCCT
TTCCATGGTTCCCGAATGGGAAAAGGCGGTCGATGCCGCGATTGCCGCGCTCAAGCCGGG
CGGCTCGCTGCATATCGCCGACTTCGGCCAGCAGGAAGGTTGGCCGGCCGGCTTCCGCCG
CTTCCTCCAGGCCTGGCTCAGACGCTTCCACGTCACGCCGCGCGAAACGCTTTTCGATGT
GATGCGCAAAAGAGCCGAGAGAAACGGAGCGGCGCTCGAGGTCAGATCGCTGAGACGAGG
TTATGCCTGGCTTGTCGTCTATCGCCGCGCGGCACCGTAGCGGACGGTGGCGGATTGCAT
TCGGCTGCAATTCACACTTGAGCTAACGCAATTTTTACGATGATATGGTGAAAAGGAGGT
CACGCCTCCCTGGGGGACATCACCAATCATGGAAACCATCGCGTGAGGCAGGATCGTCGT
TCGTCTCGAAACGGAACCCCCATGCGCCGGCTTCTCCTGGCATTGCTGCCCATCGCCACC
ATTCTCTCCTCCTGTACCTCCACCGATTACGATCTCGTCAAGACGGCCTCCATTCAGCCG
CGCTTCCACGACACCGATCCCCAGGATTTCGGCGGCCGCACGCCGCACCATCACAGCGTT
CACGGGATCGACGTCTCCAAGTGGAACGGCGACATCGATTGGCGGAAGGTTAAGAATTCC
GGGGTGTCCTTCGCGTTCATCAAGGCAACCGAGGGCAAGGACCGGGTGGACTCGCGCTTC
CACGAATATTGGCAGCAGGCGCGCGCCGTCGGCCTCGCCTACGCGCCCTATCATTTCTAT
TATTTCTGCTCCACCGCCGACGCCCAGGCCGACTGGTTCATCGCCAACGTGCCGAAGAGC
GCCGTCCACCTGCCGCCCGTCCTGGATGTCGAATGGAATGGCGAATCCAAGNCCTGCCGT
CACCGGCCGGCGCCGGAAACCGTGCGGTCCGAAATGAAGCGGTTCATGGATCGGCTCGAG
GCCCATTACGGCAAGCGGCCGATCATCTACACGTCCGTCGACTTCCACCATGACAATCTG
GTCGGCGCCTTCAACGACTATCATTTCTGGGTGCGCTCGGTAGCCAAGCACCCGAAGGAC
ATCTACGTCGAACGCCGCTGGGCCTTCTGGCAATATACCAGCACCGGCGTGATCCCCGGC
ATTCAGGGCAGCACGGACATCAACGCCTTCGCCGGTTCCGCCAGGAACTGGCAGAAGTGG
GTCGCGACCGTCTCGCAGGCAAGATAGACCAGAGGACGCGGCGGCATGGTCCGCATTTTC
TTCATTCGGTCATAATGCTCTGAGAGAGCATCGATAGATTTCATTCTCGACAGACTTCGG
GCCCGGCGGCATTCCTGTGCGGCCGGCATGGAAAGGAATTGTAATGACAGCCACAGCGCG
CAAAGCCCTTCTCTCCCTCGGATTCCTTGCGATCGCCGGCGCGCCGGCCCTGGCGCAAGC
TCCGGCTCAACCGGGGAACCCAGCCGCCGCGTGCGGCGGCGACCTCGGCTCCTTTCTGGA
GGGCGTCAAGGCCGAAGCGGTCGCCAAGGGCATCCCCGCAGACGTCGCCGATCGGGCGCT
CGCAGGCGCCGCCATCGACCAGAAGGTGCTGAGCCGCGACCGCGCTCAGGGCGTGTTCAA
GCAGACCTTCACCGAATTTTCGAAGCGTACCGTCAGCAAGTCGCGCCTCGACATCGGTGC
GCAGAAGATGCGGGAATATGCCGACGTCTTTGCCCGGGCCGAGCAGGAGTTCGGCGTACC
GGCGCCCGTGATCACCGCATTCTGGGCCATGGAGACCGACTTCGGCGCCGTGCAGGGCGA
TTTCAATACGCGTGATGCGCTGGTGACGCTGGCGCATGACTGCCGCCGCCCGGAAATGTT
CCGGCCGCAGCTTCTCGCCGCAATCGAGATGGTGCAGCACGGCGATCTCGATCCCGCCGC
GACCACCGGCGCCTGGGCGGGCGAGATCGGTCAGGTACAGATGCTGCCTGAGGACATCAT
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 454 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20130712/ac7df2da/attachment-0002.py>
-------------- next part --------------
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.9.0 (Release date: Wed Oct  3 11:07:26 EST 2012)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= FixK-ovl.faa
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length  
-------------            ------ ------  -------------            ------ ------  
TEST0625;                 1.0000    500  TEST0633;                 1.0000    500  
TEST0661;                 1.0000    466  TEST0667;                 1.0000    500  
TEST0682;                 1.0000    305  TEST0684;                 1.0000    500  
TEST0690;                 1.0000    500  TEST0693;                 1.0000    500  
TEST0760;                 1.0000    148  TEST0765;                 1.0000    202  
TEST1086;                 1.0000    201  TEST1087;                 1.0000    201  
TEST1093;                 1.0000    353  TEST1100;                 1.0000    470  
TEST1118;                 1.0000    500  TEST1131;                 1.0000    500  
TEST1134;                 1.0000    147  TEST1136;                 1.0000    395  
TEST1146;                 1.0000    239  TEST1147;                 1.0000    177  
TEST1149;                 1.0000    237  TEST1151;                 1.0000    245  
TEST1153;                 1.0000    245  TEST1163;                 1.0000    229  
TEST1166;                 1.0000    214  TEST1169;                 1.0000    183  
TEST1176;                 1.0000    379  TEST1179;                 1.0000    271  
TEST1201;                 1.0000    336  TEST1207;                 1.0000    173  
TEST1211;                 1.0000    328  TEST1220;                 1.0000    414  
TEST1226;                 1.0000    198  TEST1231;                 1.0000    333  
TEST1241;                 1.0000    359  TEST1243;                 1.0000    210  
TEST1266;                 1.0000    500  TEST1279;                 1.0000    500  
TEST1283;                 1.0000    500  TEST1296;                 1.0000    347  
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme -dna test.faa -oc zoops -mod zoops -w 14 -cons TTGANNNNNNTCAA -pal -bfile test.ntfreq 

model:  mod=         zoops    nmotifs=         1    evt=           inf
object function=  E-value of product of p-values
width:  minw=           14    maxw=           14    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        2    maxsites=       40    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
global: substring=      no    branching=      no    wbranch=        no
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=           13505    N=              40
strands: +
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.215 C 0.285 G 0.285 T 0.214 
Background letter frequencies (from Rm1021.ntfreq):
A 0.189 C 0.311 G 0.311 T 0.189 
********************************************************************************


********************************************************************************
MOTIF  1	width =   14   sites =  35   llr = 428   E-value = 2.1e-064
********************************************************************************
--------------------------------------------------------------------------------
	Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  :::9:12316::aa
pos.-specific     C  :::1263231:a::
probability       G  ::a:1323621:::
matrix            T  aa::61321:9:::

         bits    2.4               
                 2.2 **          **
                 1.9 **          **
                 1.7 ****      ****
Relative         1.4 ****      ****
Entropy          1.2 ****      ****
(17.7 bits)      1.0 ****      ****
                 0.7 *****    *****
                 0.5 *****    *****
                 0.2 ******  ******
                 0.0 --------------

Multilevel           TTGATCTAGATCAA
consensus                CGCGCG    
sequence                   AT      
                                   
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name             Start   P-value                 Site   
-------------             ----- ---------            --------------
TEST1220;                    209  3.97e-09 TCCAAAGCAC TTGATCTGGATCAA GGTGCCCAAG
TEST0682;                    114  2.35e-08 GGTCATAGGT TTGATCGGGATCAA CGACGCGGCG
TEST1207;                      5  2.77e-08       CTAT TTGACCAAGATCAA CTTACCGAAA
TEST0633;                    189  3.69e-08 CCGCCTGGAT TTGATGGAGATCAA TGCGCAGAAG
TEST1136;                    146  5.60e-08 TTCCACGGCT TTGATGAACATCAA TGACGGGCCA
TEST1169;                     37  7.91e-08 GAGATCCACT TTGAGCTTGATCAA GGAGTTTCCG
TEST1131;                    115  7.91e-08 AGCTTGTTGT TTGATACAGATCAA GTTCACGGAT
TEST1231;                    155  1.21e-07 CGCGACAGTA TTGACCGTGATCAA TGTAGCCGCC
TEST1087;                     55  1.21e-07 GAGCAGGAGA TTGATGTTGGTCAA AGAATTGTCT
TEST1086;                     34  1.21e-07 AGACAATTCT TTGACCAACATCAA TCTCCTGCTC
TEST0693;                     92  1.21e-07 CGACAAGTCG TTGATCGTGGTCAA GAACGAGAAA
TEST0667;                    249  1.21e-07 CCTATCGATA TTGACCACGATCAA TGCCACCGAC
TEST1211;                    150  1.79e-07 GGCCGCAGAC TTGACGCAGATCAA GGTGAACAGC
TEST0661;                    162  1.96e-07 TTGACCATTG TTGATCACAATCAA CGACTCAACC
TEST1100;                    309  2.51e-07 AAACGGCCCT TTGATCAGCGTCAA TGCTTCTCGC
TEST1166;                     51  3.38e-07 ATCGATTCTT TTGAGGCAGATCAA AGCCCTCGCG
TEST1201;                    160  3.94e-07 CCAACGGTTG TTGATCTGGAACAA TGATCGGTTT
TEST0625;                    336  3.94e-07 CCCACGGTTG TTGATCTGGAACAA TGGTTGGTTC
TEST1146;                     71  4.56e-07 GACTTTTTGT TTGAGCGCGATCAA AGCACCGTCG
TEST1279;                    346  5.50e-07 GGACCGGTCT TTGATCGAGAGCAA AGAGCCGGCC
TEST1176;                    176  7.41e-07 GAAGAGTAGA TTGATCCGGAACAA TGCGCTCCAT
TEST1153;                     62  7.88e-07 ATGCTGCGCT TTGATGTGCCTCAA TGACGGCGGG
TEST1151;                     71  7.88e-07 CCCGCCGTCA TTGAGGCACATCAA AGCGCAGCAT
TEST1296;                    125  1.03e-06 ATGCCCTTCT TTGATGCCCGTCAA GGAACGCTGG
TEST1243;                     22  1.27e-06 CGGTGGCTAT TTGACAAGCATCAA AGAGCAGGTG
TEST1241;                    132  1.45e-06 TGCCGAGTAA TTGACGGAAATCAA TTTCTCGGAA
TEST1118;                    232  1.62e-06 CACCCGGTCT TTGACGCCGGTCAA TGAGGCTGCC
TEST1179;                     92  2.42e-06 TTTAATCAAG TTGATCTGGCGCAA AGAAATTCAT
TEST1226;                     10  3.10e-06  TCTGCCGAG TTGATCTCGCGCAA TGCGGCGCGT
TEST1163;                    140  1.21e-05 TTGCGGGATA TTGCGCAGAATCAA GACAACGGTT
TEST1266;                    318  1.78e-05 TCGACATCCT TTGACATTGCGCAA AGAGGAAGCC
TEST1093;                    181  1.78e-05 GAGCGCACGC AAGATCCAGATCAA ACAAGCCTAG
TEST0690;                    452  2.27e-05 GCTCATGTTG TCGATGCAAGTCAA CGGCTCACTT
TEST0684;                    100  3.80e-05 TGTTGCCGCA TCGAGCATTGTCAA TCTCAGATGC
TEST1149;                    162  1.18e-04 AATTCTTTTG ATAATCGGTGTCAA CGATCAGGAG
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
TEST1220;                            4e-09  208_[+1]_192
TEST0682;                          2.3e-08  113_[+1]_178
TEST1207;                          2.8e-08  4_[+1]_155
TEST0633;                          3.7e-08  188_[+1]_298
TEST1136;                          5.6e-08  145_[+1]_236
TEST1169;                          7.9e-08  36_[+1]_133
TEST1131;                          7.9e-08  114_[+1]_372
TEST1231;                          1.2e-07  154_[+1]_165
TEST1087;                          1.2e-07  54_[+1]_133
TEST1086;                          1.2e-07  33_[+1]_154
TEST0693;                          1.2e-07  91_[+1]_395
TEST0667;                          1.2e-07  248_[+1]_238
TEST1211;                          1.8e-07  149_[+1]_165
TEST0661;                            2e-07  161_[+1]_291
TEST1100;                          2.5e-07  308_[+1]_148
TEST1166;                          3.4e-07  50_[+1]_150
TEST1201;                          3.9e-07  159_[+1]_163
TEST0625;                          3.9e-07  335_[+1]_151
TEST1146;                          4.6e-07  70_[+1]_155
TEST1279;                          5.5e-07  345_[+1]_141
TEST1176;                          7.4e-07  175_[+1]_190
TEST1153;                          7.9e-07  61_[+1]_170
TEST1151;                          7.9e-07  70_[+1]_161
TEST1296;                            1e-06  124_[+1]_209
TEST1243;                          1.3e-06  21_[+1]_175
TEST1241;                          1.4e-06  131_[+1]_214
TEST1118;                          1.6e-06  231_[+1]_255
TEST1179;                          2.4e-06  91_[+1]_166
TEST1226;                          3.1e-06  9_[+1]_175
TEST1163;                          1.2e-05  139_[+1]_76
TEST1266;                          1.8e-05  317_[+1]_169
TEST1093;                          1.8e-05  180_[+1]_159
TEST0690;                          2.3e-05  451_[+1]_35
TEST0684;                          3.8e-05  99_[+1]_387
TEST1149;                          0.00012  161_[+1]_62
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=14 seqs=35
TEST1220;                 (  209) TTGATCTGGATCAA  1 
TEST0682;                 (  114) TTGATCGGGATCAA  1 
TEST1207;                 (    5) TTGACCAAGATCAA  1 
TEST0633;                 (  189) TTGATGGAGATCAA  1 
TEST1136;                 (  146) TTGATGAACATCAA  1 
TEST1169;                 (   37) TTGAGCTTGATCAA  1 
TEST1131;                 (  115) TTGATACAGATCAA  1 
TEST1231;                 (  155) TTGACCGTGATCAA  1 
TEST1087;                 (   55) TTGATGTTGGTCAA  1 
TEST1086;                 (   34) TTGACCAACATCAA  1 
TEST0693;                 (   92) TTGATCGTGGTCAA  1 
TEST0667;                 (  249) TTGACCACGATCAA  1 
TEST1211;                 (  150) TTGACGCAGATCAA  1 
TEST0661;                 (  162) TTGATCACAATCAA  1 
TEST1100;                 (  309) TTGATCAGCGTCAA  1 
TEST1166;                 (   51) TTGAGGCAGATCAA  1 
TEST1201;                 (  160) TTGATCTGGAACAA  1 
TEST0625;                 (  336) TTGATCTGGAACAA  1 
TEST1146;                 (   71) TTGAGCGCGATCAA  1 
TEST1279;                 (  346) TTGATCGAGAGCAA  1 
TEST1176;                 (  176) TTGATCCGGAACAA  1 
TEST1153;                 (   62) TTGATGTGCCTCAA  1 
TEST1151;                 (   71) TTGAGGCACATCAA  1 
TEST1296;                 (  125) TTGATGCCCGTCAA  1 
TEST1243;                 (   22) TTGACAAGCATCAA  1 
TEST1241;                 (  132) TTGACGGAAATCAA  1 
TEST1118;                 (  232) TTGACGCCGGTCAA  1 
TEST1179;                 (   92) TTGATCTGGCGCAA  1 
TEST1226;                 (   10) TTGATCTCGCGCAA  1 
TEST1163;                 (  140) TTGCGCAGAATCAA  1 
TEST1266;                 (  318) TTGACATTGCGCAA  1 
TEST1093;                 (  181) AAGATCCAGATCAA  1 
TEST0690;                 (  452) TCGATGCAAGTCAA  1 
TEST0684;                 (  100) TCGAGCATTGTCAA  1 
TEST1149;                 (  162) ATAATCGGTGTCAA  1 
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 14 n= 12985 bayes= 8.63413 E= 2.1e-064 
  -272  -1177  -1177    236 
  -372   -344  -1177    234 
  -372  -1177    166  -1177 
   223   -212  -1177   -214 
 -1177    -36   -112    170 
  -140     98    -27   -173 
    18    -12    -64     67 
    67    -64    -12     18 
  -173    -27     98   -140 
   170   -112    -36  -1180 
  -214  -1179   -212    223 
 -1180    166  -1179   -372 
   234  -1179   -344   -372 
   236  -1179  -1179   -272 
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 14 nsites= 35 E= 2.1e-064 
 0.028571  0.000000  0.000000  0.971429 
 0.014286  0.028571  0.000000  0.957143 
 0.014286  0.000000  0.985714  0.000000 
 0.885714  0.071429  0.000000  0.042857 
 0.000000  0.242857  0.142857  0.614286 
 0.071429  0.614286  0.257143  0.057143 
 0.214284  0.285713  0.199998  0.299999 
 0.299999  0.199999  0.285714  0.214285 
 0.057142  0.257142  0.614285  0.071428 
 0.614285  0.142856  0.242856  0.000000 
 0.042856  0.000000  0.071428  0.885713 
 0.000000  0.985713  0.000000  0.014285 
 0.957142  0.000000  0.028570  0.014285 
 0.971428  0.000000  0.000000  0.028570 
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 regular expression
--------------------------------------------------------------------------------
TTGA[TC][CG][TCA][AGT][GC][AG]TCAA
--------------------------------------------------------------------------------




Time  2.66 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
	Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
TEST0625;                         1.92e-04  278_[+1(1.90e-05)]_43_\
    [+1(3.94e-07)]_151
TEST0633;                         1.80e-05  188_[+1(3.69e-08)]_298
TEST0661;                         8.88e-05  161_[+1(1.96e-07)]_291
TEST0667;                         5.88e-05  248_[+1(1.21e-07)]_238
TEST0682;                         6.86e-06  113_[+1(2.35e-08)]_178
TEST0684;                         1.83e-02  99_[+1(3.80e-05)]_387
TEST0690;                         1.10e-02  451_[+1(2.27e-05)]_35
TEST0693;                         5.88e-05  91_[+1(1.21e-07)]_95_[+1(5.50e-07)]_\
    286
TEST0760;                         3.13e-01  148
TEST0765;                         3.22e-01  202
TEST1086;                         2.27e-05  33_[+1(1.21e-07)]_154
TEST1087;                         2.27e-05  54_[+1(1.21e-07)]_133
TEST1093;                         6.02e-03  180_[+1(1.78e-05)]_159
TEST1100;                         1.15e-04  308_[+1(2.51e-07)]_148
TEST1118;                         7.90e-04  231_[+1(1.62e-06)]_255
TEST1131;                         2.73e-05  114_[+1(7.91e-08)]_197_\
    [+1(5.60e-08)]_161
TEST1134;                         6.15e-01  147
TEST1136;                         2.14e-05  145_[+1(5.60e-08)]_236
TEST1146;                         1.03e-04  70_[+1(4.56e-07)]_155
TEST1147;                         4.86e-01  177
TEST1149;                         2.60e-02  237
TEST1151;                         1.83e-04  70_[+1(7.88e-07)]_161
TEST1153;                         1.83e-04  61_[+1(7.88e-07)]_170
TEST1163;                         2.61e-03  139_[+1(1.21e-05)]_76
TEST1166;                         6.79e-05  50_[+1(3.38e-07)]_150
TEST1169;                         1.34e-05  36_[+1(7.91e-08)]_133
TEST1176;                         2.71e-04  175_[+1(7.41e-07)]_190
TEST1179;                         6.24e-04  36_[+1(6.46e-05)]_41_[+1(2.42e-06)]_\
    166
TEST1201;                         1.27e-04  159_[+1(3.94e-07)]_163
TEST1207;                         4.44e-06  4_[+1(2.77e-08)]_155
TEST1211;                         5.65e-05  149_[+1(1.79e-07)]_165
TEST1220;                         1.59e-06  208_[+1(3.97e-09)]_192
TEST1226;                         5.74e-04  9_[+1(3.10e-06)]_175
TEST1231;                         3.86e-05  154_[+1(1.21e-07)]_165
TEST1241;                         5.01e-04  131_[+1(1.45e-06)]_214
TEST1243;                         2.51e-04  21_[+1(1.27e-06)]_175
TEST1266;                         8.62e-03  317_[+1(1.78e-05)]_169
TEST1279;                         2.68e-04  345_[+1(5.50e-07)]_141
TEST1283;                         3.03e-01  500
TEST1296;                         3.44e-04  124_[+1(1.03e-06)]_209
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 1 reached.
********************************************************************************

CPU: pino

********************************************************************************

