[Biopython] Bio.motifs raising Exceptions using pypy
Marco Galardini
marco.galardini at unifi.it
Fri Jul 12 09:40:59 UTC 2013
Hi,
I've put together a sample script and sample data to reproduce the issue:
python test.py test.fa test.txt
551 20.9172
-5389 21.0426
pypy test.py test.fa test.txt
551 20.9172
-5389 21.0426
Traceback (most recent call last):
  File "app_main.py", line 72, in run_toplevel
  File "test.py", line 20, in <module>
    for position, score in pssm.search(s.seq, threshold=score_t):
  File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line 354, in search
    score = self.calculate(s)
  File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line 331, in calculate
    score += self[letter][position]
  File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line 113, in __getitem__
    return dict.__getitem__(self, letter)
KeyError: 'N'
I hope this helps. My guess is that the problem is related to the
implementation of dictionaries in pypy, since the object raising the
exception inherits from dict.
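
In the meantime, a stopgap along the following lines may avoid the KeyError.
This is only a minimal sketch under stated assumptions: ToyMatrix is a made-up
stand-in written for illustration, not the Bio.motifs class, and it scans the
forward strand only. The idea is to treat any window containing a letter that
has no row in the matrix (such as 'N') as unscorable and return NaN for it, so
that the threshold comparison skips it.

```python
# Hypothetical stand-in for a position-specific scoring matrix:
# a dict subclass mapping each letter to a list of per-position scores.
# It is NOT the Bio.motifs implementation, only an illustration.
class ToyMatrix(dict):
    def __init__(self, alphabet, values):
        dict.__init__(self)
        self.alphabet = alphabet
        self.length = len(values[alphabet[0]])
        for letter in alphabet:
            self[letter] = list(values[letter])

    def calculate(self, window):
        """Sum the scores of one window; NaN if a letter has no row."""
        score = 0.0
        for position, letter in enumerate(window):
            try:
                score += self[letter][position]
            except KeyError:
                # 'N' (or any other ambiguous letter) is not in the matrix
                return float("nan")
        return score

    def search(self, sequence, threshold=0.0):
        """Yield (position, score) for forward-strand windows at or above threshold."""
        sequence = str(sequence).upper()
        for start in range(len(sequence) - self.length + 1):
            score = self.calculate(sequence[start:start + self.length])
            # NaN compares False against any threshold, so such windows are skipped
            if score >= threshold:
                yield start, score


# Toy two-column matrix that favours the dinucleotide "AC"
toy = ToyMatrix("ACGT", {
    "A": [1.0, -1.0],
    "C": [-1.0, 1.0],
    "G": [-1.0, -1.0],
    "T": [-1.0, -1.0],
})
hits = list(toy.search("ACNAC", threshold=1.5))
# hits == [(0, 2.0), (3, 2.0)]; the windows "CN" and "NA" are skipped
```

Returning NaN rather than a large negative score keeps the unscorable windows
distinguishable from genuinely poor matches, at the cost of having to remember
that NaN never passes a comparison.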
Thanks a lot for the help,
Marco
On 07/11/2013 01:26 PM, Peter Cock wrote:
> On Thu, Jul 11, 2013 at 12:05 PM, Marco Galardini
> <marco.galardini at unifi.it> wrote:
>> Dear Biopython team,
>>
>> I am using the Bio.motifs package to perform a motif search inside DNA
>> sequences; the motif is retrieved from a MEME file.
>>
>> When using python 2.7 the search works just fine (biopython 1.61), even
>> though a bit slow; when using pypy (2.0.2, biopython 1.61+) to speed things
>> up the same script raises an exception, complaining about the presence of
>> "N" chars inside the sequence.
>>
>> Here's the traceback:
>>
>> Traceback (most recent call last):
>> File "app_main.py", line 72, in run_toplevel
>> File "test.py", line 20, in <module>
>> for position, score in pssm.search(s.seq, threshold=score_t):
>> File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line
>> 354, in search
>> score = self.calculate(s)
>> File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line
>> 331, in calculate
>> score += self[letter][position]
>> File "/usr/local/lib/pypy2.7/dist-packages/Bio/motifs/matrix.py", line
>> 113, in __getitem__
>> return dict.__getitem__(self, letter)
>> KeyError: 'N'
>>
>> If needed, I can provide you with the input files and a sample script.
>>
>> Thanks for the help, and keep up with the great work.
>>
>> Marco
> A short test script (which we maybe can turn into another unit
> test for this code) would be great to sort this out. Thanks!
>
> Peter
--
-------------------------------------------------
Marco Galardini, PhD
Dipartimento di Biologia
Via Madonna del Piano, 6 - 50019 Sesto Fiorentino (FI)
e-mail: marco.galardini at unifi.it
www: http://www.unifi.it/dblage/CMpro-v-p-51.html
phone: +39 055 4574737
mobile: +39 340 2808041
-------------------------------------------------
-------------- next part --------------
>test
GCGCCGCCGGTCCCCGAAAAAGGCGCCGGACAGTCCGTCCCGCTCATCGGGGTCGCCGCC
TCGTGGGAATCGGATTTCGACACCGGCGAGCCGGTCGGTCTGGAAACGCTTGTCGCCAAG
CGCATGATCGTTCCGACGGAGCGCCCGAAGACAGGCGTGATCGGCACCGCAGTCGGCGCG
GTCGCAAGCGTCATCCCCGATTCGCTGAAGCCCGGAAAAACACCGACCAGCTCGCGGCCG
GAGCTTGACAGGCTGATCAAACATTATGCCGAGCTGAACGGTCTGCCGCTCGAGCTGGTG
CACCGGGTGGTCAGGCGCGAGAGCAACTACAACCCGCGAGCCTACAGCAAAGGCAATTAC
GGGTTGATGCAGATCCGCTACAACACGGCCAAGGGTCTCGGCTATGAGGGCCCGGCCGAA
GGTCTCTTCGACGCGGAAACCAACCTCAAATACGCGACGAAGTACCTGCGCGGAGCGTGG
ATGGTTGCCGACAACCAGCACGACGGCGCGGTAAGGCTCTATGCCAGCGGCTATTATTAC
CATGCCAAGCGTTGATCTGGATCAAAGCTGAATATGAGGTAAGCCGCGACCAGCGGCCGA
TGGCCTATCTGCCAGACATCATTCAATCGAGCGCGTCGATTATCCTCGAATTCAGCTTCT
GCACGTCGTAGCCGAGGCGCGACGGTGTCAGCCCCAGGCGGACGACCGCGAGGCGAAGCG
AGGGGACGATCATGATCGCCTGCCCGTCATGTCCAAGCATCCAGAACGTATCGGGCGGGA
AATTCGCCGTTCCGGCGCGGGTGCCGTTTTCCTGGAGCCAGACCTGGCCTGCCCCGTAGT
CGCCCCCGGAAGCCGCAGTCGGCGTGCGCATGAAGGACACGTAACCTTCCGGCAGGAGCC
GCCTCCCCTTCCAGCTTCCGTCCTGAAGCAGAAACTCGGCGAAGCGCGCCCAGTCCTGTG
CCGACGCATACATGTAGGAAGAGCCGACGAAGGTTCCGCTTGCATCCGTCTCCATAACGG
CGCTCGTCATCCCGAGCGGAGCGAAGAACGCCTCGCGCGGATAGGAAAGCGCTTCGGCCG
GATCGTCGAATGTCTGCATCCANNNCCGGGACAGAAGATTGCTCGTGCCGCTCGAATAGG
CGAATTTCGTGCCCGGAGCCGCCTCCAGCGGCTTCGAGGCGACGAAGCCGGCCATGTCGC
TTTCCCGATAGAGCATACGCGTCACGTCCGTGACGTCGCCGTAATCCTCGTTGAAATCGA
GCCCGCTCTGCATCGCGAGAAGGTCCGTCAGCTTGATGCGAGCCCGGTCATCGCCGTTCC
ATTCGGTCACCAGATTGGTCTGGGCCAGATCCATCCGCCCTTCGGCAATGCGCCGGCCGA
TGATCGCCGCCGTCACGGACTTCGTCATCGACCAGCCGAGCAGGGGCGTGTTCCGGTCGA
AGCCCGCCGCATAGGTCTCCGCGACCAGCCTGCCATCCCTGACGACCACGATTGCACGCA
TGCCCGGACCTGCCAGTGCCGGATCTTCGACAAGCTTTTGAATGGCCGGGTCGATGTCCG
GCTTGTCCCCGTCCGGCCAGTCGAGGCTCGGATCGGGGGCGAGCGGCGCCGTTGCCGACT
CGGTCCCGCGCATCCCCGCGATGGCCTCGGCGCTGCCTCCGCTCACATTGGCGCAACCGC
GGCCCGGACGGTAGACGGCGCGGCCTGGGGCAGCAAAGCCCAGGAGACGCGCCGTCACGC
TCTGCTCTTCCCGATCGACCGAAACGCGCACGAGCTTCAGGAGCGGGTGGCCAGGCGCCT
GCACGTCTTCCTCCAGCACTTCCTGCGGATCGCGTCCCGCGAGGAACACATTGGAGCAGA
CGATCTTGGCGGCATAGCCATCGCCCACCTTGAGGAGTTCAGGCGGGAACAGCGCCAGCC
AGCCAACGAGGCCCGCGAGCGTAGCCACAACCAGCCCGCCAAGCGTCTTCAGCAGACCCT
TCATTCTCGCCCTCCTGCCCTTTGTATAAAGTGCTACAGCGCTTTCGCCCGTCTGACCAG
TGTACATGACTATTGCGTCTTGTATCCGGCAGCAGAGGCTCAGGTGGTGAGGATGACCTC
TCCTCCGGTTTGCCCTTTCGTCGCAAAATGCCGTCACCGCAACCGCTTTGTCGGAAGGGC
CTGGTGGTCGCCGCGACTCTCCTTCGCACCGCTTGCGGGGAGAAGATGCCGGCAGGCAGA
TGAGAGGCAATACCCGAATCCCTGCAAGCCCCTGTGCGAAACCTCGTCATCAAAGTGTAG
CCGAGTCACCTTAGAAGCGGCTCAGTTTCAACTGGACGACAGGCAAGATGACCGACTTCG
CCCCGGATGCCGGCTTCGGCAAGAAGAATCCGAAACTGAAAAGCGCACTCCTGCAGCACA
AAGCTCTCTCCCCCGCCGGTCTCTCCGAACGCCTGTTCGGGCTGCTCTTTTCCGGACTCG
TCTACCCGCAGATCTGGGAGGACCCGATTGTCGACATGGAAGCGATGCAGATCCGTCCCG
GACATCGGATCGTGACGATCGGTTCCGGCGGCTGCAACATGCTGACCTATCTCTCCGCCG
AGCCTGCCCGGATAGACGTGGTCGATCTCAACCCCCATCACATCGCGCTCAACCGGCTGA
AGCTGTCTGCCTTTCGCCACCTGCCGAGCCACAAGGACGTGGTGCGGTTCCTCGCCGTCG
AAGGTACGCGCACGAATGGCCAGGCCTACGACGTGTTCCTCGCGCCGAAGCTCGATCCGG
CAACCCGCGCCTATTGGAACGGCCGAGATCTCACCGGCCGCCGGCGCATCGGCGTCTTCG
GGCGCAACGTTTATCGTACCGGCCTGCTTGGCCGTTTCATTTCCGCCAGCCATGCTCTCG
CACGGCTGCACGGCATCAATCCGGAAGATTTCGTCAAGGCGCGCTCCATGCGCGAGCAGC
GGCAGTTCTTCGACGACAAGCTCGCTCCGCTCTTCGAGCGTCCGGTCATCCGTTGGATCA
CCAGCCGCAAGAGCTCCCTTTTCGGCCTCGGCATCCCGCCGCAGCAGTTCGACGAACTCG
CGAGCCTGAGCCGGGAGAAATCCGTCGCCGCGGTGCTGCGCAATCGCCTGGAAAAGCTGA
CCTGTCATTTCCCCTTGCGCGATAACTACTTCGCCTGGCAGGCCTTTGCACGGCGCTACC
CGCGGCCGGACGAGGGCGAGTTGCCACCTTATCTTCAGGCATCGCGATACGAAGCGATTC
GCGACAATGCGGAGCGCGTCGAGGTCCACCATGCGAGCTTCACGGAGCTTCTCGCCGGCA
AGCCCGCCGCCTCAGTCGACCGCTACGTGCTCCTCGACGCACAGGACTGGATGACCGACC
AGCAGCTGAACGACCTCTGGACGGAGATCACCCGCACCGCCGACGCCGGCGCGGTCGTGA
TCTTCCGCACGGCGGCCGAAGCGAGCATCCTGCCGGGGCGCCTCTCCACCACCCTCCTCG
ATCAGTGGTACTATGATGCCGAGACTTCGATGAGGCTCGGCGCTGAAGACCGGTCGGCGA
TCTATGGCGGCTTCCACATCTACCGGAAGAAAGCATGAGCGCCGTGCAGACCGCGAATGA
AAGCCACGCTCATCTGATGGACCGCATGTATCGCTACCAGCGGTACATCTATGATTTCAC
TCGCAAATACTATCTCTTCGGCCGTGACACGCTGATCCGTGAACTGAACCCGCCGCCAGG
CGCATCGGTGCTGGAAGTCGGCTGCGGCACGGGCCGCAATCTCGCCGTGATCGGGGATCT
CTACCCCGGTGCGCGCCTCTTCGGCCTCGATATCTCGGCCGAAATGCTGGCGACCGCCAA
AGCCAAGCTCCGGCGCCAAAATCGGCCGGACGCAGTGTTGCGGGTCGCCGACGCGACGAA
TTTCACCGCCGCCTCATTCGATCAGGAAGGCTTCGACCGGATCGTCATTTCCTACGCCCT
TTCCATGGTTCCCGAATGGGAAAAGGCGGTCGATGCCGCGATTGCCGCGCTCAAGCCGGG
CGGCTCGCTGCATATCGCCGACTTCGGCCAGCAGGAAGGTTGGCCGGCCGGCTTCCGCCG
CTTCCTCCAGGCCTGGCTCAGACGCTTCCACGTCACGCCGCGCGAAACGCTTTTCGATGT
GATGCGCAAAAGAGCCGAGAGAAACGGAGCGGCGCTCGAGGTCAGATCGCTGAGACGAGG
TTATGCCTGGCTTGTCGTCTATCGCCGCGCGGCACCGTAGCGGACGGTGGCGGATTGCAT
TCGGCTGCAATTCACACTTGAGCTAACGCAATTTTTACGATGATATGGTGAAAAGGAGGT
CACGCCTCCCTGGGGGACATCACCAATCATGGAAACCATCGCGTGAGGCAGGATCGTCGT
TCGTCTCGAAACGGAACCCCCATGCGCCGGCTTCTCCTGGCATTGCTGCCCATCGCCACC
ATTCTCTCCTCCTGTACCTCCACCGATTACGATCTCGTCAAGACGGCCTCCATTCAGCCG
CGCTTCCACGACACCGATCCCCAGGATTTCGGCGGCCGCACGCCGCACCATCACAGCGTT
CACGGGATCGACGTCTCCAAGTGGAACGGCGACATCGATTGGCGGAAGGTTAAGAATTCC
GGGGTGTCCTTCGCGTTCATCAAGGCAACCGAGGGCAAGGACCGGGTGGACTCGCGCTTC
CACGAATATTGGCAGCAGGCGCGCGCCGTCGGCCTCGCCTACGCGCCCTATCATTTCTAT
TATTTCTGCTCCACCGCCGACGCCCAGGCCGACTGGTTCATCGCCAACGTGCCGAAGAGC
GCCGTCCACCTGCCGCCCGTCCTGGATGTCGAATGGAATGGCGAATCCAAGNCCTGCCGT
CACCGGCCGGCGCCGGAAACCGTGCGGTCCGAAATGAAGCGGTTCATGGATCGGCTCGAG
GCCCATTACGGCAAGCGGCCGATCATCTACACGTCCGTCGACTTCCACCATGACAATCTG
GTCGGCGCCTTCAACGACTATCATTTCTGGGTGCGCTCGGTAGCCAAGCACCCGAAGGAC
ATCTACGTCGAACGCCGCTGGGCCTTCTGGCAATATACCAGCACCGGCGTGATCCCCGGC
ATTCAGGGCAGCACGGACATCAACGCCTTCGCCGGTTCCGCCAGGAACTGGCAGAAGTGG
GTCGCGACCGTCTCGCAGGCAAGATAGACCAGAGGACGCGGCGGCATGGTCCGCATTTTC
TTCATTCGGTCATAATGCTCTGAGAGAGCATCGATAGATTTCATTCTCGACAGACTTCGG
GCCCGGCGGCATTCCTGTGCGGCCGGCATGGAAAGGAATTGTAATGACAGCCACAGCGCG
CAAAGCCCTTCTCTCCCTCGGATTCCTTGCGATCGCCGGCGCGCCGGCCCTGGCGCAAGC
TCCGGCTCAACCGGGGAACCCAGCCGCCGCGTGCGGCGGCGACCTCGGCTCCTTTCTGGA
GGGCGTCAAGGCCGAAGCGGTCGCCAAGGGCATCCCCGCAGACGTCGCCGATCGGGCGCT
CGCAGGCGCCGCCATCGACCAGAAGGTGCTGAGCCGCGACCGCGCTCAGGGCGTGTTCAA
GCAGACCTTCACCGAATTTTCGAAGCGTACCGTCAGCAAGTCGCGCCTCGACATCGGTGC
GCAGAAGATGCGGGAATATGCCGACGTCTTTGCCCGGGCCGAGCAGGAGTTCGGCGTACC
GGCGCCCGTGATCACCGCATTCTGGGCCATGGAGACCGACTTCGGCGCCGTGCAGGGCGA
TTTCAATACGCGTGATGCGCTGGTGACGCTGGCGCATGACTGCCGCCGCCCGGAAATGTT
CCGGCCGCAGCTTCTCGCCGCAATCGAGATGGTGCAGCACGGCGATCTCGATCCCGCCGC
GACCACCGGCGCCTGGGCGGGCGAGATCGGTCAGGTACAGATGCTGCCTGAGGACATCAT
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.py
Type: text/x-python
Size: 454 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython/attachments/20130712/ac7df2da/attachment-0002.py>
-------------- next part --------------
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 4.9.0 (Release date: Wed Oct 3 11:07:26 EST 2012)
For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.nbcr.net.
This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs. MAST is available
for interactive use and downloading at http://meme.nbcr.net.
********************************************************************************
********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:
Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************
********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= FixK-ovl.faa
ALPHABET= ACGT
Sequence name Weight Length Sequence name Weight Length
------------- ------ ------ ------------- ------ ------
TEST0625; 1.0000 500 TEST0633; 1.0000 500
TEST0661; 1.0000 466 TEST0667; 1.0000 500
TEST0682; 1.0000 305 TEST0684; 1.0000 500
TEST0690; 1.0000 500 TEST0693; 1.0000 500
TEST0760; 1.0000 148 TEST0765; 1.0000 202
TEST1086; 1.0000 201 TEST1087; 1.0000 201
TEST1093; 1.0000 353 TEST1100; 1.0000 470
TEST1118; 1.0000 500 TEST1131; 1.0000 500
TEST1134; 1.0000 147 TEST1136; 1.0000 395
TEST1146; 1.0000 239 TEST1147; 1.0000 177
TEST1149; 1.0000 237 TEST1151; 1.0000 245
TEST1153; 1.0000 245 TEST1163; 1.0000 229
TEST1166; 1.0000 214 TEST1169; 1.0000 183
TEST1176; 1.0000 379 TEST1179; 1.0000 271
TEST1201; 1.0000 336 TEST1207; 1.0000 173
TEST1211; 1.0000 328 TEST1220; 1.0000 414
TEST1226; 1.0000 198 TEST1231; 1.0000 333
TEST1241; 1.0000 359 TEST1243; 1.0000 210
TEST1266; 1.0000 500 TEST1279; 1.0000 500
TEST1283; 1.0000 500 TEST1296; 1.0000 347
********************************************************************************
********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.
command: meme -dna test.faa -oc zoops -mod zoops -w 14 -cons TTGANNNNNNTCAA -pal -bfile test.ntfreq
model: mod= zoops nmotifs= 1 evt= inf
object function= E-value of product of p-values
width: minw= 14 maxw= 14 minic= 0.00
width: wg= 11 ws= 1 endgaps= yes
nsites: minsites= 2 maxsites= 40 wnsites= 0.8
theta: prob= 1 spmap= uni spfuzz= 0.5
global: substring= no branching= no wbranch= no
em: prior= dirichlet b= 0.01 maxiter= 50
distance= 1e-05
data: n= 13505 N= 40
strands: +
sample: seed= 0 seqfrac= 1
Letter frequencies in dataset:
A 0.215 C 0.285 G 0.285 T 0.214
Background letter frequencies (from Rm1021.ntfreq):
A 0.189 C 0.311 G 0.311 T 0.189
********************************************************************************
********************************************************************************
MOTIF 1 width = 14 sites = 35 llr = 428 E-value = 2.1e-064
********************************************************************************
--------------------------------------------------------------------------------
Motif 1 Description
--------------------------------------------------------------------------------
Simplified A :::9:12316::aa
pos.-specific C :::1263231:a::
probability G ::a:1323621:::
matrix T aa::61321:9:::
bits 2.4
2.2 ** **
1.9 ** **
1.7 **** ****
Relative 1.4 **** ****
Entropy 1.2 **** ****
(17.7 bits) 1.0 **** ****
0.7 ***** *****
0.5 ***** *****
0.2 ****** ******
0.0 --------------
Multilevel TTGATCTAGATCAA
consensus CGCGCG
sequence AT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Start P-value Site
------------- ----- --------- --------------
TEST1220; 209 3.97e-09 TCCAAAGCAC TTGATCTGGATCAA GGTGCCCAAG
TEST0682; 114 2.35e-08 GGTCATAGGT TTGATCGGGATCAA CGACGCGGCG
TEST1207; 5 2.77e-08 CTAT TTGACCAAGATCAA CTTACCGAAA
TEST0633; 189 3.69e-08 CCGCCTGGAT TTGATGGAGATCAA TGCGCAGAAG
TEST1136; 146 5.60e-08 TTCCACGGCT TTGATGAACATCAA TGACGGGCCA
TEST1169; 37 7.91e-08 GAGATCCACT TTGAGCTTGATCAA GGAGTTTCCG
TEST1131; 115 7.91e-08 AGCTTGTTGT TTGATACAGATCAA GTTCACGGAT
TEST1231; 155 1.21e-07 CGCGACAGTA TTGACCGTGATCAA TGTAGCCGCC
TEST1087; 55 1.21e-07 GAGCAGGAGA TTGATGTTGGTCAA AGAATTGTCT
TEST1086; 34 1.21e-07 AGACAATTCT TTGACCAACATCAA TCTCCTGCTC
TEST0693; 92 1.21e-07 CGACAAGTCG TTGATCGTGGTCAA GAACGAGAAA
TEST0667; 249 1.21e-07 CCTATCGATA TTGACCACGATCAA TGCCACCGAC
TEST1211; 150 1.79e-07 GGCCGCAGAC TTGACGCAGATCAA GGTGAACAGC
TEST0661; 162 1.96e-07 TTGACCATTG TTGATCACAATCAA CGACTCAACC
TEST1100; 309 2.51e-07 AAACGGCCCT TTGATCAGCGTCAA TGCTTCTCGC
TEST1166; 51 3.38e-07 ATCGATTCTT TTGAGGCAGATCAA AGCCCTCGCG
TEST1201; 160 3.94e-07 CCAACGGTTG TTGATCTGGAACAA TGATCGGTTT
TEST0625; 336 3.94e-07 CCCACGGTTG TTGATCTGGAACAA TGGTTGGTTC
TEST1146; 71 4.56e-07 GACTTTTTGT TTGAGCGCGATCAA AGCACCGTCG
TEST1279; 346 5.50e-07 GGACCGGTCT TTGATCGAGAGCAA AGAGCCGGCC
TEST1176; 176 7.41e-07 GAAGAGTAGA TTGATCCGGAACAA TGCGCTCCAT
TEST1153; 62 7.88e-07 ATGCTGCGCT TTGATGTGCCTCAA TGACGGCGGG
TEST1151; 71 7.88e-07 CCCGCCGTCA TTGAGGCACATCAA AGCGCAGCAT
TEST1296; 125 1.03e-06 ATGCCCTTCT TTGATGCCCGTCAA GGAACGCTGG
TEST1243; 22 1.27e-06 CGGTGGCTAT TTGACAAGCATCAA AGAGCAGGTG
TEST1241; 132 1.45e-06 TGCCGAGTAA TTGACGGAAATCAA TTTCTCGGAA
TEST1118; 232 1.62e-06 CACCCGGTCT TTGACGCCGGTCAA TGAGGCTGCC
TEST1179; 92 2.42e-06 TTTAATCAAG TTGATCTGGCGCAA AGAAATTCAT
TEST1226; 10 3.10e-06 TCTGCCGAG TTGATCTCGCGCAA TGCGGCGCGT
TEST1163; 140 1.21e-05 TTGCGGGATA TTGCGCAGAATCAA GACAACGGTT
TEST1266; 318 1.78e-05 TCGACATCCT TTGACATTGCGCAA AGAGGAAGCC
TEST1093; 181 1.78e-05 GAGCGCACGC AAGATCCAGATCAA ACAAGCCTAG
TEST0690; 452 2.27e-05 GCTCATGTTG TCGATGCAAGTCAA CGGCTCACTT
TEST0684; 100 3.80e-05 TGTTGCCGCA TCGAGCATTGTCAA TCTCAGATGC
TEST1149; 162 1.18e-04 AATTCTTTTG ATAATCGGTGTCAA CGATCAGGAG
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
TEST1220; 4e-09 208_[+1]_192
TEST0682; 2.3e-08 113_[+1]_178
TEST1207; 2.8e-08 4_[+1]_155
TEST0633; 3.7e-08 188_[+1]_298
TEST1136; 5.6e-08 145_[+1]_236
TEST1169; 7.9e-08 36_[+1]_133
TEST1131; 7.9e-08 114_[+1]_372
TEST1231; 1.2e-07 154_[+1]_165
TEST1087; 1.2e-07 54_[+1]_133
TEST1086; 1.2e-07 33_[+1]_154
TEST0693; 1.2e-07 91_[+1]_395
TEST0667; 1.2e-07 248_[+1]_238
TEST1211; 1.8e-07 149_[+1]_165
TEST0661; 2e-07 161_[+1]_291
TEST1100; 2.5e-07 308_[+1]_148
TEST1166; 3.4e-07 50_[+1]_150
TEST1201; 3.9e-07 159_[+1]_163
TEST0625; 3.9e-07 335_[+1]_151
TEST1146; 4.6e-07 70_[+1]_155
TEST1279; 5.5e-07 345_[+1]_141
TEST1176; 7.4e-07 175_[+1]_190
TEST1153; 7.9e-07 61_[+1]_170
TEST1151; 7.9e-07 70_[+1]_161
TEST1296; 1e-06 124_[+1]_209
TEST1243; 1.3e-06 21_[+1]_175
TEST1241; 1.4e-06 131_[+1]_214
TEST1118; 1.6e-06 231_[+1]_255
TEST1179; 2.4e-06 91_[+1]_166
TEST1226; 3.1e-06 9_[+1]_175
TEST1163; 1.2e-05 139_[+1]_76
TEST1266; 1.8e-05 317_[+1]_169
TEST1093; 1.8e-05 180_[+1]_159
TEST0690; 2.3e-05 451_[+1]_35
TEST0684; 3.8e-05 99_[+1]_387
TEST1149; 0.00012 161_[+1]_62
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 1 width=14 seqs=35
TEST1220; ( 209) TTGATCTGGATCAA 1
TEST0682; ( 114) TTGATCGGGATCAA 1
TEST1207; ( 5) TTGACCAAGATCAA 1
TEST0633; ( 189) TTGATGGAGATCAA 1
TEST1136; ( 146) TTGATGAACATCAA 1
TEST1169; ( 37) TTGAGCTTGATCAA 1
TEST1131; ( 115) TTGATACAGATCAA 1
TEST1231; ( 155) TTGACCGTGATCAA 1
TEST1087; ( 55) TTGATGTTGGTCAA 1
TEST1086; ( 34) TTGACCAACATCAA 1
TEST0693; ( 92) TTGATCGTGGTCAA 1
TEST0667; ( 249) TTGACCACGATCAA 1
TEST1211; ( 150) TTGACGCAGATCAA 1
TEST0661; ( 162) TTGATCACAATCAA 1
TEST1100; ( 309) TTGATCAGCGTCAA 1
TEST1166; ( 51) TTGAGGCAGATCAA 1
TEST1201; ( 160) TTGATCTGGAACAA 1
TEST0625; ( 336) TTGATCTGGAACAA 1
TEST1146; ( 71) TTGAGCGCGATCAA 1
TEST1279; ( 346) TTGATCGAGAGCAA 1
TEST1176; ( 176) TTGATCCGGAACAA 1
TEST1153; ( 62) TTGATGTGCCTCAA 1
TEST1151; ( 71) TTGAGGCACATCAA 1
TEST1296; ( 125) TTGATGCCCGTCAA 1
TEST1243; ( 22) TTGACAAGCATCAA 1
TEST1241; ( 132) TTGACGGAAATCAA 1
TEST1118; ( 232) TTGACGCCGGTCAA 1
TEST1179; ( 92) TTGATCTGGCGCAA 1
TEST1226; ( 10) TTGATCTCGCGCAA 1
TEST1163; ( 140) TTGCGCAGAATCAA 1
TEST1266; ( 318) TTGACATTGCGCAA 1
TEST1093; ( 181) AAGATCCAGATCAA 1
TEST0690; ( 452) TCGATGCAAGTCAA 1
TEST0684; ( 100) TCGAGCATTGTCAA 1
TEST1149; ( 162) ATAATCGGTGTCAA 1
//
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 14 n= 12985 bayes= 8.63413 E= 2.1e-064
-272 -1177 -1177 236
-372 -344 -1177 234
-372 -1177 166 -1177
223 -212 -1177 -214
-1177 -36 -112 170
-140 98 -27 -173
18 -12 -64 67
67 -64 -12 18
-173 -27 98 -140
170 -112 -36 -1180
-214 -1179 -212 223
-1180 166 -1179 -372
234 -1179 -344 -372
236 -1179 -1179 -272
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 14 nsites= 35 E= 2.1e-064
0.028571 0.000000 0.000000 0.971429
0.014286 0.028571 0.000000 0.957143
0.014286 0.000000 0.985714 0.000000
0.885714 0.071429 0.000000 0.042857
0.000000 0.242857 0.142857 0.614286
0.071429 0.614286 0.257143 0.057143
0.214284 0.285713 0.199998 0.299999
0.299999 0.199999 0.285714 0.214285
0.057142 0.257142 0.614285 0.071428
0.614285 0.142856 0.242856 0.000000
0.042856 0.000000 0.071428 0.885713
0.000000 0.985713 0.000000 0.014285
0.957142 0.000000 0.028570 0.014285
0.971428 0.000000 0.000000 0.028570
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 regular expression
--------------------------------------------------------------------------------
TTGA[TC][CG][TCA][AGT][GC][AG]TCAA
--------------------------------------------------------------------------------
Time 2.66 secs.
********************************************************************************
********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************
--------------------------------------------------------------------------------
Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
TEST0625; 1.92e-04 278_[+1(1.90e-05)]_43_\
[+1(3.94e-07)]_151
TEST0633; 1.80e-05 188_[+1(3.69e-08)]_298
TEST0661; 8.88e-05 161_[+1(1.96e-07)]_291
TEST0667; 5.88e-05 248_[+1(1.21e-07)]_238
TEST0682; 6.86e-06 113_[+1(2.35e-08)]_178
TEST0684; 1.83e-02 99_[+1(3.80e-05)]_387
TEST0690; 1.10e-02 451_[+1(2.27e-05)]_35
TEST0693; 5.88e-05 91_[+1(1.21e-07)]_95_[+1(5.50e-07)]_\
286
TEST0760; 3.13e-01 148
TEST0765; 3.22e-01 202
TEST1086; 2.27e-05 33_[+1(1.21e-07)]_154
TEST1087; 2.27e-05 54_[+1(1.21e-07)]_133
TEST1093; 6.02e-03 180_[+1(1.78e-05)]_159
TEST1100; 1.15e-04 308_[+1(2.51e-07)]_148
TEST1118; 7.90e-04 231_[+1(1.62e-06)]_255
TEST1131; 2.73e-05 114_[+1(7.91e-08)]_197_\
[+1(5.60e-08)]_161
TEST1134; 6.15e-01 147
TEST1136; 2.14e-05 145_[+1(5.60e-08)]_236
TEST1146; 1.03e-04 70_[+1(4.56e-07)]_155
TEST1147; 4.86e-01 177
TEST1149; 2.60e-02 237
TEST1151; 1.83e-04 70_[+1(7.88e-07)]_161
TEST1153; 1.83e-04 61_[+1(7.88e-07)]_170
TEST1163; 2.61e-03 139_[+1(1.21e-05)]_76
TEST1166; 6.79e-05 50_[+1(3.38e-07)]_150
TEST1169; 1.34e-05 36_[+1(7.91e-08)]_133
TEST1176; 2.71e-04 175_[+1(7.41e-07)]_190
TEST1179; 6.24e-04 36_[+1(6.46e-05)]_41_[+1(2.42e-06)]_\
166
TEST1201; 1.27e-04 159_[+1(3.94e-07)]_163
TEST1207; 4.44e-06 4_[+1(2.77e-08)]_155
TEST1211; 5.65e-05 149_[+1(1.79e-07)]_165
TEST1220; 1.59e-06 208_[+1(3.97e-09)]_192
TEST1226; 5.74e-04 9_[+1(3.10e-06)]_175
TEST1231; 3.86e-05 154_[+1(1.21e-07)]_165
TEST1241; 5.01e-04 131_[+1(1.45e-06)]_214
TEST1243; 2.51e-04 21_[+1(1.27e-06)]_175
TEST1266; 8.62e-03 317_[+1(1.78e-05)]_169
TEST1279; 2.68e-04 345_[+1(5.50e-07)]_141
TEST1283; 3.03e-01 500
TEST1296; 3.44e-04 124_[+1(1.03e-06)]_209
--------------------------------------------------------------------------------
********************************************************************************
********************************************************************************
Stopped because nmotifs = 1 reached.
********************************************************************************
CPU: pino
********************************************************************************