[BioPython] Informative content problem, SOLVED!!!
Sebastian Bassi
sbassi@asalup.org
Fri, 29 Nov 2002 01:21:31 -0300
Hi,
Now I know what's going on. The formula used by Biopython is ONLY for
alignments (since it uses information from every sequence in the alignment).
My formula is LCC (local content complexity), which works on a single
sequence, so I implemented it myself (code below).
I know my code sucks, so before submitting it, it would be nice to have
it as a module, like lcc(STRING, STARTPOSITION, ENDPOSITION).
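A minimal sketch of what such a function could look like, written in modern Python 3 rather than the Python 2 of this post. The name lcc and the (sequence, start, end) argument order come from the suggestion above; the default arguments, the upper-casing of the input and the ValueError on an empty window are assumptions added for illustration. The value returned is the Shannon entropy, in bits, of the A/C/G/T composition of the window, which is the quantity the code in this post computes.

```python
import math


def lcc(seq, start=0, end=None):
    """Return the composition entropy (in bits) of seq[start:end].

    Computes -sum(p_b * log2(p_b)) over the bases A, C, G and T, where
    p_b is the count of base b divided by the window length.
    """
    # Assumption: fold case so that lowercase input is counted as well.
    window = seq[start:end].upper()
    if not window:
        # Assumption: an empty window is treated as an error.
        raise ValueError("empty window")
    entropy = 0.0
    for base in "ACGT":
        count = window.count(base)
        if count:  # a base that never occurs contributes 0
            p = count / len(window)
            entropy -= p * math.log2(p)
    return entropy
```

With this sketch, lcc("ACGT") gives 2.0 (the maximum for a four-letter alphabet), lcc("AACC") gives 1.0, and a homopolymer such as lcc("AAAA") gives 0.0.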
Now, my code:
from Bio import Fasta
import string
import math

parser=Fasta.RecordParser()
entrada=open("C:\\bioinfo-adv\\blast\\data\\vector.nn","r")
cur_record=1
iterator=Fasta.Iterator(entrada,parser)
while cur_record:
    cur_record=iterator.next()
    if cur_record is None:
        break
    tamseq=len(cur_record.sequence)
    print tamseq
    # Slide an 18-base window along the sequence; the +1 makes sure the
    # last full window is included.
    for ini in range(tamseq-18+1):
        fin=ini+18
        primer=cur_record.sequence[ini:fin]
        # One term per base: p*log2(p), taken as 0 when the base is absent.
        if string.count(primer,'A')==0:
            term_a=0
        else:
            term_a=(string.count(primer,'A')/float(len(primer)))*((math.log(string.count(primer,'A')/float(len(primer))))/math.log(2))
        if string.count(primer,'C')==0:
            term_c=0
        else:
            term_c=(string.count(primer,'C')/float(len(primer)))*((math.log(string.count(primer,'C')/float(len(primer))))/math.log(2))
        if string.count(primer,'T')==0:
            term_t=0
        else:
            term_t=(string.count(primer,'T')/float(len(primer)))*((math.log(string.count(primer,'T')/float(len(primer))))/math.log(2))
        if string.count(primer,'G')==0:
            term_g=0
        else:
            term_g=(string.count(primer,'G')/float(len(primer)))*((math.log(string.count(primer,'G')/float(len(primer))))/math.log(2))
        lcc=-(term_a+term_c+term_t+term_g)
        print lcc
    # "Cambio crom" = "chromosome change": marks the end of one record.
    print 'Cambio crom'
    print ''
    print ''
entrada.close()
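For readers on Python 3, the same scan can be sketched with only the standard library. This is an illustration under stated assumptions, not the code from this post and not Biopython's API: read_fasta, window_entropy and scan are hypothetical names, and read_fasta is a deliberately simple stand-in for a real FASTA parser. The window size of 18 matches the code above.

```python
import math


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def window_entropy(window):
    """Return -sum(p * log2(p)) over A, C, G, T for one window."""
    entropy = 0.0
    for base in "ACGT":
        count = window.count(base)
        if count:  # an absent base contributes 0
            p = count / len(window)
            entropy -= p * math.log2(p)
    return entropy


def scan(seq, size=18):
    """Yield (start, entropy) for every full window of `size` bases."""
    # len(seq) - size + 1 start positions, so the final window is included;
    # a sequence shorter than `size` yields nothing.
    for start in range(len(seq) - size + 1):
        yield start, window_entropy(seq[start:start + size])
```

A typical use would be to open the FASTA file, loop over read_fasta(handle), and print the values from scan(sequence) for each record.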
--
Best regards,
//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center - \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi@genesdigitales.com - ICQ UIN: 3356556 - \=//
Linux para todos: http://Linuxfacil.info