[BioPython] Informative content problem, SOLVED!!!

Iddo Friedberg idoerg@burnham.org
Fri, 29 Nov 2002 08:46:27 -0800


Hi Sebastian,

Yep, you were absolutely right in your diagnosis: IC (as a measure of 
positional evolutionary conservation) applies to alignments, whereas 
LCC is used (as you have demonstrated) to measure complexity within a 
single sequence segment.
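
To make the distinction concrete, here is a rough sketch of a per-column 
measure over an alignment. It is a simplified illustration, not the 
formula Biopython uses: the name column_information is hypothetical, and 
it assumes a uniform background and a 4-letter alphabet.

```python
import math
from collections import Counter


def column_information(alignment, col, alphabet_size=4):
    """Simplified conservation score for one alignment column, in bits.

    Hypothetical helper: log2(alphabet_size) minus the Shannon entropy
    of the residues found in column `col` across all aligned sequences.
    """
    column = [seq[col] for seq in alignment]
    n = len(column)
    counts = Counter(column)
    # Shannon entropy of the column; only observed residues contribute.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return math.log2(alphabet_size) - entropy


aligned = ["ACGT", "ACGA", "ACGC", "ACGG"]
conserved = column_information(aligned, 0)  # every sequence has 'A'
variable = column_information(aligned, 3)   # all four bases appear once
```

The point is that the score needs every sequence in the alignment, which 
is why it cannot be applied to a window of a single sequence.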

Yes, it might be a good idea to implement LCC as a method. Not of 
Seq.Seq though... Jeff, what is the policy on these things? Method or 
function? Where?
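
Just to show what I mean by a function, a minimal sketch following the 
lcc(STRING, STARTPOSITION, ENDPOSITION) signature you suggested. The 
names lcc and lcc_windows and the alphabet argument are assumptions for 
illustration, not existing Biopython API.

```python
import math


def lcc(seq, start, end, alphabet="ACGT"):
    """Entropy-style complexity of seq[start:end], in bits.

    Sketch only: sums -p*log2(p) over the bases in `alphabet`, where p is
    the fraction of the window occupied by that base.
    """
    window = seq[start:end].upper()
    if not window:
        raise ValueError("empty window")
    total = 0.0
    for base in alphabet:
        count = window.count(base)
        if count:  # absent bases contribute nothing (p*log2(p) -> 0)
            p = count / len(window)
            total += p * math.log2(p)
    return -total


def lcc_windows(seq, size=18):
    """LCC of every full window of `size` bases, sliding one base at a time."""
    return [lcc(seq, i, i + size) for i in range(len(seq) - size + 1)]


scores = lcc_windows("ACGT" * 5, size=18)
```

A homopolymer window scores 0 and a window with all four bases equally 
represented scores 2, so low values flag low-complexity stretches.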

Best,

Iddo

Sebastian Bassi wrote:
> Hi,
> 
> Now I know what's going on. The formula used by Biopython is ONLY for 
> alignments (since it uses information from every sequence in the alignment).
> My formula is LCC (local content complexity), so I implemented it here:
> 
> Before submitting my code: I know it sucks, so it would be nice to have 
> it as a module, like lcc(STRING, STARTPOSITION, ENDPOSITION)
> 
> Now, my code:
> 
> 
> from Bio import Fasta
> import string
> import math
> 
> 
> parser=Fasta.RecordParser()
> entrada=open("C:\\bioinfo-adv\\blast\\data\\vector.nn","r")
> cur_record=1
> iterator=Fasta.Iterator(entrada,parser)
> 
> while cur_record:
>     cur_record=iterator.next()
>     if cur_record is None:
>         break
>     tamseq=len(cur_record.sequence)
>     print tamseq
> 
>     for ini in range(tamseq-18+1):  # +1 so the last 18-base window is included
>         fin=ini+18
>         primer=cur_record.sequence[ini:fin]
> 
>         if string.count(primer,'A')==0:
>             term_a=0
>         else:
>             freq_a=string.count(primer,'A')/float(len(primer))
>             term_a=freq_a*(math.log(freq_a)/math.log(2))
> 
>         if string.count(primer,'C')==0:
>             term_c=0
>         else:
>             freq_c=string.count(primer,'C')/float(len(primer))
>             term_c=freq_c*(math.log(freq_c)/math.log(2))
> 
>         if string.count(primer,'T')==0:
>             term_t=0
>         else:
>             freq_t=string.count(primer,'T')/float(len(primer))
>             term_t=freq_t*(math.log(freq_t)/math.log(2))
> 
>         if string.count(primer,'G')==0:
>             term_g=0
>         else:
>             freq_g=string.count(primer,'G')/float(len(primer))
>             term_g=freq_g*(math.log(freq_g)/math.log(2))
> 
> 
>         lcc=-(term_a+term_c+term_t+term_g)
>         print lcc
>     print 'Cambio crom'
>     print ''
>     print ''
> 
> entrada.close()
> 
> 

-- 
Iddo Friedberg
The Burnham Institute
10901 N. Torrey Pines Rd.
La Jolla, CA 92037
USA
Tel: +1 (858) 646 3100 x3516
Fax: +1 (858) 646 3171