[BioPython] Is there a limit on Fasta parser? (and a bug spotted in the LCC function)

Sebastian Bassi sbassi at asalup.org
Wed Mar 26 16:54:52 EST 2003


Hi,

When I extract a sequence using the Fasta parser, I get at most 999950 (and
sometimes only 999932) nucleotides.
I'm using Biopython 1.10 with Python 2.2.2 on Win2000.
Is this a known limit?
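
One way to check whether the truncation comes from the parser or from the file itself is to count the residues without Biopython. The following is only a minimal sketch in Python 3 syntax; the helper name fasta_lengths is hypothetical and is not part of Biopython, and it assumes a plain FASTA file with ">" header lines.

```python
def fasta_lengths(lines):
    """Return {header: sequence length} for an iterable of FASTA lines.

    Hypothetical helper for cross-checking a parser, not a Biopython API.
    """
    lengths = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # start a new record
            lengths[header] = 0
        elif header is not None:
            lengths[header] += len(line)  # count residues on this line
    return lengths
```

Passing an open file handle (for example, open("myseq.fasta"), where the file name is a placeholder) gives one length per record, which can then be compared with the length reported by the parser.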

Regarding the LCC function, I found a bug: I forgot to reset a list, so on each
function call the list returned was bigger than the previous one (because it
included the previous results). Here is the corrected code:

import math
from string import count  # Python 2 string module function, used below

def lcc_mult(seq,wsize,start,end):
     """Return a vector called lccsal, the LCC, a complexity measure from a sequence, called seq."""
     l2=math.log(2)
     tamseq=end-start
     global compone
     #print "compone"+str(len(compone))
     global lccsal
     #print "lccsal"+str(len(lccsal))
     compone=[0]
     lccsal=[0]
     for i in range(wsize):
         compone.append(((i+1)/float(wsize))*((math.log((i+1)/float(wsize)))/l2))
     window=seq[0:wsize]
     cant_a=count(window,'A')
     cant_c=count(window,'C')
     cant_t=count(window,'T')
     cant_g=count(window,'G')
     term_a=compone[cant_a]
     term_c=compone[cant_c]
     term_t=compone[cant_t]
     term_g=compone[cant_g]
     lccsal[0]=(-(term_a+term_c+term_t+term_g))
     tail=seq[0]
     for x in range (tamseq-wsize):
         window=seq[x+1:wsize+x+1]
         if tail==window[-1]:
             lccsal.append(lccsal[-1])
             #break
         elif tail=='A':
             cant_a=cant_a-1
             if window[-1]=='C':
                 cant_c=cant_c+1
                 term_a=compone[cant_a]
                 term_c=compone[cant_c]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='T':
                 cant_t=cant_t+1
                 term_a=compone[cant_a]
                 term_t=compone[cant_t]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='G':
                 cant_g=cant_g+1
                 term_a=compone[cant_a]
                 term_g=compone[cant_g]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
         elif tail=='C':
             cant_c=cant_c-1
             if window[-1]=='A':
                 cant_a=cant_a+1
                 term_a=compone[cant_a]
                 term_c=compone[cant_c]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='T':
                 cant_t=cant_t+1
                 term_c=compone[cant_c]
                 term_t=compone[cant_t]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='G':
                 cant_g=cant_g+1
                 term_c=compone[cant_c]
                 term_g=compone[cant_g]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
         elif tail=='T':
             cant_t=cant_t-1
             if window[-1]=='A':
                 cant_a=cant_a+1
                 term_a=compone[cant_a]
                 term_t=compone[cant_t]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='C':
                 cant_c=cant_c+1
                 term_c=compone[cant_c]
                 term_t=compone[cant_t]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='G':
                 cant_g=cant_g+1
                 term_t=compone[cant_t]
                 term_g=compone[cant_g]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
         elif tail=='G':
             cant_g=cant_g-1
             if window[-1]=='A':
                 cant_a=cant_a+1
                 term_a=compone[cant_a]
                 term_g=compone[cant_g]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='C':
                 cant_c=cant_c+1
                 term_c=compone[cant_c]
                 term_g=compone[cant_g]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
             elif window[-1]=='T':
                 cant_t=cant_t+1
                 term_t=compone[cant_t]
                 term_g=compone[cant_g]
                 lccsal.append(-(term_a+term_c+term_t+term_g))
         tail=window[0]
     return lccsal
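
As a cross-check for the function above, the same quantity can be computed directly for every window, without the incremental updates. This is only a sketch in Python 3 syntax (the code above targets Python 2.2); the name lcc_windows is hypothetical, and it always covers the whole sequence rather than taking start and end arguments. Because it builds its lists locally on each call, repeated calls cannot accumulate earlier results.

```python
import math
from collections import Counter

def lcc_windows(seq, wsize):
    """Return the LCC of every window of length wsize in seq.

    Hypothetical cross-check helper, not the Biopython implementation.
    """
    # term[k] = (k/wsize) * log2(k/wsize), with term[0] defined as 0,
    # mirroring the compone lookup table in lcc_mult.
    term = [0.0] + [(k / wsize) * math.log2(k / wsize)
                    for k in range(1, wsize + 1)]
    values = []  # local list, rebuilt on every call
    for x in range(len(seq) - wsize + 1):
        counts = Counter(seq[x:x + wsize])
        # Only A, C, G and T contribute, as in lcc_mult.
        values.append(-sum(term[counts[base]] for base in "ACGT"))
    return values
```

For a window holding one of each base, such as "ACGT" with wsize 4, each term is 0.25 * log2(0.25) = -0.5, so the LCC is 2.0; a window of a single repeated base gives 0.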


-- 
Best regards,

//=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ   //=\
\=// IT Manager Advanta Seeds - Balcarce Research Center -      \=//
//=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
\=// E-mail: sbassi at genesdigitales.com - ICQ UIN: 3356556 -     \=//

               Linux para todos: http://Linuxfacil.info


