[BioPython] Is there a limit on Fasta parser? (and a bug spotted on LCC function)

Jeffrey Chang jchang at jeffchang.com
Sun Mar 30 20:47:31 EST 2003


Hi Sebastian,

Are you using the parser in Bio.Fasta, or from Bio.SeqRecord?  Either  
way, though, there should be no limits in the parser.  It should be  
limited only by memory.  It's troubling that you are getting a different
number of nucleotides from one run to the next.  Is this reproducible?
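
One way to check whether the count is reproducible is to count the residues without going through any parser at all. The following is only a minimal sketch in plain Python, not Biopython code; the function name count_fasta_residues and the example data are made up for illustration.

```python
def count_fasta_residues(lines):
    """Return {record title: residue count} for FASTA-formatted lines."""
    counts = {}
    title = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A title line starts a new record.
            title = line[1:]
            counts[title] = 0
        elif title is not None:
            # Any other line is sequence data for the current record.
            counts[title] += len(line)
    return counts

# Hypothetical in-memory example; an open file object can be passed instead.
example = [">seq1 demo", "ACGTACGT", "ACG", "", ">seq2", "TTTT"]
print(count_fasta_residues(example))
```

If the totals from a check like this stay the same between runs while the parser's totals change, the problem is more likely in how the file is read than in the file itself.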

As for LCC, thanks for the bug report.  Who's working on this part?   
Iddo?  Which module is the LCC function in?
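
For cross-checking the corrected function quoted below, the same per-window measure can be recomputed directly, one window at a time, with a list that is created inside the function so nothing carries over between calls. This is only a sketch under assumptions: lcc_windows is a hypothetical name, it is not the code in Biopython, and it ignores the start and end arguments of the original.

```python
import math

def lcc_windows(seq, wsize):
    """Recompute the per-window complexity directly, one window at a time."""
    values = []  # created on every call, so results never accumulate
    for i in range(len(seq) - wsize + 1):
        window = seq[i:i + wsize]
        total = 0.0
        for base in "ACGT":
            n = window.count(base)
            if n:
                # Same term as compone[n] in the quoted code.
                frac = n / float(wsize)
                total += frac * math.log(frac) / math.log(2)
        values.append(-total)
    return values

# Calling twice on the same input should give the same list both times.
first = lcc_windows("ACGTAAAA", 4)
second = lcc_windows("ACGTAAAA", 4)
print(len(first), first == second)
```

It is slower than the incremental version because every window is recounted from scratch, but it makes a simple reference for comparing results.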

Jeff


On Wednesday, March 26, 2003, at 11:54  AM, Sebastian Bassi wrote:

> Hi,
>
> When I extract info using the fasta parser, I get up to 999950 (and  
> sometimes only 999932) nucleotides.
> I'm using Biopython 1.10 on Python 2.2.2 on Win2000.
> Is this something known?
>
> Regarding the LCC function, I found a bug: I forgot to reset a list, so
> on each function call the list returned was bigger than the previous one
> (because it included previous results). Here is the corrected code:
>
> def lcc_mult(seq,wsize,start,end):
>     """Return a vector called lccsal, the LCC, a complexity measure
>     from a sequence, called seq."""
>     l2=math.log(2)
>     tamseq=end-start
>     global compone
>     #print "compone"+str(len(compone))
>     global lccsal
>     #print "lccsal"+str(len(lccsal))
>     compone=[0]
>     lccsal=[0]
>     for i in range(wsize):
>         compone.append(((i+1)/float(wsize))*((math.log((i+1)/float(wsize)))/l2))
>     window=seq[0:wsize]
>     cant_a=count(window,'A')
>     cant_c=count(window,'C')
>     cant_t=count(window,'T')
>     cant_g=count(window,'G')
>     term_a=compone[cant_a]
>     term_c=compone[cant_c]
>     term_t=compone[cant_t]
>     term_g=compone[cant_g]
>     lccsal[0]=(-(term_a+term_c+term_t+term_g))
>     tail=seq[0]
>     for x in range (tamseq-wsize):
>         window=seq[x+1:wsize+x+1]
>         if tail==window[-1]:
>             lccsal.append(lccsal[-1])
>             #break
>         elif tail=='A':
>             cant_a=cant_a-1
>             if window[-1]=='C':
>                 cant_c=cant_c+1
>                 term_a=compone[cant_a]
>                 term_c=compone[cant_c]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='T':
>                 cant_t=cant_t+1
>                 term_a=compone[cant_a]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='G':
>                 cant_g=cant_g+1
>                 term_a=compone[cant_a]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         elif tail=='C':
>             cant_c=cant_c-1
>             if window[-1]=='A':
>                 cant_a=cant_a+1
>                 term_a=compone[cant_a]
>                 term_c=compone[cant_c]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='T':
>                 cant_t=cant_t+1
>                 term_c=compone[cant_c]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='G':
>                 cant_g=cant_g+1
>                 term_c=compone[cant_c]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         elif tail=='T':
>             cant_t=cant_t-1
>             if window[-1]=='A':
>                 cant_a=cant_a+1
>                 term_a=compone[cant_a]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='C':
>                 cant_c=cant_c+1
>                 term_c=compone[cant_c]
>                 term_t=compone[cant_t]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='G':
>                 cant_g=cant_g+1
>                 term_t=compone[cant_t]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         elif tail=='G':
>             cant_g=cant_g-1
>             if window[-1]=='A':
>                 cant_a=cant_a+1
>                 term_a=compone[cant_a]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='C':
>                 cant_c=cant_c+1
>                 term_c=compone[cant_c]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>             elif window[-1]=='T':
>                 cant_t=cant_t+1
>                 term_t=compone[cant_t]
>                 term_g=compone[cant_g]
>                 lccsal.append(-(term_a+term_c+term_t+term_g))
>         tail=window[0]
>     return lccsal
>
>
> -- 
> Best regards,
>
> //=\ Sebastian Bassi - Diplomado en Ciencia y Tecnologia, UNQ   //=\
> \=// IT Manager Advanta Seeds - Balcarce Research Center -      \=//
> //=\ Pro secretario ASALUP - www.asalup.org - PGP key available //=\
> \=// E-mail: sbassi at genesdigitales.com - ICQ UIN: 3356556 -     \=//
>
>               Linux para todos: http://Linuxfacil.info
>
> _______________________________________________
> BioPython mailing list  -  BioPython at biopython.org
> http://biopython.org/mailman/listinfo/biopython


