[Biopython] remove list redundancy

Fri Mar 23 23:35:54 UTC 2012

Thanks everyone for helping.
Have a nice weekend.
Fred

Quoting ferreirafm at usp.br:

> Hi Biopy users,
> I have a multi-sequence FASTA file that I've read into a list. Is
> there a clever way to remove redundant sequences?
> Thanks in advance,
> Fred
>
> ### CODE:
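>     from Bio import SeqIO
>
>     def redundancy(fastafile):
>         f = open(fastafile, 'r')
>         record = list(SeqIO.parse(f, "fasta"))
>         new_rec = record    # same list object as record, not a copy
>         f.close()
>         print len(record)
>         for i in range(len(record)):
>             for j in range(len(record)):
>                 if i < j:
>                     # Seq == Seq compares objects here, not sequence strings
>                     if record[i].seq == record[j].seq:
>                         # deleting by index mid-loop shifts later items
>                         del new_rec[j]
>         print len(new_rec)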
>
>
> ### RESULTS:
> $ redundancy.py -run all_emm_fake.fasta
> 823
> /usr/lib64/python2.7/site-packages/Bio/Seq.py:197: FutureWarning: In  
> future comparing Seq objects will use string comparison (not object  
> comparison). Incompatible alphabets will trigger a warning (not an  
> exception). In the interim please use id(seq1)==id(seq2) or  
> str(seq1)==str(seq2) to make your code explicit and to avoid this  
> warning.
>   "and to avoid this warning.", FutureWarning)
> 823
>
> ### EXPECTING:
> Worse, the function above doesn't work: I expected 823 before
> and 822 after running it.
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>
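The quoted function printed 823 twice most likely because, in the Biopython version shown, `==` between two `Seq` objects compared the objects themselves rather than their letters, as the FutureWarning says. No pair ever matched, so nothing was deleted. Separately, `new_rec = record` binds a second name to the same list, and deleting by index inside the nested loop would shift later items. A minimal sketch of one alternative, assuming "redundant" means an exact string match of the sequence (case-sensitive): keep a set of sequence strings already seen and yield only the first record for each. `unique_by_sequence` is a hypothetical helper name, not part of Biopython.

```python
def unique_by_sequence(records):
    """Yield each record whose sequence has not been seen before.

    `records` is any iterable of objects with a `.seq` attribute, such as
    the SeqRecord objects produced by Bio.SeqIO.parse(). The first record
    carrying a given sequence is kept; later exact duplicates are skipped.
    """
    seen = set()
    for rec in records:
        # Compare sequences as plain strings, as the FutureWarning suggests,
        # so identical sequences match even when they are distinct objects.
        key = str(rec.seq)
        if key not in seen:
            seen.add(key)
            yield rec
```

Because it is a generator, it can be passed straight from `SeqIO.parse` to `SeqIO.write`, e.g. `SeqIO.write(unique_by_sequence(SeqIO.parse("all_emm_fake.fasta", "fasta")), "unique.fasta", "fasta")`, without building a list or mutating one while looping over it. The output file name here is only illustrative. Membership checks on a set take roughly constant time, so this also avoids the quadratic nested loop.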