[Biopython] remove list redundancy

Fri Mar 23 22:27:54 UTC 2012

Not a Biopython solution per se, but you can remove the duplicates from your list using a set of the sequence strings.
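A minimal sketch of the set idea. It assumes the records come from `SeqIO.parse` (or are any objects with a `.seq` attribute) and keys each one on `str(rec.seq)`. As the FutureWarning in the quoted message below shows, this Biopython version compared `Seq` objects by identity rather than by content, so the plain string is used as the key. The function name `unique_by_sequence` is hypothetical, not part of Biopython.

```python
from typing import Iterable, Iterator


def unique_by_sequence(records: Iterable) -> Iterator:
    """Yield each record whose sequence has not been seen before.

    The first record with a given sequence is kept; later records with an
    identical sequence string are skipped. Comparison is exact, so case
    differences (e.g. "acgt" vs "ACGT") count as different sequences.
    """
    seen = set()  # sequence strings already yielded
    for rec in records:
        key = str(rec.seq)  # compare by string content, not Seq object identity
        if key not in seen:
            seen.add(key)
            yield rec


# Example use (output file name is a placeholder):
# from Bio import SeqIO
# unique = list(unique_by_sequence(SeqIO.parse("all_emm_fake.fasta", "fasta")))
# SeqIO.write(unique, "unique.fasta", "fasta")
```

Because set membership is a hash lookup, this makes a single pass over the file instead of comparing every pair of records.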

HTH
Paulo

-----Original Message-----
From: ferreirafm at usp.br
Sender: biopython-bounces at lists.open-bio.org
Date: Fri, 23 Mar 2012 18:55:27 
To: <biopython at biopython.org>
Subject: [Biopython] remove list redundancy

Hi Biopy users,
I have a multi-sequence FASTA file which I've read into a list. Is there
a clever way or method to remove redundant sequences?
Thanks in advance,
Fred

### CODE:
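     from Bio import SeqIO

     def redundancy(fastafile):
         f = open(fastafile, 'r')
         record = list(SeqIO.parse(f, "fasta"))
         new_rec = record  # an alias of record, not a copy
         f.close()
         print(len(record))
         for i in range(len(record)):
             for j in range(len(record)):
                 if i < j:
                     # In this Biopython version Seq == Seq compares objects,
                     # not strings (see the FutureWarning below), so this is
                     # never True for two distinct records
                     if record[i].seq == record[j].seq:
                         del new_rec[j]
         print(len(new_rec))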

### RESULTS:
$ redundancy.py -run all_emm_fake.fasta
823
/usr/lib64/python2.7/site-packages/Bio/Seq.py:197: FutureWarning: In  
future comparing Seq objects will use string comparison (not object  
comparison). Incompatible alphabets will trigger a warning (not an  
exception). In the interim please use id(seq1)==id(seq2) or  
str(seq1)==str(seq2) to make your code explicit and to avoid this  
warning.
   "and to avoid this warning.", FutureWarning)
823

### EXPECTING:
Worse, the function above does not work: I expected 823 sequences before
running it and 822 after.
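A minimal sketch of how the function above could be corrected while keeping its pairwise (i < j) comparison. It follows the warning's advice and compares `str(seq)` instead of `Seq` objects, and it builds a separate result list rather than deleting from a list that is still being indexed (in the original, `new_rec` and `record` are the same list). The name `remove_redundant` is hypothetical.

```python
from typing import List, Sequence


def remove_redundant(records: Sequence) -> List:
    """Return a new list without records whose sequence repeats an earlier one.

    The input list is not modified. Sequences are compared as strings.
    """
    duplicate = set()  # indices j whose sequence matches an earlier record
    for i in range(len(records)):
        if i in duplicate:
            continue  # already known to be a repeat; its matches are covered
        for j in range(i + 1, len(records)):
            if str(records[i].seq) == str(records[j].seq):
                duplicate.add(j)
    return [rec for k, rec in enumerate(records) if k not in duplicate]


# Example use:
# from Bio import SeqIO
# with open("all_emm_fake.fasta") as handle:
#     records = list(SeqIO.parse(handle, "fasta"))
# print(len(records))
# print(len(remove_redundant(records)))
```

This still compares every pair of records, so its cost grows quadratically with the number of sequences; keying a set on the sequence string gives the same result in one pass.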

_______________________________________________
Biopython mailing list  -  Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython