[Biopython] remove list redundancy
nuin at genedrift.org
Fri Mar 23 22:27:54 UTC 2012
Not a Biopython solution per se, but you can uniquify your list using a set.
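A minimal sketch of the set idea, assuming each record exposes its sequence as `.seq` the way a SeqRecord does. The helper name `unique_by_sequence` and the `FakeRecord` stand-in are made up for illustration; they are not part of Biopython. Sequences are compared as plain strings via `str(rec.seq)`, which is what the FutureWarning quoted below recommends.

```python
from collections import namedtuple


def unique_by_sequence(records):
    """Keep the first record for each distinct sequence, preserving order."""
    seen = set()      # sequence strings already encountered
    unique = []
    for rec in records:
        key = str(rec.seq)  # compare as plain strings, not as Seq objects
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical stand-in for SeqRecord so the sketch runs without Biopython.
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])

records = [
    FakeRecord("s1", "ACGT"),
    FakeRecord("s2", "TTGA"),
    FakeRecord("s3", "ACGT"),  # same sequence as s1
]
unique = unique_by_sequence(records)
print(len(records), len(unique))
```

With real data the list would presumably come from `list(SeqIO.parse(fastafile, "fasta"))`. Building a new list also avoids deleting items from a list while looping over its indices.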
HTH
Paulo
-----Original Message-----
From: ferreirafm at usp.br
Sender: biopython-bounces at lists.open-bio.org
Date: Fri, 23 Mar 2012 18:55:27
To: <biopython at biopython.org>
Subject: [Biopython] remove list redundancy
Hi Biopy users,
I have a multi-sequence fasta file which I've read as a list. Is there
a clever way/method to remove redundant sequences?
Thanks in advance,
Fred
### CODE:
def redundancy(fastafile):
    f=open(fastafile, 'r')
    record = list(SeqIO.parse(f,"fasta"))
    new_rec = record
    f.close()
    print len(record)
    for i in range(len(record)):
        for j in range(len(record)):
            if i < j:
                if record[i].seq == record[j].seq:
                    del new_rec[j]
    print len(new_rec)
### RESULTS:
$ redundancy.py -run all_emm_fake.fasta
823
/usr/lib64/python2.7/site-packages/Bio/Seq.py:197: FutureWarning: In
future comparing Seq objects will use string comparison (not object
comparison). Incompatible alphabets will trigger a warning (not an
exception). In the interim please use id(seq1)==id(seq2) or
str(seq1)==str(seq2) to make your code explicit and to avoid this
warning.
"and to avoid this warning.", FutureWarning)
823
### EXPECTING:
Worse, the function above is not working. I was expecting 823 before
and 822 after running it.
_______________________________________________
Biopython mailing list - Biopython at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biopython