[Biopython] remove list redundancy

Iddo Friedberg idoerg at gmail.com
Fri Mar 23 18:19:27 EDT 2012


Assignment in Python binds a name to an existing object; it does not copy
the object. So you can have the following:

>>> a=[1,2,3]
>>> b=a
>>> print b
[1, 2, 3]
>>> del b[1]
>>> print a
[1, 3]
>>>

Since a and b are two names for the same list, removing an item through b
removes it from a as well. That is why, in your case, record and new_rec
end up the same: they were the same list to start off with.

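Furthermore, your loop changes the length of "record" while the for loops
are still stepping through its indices. That is never a good idea:
range(len(record)) is computed before anything is deleted, so the indices
stop matching the list and you get unexpected results or an IndexError.
Finally, even with a separate new_rec, the index "j" points to one item in
record but, after a deletion, to a different item in new_rec.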
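
To see the index problem in isolation, here is a small sketch that uses a
plain list of strings in place of your sequence records. The names "items"
and "delete_duplicates_in_place" are made up for illustration; the loop has
the same shape as yours.

```python
def delete_duplicates_in_place(items):
    """Naive approach: delete duplicates while looping over indices.

    range(len(items)) is evaluated once, before any deletion, so the
    indices no longer match the list once it shrinks.
    """
    for i in range(len(items)):
        for j in range(len(items)):
            if i < j and items[i] == items[j]:
                del items[j]
    return items


items = ["ACGT", "ACGT", "TTGA", "ACGT"]
try:
    delete_duplicates_in_place(items)
    outcome = "finished"
except IndexError:
    # After two deletions the list has 2 items, but j still runs up to 3.
    outcome = "IndexError"
print(outcome)
```

With this input the loop deletes two items and then fails on an index that
no longer exists.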

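You can make an actual copy of the list using the copy module:
import copy
new_rec = copy.copy(record)

That creates a new list in new_rec, so deleting from new_rec no longer
touches record. Note that this is a shallow copy: both lists still hold
the same record objects.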
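
A small sketch of what copy.copy does with a list, using nested lists as
stand-ins for the records (the contents are illustrative only). The outer
list is new, while the items inside it are shared between both lists.

```python
import copy

record = [["seq1"], ["seq2"], ["seq3"]]
new_rec = copy.copy(record)   # shallow copy: a new outer list

del new_rec[1]                # only new_rec shrinks
print(len(record))
print(len(new_rec))

# The elements themselves are not copied; both lists refer to the
# same inner objects.
print(new_rec[0] is record[0])
```

If you ever need the items duplicated as well, copy.deepcopy does that, but
for removing entries from a list the shallow copy is enough.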

That still won't solve the problem of "j" pointing to a shifting position
in the loop, though.

I would suggest building a list of non-redundant sequences rather than
deleting from a list of redundant sequences.
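
Something along these lines, as a minimal sketch rather than a tested
solution for your data. It assumes each record has a .seq attribute and
compares sequences as strings, which is what the FutureWarning in your
output recommends. "remove_redundant" and "FakeRecord" are names I made up;
FakeRecord only stands in for real records so the example runs without a
FASTA file.

```python
from collections import namedtuple


def remove_redundant(records):
    """Return a new list keeping the first record for each distinct sequence."""
    seen = set()
    unique = []
    for rec in records:
        key = str(rec.seq)      # compare by sequence text, not by object
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


# Hypothetical stand-in for a sequence record, for demonstration only.
FakeRecord = namedtuple("FakeRecord", ["id", "seq"])
records = [
    FakeRecord("a", "ACGT"),
    FakeRecord("b", "TTGA"),
    FakeRecord("c", "ACGT"),
]
unique = remove_redundant(records)
print(len(records))
print(len(unique))
```

With a real file you would pass in list(SeqIO.parse(f, "fasta")) as in your
code. Nothing is deleted from the original list, so there are no shifting
indices to worry about, and the set lookup avoids comparing every pair.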


HTH,

Iddo

On Fri, Mar 23, 2012 at 5:55 PM, <ferreirafm at usp.br> wrote:

> Hi Biopy users,
> I have a multi-sequence FASTA file which I've read as a list. Is there a
> clever way/method to remove redundant sequences?
> Thanks in advance,
> Fred
>
> ### CODE:
>    def redundancy(fastafile):
>        f=open(fastafile, 'r')
>        record = list(SeqIO.parse(f,"fasta"))
>        new_rec = record
>        f.close
>        print len(record)
>        for i in range(len(record)):
>            for j in range(len(record)):
>                if i < j:
>                    if record[i].seq == record[j].seq:
>                        del new_rec[j]
>        print len(new_rec)
>
>
> ### RESULTS:
> $ redundancy.py -run all_emm_fake.fasta
> 823
> /usr/lib64/python2.7/site-packages/Bio/Seq.py:197: FutureWarning: In
> future comparing Seq objects will use string comparison (not object
> comparison). Incompatible alphabets will trigger a warning (not an
> exception). In the interim please use id(seq1)==id(seq2) or
> str(seq1)==str(seq2) to make your code explicit and to avoid this warning.
>  "and to avoid this warning.", FutureWarning)
> 823
>
> ### EXPECTING:
> Worse, the function above is not working. I was expecting 823 before and
> 822 after running it.



-- 
Iddo Friedberg
http://iddo-friedberg.net/contact.html
++++++++++[>+++>++++++>++++++++>++++++++++>+++++++++++<<<<<-]>>>>++++.>
++++++..----.<<<<++++++++++++++++++++++++++++.-----------..>>>+.-----.
.>-.<<<<--.>>>++.>+++.<+++.----.-.<++++++++++++++++++.>+.>.<++.<<<+.>>
>>----.<--.>++++++.<<<<------------------------------------.

