[Biopython] removing redundant sequence

Brad Chapman chapmanb at 50mail.com
Thu Apr 22 12:18:10 UTC 2010


Bala;

> > I created a sample FASTA
> > file with two redundant sequences. But when I use SEGUID checksums to spot
> > the redundancies, it spots only the first one.

> What you should do is loop over the records and keep a record
> of the checksums you have saved, and use that to ignore duplicates.
> I would use a Python set rather than a Python list for speed.
> 
> You could do this with a for loop. However, I would probably use an
> iterator based approach with a generator function - I think it is more
> elegant but perhaps not so easy for a beginner:
[... Nice code example from Peter ..]
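
Peter's code is not reproduced above. The following is only a minimal sketch of the generator-based approach he describes, not his actual code. Assumptions: a record is any object with a .seq attribute (the SeqRecord objects from Bio.SeqIO.parse qualify). The checksum comes from a hypothetical local helper, seguid_like, modelled on SEGUID (SHA-1 of the upper-cased sequence, base64 encoded, trailing "=" stripped), so the sketch runs without Biopython. In practice you would probably call Bio.SeqUtils.CheckSum.seguid instead. The names seguid_like, remove_duplicates and Record are illustrative.

```python
import base64
import hashlib
from collections import namedtuple


def seguid_like(seq) -> str:
    """SEGUID-style checksum: SHA-1 of the upper-cased sequence, base64 encoded.

    Hypothetical stand-in so this sketch runs without Biopython; with Biopython
    installed you would normally use Bio.SeqUtils.CheckSum.seguid(record.seq).
    """
    digest = hashlib.sha1(str(seq).upper().encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii").rstrip("=")


def remove_duplicates(records):
    """Generator yielding each record whose sequence has not been seen before."""
    seen = set()  # set membership tests are fast, unlike scanning a list
    for record in records:
        checksum = seguid_like(record.seq)
        if checksum in seen:
            continue  # redundant sequence: skip it
        seen.add(checksum)
        yield record


# Tiny demo with a hypothetical stand-in record type. SeqRecord objects
# would work the same way because they also have a .seq attribute.
Record = namedtuple("Record", ["id", "seq"])
demo = [Record("s1", "ACGT"), Record("s2", "GGCC"), Record("s3", "acgt")]
print([r.id for r in remove_duplicates(demo)])  # ['s1', 's2']
```

With Biopython, the generator could be wired between parsing and writing, for example SeqIO.write(remove_duplicates(SeqIO.parse("input.fasta", "fasta")), "unique.fasta", "fasta"), where both filenames are placeholders. Because the generator is lazy, records stream through one at a time and only the set of checksums is held in memory. Note that the checksum ignores case, so "acgt" and "ACGT" count as duplicates.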

This is a nice example problem and discussion. Bala, it sounds like
Peter provided some useful example code to solve this. Once you have used
it to put together a program that solves your problem, it would be
very helpful if you could write it up as a Cookbook entry:

http://biopython.org/wiki/Category:Cookbook

That would help others in the future who will be tackling similar
issues. Thanks much,
Brad
