[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]

Jason Stajich jason at bioperl.org
Tue Feb 15 17:21:34 EST 2011


Also see cd-hit, which lets you tune the percent-identity threshold used for matching.
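
For anyone who wants to script this, here is a rough sketch of how the cd-hit command line might be assembled from Python. The function name `cd_hit_command` and the file names are made up for illustration. The flags are cd-hit's `-i` (input), `-o` (output), `-c` (identity threshold) and `-n` (word size); the word-size choice follows the ranges given in the CD-HIT user guide for protein input, so check it against your installed version.

```python
def cd_hit_command(fasta_in, fasta_out, identity=0.9):
    """Build an argument list for a protein cd-hit run (hypothetical helper).

    identity is the sequence identity threshold passed to -c.
    """
    if not 0.4 <= identity <= 1.0:
        raise ValueError("protein cd-hit expects an identity between 0.4 and 1.0")

    # Word size (-n) suggested by the CD-HIT user guide for each threshold range.
    if identity >= 0.7:
        word_size = 5
    elif identity >= 0.6:
        word_size = 4
    elif identity >= 0.5:
        word_size = 3
    else:
        word_size = 2

    return [
        "cd-hit",
        "-i", fasta_in,
        "-o", fasta_out,
        "-c", str(identity),
        "-n", str(word_size),
    ]


if __name__ == "__main__":
    # File names are placeholders; pass the list to subprocess.run() to execute it.
    print(" ".join(cd_hit_command("seqs.fasta", "seqs_nr.fasta", identity=0.95)))
```

Lowering `identity` merges more distantly related sequences into one cluster; raising it toward 1.0 approaches exact-match behaviour.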


Dave Messina wrote:
>> But one nice thing is clustering allows for partial matches (which I think
>> is the original criterion).  I don't believe SHA/MD5 would work for that
>> purpose.
>
>
> Yep, for sure. Checksums will find full-length exact matches only.
>
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
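
To make the checksum point concrete, here is a minimal sketch (not BioPerl, just plain Python with the standard library) that groups FASTA records by a digest of the sequence. The helper names `parse_fasta` and `group_by_checksum` and the toy records are invented for the example, and it uses SHA-1, though MD5 would behave the same way. Note how record `c`, a truncated copy of `a`, is not reported: a checksum only catches full-length exact matches.

```python
import hashlib


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_checksum(records):
    """Map each sequence digest to the list of headers sharing that sequence."""
    groups = {}
    for header, seq in records:
        # Upper-case first so that 'acgt' and 'ACGT' count as the same sequence.
        digest = hashlib.sha1(seq.upper().encode("ascii")).hexdigest()
        groups.setdefault(digest, []).append(header)
    return groups


example = """>a
ACGTACGT
>b
acgt
ACGT
>c
ACGTACG
""".splitlines()

duplicates = [
    ids for ids in group_by_checksum(parse_fasta(example)).values() if len(ids) > 1
]
print(duplicates)  # [['a', 'b']]
```

Catching `c` as well needs a substring or alignment-based comparison, which is what the clustering tools do.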

-- 
Jason Stajich
jason at bioperl.org
http://bioperl.org/wiki



