[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]

Dave Messina David.Messina at sbc.su.se
Tue Feb 15 15:47:20 EST 2011


> SHA should work as well, didn't think of that (though I suppose the encoding
> step for either would be rate-limiting?).

I haven't tested it, but I suspect that encoding either MD5 or SHA would be
relatively quick compared to the sequence I/O, no?
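
For illustration, here is a minimal sketch of the digest idea discussed above, written in Python rather than BioPerl. It assumes that "redundant" means exactly identical sequences after upper-casing; the function names `read_fasta` and `find_redundant` are hypothetical and not taken from any library. Each sequence is hashed once as it is read, so the extra work per record is a single digest computation on top of the I/O.

```python
import hashlib
import io
from collections import defaultdict


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_redundant(handle, algorithm="md5"):
    """Group record headers by the digest of their upper-cased sequence.

    Assumption: only exact duplicates count as redundant. `algorithm` is
    any name accepted by hashlib.new(), e.g. "md5" or "sha1".
    """
    groups = defaultdict(list)
    for header, seq in read_fasta(handle):
        digest = hashlib.new(algorithm, seq.upper().encode("ascii")).hexdigest()
        groups[digest].append(header)
    # Keep only digests shared by more than one record.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    example = ">a\nACGT\nACGT\n>b\nacgtacgt\n>c\nTTTT\n"
    print(find_redundant(io.StringIO(example)))
```

Note that this only catches exact duplicates; near-identical sequences still need a clustering tool such as UCLUST or CD-HIT.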



> Will have to keep an eye on UCLUST, didn't know about that one.


As it happens, my current pipeline uses MCL but I'm testing UCLUST as a
replacement since it's waaay faster. I'll let you know how the comparison
turns out.

And for that matter, if anyone listening has experience with UCLUST,
CD-HIT, or other clustering methods (ideally in the context of metagenomic
next-gen sequence data), please chime in with your thoughts.


Dave


