[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]
Dave Messina
David.Messina at sbc.su.se
Tue Feb 15 15:47:20 EST 2011
> SHA should work as well, didn't think of that (though I suppose the encoding
> step for either would be rate-limiting?).
>
I haven't tested it, but I suspect that encoding either MD5 or SHA would be
relatively quick compared to the sequence I/O, no?
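
For reference, a minimal sketch of the digest-based approach being discussed: hash each sequence and group the records whose digests match. It is written in Python rather than Perl purely for illustration, uses only the standard hashlib module, and is not taken from anyone's actual pipeline. The names read_fasta and find_redundant are hypothetical, and upper-casing the sequence before hashing is an assumption about what should count as "redundant". Only exact duplicates are caught; near-identical sequences would still need a clustering tool.

```python
import hashlib
from collections import defaultdict


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_redundant(lines, algorithm="md5"):
    """Return lists of record IDs whose sequences have the same digest.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5" or "sha1".
    Sequences are upper-cased first (an assumption: case is not significant).
    """
    groups = defaultdict(list)
    for header, seq in read_fasta(lines):
        digest = hashlib.new(algorithm, seq.upper().encode("ascii")).hexdigest()
        # Use the first whitespace-delimited token of the header as the ID.
        record_id = header.split()[0] if header else ""
        groups[digest].append(record_id)
    # Keep only digests shared by more than one record.
    return [ids for ids in groups.values() if len(ids) > 1]
```

Passing an open file handle as `lines` works, since the reader only iterates line by line. Wrapping the call in a timer for "md5" and then "sha1" on a real file would be one way to test whether the hashing step or the sequence I/O dominates.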
> Will have to keep an eye on UCLUST, didn't know about that one.
As it happens, my current pipeline uses MCL but I'm testing UCLUST as a
replacement since it's waaay faster. I'll let you know how the comparison
turns out.
And for that matter, if anyone listening has experience with UCLUST or
CD-HIT or other clustering methods (ideally in the context of metagenomic
next-gen sequence), please chime in with your thoughts.
Dave