[Bioperl-l] Fishing redundant sequences in FASTA files [Right formatting]

Cook, Malcolm MEC at stowers.org
Tue Feb 15 15:28:09 EST 2011


There is also CD-HIT,

and blastclust from NCBI (which I think still gets installed as part of the NCBI BLAST suite).


Malcolm Cook
Stowers Institute for Medical Research -  Bioinformatics
Kansas City, Missouri  USA
 
 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Chris Fields
> Sent: Tuesday, February 15, 2011 2:25 PM
> To: Dave Messina
> Cc: Juan Jovel; bioperl
> Subject: Re: [Bioperl-l] Fishing redundant sequences in FASTA 
> files [Right formatting]
> 
> SHA should work as well, didn't think of that (though I 
> suppose the encoding step for either would be rate-limiting?).  
> 
> Will have to keep an eye on UCLUST, didn't know about that one.
> 
> chris
> 
> On Feb 15, 2011, at 2:09 PM, Dave Messina wrote:
> 
> > Hi Juan,
> > 
> > There's a nice example script in the BioPerl distribution that Jason
> > Stajich wrote, which uses MD5 checksums to do the sequence comparison:
> > 
> > https://github.com/bioperl/bioperl-live/blob/master/scripts/utilities/bp_nrdb.PLS
> > 
> > 
> > There are also faster, non-BioPerl tools for this, such as the one that
> > comes with UCLUST:
> > 
> >    http://www.drive5.com/usearch/
> > 
> > 
> > Dave
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
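The checksum idea discussed in this thread (MD5 in Dave's pointer to bp_nrdb.PLS, SHA in Chris's reply) can be sketched in a few lines. This is a minimal Python sketch, not the code of bp_nrdb.PLS. It assumes plain FASTA input and treats two sequences as redundant only when their residues match exactly after uppercasing. The function names `read_fasta` and `redundant_groups` are hypothetical. It catches exact duplicates only; near-identical sequences are what clustering tools such as CD-HIT, blastclust or UCLUST are for.

```python
import hashlib
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of each header."""
    seq_id = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id = words[0] if words else ""
            chunks = []
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def redundant_groups(
    records: Iterable[Tuple[str, str]], algorithm: str = "md5"
) -> List[List[str]]:
    """Return groups of ids whose sequences are identical (case-insensitive).

    Each sequence is reduced to a fixed-length digest (MD5 by default, or any
    name hashlib.new accepts, e.g. "sha1"), so the lookup table stores digests
    rather than full sequences.
    """
    by_digest = defaultdict(list)
    for seq_id, seq in records:
        digest = hashlib.new(algorithm, seq.upper().encode("utf-8")).hexdigest()
        by_digest[digest].append(seq_id)
    # Only digests seen more than once correspond to redundant sequences.
    return [ids for ids in by_digest.values() if len(ids) > 1]


if __name__ == "__main__":
    example = [
        ">seq1 first copy", "ACGTACGT",
        ">seq2", "acgt", "acgt",
        ">seq3", "TTTT",
        ">seq4 same as seq3", "TTTT",
    ]
    print(redundant_groups(read_fasta(example)))
    # prints [['seq1', 'seq2'], ['seq3', 'seq4']]
    print(redundant_groups(read_fasta(example), algorithm="sha1"))
    # same groups, computed with SHA-1 instead of MD5
```

Keeping the hash name as a parameter makes it easy to try MD5 against SHA-1 on the same file. Collisions between genuinely different sequences are extremely unlikely in practice, but if that matters, comparing the full sequences within each reported group is a cheap extra check.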


