[Biopython] sort fasta file

Eric Talevich eric.talevich at gmail.com
Wed Mar 17 18:32:44 UTC 2010


xyz <mitlox at op.pl> wrote:

>
> Hello,
> I would like to sort a multi-record FASTA file by sequence length,
> i.e. from the read with the longest sequence to the read with the
> shortest sequence.
>
> I have tried to do it, but I do not know how to sort the records by
> sequence length.
>
> [...]
>
> If I cannot hold all the records in memory at once, what can I do?
>

There's also a program called uclust which can sort reads by sequence length
very quickly:
http://www.drive5.com/uclust/

It's designed for clustering short reads, but it includes a feature to sort
sequences by decreasing length. I think it can handle files larger than
available RAM, too, though I haven't tested that.
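
If you would rather stay in Python, here is a rough two-pass sketch of
one way to sort without holding the sequences in memory. It is plain
standard-library Python, not uclust's method and not a Biopython API;
the function names index_fasta and sort_fasta_by_length are made up for
this example. The first pass records only each record's length and byte
position; the second pass copies the records out, longest first. It
assumes the input is an ordinary seekable file and that the list of
(length, offset, size) tuples fits in RAM, which may not hold for
extremely large read sets.

```python
def index_fasta(path):
    """First pass: return [(seq_length, byte_offset, byte_size), ...].

    Only three integers per record are kept, never the sequence itself.
    """
    entries = []
    with open(path, "rb") as handle:
        start = None   # byte offset of the current record's ">" line
        length = 0     # residues counted so far in the current record
        pos = 0        # byte offset of the line being read
        for line in handle:
            if line.startswith(b">"):
                if start is not None:
                    entries.append((length, start, pos - start))
                start = pos
                length = 0
            elif start is not None:
                # Sequence line: count characters, ignoring the newline.
                length += len(line.strip())
            pos += len(line)
        if start is not None:
            entries.append((length, start, pos - start))
    return entries


def sort_fasta_by_length(in_path, out_path):
    """Second pass: copy records to out_path, longest sequence first."""
    entries = index_fasta(in_path)
    # The sort is stable, so records of equal length keep their input order.
    entries.sort(key=lambda entry: entry[0], reverse=True)
    with open(in_path, "rb") as src, open(out_path, "wb") as dst:
        for _length, offset, size in entries:
            src.seek(offset)
            chunk = src.read(size)
            dst.write(chunk)
            # The last record in a file may lack a trailing newline.
            if not chunk.endswith(b"\n"):
                dst.write(b"\n")
```

Calling sort_fasta_by_length("reads.fasta", "sorted.fasta") (both file
names are placeholders) would write the sorted copy. Each record is
copied byte for byte, so headers and line wrapping are left unchanged.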

-Eric


