[Biojava-l] sort fasta file

xyz mitlox at op.pl
Fri Mar 26 09:57:41 UTC 2010

@Andy: Thank you for the explanation. There is no newline character after
the last sequence in the input file.
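To double-check whether a file really ends without a newline, a small plain-Java check like the one below can help. It is only a sketch: `endsWithNewline` is a hypothetical helper (not part of BioJava), and it only looks for '\n', so a file whose last byte is a bare '\r' would report false.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class TrailingNewline {
    // Hypothetical helper (not part of BioJava): true if the file's last byte is '\n'.
    static boolean endsWithNewline(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long len = raf.length();
            if (len == 0) {
                return false;          // an empty file has no trailing newline
            }
            raf.seek(len - 1);         // jump straight to the last byte
            return raf.read() == '\n';
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the file name used in the code below (an assumption).
        File file = new File(args.length > 0 ? args[0] : "sortFasta.fasta");
        System.out.println(endsWithNewline(file));
    }
}
```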

@James: I changed the code to get the longest sequence first, but the
last sequence is missing.

import java.io.*;
import java.util.*;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;

public class SortFasta2 {

  // Orders sequences by length, longest first.
  static private class RichSequenceComparator implements
  Comparator<RichSequence> {

    public int compare(RichSequence seq1, RichSequence seq2) {
      return seq2.length() - seq1.length();
    }
  }

  // Usage:  java SortFasta2 unsortedFile.fasta
  public static void main(String[] args) throws IOException,
  BioException {

    // Use the file named on the command line, else the original default.
    String fastaFile = (args.length > 0) ? args[0] : "sortFasta.fasta";

    BufferedReader br = new BufferedReader(new FileReader(fastaFile));
    SimpleNamespace ns = new SimpleNamespace("biojava");

    Alphabet dna = AlphabetManager.alphabetForName("DNA");

    // Parse the FASTA file as DNA using the alphabet's "token" tokenization.
    RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
        dna.getTokenization("token"), ns);

    SortedSet<RichSequence> sorted = new TreeSet<RichSequence>(new
        RichSequenceComparator());

    while (rsi.hasNext()) {
      sorted.add(rsi.nextRichSequence());
    }
    br.close();

    Iterator<RichSequence> sortedIt = sorted.iterator();

    /* Do whatever you want here with the RichSequences, which come out in
    descending order of length. I'll just print their lengths. */
    while (sortedIt.hasNext()) {
      System.out.println(sortedIt.next().length());
    }
  }
}
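One possible cause worth ruling out alongside the trailing newline: a TreeSet uses its comparator, not equals(), to decide whether two elements are the same. The comparator in the code above returns 0 for two sequences of equal length, so the set silently keeps only the first of them. The sketch below shows the effect with plain strings standing in for sequences (an illustration only, not BioJava code) and a tie-breaking comparator that keeps both; with RichSequence, breaking ties on something like getName() would be the analogous (assumed) fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class TreeSetTies {
    // Longest first, like the comparator in the post: equal lengths compare as 0.
    static final Comparator<String> BY_LENGTH_DESC =
        (a, b) -> Integer.compare(b.length(), a.length());

    // Same order, but equal lengths are broken alphabetically,
    // so two distinct strings never compare as 0.
    static final Comparator<String> BY_LENGTH_DESC_THEN_NAME =
        BY_LENGTH_DESC.thenComparing(Comparator.<String>naturalOrder());

    // Adds all items to a TreeSet ordered by cmp and returns them in set order.
    static List<String> sortInto(Comparator<String> cmp, List<String> items) {
        SortedSet<String> set = new TreeSet<>(cmp);
        set.addAll(items);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<String> seqs = Arrays.asList("ACGTACGT", "ACGT", "TTTT");
        // "TTTT" has the same length as "ACGT", so the set treats it as a duplicate.
        System.out.println(sortInto(BY_LENGTH_DESC, seqs));           // [ACGTACGT, ACGT]
        System.out.println(sortInto(BY_LENGTH_DESC_THEN_NAME, seqs)); // [ACGTACGT, ACGT, TTTT]
    }
}
```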

Input file:

Output on the screen:

How can I get the last sequence back, and how can I print the output to
the screen in FASTA format?
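For the FASTA part, BioJava 1.x's RichSequence.IOTools provides writeFasta methods (for example writeFasta(OutputStream, Sequence, Namespace)), so passing System.out and each sequence from the final loop in the code above should print FASTA directly. If you would rather control the format by hand, the plain-Java sketch below formats one record. `toFasta` is a hypothetical helper; the name and residues would come from seq.getName() and seq.seqString(), and the line width is an arbitrary choice (60 is common).

```java
public class FastaFormat {
    // Hypothetical helper: one FASTA record, residues wrapped at `width` characters.
    static String toFasta(String name, String residues, int width) {
        if (width <= 0) {
            throw new IllegalArgumentException("width must be positive");
        }
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(name).append('\n');   // header line
        for (int i = 0; i < residues.length(); i += width) {
            int end = Math.min(i + width, residues.length());
            sb.append(residues, i, end).append('\n'); // one wrapped sequence line
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toFasta("seq1", "ACGTACGTAC", 4));
        // prints:
        // >seq1
        // ACGT
        // ACGT
        // AC
    }
}
```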

Thank you in advance.

On Thu, 25 Mar 2010 10:17:31 -0400
James Swetnam wrote:

> Just replace the System.out.println with whatever you want to do with
> the sequences; write them to a file, etc.
> James

On Fri, 26 Mar 2010 09:40:28 +0000
"Andy Law (RI)" wrote:

> Does your input file have a line feed at the end or not? (Just a  
> thought)
> Comparable is for comparing two objects using their "natural"
> ordering and is therefore a "fundamental" property of the class. A
> Comparator lets you compare/sort two objects on any characteristics
> and you can have many different comparators. Since this is a somewhat
> arbitrary way of comparing sequences (you could sort them on
> alphabetical sequence for example, or GC content), I guess that's why
> James used a comparator.
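The distinction Andy describes can be sketched in plain Java. Below, a hypothetical `Seq` class (a stand-in, not a BioJava type) has one natural ordering via Comparable (alphabetical by name), while any number of Comparators can impose other orderings, here by length or by GC content, without touching the class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class ComparatorDemo {
    // Hypothetical stand-in for a sequence record (not a BioJava class).
    static final class Seq implements Comparable<Seq> {
        final String name;
        final String residues;

        Seq(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }

        // Natural ordering (Comparable): alphabetical by name.
        @Override
        public int compareTo(Seq other) {
            return name.compareTo(other.name);
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // Fraction of residues that are G or C (0.0 for an empty sequence).
    static double gcContent(Seq s) {
        if (s.residues.isEmpty()) {
            return 0.0;
        }
        int gc = 0;
        for (char c : s.residues.toUpperCase().toCharArray()) {
            if (c == 'G' || c == 'C') {
                gc++;
            }
        }
        return (double) gc / s.residues.length();
    }

    // Alternative orderings (Comparator): defined outside the class, as many as needed.
    static final Comparator<Seq> BY_LENGTH_DESC =
        Comparator.comparingInt((Seq s) -> s.residues.length()).reversed();
    static final Comparator<Seq> BY_GC =
        Comparator.comparingDouble(ComparatorDemo::gcContent);

    public static void main(String[] args) {
        List<Seq> seqs = new ArrayList<>(Arrays.asList(
            new Seq("b", "ATGC"), new Seq("a", "AT"), new Seq("c", "GGGCCA")));

        Collections.sort(seqs);     // natural ordering via compareTo
        System.out.println(seqs);   // [a, b, c]

        seqs.sort(BY_LENGTH_DESC);
        System.out.println(seqs);   // [c, b, a]

        seqs.sort(BY_GC);
        System.out.println(seqs);   // [a, b, c]
    }
}
```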
