[Biojava-l] sort fasta file
James Swetnam
jswetnam at gmail.com
Sun Mar 21 20:56:35 UTC 2010
Just hacked this together. Warning: I am new to both Java and BioJava.
import java.io.*;
import java.util.*;
import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;
import java.util.Comparator;
public class SortFasta {
    // Orders sequences from longest to shortest, as asked in the original question.
    static private class RichSequenceComparator implements Comparator<RichSequence> {
        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq2.length() - seq1.length();
        }
    }

    // Usage: SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException, BioException {
        String fastaFile = args[0];
        BufferedReader br = new BufferedReader(new FileReader(fastaFile));
        SimpleNamespace ns = new SimpleNamespace("biojava");
        // Assumes protein sequences; change the alphabet name for nucleotide data.
        Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
                protein.getTokenization("token"), ns);
        // A List keeps every record; a TreeSet with a length-only comparator
        // would drop sequences whose length equals one already stored.
        List<RichSequence> sorted = new ArrayList<RichSequence>();
        while (rsi.hasNext()) {
            sorted.add(rsi.nextRichSequence());
        }
        Collections.sort(sorted, new RichSequenceComparator());
        // Do whatever you want here with the list of RichSequences sorted by
        // length (longest first); I'll just print the lengths.
        for (RichSequence seq : sorted) {
            System.out.println(seq.length());
        }
    }
}
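
A note on the choice of collection when sorting by length: a TreeSet ordered by a comparator that looks only at length treats two sequences of equal length as duplicates and keeps just one of them, while sorting a List keeps every record. Below is a minimal sketch of the difference using only the standard library. Plain strings stand in for sequences, and the class and method names (LengthSortDemo, sortedByLengthDesc, treeSetByLength) are made up for illustration, not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class LengthSortDemo {
    // Longest first; looks only at length, so equal-length strings compare as 0.
    static final Comparator<String> BY_LENGTH_DESC =
            (a, b) -> Integer.compare(b.length(), a.length());

    // Sorts a copy of the input; List.sort is stable, so ties keep input order.
    static List<String> sortedByLengthDesc(List<String> seqs) {
        List<String> copy = new ArrayList<>(seqs);
        copy.sort(BY_LENGTH_DESC);
        return copy;
    }

    // Same comparator in a TreeSet: elements that compare as 0 are not added.
    static TreeSet<String> treeSetByLength(List<String> seqs) {
        TreeSet<String> set = new TreeSet<>(BY_LENGTH_DESC);
        set.addAll(seqs);
        return set;
    }

    public static void main(String[] args) {
        // "MKV" and "ACD" have the same length.
        List<String> seqs = Arrays.asList("MKV", "MKVLAA", "ACD", "M");
        System.out.println(sortedByLengthDesc(seqs)); // [MKVLAA, MKV, ACD, M]
        System.out.println(treeSetByLength(seqs));    // [MKVLAA, MKV, M]
    }
}
```

If a sorted set is really wanted, the comparator would need a tie-breaker (for example the sequence name) so that distinct records never compare as equal.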
On Sat, Mar 20, 2010 at 6:17 AM, xyz <mitlox at op.pl> wrote:
> Hello,
> I would like to sort a multi-FASTA file by sequence length,
> i.e. from the read with the longest sequence to the read with the
> shortest sequence.
>
> import java.io.BufferedReader;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
>
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> public class SortFasta {
>
> public static void main(String[] args) throws FileNotFoundException,
> BioException {
>
> BufferedReader br = new BufferedReader(new FileReader("sortfasta.fasta"));
> SimpleNamespace ns = new SimpleNamespace("biojava");
>
> RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null, ns);
>
> while (rsi.hasNext()) {
> RichSequence rs = rsi.nextRichSequence();
> System.out.println(rs.getName());
> System.out.println(rs.seqString());
> }
> }
> }
>
> I have tried to do it, but I do not know how to continue.
>
> Thank you in advance.
>
> Best regards,