[Biojava-l] sort fasta file
xyz
mitlox at op.pl
Thu Mar 25 13:23:37 UTC 2010
Hi James,
Thank you for the solution, but I get only four lengths
7
13
23
30
as output for this input file, which contains five sequences:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttccccccccccccccccccccccc
How can I fix this, and why did you choose Comparator rather than
Comparable?
Thank you in advance.
Best regards,
On Sun, 21 Mar 2010 16:56:35 -0400
James Swetnam <jswetnam at gmail.com> wrote:
> Just hacked this together, warning: I am new to both java and biojava.
>
> import java.io.*;
> import java.util.*;
>
> import org.biojava.bio.BioException;
> import org.biojava.bio.symbol.*;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.*;
>
> import java.util.Comparator;
>
> public class SortFasta {
>
> static private class RichSequenceComparator implements
> Comparator<RichSequence> {
>
> public int compare(RichSequence seq1, RichSequence seq2)
> {
> return seq1.length() - seq2.length();
> }
>
>
> }
>
> // Usage: SortFasta unsortedFile.fasta
> public static void main(String[] args) throws
> FileNotFoundException, BioException {
>
> String fastaFile = args[0];
>
> BufferedReader br = new BufferedReader(new FileReader(fastaFile));
> SimpleNamespace ns = new SimpleNamespace("biojava");
>
> Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
>
> RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
> protein.getTokenization("token"),
> ns);
>
> SortedSet<RichSequence> sorted = new TreeSet<RichSequence>( new
> SortFasta.RichSequenceComparator());
>
> while (rsi.hasNext()) {
> sorted.add(rsi.nextRichSequence());
> }
>
> Iterator<RichSequence> sortedIt = sorted.iterator();
>
> // Do whatever you want here with the ascending list of
> // RichSequences by length, I'll just print them.
> while(sortedIt.hasNext())
> {
> System.out.println(((RichSequence) sortedIt.next()).length());
> }
> }
> }
>
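
A likely explanation for the missing fifth length, sketched without BioJava so it runs on its own: a TreeSet treats two elements as the same element when its comparator returns 0, so a comparator that looks only at length keeps a single sequence per distinct length. Sequences 4 and 5 in the input have equal length, so one of them is dropped. One possible fix is to collect the sequences in a List and sort that instead, since a List keeps duplicates. The class and method names below (SortByLengthDemo, lengthsViaTreeSet, lengthsViaList) are made up for illustration, and plain Strings stand in for RichSequence objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

public class SortByLengthDemo {

    // Orders by length only, like the comparator in the quoted program.
    static final Comparator<String> BY_LENGTH =
            Comparator.comparingInt(String::length);

    // TreeSet uses the comparator to decide equality: when compare(a, b) == 0
    // the second element is treated as a duplicate and is not added.
    static List<Integer> lengthsViaTreeSet(List<String> seqs) {
        TreeSet<String> sorted = new TreeSet<>(BY_LENGTH);
        sorted.addAll(seqs);
        List<Integer> lengths = new ArrayList<>();
        for (String s : sorted) {
            lengths.add(s.length());
        }
        return lengths;
    }

    // A List keeps every element; List.sort is stable, so sequences of
    // equal length stay in their original file order.
    static List<Integer> lengthsViaList(List<String> seqs) {
        List<String> sorted = new ArrayList<>(seqs);
        sorted.sort(BY_LENGTH);
        List<Integer> lengths = new ArrayList<>();
        for (String s : sorted) {
            lengths.add(s.length());
        }
        return lengths;
    }

    public static void main(String[] args) {
        // The five sequences from the input file in this message.
        List<String> seqs = Arrays.asList(
                "atccccc",
                "atccccctttttt",
                "atccccccccccccccccctttt",
                "tttttttccccccccccccccccccccccc",
                "tttttttccccccccccccccccccccccc");
        System.out.println(lengthsViaTreeSet(seqs)); // four entries
        System.out.println(lengthsViaList(seqs));    // five entries
    }
}
```

In the quoted program the same idea would mean adding each RichSequence to an ArrayList<RichSequence> and sorting it with the existing RichSequenceComparator. On Comparator versus Comparable: Comparable has to be implemented by the class being sorted, and RichSequence is a library type that cannot be changed from user code, whereas a Comparator defines the ordering from outside.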