[Biojava-l] SeqIOTools

Ren, Zhen zren at amylin.com
Wed Apr 2 11:29:50 EST 2003


Hi there,

I have a protein dataset in FASTA format. Each sequence header has an ID followed by a description, as shown below:

>AAP00006; Sequence encoded by leader sequence of core antigen.
gglfhlcliiscscptvqasklclgwl

When I run the code snippet at the end of this email, the output contains only the ID and not the description:

AAP00006;
GGLFHLCLIISCSCPTVQASKLCLGWL

If I delete the space between ";" and "Sequence", like this:

>AAP00006;Sequence encoded by leader sequence of core antigen.
gglfhlcliiscscptvqasklclgwl

I get this:

AAP00006;Sequence
GGLFHLCLIISCSCPTVQASKLCLGWL

So SeqIOTools.readFastaProtein() apparently treats a space (probably any whitespace) as the delimiter and stores only the text before it in the sequence's name property. How can I specify my own delimiter, or keep the whole header line as the sequence's name?
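The behaviour described above can be reproduced in plain Java. This is a minimal sketch, not BioJava's parser: HeaderSplit, nameByWhitespace and splitOn are hypothetical names, and the "split on whitespace" rule is inferred from the outputs shown, not taken from the BioJava source. Splitting on whitespace yields the truncated name; splitting at a caller-chosen delimiter such as ";" recovers both the ID and the description.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of BioJava) that mimics the observed
 * header parsing and shows a custom-delimiter alternative.
 */
public class HeaderSplit {

    /** First whitespace-delimited token after '>', mirroring the observed name. */
    static String nameByWhitespace(String header) {
        String body = stripMarker(header);
        // Split on runs of whitespace; the first token becomes the "name".
        return body.split("\\s+")[0];
    }

    /** Splits the header at the first occurrence of a caller-chosen delimiter. */
    static String[] splitOn(String header, String delimiter) {
        String body = stripMarker(header);
        int i = body.indexOf(delimiter);
        if (i < 0) {
            // No delimiter found: treat the whole line as the ID.
            return new String[] { body, "" };
        }
        String id = body.substring(0, i);
        String description = body.substring(i + delimiter.length()).trim();
        return new String[] { id, description };
    }

    /** Removes the leading '>' and surrounding whitespace. */
    static String stripMarker(String header) {
        String s = header.trim();
        return s.startsWith(">") ? s.substring(1).trim() : s;
    }

    public static void main(String[] args) {
        String header = ">AAP00006; Sequence encoded by leader sequence of core antigen.";
        System.out.println(nameByWhitespace(header));                 // truncated name
        System.out.println(Arrays.toString(splitOn(header, ";")));    // ID and description
        System.out.println(stripMarker(header));                      // whole header line
    }
}
```

With the no-space header from the second example, nameByWhitespace returns "AAP00006;Sequence", which matches the output reported above.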

Please help.  Thanks a lot.

Zhen

Code snippet:

import java.io.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;

public class TestSeqIOTools {

    public static void main(String[] args) {

        if (args.length != 1) {
            System.out.println("Usage: java TestSeqIOTools filename.fasta");
            System.exit(1);
        }

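        int status = 0;
        BufferedReader fin = null;
        try {
            fin = new BufferedReader(new FileReader(args[0]));
            // Parse the file as a stream of FASTA protein sequences
            SequenceIterator stream = SeqIOTools.readFastaProtein(fin);
            while (stream.hasNext()) {
                Sequence seq = stream.nextSequence();
                // getName() holds only the first whitespace-delimited token of the header
                System.out.println(seq.getName());
                System.out.println(seq.seqString());
            }
        } catch (BioException e) {
            System.err.println("BioException: " + e.getMessage());
            e.printStackTrace();
            status = 1; // report failure with a non-zero exit code
        } catch (IOException ex) {
            System.err.println("IOException: " + ex.getMessage());
            status = 1;
        } finally {
            // Close the reader whether or not parsing succeeded
            if (fin != null) {
                try {
                    fin.close();
                } catch (IOException ignored) {
                    // nothing useful to do if close fails
                }
            }
        }
        if (status != 0) {
            System.exit(status);
        }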
    }
}
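If the goal is simply to keep the whole header line, one option is a small stand-alone reader. The sketch below does not use BioJava at all; FullHeaderFasta, Entry and read are hypothetical names, and residues are upper-cased only to match the output shown earlier. (BioJava itself may keep the rest of the header in the sequence's Annotation, but the exact property key is an assumption here and should be checked in the FastaFormat documentation.)

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-alone FASTA reader (not BioJava) that keeps the
 * entire header line, so the description is not lost.
 */
public class FullHeaderFasta {

    /** One FASTA record: the full header line (without '>') and its residues. */
    static final class Entry {
        final String header;
        final String residues;

        Entry(String header, String residues) {
            this.header = header;
            this.residues = residues;
        }
    }

    static List<Entry> read(BufferedReader in) throws IOException {
        List<Entry> entries = new ArrayList<>();
        String header = null;
        StringBuilder residues = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines between records
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    entries.add(new Entry(header, residues.toString()));
                }
                header = line.substring(1).trim(); // keep the whole header line
                residues.setLength(0);
            } else if (header != null) {
                // Sequence lines before the first header are ignored.
                residues.append(line.toUpperCase());
            }
        }
        if (header != null) {
            entries.add(new Entry(header, residues.toString()));
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        String fasta = ">AAP00006; Sequence encoded by leader sequence of core antigen.\n"
                + "gglfhlcliiscscptvqasklclgwl\n";
        for (Entry e : read(new BufferedReader(new StringReader(fasta)))) {
            System.out.println(e.header);
            System.out.println(e.residues);
        }
    }
}
```

Running main prints the full header line "AAP00006; Sequence encoded by leader sequence of core antigen." followed by GGLFHLCLIISCSCPTVQASKLCLGWL. A custom delimiter can then be applied to the stored header string as needed.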


