[Biojava-l] SeqIOTools
Ren, Zhen
zren at amylin.com
Wed Apr 2 11:29:50 EST 2003
Hi there,
I have a protein dataset in FASTA format. Each sequence has an ID followed by a description, as shown below:
>AAP00006; Sequence encoded by leader sequence of core antigen.
gglfhlcliiscscptvqasklclgwl
If I use the snippet attached at the end of this email, I get only the ID and no description, like this:
AAP00006;
GGLFHLCLIISCSCPTVQASKLCLGWL
If I delete the space between ";" and "Sequence", like this:
>AAP00006;Sequence encoded by leader sequence of core antigen.
gglfhlcliiscscptvqasklclgwl
I will get this:
AAP00006;Sequence
GGLFHLCLIISCSCPTVQASKLCLGWL
So the method SeqIOTools.readFastaProtein() evidently uses a space (probably any kind of whitespace) as the delimiter and puts only the first token of the header line into the sequence's name property. My question is: how can I specify my own delimiter so that the whole header line is returned as the sequence's name?
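For illustration, the two results above are consistent with the name being the first whitespace-delimited token of the header line. The following is a minimal stand-alone sketch in plain Java, not BioJava code; the class HeaderSplitDemo and its methods are hypothetical names introduced here. It models that split and shows what keeping the whole line would look like instead.

```java
// Hypothetical helper, NOT part of BioJava: it only models the header
// split that the observed output suggests.
final class HeaderSplitDemo {

    // Returns the header text up to the first run of whitespace,
    // which matches the names printed in the two experiments.
    static String firstToken(String headerLine) {
        String body = stripMarker(headerLine);
        return body.split("\\s+", 2)[0];
    }

    // Returns the complete header line without the leading '>' marker.
    static String wholeLine(String headerLine) {
        return stripMarker(headerLine);
    }

    // Drops a leading '>' if present and trims surrounding whitespace.
    private static String stripMarker(String headerLine) {
        String s = headerLine.startsWith(">") ? headerLine.substring(1) : headerLine;
        return s.trim();
    }

    public static void main(String[] args) {
        String spaced = ">AAP00006; Sequence encoded by leader sequence of core antigen.";
        String joined = ">AAP00006;Sequence encoded by leader sequence of core antigen.";
        System.out.println(firstToken(spaced)); // AAP00006;
        System.out.println(firstToken(joined)); // AAP00006;Sequence
        System.out.println(wholeLine(spaced));  // the full header, minus '>'
    }
}
```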
Please help. Thanks a lot.
Zhen
Code snippet:
import java.io.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;

public class TestSeqIOTools {
    public static void main(String[] args) {
        // Expect exactly one argument: the path to the FASTA file.
        if (args.length != 1) {
            System.out.println("Usage: java TestSeqIOTools filename.fasta");
            System.exit(1);
        }
        try {
            BufferedReader fin = new BufferedReader(new FileReader(args[0]));
            // Read the file as a stream of protein sequences.
            SequenceIterator stream = SeqIOTools.readFastaProtein(fin);
            while (stream.hasNext()) {
                Sequence seq = stream.nextSequence();
                // Prints only the first token of the header line.
                System.out.println(seq.getName());
                System.out.println(seq.seqString());
            }
            fin.close();
        } catch (BioException e) {
            System.err.println("BioException: " + e.getMessage());
            e.printStackTrace();
            // Exit with a non-zero status to signal the failure.
            System.exit(1);
        } catch (IOException ex) {
            System.err.println("IOException: " + ex.getMessage());
        }
    }
}
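For comparison, here is a minimal sketch of a reader that keeps the entire header line as the name. It is plain Java and does not use or describe BioJava's API. The class WholeHeaderFastaReader, its FastaEntry type, and the readString helper are hypothetical names introduced for illustration. Residues are upper-cased only to mirror the output shown earlier.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone reader, NOT part of BioJava.
final class WholeHeaderFastaReader {

    // One FASTA record: the complete header line (minus '>') and its residues.
    static final class FastaEntry {
        final String header;
        final String residues;

        FastaEntry(String header, String residues) {
            this.header = header;
            this.residues = residues;
        }
    }

    // Reads all records; a line starting with '>' begins a new record.
    static List<FastaEntry> read(BufferedReader in) throws IOException {
        List<FastaEntry> entries = new ArrayList<>();
        String header = null;
        StringBuilder residues = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    entries.add(new FastaEntry(header, residues.toString()));
                }
                header = line.substring(1).trim(); // keep the whole line
                residues.setLength(0);
            } else if (header != null) {
                // Sequence lines may wrap, so concatenate them.
                residues.append(line.toUpperCase(Locale.ROOT));
            }
        }
        if (header != null) {
            entries.add(new FastaEntry(header, residues.toString()));
        }
        return entries;
    }

    // Convenience wrapper for in-memory text.
    static List<FastaEntry> readString(String fasta) {
        try {
            return read(new BufferedReader(new StringReader(fasta)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String fasta = ">AAP00006; Sequence encoded by leader sequence of core antigen.\n"
                + "gglfhlcliiscscptvqasklclgwl\n";
        for (FastaEntry entry : readString(fasta)) {
            System.out.println(entry.header);
            System.out.println(entry.residues);
        }
    }
}
```

With this approach the ID and description stay together in one string, so any delimiter of the caller's choosing (for example ";") can be applied to the header afterwards.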