[Biojava-l] reading peptides from a fasta file
Gerster Sarah
sgerster at student.ethz.ch
Wed Nov 8 08:14:39 UTC 2006
Hi!
I'm trying to read peptides from a fasta file:
>id|0|0.9992|1
ASITENGGAEEESVAK
>id|1|0.9953|1
ASITENGGAEEESVAK
>id|2|0.9998|1
ASNASSAGDEVDNVATSSK
>id|3|0.9998|1
EAAAAEEPQPSDEGDVVAK
>id|4|0.9998|1
EAAAAEEPQPSDEGDVVAK
....
I would like to hold all peptides in memory. For each one I need its id, its sequence, and the two numbers at the end of the header (e.g. id = 0, probability = 0.9992, rank = 1 for the first entry in the file).
I tried to use readFastaProtein, but I guess I'm not using it correctly. I do get the sequences, but none of the other information I want.
Here is my code:
try
{
    BufferedReader br = new BufferedReader(new FileReader(file_name));
    RichSequenceIterator rich_stream = RichSequence.IOTools.readFastaProtein(br,null);
    while(rich_stream.hasNext())
    {
        RichSequence rich_seq = rich_stream.nextRichSequence();
        System.out.println(rich_seq.toString());
        System.out.println(rich_seq.getAccession());
        System.out.println(rich_seq.getAlphabet());
        System.out.println(rich_seq.getAnnotation());
        System.out.println(rich_seq.getName());
        System.out.println(rich_seq.getDescription());
        System.out.println(rich_seq.getIdentifier());
        System.out.println(rich_seq.seqString());
    }
}
catch(Exception e)
{
    System.err.println("Bug while reading the sequences from the FASTA file");
}
Here's the output (for the first entry in the fasta file):
id|0:1/0.9992
0
org.biojava.bio.symbol.AlphabetManager$ImmutableWellKnownAlphabetWrapper@1df073d
1
null
null
ASITENGGAEEESVAK
Can anyone tell me what's going wrong?
Is there already a function that puts all the sequences directly into memory (in something like a HashSet) while reading them?
Cheers
Sarah
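
One way to get the id, probability and rank without depending on how BioJava interprets the header line is to parse the file directly. The following is only a minimal sketch under stated assumptions: every header has exactly the form >id|<id>|<probability>|<rank>, the ids are unique, and the names PeptideFastaReader, Peptide, read and store are hypothetical helpers introduced here for illustration, not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a BioJava class). Assumes headers of the form
// ">id|<id>|<probability>|<rank>" followed by one or more sequence lines.
class PeptideFastaReader {

    // Plain value holder for one FASTA entry.
    static final class Peptide {
        final int id;
        final double probability;
        final int rank;
        final String sequence;

        Peptide(int id, double probability, int rank, String sequence) {
            this.id = id;
            this.probability = probability;
            this.rank = rank;
            this.sequence = sequence;
        }
    }

    // Reads every entry into memory, keyed by id, preserving file order.
    static Map<Integer, Peptide> read(BufferedReader in) throws IOException {
        Map<Integer, Peptide> peptides = new LinkedHashMap<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.startsWith(">")) {
                // A new header closes the previous entry, if there was one.
                if (header != null) {
                    store(peptides, header, seq.toString());
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line); // sequences may span several lines
            }
        }
        if (header != null) {
            store(peptides, header, seq.toString()); // last entry in the file
        }
        return peptides;
    }

    // Splits "id|0|0.9992|1" into its fields and adds the entry to the map.
    private static void store(Map<Integer, Peptide> peptides,
                              String header, String sequence) {
        String[] f = header.split("\\|");
        if (f.length != 4) {
            throw new IllegalArgumentException("Unexpected FASTA header: " + header);
        }
        int id = Integer.parseInt(f[1]);
        peptides.put(id, new Peptide(id, Double.parseDouble(f[2]),
                                     Integer.parseInt(f[3]), sequence));
    }
}
```

Calling PeptideFastaReader.read(new BufferedReader(new FileReader(file_name))) would then return a map in which get(0).probability is 0.9992 for the first entry of the file above. Because the ids are used as keys, a later entry with the same id would replace an earlier one; a List would be the safer container if ids can repeat.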