[Biojava-l] readFasta problem
xyz
mitlox at op.pl
Wed Apr 21 11:18:24 UTC 2010
On Thu, 8 Apr 2010 12:41:25 +0100
Richard Holland <holland at eaglegenomics.com> wrote:
> You have passed null into the tokenizer parameter of
> RichSequence.IOTools.readFasta() - this is not allowed. The parser
> cannot guess the type of sequence, it must be told what to expect by
> specifying the tokenizer to use. (Importantly this also means that
> you cannot mix different types of sequence within the same file to be
> parsed.)
>
Thank you.
Q1:
Does RichSequenceIterator read the complete file into memory, so that I
then retrieve each read from memory? Or does it read the file line by
line and hand me each read as it is parsed?
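To make the second alternative concrete, here is a minimal plain-Java sketch of the "line by line" behaviour I have in mind. It is not BioJava code and says nothing about how RichSequenceIterator is actually implemented. The class name LazyFastaReader and the {header, sequence} array it returns are made up for illustration only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical helper, NOT part of BioJava: yields {header, sequence}
// pairs one record at a time instead of loading the whole file.
class LazyFastaReader implements Iterator<String[]> {
    private final BufferedReader in;
    private String pendingHeader; // header of the next record, without '>'

    LazyFastaReader(BufferedReader in) {
        this.in = in;
        this.pendingHeader = readUntilHeader();
    }

    // Skips lines until a header line is found; returns null at end of input.
    private String readUntilHeader() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line.substring(1).trim();
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    @Override
    public String[] next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String header = pendingHeader;
        pendingHeader = null;
        StringBuilder seq = new StringBuilder();
        try {
            String line;
            // Read only as far as the next header; later records stay unread.
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line.substring(1).trim();
                    break;
                }
                seq.append(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String[] { header, seq.toString() };
    }

    public static void main(String[] args) {
        BufferedReader br = new BufferedReader(
                new StringReader(">1\natccccc\n>2\natccccctttttt\n"));
        LazyFastaReader reader = new LazyFastaReader(br);
        while (reader.hasNext()) {
            String[] rec = reader.next();
            System.out.println(rec[0] + "\t" + rec[1]);
        }
    }
}
```

With a reader like this, only the current record is held in memory at any time, which is what matters to me for large read files.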
Q2:
Why am I not able to retrieve the header (the text after ">") from the
following FASTA file:
>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttcccccccccccccccccccccca
with the following code:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.io.SymbolTokenization;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class SortFasta {
    public static void main(String[] args)
            throws FileNotFoundException, BioException {
        BufferedReader br = new BufferedReader(
                new FileReader("sortFasta.fasta"));
        String type = "DNA";
        SymbolTokenization toke = AlphabetManager.alphabetForName(type)
                .getTokenization("token");
        RichSequenceIterator rsi =
                RichSequence.IOTools.readFasta(br, toke, null);
        while (rsi.hasNext()) {
            RichSequence rs = rsi.nextRichSequence();
            System.out.println(rs.getDescription());
            System.out.println(rs.seqString());
        }
    }
}
What am I doing wrong that prevents me from retrieving the header?