[Biojava-dev] FastaFormat performance enhancement
ml-it-biojava-dev at epigenomics.com
Wed Oct 19 04:41:38 EDT 2005
Hi,
I had a lot of trouble using SeqIOTools.writeFasta on large sequences. The subStr method of SymbolList seems to leak memory (I did not track this down in detail!). Anyway, I would suggest changing writeSequence in FastaFormat from:
    public void writeSequence(Sequence seq, PrintStream os)
            throws IOException {
        os.print(">");
        os.println(describeSequence(seq));
        int length = seq.length();
        for (int pos = 1; pos <= length; pos += lineWidth) {
            int end = Math.min(pos + lineWidth - 1, length);
            os.println(seq.subStr(pos, end));
        }
    }
to
    public void writeSequence(Sequence seq, PrintStream os)
            throws IOException {
        os.print(">");
        os.println(describeSequence(seq));
        int length = seq.length();
        String seqString = seq.seqString();
        for (int pos = 0; pos < length; pos += lineWidth) {
            int end = Math.min(pos + lineWidth, length);
            String sub = seqString.substring(pos, end);
            os.println(sub);
        }
    }
Since the loop body is pure String manipulation, I see no point in calling SymbolList's subStr there anyway.
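To show the chunking logic on its own, here is a minimal sketch that works on a plain String instead of a BioJava Sequence. The class name FastaWrapSketch, the method writeWrapped and its parameters are made up for illustration and are not part of BioJava; the loop is the same 0-based, end-exclusive substring loop as in the proposed writeSequence.

```java
import java.io.PrintStream;

// Hypothetical stand-alone helper, not BioJava API: it only illustrates
// wrapping an already materialised sequence string at a fixed line width.
class FastaWrapSketch {

    static void writeWrapped(String description, String seqString,
                             int lineWidth, PrintStream os) {
        if (lineWidth <= 0) {
            // Guard against a loop that would never advance.
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        // FASTA header line.
        os.print(">");
        os.println(description);

        int length = seqString.length();
        // String.substring is 0-based with an exclusive end index, so the
        // bounds differ by one from the 1-based, inclusive subStr(pos, end).
        for (int pos = 0; pos < length; pos += lineWidth) {
            int end = Math.min(pos + lineWidth, length);
            os.println(seqString.substring(pos, end));
        }
    }
}
```

For example, calling FastaWrapSketch.writeWrapped("seq1", "ACGTACGTAC", 4, System.out) should print the header line followed by the lines ACGT, ACGT and AC. The trade-off is that the whole sequence string is held in memory once, in exchange for not calling subStr for every output line.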
ciao dirk
--
Dirk Habighorst Software Engineer/ Bioinformatician
Epigenomics AG Kleine Praesidentenstr. 1 10178 Berlin, Germany
phone:+49-30-24345-372 fax:+49-30-24345-555
http://www.epigenomics.com dirk.habighorst at epigenomics.com