[Biojava-l] BioJava translation
Andy Yates
ayates at ebi.ac.uk
Wed Oct 13 16:25:41 UTC 2010
That's great news. It should get even faster once we remove the requirement to upper-case the input, since at the moment you have to pass over the same sequence twice.
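
To illustrate the "twice" point: the sketch below translates mixed-case DNA in one pass by making the codon lookup itself case-insensitive, so no upper-cased copy of the sequence is built first. It is a standalone illustration under stated assumptions, not BioJava code: the object and method names are hypothetical, the table is the standard genetic code (NCBI table 1), and emitting 'X' for a codon containing an unrecognised base is my own choice.

```scala
// Sketch only: NOT BioJava's implementation. All names here are hypothetical.
object SinglePassTranslate {
  // Standard genetic code (NCBI table 1). Codons are ordered T, C, A, G
  // at each of the three positions, so index = 16*b1 + 4*b2 + b3.
  private val Bases = "TCAG"
  private val AminoAcids =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

  // Lookup from ASCII code to base index 0..3, or -1 for anything else.
  // Both cases are filled in, which is what removes the need for toUpperCase.
  private val baseIndex: Array[Int] = {
    val table = Array.fill(128)(-1)
    for ((base, i) <- Bases.zipWithIndex) {
      table(base.toInt) = i
      table(base.toLower.toInt) = i
    }
    // Treat U like T so RNA input also works.
    table('U'.toInt) = 0
    table('u'.toInt) = 0
    table
  }

  private def indexOf(c: Char): Int =
    if (c < 128) baseIndex(c.toInt) else -1

  // Translate frame 1 of the input; trailing bases that do not fill a codon
  // are ignored. Assumption: unknown codons are reported as 'X'.
  def translate(dna: CharSequence): String = {
    val out = new StringBuilder(dna.length / 3)
    var i = 0
    while (i + 2 < dna.length) {
      val b1 = indexOf(dna.charAt(i))
      val b2 = indexOf(dna.charAt(i + 1))
      val b3 = indexOf(dna.charAt(i + 2))
      if (b1 < 0 || b2 < 0 || b3 < 0) out.append('X')
      else out.append(AminoAcids.charAt(16 * b1 + 4 * b2 + b3))
      i += 3
    }
    out.toString
  }
}
```

For example, SinglePassTranslate.translate("atgGCCtaa") reads each base once and never allocates an upper-cased string.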
I wonder what the C version does to make itself even faster.
Andy
On 13 Oct 2010, at 17:13, Pjotr Prins wrote:
> On Wed, Oct 13, 2010 at 12:27:05PM +0100, Richard Holland wrote:
>> BJ3 should be replacing most sequence operations with string operations, making the whole thing much faster.
>
> Good news, BJ3 is a lot faster! The previous version took 2 minutes
> for the C. elegans genome (33 Mb); the BJ3 version takes 27 sec on my
> modest Thinkpad X61 laptop. After parsing the FASTA and turning it
> into an upper-case string, the actual translation takes 16 sec.
>
> Only the C implementations are faster.
>
> Here is the relevant Scala code:
>
> import bio._
> import java.io._
> import org.biojava3.core.sequence._
> import org.biojava3.core.sequence.transcription.TranscriptionEngine
> import org.biojava3.core.sequence.io.IUPACParser
>
> // <cut> fetching infile from command line...
>
> IUPACParser.getInstance().getTable(1); // not sure we need this
> IUPACParser.getInstance().getTable("UNIVERSAL");
> val engine = TranscriptionEngine.getDefault()
> val f = new FastaReader(infile)
>   f.foreach {
>     res =>
>       val (id,tag,dna) = res
>       println(List(">",id).mkString)
>       val dna2 = new DNASequence(dna.mkString.toUpperCase)
>       val rna = dna2.getRNASequence(engine)
>       println(rna.getProteinSequence(engine))
>   }
> }
>
> prints:
>
> >B0222.10
> MLYWNDLNTVGIVADTIWKYYADQYKRLIKEHSKIRFNPLLHAASVIRIFDIHINLFNNFVTFLLVIFLYIFLIYYVTFFVFPFGPLRVSHWMRFFKIIISRSHLG
> >B0222.11
> MSRRTASKLLVFVFLCSLCFGTQRYDMPRKIDLFNDLITQSTTPASPKCQCLPPTTPSTPPNCIPYDSRLQAASLEEAIVAFPDLTITRQEKTQQSTATLNNCKTKQCRDCYKDLRSQLRKVGLLPGTIDQVFHNQRNFTTCQKYRFARQDKGVYEKKKKAKQHYDWDYVEYDEDEDDDYFWDGLFWKKKRNVLKKIVKRDVEATTAISQPPNSTAMNSTGIIGIRFPISCTTRGVTPDGLGTVSLCSTCWVWRRLPSTYYPAYLNEVVCDYADTSCLSGYASCQTGTQQLNVLRNDSGKLIPISVSAGINCECRLAVGSTLESLVLGQGISKAMPPIDTTSTKPPNLATSTTSHS
> (...)
>
> Pj.
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/