From matthew_pocock at yahoo.co.uk Mon Mar 1 06:42:51 2004
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Mon Mar 1 06:49:30 2004
Subject: [Biojava-l] Bug in biojava GUI?
In-Reply-To: <200402290833.i1T8XF4q003989@mail.cbi.pku.edu.cn>
References: <200402290833.i1T8XF4q003989@mail.cbi.pku.edu.cn>
Message-ID: <404321BB.1080900@yahoo.co.uk>

Hi Wux,

Sounds like a bug to me. Could you send me some example code? I will try
to track it down. What version of Java and BioJava are you using?

Matthew

wux@mail.cbi.pku.edu.cn wrote:

>Dear all:
>
> When I add a TitledBorder to a SequencePanel, the bottom line in the
> MultiLineRenderer disappears. When I change the border to a LineBorder,
> all the lines are inside the line frame. I am not sure whether this is
> BioJava's problem or Java's.
>For example:
>  ---- Title Border --------------------------
>  | ----->   ----->                          |
>  |    ----->                                |
>  | <----                                    |
>  | acgtttttttttaaatttttttttttttttttttttttt  |
>  --------------------------------------------
>  ----5------10------15--------------------   (This line disappears!)
>
>Changing the border on the SequencePanel:
>  ----------------- --------------------------
>  | ----->   ----->                          |
>  |    ----->                                |
>  | <----                                    |
>  | acgtttttttttaaatttttttttttttttttttttttt  |
>  | ----5------10------15------------------- |   (This line is correctly inside the line frame!)
>  --------------------------------------------
>
> Has anyone else met the same problem?
>
> Yours faithfully,
> wux
> wux@mail.cbi.pku.edu.cn
> 2004-02-29
>*****************************************************
>WuXin Ph.D student of CBI (Center of Bioinformatics)
>Peking University 100871 P.R.China
>Email: wux@mail.cbi.pku.edu.cn
>Tel: 010-62762409 (dorm)
>     010-62755206 (office)
>Address: Building 47#2026 Peking University
>*****************************************************
>
>_______________________________________________
>Biojava-l mailing list - Biojava-l@biojava.org
>http://biojava.org/mailman/listinfo/biojava-l

From rohini_sulatycki at yahoo.com Mon Mar 1 11:50:18 2004
From: rohini_sulatycki at yahoo.com (Sulatycki Rohini)
Date: Mon Mar 1 11:56:12 2004
Subject: [Biojava-l] Biojava development help needed?
In-Reply-To:
Message-ID: <20040301165019.81904.qmail@web9808.mail.yahoo.com>

Hello,

I am a newbie to Biojava and would like to contribute in any way to the
project. I am a fairly experienced Java architect/developer and can
contribute towards development, architecture etc. I would appreciate it
if this group could let me know if/how I can help.

Thanks
Rohini Sulatycki

=====
Rohini Sulatycki
Technical Architect
rohini_sulatycki@yahoo.com

From douglas.hoen at mail.mcgill.ca Thu Mar 4 13:33:18 2004
From: douglas.hoen at mail.mcgill.ca (Douglas Hoen)
Date: Thu Mar 4 13:39:28 2004
Subject: [Biojava-l] NPE from FlatModel.<init> with SimpleModelInState
Message-ID: <1078425198.2996.215.camel@elegans>

Hi,

I want to put a two-headed markov model in a SimpleModelInState. But when
I attempt to call DPFactory.DEFAULT.createDP, I get a NullPointerException
from inside the org.biojava.bio.dp.FlatModel constructor. I get the same
problem from the biojava-1.30-jdk13.jar, biojava-1.30-jdk14.jar, and
biojava-1.3.1.jar distributions.
Also, I turned up a similar thread in Dec. 2002 biojava-l archive entitled
"SimpleModelInState Problem?". Matthew Pocock responded to the query
saying he was about to work on the issue, but there is no followup.

My questions: Is this a real bug? If so, should I expect this API
(ModelInState for two-headed models) to be supported in the near future,
or should I simply build a flat model?

The stack trace, possible problem biojava code, and snippets of my test
code are below.

Thanks very much for any help.

Doug

Here is the stack trace:
------------------------
Exception in thread "main" java.lang.NullPointerException
        at org.biojava.bio.dp.FlatModel.<init>(FlatModel.java:251)
        at org.biojava.bio.dp.DP.flatView(DP.java:168)
        at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:52)
        at ltr.ModelInStateExample.main(ModelInStateExample.java:124)
rethrown as org.biojava.utils.NestedError: Can't align.
        at ltr.ModelInStateExample.main(ModelInStateExample.java:133)

The NPE seems to be coming from the following FlatModel code:
-------------------------------------------------------------
//
// FIXME -- [SOMEONE (ed.)] broked this...
//
TranslatedDistribution dist = null;
// TranslatedDistribution dist TranslatedDistribution.getDistribution(
//   delegate.transitionsFrom(s),
//   sModel.getWeights(sOrig)
// );
SimpleReversibleTranslationTable table =
    (SimpleReversibleTranslationTable) dist.getTable();

Here is the model creation code:
--------------------------------
...
FiniteAlphabet singleAlphabet = DNATools.getDNA();
FiniteAlphabet doubleAlphabet = ( FiniteAlphabet )
    AlphabetManager.getCrossProductAlphabet(
        Collections.nCopies( 2, singleAlphabet ) );

State matchState = new SimpleEmissionState(
    "match", Annotation.EMPTY_ANNOTATION, new int[] { 1, 1 },
    new UniformDistribution( doubleAlphabet ) );
State insertState = new SimpleEmissionState(
    "insert", Annotation.EMPTY_ANNOTATION, new int[] { 0, 1 },
    new PairDistribution(
        new GapDistribution( singleAlphabet ),
        new UniformDistribution( singleAlphabet ) ) );
State deleteState = new SimpleEmissionState(
    "delete", Annotation.EMPTY_ANNOTATION, new int[] { 1, 0 },
    new PairDistribution(
        new UniformDistribution( singleAlphabet ),
        new GapDistribution( singleAlphabet ) ) );

SimpleMarkovModel innerModel =
    new SimpleMarkovModel( 2, doubleAlphabet, "inner");

innerModel.addState( matchState );
innerModel.addState( insertState );
innerModel.addState( deleteState );

innerModel.createTransition( innerModel.magicalState(), matchState );
innerModel.createTransition( matchState, insertState );
innerModel.createTransition( matchState, deleteState );
innerModel.createTransition( matchState, matchState );
innerModel.createTransition( matchState, innerModel.magicalState() );
innerModel.createTransition( insertState, matchState );
innerModel.createTransition( insertState, insertState );
innerModel.createTransition( deleteState, matchState );
innerModel.createTransition( deleteState, deleteState );

double d = 0.1;
double e = 0.1;
double t = 0.1;

innerModel.getWeights( innerModel.magicalState() )
    .setWeight( matchState, 1.0 );

Distribution fromMatch = innerModel.getWeights( matchState );
fromMatch.setWeight( insertState, d );
fromMatch.setWeight( deleteState, d );
fromMatch.setWeight( matchState, 1-2*d-t );
fromMatch.setWeight( innerModel.magicalState(), t );

Distribution fromInsert = innerModel.getWeights( insertState );
fromInsert.setWeight( matchState, 1-e );
fromInsert.setWeight( insertState, e );

Distribution fromDelete = innerModel.getWeights( deleteState );
fromDelete.setWeight( matchState, 1-e );
fromDelete.setWeight( deleteState, e );

State innerModelInState =
    new SimpleModelInState( innerModel, "modelInState" );

SimpleMarkovModel outerModel =
    new SimpleMarkovModel( 2, doubleAlphabet, "outer" );
outerModel.addState( innerModelInState );
outerModel.createTransition( outerModel.magicalState(), innerModelInState );
outerModel.createTransition( innerModelInState, outerModel.magicalState() );

outerModel.getWeights( outerModel.magicalState() )
    .setWeight( innerModelInState, 1.0 );
outerModel.getWeights( innerModelInState )
    .setWeight( outerModel.magicalState(), 1.0 );
...
DPFactory.DEFAULT.createDP( outerModel );   // <==================== NPE

From Gera.Jellema at wur.nl Thu Mar 4 08:30:12 2004
From: Gera.Jellema at wur.nl (Jellema, Gera)
Date: Thu Mar 4 19:36:29 2004
Subject: [Biojava-l] Reading frames and amino acids
Message-ID:

Hi,
I'm new to biojava so I don't know if this has already been asked. I have
a genome sequence and I want to have it in the 6 reading frames, and then
per reading frame translated into amino acids so I can look for proteins.
I don't know how I have to do it.
Thanks,
Gera

From mark.schreiber at group.novartis.com Thu Mar 4 23:16:09 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Thu Mar 4 23:19:26 2004
Subject: [Biojava-l] Reading frames and amino acids
Message-ID:

Hi Gera -

The code below seems to work for me. Please note that I have not tested it
thoroughly so you might want to eyeball a few results to make sure they
are sensible.

You could probably improve it by making some of the 'in line' code into
methods etc.
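
As a rough illustration of pulling the 'in line' code into methods, here is
a minimal sketch of just the reading-frame bounds, written in plain Java
with no BioJava classes. The names FrameBounds, frameStart and frameEnd are
made up for this example and are not part of BioJava. It assumes 1-based,
inclusive coordinates (as used by SymbolList.subList) and a sequence of at
least three symbols.

```java
public class FrameBounds {

    /** 1-based start position of forward frame 0, 1 or 2. */
    public static int frameStart(int frame) {
        return frame + 1;
    }

    /**
     * 1-based inclusive end position for the given frame, trimmed so that
     * the span from frameStart(frame) to this position holds whole codons.
     */
    public static int frameEnd(int length, int frame) {
        return length - (length - frame) % 3;
    }

    public static void main(String[] args) {
        int length = 10; // hypothetical sequence length, for illustration only
        for (int frame = 0; frame < 3; frame++) {
            int start = frameStart(frame);
            int end = frameEnd(length, frame);
            int codons = (end - start + 1) / 3;
            System.out.println("frame +" + frame + ": " + start + ".." + end
                    + " (" + codons + " codons)");
        }
    }
}
```

For a 10-symbol sequence this gives the spans 1..9, 2..10 and 3..8, so each
span has a length that is a multiple of three and any leftover symbols at
the end are dropped before translation.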
- Mark

import java.io.*;

import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;
import org.biojava.bio.symbol.*;

/**
 * Program to six-frame translate a nucleotide sequence
 */

public class Hex {
  /**
   * Call this to get usage info, program terminates after call.
   */
  public static void help() {
    System.out.println(
        "usage: java utils.Hex <file> <format> <alphabet>");
    System.exit( -1);
  }

  public static void main(String[] args) throws Exception{
    if (args.length != 3) {
      help();
    }

    BufferedReader br = null;
    String format = args[1];
    String alpha = args[2];

    try {
      br = new BufferedReader(new FileReader(args[0]));

      SequenceIterator seqi =
          (SequenceIterator)SeqIOTools.fileToBiojava(format, alpha, br);

      //for each sequence
      while(seqi.hasNext()){
        Sequence seq = seqi.nextSequence();

        //for each frame
        for (int i = 0; i < 3; i++) {
          SymbolList prot;
          Sequence trans;

          //take the reading frame
          SymbolList syms = seq.subList(
              i+1,
              seq.length() - (seq.length() - i)%3);

          //if it is DNA transcribe it to RNA
          if(syms.getAlphabet() == DNATools.getDNA()){
            syms = RNATools.transcribe(syms);
          }

          //output forward translation to STDOUT
          prot = RNATools.translate(syms);
          trans = SequenceTools.createSequence(prot, "",
                                               seq.getName()+
                                               "TranslationFrame: +"+i,
                                               Annotation.EMPTY_ANNOTATION);
          SeqIOTools.writeFasta(System.out, trans);

          //output reverse frame translation to STDOUT
          syms = RNATools.reverseComplement(syms);
          prot = RNATools.translate(syms);
          trans = SequenceTools.createSequence(prot, "",
                                               seq.getName() +
                                               "TranslationFrame: -" + i,
                                               Annotation.EMPTY_ANNOTATION);
          SeqIOTools.writeFasta(System.out, trans);
        }
      }
    }
    finally {
      if(br != null){
        br.close();
      }
    }
  }
}

"Jellema, Gera"
Sent by: biojava-l-bounces@portal.open-bio.org
03/04/2004 09:30 PM

To:
cc:
Subject: [Biojava-l] Reading frames and amino acids

Hi,
I'm new to biojava so I don't know if this has already been asked. I have
a genome sequence and I want to have it in the 6 reading frames, and then
per reading frame translated into amino acids so I can look for proteins.
I don't know how I have to do it.
Thanks,
Gera

[ Attachment ''WINMAIL.DAT'' removed by Mark Schreiber ]

From dmb at mrc-dunn.cam.ac.uk Fri Mar 5 04:29:53 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Fri Mar 5 04:33:29 2004
Subject: [Biojava-l] Reading frames and amino acids
In-Reply-To:
Message-ID:

I have seen 'biojava in anger', but is there a 'biojava cookbook'?

It strikes me that lots of small scripts like this could be useful for
lots of people if archived properly. Does the cookbook exist?

Cheers,
Dan.

On Fri, 5 Mar 2004 mark.schreiber@group.novartis.com wrote:

> Hi Gera -
>
> The code below seems to work for me. Please note that I have not tested it
> thoroughly so you might want to eyeball a few results to make sure they
> are sensible.
>
> You could probably improve it by making some of the 'in line' code into
> methods etc.
>
> - Mark

From mark.schreiber at group.novartis.com Fri Mar 5 04:35:14 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Fri Mar 5 04:38:56 2004
Subject: [Biojava-l] Reading frames and amino acids
Message-ID:

That is the intention of the 'biojava in anger' site. I just haven't had
time to add lots of little scripts to it. I will gratefully take
donations.

- Mark

From off2w0rk at yahoo.com Fri Mar 5 22:09:18 2004
From: off2w0rk at yahoo.com (facemann)
Date: Sat Mar 6 02:31:22 2004
Subject: [Biojava-l] Reading frames and amino acids
Message-ID: <20040306030918.83686.qmail@web41313.mail.yahoo.com>

Here is a small contribution. I use it to find simple motifs. Feel free
to edit or scrap.

/**
 *MotifLister.java
 *Andy Hammer
 *08 Aug 2003
 *Lists all instances of a motif in specified (dna\rna\protein) fasta file.
 *The motif can contain Ambiguity symbols
 *Lists the ORF title and position of motif
 *Outputs a list of counts to stdout.
 */

import java.io.*;
import java.util.*;
import java.util.regex.*;

import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;
import org.biojava.bio.symbol.*;

public class MotifLister{

  public MotifLister(String type, String inputFile,
                     String target, String placement)throws Exception{

    System.out.println("MotifLister is searching file " + inputFile +
                       " for the motif '" + target +
                       "' in frame " + placement + ".");

    try{
      if(type.equalsIgnoreCase("dna")){
        motif = DNATools.createDNA(target);
      }else if(type.equalsIgnoreCase("rna")){
        motif = RNATools.createRNA(target);
      }else{
        motif = ProteinTools.createProtein(target);
      }
    }
    catch(BioError e){
      System.out.println("Error!! Data type must match type of motif.");
      System.out.println("Specifically, " + target + " is not " + type);
      System.exit(0);
    }

    Pattern p = Pattern.compile( MotifTools.createRegex(motif) );

    frame = Integer.parseInt(placement);
    if(frame < 0 || frame > 3){
      System.out.println("Only frames 0 through 3 are allowed");
      System.out.println("frame zero searches all frames.");
      System.exit(0);
    }

    count = 0;

    //read the file
    FileInputStream fis = new FileInputStream(inputFile);
    InputStreamReader isr = new InputStreamReader(fis);
    BufferedReader input = new BufferedReader(isr);

    try{
      if(type.equalsIgnoreCase("dna")){
        si = SeqIOTools.readFastaDNA(input);
      }else if(type.equalsIgnoreCase("rna")){
        si = SeqIOTools.readFastaRNA(input);
      }else{
        si = SeqIOTools.readFastaProtein(input);
      }

      while (si.hasNext()){
        Sequence seq = si.nextSequence();
        Matcher matcher = p.matcher(seq.seqString());
        int start = 0;
        //restart the search one position after each hit so that
        //overlapping matches are found as well
        while(matcher.find(start)) {
          start = matcher.start();
          int end = matcher.end();
          int result = (start % 3) + 1;
          if(result == frame || frame == 0){
            System.out.println(seq.getName() + " : " +
                               "[" + (start + 1) + "," + (end) + "]");
            count++;
          }
          start++;
        }
      }
      input.close(); //close the file
      System.out.println("Total Hits = " + count);
    }
    catch(BioException e){
      System.out.println(inputFile + " is not a " + type + " file.");
      System.out.println(e);
    }
  }

  public static void main(String[] args)throws Exception{
    if (args.length < 4) {
      System.err.println(
          " Usage: >java -jar MotifLister.jar type fastaFile motif frame"
          + "\n Ex: >java -jar MotifLister.jar dna eColi.fasta AAAAAAG 3 > output.txt"
          + "\n would search for A AAA AAG in the third frame in dna file eColi.fasta"
          + "\n and print the results to file output.txt."
          + "\n 'type' can be dna, rna, or protein."
          + "\n 'frame' can be integers 0 through 3."
          + "\n 0 counts any instance of the motif."
          + "\n 1, 2, 3 counts only instances of the motif in the specified frame."
          + "\n Capture output with redirection operator '>'.");
    }else{
      MotifLister ML = new MotifLister(args[0], args[1], args[2], args[3]);
    }
  }

  private SymbolList motif;
  private int frame;
  private int count;
  private SequenceIterator si;
}

From mark.schreiber at group.novartis.com Sun Mar 7 20:00:34 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Sun Mar 7 20:03:52 2004
Subject: [Biojava-l] Reading frames and amino acids
Message-ID:

Thanks -

Give me a few days and I'll put this up. If I don't please nag me :)

- Mark

From Gera.Jellema at wur.nl Mon Mar 8 10:50:40 2004
From: Gera.Jellema at wur.nl (Jellema, Gera)
Date: Mon Mar 8 10:56:25 2004
Subject: [Biojava-l] Reading frames and amino acids
Message-ID:

I tried to compile the program Mark Schreiber wrote but the only thing I
get is:

Hex.java:13: illegal character: \154
/**
^
Hex.java:16: illegal character: \154
public static void help() {
^
Hex.java:17: illegal character: \154
System.out.println(
^
Hex.java:17: illegal character: \154
System.out.println(
^
Hex.java:17: illegal character: \154
System.out.println(
^
Hex.java:18: illegal character: \154
"usage: java utils.Hex ");
^
Hex.java:18: illegal character: \154
"usage: java utils.Hex ");
^

In total I get 100 of these error messages, and I don't understand them.
Can you help me?

Gera

-----Original Message-----
From: mark.schreiber@group.novartis.com [mailto:mark.schreiber@group.novartis.com]
Sent: Fri 5-3-2004 10:35
To: Dan Bolser
CC: biojava-l@portal.open-bio.org; biojava-l-bounces@portal.open-bio.org; Jellema, Gera
Subject: Re: [Biojava-l] Reading frames and amino acids

That is the intention of the 'biojava in anger' site. I just haven't had
time to add lots of little scripts to it. I will gratefully take
donations.

- Mark

Dan Bolser
03/05/2004 05:29 PM

To: Mark Schreiber/GP/Novartis@PH
cc: "Jellema, Gera" , biojava-l@portal.open-bio.org, biojava-l-bounces@portal.open-bio.org
Subject: Re: [Biojava-l] Reading frames and amino acids

I have seen 'biojava in anger', but is there a 'biojava cookbook'?
It strikes me that lots of small scripts like this could be useful for
lots of people if archived properly. Does the cookbook exist?

Cheers,
Dan.

From smh1008 at cus.cam.ac.uk Mon Mar 8 11:37:16 2004
From: smh1008 at cus.cam.ac.uk (David Huen)
Date: Mon Mar 8 11:43:18 2004
Subject: [Biojava-l] Reading frames and amino acids
In-Reply-To:
References:
Message-ID: <200403081637.16804.smh1008@cus.cam.ac.uk>

On Monday 08 Mar 2004 3:50 pm, Jellema, Gera wrote:
> I tried to compile the program Mark Schreiber wrote but the only thing I
> get is:
> Hex.java:13: illegal character: \154
> /**
> ^
> Hex.java:16: illegal character: \154
> public static void help() {
> ^
> Hex.java:17: illegal character: \154
> System.out.println(

Is there any possibility that you are using an editor that inserts
invisible characters and/or some unusual end-of-line character? You need
to save in plain text. Also I don't know how well javac will tolerate DOS
line ends.

Regards,
David Huen

From mark.schreiber at group.novartis.com Wed Mar 10 20:08:32 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Wed Mar 10 20:14:19 2004
Subject: [Biojava-l] BJIA updates
Message-ID:

Hi All -

I have updated the BioJava in Anger site
(http://www.biojava.org/docs/bj_in_anger/) adding some new tutorials and a
bug fix.

Specifically, there are new tutorials on:

* Calculating mass and pI of a peptide
  (http://www.biojava.org/docs/bj_in_anger/calcmass.html)
* Creating a regular expression from a sequence motif
  (http://www.biojava.org/docs/bj_in_anger/regex.html) thanks to Andy
  Hammer for this sample!
* Performing a six frame translation (http://www.biojava.org/docs/bj_in_anger/sixframetranslate.html)

I have also fixed the bug in the view sequence demo (http://www.biojava.org/docs/bj_in_anger/NameChange.htm). This now uses the recommended API for generating ViewSequences.

Enjoy - Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
1 Science Park Road
#04-14 The Capricorn, Science Park II
Singapore 117528

phone +65 6722 2973
fax +65 6722 2910

From dag at sonsorol.org Wed Mar 10 21:30:13 2004
From: dag at sonsorol.org (Chris Dagdigian)
Date: Wed Mar 10 21:35:53 2004
Subject: [Biojava-l] O|B|F mail update -- making progress on anti-spam issues with our mailing lists
Message-ID: <404FCF35.5010705@sonsorol.org>

Hi folks,

Apologies for the cross-posting, but I just wanted to give our list members and admins an update on some new anti-spam measures we have (re)enabled. Good news to report, basically...

The most annoying spams recently have been the simple plain text messages without any HTML, attachments or MIME encoding that just slip right by our filters. Some lists have been forced to switch over to "only members can post" while other lists (like bioperl) have consistently voted to stay as open as possible.

I'll update you on our current efforts as well as a new effort that is about 24 hours old but already working really well so far.

Until yesterday we had three main lines of defense against spam:

1. The mailserver itself (rejects mail from nonexistent domains, etc.)

2. The sendmail Mail::Milter extension (MIMEDefang+SpamAssassin are used to scan all incoming messages. Anything that scores higher than 8.0 is simply discarded automatically. MIMEDefang also strips dangerous attachments like .exe and .pif)

3. Our mailing list moderation queue (emails with attachments, odd MIME encodings and SpamAssassin scores from 0.0 - 7.9 are held in a moderator queue for a human to make an accept/discard decision)

Here are some stats on how this system worked over the past few days:

o 138 attempts to relay mail through our server blocked
o 192 emails blocked due to forged or unresolvable sender domain
o 577 emails discarded automatically by SpamAssassin+MIMEDefang

This system worked *ok* but put a lot of work onto the shoulders of our list admins, who constantly had to weed out the spam caught up in the mailing list moderator system.

Yesterday I brought online another system that already seems to be working really well. It catches spam before we even accept it on our server, which makes the load easier on both our scanning software and our human list moderators.

The system is the RBL+ blackhole list from http://www.mail-abuse.org and the way it works is that we now query (via DNS) the RBL+ database each time someone connects to our mail server. If the RBL check against the sender IP address comes back as "positive" we reject the incoming email.

RBL+ is a combination of four constantly updated databases:

1. RBL -- IP addresses of known, documented spammers and spam machines
2. RSS -- IP addresses of documented/tested unsecured email relays
3. OPS -- IP addresses of documented open proxy servers w/ spam history
4. DUL -- IP addresses belonging to ISP dialup and DHCP customers

We have already blocked 137 email attempts in the last 24 hours from machines that were listed in one or more of the RBL databases.

It is too soon to tell, but if the RBL+ system plus our existing anti-spam measures work well enough we may be in a position where our "closed" mailing lists could revert back to being 'anyone can post'.

Feedback appreciated. Especially if you get a "reject" message from us saying that you are listed in the RBL+ blackhole database!
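
For readers unfamiliar with DNS-based blackhole lists, the lookup described above can be pictured with a minimal sketch. It assumes the common DNSBL convention (reverse the IPv4 octets, append the list's zone, and treat a successful address lookup as "listed"). The class name, method names and the zone `dnsbl.example.org` are hypothetical and are not taken from the O|B|F mail server configuration, which is not shown in this message.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration of a DNSBL-style lookup; not the O|B|F setup.
class DnsblQuery {

    // Build the DNS name to query: octets reversed, then the list's zone.
    static String queryName(String ipv4, String zone) {
        String[] octets = ipv4.split("\\.");
        if (octets.length != 4) {
            throw new IllegalArgumentException("not a dotted-quad IPv4 address: " + ipv4);
        }
        return octets[3] + "." + octets[2] + "." + octets[1] + "." + octets[0] + "." + zone;
    }

    // Under the assumed convention, a name that resolves means "listed".
    // This performs a real DNS query, so it needs network access.
    static boolean isListed(String ipv4, String zone) {
        try {
            InetAddress.getByName(queryName(ipv4, zone));
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Only the name construction is shown here; no network call is made.
        System.out.println(queryName("192.0.2.99", "dnsbl.example.org"));
    }
}
```

A mail server doing this at connection time would call something like `isListed` with the connecting client's address and reject the message when it returns true.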
Regards,
Chris
O|B|F

From DMGoodstein at lbl.gov Thu Mar 11 15:37:31 2004
From: DMGoodstein at lbl.gov (DMGoodstein@lbl.gov)
Date: Thu Mar 11 15:43:09 2004
Subject: [Biojava-l] BlastXMLParserFacade
Message-ID: <43dfea43ae54.43ae5443dfea@lbl.gov>

I was wondering if anyone has successfully gotten BlastXMLParserFacade to work on an XML-style NCBI blast output file (version 2.2.3)? I'm getting null pointer exceptions deep within the crimson parser implementation classes, but I know I'm correctly passing in the NCBI output file, since the DTD is getting read.

--David Goodstein
Joint Genome Institute

/usr/java/j2sdk1.4.1_02/bin/java -classpath .:./biojava-1.3.1.jar BlastParser fred.xml
java.lang.NullPointerException
        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:524)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
        at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:172)
        at BlastParser.main(BlastParser.java:44)

/usr/java/j2sdk1.4.1_02/bin/java -classpath .:./biojava-1.3.1.jar BlastParser ./fred.xml
java.lang.NullPointerException
        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:524)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
        at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:172)
        at BlastParser.main(BlastParser.java:44)

From sacoca at MCB.McGill.CA Thu Mar 11 17:11:02 2004
From: sacoca at MCB.McGill.CA (sacoca@MCB.McGill.CA)
Date: Thu Mar 11 17:16:37 2004
Subject: [Biojava-l] Parameter Settings in BaumWelchTraining
Message-ID: <3505.66.131.112.37.1079043062.squirrel@mail.MCB.McGill.CA>

Hi all. I'm trying to optimize the state transition probabilities for my HMM. I have already set them to values which I think are pretty good.
Since I know that Baum-Welch can only improve the scores up to a local maximum, I thought of using the parameters I calculated as a starting point. The problem is that I don't know how!
I followed the example in biojava:

    ....
    //train the model to have uniform parameters
    ModelTrainer mt = new SimpleModelTrainer();
    //register the model to train
    mt.registerModel(hmm);

I want to use the values already set in my hmm as the starting parameters in the BaumWelch. I don't want to use the uniform distribution as indicated below!

    //as no other counts are being used the null weight will cause everything to be uniform
    mt.setNullModelWeight(1.0);
    mt.train();

I tried adding counts and looking up examples on the net but ended up more confused than I started. How do I use addCounts to make this work?

Stephane Acoca
Master's Student
McGill Center for Bioinformatics

From mark.schreiber at group.novartis.com Thu Mar 11 20:58:52 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Thu Mar 11 21:04:33 2004
Subject: [Biojava-l] Parameter Settings in BaumWelchTraining
Message-ID:

Hi Stephane -

Within EmissionState you can set a Distribution that contains emission probabilities for the Symbols in the state's emission alphabet using the setDistribution method. This Distribution will be your predetermined weights.

To set the transition probabilities you can use setWeights(State source, Distribution weights). The source is the state you are transitioning from, and the weights are the probabilities of transitioning to any State that the source connects to. Because States implement Symbol you can put them in a Distribution.
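
The comment in the question's snippet, that a null weight with no counts makes everything uniform, can be pictured with ordinary pseudocount smoothing. The sketch below is plain Java, not BioJava code, and it is not taken from the BioJava source: it simply assumes that a trainer blends observed counts with a uniform null model scaled by the null weight and then normalises. The class name, method name and state names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of pseudocount (null model) smoothing.
class NullWeightDemo {

    // Blend counts with a uniform null model of total mass nullWeight,
    // then normalise so the resulting weights sum to 1.
    static Map<String, Double> estimate(Map<String, Double> counts, double nullWeight) {
        double total = nullWeight;
        for (double c : counts.values()) {
            total += c;
        }
        if (counts.isEmpty() || total <= 0.0) {
            throw new IllegalArgumentException("need at least one target and a positive total");
        }
        double uniformShare = nullWeight / counts.size();
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : counts.entrySet()) {
            result.put(e.getKey(), (e.getValue() + uniformShare) / total);
        }
        return result;
    }

    public static void main(String[] args) {
        // No counts at all: every target ends up with the same weight.
        Map<String, Double> none = new LinkedHashMap<>();
        none.put("match", 0.0);
        none.put("insert", 0.0);
        none.put("delete", 0.0);
        System.out.println(estimate(none, 1.0));

        // With counts present, the estimate follows the counts instead.
        Map<String, Double> some = new LinkedHashMap<>();
        some.put("match", 8.0);
        some.put("insert", 1.0);
        some.put("delete", 1.0);
        System.out.println(estimate(some, 1.0));
    }
}
```

Under this assumption, registering a model and training with no counts can only return the null model, which is why hand-set values would have to enter either as the model's own weights or as counts.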
To make a Distribution of States that state 'a' could connect to, use the following pseudo code:

    State a;
    MarkovModel m;
    FiniteAlphabet endPoints;

    endPoints = m.transitionsFrom(a);
    Distribution d = DistributionFactory.DEFAULT.createDistribution(endPoints);

    //You can then train d or set its weights and put it back in the model with
    m.setWeights(a, d);

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
1 Science Park Road
#04-14 The Capricorn, Science Park II
Singapore 117528

phone +65 6722 2973
fax +65 6722 2910

From sacoca at MCB.McGill.CA Fri Mar 12 00:28:05 2004
From: sacoca at MCB.McGill.CA (sacoca@MCB.McGill.CA)
Date: Fri Mar 12 00:33:38 2004
Subject: [Biojava-l] Parameter Settings in BaumWelchTraining
In-Reply-To:
References:
Message-ID: <4182.66.131.112.37.1079069285.squirrel@mail.MCB.McGill.CA>

From sacoca at MCB.McGill.CA Fri Mar 12 00:30:16 2004
From: sacoca at MCB.McGill.CA (sacoca@MCB.McGill.CA)
Date: Fri Mar 12 00:35:49 2004
Subject: [Biojava-l] Parameter Settings in BaumWelchTraining]
Message-ID: <4195.66.131.112.37.1079069416.squirrel@mail.MCB.McGill.CA>

Sorry for the previous error.

---------------------------- Original Message ----------------------------
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
From: sacoca@MCB.McGill.CA
Date: Fri, March 12, 2004 12:27 am
To: mark.schreiber@group.novartis.com
--------------------------------------------------------------------------

Here is the code I have for the training.
Using what you told me below, I can retrieve all of the weights that I calculated manually for the hmm (distributions for the transitions and distributions for the alphabet of each state). What I do not understand is how to use this information and the sequences stored in a file to run the Baum-Welch algorithm and then retrieve the optimized values calculated by the algorithm to set them back into my HMM.

    //Retrieve the alphabet of all states
    FiniteAlphabet SA = hmm.stateAlphabet();
    Iterator i = SA.iterator();

    SimpleModelTrainer MT = new SimpleModelTrainer();
    MT.registerModel(hmm);

    //go through each state
    while(i.hasNext()) {
      Symbol Currentstate = (Symbol)i.next();

      //Retrieve the distribution of all transitions from the current state
      FiniteAlphabet From = hmm.transitionsFrom((State)Currentstate);
      Distribution d = hmm.getWeights((State)Currentstate);
      Iterator i2 = From.iterator();

      //go through it and look at all the weights for each of the transitions
      while(i2.hasNext()) {
        Symbol s = (Symbol)i2.next();
        System.out.println("From state "+Currentstate.getName()+
                           "To State "+s.getName()+
                           "Weight "+d.getWeight(s));
      }

      //get the distribution for the alphabet of the current state
      Distribution d2 = ((EmissionState)Currentstate).getDistribution();
      FiniteAlphabet IN = (FiniteAlphabet)hmm.emissionAlphabet();
      Iterator i3 = IN.iterator();
      //you can go through it the same way as above using a while loop

*****************************************************************
This is what I don't understand!!!!
*****************************************************************
Here, we have a set of training sequences stored in a file in FASTA format that I'd like to use with the Baum-Welch algorithm to optimize the transition distributions mentioned above.

      //This is the file with all the training sequences
      BufferedInputStream is = new BufferedInputStream(new FileInputStream("z:/Sequences.faa"));

      //Load the file with the SequenceDB class
      SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);

      //use 100 cycles as the stop criteria
      StoppingCriteria stopper = new StoppingCriteria() {
        public boolean isTrainingComplete(TrainingAlgorithm ta) {
          return (ta.getCycle() > 100);
        }
      };

*****************************************
This part is what I am clueless about
*****************************************
      //How do I optimize my hmm with the BaumWelch algorithm and retrieve
      //the optimized values ? How do I train the distribution above with
      //the baum welch and the sequences that I have ?
      DP dp = DPFactory.DEFAULT.createDP(hmm);
      BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
    }

PS : I do not know why you are helping all of us here but thank you. It makes Biojava a lot easier to deal with.

Steve

> Hi Stephane -
>
> Within EmissionState you can set a Distribution that contains emission
> probabilities for the Symbols in the state's emission alphabet using the
> setDistribution method. This Distribution will be your predetermined
> weights.
>
> To set the transition probabilities you can use setWeights(State
> source, Distribution weights). The source is the state you are
> transitioning from, and the weights are the probabilities of transitioning
> to any State that the source connects to. Because States implement Symbol
> you can put them in a Distribution.
>
> To make a Distribution of States that state 'a' could connect to, use the
> following pseudo code:
>
> State a;
> MarkovModel m;
> FiniteAlphabet endPoints;
>
> endPoints = m.transitionsFrom(a);
> Distribution d =
> DistributionFactory.DEFAULT.createDistribution(endPoints);
>
> //You can then train d or set its weights and put it back in the model
> with
>
> m.setWeights(a, d);

From mark.schreiber at group.novartis.com Fri Mar 12 01:00:43 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Fri Mar 12 01:06:22 2004
Subject: [Biojava-l] Parameter Settings in BaumWelchTraining]
Message-ID:

When you call the train() method of the BaumWelchTrainer you supply it with a SequenceDB. The sequences from this DB are used to optimize the weights of the model.

However, I have a bad feeling that when you train your model with the BaumWelchTrainer your previously set counts will be ignored and overwritten. You could check by looking into AbstractModelTrainer.train() (which is what the BaumWelchTrainer extends). You could also run some tests to see if using a pre-trained model makes any difference to the final outcome. Does anyone more expert than me on the DP package (i.e. most people) know if the counts are overwritten?

- Mark

From dmb at mrc-dunn.cam.ac.uk Fri Mar 12 03:47:50 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Fri Mar 12 03:50:11 2004
Subject: [Biojava-l] Parameter Settings in BaumWelchTraining]
In-Reply-To:
Message-ID:

On Fri, 12 Mar 2004 mark.schreiber@group.novartis.com wrote:

> When you call the train() method of the BaumWelchTrainer you supply it
> with a SequenceDB. The sequences from this DB are used to optimize the
> weights of the model.
>
> However, I have a bad feeling that when you train your model with the
> BaumWelchTrainer your previously set counts will be ignored and
> overwritten. You could check by looking into AbstractModelTrainer.train()
> (which is what the BaumWelchTrainer extends). You could also run some
> tests to see if using a pre-trained model makes any difference to the
> final outcome. Does anyone more expert than me on the DP package (ie most
> people) know if the counts are overwritten?

The idea sounds good either way, so it would be a shame to have to reject it on the basis of a technicality :)

Cheers

From mark.schreiber at group.novartis.com Fri Mar 12 03:58:23 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Fri Mar 12 04:04:04 2004
Subject: [Biojava-l] Parameter Settings in BaumWelchTraining]
Message-ID:

I agree. If the BaumWelch trainer does cause problems one could always implement a different version of ModelTrainer.

- Mark

Dan Bolser
03/12/2004 04:47 PM

To: Mark Schreiber/GP/Novartis@PH
cc: sacoca@mcb.mcgill.ca, Biojava Mailing List
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining]

On Fri, 12 Mar 2004 mark.schreiber@group.novartis.com wrote:

> When you call the train() method of the BaumWelchTrainer you supply it
> with a SequenceDB. The sequences from this DB are used to optimize the
> weights of the model.
>
> However, I have a bad feeling that when you train your model with the
> BaumWelchTrainer your previously set counts will be ignored and
> overwritten. You could check by looking into AbstractModelTrainer.train()
> (which is what the BaumWelchTrainer extends). You could also run some
> tests to see if using a pre-trained model makes any difference to the
> final outcome. Does anyone more expert than me on the DP package (ie most
> people) know if the counts are overwritten?
The idea sounds good either way, so it would be a shame to have to reject
it on the basis of a technicality :)

Cheers

>
> - Mark
>
>
> sacoca@mcb.mcgill.ca
> Sent by: biojava-l-bounces@portal.open-bio.org
> 03/12/2004 01:30 PM
>
>         To:      sacoca@mcb.mcgill.ca
>         cc:      Biojava Mailing List
>         Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining]
>
>
> Sorry for the previous error.
> ---------------------------- Original Message ----------------------------
> Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
> From: sacoca@MCB.McGill.CA
> Date: Fri, March 12, 2004 12:27 am
> To: mark.schreiber@group.novartis.com
> --------------------------------------------------------------------------
>
> Here is the code I have for the training. Using what you told me below, I
> can retrieve all of the weights that I calculated manually for the HMM
> (distributions for the transitions and distributions for the alphabet of
> each state). What I do not understand is how to use this information and
> the sequences stored in a file to run the Baum-Welch algorithm and then
> retrieve the optimized values calculated by the algorithm to set them back
> into my HMM.
>
> //Retrieve the alphabet of all states
> FiniteAlphabet SA = hmm.stateAlphabet();
> Iterator i = SA.iterator();
>
> SimpleModelTrainer MT = new SimpleModelTrainer();
> MT.registerModel(hmm);
>
> //go through each state
> while(i.hasNext())
> {Symbol Currentstate = (Symbol)i.next();
>
>   //Retrieve the distribution of all transitions from the current state
>   FiniteAlphabet From = hmm.transitionsFrom((State)Currentstate);
>   Distribution d = hmm.getWeights((State)Currentstate);
>   Iterator i2 = From.iterator();
>
>   //go through it and look at all the weights for each of the transitions
>   while(i2.hasNext())
>   {Symbol s = (Symbol)i2.next();
>    System.out.println("From state "+Currentstate.getName()+
>                       "To State "+s.getName()+
>                       "Weight "+d.getWeight(s));}
>
>   //get the distribution for the alphabet of the current state
>   Distribution d2 =((EmissionState)Currentstate).getDistribution();
>   FiniteAlphabet IN = (FiniteAlphabet)hmm.emissionAlphabet();
>   Iterator i3 = IN.iterator();
>   //you can go through it the same way as above using a while loop
> *****************************************************************
> This is what I don't understand!!!!
> *****************************************************************
> Here, we have a set of training sequences stored in a file in FASTA format
> that I'd like to use with the Baum-Welch algorithm to optimize the
> transition distributions mentioned above.
>
>   //This is the file with all the training sequences
>   BufferedInputStream is = new BufferedInputStream(new
>       FileInputStream("z:/Sequences.faa"));
>
>   //Load the file with the SequenceDB class
>   SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);
>
>   //use 100 cycles as the stop criteria
>   StoppingCriteria stopper = new StoppingCriteria()
>   {public boolean isTrainingComplete(TrainingAlgorithm ta)
>    {return (ta.getCycle() > 100);}};
>
> *****************************************
> This part is what I am clueless about
> *****************************************
>   //How do I optimize my hmm with the Baum-Welch algorithm and retrieve
>   //the optimized values? How do I train the distribution above with
>   //the Baum-Welch and the sequences that I have?
>   DP dp= DPFactory.DEFAULT.createDP(hmm);
>   BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
> }
>
> PS : I do not know why you are helping all of us here but thank you. It
> makes Biojava a lot easier to deal with.
>
> Steve
>
> > Hi Stephane -
> >
> > Within EmissionState you can set a Distribution that contains emission
> > probabilities for the Symbols in the state's emission alphabet using the
> > setDistribution method. This Distribution will be your predetermined
> > weights.
> >
> > To set the transition probabilities you can use the setWeights(State
> > source, Distribution weights). The source is the state you are
> > transitioning from and the weights are the probabilities of transitioning
> > to any State that the source connects to. Because States implement
> > Symbol you can put them in a Distribution.
> > > > To make a Distribution of States that state 'a' could connect to use the > following pseudo code: > > > > State a; > > Model m; > > FiniteAlphabet endPoints; > > > > endPoints = m.transitionsFrom(a); > > Distribution d = > > DistributionFactory.DEFAULT.createDistribution(endPoints); > > > > //You can then train d or set it's weights and put it back in the model > with > > > > m.setWeights(a, d); > > > > Mark Schreiber > > Principal Scientist (Bioinformatics) > > > > Novartis Institute for Tropical Diseases (NITD) > > 1 Science Park Road > > #04-14 The Capricorn, Science Park II > > Singapore 117528 > > > > phone +65 6722 2973 > > fax +65 6722 2910 > > > > > > > > > > > > sacoca@mcb.mcgill.ca > > Sent by: biojava-l-bounces@portal.open-bio.org > > 03/12/2004 06:11 AM > > > > > > To: "Biojava Mailing List" > > cc: > > Subject: [Biojava-l] Parameter Settings in > > BaumWelchTraining > > > > > > Hi all. I'm trying to optimize the transition states probabilities for > my HMM. I already have set them to values which I think are pretty good. > Since I know the Baum Welch can only help with the scores and optimize > them up to a local maxima I thought of using the parameters I calculated > as a starting point. The problem is that I don't know how! > > I followed the example in biojava: > > > > .... > > //train the model to have uniform parameters > > ModelTrainer mt = new SimpleModelTrainer(); > > //register the model to train > > mt.registerModel(hmm); > > > > I want to use the values already set in my hmm as the starting > parameters in the BaumWelch. I don't want to use the uniform > distribution as indicated below! > > > > //as no other counts are being used the null weight will cause > > everything to be uniform > > mt.setNullModelWeight(1.0); > > mt.train(); > > > > I tried adding counts and looking up examples on the net but ended up > more confused than I started. How do I use the addCounts to make this > work! 
> > > > Stephane Acoca > > Master's Student > > McGill Center for Bioinformatics > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l@biojava.org > > http://biojava.org/mailman/listinfo/biojava-l > > > > > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From td2 at sanger.ac.uk Fri Mar 12 04:20:40 2004 From: td2 at sanger.ac.uk (Thomas Down) Date: Fri Mar 12 04:26:16 2004 Subject: [Biojava-l] Parameter Settings in BaumWelchTraining] In-Reply-To: References: Message-ID: <87D428DA-7406-11D8-BF8A-000A95C8B056@sanger.ac.uk> On 12 Mar 2004, at 06:00, mark.schreiber@group.novartis.com wrote: > When you call the train() method of the BaumWelchTrainer you supply it > with a SequenceDB. The sequences from this DB are used to optimize the > weights of the model. > > However, I have a bad feeling that when you train your model with the > BaumWelchTrainer your previously set counts will be ignored and > overwritten. You could check by looking into > AbstractModelTrainer.train() > (which is what the BaumWelchTrainer extends). You could also run some > tests to see if using a pre-trained model makes any difference to the > final outcome. Does anyone more expert than me on the DP package (ie > most > people) know if the counts are overwritten? The Baum-Welch algorithm is actually the Expectation Maximization algorithm applied to HMMs. When you run BaumWelchTrainer, it takes the existing model then calculates a matrix which defines the probability that each symbol in your training database was emitted by a particular state in the HMM. 
It then optimizes the transition and emission probabilities of the model
to maximize the likelihood of the data *given that assignment of states
to data*. This means that, although all the model parameters get
overwritten every cycle, they do still depend on the previous state of
the model. If you're starting with a model that is a good, but not quite
optimal, fit to your data, BaumWelchTrainer ought to do what you want.

Another thing you might look at is building a model with a mixture of
trainable and untrainable distributions. This allows you to specifically
optimize the bits of the model which you aren't sure about yet, while
holding the bits which you already trust constant.

Thomas.

From matthew_pocock at yahoo.co.uk Fri Mar 12 06:17:56 2004
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Fri Mar 12 06:23:44 2004
Subject: [Biojava-l] BlastXMLParserFacade
In-Reply-To: <43dfea43ae54.43ae5443dfea@lbl.gov>
References: <43dfea43ae54.43ae5443dfea@lbl.gov>
Message-ID: <40519C64.5030700@yahoo.co.uk>

Hi,

The parser shouldn't be throwing an NPE. However, the NCBI BLAST XML
output didn't use to be well-formed XML, so it was not parseable by any
XML parser. I don't know if this has since been fixed by the NCBI.

Matthew

DMGoodstein@lbl.gov wrote:

>I was wondering if anyone has successfully gotten
>BlastXMLParserFacade to work on an xml style NCBI
>blast output file (version 2.2.3)? I'm getting null
>pointer exceptions deep within the crimson parser
>implementation classes, but I know I'm correctly
>passing in the NCBI output file, since the dtd is
>getting read.
>
>--David Goodstein
>  Joint Genome Institute
>
>/usr/java/j2sdk1.4.1_02/bin/java2/bin/java
>-classpath .:./biojava-1.3.1
>.1.jar BlastParser fred.xml
>a/j2sdk1.4.1_02/bin/javac -classpath .:./biojava-1.3.
>java.lang.NullPointerException
>        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:524)
>        at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
>        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
>        at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:172)
>        at BlastParser.main(BlastParser.java:44)
>1.jar BlastParser
>./fred.xmlava/j2sdk1.4.1_02/bin/java -classpath
>.:./biojava-1.3.1
>java.lang.NullPointerException
>        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:524)
>        at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
>        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
>        at org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:172)
>        at BlastParser.main(BlastParser.java:44)
>

From smh1008 at cus.cam.ac.uk Fri Mar 12 07:03:37 2004
From: smh1008 at cus.cam.ac.uk (David Huen)
Date: Fri Mar 12 07:09:30 2004
Subject: [Biojava-l] BlastXMLParserFacade
In-Reply-To: <40519C64.5030700@yahoo.co.uk>
References: <43dfea43ae54.43ae5443dfea@lbl.gov> <40519C64.5030700@yahoo.co.uk>
Message-ID: <200403121203.37225.smh1008@cus.cam.ac.uk>

On Friday 12 Mar 2004 11:17 am, Matthew Pocock wrote:
> Hi,
>
> The parser shouldn't be throwing a NPE. However, the NCBI blast xml
> output didn't used to be well-formed XML, so was not parseable by any
> XML parser. I don't know if this has since been fixed by the NCBI.
>
In this case, that's not the problem. The problem started when we switched
from the Xerces parsers to Sun's own, and no attempt on my part was
successful in getting it to use the NCBI's DTD.
Part of the problem is that DTD from NCBI refers to other DTDs and those references are not complete so my code used to explicitly resolve them but with Sun's parser, I could never convince it to use the provided DTDs. I don't have time to return to that problem for a while so perhaps someone else more familiar with Crimson could figure it out. Alternatively, we could hack it to not use DTDs at all (totally non-validating). Regards, David From dmb at mrc-dunn.cam.ac.uk Fri Mar 12 07:19:43 2004 From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser) Date: Fri Mar 12 07:22:34 2004 Subject: [Biojava-l] BlastXMLParserFacade In-Reply-To: <40519C64.5030700@yahoo.co.uk> Message-ID: On Fri, 12 Mar 2004, Matthew Pocock wrote: > Hi, > > The parser shouldn't be throwing a NPE. However, the NCBI blast xml > output didn't used to be well-formed XML, so was not parseable by any > XML parser. I don't know if this has since been fixed by the NCBI. ! I have a perl module to 'clean up' large amounts of psi-blast data. Do such modules exist in biojava? Specifically, given a 'family' label for query database, I wanted a 'family' assignment to the target database, and my module did this by assessing overlap within and between families. It would be really cool to do this a bit more systematically and 'modularly' with biojava. > > Matthew > > DMGoodstein@lbl.gov wrote: > > >I was wondering if anyone has successfully gotten > >BlastXMLParserFacade to work on an xml style NCBI > >blast output file (version 2.2.3)? I'm getting null > >pointer exceptions deep within the crimson parser > >implementation classes, but I now i'm correctly > >passing in the NCBI output file, since the dtd is > >getting read. > > > >--David Goodstein > > Joint Genome Institute > > > >/usr/java/j2sdk1.4.1_02/bin/java2/bin/java > >-classpath .:./biojava-1.3.1 > >.1.jar BlastParser fred.xml > >a/j2sdk1.4.1_02/bin/javac -classpath .:./biojava-1.3. 
> >java.lang.NullPointerException > > at > >org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:524) > > at > >org.apache.crimson.parser.Parser2.parse(Parser2.java:305) > > at > >org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442) > > at > >org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:172) > > at BlastParser.main(BlastParser.java:44) > >1.jar BlastParser > >./fred.xmlava/j2sdk1.4.1_02/bin/java -classpath > >.:./biojava-1.3.1 > >java.lang.NullPointerException > > at > >org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:524) > > at > >org.apache.crimson.parser.Parser2.parse(Parser2.java:305) > > at > >org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442) > > at > >org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.parse(BlastXMLParserFacade.java:172) > > at BlastParser.main(BlastParser.java:44) > > > >_______________________________________________ > >Biojava-l mailing list - Biojava-l@biojava.org > >http://biojava.org/mailman/listinfo/biojava-l > > > > > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From td2 at sanger.ac.uk Fri Mar 12 07:27:02 2004 From: td2 at sanger.ac.uk (Thomas Down) Date: Fri Mar 12 07:32:38 2004 Subject: [Biojava-l] BlastXMLParserFacade In-Reply-To: <200403121203.37225.smh1008@cus.cam.ac.uk> References: <43dfea43ae54.43ae5443dfea@lbl.gov> <40519C64.5030700@yahoo.co.uk> <200403121203.37225.smh1008@cus.cam.ac.uk> Message-ID: <911015F5-7420-11D8-BF8A-000A95C8B056@sanger.ac.uk> On 12 Mar 2004, at 12:03, David Huen wrote: > On Friday 12 Mar 2004 11:17 am, Matthew Pocock wrote: >> Hi, >> >> The parser shouldn't be throwing a NPE. However, the NCBI blast xml >> output didn't used to be well-formed XML, so was not parseable by any >> XML parser. I don't know if this has since been fixed by the NCBI. >> > In this case, that's not the problem. 
> The problem started when we switched
> from the Xerces parsers to Sun's own and no attempt on my part was
> successful in getting it it use the NCBI's DTD. Part of the problem is
> that DTD from NCBI refers to other DTDs and those references are not
> complete so my code used to explicitly resolve them but with Sun's
> parser, I could never convince it to use the provided DTDs.
>
> I don't have time to return to that problem for a while so perhaps
> someone else more familiar with Crimson could figure it out.
> Alternatively, we could hack it to not use DTDs at all (totally
> non-validating).

We're not really using Sun's parser specifically -- just the default JAXP
parser that's installed on the system. It ought to be possible to make
Xerces the latest JAXP parser, although I think that this means putting it
on the bootstrap classpath. I'll try this out.

A while ago I was under the distinct impression that Sun were going to
junk Crimson and make Xerces the standard parser. Looks like this hasn't
happened though -- I don't know why.

Thomas.

From fangl at genomics.org.cn Fri Mar 19 01:11:12 2004
From: fangl at genomics.org.cn (Magic Fang)
Date: Fri Mar 19 01:20:22 2004
Subject: [Biojava-l] is there a biojava class diagram? like bioperl diagram
Message-ID: <405A8F00.40602@genomics.org.cn>


From fangl at genomics.org.cn Fri Mar 19 01:37:51 2004
From: fangl at genomics.org.cn (Magic Fang)
Date: Fri Mar 19 01:47:19 2004
Subject: [Biojava-l] yes, something like this
Message-ID: <405A953F.4020807@genomics.org.cn>

This one seems too detailed and complex; I mean just the class functions
and inheritance relationships.
From mark.schreiber at group.novartis.com Fri Mar 19 03:26:08 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Fri Mar 19 03:32:16 2004
Subject: [Biojava-l] yes, something like this
Message-ID:

Unfortunately the heavy use of interfaces and 'multiple inheritance' in
biojava means that those UML diagrams are about as simple as you will get.
I suppose they could be pruned a bit to show only interfaces implemented
and classes extended and not dependencies and reverse dependencies.

- Mark


Magic Fang
Sent by: biojava-l-bounces@portal.open-bio.org
03/19/2004 02:37 PM

        To:      Mark Schreiber/GP/Novartis@PH
        cc:      biojava-l@biojava.org
        Subject: [Biojava-l] yes, something like this

this one seems too detail and complex, i mean just class function and
inherit relationship.

From matthew_pocock at yahoo.co.uk Fri Mar 19 06:12:39 2004
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Fri Mar 19 06:18:21 2004
Subject: [Biojava-l] yes, something like this
In-Reply-To:
References:
Message-ID: <405AD5A7.9080606@yahoo.co.uk>

Agreed - biojava and UML don't like each other. UML seems much more suited
to data-blob designs, whereas biojava is mainly a relationships and
associations design.

The OWL for biojava looks ok though. Hey ho.

mark.schreiber@group.novartis.com wrote:

>Unfortunately the heavy use of interfaces and 'multiple inheritance' in
>biojava means that those UML diagrams are about as simple as you will get.
>I suppose they could be pruned a bit to show only interfaces implemented and
>classes extended and not dependencies and reverse dependencies.
>
>- Mark
>
>
>Magic Fang
>Sent by: biojava-l-bounces@portal.open-bio.org
>03/19/2004 02:37 PM
>
>        To:      Mark Schreiber/GP/Novartis@PH
>        cc:      biojava-l@biojava.org
>        Subject: [Biojava-l] yes, something like this
>
>this one seems too detail and complex, i mean just class function and
>inherit relationship.
>

From heuermh at acm.org Fri Mar 19 12:28:58 2004
From: heuermh at acm.org (Michael Heuer)
Date: Fri Mar 19 12:43:32 2004
Subject: [Biojava-l] yes, something like this
In-Reply-To: <405AD5A7.9080606@yahoo.co.uk>
Message-ID:

On Fri, 19 Mar 2004, Matthew Pocock wrote:

> The OWL for biojava looks ok though. Hey ho.

Is this a tease? :) What would/could one do with an OWL representation of
the biojava object model?

   michael

From fangl at genomics.org.cn Fri Mar 19 23:57:09 2004
From: fangl at genomics.org.cn (Magic Fang)
Date: Sat Mar 20 00:02:29 2004
Subject: [Biojava-l] yes, something like this
In-Reply-To:
References:
Message-ID: <405BCF25.3030307@genomics.org.cn>

Michael Heuer wrote:

>On Fri, 19 Mar 2004, Matthew Pocock wrote:
>
>>The OWL for biojava looks ok though. Hey ho.
>
>Is this a tease? :) What would/could one do with an OWL representation of
>the biojava object model?
>
>   michael
>
Does OWL mean the Open Workflow/Lifecycle model? I would compare it to an
owl, which few would see, or I am confused about the biojava
architecture :-)
It is more complex than bioperl.
From heuermh at acm.org Sat Mar 20 00:35:10 2004 From: heuermh at acm.org (Michael Heuer) Date: Sat Mar 20 00:44:52 2004 Subject: [Biojava-l] yes, something like this In-Reply-To: <405BCF25.3030307@genomics.org.cn> Message-ID: On Sat, 20 Mar 2004, Magic Fang wrote: > Does the OWL means the Open Wrokflow/Lifecycle model? I would refer it > to owl, that few would see it or I am confused on biojava architecture:-) > It is more complex than bioperl. Nope, this OWL > http://www.w3.org/TR/owl-features/ michael From mark.schreiber at group.novartis.com Sun Mar 21 21:21:41 2004 From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com) Date: Sun Mar 21 21:27:11 2004 Subject: [Biojava-l] BioJava mentioned in 'on java' article Message-ID: Thanks to Russell Smithies for passing this on: http://www.onjava.com/pub/a/onjava/2004/03/10/bioinf.html Mark Schreiber Principal Scientist (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 1 Science Park Road #04-14 The Capricorn, Science Park II Singapore 117528 phone +65 6722 2973 fax +65 6722 2910 From Christian.Gruber at biomax.com Mon Mar 22 10:30:02 2004 From: Christian.Gruber at biomax.com (Christian Gruber) Date: Mon Mar 22 10:35:27 2004 Subject: [Biojava-l] Missing ProcessTools class Message-ID: <405F067A.5040708@biomax.com> Hi! I just found that the class org.biojava.utils.ProcessTools is not part of the source and binary distribution of BioJava 1.3.1, but they are part of the biojava-live CVS. Are they missing by accident? By the way, what is the recommended way to compile BioJava 1.3.1, since biojava-1.3.1.tar.gz is missing build.xml etc. I tried to unpack 1.3.0 and on top of that the 1.3.1 source, and it seems to work, but I'm not sure if this is the correct way to do that... 
Greetings,
Christian

From mark.schreiber at group.novartis.com Mon Mar 22 20:16:43 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Mon Mar 22 20:22:16 2004
Subject: [Biojava-l] Missing ProcessTools class
Message-ID:

BioJava 1.3.1 was an update release of 1.3. There are a number of things
that biojava-live contains that 1.3.1 does not. The goal was to release a
1.3.2 but unfortunately I moved countries and jobs etc. so the plan slipped
somewhat. It now seems unlikely that there will be any further updates of
the 1.3.x lineage.

If 1.3.1 doesn't have what you need I would recommend getting one of the
nightly snapshots of biojava-live (one that passes all the unit tests) and
building that. The vast majority of the live build is now stable. Only
newly added things tend to change rapidly.

- Mark


Christian Gruber
Sent by: biojava-l-bounces@portal.open-bio.org
03/22/2004 11:30 PM

        To:      biojava-l@biojava.org
        cc:
        Subject: [Biojava-l] Missing ProcessTools class

Hi!

I just found that the class org.biojava.utils.ProcessTools is not part
of the source and binary distribution of BioJava 1.3.1, but they are
part of the biojava-live CVS. Are they missing by accident?

By the way, what is the recommended way to compile BioJava 1.3.1, since
biojava-1.3.1.tar.gz is missing build.xml etc. I tried to unpack 1.3.0
and on top of that the 1.3.1 source, and it seems to work, but I'm not
sure if this is the correct way to do that...

Greetings,
Christian

From fangl at genomics.org.cn Mon Mar 22 20:22:38 2004
From: fangl at genomics.org.cn (Magic Fang)
Date: Mon Mar 22 20:32:35 2004
Subject: [Biojava-l] is blast parser flawed?
Message-ID: <405F915E.8090809@genomics.org.cn>

It seems that the BLAST parser only checks the BLAST version on the first
line of the BLAST output, and I tricked it by changing the first line to
2.2.3, while I am actually using version 2.2.8.

From mark.schreiber at group.novartis.com Mon Mar 22 20:53:30 2004
From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com)
Date: Mon Mar 22 20:59:23 2004
Subject: [Biojava-l] is blast parser flawed?
Message-ID:

Hi -

The BlastLikeSAXParser checks the version of BLAST against a list of
versions it is known to support. If you call the .setModeLazy() method
then the parser doesn't check the version. If you do this you should check
the results carefully.

If people know of BLAST versions that the parser reads reliably, could they
let us know so that we can add them to the list of supported versions?

- Mark


Magic Fang
Sent by: biojava-l-bounces@portal.open-bio.org
03/23/2004 09:22 AM

        To:      biojava-l@biojava.org
        cc:
        Subject: [Biojava-l] is blast parser flawed?

it seems that blast parser only check blast version on the first line of
blast output, and i tricked her by changing the first line to 2.2.3,
while i use version of 2.2.8.

From Christian.Gruber at biomax.com Tue Mar 23 09:02:18 2004
From: Christian.Gruber at biomax.com (Christian Gruber)
Date: Tue Mar 23 09:07:42 2004
Subject: [Biojava-l] cvs.biojava.org down?
Message-ID: <4060436A.4090902@biomax.com>

Hi!

I can't access the host cvs.biojava.org via ping and http. Maybe the
machine is down?

Christian

From max_dipl at web.de Thu Mar 25 10:22:02 2004
From: max_dipl at web.de (Maximilian Haeussler)
Date: Thu Mar 25 10:27:15 2004
Subject: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie
Message-ID: <4062F91A.8020306@web.de>

Hello everyone,

I am pretty new to biojava.
Can anyone give me a hint why the LabelledSequenceRenderer works fine for
the SequenceRenderer, but not with the TranslatedSequenceRenderer? I've
modified FastBeadDemo and BeadDemo to include labels. BeadDemo works fine
but not the translated one. What is so different about the
TranslatedSequenceRenderer that prohibits painting into the trailing space
of the sequence?

Another question, this time a rather stupid one: How can I get rid of
those "Originally:"/"Now:" messages of the demos? I can't find the
message anywhere, neither in biojava nor in the demo directory...

Thanks
Max

From dmb at mrc-dunn.cam.ac.uk Thu Mar 25 10:49:09 2004
From: dmb at mrc-dunn.cam.ac.uk (Dan Bolser)
Date: Thu Mar 25 10:49:05 2004
Subject: [Biojava-l] biojava script?
Message-ID:

Hello,

I am hoping to write a script to parse HMMER results and display the
results. I know modules to do both these things exist, but has anyone put
them together in a script yet?

Cheers,
Dan.

From kalle.naslund at genpat.uu.se Thu Mar 25 11:25:43 2004
From: kalle.naslund at genpat.uu.se (Kalle Näslund)
Date: Thu Mar 25 11:31:20 2004
Subject: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie
In-Reply-To: <4062F91A.8020306@web.de>
References: <4062F91A.8020306@web.de>
Message-ID: <40630807.40000@genpat.uu.se>

Maximilian Haeussler wrote:

> Hallo everyone,
>
> I am pretty new to biojava. Can anyone give me a hint why the
> LabelledSequenceRenderer works fine for the SequenceRenderer, but not
> with the TranslatedSequenceRenderer? I've modified FastBeadDemo and
> BeadDemo to include labels. BeadDemo works fine but not the translated
> one. What is so different about the TranslatedSequenceRenderer that
> prohibits painting into the trailing space of the sequence?
>
> Another question, this time a rather stupid one: How can I get rid of
> those "Originally:"/"Now:"-messages of the demos?
> I can't find the message anywhere, neither in biojava nor in the demo
> directory...
>
> Thanks
> Max
>

Hi!

First, I am going to make a few assumptions here, as some of the class
names you mention seem a bit out of context or do not exist. My apologies
if these assumptions are wrong; I will then have to blame my lack of
proper English knowledge. But I hope I assumed correctly and this helps.
The assumptions are:

1) SequenceRenderer == SequencePanel (if we are going to compare this
   class to TranslatedSequencePanel, SequencePanel is the thing that
   makes sense)
2) TranslatedSequenceRenderer == TranslatedSequencePanel (there is no
   TranslatedSequenceRenderer in the bj source tree)
3) The part about painting in the trailing space of the sequence is really
   about the leading space (the LabelledSequenceRenderer doesn't support
   painting into the trailing space, as indicated by the javadocs; this
   should of course be fixed)

Then the question becomes: why does LabelledSequenceRenderer work with
SequencePanel and not with TranslatedSequencePanel? My guess is that this
is a bug in either the SequencePanel or the TranslatedSequencePanel when
it comes to handling leading and trailing space. The obvious solutions, as
I see it, are:

1) You use SequencePanel. After a quick look at the SequencePanel API it
   seems you can use the setRange() method call to mimic the
   TranslatedSequencePanel setSymbolTranslation(int translation)
   behaviour, with something like
       setRange( new RangeLocation( translation, symbolList.length() ) );
2) Have me or someone else fix
   TranslatedSequencePanel/SequencePanel/LabelledSequenceRenderer properly
   (this should really be done irrespective of which solution you choose).

I will be able to assist you with point 2, but not until sometime in the
beginning of next week.
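
The range arithmetic behind option 1 can be sketched in plain Java. This is
only an illustration under stated assumptions, not BioJava code:
VisibleRange is a hypothetical stand-in for RangeLocation, rangeFor is a
made-up helper, and the clamping to position 1 assumes 1-based sequence
coordinates. Whether setSymbolTranslation counts from 0 or from 1 should be
checked against the javadocs before relying on this.

```java
// Sketch only: VisibleRange is a hypothetical stand-in for BioJava's
// RangeLocation, so the example runs without biojava.jar on the classpath.
final class RangeForTranslation {

    /** Inclusive range of sequence positions, with min <= max. */
    static final class VisibleRange {
        final int min;
        final int max;

        VisibleRange(int min, int max) {
            if (min > max) {
                throw new IllegalArgumentException("min " + min + " > max " + max);
            }
            this.min = min;
            this.max = max;
        }
    }

    /**
     * Builds the range for option 1: from the translation offset to the end
     * of the sequence. Assumes 1-based positions, so an offset below 1 is
     * clamped to 1 (an assumption, not documented BioJava behaviour).
     */
    static VisibleRange rangeFor(int translation, int sequenceLength) {
        int min = Math.max(1, translation);
        int max = Math.max(min, sequenceLength);
        return new VisibleRange(min, max);
    }

    public static void main(String[] args) {
        VisibleRange r = rangeFor(5000, 120000);
        System.out.println(r.min + ".." + r.max); // prints "5000..120000"
    }
}
```

With BioJava on the classpath, min and max would presumably be the two
arguments handed to new RangeLocation(...) in the setRange() call quoted in
option 1.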

Also, can anyone remind me why we have several Panel implementations,
please? Wouldn't it be very nice to have one single implementation that
supports the unique features of both SequencePanel and
TranslatedSequencePanel?

Live long and prosper
/Kalle

From matthew_pocock at yahoo.co.uk Thu Mar 25 11:30:53 2004
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Thu Mar 25 11:41:46 2004
Subject: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie
In-Reply-To: <4062F91A.8020306@web.de>
References: <4062F91A.8020306@web.de>
Message-ID: <4063093D.1010709@yahoo.co.uk>

Maximilian Haeussler wrote:

> Hallo everyone,
>
> Another question, this time a rather stupid one: How can I get rid of
> those "Originally:"/"Now:"-messages of the demos? I can't find the
> message anywhere, neither in biojava nor in the demo directory...

I found some code in my copy of org.biojava.bio.gui.sequence.GuiTools
that is printing these messages (lines #33 and #34). These lines are
commented out in the version in CVS (revision 1.4). Does using code from
CVS help? If you think you are doing this already, is it possible that you
have an earlier biojava.jar in your classpath?

Matthew

>
> Thanks
> Max
>

From matthew_pocock at yahoo.co.uk Thu Mar 25 12:20:24 2004
From: matthew_pocock at yahoo.co.uk (Matthew Pocock)
Date: Thu Mar 25 12:26:04 2004
Subject: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie
In-Reply-To: <40630807.40000@genpat.uu.se>
References: <4062F91A.8020306@web.de> <40630807.40000@genpat.uu.se>
Message-ID: <406314D8.1020805@yahoo.co.uk>

Kalle Näslund wrote:

> Also, can anyone remind me why we have several Panel implementations
> please ?
> shouldnt it be very nice to have one single
> implementation, that supports the unique features of both
> SequencePanel and TranslatedSequencePanel ?

Amen brother! Let's drop the poster class - it's bad. SequencePanel
behaves itself in a scrollable pane. TranslatedSequencePanel needs to have
hooks added to scroll it. Is there any performance difference between the
two now? Does one render things better than the other, e.g. on long
sequences? Let's drop one of these and go with the other. Kalle - want to
propose what to do next?

>
> Live long and prosper
> /Kalle

Matthew

From kdj at sanger.ac.uk Thu Mar 25 12:26:46 2004
From: kdj at sanger.ac.uk (Keith James)
Date: Thu Mar 25 12:32:05 2004
Subject: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie
In-Reply-To: <40630807.40000@genpat.uu.se>
References: <4062F91A.8020306@web.de> <40630807.40000@genpat.uu.se>
Message-ID:

>>>>> "Kalle" == Kalle Näslund writes:

[...]

    Kalle> Also, can anyone remind me why we have several Panel
    Kalle> implementations please ? shouldnt it be very nice to have
    Kalle> one single implementation, that supports the unique
    Kalle> features of both SequencePanel and TranslatedSequencePanel
    Kalle> ?

Long story...

This came about because SequencePanel had two problems:

1. It will by default render a huge sequence in a single huge
   coordinate space, which leads to floating point errors in marking
   tics/points at the far end of the sequence.

   To solve this a hack was put in to translate the drawing
   coordinates to ~0 before drawing. However, that broke rendering to
   any Graphics2D derived from a BufferedImage, where it produced an
   un-renderable region between 0 and the offset (which was 50 pixels
   wide, if I remember).

2. It was far too slow with available sequence/feature renderers.


So I wrote TranslatedSequencePanel, which solves the first problem
without using the offset hack.
Instead of placing a huge virtual rendering area in a ScrollPane, it clips a window of sequence to render and runs the window along the sequence. Therefore you are always drawing from 0, even at base 100,000,001 of the sequence. As a nice side-effect, it also turned out to be 5-10x faster at rendering with the current sequence/feature renderers. But people were still using the SequencePanel in their client code, so I didn't want to remove it. Keith -- - Keith James Microarray Facility, Team 65 - - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK - From smh1008 at cus.cam.ac.uk Thu Mar 25 12:45:10 2004 From: smh1008 at cus.cam.ac.uk (David Huen) Date: Thu Mar 25 12:59:32 2004 Subject: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie In-Reply-To: References: <4062F91A.8020306@web.de> <40630807.40000@genpat.uu.se> Message-ID: <200403251745.10493.smh1008@cus.cam.ac.uk> On Thursday 25 Mar 2004 5:26 pm, Keith James wrote: > >>>>> "Kalle" == Kalle Näslund writes: > > [...] > > Kalle> Also, can anyone remind me why we have several Panel > Kalle> implementations please ? shouldn't it be very nice to have > Kalle> one single implementation, that supports the unique > Kalle> features of both SequencePanel and TranslatedSequencePanel > Kalle> ? > > Long story... > > This came about because SequencePanel had two problems: > > 1. It will by default render a huge sequence in a single huge > coordinate space which leads to floating point errors in marking > tics/points at the far end of the sequence. > > To solve this a hack was put in to translate the drawing > coordinates to ~0 before drawing. However that broke rendering to > any Graphics2D derived from a BufferedImage where it produced an > un-renderable region between 0 and the offset (which was 50 pixels > wide, if I remember) > > 2. It was far too slow with available sequence/feature renderers. 
> > > So I wrote TranslatedSequencePanel which solves the first problem > without using the offset hack. Instead of placing a huge virtual > rendering area in a ScrollPane, it clips a window of sequence to > render and runs the window along the sequence. Therefore you are > always drawing from 0, even at base 100,000,001 of the sequence. > > As a nice side-effect, it also turned out to be 5-10x faster at > rendering with the current sequence/feature renderers. > > But people were still using the SequencePanel in their client code, so > I didn't want to remove it. > Let's go to TranslatedSequencePanel then. It seems the sane way to go and it is good at high coord spaces. The only way this is going to happen is if we tag it "DEPRECATED" immediately and shoot it at 1.4 final. David Huen From mark.schreiber at group.novartis.com Fri Mar 19 01:25:14 2004 From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com) Date: Thu Mar 25 16:39:23 2004 Subject: [Biojava-l] is there a biojava class diagram? like bioperl diagram Message-ID: Do you mean UML diagrams? Attached is an example of the SymbolList UML -------------- next part -------------- A non-text attachment was scrubbed... Name: symbol-list-uml.png Type: application/octet-stream Size: 11377 bytes Desc: not available Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20040319/b038a990/symbol-list-uml.obj From fangl at genomics.org.cn Thu Mar 25 20:26:05 2004 From: fangl at genomics.org.cn (Magic Fang) Date: Thu Mar 25 20:32:00 2004 Subject: [Biojava-l] what is the design goal of biojava? Message-ID: <406386AD.5070503@genomics.org.cn> hi, i am now using biojava, and i think it is quite good, especially the modeling and statistics class, but on data manipulation, it seems somewhat complex. if she is like bioperl in seqio or searchio, she will be more beautiful i think. so i want the design goal of biojava, for biologist or programmer? 
From mark.schreiber at group.novartis.com Fri Mar 26 02:58:33 2004 From: mark.schreiber at group.novartis.com (mark.schreiber@group.novartis.com) Date: Fri Mar 26 03:04:03 2004 Subject: [Biojava-l] what is the design goal of biojava? Message-ID: I would say the design goal (not that there is an official one) is general sequence manipulation and analysis. BioJava is probably more for the developer than the biologist. Part of the reason is the design and part of the reason is that Java is more suited to application development than simple scripting which is better done in Perl, Python or Ruby. - Mark Magic Fang Sent by: biojava-l-bounces@portal.open-bio.org 03/26/2004 09:26 AM To: biojava-l@biojava.org cc: Subject: [Biojava-l] what is the design goal of biojava? hi, i am now using biojava, and i think it is quite good, especially the modeling and statistics class, but on data manipulation, it seems somewhat complex. if she is like bioperl in seqio or searchio, she will be more beautiful i think. so i want the design goal of biojava, for biologist or programmer? From kdj at sanger.ac.uk Fri Mar 26 04:52:51 2004 From: kdj at sanger.ac.uk (Keith James) Date: Fri Mar 26 04:58:11 2004 Subject: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie In-Reply-To: <406314D8.1020805@yahoo.co.uk> References: <4062F91A.8020306@web.de> <40630807.40000@genpat.uu.se> <406314D8.1020805@yahoo.co.uk> Message-ID: >>>>> "Matthew" == Matthew Pocock writes: Matthew> Kalle Näslund wrote: >> Also, can anyone remind me why we have several Panel >> implementations please ? shouldn't it be very nice to have one >> single implementation, that supports the unique features of >> both SequencePanel and TranslatedSequencePanel ? Matthew> Amen brother! Let's drop the poster class - it's Matthew> bad. 
SequencePanel behaves itself in a scrollable Matthew> pane. TranslatedSequencePanel needs to have hooks added Matthew> to scroll it. Is there any performance difference between Matthew> the two now? Does one render things better than the other Matthew> e.g. on long sequences? Let's drop one of these and go Matthew> with the other. That performance difference was on JDK 1.3 (I think) Linux and Tru64. It could be that with all the changes in 1.4+ the difference has closed - one would hope so. I haven't looked at this in a long while. I will check it out. Keith -- - Keith James Microarray Facility, Team 65 - - The Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK - From matthew_pocock at yahoo.co.uk Fri Mar 26 08:36:10 2004 From: matthew_pocock at yahoo.co.uk (Matthew Pocock) Date: Fri Mar 26 08:41:59 2004 Subject: [Biojava-l] what is the design goal of biojava? In-Reply-To: <406386AD.5070503@genomics.org.cn> References: <406386AD.5070503@genomics.org.cn> Message-ID: <406431CA.1030402@yahoo.co.uk> Magic Fang wrote: > hi, > i am now using biojava, and i think it is quite good, especially the > modeling and statistics class, but on data manipulation, it seems > somewhat complex. if she is like bioperl in seqio or searchio, she > will be more beautiful i think. so i want the design goal of biojava, > for biologist or programmer? I agree with you about the complexity. If you can think of a cleaner and simpler API and have the time or can motivate others then feel free to tackle this. I think this happened because the early developers were all using BioJava as a library for developing compute-intensive applications, rather than as a scripting environment for running external analysis programs. Obviously, now it would be good to address this. 
Matthew From fangl at genomics.org.cn Tue Mar 30 01:33:55 2004 From: fangl at genomics.org.cn (Magic Fang) Date: Tue Mar 30 01:39:33 2004 Subject: [Biojava-l] how to get renderer in SequencePanel Message-ID: <406914D3.6030407@genomics.org.cn> Hi, I am developing a trace viewer that adds traces dynamically, and I want to change the AbiTraceRenderer's depth when needed. How do I get the MultiLineRenderer, and from it the AbiTraceRenderer, out of a SequencePanel? Thank you. From kalle.naslund at genpat.uu.se Tue Mar 30 04:05:59 2004 From: kalle.naslund at genpat.uu.se (=?ISO-8859-1?Q?Kalle_N=E4slund?=) Date: Tue Mar 30 04:10:55 2004 Subject: [Fwd: Re: [Biojava-l] LabelledSequenceRenderer does not work with TranslatedSequenceRenderer, Newbie] Message-ID: <40693877.40107@genpat.uu.se> Keith James wrote: > [ ... ] > > Long story... > > This came about because SequencePanel had two problems: > > 1. It will by default render a huge sequence in a single huge > coordinate space which leads to floating point errors in marking > tics/points at the far end of the sequence. > > To solve this a hack was put in to translate the drawing > coordinates to ~0 before drawing. However that broke rendering to > any Graphics2D derived from a BufferedImage where it produced an > un-renderable region between 0 and the offset (which was 50 pixels > wide, if I remember) > > 2. It was far too slow with available sequence/feature renderers. > > > So I wrote TranslatedSequencePanel which solves the first problem > without using the offset hack. Instead of placing a huge virtual > rendering area in a ScrollPane, it clips a window of sequence to > render and runs the window along the sequence. Therefore you are > always drawing from 0, even at base 100,000,001 of the sequence. > > As a nice side-effect, it also turned out to be 5-10x faster at > rendering with the current sequence/feature renderers. > > But people were still using the SequencePanel in their client code, so > I didn't want to remove it. 
> > Keith > > > Matthew has done some work on the rendering part; there is now a class called org.biojava.bio.gui.sequence.GUITools that contains some helper methods that let SequenceRenderers figure out which part is visible, so they can limit rendering to the visible parts. From a quick test, the speed with SequencePanel seems comparable to the speed you get with TranslatedSequencePanel. I don't know if he has fixed issue number 1 as well, but I will try to take a look at that later today and see if I can reproduce the rendering errors with the CVS version of SequencePanel. Kalle
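
For anyone following the thread, the coordinate problem listed as issue number 1 can be illustrated with a small, self-contained sketch. It assumes that a drawing position ends up as a single-precision float somewhere along the way; the thread only says "floating point errors", so that part is a guess. The class and method names (WindowedCoords, absoluteX, windowedX) are made up for illustration and are not part of BioJava.

```java
public class WindowedCoords {

    // Pixel x-position of a base when the whole sequence lives in one
    // coordinate space: the large base index is converted to float first.
    static float absoluteX(int base, float pixelsPerBase) {
        return base * pixelsPerBase;
    }

    // Pixel x-position relative to the first visible base: the subtraction
    // is done in exact int arithmetic, so the float value stays small.
    static float windowedX(int base, int windowStart, float pixelsPerBase) {
        return (base - windowStart) * pixelsPerBase;
    }

    public static void main(String[] args) {
        int windowStart = 100000000;
        float scale = 1.0f; // one pixel per base, chosen only for illustration

        // Near 1e8 adjacent floats are 8 apart, so neighbouring bases
        // collapse onto the same absolute position.
        System.out.println(
            absoluteX(windowStart, scale) == absoluteX(windowStart + 1, scale)); // true

        // Relative to the window start they stay distinct.
        System.out.println(windowedX(windowStart, windowStart, scale));     // 0.0
        System.out.println(windowedX(windowStart + 1, windowStart, scale)); // 1.0
    }
}
```

This is the same idea as clipping a window of sequence and always drawing from 0: positions are computed relative to the visible window, so they stay small no matter how far along the sequence the window sits.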