From boehme at mpiib-berlin.mpg.de Thu Jun 2 09:03:30 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Thu Jun 2 08:55:31 2005
Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com>
Message-ID: <429F03A2.1090208@mpiib-berlin.mpg.de>

Thanks Marc,
but I don't know how to make a feature persistent in BioJava. Maybe someone from the BioJava list can help me?

Martina

Marc Logghe wrote:
> Hi Martina,
> I don't know how it goes in BioJava but in BioPerl the flow looks like
> this:
> 1) create your feature
> 2) make it persistent
> 3) add it to your (persistent) sequence object
> 4) store the sequence object in the database
> 5) commit if necessary
>
> HTH,
> Marc
>
>
>>I'm wondering how to add a feature to a given sequence?
>>I know, I can use createFeature, but that changes nothing in
>>the database; only addSequence does. So is the proper way to
>>retrieve the seq., get all its features, copy it to new seq
>>and add a feature, delete the seq in the database and store
>>the new one?
>>There must be a simpler way? BioJava In Anger is rather
>>sparse on things like that, I could do with a lot more examples ..
>>
>>Martina
>>_______________________________________________
>>BioSQL-l mailing list
>>BioSQL-l@open-bio.org
>>http://open-bio.org/mailman/listinfo/biosql-l
>
>

From jesse-t at chello.nl Thu Jun 2 09:40:27 2005
From: jesse-t at chello.nl (Jesse)
Date: Thu Jun 2 09:34:08 2005
Subject: [Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols
Message-ID: <20050602134016.9AD142E02A@rbox4.erasmusmc.nl>

Can someone tell me how I can perform a BioJava 1.4pre1 regex search using ambiguous symbols?
I'm using the following ambiguous DNA symbols (http://rebase.neb.com/rebase/link_withrefm):

-R = G or A
-Y = C or T
-M = A or C
-K = G or T
-S = G or C
-W = A or T
-B = not A (C or G or T)
-D = not C (A or G or T)
-H = not G (A or C or T)
-V = not T (A or C or G)
-N = A or C or G or T

If I'm correct, to perform a BioJava regex search I need to make a PatternFactory using the following method:

    FiniteAlphabet fa = DNATools.getDNA();
    org.biojava.utils.regex.PatternFactory.makeFactory(fa);

So I need a FiniteAlphabet containing ambiguous symbols, right? How can I make such a FiniteAlphabet?

My goal is to run a search pattern like "g[agr]cg[cty]c" on a SymbolList like "ATGCGACGTCTTAANNNNNNATGCAAC".

Thanks.
-Jesse

From mark.schreiber at novartis.com Thu Jun 2 21:02:57 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Jun 2 20:55:02 2005
Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature?
Message-ID:

>There must be a simpler way? BioJava In Anger is rather
>sparse on things like that, I could do with a lot more examples ..
>

All donations of examples are gratefully received. As you say it could do with more examples but hey, I'm only one man, with a day job that is rapidly turning into a night job too : )

- Mark

From boehme at mpiib-berlin.mpg.de Mon Jun 6 05:34:50 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 6 05:27:28 2005
Subject: Bio Java (was: Re: [Biojava-l] Re: [BioSQL-l] How to add a feature?)
In-Reply-To:
References:
Message-ID: <42A418BA.8090407@mpiib-berlin.mpg.de>

Sorry - I didn't mean you personally!
Because it is quite hard for me to figure out how things work just from the API and the sources, I assumed it would be similar for others starting with BioJava/BioSQL. There must be some working code around somewhere which could be donated? Please do :-) It would increase the popularity of BioJava/BioSQL, which it deserves, I would think.

Martina

mark.schreiber@novartis.com wrote:
>>There must be a simpler way?
>>BioJava In Anger is rather sparse on things like that, I could do with a lot more examples ..
>>
>
>
> All donations of examples are gratefully received. As you say it could do
> with more examples but hey, I'm only one man, with a day job that is
> rapidly turning into a night job too : )
>
> - Mark
>
>

From jesse-t at chello.nl Mon Jun 6 05:49:17 2005
From: jesse-t at chello.nl (Jesse)
Date: Mon Jun 6 05:41:05 2005
Subject: [Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols
In-Reply-To:
Message-ID: <20050606094906.EA0142E02F@rbox4.erasmusmc.nl>

Hi,

Thanks for your reply. I'm using regex on SymbolLists instead of Strings because I'm working with large sequences stored in memory. I think SymbolLists are more memory efficient than Strings.

But my problem is solved now. I removed the ambiguous symbols from the regex pattern.

Regards,
Jesse

-----Original Message-----
From: Sylvain
Subject: [Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols

Hi,

Have a look at the MotifTools class. You'll find the createRegex method, which creates a regex from a degenerate sequence, given a SymbolList that holds the sequence with degenerate letters. It works great. The returned String can then be used with the usual Pattern/Matcher classes in Java.

Hope this helps

Best regards
Sylvain

From jesse-t at chello.nl Mon Jun 6 06:08:03 2005
From: jesse-t at chello.nl (Jesse)
Date: Mon Jun 6 05:59:43 2005
Subject: [Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols
In-Reply-To: <200506052155.33599.c.lieftink@xs4all.nl>
Message-ID: <20050606100752.96D622E02A@rbox4.erasmusmc.nl>

Hi Cor,

Thanks for your reply. I corrected the pattern by doing the following.

When BioJava's org.biojava.bio.molbio.RestrictionEnzyme.getForwardRegex() is called on a RestrictionEnzyme with the site "gtakm", it returns "gta[gtk][acm]", in which k (G or T) and m (A or C) are ambiguous. So the ambiguous symbol "k" is expanded to the class "[gtk]", with the "k" itself kept inside the brackets.
I simply solved it by removing all ambiguous symbols from the returned regex string:

    String searchPattern = re.getForwardRegex().replaceAll("[rymkswbdhvn]", "");

Regards,
Jesse

-----Original Message-----
From: Cor
Subject: RE: [Biojava-l] [1.4pre1] BioJava's-Regex with ambigous symbols

Hi Jesse,

Although I am a newbie myself, I have written some example code based on existing BioJava test code:

    String symbols = "atgcgacgtcttaannnnnnatgcaac";
    SymbolList sl = DNATools.createDNA(symbols);
    String patternString = "g[ag]cg[ct]c";
    PatternFactory fact = PatternFactory.makeFactory(DNATools.getDNA());
    Pattern pattern = fact.compile(patternString);
    Matcher matcher = pattern.matcher(sl);
    if (matcher.find()) {
        System.out.println("match found");
    }
    else {
        fail("failed to find target ");
    }

In the pattern, you have to use [ag] instead of [agr]. Otherwise you will get the error:

    org.biojava.utils.regex.RegexException: all variant symbols must be atomic.
        at org.biojava.utils.regex.PatternChecker.parseVariantSymbols(PatternChecker.java:363)

Regards,
Cor

From boehme at mpiib-berlin.mpg.de Mon Jun 6 10:18:54 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Mon Jun 6 10:13:15 2005
Subject: [Biojava-l] Re: [BioSQL-l] How to add a feature?
In-Reply-To: <429F0AE6.6020806@nrc-cnrc.gc.ca>
References: <0C528E3670D8CE4B8E013F6749231AA606E7F7@ANTARESIA.be.devgen.com> <429F03A2.1090208@mpiib-berlin.mpg.de> <429F0AE6.6020806@nrc-cnrc.gc.ca>
Message-ID: <42A45B4E.5070906@mpiib-berlin.mpg.de>

Thanks - I knew it would be quite simple, as always with BioJava (once I've figured out how to, that is)!

Martina

Simon Foote wrote:
> Hi Martina,
>
> To add a feature to a sequence stored in a BioSQL database, all you
> have to do is retrieve the sequence and then add a feature to it.
> The following simplified code shows you the steps:
>
> // Retrieve the sequence from BioSQLSequenceDB
> Sequence seq = bsd.getSequence(id);
> // Create new stranded feature
> StrandedFeature.Template templ = new StrandedFeature.Template();
> templ.location = ...
> templ.strand = ...
> templ.type = ...
> templ.source = ...
> templ.annotation = [A created SimpleAnnotation object]
> // Add feature to sequence
> seq.createFeature(templ);
> // Note: adding the feature like this will automatically persist the
> // feature, so you don't have to worry about doing that.
>
> Cheers,
> Simon Foote
>

From corlieftink at hotmail.com Mon Jun 6 14:36:45 2005
From: corlieftink at hotmail.com (Cor Lieftink)
Date: Mon Jun 6 14:28:49 2005
Subject: [Biojava-l] BioJava libraries for cell modelling wanted?
Message-ID:

Hi all,

Is anyone working on cell modelling, as for example described in the article below (1)? And if so, is she (also) using BioJava for this and/or other open source projects? And if so, what kind of libraries would be helpful for you?

Myself, I am a Java programmer working for a bank in daily life, but I recently entered the field of bioinformatics in my own time.

Thanks in advance for your reply!

Regards,
Cor

(1) Cell Modeling Plays Role in Filling the Black Box
http://www.genpromag.com/ShowPR~PUBCODE~018~ACCT~1800000100~ISSUE~0504~RELTYPE~PR~ORIGRELTYPE~BIO~PRODCODE~00000000~PRODLETT~AG.html

_________________________________________________________________
MSN Webmessenger, available everywhere and always http://webmessenger.msn.com/

From mark.schreiber at novartis.com Mon Jun 6 21:35:52 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 6 21:27:57 2005
Subject: [Biojava-l] BioJava libraries for cell modelling wanted?
Message-ID:

It ain't BioJava. Both of the screenshots look like commercial metabolic engineering and cell modelling software.

There is a nice open source project called cellware that might be of interest though.
www.bii.a-star.edu.sg/achievements/applications/cellware/index.asp

- Mark

"Cor Lieftink"
Sent by: biojava-l-bounces@portal.open-bio.org
06/07/2005 02:36 AM

        To:      biojava-l@biojava.org
        cc:      (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] BioJava libraries for cell modelling wanted?

Hi all,

Is anyone working on cell modelling, as for example described in the article below (1)? And if so, is she (also) using BioJava for this and/or other open source projects? And if so, what kind of libraries would be helpful for you?

Myself, I am a Java programmer working for a bank in daily life, but I recently entered the field of bioinformatics in my own time.

Thanks in advance for your reply!

Regards,
Cor

(1) Cell Modeling Plays Role in Filling the Black Box
http://www.genpromag.com/ShowPR~PUBCODE~018~ACCT~1800000100~ISSUE~0504~RELTYPE~PR~ORIGRELTYPE~BIO~PRODCODE~00000000~PRODLETT~AG.html

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From bader at cbio.mskcc.org Tue Jun 7 20:22:23 2005
From: bader at cbio.mskcc.org (Gary Bader)
Date: Tue Jun 7 20:14:24 2005
Subject: [Biojava-l] Homologene parser update?
Message-ID: <42A63A3F.2080308@cbio.mskcc.org>

Hi,
I just tried the Homologene parser in biojava 1.4pre1 and noticed that it only supports the deprecated Homologene file format and throws a number of exceptions parsing that file (likely because of updates to the old file format). Is there an update for this parser available anywhere? I think it would be very useful.

Thanks,
Gary

From mark.schreiber at novartis.com Tue Jun 7 20:54:15 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jun 7 20:46:10 2005
Subject: [Biojava-l] Homologene parser update?
Message-ID:

Hello -

It seems most of this was contributed by David Huen. I'm not sure if he plans an update. If the changes are not large then you might want to consider contributing the changes yourself.

Best of Luck

- Mark

Gary Bader
Sent by: biojava-l-bounces@portal.open-bio.org
06/08/2005 08:22 AM

        To:      biojava-l@biojava.org
        cc:      (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] Homologene parser update?

Hi,
I just tried the Homologene parser in biojava 1.4pre1 and noticed that it only supports the deprecated Homologene file format and throws a number of exceptions parsing that file (likely because of updates to the old file format). Is there an update for this parser available anywhere? I think it would be very useful.

Thanks,
Gary

From bader at cbio.mskcc.org Wed Jun 8 17:53:03 2005
From: bader at cbio.mskcc.org (Gary Bader)
Date: Wed Jun 8 17:44:38 2005
Subject: [Biojava-l] Homologene parser update?
In-Reply-To:
References:
Message-ID: <42A768BF.7070901@cbio.mskcc.org>

Hi Mark,
Thanks. The changes that NCBI made to their file formats are large, so I could write another builder/parser for the new file format, but it would not map completely to the existing parser (which is now broken because of NCBI format changes to even the deprecated file format). If I want to contribute code, who decides that the code is worthy (e.g. does a design review)? Ideally, this would happen before I start coding.
I haven't figured out if I am going to use the existing framework for my current project, since the file format has become simpler, but I would like to contribute if possible.

Thanks,
Gary

mark.schreiber@novartis.com wrote:
> Hello -
>
> It seems most of this was contributed by David Huen. I'm not sure if he
> plans an update. If the changes are not large then you might want to
> consider contributing the changes yourself.
>
> Best of Luck
>
> - Mark
>
>
> Gary Bader
> Sent by: biojava-l-bounces@portal.open-bio.org
> 06/08/2005 08:22 AM
>
>
>         To:      biojava-l@biojava.org
>         cc:      (bcc: Mark Schreiber/GP/Novartis)
>         Subject: [Biojava-l] Homologene parser update?
>
>
> Hi,
> I just tried the Homologene parser in biojava 1.4pre1 and noticed that
> it only supports the deprecated Homologene file format and throws a
> number of exceptions parsing that file (likely because of updates to the
> old file format). Is there an update for this parser available
> anywhere? I think it would be very useful.
>
> Thanks,
> Gary
>
>

From mark.schreiber at novartis.com Wed Jun 8 20:55:03 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jun 8 20:47:05 2005
Subject: [Biojava-l] Homologene parser update?
Message-ID:

Hello -

The best way to prove the code is worthy is to provide JUnit tests with the code that give good coverage of the functionality. If the tests pass, the code is worthy. Other more esoteric things like good API design and efficiency are also nice, but if the unit tests don't pass it doesn't work.

Let us know what you plan to do.

- Mark

Gary Bader
06/09/2005 05:53 AM

        To:      Mark Schreiber/GP/Novartis@PH
        cc:      biojava-l@biojava.org, smh1008@cus.cam.ac.uk
        Subject: Re: [Biojava-l] Homologene parser update?

Hi Mark,
Thanks. The changes that NCBI made to their file formats are large, so I could write another builder/parser for the new file format, but it would not map completely to the existing parser (which is now broken because of NCBI format changes to even the deprecated file format). If I want to contribute code, who decides that the code is worthy (e.g. does a design review)? Ideally, this would happen before I start coding.
I haven't figured out if I am going to use the existing framework for my current project, since the file format has become simpler, but I would like to contribute if possible.

Thanks,
Gary

mark.schreiber@novartis.com wrote:
> Hello -
>
> It seems most of this was contributed by David Huen. I'm not sure if he
> plans an update. If the changes are not large then you might want to
> consider contributing the changes yourself.
>
> Best of Luck
>
> - Mark
>
>
> Gary Bader
> Sent by: biojava-l-bounces@portal.open-bio.org
> 06/08/2005 08:22 AM
>
>
>         To:      biojava-l@biojava.org
>         cc:      (bcc: Mark Schreiber/GP/Novartis)
>         Subject: [Biojava-l] Homologene parser update?
>
>
> Hi,
> I just tried the Homologene parser in biojava 1.4pre1 and noticed that
> it only supports the deprecated Homologene file format and throws a
> number of exceptions parsing that file (likely because of updates to the
> old file format). Is there an update for this parser available
> anywhere? I think it would be very useful.
>
> Thanks,
> Gary
>
>

From bader at cbio.mskcc.org Wed Jun 8 21:15:00 2005
From: bader at cbio.mskcc.org (Gary Bader)
Date: Wed Jun 8 21:06:48 2005
Subject: [Biojava-l] Homologene parser update?
In-Reply-To:
References:
Message-ID: <42A79814.3050400@cbio.mskcc.org>

Hi,
I just wrote a homologene parser that includes unit tests. NCBI split homologene into simple + complex file formats - I only parse the simple format. Should I send you the code? I doubt you would want to integrate it now, since I can imagine a few more useful methods, but, with design pointers, I could extend the code to be more useful. Should we take this discussion off the list?

Cheers,
Gary

mark.schreiber@novartis.com wrote:
> Hello -
>
> The best way to prove the code is worthy is to provide JUnit tests with
> the code that give good coverage of the functionality. If the tests
> pass, the code is worthy. Other more esoteric things like good API design
> and efficiency are also nice, but if the unit tests don't pass it doesn't
> work.
>
> Let us know what you plan to do.
>
> - Mark
>
>
> Gary Bader
> 06/09/2005 05:53 AM
>
>
>         To:      Mark Schreiber/GP/Novartis@PH
>         cc:      biojava-l@biojava.org, smh1008@cus.cam.ac.uk
>         Subject: Re: [Biojava-l] Homologene parser update?
>
>
> Hi Mark,
> Thanks. The changes that NCBI made to their file formats are large, so
> I could write another builder/parser for the new file format, but it
> would not map completely to the existing parser (which is now broken
> because of NCBI format changes to even the deprecated file format). If
> I want to contribute code, who decides that the code is worthy (e.g.
> does a design review)? Ideally, this would happen before I start coding.
> I haven't figured out if I am going to use the existing framework for
> my current project, since the file format has become simpler, but I
> would like to contribute if possible.
>
> Thanks,
> Gary
>
> mark.schreiber@novartis.com wrote:
>
>>Hello -
>>
>>It seems most of this was contributed by David Huen. I'm not sure if he
>>plans an update. If the changes are not large then you might want to
>>consider contributing the changes yourself.
>>
>>Best of Luck
>>
>>- Mark
>>
>>
>>Gary Bader
>>Sent by: biojava-l-bounces@portal.open-bio.org
>>06/08/2005 08:22 AM
>>
>>
>>        To:      biojava-l@biojava.org
>>        cc:      (bcc: Mark Schreiber/GP/Novartis)
>>        Subject: [Biojava-l] Homologene parser update?
>>
>>
>>Hi,
>>I just tried the Homologene parser in biojava 1.4pre1 and noticed that
>>it only supports the deprecated Homologene file format and throws a
>>number of exceptions parsing that file (likely because of updates to the
>>old file format). Is there an update for this parser available
>>anywhere? I think it would be very useful.
>>
>>Thanks,
>>Gary
>>
>>
>
>

From great_fred at yahoo.com Fri Jun 10 10:18:29 2005
From: great_fred at yahoo.com (Sébastien PETIT)
Date: Fri Jun 10 10:10:16 2005
Subject: [Biojava-l] Parse a Blast
Message-ID: <20050610141829.41126.qmail@web32210.mail.mud.yahoo.com>

Hello

First of all, sorry for my English (I'm French..)
Now, my problem...
I have Blast output in XML format. I want to parse it because I just want the alignment.
But when I use a script found on the Net, the answer is:
--> org.xml.sax.SAXException: Could not recognise the format of this file as one supported by the framework.

Does anybody have the same problem?
Or does anybody have an idea how to resolve my problem?

Here is the script:

import java.io.*;
import java.util.*;

import org.biojava.bio.program.sax.*;
import org.biojava.bio.program.ssbind.*;
import org.biojava.bio.search.*;
import org.biojava.bio.seq.db.*;
import org.xml.sax.*;
import org.biojava.bio.*;

public class BlastParser {
  /**
   * args[0] is assumed to be the name of the BLAST output file
   */
  public static void main(String[] args) {
    try {
      //get the Blast input as a Stream
      InputStream is = new FileInputStream(args[0]);

      //make a BlastLikeSAXParser
      BlastLikeSAXParser parser = new BlastLikeSAXParser();

      //make the SAX event adapter that will pass events to a Handler.
      SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();

      //set the parser's SAX event adapter
      parser.setContentHandler(adapter);

      //the list to hold the SeqSimilaritySearchResults
      List results = new ArrayList();

      //create the SearchContentHandler that will build SeqSimilaritySearchResults
      //in the results List
      SearchContentHandler builder = new BlastLikeSearchBuilder(results,
          new DummySequenceDB("queries"), new DummySequenceDBInstallation());

      //register builder with adapter
      adapter.setSearchContentHandler(builder);

      //parse the file; afterwards the results List will hold
      //the SeqSimilaritySearchResults
      parser.parse(new InputSource(is));
      //formatResults(results);
    }
    catch (SAXException ex) {
      //XML problem
      ex.printStackTrace();
    }
    catch (IOException ex) {
      //IO problem, such as a file that cannot be found
      ex.printStackTrace();
    }
  }
}

Thank you for any answer...

Great-Fred

___________________________________________________________________________
FREE voice calls worldwide with the new Yahoo! Messenger
Download this version at http://fr.messenger.yahoo.com

From mark.schreiber at novartis.com Sun Jun 12 22:05:58 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Jun 12 21:57:44 2005
Subject: [Biojava-l] Parse a Blast
Message-ID:

Hello -

I suspect the problem is that you are using the BlastLikeSAXParser. This was written in the days before BLAST XML was available (and stable) and is an adapter that parses a non-XML BLAST report and produces SAX events.

The parser you want is org.biojava.bio.program.sax.blastxml.BlastXMLParser

Hope this helps,

- Mark

Sébastien PETIT
Sent by: biojava-l-bounces@portal.open-bio.org
06/10/2005 10:18 PM

        To:      biojava-l@biojava.org
        cc:      (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] Parse a Blast

Hello

First of all, sorry for my English (I'm French..)
Now, my problem...
I have Blast output in XML format.
I want to parse it because I just want the alignment.
But when I use a script found on the Net, the answer is:
--> org.xml.sax.SAXException: Could not recognise the format of this file as one supported by the framework.

Does anybody have the same problem?
Or does anybody have an idea how to resolve my problem?

Here is the script:

import java.io.*;
import java.util.*;

import org.biojava.bio.program.sax.*;
import org.biojava.bio.program.ssbind.*;
import org.biojava.bio.search.*;
import org.biojava.bio.seq.db.*;
import org.xml.sax.*;
import org.biojava.bio.*;

public class BlastParser {
  /**
   * args[0] is assumed to be the name of the BLAST output file
   */
  public static void main(String[] args) {
    try {
      //get the Blast input as a Stream
      InputStream is = new FileInputStream(args[0]);

      //make a BlastLikeSAXParser
      BlastLikeSAXParser parser = new BlastLikeSAXParser();

      //make the SAX event adapter that will pass events to a Handler.
      SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();

      //set the parser's SAX event adapter
      parser.setContentHandler(adapter);

      //the list to hold the SeqSimilaritySearchResults
      List results = new ArrayList();

      //create the SearchContentHandler that will build SeqSimilaritySearchResults
      //in the results List
      SearchContentHandler builder = new BlastLikeSearchBuilder(results,
          new DummySequenceDB("queries"), new DummySequenceDBInstallation());

      //register builder with adapter
      adapter.setSearchContentHandler(builder);

      //parse the file; afterwards the results List will hold
      //the SeqSimilaritySearchResults
      parser.parse(new InputSource(is));
      //formatResults(results);
    }
    catch (SAXException ex) {
      //XML problem
      ex.printStackTrace();
    }
    catch (IOException ex) {
      //IO problem, such as a file that cannot be found
      ex.printStackTrace();
    }
  }
}

Thank you for any answer...

Great-Fred

From great_fred at yahoo.com Mon Jun 13 08:47:36 2005
From: great_fred at yahoo.com (Sébastien PETIT)
Date: Mon Jun 13 08:44:10 2005
Subject: [Biojava-l] Parse a Blast
In-Reply-To:
Message-ID: <20050613124736.57341.qmail@web32211.mail.mud.yahoo.com>

Hello, Mark

Thanks for the answer...
The parser you told me about may be the right one... but maybe I'm not good enough at Java, and I can't use it.
I spent all morning trying to understand this class and how it works, but nothing... I didn't understand how I can use it...

Thank you for any additional help

Great-Fred

--- mark.schreiber@novartis.com wrote:

> Hello -
>
> I suspect the problem is that you are using the BlastLikeSAXParser.
> This was written in the days before BLAST XML was available (and stable)
> and is an adapter that parses a non-XML BLAST report and produces SAX
> events.
>
> The parser you want is
> org.biojava.bio.program.sax.blastxml.BlastXMLParser
>
> Hope this helps,
>
> - Mark
>
>
> Sébastien PETIT
> Sent by: biojava-l-bounces@portal.open-bio.org
> 06/10/2005 10:18 PM
>
>
>         To:      biojava-l@biojava.org
>         cc:      (bcc: Mark Schreiber/GP/Novartis)
>         Subject: [Biojava-l] Parse a Blast
>
>
> Hello
>
> First of all, sorry for my English (I'm French..)
> Now, my problem...
> I have Blast output in XML format. I want to parse it because I just
> want the alignment.
> But when I use a script found on the Net, the answer is:
> --> org.xml.sax.SAXException: Could not recognise the format of this
> file as one supported by the framework.
>
> Does anybody have the same problem?
> Or does anybody have an idea how to resolve my problem?
>
> Here is the script:
>
> import java.io.*;
> import java.util.*;
>
> import org.biojava.bio.program.sax.*;
> import org.biojava.bio.program.ssbind.*;
> import org.biojava.bio.search.*;
> import org.biojava.bio.seq.db.*;
> import org.xml.sax.*;
> import org.biojava.bio.*;
>
> public class BlastParser {
>   /**
>    * args[0] is assumed to be the name of the BLAST output file
>    */
>   public static void main(String[] args) {
>     try {
>       //get the Blast input as a Stream
>       InputStream is = new FileInputStream(args[0]);
>
>       //make a BlastLikeSAXParser
>       BlastLikeSAXParser parser = new BlastLikeSAXParser();
>
>       //make the SAX event adapter that will pass events to a Handler.
>       SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
>
>       //set the parser's SAX event adapter
>       parser.setContentHandler(adapter);
>
>       //the list to hold the SeqSimilaritySearchResults
>       List results = new ArrayList();
>
>       //create the SearchContentHandler that will build SeqSimilaritySearchResults
>       //in the results List
>       SearchContentHandler builder = new BlastLikeSearchBuilder(results,
>           new DummySequenceDB("queries"), new DummySequenceDBInstallation());
>
>       //register builder with adapter
>       adapter.setSearchContentHandler(builder);
>
>       //parse the file; afterwards the results List will hold
>       //the SeqSimilaritySearchResults
>       parser.parse(new InputSource(is));
>       //formatResults(results);
>     }
>     catch (SAXException ex) {
>       //XML problem
>       ex.printStackTrace();
>     }
>     catch (IOException ex) {
>       //IO problem, such as a file that cannot be found
>       ex.printStackTrace();
>     }
>   }
> }
>
> Thank you for any answer...
>
> Great-Fred
>
>
> ___________________________________________________________________________
>
> FREE voice calls worldwide with the new Yahoo!
> Messenger
>
> Download this version at http://fr.messenger.yahoo.com
>
>

From Russell.Smithies at agresearch.co.nz Mon Jun 13 21:51:26 2005
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon Jun 13 21:43:12 2005
Subject: [Biojava-l] OT: JTest
Message-ID:

Hi all,
Sorry about the off-topic question, but has anyone tried JTest from ParaSoft?
We're thinking of buying it and all the reviews I've read seem OK, but I'd like to hear comments from someone who actually uses it.

Thanx,

Russell Smithies
Bioinformatics Software Developer
Invermay Research Centre
Puddle Alley, Mosgiel, New Zealand
www.agresearch.co.nz

=======================================================================
Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately.
=======================================================================

From franckv at ebi.ac.uk Tue Jun 14 10:47:26 2005
From: franckv at ebi.ac.uk (Franck Valentin)
Date: Tue Jun 14 10:39:38 2005
Subject: [Biojava-l] Applet and bytecore.jar
Message-ID: <1118760446.12636.113.camel@pongo.ebi.ac.uk>

Hi all,

I want to create an applet which displays feature tables graphically.
As a test and to learn BioJava I adapted the FastBeadDemo.java example.
The problem is that it works fine as a standalone application, but when I use it as an applet I get the following error:

java.lang.ExceptionInInitializerError
    at org.biojava.bio.gui.sequence.FilteringRenderer.getContext(FilteringRenderer.java:171)
    [...]
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)
    at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
    [...]

I guess it comes from the bytecore.jar library I need to use, which tries to create a ClassLoader in an applet context.
Does that mean I need to create something like a signed applet (I know very little about that!) to use the BioJava library, or does a workaround exist?
By the way, what is bytecore.jar used for?

Many thanks
Franck

From mark.schreiber at novartis.com Tue Jun 14 21:09:18 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jun 14 21:02:21 2005
Subject: [Biojava-l] Re: [Biojava-dev] Local binary execution
Message-ID:

We would normally not like to use a new JDK in BioJava unless it is well supported on all the OSs people are using. Having said that, there are several attractive features which would make it nice to use.

Is anyone's current OS not supporting Java 1.5?

- Mark

Michael Barton
Sent by: biojava-dev-bounces@portal.open-bio.org
06/15/2005 02:26 AM

        To:      BioJava-dev
        cc:      (bcc: Mark Schreiber/GP/Novartis)
        Subject: Re: [Biojava-dev] Local binary execution

I had a look at the post you were referring to.

In terms of the ant support for local binary execution, I think it is very similar to the newly implemented ProcessBuilder in Java 1.5. This class has a similar way of adding command line arguments to that of ant.

The classes I'm suggesting have an enum of arguments specific to the application, which may be convenient for supplying different switch/argument pairs, as it means that only arguments which the binary allows can be supplied. Any errors should therefore come from incorrect argument values rather than incorrect arguments, if that makes sense. In addition the class throws an exception if the essential arguments required to run the search are not supplied.

This means however that the classes are written in Java 1.5. Would this be a problem?

On Thu, 2005-06-09 at 11:54 -0400, Michael Heuer wrote:
> Hello Michael,
>
> Personally I think this kind of code might be better suited in a more
> general library, say in an Apache Jakarta Commons project for example.
>
> In fact, there was just a proposal to pull the exec code out of ant into a
> separate self-contained library to the commons-dev mailing list a couple
> of days ago:
>
> http://tinyurl.com/9culs
>
> That said, this comes up quite frequently here, so perhaps we should just
> bite the bullet and do it up right.
>
>    michael
>
>
> On Thu, 9 Jun 2005, Michael Barton wrote:
>
> > Hi,
> >
> > I'm a Bioinformatics MRes student at Newcastle. I've been messing around
> > with some java code to execute bioinformatics binaries. It was
> > originally intended for blast but has also been extended for genewise.
> > It takes the hassle out of using process / process builder a little bit.
> >
> > Use goes along the lines of something like this
> >
> > //Search factory for creating searches
> > SearchFactory bsf;
> > bsf = new BlastSearchFactory();
> >
> > //Parameterise with search-specific variables
> > bsf.setSearchBinaryLocation(test_data + "/blast/binary");
> > bsf.setSearchParameter(BlastSearchFactory.Parameter.blastType, "blastn");
> > bsf.setSearchParameter(BlastSearchFactory.Parameter.database,
> >         test_data + "/blast/db/sargasso");
> >
> > //Create immutable search object which can be used to run multiple
> > //searches on the same database
> > Search blastSearch = bsf.getSearch();
> >
> > //Simple search result object which returns an InputStream
> > SearchResult sr = blastSearch.execute(new File(test_data +
> >         "/blast/query/query"));
> >
> > InputStream is = sr.getResultStream();
> >
> > It seems to work okay on linux, I haven't tested it on windows.
> >
> > There's a little bit of JavaDoc I started work on but it's a little bit
> > messed up from where I've been changing things around.
> >
> > The source/jar/doc are all here. There's test cases too.
> >
> > http://www.students.ncl.ac.uk/michael.barton1/
> >
> > Mike
> >
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev@biojava.org
> > http://biojava.org/mailman/listinfo/biojava-dev

_______________________________________________
biojava-dev mailing list
biojava-dev@biojava.org
http://biojava.org/mailman/listinfo/biojava-dev

From mark.schreiber at novartis.com Tue Jun 14 21:11:50 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jun 14 21:03:49 2005
Subject: [Biojava-l] LSID
Message-ID:

Hello -

Does anyone know what happened to the Life Science Identifier proposal? I notice that there are some classes in biojava to handle it but I'm not sure it was ever widely accepted by the community. Come to think of it, does anyone know what happened to the I3C who proposed it?

If it's all dead or dying maybe it should be deprecated or removed at a later date?

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From hollandr at gis.a-star.edu.sg Tue Jun 14 21:36:09 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Tue Jun 14 21:30:46 2005
Subject: [Biojava-l] Re: [Biojava-dev] Local binary execution
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA935@BIONIC.biopolis.one-north.com>

Linux supports Java 1.5 but only using the Sun JDK on ia32 and AMD Opterons. Support for other architectures on Linux (such as ia64, PPC, or Alpha) is restricted to specialist provisions from vendors such as HP and the open source efforts such as Blackdown JDK. At a quick check, the Alpha is only at 1.4.2 (from HP), likewise PPC (from IBM), whereas ia64 can run 1.5 apps using HP's JRE but no compiler yet exists for them. There may also be some open source purists out there who object when they can't use their favourite open source JDK any more...

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------

From bader at cbio.mskcc.org Tue Jun 14 22:20:49 2005
From: bader at cbio.mskcc.org (Gary Bader)
Date: Tue Jun 14 22:12:03 2005
Subject: [Biojava-l] LSID
In-Reply-To:
References:
Message-ID: <42AF9081.5000505@cbio.mskcc.org>

LSID is still cooking. It is not widely accepted, but there are a number of people still pushing for it. IBM is the current caretaker. It remains to be seen whether it will be widely adopted by e.g. the sequence databases.

http://lsid.sourceforge.net/

Gary

From terry at triplett.org Tue Jun 14 22:27:19 2005
From: terry at triplett.org (Terry L. Triplett)
Date: Tue Jun 14 22:19:28 2005
Subject: [Biojava-l] Re: [Biojava-dev] Local binary execution
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA935@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA935@BIONIC.biopolis.one-north.com>
Message-ID: <42AF9207.7000608@triplett.org>

Not to be pedantic, and only peripherally on topic, but Blackdown is not, and has never been, open source. Back before Sun became interested in supporting a JDK on Linux, the Blackdown folks made it possible by signing whatever NDA was required and getting access to the JDK source. When Sun did become interested in Linux Java, the Sun JDK for Linux was the Blackdown codebase, plus some stuff from Borland/Inprise/whatever. These days the Sun JDK and Blackdown's version are more or less equivalent, as I understand it.

Richard HOLLAND wrote:
>and the open source efforts such as Blackdown JDK.

From mark.schreiber at novartis.com Tue Jun 14 23:44:11 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jun 14 23:36:06 2005
Subject: [Biojava-l] Announce: BioJava 1.4pre2
Message-ID:

Hello All -

A second release candidate for biojava 1.4 is now out. Apart from a year's worth of bug fixes and javadoc clean-ups, the major change over 1.4pre1 is a major work-over of the biosql bindings so that BioJava now operates with the upcoming biosql 1.0.

Please take this code out for a spin and give your feedback to the list. I hope to make an official release in about a week so we can start working on 1.5. It's certainly been a long time between releases and I would like to reduce this in the near future.

Check it out from www.biojava.org or go directly to http://www.biojava.org/download14.html (do not pass Go, do not collect $200).

Thanks to Michael Heuer and Richard Holland for helping to squeeze this one out.

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From heuermh at acm.org Wed Jun 15 15:27:38 2005
From: heuermh at acm.org (Michael Heuer)
Date: Wed Jun 15 15:22:48 2005
Subject: [Biojava-l] LSID
In-Reply-To: <42AF9081.5000505@cbio.mskcc.org>
Message-ID:

The biojava LSID and the IBM LSID are slightly different APIs, the IBM one the more complete of the two. There also are/were LSID client implementations that I'm not very familiar with in taverna [0] and for whatever reason in an email client called Haystack [1].

I would move the biojava LSID implementation for deprecation after release of version 1.4.x but note that it is used internally, see e.g.

org/biojava/utils/lsid/class-use/LifeScienceIdentifier.html

in the 1.4pre2 javadocs.

   michael

[0] http://taverna.sf.net
[1] http://haystack.lcs.mit.edu

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From avinash at lanl.gov Wed Jun 15 15:34:50 2005
From: avinash at lanl.gov (Avinash Kewalramani)
Date: Wed Jun 15 15:28:13 2005
Subject: [Biojava-l] Phrap output
Message-ID: <42B082DA.4030903@lanl.gov>

Hi

I need to store some information from an ace assembly file (which is Phrap plain text output). To do this I will have to write my own parser to parse this complicated text file.

Is there any class in BioJava or anywhere else which does this? The best scenario would be if some code converts this file to XML output which can be easily parsed.

I have looked around a bit in BioJava and elsewhere and couldn't find anything for this. I don't want to use Perl (BioPerl probably has this).

Thanks

--
----------------------------------------------------------------------
Avinash Kewalramani
Technical Lead-Genome Informatics Group
Bioscience Division
Los Alamos National Laboratory
Los Alamos, NM 87545

Phone: 505-664-0527
Cell: 816-213-1908
E-mail: avinash@lanl.gov
----------------------------------------------------------------------

From mark.schreiber at novartis.com Wed Jun 15 21:34:10 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jun 15 21:26:12 2005
Subject: [Biojava-l] LSID
Message-ID:

The internal use was mine (I was just using it as a substitute for a namespace). Maybe we should upgrade it to be compatible with IBM or Taverna?

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From mark.schreiber at novartis.com Wed Jun 15 21:36:49 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jun 15 21:28:40 2005
Subject: [Biojava-l] Phrap output
Message-ID:

Hi -

The classes in org.biojava.bio.program.phred might do what you need although they are more for reading phd files. They may give you a starting point though.

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From hollandr at gis.a-star.edu.sg Wed Jun 15 21:41:43 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Wed Jun 15 21:34:31 2005
Subject: [Biojava-l] Phrap output
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA9E9@BIONIC.biopolis.one-north.com>

Nope, nothing exists yet for reading Phrap/ACE. If you do end up writing your own parser, it'd be really great if you could contribute it to the project too.

The way the BioJava file parsers work removes the need for an XML-translation step. A file parser reads the file and fires events to listeners, e.g. you could fire an event that says 'add another sequence', or one that says 'assembly finished'. The listener uses the events to construct the appropriate objects. When writing the file back out again the same events are generated, and another listener receives them and writes out the corresponding bits of file.

You'd also have to decide how to represent the assembly once it is in memory. The interface org.biojava.seq.Assembly might be a good starting point.

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------

From mark.schreiber at novartis.com Wed Jun 15 21:51:42 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jun 15 21:43:36 2005
Subject: [Biojava-l] Phrap output
Message-ID:

If you're going to follow the event-based parsing model (which I strongly recommend you do), I would make a Format implementation (possibly extended if you need more methods) and fire your events at something like the SimpleAssemblyBuilder object (again possibly extended if you need it to do more).
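
The event-driven flow discussed in this thread can be sketched roughly as follows. This is only an illustration of the pattern, not BioJava code: `AssemblyListener`, `ToyAceParser` and `CollectingListener` are hypothetical names invented for the example, and the parser understands just a cut-down ACE-like layout (a `CO` line naming a contig, an `RD` line naming a read, then base lines up to a blank line). A real ACE file has more record types, and the real BioJava Format and builder interfaces look different.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical listener: receives events as the parser walks the input. */
interface AssemblyListener {
    void startContig(String name);
    void addRead(String name, String bases);
    void endAssembly();
}

/** Toy parser: handles only CO and RD records of an ACE-like text layout. */
final class ToyAceParser {
    static void parse(Iterable<String> lines, AssemblyListener listener) {
        String readName = null;                 // read currently being collected
        StringBuilder bases = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            String[] fields = line.split("\\s+");
            if (fields[0].equals("CO") && fields.length > 1) {
                listener.startContig(fields[1]);
            } else if (fields[0].equals("RD") && fields.length > 1) {
                readName = fields[1];
                bases.setLength(0);
            } else if (readName != null) {
                if (line.isEmpty()) {
                    // A blank line closes the read: fire the event.
                    listener.addRead(readName, bases.toString());
                    readName = null;
                } else {
                    bases.append(line);
                }
            }
            // Anything else (consensus bases, other records) is ignored here.
        }
        if (readName != null) {
            listener.addRead(readName, bases.toString());
        }
        listener.endAssembly();
    }
}

/** Example listener that just records what it was told. */
final class CollectingListener implements AssemblyListener {
    final List<String> events = new ArrayList<String>();

    public void startContig(String name) {
        events.add("contig " + name);
    }

    public void addRead(String name, String bases) {
        events.add("read " + name + " " + bases.length());
    }

    public void endAssembly() {
        events.add("end");
    }
}
```

The parser knows nothing about how the assembly is stored. Swapping `CollectingListener` for a listener that builds in-memory objects, or one that writes the records back out, needs no change to the parsing code, which is the property that makes an intermediate XML step unnecessary.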

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From tmo at ebi.ac.uk Thu Jun 16 05:45:21 2005
From: tmo at ebi.ac.uk (Tom Oinn)
Date: Thu Jun 16 05:36:38 2005
Subject: [Biojava-l] LSID
In-Reply-To:
References:
Message-ID: <42B14A31.4080400@ebi.ac.uk>

mark.schreiber@novartis.com wrote:
> The internal use was mine (I was just using it as a substitute for a
> namespace). Maybe we should upgrade it to be compatable with IBM or
> Taverna?
>
> - Mark
>
> Michael Heuer
> Sent by: biojava-l-bounces@portal.open-bio.org
> 06/16/2005 03:27 AM
>
> To: Gary Bader
> cc: biojava-l@open-bio.org, Mark Schreiber/GP/Novartis@PH
> Subject: Re: [Biojava-l] LSID
>
> The biojava LSID and the IBM LSID are slightly different APIs, the IBM one
> the more complete of the two. There also are/were LSID client
> implementations that I'm not very familiar with in taverna [0] and for
> whatever reason in an email client called Haystack [1].

Taverna and Haystack (not an email client!) both use the reference implementation, i.e. the IBM one.
In theory though the implementation shouldn't be that important, it's a standard after all - I'm not sure how actively supported IBM's one is but we've been using it quite happily for ages now. We use LSIDs in a slightly different manner to that originally intended, in that we're mostly using them to name transient entities such as workflow process instances although we do also name concrete data items. Cheers, Tom (Taverna lead) From fpepin at cs.mcgill.ca Thu Jun 16 18:55:54 2005 From: fpepin at cs.mcgill.ca (Francois Pepin) Date: Thu Jun 16 18:48:04 2005 Subject: [Biojava-l] Local binary execution In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA935@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCA935@BIONIC.biopolis.one-north.com> Message-ID: <1118962554.5239.28.camel@elm.mcb.mcgill.ca> Not quite true. I've been using linux x86_64 version of the sun jvm 1.5 for a while now. I do agree that it's limited to windows, linux and solaris (32 and 64 bits for all). I don't know about other jvms. I personally like 1.5 a lot, but I'm not sure if I'd force it on all biojava users. Are you talking about core features, or just nifty add- ons that can be selectively compiled using ant? Francois On Wed, 2005-15-06 at 09:36 +0800, Richard HOLLAND wrote: > Linux supports Java 1.5 but only using the Sun JDK on ia32 and AMD > Opterons. Support for other architectures on Linux (such as ia64, PPC, > or Alpha) is restricted to specialist provisions from vendors such as HP > and the open source efforts such as Blackdown JDK. At a quick check, the > Alpha is only at 1.4.2 (from HP), likewise PPC (from IBM), whereas ia64 > can run 1.5 apps using HP's JRE but no compiler yet exists for them. > There may also be some open source purists out there who object when > they can't use their favourite open source JDK any more... 
> > > Richard Holland > Bioinformatics Specialist > GIS extension 8199 > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its content to any > other person. Thank you. > --------------------------------------------- > > > > -----Original Message----- > > From: biojava-l-bounces@portal.open-bio.org > > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of > > mark.schreiber@novartis.com > > Sent: Wednesday, June 15, 2005 9:09 AM > > To: Michael Barton > > Cc: biojava-l@open-bio.org; BioJava-dev > > Subject: [Biojava-l] Re: [Biojava-dev] Local binary execution > > > > > > We would normally not like to use a new JDK in biojava unless > > it is well > > supported on all the OS's people are using. Having said that > > there are > > several attractive features which would make it nice to use. > > > > Is anyones current OS not supporting java 1.5? > > > > - Mark > > > > > > > > > > > > Michael Barton > > Sent by: biojava-dev-bounces@portal.open-bio.org > > 06/15/2005 02:26 AM > > > > > > To: BioJava-dev > > cc: (bcc: Mark Schreiber/GP/Novartis) > > Subject: Re: [Biojava-dev] Local binary execution > > > > > > I had a look at the post you were refering to. In terms of the ant > > support for local binary execution I think it is very similar to the > > newly implemented ProcessBuilder in Java 1.5. > > This class has a similar way way of adding command line arguments to > > that of ant . > > > > The classes I'm suggesting have an enum of arguments specific to the > > application which may convienient for suppling different > > switch/argument > > pairs, as it it means that only arguments for which the binary allows > > can be supplied. > > Any errors should therefore come from incorrent argument values rather > > than incorrect arguments. If that makes sense. 
> > In addition the class throws an exception if the essential arguments > > required to run the search are not supplied. > > > > This means however that the classes are written in Java 1.5. > > Would this > > be a problem? > > > > > > On Thu, 2005-06-09 at 11:54 -0400, Michael Heuer wrote: > > > Hello Michael, > > > > > > Personally I think this kind of code might be better suited > > in a more > > > general library, say in an Apache Jakarta Commons project > > for example. > > > > > > In fact, there was just a proposal to pull the exec code > > out of ant into > > a > > > separate self-contained library to the commonds-dev mailing list a > > couple > > > of days ago: > > > > > > > http://tinyurl.com/9culs > > > > > > That said, this comes up quite frequently here, so perhaps > > we should > > just > > > bite the bullet and do it up right. > > > > > > michael > > > > > > > > > On Thu, 9 Jun 2005, Michael Barton wrote: > > > > > > > > > > > Hi, > > > > > > > > I'm Bioinformatics MRes student at Newcastle. I've been > > messing around > > > > with some java code to execute bioinformatics binaries. It was > > > > originally intended for blast but has also been extended > > for genewise. > > > > It takes the hassle out of using process / process > > builder a little > > bit. 
> > > > > > > > Usage goes along the lines of something like this: > > > > > > > > //Search factory for creating searches > > > > SearchFactory bsf; > > > > bsf = new BlastSearchFactory(); > > > > > > > > //Parameterise with search specific variables > > > > bsf.setSearchBinaryLocation(test_data + "/blast/binary"); > > > > > > bsf.setSearchParameter(BlastSearchFactory.Parameter.blastType, > > "blastn"); > > > > bsf.setSearchParameter(BlastSearchFactory.Parameter.database, > > > > test_data + "/blast/db/sargasso"); > > > > > > > > //Create immutable search object which can be used to run multiple > > > > //searches on the same database > > > > Search blastSearch = bsf.getSearch(); > > > > > > > > //Simple search result object which returns an InputStream > > > > SearchResult sr = blastSearch.execute(new File(test_data + > > > > "/blast/query/query")); > > > > > > > > InputStream is = sr.getResultStream(); > > > > > > > > It seems to work okay on linux, I haven't tested it on windows. > > > > > > > > There's a little bit of JavaDoc I started work on but > > it's a little > > bit > > > > messed up from where I've been changing things around. > > > > > > > > The source/jar/doc are all here. There are test cases too.
> > > > > > > > http://www.students.ncl.ac.uk/michael.barton1/ > > > > > > > > Mike > > > > > > > > _______________________________________________ > > > > biojava-dev mailing list > > > > biojava-dev@biojava.org > > > > http://biojava.org/mailman/listinfo/biojava-dev > > > > > > > > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev@biojava.org > > http://biojava.org/mailman/listinfo/biojava-dev > > > > > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l@biojava.org > > http://biojava.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l >
From great_fred at yahoo.com Fri Jun 17 08:01:10 2005 From: great_fred at yahoo.com (Sébastien PETIT) Date: Fri Jun 17 07:53:03 2005 Subject: [Biojava-l] Parse with HSPHandler ?? Message-ID: <20050617120110.6119.qmail@web32208.mail.mud.yahoo.com> Hi everybody... I am trying to understand how BioJava works and I have a lot of problems... Maybe because I'm new to Java and BioJava.... I have files from the NCBI BLAST programs.... I can get them in text or XML format.... But my wish is to keep just the alignment sequences and the name of the protein for each sequence... I tried to use the HspHandler class and the example "BlastParser" given by Mark Schreiber, but I don't get what I want... And I no longer know what to do... If it's not clear, I can try to explain better... Ask me.... (Because I'm French and not very good at English... ;) ;) ) Thank you for any answer.. Sebastien
From mark.schreiber at novartis.com Sun Jun 19 21:01:29 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Sun Jun 19 20:53:03 2005 Subject: [Biojava-l] Parse with HSPHandler ?? Message-ID: What is it that you want from the BLAST record that you are not getting? - Mark
From boehme at mpiib-berlin.mpg.de Mon Jun 20 05:43:35 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 20 05:39:35 2005 Subject: [Biojava-l] _removeSequence Message-ID: <42B68FC7.3060102@mpiib-berlin.mpg.de> Hi, I'm trying to delete a sequence and, recursively, all its features.
So:

for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) {
    Sequence s = si.nextSequence();
    String name = s.getName();
    s = null;
    db.removeSequence(name);
}

But if I look in the database (MySQL 4.1.12) I can still see plenty of entries, and I have problems entering the same features again because of a duplicate key error. I would like to know if _removeSequence(String) in BioSQLSequenceDB is supposed to remove features recursively or just the features of the removed sequence? If so - what is the best way to delete the features of the features (and so on)? And how to empty the db completely? Martina
From mark.schreiber at novartis.com Mon Jun 20 05:56:40 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Jun 20 05:48:18 2005 Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence Message-ID: Biojava doesn't attempt to recursively remove features by itself. It relies on cascading deletes in the database. I know Oracle can be set to do this (and it works very well). If MySQL has equivalent functionality you may need to turn it on. I'm pretty sure it does but you need to set it up. - Mark
From mark.schreiber at novartis.com Mon Jun 20 06:06:32 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Jun 20 05:58:14 2005 Subject: [Biojava-l] _removeSequence Message-ID: To remove the database completely (while still keeping the tables etc) you would again need to turn on cascading deletes and delete the appropriate biodatabase row from the biodatabase table (or all of them if you have more than one). You cannot currently do this using the biojava interface. You would need to code a JDBC statement to do it for you, or connect to the DB and issue the SQL statement yourself. - Mark
From hollandr at gis.a-star.edu.sg Mon Jun 20 06:10:29 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Mon Jun 20 06:03:38 2005 Subject: [Biojava-l] RE: [BioSQL-l] _removeSequence Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> To do cascading deletes in MySQL requires the tables to have been set up using the InnoDB table style (as opposed to the default MyISAM tables). In InnoDB, foreign keys are actually enforced and deletes will cascade, whereas MyISAM has no concept of foreign keys and so is unable to enforce data integrity. The people on the BioSQL-L mailing list will be able to help you there. The next version of BioJava's database interfaces after the 1.4 release will assume that the underlying database does have cascading deletes turned on. The existing version half-attempts to make up for the lack of cascading deletes in databases that don't support it, but it doesn't do it well at all, hence the problems you are seeing. After consulting with Hilmar last week we decided it was a fair assumption to make that all BioSQL instances are installed with cascading deletes enabled. BioPerl-db already makes this assumption. cheers, Richard Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
--------------------------------------------- > -----Original Message----- > From: biosql-l-bounces@portal.open-bio.org > [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > mark.schreiber@novartis.com > Sent: Monday, June 20, 2005 5:57 PM > To: Martina > Cc: biosql-l-bounces@portal.open-bio.org; BioJava; > biosql-l@open-bio.org > Subject: Re: [BioSQL-l] _removeSequence > > > Biojava doesn't attempt to recusivley remove features by > itself. It relies > on cascading deletes in the database. I know Oracle can be > set to do this > (and it works very well). If MySQL has equivalent > functionality you may > need to turn it on. I'm pretty sure it does but you need to set it up. > > - Mark > > > > > > Martina > Sent by: biosql-l-bounces@portal.open-bio.org > 06/20/2005 05:43 PM > > > To: biosql-l@open-bio.org, BioJava > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [BioSQL-l] _removeSequence > > > Hi, > > Im trying to delete a sequence and recursivly all its features. > > So: > > for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { > Sequence s = si.nextSequence(); > String name = s.getName(); > s = null; > db.removeSequence(name); > } > > But if I look in the database (MySQL 4.1.12) I can still see plenty > of entries and I have problems entering the same features again, > because of dublicate key error. I would like to know if > _removeSequence(String) in BioSQLSequenceDB is supposed to remove > features recursivly or just the features of the removed sequence? > If so - what is the best way do delete the features of the features > (and so on)? And how to empty the db completly? 
> > Martina > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > From hollandr at gis.a-star.edu.sg Mon Jun 20 06:11:57 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Mon Jun 20 06:05:31 2005 Subject: [BioSQL-l] Re: [Biojava-l] _removeSequence Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB7A@BIONIC.biopolis.one-north.com> There is also the BS-zap-all script in the BioSQL distribution which will wipe the whole lot for you in one go. :) Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biosql-l-bounces@portal.open-bio.org > [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > mark.schreiber@novartis.com > Sent: Monday, June 20, 2005 6:07 PM > To: Martina > Cc: biojava-l-bounces@portal.open-bio.org; BioJava; > biosql-l@open-bio.org > Subject: [BioSQL-l] Re: [Biojava-l] _removeSequence > > > To remove the database completely (while still keeping the > tables etc) you > would again need to turn on cascading deletes and delete the > appropriate > biodatabase row from the biodatabase table (or all of them if > you have > more than one). > > You cannot currently do this using the biojava interface. You > would need > to code a JDBC statement to do it for you, or connect to the > DB and issue > the SQL statement yourself. 
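The "code a JDBC statement" option mentioned above might be sketched as follows. This is a minimal sketch under stated assumptions, not something BioJava provides: the class and method names are hypothetical, and it assumes the standard BioSQL schema in which each namespace is one row of the biodatabase table identified by its name column. It only clears the dependent rows if the foreign keys were created with ON DELETE CASCADE.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, not part of BioJava or BioSQL.
class BiodatabaseCleanupSketch {

    // Assumes the BioSQL biodatabase table with a "name" column.
    static final String DELETE_BY_NAME = "DELETE FROM biodatabase WHERE name = ?";

    // Deletes one biodatabase row and returns the number of rows removed.
    // Dependent bioentry and seqfeature rows disappear only if the schema's
    // foreign keys cascade on delete.
    static int deleteBiodatabase(Connection conn, String name) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(DELETE_BY_NAME)) {
            ps.setString(1, name);
            return ps.executeUpdate();
        }
    }
}
```

The connection would come from whatever JDBC URL the BioSQL instance already uses; whether a commit is needed afterwards depends on the connection's auto-commit setting.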
> > - Mark > > > > > > Martina > Sent by: biojava-l-bounces@portal.open-bio.org > 06/20/2005 05:43 PM > > > To: biosql-l@open-bio.org, BioJava > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] _removeSequence > > > Hi, > > Im trying to delete a sequence and recursivly all its features. > > So: > > for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { > Sequence s = si.nextSequence(); > String name = s.getName(); > s = null; > db.removeSequence(name); > } > > But if I look in the database (MySQL 4.1.12) I can still see plenty > of entries and I have problems entering the same features again, > because of dublicate key error. I would like to know if > _removeSequence(String) in BioSQLSequenceDB is supposed to remove > features recursivly or just the features of the removed sequence? > If so - what is the best way do delete the features of the features > (and so on)? And how to empty the db completly? > > Martina > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > BioSQL-l mailing list > BioSQL-l@open-bio.org > http://open-bio.org/mailman/listinfo/biosql-l > From boehme at mpiib-berlin.mpg.de Mon Jun 20 06:20:37 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 20 06:24:35 2005 Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> Message-ID: <42B69875.3050306@mpiib-berlin.mpg.de> My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. Do I need to do anything else? 
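One way to answer that question from Java is to ask the JDBC driver which foreign keys a table actually has and what their delete rule is. The following is a hedged sketch using the standard java.sql.DatabaseMetaData API; the class and method names are hypothetical, it assumes the driver reports the schema's foreign keys through getImportedKeys, and it treats the connection's catalog as the BioSQL database.

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical diagnostic helper, not part of BioJava or BioSQL.
class CascadeCheckSketch {

    // Translates the DELETE_RULE code reported by JDBC into a readable name.
    static String describeDeleteRule(int rule) {
        if (rule == DatabaseMetaData.importedKeyCascade) return "CASCADE";
        if (rule == DatabaseMetaData.importedKeyRestrict) return "RESTRICT";
        if (rule == DatabaseMetaData.importedKeySetNull) return "SET NULL";
        if (rule == DatabaseMetaData.importedKeyNoAction) return "NO ACTION";
        if (rule == DatabaseMetaData.importedKeySetDefault) return "SET DEFAULT";
        return "UNKNOWN(" + rule + ")";
    }

    // Returns true only if the table has at least one foreign key and every
    // one of them cascades on delete. A table with no foreign keys at all
    // (for example because the ALTER TABLE part of the schema script never
    // ran) yields false.
    static boolean allForeignKeysCascade(Connection conn, String table) throws SQLException {
        DatabaseMetaData metaData = conn.getMetaData();
        try (ResultSet keys = metaData.getImportedKeys(conn.getCatalog(), null, table)) {
            boolean sawKey = false;
            while (keys.next()) {
                sawKey = true;
                if (keys.getShort("DELETE_RULE") != DatabaseMetaData.importedKeyCascade) {
                    return false;
                }
            }
            return sawKey;
        }
    }
}
```

Calling allForeignKeysCascade(conn, "seqfeature") on a correctly loaded schema would be expected to return true; a false result suggests the constraints are missing or use a different delete rule.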
Thanks, Martina Richard HOLLAND wrote: > To do cascading deletes in MySQL requires the tables to have been set up > using the InnoDB table style (as opposed to the default MyISAM tables). > In InnoDB, foreign keys are actually enforced and deletes will cascade, > whereas in MyISAM it has no concept of foreign keys and so is unable to > enforce data integrity. The people on the BioSQL-L mailing list will be > able to help you there. > > The next version of BioJava's database interfaces after the 1.4 release > will assume that the underlying database does have cascading deletes > turned on. The existing version half-attempts to make up for the lack of > cascading deletes in databases that don't support it, but it doesn't do > it well at all, hence the problems you are seeing. After consulting with > Hilmar last week we decided it was a fair assumption to make that all > BioSQL instances are installed with cascading deletes enabled. > BioPerl-db already makes this assumption. > > cheers, > Richard > > Richard Holland > Bioinformatics Specialist > GIS extension 8199 > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately. Please > do not copy or use it for any purpose, or disclose its content to any > other person. Thank you. > --------------------------------------------- > > > >>-----Original Message----- >>From: biosql-l-bounces@portal.open-bio.org >>[mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>mark.schreiber@novartis.com >>Sent: Monday, June 20, 2005 5:57 PM >>To: Martina >>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>biosql-l@open-bio.org >>Subject: Re: [BioSQL-l] _removeSequence >> >> >>Biojava doesn't attempt to recusivley remove features by >>itself. It relies >>on cascading deletes in the database. I know Oracle can be >>set to do this >>(and it works very well). 
If MySQL has equivalent >>functionality you may >>need to turn it on. I'm pretty sure it does but you need to set it up. >> >>- Mark >> >> >> >> >> >>Martina >>Sent by: biosql-l-bounces@portal.open-bio.org >>06/20/2005 05:43 PM >> >> >> To: biosql-l@open-bio.org, BioJava >> cc: (bcc: Mark Schreiber/GP/Novartis) >> Subject: [BioSQL-l] _removeSequence >> >> >>Hi, >> >>Im trying to delete a sequence and recursivly all its features. >> >>So: >> >>for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { >> Sequence s = si.nextSequence(); >> String name = s.getName(); >> s = null; >> db.removeSequence(name); >>} >> >>But if I look in the database (MySQL 4.1.12) I can still see plenty >>of entries and I have problems entering the same features again, >>because of dublicate key error. I would like to know if >>_removeSequence(String) in BioSQLSequenceDB is supposed to remove >>features recursivly or just the features of the removed sequence? >>If so - what is the best way do delete the features of the features >>(and so on)? And how to empty the db completly? >> >>Martina >> >>_______________________________________________ >>BioSQL-l mailing list >>BioSQL-l@open-bio.org >>http://open-bio.org/mailman/listinfo/biosql-l >> >> >> >>_______________________________________________ >>BioSQL-l mailing list >>BioSQL-l@open-bio.org >>http://open-bio.org/mailman/listinfo/biosql-l > > From hollandr at gis.a-star.edu.sg Mon Jun 20 06:33:02 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Mon Jun 20 06:26:17 2005 Subject: [Biojava-l] RE: [BioSQL-l] _removeSequence Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> Well, technically that should work because BioJava simply issues a delete against the seqfeature table, and therefore all features related through foreign keys should automatically delete themselves as a result without any further intervention by BioJava... beats me why it doesn't! 
Unfortunately I don't currently use the MySQL implementation myself so I can't help much. I hope someone on BioSQL-L knows a little more? Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: Martina [mailto:boehme@mpiib-berlin.mpg.de] > Sent: Monday, June 20, 2005 6:21 PM > To: Richard HOLLAND > Cc: biosql-l-bounces@portal.open-bio.org; BioJava; > biosql-l@open-bio.org > Subject: Re: [BioSQL-l] _removeSequence > > > My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 > 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. > Do I need to do anything else? > > Thanks, > Martina > > Richard HOLLAND wrote: > > > To do cascading deletes in MySQL requires the tables to > have been set up > > using the InnoDB table style (as opposed to the default > MyISAM tables). > > In InnoDB, foreign keys are actually enforced and deletes > will cascade, > > whereas in MyISAM it has no concept of foreign keys and so > is unable to > > enforce data integrity. The people on the BioSQL-L mailing > list will be > > able to help you there. > > > > The next version of BioJava's database interfaces after the > 1.4 release > > will assume that the underlying database does have cascading deletes > > turned on. The existing version half-attempts to make up > for the lack of > > cascading deletes in databases that don't support it, but > it doesn't do > > it well at all, hence the problems you are seeing. After > consulting with > > Hilmar last week we decided it was a fair assumption to > make that all > > BioSQL instances are installed with cascading deletes enabled. 
> > BioPerl-db already makes this assumption. > > > > cheers, > > Richard > > > > Richard Holland > > Bioinformatics Specialist > > GIS extension 8199 > > --------------------------------------------- > > This email is confidential and may be privileged. If you are not the > > intended recipient, please delete it and notify us > immediately. Please > > do not copy or use it for any purpose, or disclose its > content to any > > other person. Thank you. > > --------------------------------------------- > > > > > > > >>-----Original Message----- > >>From: biosql-l-bounces@portal.open-bio.org > >>[mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of > >>mark.schreiber@novartis.com > >>Sent: Monday, June 20, 2005 5:57 PM > >>To: Martina > >>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; > >>biosql-l@open-bio.org > >>Subject: Re: [BioSQL-l] _removeSequence > >> > >> > >>Biojava doesn't attempt to recusivley remove features by > >>itself. It relies > >>on cascading deletes in the database. I know Oracle can be > >>set to do this > >>(and it works very well). If MySQL has equivalent > >>functionality you may > >>need to turn it on. I'm pretty sure it does but you need to > set it up. > >> > >>- Mark > >> > >> > >> > >> > >> > >>Martina > >>Sent by: biosql-l-bounces@portal.open-bio.org > >>06/20/2005 05:43 PM > >> > >> > >> To: biosql-l@open-bio.org, BioJava > > >> cc: (bcc: Mark Schreiber/GP/Novartis) > >> Subject: [BioSQL-l] _removeSequence > >> > >> > >>Hi, > >> > >>Im trying to delete a sequence and recursivly all its features. > >> > >>So: > >> > >>for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { > >> Sequence s = si.nextSequence(); > >> String name = s.getName(); > >> s = null; > >> db.removeSequence(name); > >>} > >> > >>But if I look in the database (MySQL 4.1.12) I can still > see plenty > >>of entries and I have problems entering the same features again, > >>because of dublicate key error. 
I would like to know if > >>_removeSequence(String) in BioSQLSequenceDB is supposed to remove > >>features recursively or just the features of the removed sequence? > >>If so - what is the best way to delete the features of the features > >>(and so on)? And how to empty the db completely? > >> > >>Martina > >> > >>_______________________________________________ > >>BioSQL-l mailing list > >>BioSQL-l@open-bio.org > >>http://open-bio.org/mailman/listinfo/biosql-l > >> > >> > >> > >>_______________________________________________ > >>BioSQL-l mailing list > >>BioSQL-l@open-bio.org > >>http://open-bio.org/mailman/listinfo/biosql-l > > > > >
From boehme at mpiib-berlin.mpg.de Mon Jun 20 09:11:25 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 20 09:05:22 2005 Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> Message-ID: <42B6C07D.7000106@mpiib-berlin.mpg.de> I dropped the db and ran the BioSQL script again - looks like it's working now! The first run must have stopped before the ALTER TABLE statements, so the foreign keys were missing - but I didn't know that they had to be there. Thanks! Richard HOLLAND wrote: > Well, technically that should work because BioJava simply issues a > delete against the seqfeature table, and therefore all features related > through foreign keys should automatically delete themselves as a result > without any further intervention by BioJava... beats me why it doesn't! > Unfortunately I don't currently use the MySQL implementation myself so I > can't help much. I hope someone on BioSQL-L knows a little more? > > Richard Holland > Bioinformatics Specialist > GIS extension 8199 > --------------------------------------------- > This email is confidential and may be privileged. If you are not the > intended recipient, please delete it and notify us immediately.
Please > do not copy or use it for any purpose, or disclose its content to any > other person. Thank you. > --------------------------------------------- > > > >>-----Original Message----- >>From: Martina [mailto:boehme@mpiib-berlin.mpg.de] >>Sent: Monday, June 20, 2005 6:21 PM >>To: Richard HOLLAND >>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>biosql-l@open-bio.org >>Subject: Re: [BioSQL-l] _removeSequence >> >> >>My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 >>2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. >>Do I need to do anything else? >> >>Thanks, >>Martina >> >>Richard HOLLAND wrote: >> >> >>>To do cascading deletes in MySQL requires the tables to >> >>have been set up >> >>>using the InnoDB table style (as opposed to the default >> >>MyISAM tables). >> >>>In InnoDB, foreign keys are actually enforced and deletes >> >>will cascade, >> >>>whereas in MyISAM it has no concept of foreign keys and so >> >>is unable to >> >>>enforce data integrity. The people on the BioSQL-L mailing >> >>list will be >> >>>able to help you there. >>> >>>The next version of BioJava's database interfaces after the >> >>1.4 release >> >>>will assume that the underlying database does have cascading deletes >>>turned on. The existing version half-attempts to make up >> >>for the lack of >> >>>cascading deletes in databases that don't support it, but >> >>it doesn't do >> >>>it well at all, hence the problems you are seeing. After >> >>consulting with >> >>>Hilmar last week we decided it was a fair assumption to >> >>make that all >> >>>BioSQL instances are installed with cascading deletes enabled. >>>BioPerl-db already makes this assumption. >>> >>>cheers, >>>Richard >>> >>>Richard Holland >>>Bioinformatics Specialist >>>GIS extension 8199 >>>--------------------------------------------- >>>This email is confidential and may be privileged. If you are not the >>>intended recipient, please delete it and notify us >> >>immediately. 
Please >> >>>do not copy or use it for any purpose, or disclose its >> >>content to any >> >>>other person. Thank you. >>>--------------------------------------------- >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: biosql-l-bounces@portal.open-bio.org >>>>[mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>>>mark.schreiber@novartis.com >>>>Sent: Monday, June 20, 2005 5:57 PM >>>>To: Martina >>>>Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>>>biosql-l@open-bio.org >>>>Subject: Re: [BioSQL-l] _removeSequence >>>> >>>> >>>>Biojava doesn't attempt to recusivley remove features by >>>>itself. It relies >>>>on cascading deletes in the database. I know Oracle can be >>>>set to do this >>>>(and it works very well). If MySQL has equivalent >>>>functionality you may >>>>need to turn it on. I'm pretty sure it does but you need to >> >>set it up. >> >>>>- Mark >>>> >>>> >>>> >>>> >>>> >>>>Martina >>>>Sent by: biosql-l-bounces@portal.open-bio.org >>>>06/20/2005 05:43 PM >>>> >>>> >>>> To: biosql-l@open-bio.org, BioJava >> >> >> >>>> cc: (bcc: Mark Schreiber/GP/Novartis) >>>> Subject: [BioSQL-l] _removeSequence >>>> >>>> >>>>Hi, >>>> >>>>Im trying to delete a sequence and recursivly all its features. >>>> >>>>So: >>>> >>>>for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) { >>>> Sequence s = si.nextSequence(); >>>> String name = s.getName(); >>>> s = null; >>>> db.removeSequence(name); >>>>} >>>> >>>>But if I look in the database (MySQL 4.1.12) I can still >> >>see plenty >> >>>>of entries and I have problems entering the same features again, >>>>because of dublicate key error. I would like to know if >>>>_removeSequence(String) in BioSQLSequenceDB is supposed to remove >>>>features recursivly or just the features of the removed sequence? >>>>If so - what is the best way do delete the features of the features >>>>(and so on)? And how to empty the db completly? 
>>>> >>>>Martina >>>> >>>>_______________________________________________ >>>>BioSQL-l mailing list >>>>BioSQL-l@open-bio.org >>>>http://open-bio.org/mailman/listinfo/biosql-l >>>> >>>> >>>> >>>>_______________________________________________ >>>>BioSQL-l mailing list >>>>BioSQL-l@open-bio.org >>>>http://open-bio.org/mailman/listinfo/biosql-l >>> >>> >
From boehme at mpiib-berlin.mpg.de Mon Jun 20 11:20:35 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Mon Jun 20 11:20:27 2005 Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> Message-ID: <42B6DEC3.9090807@mpiib-berlin.mpg.de> Hi, so I have this new database (still biosqldb-mysql.sql v 1.40 2004/11/04 01:49:41) and after removing all sequences, I still have entries in term, term_relationship, term_relationship_term and ontology. And of course, in biodatabase. If I delete the entry in biodatabase too, nothing changes. Is that what is to be expected? I ask because I still have trouble with the duplicate key error, but that must be my code then. Thanks Martina
From jesse-t at chello.nl Mon Jun 20 19:36:25 2005 From: jesse-t at chello.nl (Jesse) Date: Mon Jun 20 19:28:12 2005 Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug? Message-ID: <20050620233623.ZBAZ1226.amsfep20-int.chello.nl@anonymous> I found some strange things when using RestrictionEnzymeManager to get Restriction Enzymes (RE's) from REBASE. When I change a specific name of a RE in the REBASE file, it will crash. Steps to prepare:
- Download the latest REBASE file from http://rebase.neb.com/rebase/link_withrefm
- Rename it to rebase_common.dat
- Use it to overwrite the default, smaller rebase_common.dat which is in the BioJava classpath under org/biojava/bio/molbio/
For example: when I change <1>XmnI to <1>XmbbnI, it will crash.
When I change it back again (using the same texteditor), it will work again. Is the RE name of the "<1>" section linked to other sections like "<2>"? And then sees that a RE name is missing? Another strange thing is that when I remove some RE enties (so from <1> to <8> including the empty separator line after it), it will crash. Even though hexeditors show that only the entry is removed and not some newline characters etc. So the format is still the same. Does somebody know how these problems are caused? Or did I do something wrong? Thanks, Jesse -------- Error --------- Exception in thread "main" org.biojava.bio.BioError: Failed to read REBASE data file at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:415) at org.biojava.bio.molbio.RestrictionEnzymeManager.(RestrictionEnzymeMa nager.java:136) at RETools.printAllRE(RETools.java:32) at RETools.main(RETools.java:15) Caused by: java.lang.NullPointerException at org.biojava.utils.SmallSet.contains(SmallSet.java:68) at org.biojava.utils.SmallSet.add(SmallSet.java:81) at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:407) ... 3 more ------------------------- From mark.schreiber at novartis.com Mon Jun 20 21:35:09 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Jun 20 21:27:27 2005 Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug? Message-ID: Hi - When you say crash do you mean blue-screen-of-death type crash, chernobyl-type-meltdown or just a throws-an-exception and exits? If the latter please paste in your stack trace so we can figure out what happened. Also your JVM, OS, BioJava version would be good. Please also make sure you are using the latest biojava version (1.4pre2). Thanks, - Mark "Jesse" Sent by: biojava-l-bounces@portal.open-bio.org 06/21/2005 07:36 AM To: cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug? 
I found some strange things when using RestrictionEnzymeManager to get Restriction Enzymes (RE's) from REBASE. When I change a specific name of a RE in the REBASE file, it will crash. Steps to prepare: -Download the latest REBASE file from http://rebase.neb.com/rebase/link_withrefm -Rename it to rebase_common.dat -Overwrite it on the default smaller rebase_common.dat which is in the BioJava classpath org/biojava/bio/molbio/ For example: When I change <1>XmnI in <1>XmbbnI It will crash. When I change it back again (using the same texteditor), it will work again. Is the RE name of the "<1>" section linked to other sections like "<2>"? And then sees that a RE name is missing? Another strange thing is that when I remove some RE enties (so from <1> to <8> including the empty separator line after it), it will crash. Even though hexeditors show that only the entry is removed and not some newline characters etc. So the format is still the same. Does somebody know how these problems are caused? Or did I do something wrong? Thanks, Jesse -------- Error --------- Exception in thread "main" org.biojava.bio.BioError: Failed to read REBASE data file at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:415) at org.biojava.bio.molbio.RestrictionEnzymeManager.(RestrictionEnzymeMa nager.java:136) at RETools.printAllRE(RETools.java:32) at RETools.main(RETools.java:15) Caused by: java.lang.NullPointerException at org.biojava.utils.SmallSet.contains(SmallSet.java:68) at org.biojava.utils.SmallSet.add(SmallSet.java:81) at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:407) ... 
3 more ------------------------- _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From jesse-t at chello.nl Mon Jun 20 23:39:56 2005 From: jesse-t at chello.nl (Jesse) Date: Mon Jun 20 23:31:33 2005 Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug? Message-ID: <20050621033955.JXSL1231.amsfep18-int.chello.nl@anonymous> Hi Mark, With "crash" I mean an exception. This one: -------- Exception --------- Exception in thread "main" org.biojava.bio.BioError: Failed to read REBASE data file at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:415) at org.biojava.bio.molbio.RestrictionEnzymeManager.(RestrictionEnzymeMa nager.java:136) at RETools.printAllRE(RETools.java:32) at RETools.main(RETools.java:15) Caused by: java.lang.NullPointerException at org.biojava.utils.SmallSet.contains(SmallSet.java:68) at org.biojava.utils.SmallSet.add(SmallSet.java:81) at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:407) ... 3 more ------------------------- OS: Microsoft Windows XP Professional SP 2 [Version 5.1.2600] Java: java version "1.5.0_02" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_02-b09) Java HotSpot(TM) Client VM (build 1.5.0_02-b09, mixed mode, sharing) BioJava: BioJava 1.4pre2 The problem I described happens when calling RestrictionEnzymeManager.getAllEnzymes() on a modified REBASE file. What I modified (as test) was only the name "<1>XmnI" to "<1>XmbnI" (line 35343 of REBASE format 31, version 506). Sometimes the exception also occurs when removing some specific restriction enzyme entries (from <1> to <8> including the trailing empty line).
From mark.schreiber at novartis.com Tue Jun 21 01:45:49 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Tue Jun 21 01:37:19 2005 Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug? Message-ID: I guess the other question I would ask is, should it crash? By modifying the file are you fundamentally changing the format? The NullPointerException seems to suggest that you inserted something that doesn't have a matching record (or some similar problem, I'm not familiar with REBASE).
Try taking a look into the RestrictionEnzymeManager code at the root of the exception to give you some clues what might be going wrong. It's hard to tell if this is actually a bug or if you have incorrectly modified the file. - Mark "Jesse" Sent by: biojava-l-bounces@portal.open-bio.org 06/21/2005 11:39 AM To: cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug? Hi Mark, With "crash" I mean an exception. This one: -------- Exception --------- Exception in thread "main" org.biojava.bio.BioError: Failed to read REBASE data file at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:415) at org.biojava.bio.molbio.RestrictionEnzymeManager.(RestrictionEnzymeMa nager.java:136) at RETools.printAllRE(RETools.java:32) at RETools.main(RETools.java:15) Caused by: java.lang.NullPointerException at org.biojava.utils.SmallSet.contains(SmallSet.java:68) at org.biojava.utils.SmallSet.add(SmallSet.java:81) at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:407) ... 3 more ------------------------- OS: Microsoft Windows XP Professional SP 2 [Version 5.1.2600] Java: java version "1.5.0_02" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_02-b09) Java HotSpot(TM) Client VM (build 1.5.0_02-b09, mixed mode, sharing) BioJava: BioJava 1.4pre2 The problem I described happens when calling RestrictionEnzymeManager.getAllEnzymes() on a modified REBASE file. What I modified (as test) was only the name "<1>XmnI" to "<1>XmbnI" (line 35343 of REBASE format 31, version 506). Sometimes the exception also occurs when removing some specific restriction enzyme entries (from <1> to <8> including the trailing empty line). 
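
A minimal sketch of the kind of guard that would avoid a NullPointerException like the one in the trace above. It assumes (this is an assumption, not the actual RestrictionEnzymeManager source) that the loader resolves each isoschizomer name through a name-to-enzyme map and adds the result to a set; a later message in this thread attributes the failure to isoschizomer names that no longer resolve. The class name, the method name and the placeholder enzyme name IsoA are hypothetical, and plain java.util collections stand in for BioJava's SmallSet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the lookup step; not BioJava code.
public class IsoschizomerLookupSketch {

    /**
     * Resolves isoschizomer names to enzyme objects and skips every name
     * that has no entry in the map, so that null is never added to the set.
     */
    public static Set<Object> resolve(List<String> isoNames,
                                      Map<String, Object> nameToEnzyme) {
        Set<Object> resolved = new LinkedHashSet<>();
        for (String isoName : isoNames) {
            // get() returns null when the entry was renamed or removed
            Object enzyme = nameToEnzyme.get(isoName);
            if (enzyme != null) {
                resolved.add(enzyme);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, Object> nameToEnzyme = new HashMap<>();
        // The entry formerly called XmnI has been renamed in the data file.
        nameToEnzyme.put("XmbnI", "enzyme XmbnI");
        nameToEnzyme.put("IsoA", "enzyme IsoA"); // placeholder name

        // Another entry still lists the old name as an isoschizomer.
        List<String> isoNames = Arrays.asList("XmnI", "IsoA");

        // Only the name that still resolves ends up in the result.
        System.out.println(resolve(isoNames, nameToEnzyme));
    }
}
```

Skipping unresolved names silently is only one possible design; a reader could equally report them, which would have made the renamed entry easier to spot.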
_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From franckv at ebi.ac.uk Tue Jun 21 03:50:26 2005
From: franckv at ebi.ac.uk (Franck Valentin)
Date: Tue Jun 21 03:41:54 2005
Subject: [Biojava-l] Use of 'LabelledSequenceRenderer' and 'FeatureLabelRenderer'
Message-ID: <1119340226.12636.2185.camel@pongo.ebi.ac.uk>

Hi,

I would like to display a feature table graphically in the usual form, like this one:

            label1        label2
feature1  <--------->   <--------->
            label3        label4
feature2  <---------->  <-------------->
.....

I've adapted FastBeadDemo.java and tried to use the class 'LabelledSequenceRenderer' to display the names of the features and 'FeatureLabelRenderer' to display the labels, but after several different tries nothing is displayed by either class.
I haven't seen any use of these classes in the demos. Has anyone already used them, and do you know where I can find examples of their use?

Thanks
Franck

From boehme at mpiib-berlin.mpg.de Tue Jun 21 05:46:22 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Jun 21 05:37:59 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
In-Reply-To: <78e39420822012ffbf691b5edc233b4a@gnf.org>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB80@BIONIC.biopolis.one-north.com> <78e39420822012ffbf691b5edc233b4a@gnf.org>
Message-ID: <42B7E1EE.5090505@mpiib-berlin.mpg.de>

Hi Hilmar,

I wasn't aware of 2 different types of features. I'm making features as described in http://www.biojava.org/docs/bj_in_anger/feature.htm, and as far as I can tell from the results, it's the first type you describe.
The second type of feature is confusing me: as I understood the feature relationships, the graph is a tree, with only one parent for a given feature, and if that feature is deleted, all its children should get deleted too?
Martina Hilmar Lapp wrote: > There's one thing that I'm unsure about in Martina's original email, > namely whether she was referring to features related to a sequence > (bioentry), or to features hierarchically related to each other through > the seqfeature_relationship table. > > If the former, then the cascading delete should have taken care of > removing the features when you remove the sequence (bioentry) to which > they point through their foreign key (and recursively the locations etc). > > However, if the question was about hierarchical features, then deleting > one feature in the hierarchy will never (and shouldn't ever) delete any > other feature in the hierarchy (except if all of them reference the same > bioentry and you deleted the bioentry). If you delete a seqfeature in a > hierarchy of seqfeatures then by cascading delete this will also delete > all rows in seqfeature_relationship that reference that seqfeature as > either a subject or an object in a nesting relationship between > features. I.e., looking at the hierarchy as a graph, removing a node > will cascade to deleting all incoming and outgoing arcs for that node, > but not other nodes. > > If your application wants to take down all nodes in the hierarchy when > one node is deleted, you need to write code to do this. (Except if, as > mentioned before, all features reference the same bioentry, in which > case deleting the bioentry will delete the entire feature hierarchy.) > > -hilmar > > On Jun 20, 2005, at 3:33 AM, Richard HOLLAND wrote: > >> Well, technically that should work because BioJava simply issues a >> delete against the seqfeature table, and therefore all features related >> through foreign keys should automatically delete themselves as a result >> without any further intervention by BioJava... beats me why it doesn't! >> Unfortunately I don't currently use the MySQL implementation myself so I >> can't help much. I hope someone on BioSQL-L knows a little more? 
>> >> Richard Holland >> Bioinformatics Specialist >> GIS extension 8199 >> --------------------------------------------- >> This email is confidential and may be privileged. If you are not the >> intended recipient, please delete it and notify us immediately. Please >> do not copy or use it for any purpose, or disclose its content to any >> other person. Thank you. >> --------------------------------------------- >> >> >>> -----Original Message----- >>> From: Martina [mailto:boehme@mpiib-berlin.mpg.de] >>> Sent: Monday, June 20, 2005 6:21 PM >>> To: Richard HOLLAND >>> Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>> biosql-l@open-bio.org >>> Subject: Re: [BioSQL-l] _removeSequence >>> >>> >>> My tables are all InnoDB tables and in the biosqldb-mysql.sql (v 1.40 >>> 2004/11/04 01:49:41) which created them, it says ON DELETE CASCADE. >>> Do I need to do anything else? >>> >>> Thanks, >>> Martina >>> >>> Richard HOLLAND wrote: >>> >>>> To do cascading deletes in MySQL requires the tables to >>> >>> have been set up >>> >>>> using the InnoDB table style (as opposed to the default >>> >>> MyISAM tables). >>> >>>> In InnoDB, foreign keys are actually enforced and deletes >>> >>> will cascade, >>> >>>> whereas in MyISAM it has no concept of foreign keys and so >>> >>> is unable to >>> >>>> enforce data integrity. The people on the BioSQL-L mailing >>> >>> list will be >>> >>>> able to help you there. >>>> >>>> The next version of BioJava's database interfaces after the >>> >>> 1.4 release >>> >>>> will assume that the underlying database does have cascading deletes >>>> turned on. The existing version half-attempts to make up >>> >>> for the lack of >>> >>>> cascading deletes in databases that don't support it, but >>> >>> it doesn't do >>> >>>> it well at all, hence the problems you are seeing. 
After >>> >>> consulting with >>> >>>> Hilmar last week we decided it was a fair assumption to >>> >>> make that all >>> >>>> BioSQL instances are installed with cascading deletes enabled. >>>> BioPerl-db already makes this assumption. >>>> >>>> cheers, >>>> Richard >>>> >>>> Richard Holland >>>> Bioinformatics Specialist >>>> GIS extension 8199 >>>> --------------------------------------------- >>>> This email is confidential and may be privileged. If you are not the >>>> intended recipient, please delete it and notify us >>> >>> immediately. Please >>> >>>> do not copy or use it for any purpose, or disclose its >>> >>> content to any >>> >>>> other person. Thank you. >>>> --------------------------------------------- >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: biosql-l-bounces@portal.open-bio.org >>>>> [mailto:biosql-l-bounces@portal.open-bio.org] On Behalf Of >>>>> mark.schreiber@novartis.com >>>>> Sent: Monday, June 20, 2005 5:57 PM >>>>> To: Martina >>>>> Cc: biosql-l-bounces@portal.open-bio.org; BioJava; >>>>> biosql-l@open-bio.org >>>>> Subject: Re: [BioSQL-l] _removeSequence >>>>> >>>>> >>>>> Biojava doesn't attempt to recusivley remove features by >>>>> itself. It relies >>>>> on cascading deletes in the database. I know Oracle can be >>>>> set to do this >>>>> (and it works very well). If MySQL has equivalent >>>>> functionality you may >>>>> need to turn it on. I'm pretty sure it does but you need to >>> >>> set it up. >>> >>>>> >>>>> - Mark >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> Martina >>>>> Sent by: biosql-l-bounces@portal.open-bio.org >>>>> 06/20/2005 05:43 PM >>>>> >>>>> >>>>> To: biosql-l@open-bio.org, BioJava >>> >>> >>> >>>>> cc: (bcc: Mark Schreiber/GP/Novartis) >>>>> Subject: [BioSQL-l] _removeSequence >>>>> >>>>> >>>>> Hi, >>>>> >>>>> Im trying to delete a sequence and recursivly all its features. 
>>>>>
>>>>> So:
>>>>>
>>>>> for (SequenceIterator si = db.sequenceIterator(); si.hasNext();) {
>>>>>     Sequence s = si.nextSequence();
>>>>>     String name = s.getName();
>>>>>     s = null;
>>>>>     db.removeSequence(name);
>>>>> }
>>>>>
>>>>> But if I look in the database (MySQL 4.1.12) I can still see plenty
>>>>> of entries and I have problems entering the same features again,
>>>>> because of dublicate key error. I would like to know if
>>>>> _removeSequence(String) in BioSQLSequenceDB is supposed to remove
>>>>> features recursivly or just the features of the removed sequence?
>>>>> If so - what is the best way do delete the features of the features
>>>>> (and so on)? And how to empty the db completly?
>>>>>
>>>>> Martina
>>>>>
>>>>> _______________________________________________
>>>>> BioSQL-l mailing list
>>>>> BioSQL-l@open-bio.org
>>>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> BioSQL-l mailing list
>>>>> BioSQL-l@open-bio.org
>>>>> http://open-bio.org/mailman/listinfo/biosql-l
>>>>
>>>>
>>>>
>>>
>>
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l@open-bio.org
>> http://open-bio.org/mailman/listinfo/biosql-l
>>

From boehme at mpiib-berlin.mpg.de Tue Jun 21 06:10:16 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Tue Jun 21 06:02:40 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
In-Reply-To:
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de>
Message-ID: <42B7E788.3040205@mpiib-berlin.mpg.de>

> Yes. When you insert a sequence you must be prepared that when inserting
> its ontology term or tag/value annotation the term may already be
> present because another bioentry uses it too.

Ok, so the proper way is to catch the SQLException in BioSQLFeature, test whether it is a duplicate key entry, get the identifier of the term (would that be the BioSQLfeatureId?) and insert it in the term_relationship table?
And there is no nice BioJava method for this, so I have to do it "manually", with conn.prepareStatement(..) and so on? BioJava spoiled me so!

Martina

From jesse-t at chello.nl Tue Jun 21 09:06:12 2005
From: jesse-t at chello.nl (Jesse)
Date: Tue Jun 21 08:57:50 2005
Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?
In-Reply-To:
Message-ID: <20050621130611.TSND24432.amsfep19-int.chello.nl@anonymous>

I think I found the problem.
The restriction enzyme name (<1>) of an entry can be linked to the isoschizomers field (<2>) of other entries. So when I remove an entry, I also have to remove its name from the isoschizomers field of the other entries. So it's not a bug.

- Jesse

-----Original Message-----
From: mark.schreiber@novartis.com [mailto:mark.schreiber@novartis.com]
Sent: Tuesday, 21 June 2005 7:46
To: Jesse
CC: biojava-l@biojava.org; biojava-l-bounces@portal.open-bio.org
Subject: Re: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?

I guess the other question I would ask is, should it crash? By modifying the file are you fundamentally changing the format?

The NullPointerException seems to suggest that you inserted something that doesn't have a matching record (or some similar problem, I'm not familiar with REBASE). Try taking a look into the RestrictionEnzymeManager code at the root of the exception to give you some clues what might be going wrong. It's hard to tell if this is actually a bug or if you have incorrectly modified the file.

- Mark

"Jesse"
Sent by: biojava-l-bounces@portal.open-bio.org
06/21/2005 11:39 AM
To:
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?

Hi Mark,

With "crash" I mean an exception.
This one: -------- Exception --------- Exception in thread "main" org.biojava.bio.BioError: Failed to read REBASE data file at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:415) at org.biojava.bio.molbio.RestrictionEnzymeManager.(RestrictionEnzymeMa nager.java:136) at RETools.printAllRE(RETools.java:32) at RETools.main(RETools.java:15) Caused by: java.lang.NullPointerException at org.biojava.utils.SmallSet.contains(SmallSet.java:68) at org.biojava.utils.SmallSet.add(SmallSet.java:81) at org.biojava.bio.molbio.RestrictionEnzymeManager.loadData(RestrictionEnzymeMa nager.java:407) ... 3 more ------------------------- OS: Microsoft Windows XP Professional SP 2 [Version 5.1.2600] Java: java version "1.5.0_02" Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_02-b09) Java HotSpot(TM) Client VM (build 1.5.0_02-b09, mixed mode, sharing) BioJava: BioJava 1.4pre2 The problem I described happens when calling RestrictionEnzymeManager.getAllEnzymes() on a modified REBASE file. What I modified (as test) was only the name "<1>XmnI" to "<1>XmbnI" (line 35343 of REBASE format 31, version 506). Sometimes the exception also occurs when removing some specific restriction enzyme entries (from <1> to <8> including the trailing empty line). 
_______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From boehme at mpiib-berlin.mpg.de Tue Jun 21 09:55:15 2005 From: boehme at mpiib-berlin.mpg.de (Martina) Date: Tue Jun 21 09:52:33 2005 Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence In-Reply-To: <0be3992b92f6a14b6d06d5a06549555b@gnf.org> References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> Message-ID: <42B81C43.9010404@mpiib-berlin.mpg.de> That means, that I can't have 2 features refering to the same bioentry with the same type (= type_term_id)and source (=source_term_id) but different parent features because of the composite key bioentry_id in the seqfeature table? Or what does "rank" in that table mean (its part of that key), how can I get different ranks? Martina Hilmar Lapp wrote: > The Biojava people will respond to this. Note though that > Term_Relationship is for storing subject-predicate-object triples of > terms, so I'm not sure why you want to use it for storing/associating > annotation. Maybe you meant bioentry_qualifier_value? > > -hilmar > > On Jun 21, 2005, at 3:10 AM, Martina wrote: > >> >>> Yes. When you insert a sequence you must be prepared that when >>> inserting its ontology term or tag/value annotation the term may >>> already be present because another bioentry uses it too. >> >> >> Ok, the proper way is to catch the SQLException in BIOSQLFeature, test >> if it is a Dublicate key entry, get the identifier of the term (would >> that be the BioSQLfeatureId ?) and insert it in the term_relationship >> table? And there is no nice BioJava method for this, I have to do it >> "manually", like conn.prepareStatement(..) and stuff? BioJava spoiled >> me so! 
>>
>> Martina
>>

From gwaldon at geneinfinity.org Tue Jun 21 12:12:53 2005
From: gwaldon at geneinfinity.org (george waldon)
Date: Tue Jun 21 12:05:11 2005
Subject: RE: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?
Message-ID: <200506211612.j5LGCrQp078068@mmm1924.dulles19-verio.com>

Of course it's a bug and I reported it a while ago:

Dated from Wed 5/11/2005 11:31 AM

"There is also a bug I found a while ago. In RestrictionEnzymeManager.java, around 2/3 down, putting

    for (Iterator ii = isoschizomers.iterator(); ii.hasNext();) {
        String isoName = (String) ii.next();
        Object re = nameToEnzyme.get(isoName);
        if (re != null) tempSet.add(re);
    }

helps to deal with isoschizomers."

A means to track bugs would be nice, but more important, I think, would be a searchable mail archive. I remember there is a way to search the biojava archive somewhere, but I couldn't find it on the biojava web site. It would be nice to have a link on the site.

- George

-----Original Message-----
From: biojava-l-bounces@portal.open-bio.org [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of Jesse
Sent: Tuesday, June 21, 2005 6:06 AM
To: biojava-l@biojava.org
Subject: RE: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?

I think I found the problem.
The Restriction Enzyme name (<1>) of an entry can be linked to the isoschizomers field (<2>) of other entries. So when I remove an entry, I also have to remove those names in the isoschizomers field of other entries. So it's not a bug.

- Jesse

From simon.foote at nrc-cnrc.gc.ca Tue Jun 21 12:15:45 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Tue Jun 21 12:06:44 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
In-Reply-To: <42B81C43.9010404@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com> <42B6DEC3.9090807@mpiib-berlin.mpg.de> <42B7E788.3040205@mpiib-berlin.mpg.de> <0be3992b92f6a14b6d06d5a06549555b@gnf.org> <42B81C43.9010404@mpiib-berlin.mpg.de>
Message-ID: <42B83D31.2000403@nrc-cnrc.gc.ca>

Hi Martina,

In fact you can, as rank is the field that allows this to happen. In BioJava it is currently just a linearly incremented number, so that you can have the same type and source IDs more than once for a given bioentry.

For example, adding a GenBank entry with 10 CDS features for 1 bioentry will give you identical values for bioentry_id, type_term_id and source_term_id, but each feature will have its own rank, from 1 to 10.

Simon

Martina wrote:
> That means, that I can't have 2 features refering to the same bioentry
> with the same type (= type_term_id)and source (=source_term_id) but
> different parent features because of the composite key bioentry_id in
> the seqfeature table? Or what does "rank" in that table mean (its part
> of that key), how can I get different ranks?
>
> Martina
>
> Hilmar Lapp wrote:
>
>> The Biojava people will respond to this. Note though that
>> Term_Relationship is for storing subject-predicate-object triples of
>> terms, so I'm not sure why you want to use it for storing/associating
>> annotation. Maybe you meant bioentry_qualifier_value?
>>
>> -hilmar
>>
>> On Jun 21, 2005, at 3:10 AM, Martina wrote:
>>
>>>
>>>> Yes. When you insert a sequence you must be prepared that when
>>>> inserting its ontology term or tag/value annotation the term may
>>>> already be present because another bioentry uses it too.
>>>
>>>
>>>
>>> Ok, the proper way is to catch the SQLException in BIOSQLFeature,
>>> test if it is a Dublicate key entry, get the identifier of the term
>>> (would that be the BioSQLfeatureId ?) and insert it in the
>>> term_relationship table? And there is no nice BioJava method for
>>> this, I have to do it "manually", like conn.prepareStatement(..) and
>>> stuff? BioJava spoiled me so!
>>>
>>> Martina
>>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

--
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561 [F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From kturner at idtdna.com Tue Jun 21 15:17:02 2005
From: kturner at idtdna.com (Keith Turner)
Date: Tue Jun 21 15:08:33 2005
Subject: [Biojava-l] Using SeqIOTools in a JNLP context
Message-ID: <03D1119D99B98D4D9762E01F1D4FB980010FA82F@EXCHANGE.idtdna.com>

Hello-

I am new to the list. I enjoy working with the Biojava API, but a problem has arisen for me, and I need some help with it. I am developing an application to be used in the Java Webstart framework, and this brings with it some interesting file permission issues. Basically, you use the JNLP interface FileOpenService to open a file from within the secure "sandbox" environment, and then you can get an InputStream out of that.

So I want to take this InputStream (which presumably is from a Fasta file), and read a DNA sequence from it. However, all the methods that worked when I was running my software as a Java application no longer work in the JNLP environment. In the past, I was doing:

    InputStreamReader fr = new InputStreamReader(in);
    BufferedReader br = new BufferedReader(fr);
    SequenceIterator stream = SeqIOTools.readFastaDNA(br);
    Sequence seq = stream.nextSequence();

But the program freezes on the SeqIOTools.readFastaDNA(br) call. No exception is thrown back, it just does nothing.
Does anyone have any suggestions as to how I can solve or work around this problem? Thank you very much -Keith Turner From ap3 at sanger.ac.uk Tue Jun 21 18:08:19 2005 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue Jun 21 17:57:40 2005 Subject: [Biojava-l] Using SeqIOTools in a JNLP context In-Reply-To: <03D1119D99B98D4D9762E01F1D4FB980010FA82F@EXCHANGE.idtdna.com> References: <03D1119D99B98D4D9762E01F1D4FB980010FA82F@EXCHANGE.idtdna.com> Message-ID: <8a9eaf07220858b58692215950175e85@sanger.ac.uk> Hi Keith, You should get an java.security.AccessControlException: access denied from webstart. To access the filesystem from an application started with webstart requires special permission. This means you have to sign your application and the user has to permit the execution. see e.g. http://java.sun.com/docs/books/tutorial/security1.2/toolsign/signer.html Cheers, Andreas On 21 Jun 2005, at 20:17, Keith Turner wrote: > Hello- > > I am new to the list. I enjoy working with the Biojava API, but a > problem has arisen for me, and I need some help with it. I am > developing an application to be used in the Java Webstart framework, > and this brings with it some interesting file permission issues. > Basically, you use the JNLP interface FileOpenService to open a file > from within the secure "sandbox" environment, and then you can get an > InputStream out of that. > > So I want to take this InputStream (which presumably is from a Fasta > file), and read a DNA sequence from it. However, all the methods that > worked when I was running my software as a Java application no longer > work in the JNLP environment. In the past, I was doing: > InputStreamReader fr = new InputStreamReader(in); > BufferedReader br = new BufferedReader(fr); > SequenceIterator stream = SeqIOTools.readFastaDNA(br); > Sequence seq = stream.nextSequence(); > But the program freezes on the SeqIOTools.readFastaDNA(br) call. No > exception is thrown back, it just does nothing. 
Does anyone have any
> suggestions as to how I can solve or work around this problem? Thank
> you very much
>
> -Keith Turner
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>

-----------------------------------------------------------------------

Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From mark.schreiber at novartis.com Tue Jun 21 20:48:57 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jun 21 20:40:38 2005
Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?
Message-ID:

Oops. I was supposed to check that in.

A bug tracking feature would be nice, although I fear that the number of
hands available to fix those tracked bugs might be severely limiting. If
people know of good and free systems I could recommend them to the
open-bio admins.

There was some talk of a searchable mail archive a while ago (although
Google seems to do a pretty good job of indexing our mail). I'll try and
follow it up.

- Mark

"george waldon"
Sent by: biojava-l-bounces@portal.open-bio.org
06/22/2005 12:12 AM
Please respond to george waldon

To: Biojava-l@biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: RE: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?

Of course it's a bug and I reported it a while ago:

Dated from Wed 5/11/2005 11:31 AM
"There is also a bug I found a while ago. In
RestrictionEnzymeManager.java, around 2/3 down, put

    for (Iterator ii = isoschizomers.iterator(); ii.hasNext();) {
        String isoName = (String) ii.next();
        Object re = nameToEnzyme.get(isoName);
        if (re != null) tempSet.add(re);
    }

helps to deal with isoschizomers."

A means to track bugs would be nice, but more important, I think, would
be a searchable mail archive. I remember there is a way to search the
biojava archive somewhere but I couldn't find it on the biojava web
site. Would be nice to have a link on the site.

- George

-----Original Message-----
From: biojava-l-bounces@portal.open-bio.org
[mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of Jesse
Sent: Tuesday, June 21, 2005 6:06 AM
To: biojava-l@biojava.org
Subject: RE: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?

I think I found the problem.

The restriction enzyme name (<1>) of an entry can be linked to the
isoschizomers field (<2>) of other entries. So when I remove an entry, I
also have to remove that name from the isoschizomers field of other
entries.

So it's not a bug.

- Jesse

From mark.schreiber at novartis.com Tue Jun 21 22:22:52 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jun 21 22:14:29 2005
Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?
Message-ID:

Hello -

This is now checked in. All tests pass (no surprise as checking for null
never hurt anyone). This will make it into biojava1.4. If you want to
add a JUnit test to ensure this stays fixed it would be most
appreciated.

I also remember some discussion a while back about the behaviour of
certain enzymes with respect to their cleavage points which may or may
not have been a bug. Was this ever resolved? If so does anything need
fixing?

Thanks.

- Mark

From mark.schreiber at novartis.com Wed Jun 22 01:55:56 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jun 22 01:47:26 2005
Subject: [Biojava-l] searching mailing lists
Message-ID:

I have found the open-bio search page (http://search.open-bio.org/). You
can use this to search mailing lists and webpages for any open-bio
project.
- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From jesse-t at chello.nl Wed Jun 22 04:15:32 2005
From: jesse-t at chello.nl (Jesse)
Date: Wed Jun 22 04:07:03 2005
Subject: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes
Message-ID: <20050622081531.CHWO1610.amsfep12-int.chello.nl@anonymous>

RestrictionEnzymeManager can't correctly handle incomplete enzymes and
gives wrong data. (Correct me if I'm wrong.) I'm not sure whether this
has already been discussed.

I think RestrictionEnzymeManager cannot handle incomplete restriction
enzymes. BioJava 1.4pre2 knows two types of RestrictionEnzyme:
-RestrictionEnzyme.CUT_SIMPLE
-RestrictionEnzyme.CUT_COMPOUND

But in REBASE, there are also other restriction enzyme entries:
-Unknown recognition sites. For example "<3>?".
 RestrictionEnzymeManager skips these (which is OK).
-Unknown cut location. For example AacI "<3>GGATCC".

The problem with RestrictionEnzymeManager is with those REBASE entries
which have an unknown cut location. RestrictionEnzymeManager will
actually report a cut location, even though it is unknown in the REBASE
file.

For example:
http://rebase.neb.com/rebase/link_withrefm

--------- REBASE ENTRY -----------
<1>AacI
<2>BamHI,AaeI,AcaII,AccEBI,AinII,AliI,Ali12257I,Ali12258I,ApaCI,AsiI,
AspTII,Atu1II,BamFI,BamKI,BamNI,Bca1259I,Bce751I,Bco10278I,BnaI,BsaDI,
Bsp30I,Bsp46I,Bsp90II,Bsp98I,Bsp130I,Bsp131I,Bsp144I,Bsp4009I,BspAAIII,
BstI,Bst1126I,Bst2464I,Bst2902I,BstQI,Bsu90I,Bsu8565I,Bsu8646I,BsuB519I,
BsuB763I,CelI,DdsI,GdoI,GinI,GoxI,GseIII,GstI,MleI,Mlu23I,NasBI,
Nsp29132II,NspSAIV,OkrAI,Pac1110I,Pae177I,Pfl8I,Psp56I,RhsI,Rlu4I,
RspLKII,SolI,SpvI,SurI,Uba19I,Uba31I,Uba38I,Uba51I,Uba88I,Uba1098I,
Uba1163I,Uba1167I,Uba1172I,Uba1173I,Uba1205I,Uba1224I,Uba1242I,Uba1250I,
Uba1258I,Uba1297I,Uba1302I,Uba1324I,Uba1325I,Uba1334I,Uba1339I,Uba1346I,
Uba1383I,Uba1398I,Uba1402I,Uba1414I,Uba4009I
<3>GGATCC
<4>
<5>Acetobacter aceti sub. liquefaciens
<6>IFO 12388
<7>
<8>Seurinck, J., van Montagu, M., Unpublished observations.
----------------------------------

--------- RestrictionEnzyme values --------
Name: AacI
RecognitionSite: ggatcc
ForwardRegex: g{2}atc{2}
ReverseRegex: g{2}atc{2}
CutType: 0 (RestrictionEnzyme.CUT_SIMPLE)
DownStreamEndType: 2
IsPalindromic: true
DownstreamCut: 1, 1,
-------------------------------------------

As you can see, AacI is treated as RestrictionEnzyme.CUT_SIMPLE and is
given a cut location, while the REBASE entry says that the cut location
is unknown; only the recognition site is known.

So RestrictionEnzymeManager should also filter out those entries with an
unknown cut location, otherwise it gives wrong data.

- Jesse

From boehme at mpiib-berlin.mpg.de Wed Jun 22 05:24:08 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Wed Jun 22 05:16:15 2005
Subject: [Biojava-l] update seqfeature
In-Reply-To: <42B83D31.2000403@nrc-cnrc.gc.ca>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
 <42B6DEC3.9090807@mpiib-berlin.mpg.de>
 <42B7E788.3040205@mpiib-berlin.mpg.de>
 <0be3992b92f6a14b6d06d5a06549555b@gnf.org>
 <42B81C43.9010404@mpiib-berlin.mpg.de>
 <42B83D31.2000403@nrc-cnrc.gc.ca>
Message-ID: <42B92E38.2020008@mpiib-berlin.mpg.de>

Hi Simon,

I'm changing the FeatureSource, and in setFeatureSource an update on the
source_term_id happens. In the case the combination is already there, I
get an Exception. The proper way to deal with that would be to get the
seqfeature_id of the entry already there and use that, or try to update
the rank until it's a unique combination? Or should I rather not mess
with BioJava and delete that entry and insert it as new, to let BioJava
handle the rank increase?

Thanks for any advice

Martina

Simon Foote wrote:
> Hi Martina,
>
> In fact you can, as rank is the field that allows this to happen. In
> Biojava, currently it's just a linearly incremented number such that
> you can have the same type and source IDs for a given bioentry.
>
> For example, adding a Genbank entry with 10 CDS features for 1 bioentry
> will give you identical keys for bioentry_id, type_term_id and
> source_term_id, but will have a rank of 1 - 10 for each.
>
> Simon
>

From mark.schreiber at novartis.com Wed Jun 22 05:24:52 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jun 22 05:17:08 2005
Subject: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes
Message-ID:

I take your point but I notice that BamHI is an isoschizomer. Is the
cleavage site of BamHI really unknown?

- Mark

From jesse-t at chello.nl Wed Jun 22 06:09:01 2005
From: jesse-t at chello.nl (Jesse)
Date: Wed Jun 22 06:01:08 2005
Subject: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes
Message-ID: <20050622100859.JIQP1610.amsfep12-int.chello.nl@anonymous>

(I'm not an expert on restriction enzymes.)

I was talking about AacI, of which BamHI is an isoschizomer. The cut
location of AacI is unknown, but the one for BamHI is known. Maybe
RestrictionEnzymeManager uses the cut location of BamHI when asked for
the unknown cut location of AacI.
http://rebase.neb.com/rebase/enz/AacI.html

That might also be the reason why RestrictionEnzymeManager requires
links between restriction enzymes: if a restriction enzyme entry is
removed from the REBASE file, RestrictionEnzymeManager fails to read the
file in some cases.

But I think using the cut location of isoschizomers is wrong, because
REBASE says:
"A isoschizomers is a restriction enzymes that recognize the same DNA
sequence. The cut sites may or may not be identical."

So the cut site might be different between different isoschizomers.
I searched for examples in the REBASE file, and found them: <1>BspKT6I <2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscF I,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I, Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105 I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI ,BspJ64I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI, BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786 I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP 31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,C paI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII ,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Ls p1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,Mk rAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,N deII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI ,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI, SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu247 9I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R 2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I, TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba 1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I <3>GAT^C <4>2(6) <5>Bacillus species KT6 <6>N.I. Matvienko <7> <8>Shapovalova, N.I., Zheleznaja, L.A., Matvienko, N.I., (1993) Nucleic Acids Res., vol. 21, pp. 5794. Shapovalova, N.I., Zheleznaya, L.A., Matvienko, N.I., (1994) Biokhimiia, vol. 59, pp. 1730-1738. 
<1>MboI <2>AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,Bsm XII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60 I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp 122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ 64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstM BI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1 786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,C coP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338 I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,Fnu AII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I ,Lsp1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII ,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciA I,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,P faI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,Sau EI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu 2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074 I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth36 8I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I, Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I <3>^GATC <4>2(6) <5>Moraxella bovis <6>ATCC 10900 <7>ACFGKNQRUVX <8>Anton, B.P., Brooks, J.E., Unpublished observations. Gelinas, R.E., Myers, P.A., Roberts, R.J., (1977) J. Mol. Biol., vol. 114, pp. 169-179. Huang, L.-H., Farnet, C.M., Ehrlich, K.C., Ehrlich, M., (1982) Nucleic Acids Res., vol. 10, pp. 1579-1591. Ueno, T., Ito, H., Kimizuka, F., Kotani, H., Nakajima, K., (1993) Nucleic Acids Res., vol. 21, pp. 2309-2313. Ueno, T., Ito, H., Kotani, H., Nakajima, K., Japanese Patent Office, 1993. 
<1>Mel3JI <2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscF I,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I, Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105 I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI ,BspJ64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI ,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I ,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,C acI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,Cj eP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHC I,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,Ll aKR2I,Lsp1109II,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,M krAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI, NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,Pfa I,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI ,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu24 79I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I, R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I ,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Ub a1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I <3>GATC <4> <5>Megasphaera elsedenii 3J <6>P. Pristas <7> <8>Piknova, M., Filova, M., Javorsky, P., Pristas, P., (2004) FEMS Microbiol. Lett., vol. 236, pp. 91-95. Piknova, M., Pristas, P., Javorsky, P., (2004) Folia Microbiol. (Praha), vol. 49, pp. 191-193. 
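The three entries above share the recognition sequence GATC but record the cut differently: `GAT^C`, `^GATC`, and plain `GATC` with no caret. A minimal sketch of how a reader could tell these cases apart follows. It is not the BioJava parser: the class `RebaseSite` and its methods are hypothetical names, and it assumes that a caret or a parenthesised `(n/m)` offset is the only way the `<3>` field marks a cut.

```java
import java.util.OptionalInt;

/** Hypothetical helper (not part of BioJava): inspects the text of a REBASE <3> field. */
public class RebaseSite {

    public enum Kind { UNKNOWN_SITE, SITE_ONLY, SITE_WITH_CUT }

    /** Classifies a <3> field by whether a site and a cut position are given. */
    public static Kind classify(String field) {
        String f = field.trim();
        if (f.isEmpty() || f.equals("?")) {
            return Kind.UNKNOWN_SITE;               // e.g. "<3>?"
        }
        boolean hasCaret = f.indexOf('^') >= 0;     // e.g. "GAT^C"
        // Assumption: offsets outside the site look like "(11/9)".
        boolean hasOffset = f.matches(".*\\(-?\\d+/-?\\d+\\).*");
        return (hasCaret || hasOffset) ? Kind.SITE_WITH_CUT : Kind.SITE_ONLY;
    }

    /** Number of bases before the caret, or empty when the field has no caret. */
    public static OptionalInt caretOffset(String field) {
        int i = field.trim().indexOf('^');
        return i < 0 ? OptionalInt.empty() : OptionalInt.of(i);
    }

    public static void main(String[] args) {
        // BspKT6I, MboI, Mel3JI and an entry with an unknown site
        for (String s : new String[] {"GAT^C", "^GATC", "GATC", "?"}) {
            System.out.println(s + " -> " + classify(s) + " " + caretOffset(s));
        }
    }
}
```

With a check like this, an entry such as Mel3JI or AacI could be kept as "site known, cut unknown" instead of borrowing a cut position from one of its isoschizomers.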
From simon.foote at nrc-cnrc.gc.ca Wed Jun 22 08:51:11 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 08:41:52 2005
Subject: [Biojava-l] Re: update seqfeature
In-Reply-To: <42B92E38.2020008@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
 <42B6DEC3.9090807@mpiib-berlin.mpg.de>
 <42B7E788.3040205@mpiib-berlin.mpg.de>
 <0be3992b92f6a14b6d06d5a06549555b@gnf.org>
 <42B81C43.9010404@mpiib-berlin.mpg.de>
 <42B83D31.2000403@nrc-cnrc.gc.ca>
 <42B92E38.2020008@mpiib-berlin.mpg.de>
Message-ID: <42B95EBF.7050403@nrc-cnrc.gc.ca>

Hi Martina,

Biojava should handle that correctly. I haven't done it by changing a
feature source, but I have with changing a feature's location and
strand. For changing a location:

// Get the Feature you wish to edit
StrandedFeature sf = /* e.g. use a feature filter to grab the feature by its ID */;
Location loc = new RangeLocation(100, 1100);
sf.setLocation(loc);

Since you have already retrieved the feature to edit, biojava will
automatically do this as an update and not an insert. Or it should in
all cases where you are modifying a pre-existing feature.

Simon

Martina wrote:

> Hi Simon,
>
> I'm changing the FeatureSource, and in setFeatureSource an update on
> the source_term_id happens. In the case the combination is already
> there, I get an Exception. The proper way to deal with that would be
> to get the seqfeature_id of the entry already there and use that, or
> try to update the rank until it's a unique combination? Or should I
> rather not mess with BioJava and delete that entry and insert it
> as new, to let BioJava handle the rank increase?
>
> Thanks for any advice
>
> Martina
>
> Simon Foote wrote:
>
>> Hi Martina,
>>
>> In fact you can, as rank is the field that allows this to happen. In
>> Biojava, currently it's just a linearly incremented number such that
>> you can have the same type and source IDs for a given bioentry.
>>
>> For example, adding a Genbank entry with 10 CDS features for 1
>> bioentry will give you identical keys for bioentry_id, type_term_id
>> and source_term_id, but will have a rank of 1 - 10 for each.
>>
>> Simon
>>

--
Bioinformatics Programmer
Pathogen Genomics
Institute for Biological Sciences
National Research Council of Canada
[T] 613-990-0561
[F] 613-952-9092
simon.foote@nrc-cnrc.gc.ca

From boehme at mpiib-berlin.mpg.de Wed Jun 22 09:05:44 2005
From: boehme at mpiib-berlin.mpg.de (Martina)
Date: Wed Jun 22 08:57:19 2005
Subject: [Biojava-l] Re: update seqfeature
In-Reply-To: <42B95EBF.7050403@nrc-cnrc.gc.ca>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
 <42B6DEC3.9090807@mpiib-berlin.mpg.de>
 <42B7E788.3040205@mpiib-berlin.mpg.de>
 <0be3992b92f6a14b6d06d5a06549555b@gnf.org>
 <42B81C43.9010404@mpiib-berlin.mpg.de>
 <42B83D31.2000403@nrc-cnrc.gc.ca>
 <42B92E38.2020008@mpiib-berlin.mpg.de>
 <42B95EBF.7050403@nrc-cnrc.gc.ca>
Message-ID: <42B96228.4020100@mpiib-berlin.mpg.de>

Hi Simon,

sorry, I might not have made that clear enough:
The problem only exists when changing a feature source (or type, but I
didn't try that), because of the composite unique index on the BioSQL
seqfeature table. It doesn't check whether the location is the same or
not, but the combination of type, source, bioentry id and rank has to be
unique.
So if I insert a new feature, the rank gets increased by BioJava somehow
and all is well, but if I update an existing feature's source and by
accident hit the same combination as another feature's type, source,
..., I get the exception and the source doesn't change.
At least that is what I suppose is happening.

My question was how to handle this situation?

Martina

Simon Foote wrote:
> Hi Martina,
>
> Biojava should handle that correctly. I haven't done it by changing a
> feature source, but I have with changing a feature's location and
> strand. For changing a location:
>
> // Get the Feature you wish to edit
> StrandedFeature sf = ex. use a feature filter to grab the feature by
> its ID
> Location loc = new Location(100, 1100);
> sf.setLocation(loc);
>
> Since you have already retrieved the feature to edit, biojava will
> automatically do this as an update and not an insert. Or it should in
> all cases where you are modifying a pre-existing feature.
>

From simon.foote at nrc-cnrc.gc.ca Wed Jun 22 09:15:54 2005
From: simon.foote at nrc-cnrc.gc.ca (Simon Foote)
Date: Wed Jun 22 09:10:09 2005
Subject: [Biojava-l] Re: update seqfeature
In-Reply-To: <42B96228.4020100@mpiib-berlin.mpg.de>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCAB79@BIONIC.biopolis.one-north.com>
 <42B6DEC3.9090807@mpiib-berlin.mpg.de>
 <42B7E788.3040205@mpiib-berlin.mpg.de>
 <0be3992b92f6a14b6d06d5a06549555b@gnf.org>
 <42B81C43.9010404@mpiib-berlin.mpg.de>
 <42B83D31.2000403@nrc-cnrc.gc.ca>
 <42B92E38.2020008@mpiib-berlin.mpg.de>
 <42B95EBF.7050403@nrc-cnrc.gc.ca>
 <42B96228.4020100@mpiib-berlin.mpg.de>
Message-ID: <42B9648A.5040001@nrc-cnrc.gc.ca>

I get the problem now; that would then be a bug in biojava. It should do
an internal check to see if a source/type term change will cause a
non-unique exception and, if so, also update the rank to the next
available one.

One solution would be to catch the exception, then do a select for the
max(rank) for the given bioentry_id, source_term_id and type_term_id,
and increment it by one. In fact, it would probably be wise to always
update the rank when changing either the source or type term, so that
the ranks stay incrementally consistent, if that really matters.

Simon

Martina wrote:

> Hi Simon,
>
> sorry, I might not have made that clear enough:
> The problem only exists when changing a feature source (or type, but I
> didn't try that), because of the composite unique index on the BioSQL
> seqfeature table. It doesn't check whether the location is the same or
> not, but the combination of type, source, bioentry id and rank has to
> be unique. So if I insert a new feature, the rank gets increased by
> BioJava somehow and all is well, but if I update an existing feature's
> source and by accident hit the same combination as another feature's
> type, source, ..., I get the exception and the source doesn't change.
> At least that is what I suppose is happening.
>
> My question was how to handle this situation?
>
> Martina
>
>
> Simon Foote wrote:
>
>> Hi Martina,
>>
>> Biojava should handle that correctly. I haven't done it by changing
>> a feature source, but I have with changing a feature's location and
>> strand. For changing a location:
>>
>> // Get the Feature you wish to edit
>> StrandedFeature sf = ex. use a feature filter to grab the feature by
>> its ID
>> Location loc = new Location(100, 1100);
>> sf.setLocation(loc);
>>
>> Since you have already retrieved the feature to edit, biojava will
>> automatically do this as an update and not an insert. Or it should
>> in all cases where you are modifying a pre-existing feature.
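Simon's suggestion (find the highest rank already used for the bioentry/type/source combination and take the next one) can be sketched without a database. Everything below is an assumption made for illustration: the class `RankSketch`, its `Key` type and both methods are hypothetical, and the in-memory set merely stands in for the unique index on the seqfeature table. Real code would issue the equivalent select and update against BioSQL.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

/** Hypothetical in-memory model of the seqfeature unique key; not BioJava code. */
public class RankSketch {

    /** The four columns that must be unique together, as described in the thread. */
    public static final class Key {
        public final int bioentryId, typeTermId, sourceTermId, rank;

        public Key(int bioentryId, int typeTermId, int sourceTermId, int rank) {
            this.bioentryId = bioentryId;
            this.typeTermId = typeTermId;
            this.sourceTermId = sourceTermId;
            this.rank = rank;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return bioentryId == k.bioentryId && typeTermId == k.typeTermId
                && sourceTermId == k.sourceTermId && rank == k.rank;
        }

        @Override public int hashCode() {
            return Objects.hash(bioentryId, typeTermId, sourceTermId, rank);
        }
    }

    /** max(rank) + 1 for the given combination, or 1 if the combination is unused. */
    public static int nextRank(Set<Key> index, int bioentryId, int typeTermId, int sourceTermId) {
        int max = 0;
        for (Key k : index) {
            if (k.bioentryId == bioentryId && k.typeTermId == typeTermId
                    && k.sourceTermId == sourceTermId) {
                max = Math.max(max, k.rank);
            }
        }
        return max + 1;
    }

    /** Changes a feature's source term and always moves it to the next free rank. */
    public static Key changeSource(Set<Key> index, Key old, int newSourceTermId) {
        int rank = nextRank(index, old.bioentryId, old.typeTermId, newSourceTermId);
        Key updated = new Key(old.bioentryId, old.typeTermId, newSourceTermId, rank);
        index.remove(old);
        index.add(updated);
        return updated;
    }

    public static void main(String[] args) {
        Set<Key> index = new HashSet<>();
        index.add(new Key(1, 10, 20, 1));
        index.add(new Key(1, 10, 20, 2));
        Key feature = new Key(1, 10, 30, 1);
        index.add(feature);
        // Moving the feature from source 30 to source 20 would collide at rank 1.
        Key updated = changeSource(index, feature, 20);
        System.out.println("new rank: " + updated.rank);
    }
}
```

The sketch always assigns a fresh rank on a source change, which corresponds to the "always update the rank" variant rather than the catch-and-retry variant.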
>> -- Bioinformatics Programmer Pathogen Genomics Institute for Biological Sciences National Research Council of Canada [T] 613-990-0561 [F] 613-952-9092 simon.foote@nrc-cnrc.gc.ca From jesse-t at chello.nl Wed Jun 22 11:05:40 2005 From: jesse-t at chello.nl (Jesse) Date: Wed Jun 22 10:56:58 2005 Subject: [Biojava-l] RestrictionEnzyme can't handle double sites In-Reply-To: <20050622100859.JIQP1610.amsfep12-int.chello.nl@anonymous> Message-ID: <20050622150534.FXCZ11463.amsfep13-int.chello.nl@anonymous> Another problem. Some Restriction Enzymes have more than one recognition site. Usually this can be notated by using ambiguous symbols, but some for restriction enzymes this is not possible because in some cases the ambiguous symbols rely on each other. Usually an ambiguous symbol is something like this: ANNC The first "N" is independent of the second "N". For example, it can match with: AAAC AACC AAGC AATC .... .... ATTC 16 possibilities. The ambiguous symbols are independent of each other. But in some restriction enzyme, the ambiguous symbols are dependent of each other. So for a sequence like ANNC Would than only match with: AAAC ACCC AGGC ATTC Only 4 possibilities. The ambiguous symbols are dependent of each other. This happens with these enzymes: TaqII M.PhiBssHII (unknown cutlocation) M.Phi3TI (unknown cutlocation) M.Rho11sI (unknown cutlocation) M.SPBetaI (unknown cutlocation) M.SPRI (unknown cutlocation) <1>TaqII <2> <3>GACCGA(11/9),CACCCA(11/9) <4> <5>Thermus aquaticus YTI <6>J.I. Harris <7>X <8>Barker, D., Hoff, M., Oliphant, A., White, R., (1984) Nucleic Acids Res., vol. 12, pp. 5567-5581. Myers, P.A., Roberts, R.J., Unpublished observations. Rutkowska, S.M., Jaworowska, I., Skowron, P.M., Unpublished observations. RestrictionEnzymeManager takes the last recognition site in this example, it skips GACCGA. 
Name: TaqII RecognitionSite:caccca ForwardRegex: cac{3}a ReverseRegex: tg{3}tg CutType: 0 DownStreamEndType: 0 IsPalindromic: false DownstreamCut: 17, 15, - Jesse -----Oorspronkelijk bericht----- Van: biojava-l-bounces@portal.open-bio.org [mailto:biojava-l-bounces@portal.open-bio.org] Namens Jesse Verzonden: woensdag 22 juni 2005 12:09 Aan: biojava-l@biojava.org Onderwerp: RE: [Biojava-l] RestrictionEnzymeManager can't correctlyhandle incomplete enzymes (I'm not an expert on restriction enzymes.) I was talking about AacI, of which BamHI is an isoschizomer. The recognition site of AacI is unknown, but the one from BamHI is known. Maybe RestrictionEnzymeManager uses the cutlocation of BamHI when asking the unknown cutlocation of AacI. http://rebase.neb.com/rebase/enz/AacI.html That might also be the reason why RestrictionEnzymeManager requires links between restriction enzymes. If a restriction enzyme entry is removed from the REBASE file RestrictionEnzymeManager fails to read in some cases. But I think using cutlocation of isoschizomers is wrong. Because of this: REBASE says: "A isoschizomers is a restriction enzymes that recognize the same DNA sequence. The cut sites may or may not be identical." So the cut site might be different between different isoschizomers. 
I searched for examples in the REBASE file, and found them:

<1>BspKT6I
<2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ64I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Lsp1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>GAT^C
<4>2(6)
<5>Bacillus species KT6
<6>N.I. Matvienko
<7>
<8>Shapovalova, N.I., Zheleznaja, L.A., Matvienko, N.I., (1993) Nucleic Acids Res., vol. 21, pp. 5794.
Shapovalova, N.I., Zheleznaya, L.A., Matvienko, N.I., (1994) Biokhimiia, vol. 59, pp. 1730-1738.

<1>MboI
<2>AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Lsp1109II,Mel3JI,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>^GATC
<4>2(6)
<5>Moraxella bovis
<6>ATCC 10900
<7>ACFGKNQRUVX
<8>Anton, B.P., Brooks, J.E., Unpublished observations.
Gelinas, R.E., Myers, P.A., Roberts, R.J., (1977) J. Mol. Biol., vol. 114, pp. 169-179.
Huang, L.-H., Farnet, C.M., Ehrlich, K.C., Ehrlich, M., (1982) Nucleic Acids Res., vol. 10, pp. 1579-1591.
Ueno, T., Ito, H., Kimizuka, F., Kotani, H., Nakajima, K., (1993) Nucleic Acids Res., vol. 21, pp. 2309-2313.
Ueno, T., Ito, H., Kotani, H., Nakajima, K., Japanese Patent Office, 1993.

<1>Mel3JI
<2>MboI,AspMDI,AsuMBI,Bce243I,Bfi57I,BfiSHI,BfuCI,Bme12I,Bme2494I,BsaPI,BscFI,BsmXII,BspI,Bsp9I,Bsp18I,Bsp49I,Bsp51I,Bsp52I,Bsp54I,Bsp57I,Bsp58I,Bsp59I,Bsp60I,Bsp61I,Bsp64I,Bsp65I,Bsp66I,Bsp67I,Bsp72I,Bsp74I,Bsp76I,Bsp91I,Bsp105I,Bsp122I,Bsp135I,Bsp136I,Bsp138I,Bsp143I,Bsp147I,Bsp2095I,BspAI,BspFI,BspJI,BspJ64I,BspKT6I,BsrMI,BsrPII,BssGII,Bst19II,Bst1274I,BstEIII,BstENII,BstKTI,BstMBI,BstXII,BtcI,Bth84I,Bth211I,Bth213I,Bth221I,Bth945I,Bth1140I,Bth1141I,Bth1786I,Bth1997I,BthCanI,BtkII,Btu33I,Btu34I,Btu36I,Btu37I,Btu39I,Btu41I,CacI,CcoP31I,CcoP76I,CcoP84I,CcoP95II,CcoP219I,CcyI,CdiCD6II,ChaI,Cin1467I,CjeP338I,CpaI,CpfI,CpfAI,Csp5I,Cte1179I,Cte1180I,CtyI,CviAI,CviHI,DpnII,EsaLHCI,FnuAII,FnuCI,FnuEI,Gst1588II,HacI,HpyAIII,HpyHPK5II,Kzo9I,LlaAI,LlaDCHI,LlaKR2I,Lsp1109II,Mel5JI,Mel7JI,Mel4OI,Mel5OI,Mel2TI,Mel5TI,MeuI,MgoI,MjaIII,MkrAI,MmeII,Mmu5I,MmuP2I,MnoIII,MosI,Msp67II,MspBI,MthI,Mth1047I,MthAI,NciAI,NdeII,NflI,NflAII,NflBI,NlaII,NlaDI,NmeCI,NphI,NsiAI,NspAI,NsuI,Pei9403I,PfaI,Pph288I,RalF40I,Rlu1I,SalAI,SalHI,Sau15I,Sau6782I,Sau3AI,SauCI,SauDI,SauEI,SauFI,SauGI,SauMI,SinMI,SmiMBI,SsiAI,SsiBI,Ssu211I,Ssu212I,Ssu220I,R1.Ssu2479I,R2.Ssu2479I,R1.Ssu4109I,R2.Ssu4109I,R1.Ssu4961I,R2.Ssu4961I,R1.Ssu8074I,R2.Ssu8074I,R1.Ssu11318I,R2.Ssu11318I,R1.SsuDAT1I,R2.SsuDAT1I,SsuRBI,Sth368I,TrsKTI,TrsSI,TrsTI,TruII,Tsp133I,Uba4I,Uba59I,Uba1101I,Uba1177I,Uba1182I,Uba1183I,Uba1204I,Uba1259I,Uba1317I,Uba1323I,Uba1366I,Vha44I
<3>GATC
<4>
<5>Megasphaera elsedenii 3J
<6>P. Pristas
<7>
<8>Piknova, M., Filova, M., Javorsky, P., Pristas, P., (2004) FEMS Microbiol. Lett., vol. 236, pp. 91-95.
Piknova, M., Pristas, P., Javorsky, P., (2004) Folia Microbiol. (Praha), vol. 49, pp. 191-193.
-----Original Message-----
From: mark.schreiber@novartis.com [mailto:mark.schreiber@novartis.com]
Sent: Wednesday, 22 June 2005 11:25
To: Jesse
CC: biojava-l@biojava.org; biojava-l-bounces@portal.open-bio.org
Subject: Re: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes

I take your point, but I notice that BamHI is an isoschizomer. Is the cleavage site of BamHI really unknown?

- Mark

"Jesse"
Sent by: biojava-l-bounces@portal.open-bio.org
06/22/2005 04:15 PM
To:
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] RestrictionEnzymeManager can't correctly handle incomplete enzymes

RestrictionEnzymeManager can't correctly handle incomplete enzymes and gives wrong data. (Correct me if I'm wrong.) I'm not sure whether this has already been discussed.

I think RestrictionEnzymeManager cannot handle incomplete restriction enzymes. BioJava 1.4Pre2 knows two types of RestrictionEnzyme:
- RestrictionEnzyme.CUT_SIMPLE
- RestrictionEnzyme.CUT_COMPOUND

But in REBASE there are also other restriction enzyme entries:
- Unknown recognition site, for example "<3>?". RestrictionEnzymeManager skips these (which is OK).
- Unknown cut location, for example AacI "<3>GGATCC".

The problem lies with those REBASE entries that have an unknown cut location. RestrictionEnzymeManager will report a cut location even though it is unknown in the REBASE file.
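
The distinction drawn above (unknown recognition site, unknown cut location, known cut location) can be sketched as a small check on the REBASE <3> field. This is only an illustration: RebaseSiteField and classify are hypothetical names, not part of BioJava, and the sketch assumes that a cut is always written either with a caret (GAT^C) or with an offset pair in parentheses such as (11/9).

```java
/** Hypothetical helper, not part of BioJava: classifies a REBASE <3> field. */
class RebaseSiteField {

    enum Kind { UNKNOWN_SITE, UNKNOWN_CUT, KNOWN_CUT }

    /** Decides what the <3> field of a REBASE entry tells us about the cut. */
    public static Kind classify(String field3) {
        String f = field3.trim();
        // "<3>?" (or an empty field) means the recognition site itself is unknown.
        if (f.isEmpty() || f.equals("?")) {
            return Kind.UNKNOWN_SITE;
        }
        // Assumption: a caret or an (n/m) offset pair is the only way a cut is recorded.
        if (f.indexOf('^') >= 0 || f.matches(".*\\(-?\\d+/-?\\d+\\).*")) {
            return Kind.KNOWN_CUT;
        }
        // A bare sequence such as GGATCC: site known, cut location not recorded.
        return Kind.UNKNOWN_CUT;
    }
}
```

A reader built along these lines could skip or flag the UNKNOWN_CUT entries instead of assigning them a cut position.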
For example:
http://rebase.neb.com/rebase/link_withrefm

--------- REBASE ENTRY -----------
<1>AacI
<2>BamHI,AaeI,AcaII,AccEBI,AinII,AliI,Ali12257I,Ali12258I,ApaCI,AsiI,AspTII,Atu1II,BamFI,BamKI,BamNI,Bca1259I,Bce751I,Bco10278I,BnaI,BsaDI,Bsp30I,Bsp46I,Bsp90II,Bsp98I,Bsp130I,Bsp131I,Bsp144I,Bsp4009I,BspAAIII,BstI,Bst1126I,Bst2464I,Bst2902I,BstQI,Bsu90I,Bsu8565I,Bsu8646I,BsuB519I,BsuB763I,CelI,DdsI,GdoI,GinI,GoxI,GseIII,GstI,MleI,Mlu23I,NasBI,Nsp29132II,NspSAIV,OkrAI,Pac1110I,Pae177I,Pfl8I,Psp56I,RhsI,Rlu4I,RspLKII,SolI,SpvI,SurI,Uba19I,Uba31I,Uba38I,Uba51I,Uba88I,Uba1098I,Uba1163I,Uba1167I,Uba1172I,Uba1173I,Uba1205I,Uba1224I,Uba1242I,Uba1250I,Uba1258I,Uba1297I,Uba1302I,Uba1324I,Uba1325I,Uba1334I,Uba1339I,Uba1346I,Uba1383I,Uba1398I,Uba1402I,Uba1414I,Uba4009I
<3>GGATCC
<4>
<5>Acetobacter aceti sub. liquefaciens
<6>IFO 12388
<7>
<8>Seurinck, J., van Montagu, M., Unpublished observations.
----------------------------------

--------- RestrictionEnzyme values --------
Name: AacI
RecognitionSite:ggatcc
ForwardRegex: g{2}atc{2}
ReverseRegex: g{2}atc{2}
CutType: 0 (RestrictionEnzyme.CUT_SIMPLE)
DownStreamEndType: 2
IsPalindromic: true
DownstreamCut: 1, 1,
-------------------------------------------

As you can see, AacI is treated as RestrictionEnzyme.CUT_SIMPLE and is given a cut location, while the REBASE entry says that the cut location is unknown; only the recognition site is known. So RestrictionEnzymeManager should also filter out entries with an unknown cut location, otherwise it gives wrong data.

- Jesse

[Biojava-l] RestrictionEnzymeManager REBASE reader bug?
mark.schreiber at novartis.com
Tue Jun 21 22:22:52 EDT 2005

Hello -

This is now checked in. All tests pass (no surprise, as checking for null never hurt anyone). This will make it into biojava1.4.

If you want to add a test to the JUnit suite to ensure this stays fixed, it would be most appreciated.
I also remember some discussion a while back about the behaviour of certain enzymes with respect to their cleavage points, which may or may not have been a bug. Was this ever resolved? If so, does anything need fixing?

Thanks.

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From patrick at bennour.de Wed Jun 22 11:43:58 2005
From: patrick at bennour.de (Patrick Bennour)
Date: Wed Jun 22 11:35:21 2005
Subject: [Biojava-l] Looking for an Application to visualize Promoter Prediction results
Message-ID: <002a01c57741$354a0470$2101a8c0@windowsxp>

Dear All,

I am looking for an application that does at least some of the following.

Input: the output of different promoter prediction programs (like CpgProD, Eponine, FirstEF, McPromoter)

The application should then
- automatically parse the results
- visualize the results in a graphical diagram that contains the input sequence
- visualize the different predictions in a comparative diagram
- combine some predictions to improve prediction quality

Thanks for your suggestions

From kturner at idtdna.com Wed Jun 22 12:07:20 2005
From: kturner at idtdna.com (Keith Turner)
Date: Wed Jun 22 11:58:58 2005
Subject: [Biojava-l] Using SeqIOTools in a JNLP context
Message-ID: <03D1119D99B98D4D9762E01F1D4FB980010FA832@EXCHANGE.idtdna.com>

I've done that, and accepted the permissions, but it still doesn't seem to like having streams passed between classes. It works fine if I work with the stream in the same method that I got it in (by creating the FileOpenService), but when I try to pass a FileContents, or its associated InputStream or Readers, as a parameter in a method call, it fails. For example, when trying to write data to a file, the file gets created, but no data is written to it.
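
One workaround that might be worth trying for the reading side is to consume the stream completely in the method that obtained it and hand only an in-memory copy to other classes. The sketch below uses nothing but java.io; StreamBuffer, readAll and bufferFully are hypothetical names, and whether this actually avoids the behaviour described above under Web Start is an untested assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical helper, not part of BioJava or the JNLP API. */
class StreamBuffer {

    /** Reads the stream to its end and returns everything that was read. */
    public static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            // Rethrown unchecked so that callers need no throws clause.
            throw new UncheckedIOException(e);
        }
    }

    /** Returns an in-memory stream that no longer depends on the original source. */
    public static InputStream bufferFully(InputStream in) {
        return new ByteArrayInputStream(readAll(in));
    }
}
```

The stream returned by bufferFully could then be wrapped in an InputStreamReader and BufferedReader and passed on to SeqIOTools.readFastaDNA, as in the snippet quoted below.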
Maybe this is a more appropriate question for the JNLP developer community, but if any of you have any insight I'd appreciate it.

Thanks for your reply, Andreas.

-----Original Message-----
From: biojava-l-bounces@portal.open-bio.org on behalf of Andreas Prlic
Sent: Tue 6/21/2005 5:08 PM
To:
Cc:
Subject: Re: [Biojava-l] Using SeqIOTools in a JNLP context

Hi Keith,

You should get a java.security.AccessControlException: access denied from webstart. Accessing the filesystem from an application started with webstart requires special permission. This means you have to sign your application and the user has to permit the execution.

See e.g. http://java.sun.com/docs/books/tutorial/security1.2/toolsign/signer.html

Cheers,
Andreas

On 21 Jun 2005, at 20:17, Keith Turner wrote:

> Hello-
>
> I am new to the list. I enjoy working with the Biojava API, but a
> problem has arisen for me, and I need some help with it. I am
> developing an application to be used in the Java Webstart framework,
> and this brings with it some interesting file permission issues.
> Basically, you use the JNLP interface FileOpenService to open a file
> from within the secure "sandbox" environment, and then you can get an
> InputStream out of that.
>
> So I want to take this InputStream (which presumably is from a Fasta
> file), and read a DNA sequence from it. However, all the methods that
> worked when I was running my software as a Java application no longer
> work in the JNLP environment. In the past, I was doing:
> InputStreamReader fr = new InputStreamReader(in);
> BufferedReader br = new BufferedReader(fr);
> SequenceIterator stream = SeqIOTools.readFastaDNA(br);
> Sequence seq = stream.nextSequence();
> But the program freezes on the SeqIOTools.readFastaDNA(br) call. No
> exception is thrown back, it just does nothing. Does anyone have any
> suggestions as to how I can solve or work around this problem? Thank
> you very much
>
> -Keith Turner
>

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From heuermh at acm.org Wed Jun 22 12:31:48 2005
From: heuermh at acm.org (Michael Heuer)
Date: Wed Jun 22 12:24:30 2005
Subject: [Biojava-l] RestrictionEnzymeManager REBASE reader bug?
In-Reply-To:
Message-ID:

On Wed, 22 Jun 2005 mark.schreiber@novartis.com wrote:

> Oops. I was supposed to check that in.
>
> A bug tracking feature would be nice although I fear that the number of
> hands available to fix those tracked bugs might be severely limiting. If
> people know of good and free systems I could recommend them to the
> open-bio admins.
>
> There was some talk of a searchable mail archive a while ago (although
> google seems to do a pretty good job of indexing our mail). I'll try and
> follow it up.

I intend to speak to the open-bio folks while here at the BOSC conference about an open-bio subversion repository and installing bugzilla or a derivative thereof.

Any other admin-related issues?

   michael

From mark.schreiber at novartis.com Wed Jun 22 21:01:12 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jun 22 20:52:45 2005
Subject: [Biojava-l] RestrictionEnzyme can't handle double sites
Message-ID:

What would be your recommended solution to this problem?

"Jesse"
Sent by: biojava-l-bounces@portal.open-bio.org
06/22/2005 11:05 PM
To:
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] RestrictionEnzyme can't handle double sites

Another problem. Some restriction enzymes have more than one recognition site.
Usually this can be written using ambiguous symbols, but for some restriction enzymes this is not possible, because in some cases the ambiguous symbols depend on each other.

Usually an ambiguous pattern looks like this:
ANNC
The first "N" is independent of the second "N". For example, it can match:
AAAC
AACC
AAGC
AATC
....
....
ATTC
16 possibilities. The ambiguous symbols are independent of each other.

But for some restriction enzymes, the ambiguous symbols depend on each other. A sequence like
ANNC
would then only match:
AAAC
ACCC
AGGC
ATTC
Only 4 possibilities. The ambiguous symbols are dependent on each other.

This happens with these enzymes:
TaqII
M.PhiBssHII (unknown cutlocation)
M.Phi3TI (unknown cutlocation)
M.Rho11sI (unknown cutlocation)
M.SPBetaI (unknown cutlocation)
M.SPRI (unknown cutlocation)

<1>TaqII
<2>
<3>GACCGA(11/9),CACCCA(11/9)
<4>
<5>Thermus aquaticus YTI
<6>J.I. Harris
<7>X
<8>Barker, D., Hoff, M., Oliphant, A., White, R., (1984) Nucleic Acids Res., vol. 12, pp. 5567-5581.
Myers, P.A., Roberts, R.J., Unpublished observations.
Rutkowska, S.M., Jaworowska, I., Skowron, P.M., Unpublished observations.

RestrictionEnzymeManager takes the last recognition site in this example; it skips GACCGA.

Name: TaqII
RecognitionSite:caccca
ForwardRegex: cac{3}a
ReverseRegex: tg{3}tg
CutType: 0
DownStreamEndType: 0
IsPalindromic: false
DownstreamCut: 17, 15,

- Jesse

From Yudong.Sun at newcastle.ac.uk Sun Jun 26 05:42:08 2005
From: Yudong.Sun at newcastle.ac.uk (Y D Sun)
Date: Sun Jun 26 05:34:14 2005
Subject: [Biojava-l] BLAST Parser for extracting all BLAST data?
Message-ID:

Hi,

I want to extract all data from BLASTP results. In the following hit, for example, I need to get the lengths of the query and subject proteins, the identities (including all data 54, 124 and 43%), the positives (all data 79, 124 and 63%), and the gaps (3, 124 and 2%). Can the BLASTLikeSAXParser extract all this information? I can't find methods in the SeqSimilaritySearchHit and SeqSimilaritySearchSubHit APIs to retrieve these data. Does BioJava provide any methods for this purpose?
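
Independently of what BioJava offers, the summary line of a hit (the "Identities = ..., Positives = ..., Gaps = ..." line in the report that follows) can be picked apart with plain java.util.regex. The sketch below is only an illustration under that assumption: BlastHitStats and field are hypothetical names, not BioJava classes, and the pattern is written against the single line format shown in this message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of BioJava: parses one BLASTP hit summary line. */
class BlastHitStats {

    // Matches e.g. "Identities = 54/124 (43%)".
    private static final Pattern FIELD =
            Pattern.compile("(Identities|Positives|Gaps) = (\\d+)/(\\d+) \\((\\d+)%\\)");

    /**
     * Returns {count, alignmentLength, percent} for the named field,
     * or null if the line does not contain that field.
     */
    public static int[] field(String summaryLine, String name) {
        Matcher m = FIELD.matcher(summaryLine);
        while (m.find()) {
            if (m.group(1).equals(name)) {
                return new int[] {
                    Integer.parseInt(m.group(2)),
                    Integer.parseInt(m.group(3)),
                    Integer.parseInt(m.group(4))
                };
            }
        }
        return null;
    }
}
```

The query and subject lengths could be read in the same way from the "(138 letters)" and "Length = 138" lines.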
Thanks,
George

BLASTP 2.2.5 [Nov-16-2002]

Query= Prot0001
         (138 letters)

Database: /work/nys1/fasta/protein/AE000782.pro.fasta
           2407 sequences; 662,866 total letters

Searching.....done

                                                 Score    E
Sequences producing significant alignments:      (bits) Value

Prot0002                                          100   1e-23
Prot0003                                           74   2e-15
Prot0004                                           43   3e-06

>Prot0002
          Length = 138

 Score = 100 bits (250), Expect = 1e-23
 Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)

Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
           NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74

Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
             K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134

Query: 135 DQIK 138
           D +K
Sbjct: 135 DIVK 138

From hollandr at gis.a-star.edu.sg Sun Jun 26 11:06:40 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jun 26 10:59:35 2005
Subject: [Biojava-l] RE: [BioSQL-l] update seqfeature
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B5@BIONIC.biopolis.one-north.com>

Actually, BioJava is not that clever. Yet. Martina's original observation is right, in that the correct way to do this would be to check the database to see if the altered seqfeature already existed, and if it did, to refer to that one instead. But this is not the way BioJava does things at present. A fix for this will probably end up being built in to the replacement BioJava/BioSQL classes currently in progress, but for now, to delete/create the feature is probably the best workaround.
cheers,
Richard

-----Original Message-----
From: biosql-l-bounces@portal.open-bio.org on behalf of Martina
Sent: Wed 6/22/2005 5:24 PM
To: simon.foote@nrc-cnrc.gc.ca
Cc: biosql-l-bounces@portal.open-bio.org; BioJava; biosql-l@open-bio.org
Subject: [BioSQL-l] update seqfeature

Hi Simon,

I'm changing the FeatureSource, and in setFeatureSource an update on the source_term_id happens. If the combination is already there, I get an Exception. Would the proper way to deal with that be to get the seqfeature_id of the entry already there and use that, or try to update the rank unless it's a unique combination? Or should I rather not mess with BioJava, and delete that entry and insert it as new to let BioJava handle the rank increase?

Thanks for any advice

Martina

Simon Foote wrote:
> Hi Martina,
>
> In fact you can, as rank is the field that allows this to happen. In
> Biojava, currently it's just a linearly incremented number such that
> you can have the same type and source IDs for a given bioentry.
>
> For example, adding a Genbank entry with 10 CDS features for 1 bioentry
> will give you identical keys for bioentry_id, type_term_id and
> source_term_id, but will have a rank of 1 - 10 for each.
>
> Simon
>

From hollandr at gis.a-star.edu.sg Sun Jun 26 11:11:30 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jun 26 11:04:09 2005
Subject: [Biojava-l] Re: [BioSQL-l] _removeSequence
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562B6@BIONIC.biopolis.one-north.com>

The revamped BioJava/BioSQL classes will expose the rank to the user for all tables which have ranks.
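
The role of rank in this thread can be illustrated with a toy in-memory model of the seqfeature uniqueness rule. It assumes, as the thread describes, that bioentry_id, type_term_id, source_term_id and rank together must be unique. SeqFeatureKeys and its methods are hypothetical names for illustration only; this is neither BioJava nor BioSQL code and it touches no database.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the composite key; hypothetical, not BioJava or BioSQL code. */
class SeqFeatureKeys {

    private final Set<List<Integer>> keys = new HashSet<List<Integer>>();

    /** Returns true if the row was added, false if the composite key already exists. */
    public boolean insert(int bioentryId, int typeTermId, int sourceTermId, int rank) {
        return keys.add(Arrays.asList(bioentryId, typeTermId, sourceTermId, rank));
    }

    /** Smallest rank, starting at 1, not yet used for this bioentry/type/source combination. */
    public int nextFreeRank(int bioentryId, int typeTermId, int sourceTermId) {
        int rank = 1;
        while (keys.contains(Arrays.asList(bioentryId, typeTermId, sourceTermId, rank))) {
            rank++;
        }
        return rank;
    }
}
```

In this model two features with the same bioentry, type and source can coexist as long as their ranks differ, which is the situation described for the ten CDS features of a GenBank entry.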
cheers, Richard -----Original Message----- From: biosql-l-bounces@portal.open-bio.org on behalf of Simon Foote Sent: Wed 6/22/2005 12:15 AM To: Martina Cc: Hilmar Lapp; biosql-l-bounces@portal.open-bio.org; BioJava; biosql-l@open-bio.org Subject: Re: [Biojava-l] Re: [BioSQL-l] _removeSequence Hi Martina, In fact you can, as rank is the field that allows this to happen. In Biojava, currently it's just a linearily incremented number such that you can have the same type and source IDs for a given bioentry. For example, adding a Genbank entry with 10 CDS features for 1 bioentry will give you identical keys for bioentry_id, type_term_id and source_term_id, but will have a rank of 1 - 10 for each. Simon Martina wrote: > That means, that I can't have 2 features refering to the same bioentry > with the same type (= type_term_id)and source (=source_term_id) but > different parent features because of the composite key bioentry_id in > the seqfeature table? Or what does "rank" in that table mean (its part > of that key), how can I get different ranks? > > Martina > > Hilmar Lapp wrote: > >> The Biojava people will respond to this. Note though that >> Term_Relationship is for storing subject-predicate-object triples of >> terms, so I'm not sure why you want to use it for storing/associating >> annotation. Maybe you meant bioentry_qualifier_value? >> >> -hilmar >> >> On Jun 21, 2005, at 3:10 AM, Martina wrote: >> >>> >>>> Yes. When you insert a sequence you must be prepared that when >>>> inserting its ontology term or tag/value annotation the term may >>>> already be present because another bioentry uses it too. >>> >>> >>> >>> Ok, the proper way is to catch the SQLException in BIOSQLFeature, >>> test if it is a Dublicate key entry, get the identifier of the term >>> (would that be the BioSQLfeatureId ?) and insert it in the >>> term_relationship table? And there is no nice BioJava method for >>> this, I have to do it "manually", like conn.prepareStatement(..) and >>> stuff? 
BioJava spoiled me so! >>> >>> Martina >>> > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l -- Bioinformatics Programmer Pathogen Genomics Institute for Biological Sciences National Research Council of Canada [T] 613-990-0561 [F] 613-952-9092 simon.foote@nrc-cnrc.gc.ca _______________________________________________ BioSQL-l mailing list BioSQL-l@open-bio.org http://open-bio.org/mailman/listinfo/biosql-l From hollandr at gis.a-star.edu.sg Sun Jun 26 11:33:14 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Sun Jun 26 11:26:10 2005 Subject: [Biojava-l] BLAST Parser for extracting all BLAST data? Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562BB@BIONIC.biopolis.one-north.com> BioJava's BLAST framework parses files and fires events for every piece of information it finds. The SeqSimilarityAdapter class is an example of how to catch these events and construct basic BLAST result objects (SimpleSeqSimilarityHit), however they are not comprehensive and do not record full details of every hit. If you want the kind of detail you mention below you will have to write your own content handler for BLAST parsing and parse it to the BLASTLikeSAXParser when parsing a file. This event handler should implement the ContentHandler interface. Look at the source of SeqSimilarityAdapter for guidance. You will then receive events for every part of the file, from which you can construct your own custom BLAST result objects to describe them. If you're not sure what tag names to listen for in your ContentHandler the easiest thing to do is just run it once and dump them all out to see what you get. cheers, Richard -----Original Message----- From: biojava-l-bounces@portal.open-bio.org on behalf of Y D Sun Sent: Sun 6/26/2005 5:42 PM To: biojava-l@biojava.org Cc: Subject: [Biojava-l] BLAST Parser for extracting all BLAST data? Hi, I want to extract all data from BLASTP results. 
In the following hit, for example, I need to get the lengths of the query and subject proteins, the identities (including all data: 54, 124 and 43%), the positives (all data: 79, 124 and 63%), and the gaps (3, 124 and 2%). Can the BLASTLikeSAXParser filter all this information? I can't find methods in the SeqSimilaritySearchHit and SeqSimilaritySearchSubHit APIs to retrieve these data. Does BioJava provide any methods for this purpose?

Thanks,

George

BLASTP 2.2.5 [Nov-16-2002]

Query= Prot0001
         (138 letters)

Database: /work/nys1/fasta/protein/AE000782.pro.fasta
           2407 sequences; 662,866 total letters

Searching.....done

                                                   Score     E
Sequences producing significant alignments:        (bits)  Value

Prot0002                                             100   1e-23
Prot0003                                              74   2e-15
Prot0004                                              43   3e-06

>Prot0002
          Length = 138

 Score = 100 bits (250), Expect = 1e-23
 Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)

Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
           NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74

Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
             K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134

Query: 135 DQIK 138
           D +K
Sbjct: 135 DIVK 138

From tblum at andrew.cmu.edu Sun Jun 26 12:10:23 2005
From: tblum at andrew.cmu.edu (Tal Blum)
Date: Sun Jun 26 12:01:41 2005
Subject: [Biojava-l] Psi-Blast results
Message-ID: <200506261610.j5QGAOGk016520@smtp.andrew.cmu.edu>

Hi,

I need to get Psi-Blast results for a large dataset of proteins. Does anyone know if there are any free tools or classes to do that?
Thanks,
Tal

From hollandr at gis.a-star.edu.sg Sun Jun 26 14:55:22 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jun 26 14:47:45 2005
Subject: [Biojava-l] Psi-Blast results
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D56E562BD@BIONIC.biopolis.one-north.com>

The standard BLAST parser in BioJava cannot understand PsiBLAST output as far as I'm aware. I haven't used PsiBLAST much so I don't know if you can change its output format, but if you can persuade it to output its results in NCBI BLAST format instead, then you might have more luck. BioPerl most definitely does have functions for the job.

cheers,
Richard

-----Original Message-----
From: biojava-l-bounces@portal.open-bio.org on behalf of Tal Blum
Sent: Mon 6/27/2005 12:10 AM
To: biojava-l@biojava.org
Cc:
Subject: [Biojava-l] Psi-Blast results

Hi,

I need to get Psi-Blast results for a large dataset of proteins. Does anyone know if there are any free tools or classes to do that?

Thanks,
Tal

From heuermh at acm.org Mon Jun 27 00:36:01 2005
From: heuermh at acm.org (Michael Heuer)
Date: Mon Jun 27 00:26:23 2005
Subject: [Biojava-l] BOSC 2005 lightning talks
Message-ID:

Hello,

The presentations for the lightning talks on biojava and the sourceforge stax I gave at BOSC 2005 are available (temporarily) from

> http://shore.net/~heuermh/biojava-24jun2005.ppt

and

> http://shore.net/~heuermh/stax-24jun2005.ppt

respectively. Please let me know if I should make any corrections before making them more widely available. I would like to move them to the public biojava.org and stax.sf.net project websites in a few days.

   michael

From michael.tran at acpfg.com.au Mon Jun 27 01:19:33 2005
From: michael.tran at acpfg.com.au (Michael Tran)
Date: Mon Jun 27 01:09:01 2005
Subject: [Biojava-l] Nucleotide translation
Message-ID:

Dear Members

I'm a newbie to BioJava.
I'm looking for a Class file that can translate a nucleotide sequence into its 6 reading frames and then into protein.
I'm finding the BioJava API difficult to navigate.

Help is much appreciated.

Cheers
Kally

From rahul at genebrew.com Mon Jun 27 01:36:04 2005
From: rahul at genebrew.com (Rahul Karnik)
Date: Mon Jun 27 01:25:42 2005
Subject: [Biojava-l] Nucleotide translation
In-Reply-To:
References:
Message-ID: <42BF9044.4080500@genebrew.com>

Michael Tran wrote:
> I'm looking for a Class file that can translate a nucleotide sequence into its 6 reading frames and then into protein.
> I'm finding the BioJava API difficult to navigate.

http://www.biojava.org/docs/bj_in_anger/Translation.htm

In general, for help with BioJava, you should first look at the BioJava in Anger site at http://www.biojava.org/docs/bj_in_anger/.

Hope that helps,
Rahul

From mark.schreiber at novartis.com Mon Jun 27 01:41:46 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jun 27 01:33:08 2005
Subject: [Biojava-l] Nucleotide translation
Message-ID:

To do a six frame translation have a look at:

http://www.biojava.org/docs/bj_in_anger/sixframetranslate.html

Rahul Karnik
Sent by: biojava-l-bounces@portal.open-bio.org
06/27/2005 01:36 PM

To: Michael Tran
cc: biojava-l@biojava.org, (bcc: Mark Schreiber/GP/Novartis)
Subject: Re: [Biojava-l] Nucleotide translation

Michael Tran wrote:
> I'm looking for a Class file that can translate a nucleotide sequence into its 6 reading frames and then into protein.
> I'm finding the BioJava API difficult to navigate.

http://www.biojava.org/docs/bj_in_anger/Translation.htm

In general, for help with BioJava, you should first look at the BioJava in Anger site at http://www.biojava.org/docs/bj_in_anger/.
Hope that helps,
Rahul

From jesse-t at chello.nl Tue Jun 28 04:46:09 2005
From: jesse-t at chello.nl (Jesse)
Date: Tue Jun 28 04:37:27 2005
Subject: [Biojava-l] RestrictionEnzyme can't handle double sites
Message-ID: <20050628084604.95E6B2E01D@rbox4.erasmusmc.nl>

I think a solution requires the RestrictionEnzyme class to be changed. Maybe changing getRecognitionSite() to return an array of Strings (or SymbolLists) instead of a single String?

-Jesse

-----------------------------------
mark.schreiber at novartis.com
mark.schreiber at novartis.com
Wed Jun 22 21:01:12 EDT 2005

What would be your recommended solution to this problem?

"Jesse"
Sent by: biojava-l-bounces at portal.open-bio.org
06/22/2005 11:05 PM

To:
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] RestrictionEnzyme can't handle double sites

Another problem. Some restriction enzymes have more than one recognition site. Usually this can be notated with ambiguous symbols, but for some restriction enzymes this is not possible, because in some cases the ambiguous symbols depend on each other.

Usually an ambiguous symbol is something like this:

ANNC

The first "N" is independent of the second "N". For example, it can match:

AAAC
AACC
AAGC
AATC
....
....
ATTC

16 possibilities. The ambiguous symbols are independent of each other.

But in some restriction enzymes, the ambiguous symbols depend on each other. A sequence like

ANNC

would then only match:

AAAC
ACCC
AGGC
ATTC

Only 4 possibilities. The ambiguous symbols depend on each other.

This happens with these enzymes:

TaqII
M.PhiBssHII (unknown cutlocation)
M.Phi3TI (unknown cutlocation)
M.Rho11sI (unknown cutlocation)
M.SPBetaI (unknown cutlocation)
M.SPRI (unknown cutlocation)

<1>TaqII
<2>
<3>GACCGA(11/9),CACCCA(11/9)
<4>
<5>Thermus aquaticus YTI
<6>J.I. Harris
<7>X
<8>Barker, D., Hoff, M., Oliphant, A., White, R., (1984) Nucleic Acids Res., vol. 12, pp. 5567-5581.
Myers, P.A., Roberts, R.J., Unpublished observations.
Rutkowska, S.M., Jaworowska, I., Skowron, P.M., Unpublished observations.

RestrictionEnzymeManager takes the last recognition site in this example; it skips GACCGA.

Name: TaqII
RecognitionSite: caccca
ForwardRegex: cac{3}a
ReverseRegex: tg{3}tg
CutType: 0
DownStreamEndType: 0
IsPalindromic: false
DownstreamCut: 17, 15,

- Jesse

From great_fred at yahoo.com Tue Jun 28 05:11:12 2005
From: great_fred at yahoo.com (Sébastien PETIT)
Date: Tue Jun 28 05:02:40 2005
Subject: [Biojava-l] BLAST Parser for extracting all BLAST data?
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D56E562BB@BIONIC.biopolis.one-north.com>
Message-ID: <20050628091112.34256.qmail@web32201.mail.mud.yahoo.com>

Hi, everybody...

I'm like George: I want to extract data from BLAST files. I can get the alignments, no problem. But now I want the alignment between the 2 sequences (the lines with "+", "-" and some letters in George's example), because with this we can see at a glance whether the alignment between the 2 sequences is really good or not.

Is it possible, Docs??

Thank you.

Sebastien

From Yudong.Sun at newcastle.ac.uk Tue Jun 28 06:13:25 2005
From: Yudong.Sun at newcastle.ac.uk (Y D Sun)
Date: Tue Jun 28 06:05:43 2005
Subject: [Biojava-l] BLAST Parser for extracting all BLAST data?
Message-ID:

Hi,

With the example, I can extract all the information I require except the length of the query sequence. Is there any "hidden" method that can report the query length, shown in parentheses as (138 letters) in the sample output below?
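One fallback that does not depend on BioJava at all, offered only as a sketch: when the report is plain-text BLAST output like the sample in this thread, the query length can be read straight from the "(138 letters)" header with a regular expression. The class and method names below (QueryLengthSketch, queryLength) are made up for illustration and are not part of the BioJava API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QueryLengthSketch {

    // "Query=" followed, possibly on a later line, by "(<n> letters)".
    // DOTALL lets ".*?" cross the line break between the two parts.
    private static final Pattern QUERY_LENGTH =
        Pattern.compile("Query=.*?\\(([\\d,]+) letters\\)", Pattern.DOTALL);

    /** Returns the query length from a plain-text BLAST report, or -1 if no header is found. */
    public static int queryLength(String report) {
        Matcher m = QUERY_LENGTH.matcher(report);
        if (!m.find()) {
            return -1;
        }
        // Long sequences may be printed with thousands separators, e.g. "1,234".
        return Integer.parseInt(m.group(1).replace(",", ""));
    }

    public static void main(String[] args) {
        String report = "BLASTP 2.2.5 [Nov-16-2002]\n\n"
                      + "Query= Prot0001\n"
                      + "         (138 letters)\n";
        System.out.println(queryLength(report));
    }
}
```

The pattern is anchored on "Query=" so that the "662,866 total letters" line of the database summary is not picked up by mistake.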
BTW, the addSubHitProperty() method doesn't report the Gaps data. Fortunately, I don't need it at the moment.

Thanks,

George

>-----Original Message-----
>From: mark.schreiber@novartis.com [mailto:mark.schreiber@novartis.com]
>Sent: 27 June 2005 03:25
>To: Y D Sun
>Subject: Re: [Biojava-l] BLAST Parser for extracting all BLAST data?
>
>Hello -
>
>Take a look at the Blast examples in biojava in anger (follow
>the cookbook link from the biojava.org page).
>
>In particular look at
>http://www.biojava.org/docs/bj_in_anger/blastecho.htm
>
>The example program will tell you which methods are being
>called for what information and will give you some clues as to
>where everything ends up.
>
>- Mark
>
>"Y D Sun"
>Sent by: biojava-l-bounces@portal.open-bio.org
>06/26/2005 05:42 PM
>
>        To:
>        cc:      (bcc: Mark Schreiber/GP/Novartis)
>        Subject: [Biojava-l] BLAST Parser for extracting all BLAST data?
>
>Hi,
>
>I want to extract all data from BLASTP results. In the
>following hit, for example, I need to get the lengths of query
>and subject proteins, the identities (including all data 54,
>124 and 43%), the positives (all data 79, 124 and 63%), and
>the gaps (3, 124 and 2%). Can the BLASTLikeSAXParser filter
>all these information? I can't find the methods in
>SeqSimilaritySearchHit and SeqSimilaritySearchSubHit APIs to
>retrieve these data. Does Biojava provide any methods for this purpose?
>
>Thanks,
>
>George
>
>BLASTP 2.2.5 [Nov-16-2002]
>
>Query= Prot0001
>         (138 letters)
>
>Database: /work/nys1/fasta/protein/AE000782.pro.fasta
>           2407 sequences; 662,866 total letters
>
>Searching.....done
>
>                                                   Score     E
>Sequences producing significant alignments:        (bits)  Value
>
>Prot0002                                             100   1e-23
>Prot0003                                              74   2e-15
>Prot0004                                              43   3e-06
>
>>Prot0002
>          Length = 138
>
> Score = 100 bits (250), Expect = 1e-23
> Identities = 54/124 (43%), Positives = 79/124 (63%), Gaps = 3/124 (2%)
>
>Query: 18  NARTKFTDIAKTLNLTEAAIRKRIKKLEENQIIKRYSIDIDYKKLGYNMAIIGLDIDMDY 77
>           NAR   T IAK LN+TEAA+RKRI  LE  + I  Y   I+YKK+G + ++ G+D+D D
>Sbjct: 15  NARIPKTRIAKELNVTEAAVRKRIANLERREEILGYKAIINYKKVGLSASLTGVDVDPDK 74
>
>Query: 78  FPKIIKELEKRKEFLHIYSSAGDHDIMVIAIYK---DLEEIYNYLKNLKGVKRVCPAIII 134
>             K+++EL+  +    ++ + GDH IM   I K   +L EI+  +  ++GVKRVCP+II
>Sbjct: 75  LWKVVEELKDLESVKSLWLTTGDHTIMAEIIAKSVQELSEIHQKIAEMEGVKRVCPSIIT 134
>
>Query: 135 DQIK 138
>           D +K
>Sbjct: 135 DIVK 138

From great_fred at yahoo.com Tue Jun 28 07:34:17 2005
From: great_fred at yahoo.com (Sébastien PETIT)
Date: Tue Jun 28 07:26:09 2005
Subject: [Biojava-l] BLAST Parser for extracting all BLAST data?
In-Reply-To:
Message-ID: <20050628113417.86358.qmail@web32207.mail.mud.yahoo.com>

Arggh! I didn't find what I wanted!

I used the program you gave me, with a slight modification because it didn't recognize my XML file: the parser is now a BlastXMLParserFacade. It gave me everything it found in the file, BUT not what I want! GRRR... >:( >:( >:(

There is a tag (I don't know if that's the right word) in my XML file which encloses what I'm searching for : ....

Why doesn't the parser see it? I didn't really understand how the XML parser works, so how can I modify it to find my happiness?

PLEASE DOC'!!! ;);) Help me!!
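A minimal sketch of the general technique, under stated assumptions: the report is in NCBI's BLAST XML format, and the element of interest is Hsp_midline, the element that format uses for the match line (the tag name is not given in this thread, so treat it as a guess). The code uses only the JDK's own SAX API rather than BioJava's parser classes, and the class name MidlineSketch is hypothetical. Answering external entity requests with an empty document also keeps the parser from failing when the DTD named in the DOCTYPE is not available locally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MidlineSketch {

    /** Collects the text of every Hsp_midline element (assumed element name). */
    static class MidlineHandler extends DefaultHandler {
        final List<String> midlines = new ArrayList<String>();
        private StringBuilder current; // non-null only while inside Hsp_midline

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if ("Hsp_midline".equals(qName)) {
                current = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // SAX may deliver one text node in several pieces, so append.
            if (current != null) {
                current.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if ("Hsp_midline".equals(qName)) {
                midlines.add(current.toString());
                current = null;
            }
        }

        // Resolve every external entity (such as the DOCTYPE's DTD) to an empty
        // document, so a missing NCBI_BlastOutput.dtd does not abort the parse.
        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            return new InputSource(new StringReader(""));
        }
    }

    public static List<String> midlines(InputSource in) throws Exception {
        MidlineHandler handler = new MidlineHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.midlines;
    }

    public static void main(String[] args) throws Exception {
        // A heavily trimmed stand-in for a real BLAST XML report.
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE BlastOutput PUBLIC \"-//NCBI//NCBI BlastOutput/EN\" \"NCBI_BlastOutput.dtd\">\n"
            + "<BlastOutput><Hsp><Hsp_qseq>DQIK</Hsp_qseq>"
            + "<Hsp_midline>D +K</Hsp_midline><Hsp_hseq>DIVK</Hsp_hseq></Hsp></BlastOutput>";
        for (String midline : midlines(new InputSource(new StringReader(xml)))) {
            System.out.println("[" + midline + "]");
        }
    }
}
```

For a report on disk, the same method can be called with new InputSource(new java.io.FileInputStream(path)), where path is whatever file holds the XML output.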
Thanks for everything.

Sebastien

--- mark.schreiber@novartis.com wrote:

> Hi -
>
> Try running this program
> http://www.biojava.org/docs/bj_in_anger/blastecho.htm
>
> If you see what you need in the output then it is being read by the Blast
> parser and emitted as an event (which you could listen for). If it isn't
> then the Blast parser is not emitting those events although someone
> confident with the blast format could probably modify it so it does.
>
> In short, it is possible but it might not be implemented ; )
>
> - Mark

From cgarnier at ttz-Bremerhaven.de Tue Jun 28 08:03:48 2005
From: cgarnier at ttz-Bremerhaven.de (BIBIS, Garnier, Christophe)
Date: Tue Jun 28 07:53:45 2005
Subject: AW: [Biojava-l] BLAST Parser for extracting all BLAST data?
Message-ID:

If you don't find what you need through BioJava, you can always write a small XML parser with, for example, JDOM.

1 - download jdom.jar
2 - use the following code to find :
3 - replace the path of the xml file in the main method
4 - it prints out every found Element

I hope it helps you

Best,
Christophe

+++++++++++++++++++++++++++++++++++++

import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class JDomParser
{
    // Walks the children of the first <Iteration> and descends into hits and statistics.
    private static void parseResults(Element iterations)
    {
        System.out.println("*** parseResults ***");

        Element it = iterations.getChild("Iteration");
        List elts = it.getChildren();
        Iterator iterator = elts.iterator();

        while (iterator.hasNext())
        {
            Element child = (Element) iterator.next();

            System.out.println(child + " - " + child.getText() + " - " + child.getName());

            if (child.getName().equals("Iteration_hits"))
            {
                parseHits(child);
            }

            if (child.getName().equals("Iteration_stat"))
            {
                parseStatistics(child);
            }
        }
    }

    // Prints every <Hit> under <Iteration_hits> and descends into it.
    private static void parseHits(Element element)
    {
        List elts = element.getChildren();
        Iterator iterator = elts.iterator();

        while (iterator.hasNext())
        {
            Element child = (Element) iterator.next();
            printElt(child);
            parseHit(child);
        }
    }

    // Prints the fields of the first <Hsp> under <Hit_hsps>.
    private static void parseHspHit(Element element)
    {
        Element hsp = element.getChild("Hsp");
        List hsps = hsp.getChildren();
        Iterator iterator = hsps.iterator();

        while (iterator.hasNext())
        {
            Element child = (Element) iterator.next();
            printElt(child);
        }
    }

    private static void printElt(Element elt)
    {
        System.out.println("Element: [" + elt.getName() + "] - text:" + elt.getText());
    }

    // Prints the fields of one <Hit> and descends into its <Hit_hsps>.
    private static void parseHit(Element element)
    {
        List elts = element.getChildren();
        Iterator iterator = elts.iterator();

        while (iterator.hasNext())
        {
            Element child = (Element) iterator.next();
            printElt(child);

            if (child.getName().equals("Hit_hsps"))
            {
                parseHspHit(child);
            }
        }
    }

    // Prints the fields of <Statistics> under <Iteration_stat>.
    private static void parseStatistics(Element element)
    {
        Element stat = element.getChild("Statistics");
        List elts = stat.getChildren();
        Iterator iterator = elts.iterator();

        while (iterator.hasNext())
        {
            Element child = (Element) iterator.next();
            printElt(child);
        }
    }

    // Builds the JDOM document and walks the top-level children of the root element.
    public static void parseFile(File file) throws JDOMException, IOException
    {
        SAXBuilder parser = new SAXBuilder();
        Document doc = parser.build(file);

        Element root = doc.getRootElement();
        List elts = root.getChildren();
        Iterator iterator = elts.iterator();

        while (iterator.hasNext())
        {
            Element child = (Element) iterator.next();
            printElt(child);

            if (child.getName().equals("BlastOutput_iterations"))
                parseResults(child);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args)
    {
        File f = new File("E:/result.xml");

        try
        {
            parseFile(f);
        }
        catch (JDOMException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}

+++++++++++++++++++++++++++++++++++++

From great_fred at yahoo.com Tue Jun 28 08:59:30 2005
From: great_fred at yahoo.com (Sébastien PETIT)
Date: Tue Jun 28 08:50:52 2005
Subject: AW: [Biojava-l] BLAST Parser for extracting all BLAST data?
In-Reply-To:
Message-ID: <20050628125931.9771.qmail@web32209.mail.mud.yahoo.com>

Thank you for JDOM and the code. But it generates a ton of exceptions and errors because it doesn't find a DTD file (NCBI_BlastOutput.dtd) that I don't have. So I don't know what to do.

Sebastien

--- "BIBIS, Garnier, Christophe" wrote:

> if you don't find what you need through biojava, you can always write a
> small xml parser with for example jdom.
>
> 1 - download jdom.jar
> 2 - use the following code to find :
> 3 - replace the path of the xml file in the main method
> 4 - it prints out every found Element
>
> I hope it helps you
>
> Best,
> Christophe
>
> +++++++++++++++++++++++++++++++++++++
>
> import java.io.File;
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
>
> import org.jdom.Document;
> import org.jdom.Element;
> import org.jdom.JDOMException;
> import org.jdom.input.SAXBuilder;
>
> public class JDomParser
> {
>     private static void parseResults(Element iterations)
>     {
>         System.out.println("*** parseResults ***") ;
>         Element it = iterations.getChild("Iteration") ;
>         List elts = it.getChildren();
>         Iterator iterator = elts.iterator();
>         while (iterator.hasNext())
>         {
>             Element child = (Element) iterator.next();
>             System.out.println(child + " - " + child.getText() + " - " + child.getName());
>             if ( child.getName().equals("Iteration_hits"))
>             {
>                 parseHits(child) ;
>             }
>             if ( child.getName().equals("Iteration_stat"))
>             {
>                 parseStatistics(child) ;
>             }
>         }
>     }
>
>     private static void parseHits(Element element)
>     {
>         List elts = element.getChildren();
>         Iterator iterator = elts.iterator();
>         while (iterator.hasNext())
>         {
>             Element child = (Element) iterator.next();
>             printElt(child) ;
>             parseHit(child) ;
>         }
>     }
>
>     private static void parseHspHit(Element element)
>     {
>         Element hsp = element.getChild("Hsp") ;
>         List hsps = hsp.getChildren();
>         Iterator iterator = hsps.iterator();
>         while (iterator.hasNext())
>         {
>             Element child = (Element) iterator.next();
>             printElt(child) ;
>         }
>     }
>
>     private static void printElt(Element elt)
>     {
>         System.out.println("Element: [" + elt.getName() + "] - text:" + elt.getText() ) ;
>     }
>
>     private static void parseHit(Element element)
>     {
>         List elts = element.getChildren();
>         Iterator iterator = elts.iterator();
>         while (iterator.hasNext())
>         {
>             Element child = (Element) iterator.next();
>             printElt(child) ;
>             if (child.getName().equals("Hit_hsps"))
>             {
>                 parseHspHit(child) ;
>             }
>         }
>     }
>
>     private static void parseStatistics(Element element)
>     {
>         Element stat = element.getChild("Statistics") ;
>         List elts = stat.getChildren();
>         Iterator iterator = elts.iterator();
>         while (iterator.hasNext())
>         {
>             Element child = (Element) iterator.next();
>             printElt(child) ;
>         }
>     }
>
>     public static void parseFile(File file) throws JDOMException, IOException
>     {
>         SAXBuilder parser = new SAXBuilder();
>         Document doc = parser.build(file);
>         Element root = doc.getRootElement();
>         List elts = root.getChildren();
>         Iterator iterator = elts.iterator();
>         int index = 0;
>         while (iterator.hasNext())
>         {
>             Element child = (Element) iterator.next();
>             printElt(child) ;
>             if (child.getName().equals("BlastOutput_iterations"))
>                 parseResults(child);
>         }
>     }
>
>     /**
>      * @param args
>      */
>     public static void main(String[] args)
>     {
>         File f = new File("E:/result.xml");
>         try
>         {
>             parseFile(f);
>         }
>         catch (JDOMException e)
>         {
>             e.printStackTrace();
>         }
>         catch
(IOException e) > { > e.printStackTrace(); > } > } > > } > > > > > > +++++++++++++++++++++++++++++++++++++ > > > > > === message truncated === ___________________________________________________________________________ Appel audio GRATUIT partout dans le monde avec le nouveau Yahoo! Messenger T?l?chargez cette version sur http://fr.messenger.yahoo.com From great_fred at yahoo.com Tue Jun 28 10:49:34 2005 From: great_fred at yahoo.com (=?iso-8859-1?q?S=E9bastien=20PETIT?=) Date: Tue Jun 28 10:40:49 2005 Subject: AW: AW: [Biojava-l] BLAST Parser for extracting all BLAST data? In-Reply-To: Message-ID: <20050628144934.18013.qmail@web32205.mail.mud.yahoo.com> I try the code you sent me. I just change the path of the XML file. But, in this file, there is this line : and I have exceptions and errors because of this line. If you want, I send the XML file so that you test it... But, I download the DTD and the MOD files necessary, I modified the DTD file a little bit, and it works... But, I would prefer to not have those files with my code... Thank you... Sebastien --- "BIBIS, Garnier, Christophe" a ?crit : > Did you try just the code i sent you? Or did you integrate it inside > your > program? > > As far as i know, jdom works without dtd files: it makes no control > on the > structure of the file > It should word because I tested it without using the corresponding > dtd file. > > > christophe > > > -----Urspr?ngliche Nachricht----- > Von: S?bastien PETIT [mailto:great_fred@yahoo.com] > Gesendet: Dienstag, 28. Juni 2005 15:00 > An: biojava-l@biojava.org > Betreff: RE: AW: [Biojava-l] BLAST Parser for extracting all BLAST > data? > > > Thank you for JDOM and the code... > But, it generates a ton of exceptions and error because it doesn't > find > a DTD file (NCBI_BlastOutput.dtd) that I don't have... > > So, I don't know how to do... 
> > Sebastien > > --- "BIBIS, Garnier, Christophe" a > ?crit > : > > > > > if you don't find what you need through biojava, you can always > write > > a > > small xml parser with for example jdom. > > > > 1 - download jdom.jar > > 2 - use the following code to find : > > 3 - replace the path of the xml file in the main method > > 4 - it prints out every found Element > > > > > > I hope it helps you > > > > Best, > > Christophe > > > > +++++++++++++++++++++++++++++++++++++ > > > > import java.io.File; > > import java.io.IOException; > > import java.util.Iterator; > > import java.util.List; > > > > import org.jdom.Document; > > import org.jdom.Element; > > import org.jdom.JDOMException; > > import org.jdom.input.SAXBuilder; > > > > public class JDomParser > > { > > > > private static void parseResults(Element iterations) > > { > > System.out.println("*** parseResults ***") ; > > > > Element it = iterations.getChild("Iteration") ; > > > > List elts = it.getChildren(); > > > > Iterator iterator = elts.iterator(); > > > > while (iterator.hasNext()) > > { > > Element child = (Element) iterator.next(); > > > > System.out.println(child + " - " + child.getText() + > > " - " > > + child.getName()); > > > > if ( child.getName().equals("Iteration_hits")) > > { > > parseHits(child) ; > > } > > > > if ( child.getName().equals("Iteration_stat")) > > { > > parseStatistics(child) ; > > } > > > > > > } > > } > > > > private static void parseHits(Element element) > > { > > List elts = element.getChildren(); > > > > Iterator iterator = elts.iterator(); > > > > while (iterator.hasNext()) > > { > > Element child = (Element) iterator.next(); > > > > printElt(child) ; > > > > parseHit(child) ; > > > > } > > } > > > > private static void parseHspHit(Element element) > > { > > Element hsp = element.getChild("Hsp") ; > > > > List hsps = hsp.getChildren(); > > > > Iterator iterator = hsps.iterator(); > > > > while (iterator.hasNext()) > > { > > Element child = (Element) iterator.next(); > > > 
> printElt(child) ; > > } > > } > > > > private static void printElt(Element elt) > > { > > System.out.println("Element: [" + elt.getName() + "] - > > text:" + elt.getText() ) ; > > } > > > > private static void parseHit(Element element) > > { > > List elts = element.getChildren(); > > > > Iterator iterator = elts.iterator(); > > > > while (iterator.hasNext()) > > { > > Element child = (Element) iterator.next(); > > > > printElt(child) ; > > > > if (child.getName().equals("Hit_hsps")) > > { > > parseHspHit(child) ; > > } > > > > } > > } > > > > > > private static void parseStatistics(Element element) > > { > > Element stat = element.getChild("Statistics") ; > > > > List elts = stat.getChildren(); > > > > Iterator iterator = elts.iterator(); > > > > while (iterator.hasNext()) > > { > > Element child = (Element) iterator.next(); > > > > printElt(child) ; > > > > } > > > > } > > > > > > public static void parseFile(File file) throws JDOMException, > > IOException > > { > > SAXBuilder parser = new SAXBuilder(); > > Document doc = parser.build(file); > > > > Element root = doc.getRootElement(); > > > > List elts = root.getChildren(); > > Iterator iterator = elts.iterator(); > > > > int index = 0; > > while (iterator.hasNext()) > > { > > > > Element child = (Element) iterator.next(); > > > > printElt(child) ; > > > > if > > (child.getName().equals("BlastOutput_iterations")) > > parseResults(child); > > > === message truncated === ___________________________________________________________________________ Appel audio GRATUIT partout dans le monde avec le nouveau Yahoo! Messenger T?l?chargez cette version sur http://fr.messenger.yahoo.com From gilson at cs.wisc.edu Tue Jun 28 14:51:48 2005 From: gilson at cs.wisc.edu (Michael C Gilson) Date: Tue Jun 28 14:43:06 2005 Subject: [Biojava-l] Implementing a Feature Message-ID: <35E633ED-EE51-49BB-9C37-F778506CE5AD@cs.wisc.edu> Hello, all. I am new to BioJava but finding it extremely useful. 
I'd like to add a few extra fields or methods to the SimpleFeature class,
and I'm wondering what the best way to go about it is. I have read through
BioJava in Anger and am also wondering whether there are any other documents
out there that describe how to work with the API (beyond just the API
javadocs).

Thanks in advance,

Michael C Gilson
Genome Evolution Lab
University of Wisconsin-Madison

From hollandr at gis.a-star.edu.sg Tue Jun 28 15:03:28 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Tue Jun 28 14:55:47 2005
Subject: [Biojava-l] Implementing a Feature
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601E87174@BIONIC.biopolis.one-north.com>

The best thing to do is to write your own class which either extends
SimpleFeature, or ignores SimpleFeature and implements the Feature
interface from scratch, adding your new methods alongside the required
ones. If the extra methods are pretty generic, you could create a new
interface which extends Feature and add them there, then make your new
class implement the new interface.

Why not list the methods/fields you'd like to add? Then we can all offer
suggestions as to the most appropriate places to make the changes.

The BioJava in Anger book has a few links to how the API works. It's
linked from biojava.org under the documentation section.

cheers,
Richard

From Russell.Smithies at agresearch.co.nz Tue Jun 28 20:11:23 2005
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue Jun 28 20:02:52 2005
Subject: AW: AW: [Biojava-l] BLAST Parser for extracting all BLAST data?
Message-ID:

The easiest method (if you don't care about validating) is to delete the
DTD line in the XML.
If you do need to validate, ensure your proxy settings are set correctly
so the parser can access the DTD.

Russell

From mark.schreiber at novartis.com Tue Jun 28 21:49:19 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Jun 28 21:40:36 2005
Subject: [Biojava-l] BLAST functionality
Message-ID:

There have been a number of requests to the list (and directly to me) for
increased functionality in the BLAST parsers (e.g. capturing more of the
information in the report). Originally the design was lightweight and
captured what most people wanted, but as always there are people who think
different (as Steve Jobs might say) and want different things.

The best way for the BLAST parsers to improve is for people to contribute
code. There are lots of workarounds that people have made to improve the
parsers that have not found their way into biojava. Ideally I'm hoping
someone will volunteer to take a look at this and coordinate the effort.
The ideal person should be a reasonable Java programmer with a good feel
for how the BLAST part of the API works. They would also be someone who
uses it a lot and is therefore motivated to improve it.
The BLAST API is probably the most used part of biojava, so instant fame
and adulation await the generous volunteer : )

I know you're out there somewhere...

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

From Gem.Yang at jhu.edu Thu Jun 30 14:29:54 2005
From: Gem.Yang at jhu.edu (Gem Yang)
Date: Thu Jun 30 14:21:55 2005
Subject: [Biojava-l] memory leak while reading nr.fasta
In-Reply-To:
Message-ID: <200506301830.j5UIUGe00711@storey.bme.jhu.edu>

Hi,

I am new to Biojava. I have the following program, which is copied from
ReadFaster2 in the cookbook.

public static void main(String[] args) {
    try {
        // args[0] is nr.fasta
        BufferedReader br = new BufferedReader(new FileReader(args[0]));
        String format = "FASTA";
        String alphabet = "PROTEIN";
        // fileToBiojava returns Object, so the result has to be cast
        SequenceIterator iter =
            (SequenceIterator) SeqIOTools.fileToBiojava(format, alphabet, br);
        int count = 0;
        long start = System.currentTimeMillis();
        while (iter.hasNext()) {
            Sequence s = iter.nextSequence();
            String name = s.getName();
            //System.out.println(name);
            s.getAnnotation();
            //System.out.println(s.seqString());
            count++;
            System.out.println(count);
        }
        long end = System.currentTimeMillis();
        System.out.println("number of sequence " + count);
        System.out.println("time used" + (end-start)/1000 + "seconds");
        System.out.println((end-start)/1000/60 + "minutes");
    }
    catch (FileNotFoundException ex) {
        //can't find file specified by args[0]
        ex.printStackTrace();
    }
    catch (BioException ex) {
        //error parsing requested format
        ex.printStackTrace();
    }
}

When running this code, I got an out-of-memory error after about half an
hour, with 1.5 GB of memory allocated. My workstation runs Windows XP with
2 GB of memory. My biojava version is 1.3. My JRE is the one that came
with WebSphere Application Developer.

Thanks.

Gem

From Gem.Yang at jhu.edu Thu Jun 30 14:31:49 2005
From: Gem.Yang at jhu.edu (Gem Yang)
Date: Thu Jun 30 14:23:21 2005
Subject: [Biojava-l] RE: memory leak while reading nr.fasta
Message-ID: <200506301832.j5UIWAe00860@storey.bme.jhu.edu>

I had just a couple of typos in my previous post. Sorry about that.

Gem
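
For a report like the one above, it can help to have a baseline that reads the same file without any sequence library involved. If that pass finishes quickly in a small heap, the growth is more likely to come from how the sequences are parsed or retained than from the file or the JVM itself. What follows is only a minimal sketch under that assumption: it uses plain java.io, not the BioJava API, and the class name FastaRecordCounter and its Totals helper are invented for illustration. It does no alphabet checking and treats every line that starts with ">" as a new record.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

/**
 * Hypothetical helper (not part of BioJava): streams a FASTA file line by
 * line and keeps only running totals, so memory use does not grow with
 * the size of the file.
 */
public class FastaRecordCounter {

    /** Running totals for one pass over a FASTA stream. */
    public static final class Totals {
        public final long records;
        public final long residues;

        Totals(long records, long residues) {
            this.records = records;
            this.residues = residues;
        }
    }

    /** Counts header lines and residue characters; no sequence is retained. */
    public static Totals count(BufferedReader reader) throws IOException {
        long records = 0;
        long residues = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.startsWith(">")) {
                records++;                          // a header starts a new record
            } else {
                residues += line.trim().length();   // sequence line: add its length
            }
        }
        return new Totals(records, residues);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FastaRecordCounter <file.fasta>");
            return;
        }
        long start = System.currentTimeMillis();
        try (BufferedReader br = new BufferedReader(new FileReader(args[0]))) {
            Totals t = count(br);
            long seconds = (System.currentTimeMillis() - start) / 1000;
            System.out.println("number of sequences: " + t.records);
            System.out.println("number of residues:  " + t.residues);
            System.out.println("time used: " + seconds + " seconds");
        }
    }
}
```

Run against nr.fasta, the record count gives a number to compare with the counter printed by the BioJava loop at the point where it runs out of memory. The sketch says nothing about where the memory goes inside the library; it only separates the cost of reading the file from the cost of building sequence objects.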