From markjschreiber at gmail.com Thu Dec 6 02:32:31 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 6 Dec 2007 15:32:31 +0800
Subject: [Biojava-dev] Possible bug in AbstractRichSequenceDB
Message-ID: <93b45ca50712052332g2ed423fje2e758a439b39688@mail.gmail.com>

Hi -

Yesterday I noticed a possible bug in HashRichSequenceDB which is
probably caused by AbstractRichSequenceDB. The issue is that the
iterator thinks there are no sequences even when there are.

I don't have the code base with me, but probably when sequences are
added without an ID, no ID is generated for them, and thus the set
of IDs is empty even though there are sequences present in the DB.

PS: I would have submitted this to Bugzilla, but biojava.org seems to
be down.

- Mark

From ap3 at sanger.ac.uk Thu Dec 6 05:33:17 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu, 6 Dec 2007 10:33:17 +0000
Subject: [Biojava-dev] status update SVN migration
Message-ID:

Hi,

a quick status update on the CVS to SVN migration for BioJava:

George Hartzell created the first SVN dumps of the CVS repository. I am
running tests on these to make sure the whole repository has been
exported correctly. For details please see here:

http://biojava.org/wiki/CVS_to_SVN_Migration

A few minor problems have been found during this. As soon as these have
been resolved we will be ready to make the final migration.

In order to speed the migration process up, please commit any
uncommitted changes to CVS in the next couple of days. Once the tests
are finished I will send another notification email which will declare
a CVS freeze a few days after. After this freeze CVS will remain frozen
forever and all new development should happen in SVN. There will also
be a new BioJava release at that point.

Andreas

-----------------------------------------------------------------------

Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From holland at ebi.ac.uk Fri Dec 7 06:53:42 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Fri, 07 Dec 2007 11:53:42 +0000
Subject: [Biojava-dev] Possible bug in AbstractRichSequenceDB
In-Reply-To: <93b45ca50712052332g2ed423fje2e758a439b39688@mail.gmail.com>
References: <93b45ca50712052332g2ed423fje2e758a439b39688@mail.gmail.com>
Message-ID: <47593446.6040600@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Definitely a bug. I'll commit a fix once the SVN migration is complete,
to save confusion.

cheers,
Richard

Mark Schreiber wrote:
> Hi -
>
> Yesterday I noticed a possible bug in HashRichSequenceDB which is
> probably caused by AbstractRichSequenceDB. The issue is that the
> iterator thinks there are no sequences even when there are.
>
> I don't have the code base with me but probably when sequences are
> added without an ID no ID is being generated for them and thus the set
> of ID's is empty even though there are sequences present in the DB.
>
> PS, would have submitted this to bugzilla but the biojava.org seems to be down.
>
> - Mark
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHWTRG4C5LeMEKA/QRAu80AJ0TGBZVy9cjS93ZLL+ElFjMSt0ZsACePbow
LJRAkqH5l8N1WLEwr4Z95nw=
=AeGr
-----END PGP SIGNATURE-----

From simpleyrx at 163.com Fri Dec 7 11:10:27 2007
From: simpleyrx at 163.com (simpleyrx)
Date: Sat, 8 Dec 2007 00:10:27 +0800 (CST)
Subject: [Biojava-dev] add more class
In-Reply-To:
References:
Message-ID: <632561876.693451197043827234.JavaMail.coremail@bj163app105.163.com>

BioJava is a powerful tool for bioinformaticians. I have been studying
protein bioinformatics recently, and found that I need several
classification methods such as Bayes, RandomForest and so on. Some
classification methods have already been implemented in Java by Weka.
It would be good if these classification methods could be added to the
BioJava package.

Student

From markjschreiber at gmail.com Sat Dec 8 02:07:55 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 8 Dec 2007 15:07:55 +0800
Subject: [Biojava-dev] add more class
In-Reply-To: <632561876.693451197043827234.JavaMail.coremail@bj163app105.163.com>
References: <632561876.693451197043827234.JavaMail.coremail@bj163app105.163.com>
Message-ID: <93b45ca50712072307w4e58c82fk828f81b92c0207f8@mail.gmail.com>

Hi -

You can implement a Gibbs sampler, Naive Bayes, Markov chains and HMMs
in BioJava (take a look at the examples on the cookbook website
http://biojava.org/wiki/BioJava:Cookbook).

You might also want to look at BioWeka, which combines BioJava and Weka
(http://bioweka.sourceforge.net/).

- Mark

From markjschreiber at gmail.com Sat Dec 8 02:19:55 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 8 Dec 2007 15:19:55 +0800
Subject: [Biojava-dev] Javascript to wordwrap Fasta format
Message-ID: <93b45ca50712072319o42e12621of6096aa87be708fe@mail.gmail.com>

Hi -

I am building a webapp (yes I have gone to the dark side) in JSF which
has a TextArea component into which users have to type or paste a
FASTA-formatted sequence for processing. By default there is no word
wrapping; instead scroll bars magically appear. What I really want is
for the input to wrap every 60 characters (even without spaces) unless
it is the description line (first line).

It seems the best way would be to put some JavaScript in one of the
'onXXX' events to do it while the user is typing. Does anyone have a
snippet that might do something like this?

Thanks.

- Mark

From holland at ebi.ac.uk Sat Dec 8 14:42:43 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Sat, 8 Dec 2007 19:42:43 -0000 (GMT)
Subject: [Biojava-dev] Javascript to wordwrap Fasta format
In-Reply-To: <93b45ca50712072319o42e12621of6096aa87be708fe@mail.gmail.com>
References: <93b45ca50712072319o42e12621of6096aa87be708fe@mail.gmail.com>
Message-ID: <40026.80.42.52.173.1197142963.squirrel@webmail.ebi.ac.uk>

You need to implement the 'onkeypress' function.
This page has an example:

http://www.w3schools.com/jsref/jsref_onkeypress.asp

What you'd then do is append the key code to the end of the text
(unless JavaScript has already done that for you before passing the
event; I'm not sure whether it does), then read the textarea content
into a buffer, process it by stripping and replacing newlines, then
write it back out to the textarea, like this:

http://www.developer.be/forums/index.cfm/fuseaction/dsp_full_thread/fullthreadid/204/forumID/10.htm

cheers,
Richard

--
Richard Holland
BioMart (http://www.biomart.org/)
EMBL-EBI
Hinxton, Cambridgeshire CB10 1SD, UK

From ap3 at sanger.ac.uk Mon Dec 10 08:26:13 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 10 Dec 2007 13:26:13 +0000
Subject: [Biojava-dev] SVN migration: declaring CVS freeze
Message-ID: <099BDD0E-DA16-43C7-88CE-A47CA810D1EE@sanger.ac.uk>

Hi,

for the SVN migration, please commit any remaining code to CVS in the
next few days.

On Wednesday, December 12th, 18:00 GMT, the BioJava CVS will be frozen.
In the following days the repository will be migrated to Subversion
(SVN). From then on, all future development will happen in the new SVN
repository. All code (plus history) will be available via SVN.

I will send a confirmation email when the new SVN repository becomes
accessible. Detailed instructions on how to check out and commit code
will be sent out at that stage as well.

For more details see: http://biojava.org/wiki/CVS_to_SVN_Migration

Andreas

-----------------------------------------------------------------------

Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

-----------------------------------------------------------------------

From holland at ebi.ac.uk Mon Dec 10 11:56:49 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 10 Dec 2007 16:56:49 +0000
Subject: [Biojava-dev] Possible bug in AbstractRichSequenceDB
In-Reply-To: <93b45ca50712052332g2ed423fje2e758a439b39688@mail.gmail.com>
References: <93b45ca50712052332g2ed423fje2e758a439b39688@mail.gmail.com>
Message-ID: <475D6FD1.1080106@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Do you have a sample bit of code to demonstrate the problem?

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHXW/R4C5LeMEKA/QRAvxVAJ42WIqipKfpMVUmbDuagwB6YDgqbACfSuNZ
fYLxn6u6PhNsGGyvbYKAy5w=
=cvR/
-----END PGP SIGNATURE-----

From markjschreiber at gmail.com Mon Dec 10 23:30:31 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 11 Dec 2007 12:30:31 +0800
Subject: [Biojava-dev] Possible bug in AbstractRichSequenceDB
In-Reply-To: <475D6FD1.1080106@ebi.ac.uk>
References: <93b45ca50712052332g2ed423fje2e758a439b39688@mail.gmail.com> <475D6FD1.1080106@ebi.ac.uk>
Message-ID: <93b45ca50712102030w358df713g58e904650d498732@mail.gmail.com>

Sorry, seems this was a false alarm. The bug was somewhere else in my
code, not in the iterator.

- Mark

From michaelgang at gmail.com Sun Dec 16 09:53:14 2007
From: michaelgang at gmail.com (Michael Gang)
Date: Sun, 16 Dec 2007 16:53:14 +0200
Subject: [Biojava-dev] blast parser
Message-ID: <6994d82b0712160653g26ab7ea9p6909d98f3b1a75ae@mail.gmail.com>

Dear All,

I have a problem when running the blast parser example from the
cookbook (http://www.biojava.org/wiki/BioJava:CookBook:Blast:Parser).
With my custom blast I get the following error:

org.xml.sax.SAXException: Program ncbi-blastp Version 2.2.16 is not supported by the biojava blast-like parsing framework
        at org.biojava.bio.program.sax.BlastLikeSAXParser.interpret(BlastLikeSAXParser.java:241)
        at org.biojava.bio.program.sax.BlastLikeSAXParser.parse(BlastLikeSAXParser.java:160)
        at parse.BlastParser.main(BlastParser.java:44)

Does this mean that I can only use the parser with older versions of
blast?

Furthermore, as I understand it, the parser first parses all the blast
queries and only then can I iterate over the results. Is there a blast
parser available that parses just the next hit each time (this would
save lots of memory)?

I searched Google and the BioJava wiki and did not find answers to
these questions.

Can you please help me with this?

Best regards,
Michael

From markjschreiber at gmail.com Sun Dec 16 22:13:31 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sun, 16 Dec 2007 22:13:31 -0500
Subject: [Biojava-dev] blast parser
In-Reply-To: <6994d82b0712160653g26ab7ea9p6909d98f3b1a75ae@mail.gmail.com>
References: <6994d82b0712160653g26ab7ea9p6909d98f3b1a75ae@mail.gmail.com>
Message-ID: <93b45ca50712161913g761c7e7ar637bf1c9b30ab4cd@mail.gmail.com>

Hi -

There have been lots of emails on this in the past. Essentially you
need to set the BlastParser to lazy parsing.

I keep meaning to make lazy parsing the default but I don't have a
machine that can access CVS version at the moment to do the check in.
Could someone make this change before we switch to subversion??

Thanks,

- Mark

From ap3 at sanger.ac.uk Mon Dec 17 05:20:17 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Mon, 17 Dec 2007 10:20:17 +0000
Subject: [Biojava-dev] blast parser
In-Reply-To: <93b45ca50712161913g761c7e7ar637bf1c9b30ab4cd@mail.gmail.com>
References: <6994d82b0712160653g26ab7ea9p6909d98f3b1a75ae@mail.gmail.com> <93b45ca50712161913g761c7e7ar637bf1c9b30ab4cd@mail.gmail.com>
Message-ID: <9327BAA0-F06A-4A6C-83BB-BB60BCD02BB2@sanger.ac.uk>

> I keep meaning to make lazy parsing the default but I don't have a
> machine that can access CVS version at the moment to do the check in.
> Could someone make this change before we switch to subversion??

Sorry, CVS is frozen now. Please hold any commits until SVN is working.
I am testing the initial SVN import at the moment.

Andreas

-----------------------------------------------------------------------

Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

-----------------------------------------------------------------------

From simpleyrx at 163.com Tue Dec 18 04:32:36 2007
From: simpleyrx at 163.com (simpleyrx)
Date: Tue, 18 Dec 2007 17:32:36 +0800 (CST)
Subject: [Biojava-dev] protein_sequence_alignment
In-Reply-To:
References:
Message-ID: <19647130.948901197970356286.JavaMail.coremail@bj163app25.163.com>

Dear sir,

package edu.cau.strLab;

import java.io.File;
import org.biojava.bio.alignment.NeedlemanWunsch;
import org.biojava.bio.alignment.SequenceAlignment;
import org.biojava.bio.alignment.SmithWaterman;
import org.biojava.bio.alignment.SubstitutionMatrix;
import org.biojava.bio.seq.ProteinTools;
import org.biojava.bio.seq.Sequence;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojava.bio.symbol.FiniteAlphabet;

public class ProteinAlignment {
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        try {
            // The alphabet of the sequences. For this example DNA is chosen.
            FiniteAlphabet alphabet = (FiniteAlphabet) AlphabetManager.alphabetForName("Protein");
            // Read the substitution matrix file.
            // For this example the matrix NUC.4.4 is good.
            SubstitutionMatrix matrix = new SubstitutionMatrix(alphabet,
                    new File("E:\\bioinformatics_package\\matrices\\BLOSUM62"));
            // Define the default costs for sequence manipulation for the global alignment.
            SequenceAlignment aligner = new NeedlemanWunsch(
                    0,      // match
                    3,      // replace
                    2,      // insert
                    2,      // delete
                    1,      // gapExtend
                    matrix  // SubstitutionMatrix
            );
            // Sequence query = DNATools.createDNASequence("AC", "query");
            // Sequence target = DNATools.createDNASequence("ACkG", "target");
            Sequence query = ProteinTools.createProteinSequence("ACK", "query");
            Sequence subject = ProteinTools.createProteinSequence("ACK", "subject");

            // Perform an alignment and save the results.
            // aligner.pairwiseAlignment(
            //     query,  // first sequence
            //     target  // second one
            // );
            // aligner.pairwiseAlignment(query, subject);

            // Print the alignment to the screen
            System.out.println("Global alignment with Needleman-Wunsch:\n" + aligner.getAlignmentString());

            // Perform a local alignment from the sequences with Smith-Waterman.
            // Firstly, define the expenses (penalties) for every single operation.
            // aligner = new SmithWaterman(
            //     -1,     // match
            //     3,      // replace
            //     2,      // insert
            //     2,      // delete
            //     1,      // gapExtend
            //     matrix  // SubstitutionMatrix
            // );
            // // Perform the local alignment.
            // aligner.pairwiseAlignment(query, target);
            //
            // System.out.println("\nlocal alignment with SmithWaterman:\n" + aligner.getAlignmentString());
        } catch (Exception exc) {
            exc.printStackTrace();
        }
    }
}

The result is below:

java.util.NoSuchElementException: No alphabet for name Protein could be found
        at org.biojava.bio.symbol.AlphabetManager.alphabetForName(AlphabetManager.java:248)
        at edu.cau.strLab.ProteinAlignment.main(ProteinAlignment.java:20)

Could somebody tell me why this happens, and how to use NeedlemanWunsch
to align protein sequences?
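
As background to the question above, here is a minimal sketch of what a
global (Needleman-Wunsch) alignment computes, written in plain Java and
independent of BioJava. The class name GlobalAlignmentSketch, the
method name score and the scoring values are illustrative assumptions.
The sketch maximises a simple match/mismatch score with a linear gap
penalty, whereas the BioJava NeedlemanWunsch constructor shown above
takes a SubstitutionMatrix plus separate replace, insert, delete and
gap-extension costs, so the numbers are not comparable.

```java
final class GlobalAlignmentSketch {

    /**
     * Returns the optimal global alignment score of a and b.
     * match/mismatch are added per aligned pair, gap per gap column
     * (all three are hypothetical parameters for this sketch).
     */
    static int score(String a, String b, int match, int mismatch, int gap) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        // Aligning a prefix against the empty string costs one gap per symbol.
        for (int i = 1; i <= a.length(); i++) {
            dp[i][0] = i * gap;
        }
        for (int j = 1; j <= b.length(); j++) {
            dp[0][j] = j * gap;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                boolean same = a.charAt(i - 1) == b.charAt(j - 1);
                int diagonal = dp[i - 1][j - 1] + (same ? match : mismatch);
                int gapInB = dp[i - 1][j] + gap; // symbol of a against a gap
                int gapInA = dp[i][j - 1] + gap; // symbol of b against a gap
                dp[i][j] = Math.max(diagonal, Math.max(gapInB, gapInA));
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // Same toy sequences as in the message above.
        System.out.println(score("ACK", "ACK", 1, -1, -2));
    }
}
```

Only the score is computed here; recovering the aligned strings would
need a traceback over the same table, which is omitted to keep the
sketch short.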

From holland at ebi.ac.uk Tue Dec 18 05:56:16 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Tue, 18 Dec 2007 10:56:16 +0000
Subject: [Biojava-dev] Reply:Re: protein_sequence_alignment
In-Reply-To: <12072388.975421197973560707.JavaMail.coremail@bj163app59.163.com>
References: <47679B01.4050205@ebi.ac.uk> <19647130.948901197970356286.JavaMail.coremail@bj163app25.163.com> <12072388.975421197973560707.JavaMail.coremail@bj163app59.163.com>
Message-ID: <4767A750.4000400@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Your existing sequence alignment builder, "aligner", has a method which
returns an Alignment object over two Sequence objects:

    Alignment alignment = aligner.getAlignment(query, subject);

You can then iterate over each position of this alignment and compute
the identity:

    int matches = 0;
    for (int i = 1; i <= alignment.length(); i++) {
        Symbol querySym = alignment.symbolAt(query.getName(), i);
        Symbol subjectSym = alignment.symbolAt(subject.getName(), i);
        if (querySym != null && querySym.equals(subjectSym)) {
            matches++;
        }
    }
    // identity = number of matching columns / alignment length
    double identity = (double) matches / (double) alignment.length();

The code above will give you identity on a scale of 0.0 (no match) to
1.0 (exact match).

cheers,
Richard

simpleyrx wrote:
> Dear sir,
>
> Thank you for your letter. The program works now. But I still
> have a question: how do I calculate the identity of the alignment?
>
> Student
>
> On 2007-12-18, "Richard Holland" wrote:
>
> The exception you are getting is caused by the following line:
>
> FiniteAlphabet alphabet = (FiniteAlphabet)
> AlphabetManager.alphabetForName("Protein");
>
> You should replace the whole line with this call:
>
> FiniteAlphabet alphabet = ProteinTools.getAlphabet();
>
> If however your proteins contain the stop codon (*) then you will need
> this line instead:
>
> FiniteAlphabet alphabet = ProteinTools.getTAlphabet();
>
> Then the line will work and you will be able to continue testing the
> remainder of your code.
>
> cheers,
> Richard

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel.
+44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHZ6dQ4C5LeMEKA/QRAqodAJ9wf9xxzJfgbXGH3YPxVg/ljxvskgCfVcQM oGKGETxB0HBOM1NexHEuJMI= =6zt9 -----END PGP SIGNATURE----- From felipe.albrecht at gmail.com Wed Dec 19 12:10:55 2007 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Wed, 19 Dec 2007 15:10:55 -0200 Subject: [Biojava-dev] protein_sequence_alignment In-Reply-To: <19647130.948901197970356286.JavaMail.coremail@bj163app25.163.com> References: <19647130.948901197970356286.JavaMail.coremail@bj163app25.163.com> Message-ID: >From http://biojava.org/wiki/BioJava:Cookbook:Alphabets it is: "PROTEIN" and not "Protein". In the 20th line, substitute FiniteAlphabet alphabet = (FiniteAlphabet) AlphabetManager.alphabetForName ("Protein"); to FiniteAlphabet alphabet = (FiniteAlphabet) AlphabetManager.alphabetForName ("PROTEIN"); Cheers, Felipe Albrecht On Dec 18, 2007 7:32 AM, simpleyrx wrote: > > > Dear sir, > > > package edu.cau.strLab; > import java.io.File; > import org.biojava.bio.alignment.NeedlemanWunsch; > import org.biojava.bio.alignment.SequenceAlignment; > import org.biojava.bio.alignment.SmithWaterman; > import org.biojava.bio.alignment.SubstitutionMatrix; > import org.biojava.bio.seq.ProteinTools; > import org.biojava.bio.seq.Sequence; > import org.biojava.bio.symbol.AlphabetManager; > import org.biojava.bio.symbol.FiniteAlphabet; > > public class ProteinAlignment{ > public static void main(String[] args) { > // TODO Auto-generated method stub > try { > // The alphabet of the sequences. For this example DNA is choosen. > FiniteAlphabet alphabet = (FiniteAlphabet) > AlphabetManager.alphabetForName("Protein"); > // Read the substitution matrix file. > // For this example the matrix NUC.4.4 is good. 
> SubstitutionMatrix matrix = new SubstitutionMatrix(alphabet, new > File("E:\\bioinformatics_package\\matrices\\BLOSUM62")); > // Define the default costs for sequence manipulation for the > global alignment. > SequenceAlignment aligner = new NeedlemanWunsch( > 0, // match > 3, // replace > 2, // insert > 2, // delete > 1, // gapExtend > matrix // SubstitutionMatrix > ); > // Sequence query = DNATools.createDNASequence("AC", "query"); > // Sequence target = DNATools.createDNASequence("ACkG", "target"); > Sequence query = ProteinTools.createProteinSequence("ACK","query"); > Sequence subject = ProteinTools.createProteinSequence("ACK", > "subject"); > > // Perform an alignment and save the results. > // aligner.pairwiseAlignment( > // query, // first sequence > // target // second one > // ); > // aligner.pairwiseAlignment(query, subject); > > // Print the alignment to the screen > > System.out.println("Global alignment with Needleman-Wunsch:\n" + > aligner.getAlignmentString()); > > // Perform a local alginment from the sequences with > Smith-Waterman. > // Firstly, define the expenses (penalties) for every single > operation. > // aligner = new SmithWaterman( > // -1, // match > // 3, // replace > // 2, // insert > // 2, // delete > // 1, // gapExtend > // matrix // SubstitutionMatrix > // ); > // // Perform the local alignment. > // aligner.pairwiseAlignment(query, target); > // > // System.out.println("\nlocal alignment with SmithWaterman:\n" + > aligner.getAlignmentString()); > } catch (Exception exc) { > exc.printStackTrace(); > } > } > } > > > the result is below: > > > > java.util.NoSuchElementException: No alphabet for name Protein could be > found > at org.biojava.bio.symbol.AlphabetManager.alphabetForName( > AlphabetManager.java:248) > at edu.cau.strLab.ProteinAlignment.main(ProteinAlignment.java:20) > > > > could sb tell why and how to use NeedlemanWunsch to align protein > sequences ? 
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From michaelgang at gmail.com  Thu Dec 20 07:54:08 2007
From: michaelgang at gmail.com (Michael Gang)
Date: Thu, 20 Dec 2007 14:54:08 +0200
Subject: [Biojava-dev] bioperl like blastparser
Message-ID: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>

Hi All,

I used the interface of the Java blast parser.
I had two main problems with it:
1) The blast parser does not parse all the information (for example,
   the query length).
2) The blast parser parses the whole blast report into a list, which
   eats a lot of memory.

I would be interested in writing and contributing a blast parser which
parses all the information in the blast report and parses it
iteratively, something like the following code in bioperl (just in Java).

use Bio::SearchIO;
# format can be 'fasta', 'blast'
my $searchio = new Bio::SearchIO( -format => 'blastxml',
                                  -file   => 'blastout.xml' );
while ( my $result = $searchio->next_result() ) {
    while ( my $hit = $result->next_hit ) {
        # process the Bio::Search::Hit::HitI object
        while ( my $hsp = $hit->next_hsp ) {
            # process the Bio::Search::HSP::HSPI object
        }
    }
}

Would you be interested in such a contribution?

Best regards,
Michael

From holland at ebi.ac.uk  Thu Dec 20 08:37:20 2007
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 20 Dec 2007 13:37:20 +0000
Subject: [Biojava-dev] bioperl like blastparser
In-Reply-To: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
Message-ID: <476A7010.30502@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Definitely, yes! Thanks!

Andreas Prlic (copied on this) will be able to help you more as he
already knows a lot about the existing parser.

cheers,
Richard

- --
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHanAH4C5LeMEKA/QRAlSWAJ9W4YMg+JzFjQwmp6ynRQJqUEz/dwCgnpkf
e1p7LtPgjmgcFLBeGe2s4ug=
=jr2B
-----END PGP SIGNATURE-----

From ap3 at sanger.ac.uk  Thu Dec 20 11:15:31 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu, 20 Dec 2007 16:15:31 +0000
Subject: [Biojava-dev] bioperl like blastparser
In-Reply-To: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
Message-ID:

Hi Michael,

The blast parser (BlastLikeSaxParser) in BioJava has been around for a
while and is frequently used to parse a variety of different blast
outputs. Still, it is not complete and cannot parse PSI blast. We have
had a number of requests about it lately, so I suppose it needs a
little maintenance now.

Writing a new blast parser from scratch would involve a significant
amount of time. It will take time to fix all the bugs, add support for
the different blast versions and write documentation. Much of this is
already available in BioJava, so I would prefer it if you could submit
patches for the current blast parser. Would you also be interested in
collaborating in this direction?

Another feature that would be nice to support is the possibility of
sending off blast searches to web services...

Cheers,
Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From markjschreiber at gmail.com  Fri Dec 21 02:59:27 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 21 Dec 2007 15:59:27 +0800
Subject: [Biojava-dev] bioperl like blastparser
In-Reply-To:
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
Message-ID: <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>

Hi -

It is not required that you turn all Blast results into objects.
Because it is an event-based parser, you can do what you want with the
events, including turning them into objects or echoing them to STDOUT.
Take a look at the examples in the cookbook.

It may be that the query length is actually parsed but is not passed
on to the object model by the event listeners.

- Mark
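
The event-based approach Mark describes can be illustrated without
BioJava. The following is a minimal sketch, assuming the JDK's built-in
SAX parser and a made-up XML snippet; the element names Hit and HSP and
the class StreamingHitCounter are hypothetical and are not BioJava's
actual event names or API. Each event is handled as it arrives, and no
list of results is ever built:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts elements as they stream past instead of
// collecting them into objects.
final class StreamingHitCounter extends DefaultHandler {
    private int hits;
    private int hsps;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // "Hit" and "HSP" are assumed element names for this sketch only.
        if ("Hit".equals(qName)) {
            hits++;
        } else if ("HSP".equals(qName)) {
            hsps++;
        }
    }

    int hits() { return hits; }
    int hsps() { return hsps; }

    // Parses the given XML text with the JDK SAX parser and returns the handler.
    static StreamingHitCounter count(String xml) {
        StreamingHitCounter handler = new StreamingHitCounter();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        String xml = "<Report><Hit id='a'><HSP/><HSP/></Hit><Hit id='b'><HSP/></Hit></Report>";
        StreamingHitCounter c = count(xml);
        System.out.println(c.hits() + " hits, " + c.hsps() + " HSPs"); // prints "2 hits, 3 HSPs"
    }
}
```

A handler like this could just as well print each event or build
objects selectively; the point is only that memory use does not have to
grow with the size of the report.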

From michaelgang at gmail.com  Sun Dec 23 10:22:24 2007
From: michaelgang at gmail.com (Michael Gang)
Date: Sun, 23 Dec 2007 17:22:24 +0200
Subject: [Biojava-dev] bioperl like blastparser
In-Reply-To: <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
	<93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>
Message-ID: <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com>

Hi all,

I've now added the extraction of the query length.
Can someone explain to me the procedure for checking code in to BioJava?
I ran the unit tests in the BioJava distribution. Are there additional
tests available?

Best regards,
Michael

From markjschreiber at gmail.com  Sun Dec 23 19:32:32 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Mon, 24 Dec 2007 08:32:32 +0800
Subject: [Biojava-dev] bioperl like blastparser
In-Reply-To: <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
	<93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>
	<6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com>
Message-ID: <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com>

Hi -

We are currently merging the code base into subversion (from CVS);
after this it will be possible to check in code again. For small
additions it is usually easier to post the code to the dev list (in
the body of the email, as the list doesn't like attachments) or to send
it to one of the regular committers and get them to add it.

The JUnit tests are the standard test package. If you have added new
functionality it would be a good idea to add another test method to
the appropriate JUnit test to make sure it works (and continues to
work in the future).

- Mark

From michaelgang at gmail.com  Mon Dec 24 03:29:45 2007
From: michaelgang at gmail.com (Michael Gang)
Date: Mon, 24 Dec 2007 10:29:45 +0200
Subject: [Biojava-dev] bioperl like blastparser
In-Reply-To: <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
	<93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>
	<6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com>
	<93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com>
Message-ID: <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com>

OK,

I made four changes.

In the package org.biojava.bio.program.sax, in class BlastSaxParser:

1) At line 86 I added the variable

    private String oQueryLength;

2) In the method private void interpret(String poLine) throws
   SAXException, inside the block "if (iState == IN_HEADER) {",
   at line 209 I added

    if (poLine.startsWith("(", 9) && poLine.endsWith("letters)") ) {
        StringTokenizer st = new StringTokenizer(poLine);
        oQueryLength = st.nextToken().substring(1);
    }

3) In the method private void emitHeaderIds() throws SAXException,
   at line 564 I added

    oAttQName.setQName("queryLength");
    oAtts.addAttribute(oAttQName.getURI(),
                       oAttQName.getLocalName(),
                       oAttQName.getQName(),
                       "CDATA", oQueryLength);

In the package org.biojava.bio.program.ssbind, in HeaderStAXHandler.java:

4) In the private class QueryIDStAXHandler, at line 95, I changed the
   method startElement:

    public void startElement(String uri,
                             String localName,
                             String qName,
                             Attributes attr,
                             DelegationManager dm)
        throws SAXException
    {
        ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
        if (attr.getValue("queryLength") != null)
        {
            ssContext.getSearchContentHandler().addSearchProperty("queryLength",
                attr.getValue("queryLength"));
        }
    }
    }

Now the query length is a property of the annotation of a blast result.
It is really fun to participate in the BioJava project.

Best regards,
Michael
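
The header-line check in change 2) can be tried on its own. This is a
minimal sketch, assuming the query length appears on a line such as
"         (1234 letters)" with the opening parenthesis at index 9, as
the patch's startsWith("(", 9) test implies; the class and method names
are hypothetical and are not part of BioJava:

```java
import java.util.StringTokenizer;

// Hypothetical stand-alone version of the header check from the patch.
final class QueryLengthSketch {

    // Returns the text between "(" and the first whitespace,
    // or null if the line does not look like a "(N letters)" line.
    static String extractQueryLength(String line) {
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            StringTokenizer st = new StringTokenizer(line);
            // The first token is "(1234"; drop the leading parenthesis.
            return st.nextToken().substring(1);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(extractQueryLength("         (1234 letters)")); // prints 1234
        System.out.println(extractQueryLength("Query= my_sequence"));      // prints null
    }
}
```

Like the patch, the sketch returns the length as a string; a caller
that needs a number would still have to parse it and decide how to
handle any separators a report might contain.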

From markjschreiber at gmail.com  Tue Dec 25 16:44:32 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 26 Dec 2007 05:44:32 +0800
Subject: [Biojava-dev] bioperl like blastparser
In-Reply-To: <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
	<93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>
	<6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com>
	<93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com>
	<6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com>
Message-ID: <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com>

Hi -

When will the subversion system be ready for checkin?

- Mark
> > > > > > > > > > > > Best regards, > > > > > > Michael > > > > > > _______________________________________________ > > > > > > biojava-dev mailing list > > > > > > biojava-dev at lists.open-bio.org > > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > ----------------------------------------------------------------------- > > > > > > > > > > Andreas Prlic Wellcome Trust Sanger Institute > > > > > Hinxton, Cambridge CB10 1SA, UK > > > > > +44 (0) 1223 49 6891 > > > > > > > > > > ----------------------------------------------------------------------- > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > The Wellcome Trust Sanger Institute is operated by Genome Research > > > > > Limited, a charity registered in England with number 1021457 and a > > > > > company registered in England with number 2742969, whose registered > > > > > office is 215 Euston Road, London, NW1 2BE. > > > > > > > > > > _______________________________________________ > > > > > biojava-dev mailing list > > > > > biojava-dev at lists.open-bio.org > > > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > > > > > > > _______________________________________________ > > > biojava-dev mailing list > > > biojava-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From ap3 at sanger.ac.uk Tue Dec 25 18:42:39 2007 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Tue, 25 Dec 2007 23:42:39 +0000 Subject: [Biojava-dev] bioperl like blastparser In-Reply-To: <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com> References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com> <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com> <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com> 
<93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com> <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com> <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com> Message-ID: Hi Mark, Unfortunately the biojava svn repository is not ready yet. George has converted our CVS to an initial svn dump, which I tested and fixed some details. This dump has been ready since December 17th (see dev.open-bio.org:~andreas/biojava-final.svndump.bz2). The next step is to load this into the public open-bio repository, after which (and some more testing) the new biojava repository would be ready for new commits. At present I am waiting for somebody who has admin rights on the open-bio servers to do these final steps (or to delegate and give permissions to somebody else). I tried to contact support at open-bio, root-l, as well as mailing several people directly, but so far I did not get a response. It could be that the holiday season is slowing response times down... Andreas On 25 Dec 2007, at 21:44, Mark Schreiber wrote: > Hi - > > When will the subversion system be ready for checkin? 
> > - Mark > > On Dec 24, 2007 4:29 PM, Michael Gang wrote: >> OK, >> I made four changes, >> in the package org.biojava.bio.program.sax; at class BlastSaxParser >> 1) at line 86 i added the variable >> private String >> oQueryLength; >> 2) at the method private void interpret(String poLine) throws >> SAXException >> in the if "if (iState == IN_HEADER) {" >> at line 209 i added >> >> if (poLine.startsWith("(", 9) && poLine.endsWith("letters)") ) { >> StringTokenizer st = new StringTokenizer(poLine); >> oQueryLength = st.nextToken().substring(1); >> } >> 3)at the function private void emitHeaderIds() throws SAXException { >> at line 564 i added >> oAttQName.setQName("queryLength"); >> oAtts.addAttribute(oAttQName.getURI(), >> oAttQName.getLocalName(), >> oAttQName.getQName(), >> "CDATA", oQueryLength); >> >> at the package org.biojava.bio.program.ssbind; in >> HeaderStAXHandler.java >> 4)at the private class QueryIDStAXHandler at line 95 I changed the >> method startelement >> >> public void startElement(String uri, >> String localName, >> String qName, >> Attributes attr, >> DelegationManager dm) >> throws SAXException >> { >> ssContext.getSearchContentHandler().setQueryID >> (attr.getValue("id")); >> if (attr.getValue("queryLength") != null) >> { >> ssContext.getSearchContentHandler >> ().addSearchProperty("queryLength", >> attr.getValue("queryLength")); >> } >> } >> } >> >> Now query length is a property of the annotation of a blast result. >> It is really fun to participate in the biojava project. >> >> Best regards, >> Michael >> >> >> On Dec 24, 2007 2:32 AM, Mark Schreiber >> wrote: >>> Hi - >>> >>> We are currently merging the code base into subversion (from CVS) >>> after this it will be possible to check in code again. For small >>> additions it is usually easier to post the code to the dev list (in >>> the body of the email as the list doesn't like attachments) or >>> send it >>> to one of the regular committers and get them to add it. 
>>> >>> The JUnit tests are the standard test package. If you have added new >>> functionality it would be a good idea to add another test method in >>> the appropriate JUnit test to make sure it works (and continues to >>> work in the future). >>> >>> - Mark >>> >>> >>> On Dec 23, 2007 11:22 PM, Michael Gang >>> wrote: >>>> Hi all, >>>> >>>> I've now added the extraction of the query length. >>>> Can someone explain me the procedure of checking in code to >>>> biojava ? >>>> I ran the unit tests in the biojava distribution? Are there >>>> additional >>>> tests available ? >>>> >>>> Best regards, >>>> Michael >>>> >>>> >>>> On Dec 21, 2007 9:59 AM, Mark Schreiber >>>> wrote: >>>>> Hi - >>>>> >>>>> It is not required that you turn all Blast results into objects, >>>>> because it is an event based parser you can do what you want >>>>> with the >>>>> events including turning them into objects or echoing them to >>>>> STDOUT. >>>>> Take a look at the examples in the cookbook. >>>>> >>>>> It may be that the query length is actually parsed but is not >>>>> passed >>>>> onto the object model by the event listeners. >>>>> >>>>> - Mark >>>>> >>>>> >>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic wrote: >>>>>> Hi Michael, >>>>>> >>>>>> The blast parser (BlastLikeSaxParser) in BioJava has been >>>>>> around for >>>>>> a while and is frequently being used to parse a variety >>>>>> of different blast outputs. Still it is not complete and can not >>>>>> parse PSI blast. We have had a number of request about it lately >>>>>> so I suppose it needs a little maintenance now. >>>>>> >>>>>> To write a new blast parser from scratch will involve a >>>>>> significant >>>>>> amount of time. It will take time to fix all the bugs, add >>>>>> support >>>>>> for the different blast versions and write documentation. Much of >>>>>> this is already available in BioJava, so I would prefer if you >>>>>> could >>>>>> submit patches for >>>>>> the current blast parser. 
Would you also be interested to >>>>>> collaborate in this direction? >>>>>> Another feature that would be nice to add support for is the >>>>>> possibility to send off blast searches to webservices... >>>>>> >>>>>> Cheers, >>>>>> Andreas >>>>>> >>>>>> >>>>>> >>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote: >>>>>> >>>>>>> Hi All, >>>>>>> >>>>>>> I used the interface of the java blast parser. >>>>>>> I had mainly two problems with it: >>>>>>> 1) The blast parser does not parse all the information (for >>>>>>> example >>>>>>> query length) >>>>>>> 2) The blast parser parses the whole blast report into a list >>>>>>> which >>>>>>> eats a lot of memory. >>>>>>> >>>>>>> I would be interested to write and contribute a blast parser >>>>>>> which >>>>>>> parses all the information of the blast and parses the blast >>>>>>> iteratively. >>>>>>> Something like the following code in bioperl (just in Java). >>>>>>> use Bio::SearchIO; >>>>>>> # format can be 'fasta', 'blast' >>>>>>> my $searchio = new Bio::SearchIO( -format => 'blastxml', >>>>>>> -file => >>>>>>> 'blastout.xml' ); >>>>>>> while ( my $result = $searchio->next_result() ) { >>>>>>> while( my $hit = $result->next_hit ) { >>>>>>> # process the Bio::Search::Hit::HitI object >>>>>>> while( my $hsp = $hit->next_hsp ) { >>>>>>> # process the Bio::Search::HSP::HSPI object >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> Would you be interested in such a contribution ? 
>>>>>>> >>>>>>> Best regards, >>>>>>> Michael >>>>>>> _______________________________________________ >>>>>>> biojava-dev mailing list >>>>>>> biojava-dev at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>>>> >>>>>> ----------------------------------------------------------------- >>>>>> ------ >>>>>> >>>>>> Andreas Prlic Wellcome Trust Sanger Institute >>>>>> Hinxton, Cambridge CB10 1SA, UK >>>>>> +44 (0) 1223 49 6891 >>>>>> >>>>>> ----------------------------------------------------------------- >>>>>> ------ >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> The Wellcome Trust Sanger Institute is operated by Genome >>>>>> Research >>>>>> Limited, a charity registered in England with number 1021457 >>>>>> and a >>>>>> company registered in England with number 2742969, whose >>>>>> registered >>>>>> office is 215 Euston Road, London, NW1 2BE. >>>>>> >>>>>> _______________________________________________ >>>>>> biojava-dev mailing list >>>>>> biojava-dev at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>>>> >>>>> >>>> _______________________________________________ >>>> biojava-dev mailing list >>>> biojava-dev at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>>> >>> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with 
number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jason at bioperl.org Wed Dec 26 01:32:20 2007 From: jason at bioperl.org (Jason Stajich) Date: Tue, 25 Dec 2007 22:32:20 -0800 Subject: [Biojava-dev] bioperl like blastparser In-Reply-To: References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com> <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com> <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com> <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com> <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com> <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com> Message-ID: <6721E1DC-5BE9-4CE0-BEF5-1D998F4BF0C4@bioperl.org> You just need to put the repositor(ies) in /home/svn-repositories/biojava. Anyone in the biojava group can write there. You'll want to delete the existing biojava-live that is in there. I'm traveling most of the 26th and will be on vacation most of the week, but will check in when I have a chance. -jason On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote: > Hi Mark, > > Unfortunately the biojava svn repository is not ready yet. > > George has converted our CVS to an initial svn dump, which I tested > and fixed some details. > This dump has been ready since December 17th. - ( see dev.open- > bio.org:~andreas/biojava-final.svndump.bz2 ) > The next step is to load this into the public open-bio repository, > after which (and some more testing) the new biojava repository > would be ready for new commits. > > At present I am waiting for somebody who has admin rights on > the open-bio servers to do these final steps. > (or to delegate and give permissions to somebody else). > > I tried to contact support at open-bio, root-l, as well as mailing > several people directly, > but so far I did not get a response. It could be that the holiday > season is slowing response times down... 
> > Andreas > > > > On 25 Dec 2007, at 21:44, Mark Schreiber wrote: > >> Hi - >> >> When will the subversion system be ready for checkin? >> >> - Mark > > ----------------------------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > +44 (0) 1223 49 6891 > > ----------------------------------------------------------------------- > > > > > -- > The Wellcome Trust Sanger 
Institute is operated by Genome > Research Limited, a charity registered in England with number > 1021457 and a company registered in England with number 2742969, > whose registered office is 215 Euston Road, London, NW1 2BE. From cjfields at uiuc.edu Wed Dec 26 09:59:46 2007 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 26 Dec 2007 08:59:46 -0600 Subject: [Biojava-dev] bioperl like blastparser In-Reply-To: <6721E1DC-5BE9-4CE0-BEF5-1D998F4BF0C4@bioperl.org> References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com> <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com> <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com> <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com> <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com> <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com> <6721E1DC-5BE9-4CE0-BEF5-1D998F4BF0C4@bioperl.org> Message-ID: <9B0545CA-52BA-41D1-8595-75D6BC607AB0@uiuc.edu> It looks like someone got around to it already (biojava/biojava-live is new, permissions look set). Is everything working? chris On Dec 26, 2007, at 12:32 AM, Jason Stajich wrote: > You just need to put the repositor(ies) in > /home/svn-repositories/biojava > > anyone in the biojava group can write there. > you'll want to delete the existing biojava-live that is in there. > > I'm traveling most of 26th and will be on vacation most of the week, > but will check in when I have a chance. > > -jason > > On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote: > >> Hi Mark, >> >> Unfortunately the biojava svn repository is not ready yet. >> >> George has converted our CVS to an initial svn dump, which I tested >> and fixed some details. >> This dump has been ready since December 17th. - ( see dev.open- >> bio.org:~andreas/biojava-final.svndump.bz2 ) >> The next step is to load this into the public open-bio repository, >> after which (and some more testing) the new biojava repository >> would be ready for new commits. 
>> >> At present I am waiting for somebody who has admin rights on >> the open-bio servers to do these final steps. >> (or to delegate and give permissions to somebody else). >> >> I tried to contact support at open-bio, root-l, as well as mailing >> several people directly, >> but so far I did not get a response. It could be that the holiday >> season is slowing response times down... >> >> Andreas >> >> >> >> On 25 Dec 2007, at 21:44, Mark Schreiber wrote: >> >>> Hi - >>> >>> When will the subversion system be ready for checkin? >>> >>> - Mark >> >> ----------------------------------------------------------------------- >> >> Andreas Prlic Wellcome Trust Sanger Institute >> Hinxton, Cambridge CB10 1SA, UK >> +44 (0) 1223 49 6891 >> >> ----------------------------------------------------------------------- >> >> >> >> >> -- >> The Wellcome Trust Sanger Institute is operated by Genome >> Research Limited, a charity registered in England with number >> 1021457 and a company registered in England with number 2742969, >> whose registered office is 215 Euston Road, London, NW1 2BE. > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From chris at bioteam.net Wed Dec 26 11:57:03 2007 From: chris at bioteam.net (Chris Dagdigian) Date: Wed, 26 Dec 2007 11:57:03 -0500 Subject: [Biojava-dev] bioperl like blastparser In-Reply-To: References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com> <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com> <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com> <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com> <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com> <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com> Message-ID: Andreas, I can delegate this to you if you are comfortable doing it. Let me know specifically what you need -- elevated access via the sudoers file? Regards, Chris On Dec 25, 2007, at 6:42 PM, Andreas Prlic wrote: > Hi Mark, > > Unfortunately the biojava svn repository is not ready yet. > > George has converted our CVS to an initial svn dump, which I tested > and fixed some details. > This dump has been ready since December 17th. - ( see dev.open- > bio.org:~andreas/biojava-final.svndump.bz2 ) > The next step is to load this into the public open-bio repository, > after which (and some more testing) the new biojava repository > would be ready for new commits. > > At present I am waiting for somebody who has admin rights on the > open-bio servers to do these final steps. > (or to delegate and give permissions to somebody else). > > I tried to contact support at open-bio, root-l, as well as mailing > several people directly, > but so far I did not get a response. It could be that the holiday > season is slowing response times down... 
> > Andreas From ap3 at sanger.ac.uk Wed Dec 26 18:29:56 2007 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Wed, 26 Dec 2007 23:29:56 +0000 Subject: [Biojava-dev] Biojava - svn migration was : bioperl like blastparser In-Reply-To: <6721E1DC-5BE9-4CE0-BEF5-1D998F4BF0C4@bioperl.org> References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com> <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com> <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com> <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com> <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com> <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com> <6721E1DC-5BE9-4CE0-BEF5-1D998F4BF0C4@bioperl.org> Message-ID: <6DE4F8BC-D0C6-4AD8-8DAA-F3470CD3E42F@sanger.ac.uk> > You just need to put the repositor(ies) in > /home/svn-repositories/biojava Thanks for the info. I now have the new biojava svn repository for developers running, and it is possible to check out (and commit) via svn co svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/ biojava-svn/biojava-live/trunk/ ./biojava-svn I am just running final tests to see if all is fine. Access should work for other biojava developers as well. As for anonymous access, who will set this up? I assume there will be a commit hook in the developers' repository which will do a svnsync with the anonymous repository? Andreas > > anyone in the biojava group can write there. > you'll want to delete the existing biojava-live that is in there. > > I'm traveling most of 26th and will be on vacation most of the > week, but will check in when I have a chance. > > -jason > > On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote: > >> Hi Mark, >> >> Unfortunately the biojava svn repository is not ready yet. >> >> George has converted our CVS to an initial svn dump, which I >> tested and fixed some details. >> This dump has been ready since December 17th. 
>> - (see dev.open-bio.org:~andreas/biojava-final.svndump.bz2)
>> The next step is to load this into the public open-bio repository,
>> after which (and some more testing) the new biojava repository
>> would be ready for new commits.
>>
>> At present I am waiting for somebody who has admin rights on
>> the open-bio servers to do these final steps.
>> (or to delegate and give permissions to somebody else).
>>
>> I tried to contact support at open-bio, root-l, as well as mailing
>> several people directly,
>> but so far I did not get a response. Could be that the holiday
>> season is slowing response times down...
>>
>> Andreas
>>
>>
>>
>> On 25 Dec 2007, at 21:44, Mark Schreiber wrote:
>>
>>> Hi -
>>>
>>> When will the subversion system be ready for checkin?
>>>
>>> - Mark
>>>
>>> On Dec 24, 2007 4:29 PM, Michael Gang wrote:
>>>> OK,
>>>> I made four changes,
>>>> in the package org.biojava.bio.program.sax; at class
>>>> BlastSaxParser
>>>> 1) at line 86 I added the variable
>>>>      private String oQueryLength;
>>>> 2) at the method private void interpret(String poLine) throws
>>>> SAXException
>>>> in the if "if (iState == IN_HEADER) {"
>>>> at line 209 I added
>>>>
>>>>      if (poLine.startsWith("(", 9) && poLine.endsWith("letters)") ) {
>>>>          StringTokenizer st = new StringTokenizer(poLine);
>>>>          oQueryLength = st.nextToken().substring(1);
>>>>      }
>>>> 3) at the function private void emitHeaderIds() throws
>>>> SAXException {
>>>> at line 564 I added
>>>>      oAttQName.setQName("queryLength");
>>>>      oAtts.addAttribute(oAttQName.getURI(),
>>>>          oAttQName.getLocalName(),
>>>>          oAttQName.getQName(),
>>>>          "CDATA", oQueryLength);
>>>>
>>>> at the package org.biojava.bio.program.ssbind; in
>>>> HeaderStAXHandler.java
>>>> 4) at the private class QueryIDStAXHandler at line 95 I changed the
>>>> method startElement
>>>>
>>>>      public void startElement(String uri,
>>>>          String localName,
>>>>          String qName,
>>>>          Attributes attr,
>>>>          DelegationManager dm)
>>>>          throws SAXException
>>>>      {
>>>>          ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>>>          if (attr.getValue("queryLength") != null)
>>>>          {
>>>>              ssContext.getSearchContentHandler().addSearchProperty("queryLength",
>>>>                  attr.getValue("queryLength"));
>>>>          }
>>>>      }
>>>> }
>>>>
>>>> Now query length is a property of the annotation of a blast
>>>> result.
>>>> It is really fun to participate in the biojava project.
>>>>
>>>> Best regards,
>>>> Michael
>>>>
>>>>
>>>> On Dec 24, 2007 2:32 AM, Mark Schreiber
>>>> wrote:
>>>>> Hi -
>>>>>
>>>>> We are currently merging the code base into subversion (from CVS)
>>>>> after this it will be possible to check in code again. For small
>>>>> additions it is usually easier to post the code to the dev list
>>>>> (in
>>>>> the body of the email as the list doesn't like attachments) or
>>>>> send it
>>>>> to one of the regular committers and get them to add it.
>>>>>
>>>>> The JUnit tests are the standard test package. If you have
>>>>> added new
>>>>> functionality it would be a good idea to add another test
>>>>> method in
>>>>> the appropriate JUnit test to make sure it works (and continues to
>>>>> work in the future).
>>>>>
>>>>> - Mark
>>>>>
>>>>>
>>>>> On Dec 23, 2007 11:22 PM, Michael Gang
>>>>> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> I've now added the extraction of the query length.
>>>>>> Can someone explain me the procedure of checking in code to
>>>>>> biojava ?
>>>>>> I ran the unit tests in the biojava distribution? Are there
>>>>>> additional
>>>>>> tests available ?
>>>>>>
>>>>>> Best regards,
>>>>>> Michael
>>>>>>
>>>>>>
>>>>>> On Dec 21, 2007 9:59 AM, Mark Schreiber
>>>>>> wrote:
>>>>>>> Hi -
>>>>>>>
>>>>>>> It is not required that you turn all Blast results into objects,
>>>>>>> because it is an event based parser you can do what you want
>>>>>>> with the
>>>>>>> events including turning them into objects or echoing them to
>>>>>>> STDOUT.
>>>>>>> Take a look at the examples in the cookbook.
>>>>>>>
>>>>>>> It may be that the query length is actually parsed but is not
>>>>>>> passed
>>>>>>> onto the object model by the event listeners.
>>>>>>>
>>>>>>> - Mark
>>>>>>>
>>>>>>>
>>>>>>> On Dec 21, 2007 12:15 AM, Andreas Prlic
>>>>>>> wrote:
>>>>>>>> Hi Michael,
>>>>>>>>
>>>>>>>> The blast parser (BlastLikeSaxParser) in BioJava has been
>>>>>>>> around for
>>>>>>>> a while and is frequently being used to parse a variety
>>>>>>>> of different blast outputs. Still it is not complete and can
>>>>>>>> not
>>>>>>>> parse PSI blast. We have had a number of request about it
>>>>>>>> lately
>>>>>>>> so I suppose it needs a little maintenance now.
>>>>>>>>
>>>>>>>> To write a new blast parser from scratch will involve a
>>>>>>>> significant
>>>>>>>> amount of time. It will take time to fix all the bugs, add
>>>>>>>> support
>>>>>>>> for the different blast versions and write documentation.
>>>>>>>> Much of
>>>>>>>> this is already available in BioJava, so I would prefer if
>>>>>>>> you could
>>>>>>>> submit patches for
>>>>>>>> the current blast parser. Would you also be interested to
>>>>>>>> collaborate in this direction?
>>>>>>>> Another feature that would be nice to add support for is the
>>>>>>>> possibility to send off blast searches to webservices...
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Andreas
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 20 Dec 2007, at 12:54, Michael Gang wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I used the interface of the java blast parser.
>>>>>>>>> I had mainly two problems with it:
>>>>>>>>> 1) The blast parser does not parse all the information (for
>>>>>>>>> example
>>>>>>>>> query length)
>>>>>>>>> 2) The blast parser parses the whole blast report into a
>>>>>>>>> list which
>>>>>>>>> eats a lot of memory.
>>>>>>>>>
>>>>>>>>> I would be interested to write and contribute a blast
>>>>>>>>> parser which
>>>>>>>>> parses all the information of the blast and parses the blast
>>>>>>>>> iteratively.
>>>>>>>>> Something like the following code in bioperl (just in Java).
>>>>>>>>> use Bio::SearchIO;
>>>>>>>>> # format can be 'fasta', 'blast'
>>>>>>>>> my $searchio = new Bio::SearchIO( -format => 'blastxml',
>>>>>>>>>                                   -file   => 'blastout.xml' );
>>>>>>>>> while ( my $result = $searchio->next_result() ) {
>>>>>>>>>     while( my $hit = $result->next_hit ) {
>>>>>>>>>         # process the Bio::Search::Hit::HitI object
>>>>>>>>>         while( my $hsp = $hit->next_hsp ) {
>>>>>>>>>             # process the Bio::Search::HSP::HSPI object
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> Would you be interested in such a contribution ?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Michael
>>>>>>>>> _______________________________________________
>>>>>>>>> biojava-dev mailing list
>>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------------
>>>>>>>>
>>>>>>>> Andreas Prlic Wellcome Trust Sanger Institute
>>>>>>>> Hinxton, Cambridge CB10 1SA, UK
>>>>>>>> +44 (0) 1223 49 6891
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------------
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> The Wellcome Trust Sanger Institute is operated by Genome
>>>>>>>> Research
>>>>>>>> Limited, a charity registered in England with number
>>>>>>>> 1021457 and a
>>>>>>>> company registered in England with number 2742969, whose
>>>>>>>> registered
>>>>>>>> office is 215 Euston Road, London, NW1 2BE.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>>>>>
>>>>>> _______________________________________________
>>>>>> biojava-dev mailing list
>>>>>> biojava-dev at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>
>>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>> _______________________________________________
>>> biojava-dev mailing list
>>> biojava-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>
>> -----------------------------------------------------------------------
>>
>> Andreas Prlic Wellcome Trust Sanger Institute
>> Hinxton, Cambridge CB10 1SA, UK
>> +44 (0) 1223 49 6891
>>
>> -----------------------------------------------------------------------
>>
>>
>>
>>
>> --
>> The Wellcome Trust Sanger Institute is operated by Genome
>> Research Limited, a charity registered in England with number
>> 1021457 and a company registered in England with number 2742969,
>> whose registered office is 215 Euston Road, London, NW1 2BE.
>

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
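
The query-length extraction described in Michael Gang's quoted patch
above can be tried in isolation. The following is only a minimal sketch
under stated assumptions: the class and method names (QueryLengthSketch,
extractQueryLength) are hypothetical and not part of BioJava, and the
sample header line, with nine leading spaces before "(443 letters)", is
an assumed input shape inferred from the startsWith("(", 9) test in the
patch. Only java.util.StringTokenizer from the JDK is used.

```java
import java.util.StringTokenizer;

// Hypothetical standalone class; it mirrors the condition quoted in the
// patch but is not the BlastSaxParser code itself.
class QueryLengthSketch {

    // Returns the digits after "(" when the line looks like
    // "         (443 letters)", otherwise null.
    static String extractQueryLength(String line) {
        // Same test as in the quoted patch: "(" at offset 9 and a
        // trailing "letters)".
        if (line.startsWith("(", 9) && line.endsWith("letters)")) {
            // The first whitespace-delimited token is "(443";
            // substring(1) drops the opening parenthesis.
            StringTokenizer st = new StringTokenizer(line);
            return st.nextToken().substring(1);
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed sample input: nine spaces, then the length in parentheses.
        System.out.println(extractQueryLength("         (443 letters)"));
    }
}
```

As written, the sketch returns the token unchanged, so any separator
characters inside the number (a comma, for instance) would be passed
through as-is, and a line that does not match the two conditions yields
null rather than an exception.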

From ap3 at sanger.ac.uk Thu Dec 27 03:36:32 2007
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Thu, 27 Dec 2007 08:36:32 +0000
Subject: [Biojava-dev] Biojava - svn migration
In-Reply-To: <97DEB521-7692-4F35-B149-373070B18EF1@uiuc.edu>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
    <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>
    <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com>
    <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com>
    <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com>
    <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com>
    <6721E1DC-5BE9-4CE0-BEF5-1D998F4BF0C4@bioperl.org>
    <6DE4F8BC-D0C6-4AD8-8DAA-F3470CD3E42F@sanger.ac.uk>
    <36C67E02-5A8A-4694-BC77-28A9BEA97AFD@duke.edu>
    <97DEB521-7692-4F35-B149-373070B18EF1@uiuc.edu>
Message-ID:

>
> I think the root URL is supposed to be
> /home/svn-repositories/biojava/biojava-live, which is what was set up
> before. That would keep in line with having bio*-related subprojects
> in the same directory (bioperl-db would be
> /home/svn-repositories/bioperl/bioperl-db, for instance).
>
> Andreas, any reason not to change the name to the above (does it
> have to do with the BDB setup)?

The current root level of the svn repository is
/home/svn-repositories/biojava/biojava-svn
which could be shortened to
/home/svn/biojava/

Below this there are sub-directories for all biojava projects:
.../biojava-live is the main biojava project
.../biojava-ensj is the ensembl java api, etc.

The next step in the hierarchy is that each project is organized into
the /trunk, /branches and /tags directories that are recommended for
use with svn.

So the shortest possible path that could be provided for the main
developmental trunk of biojava is:
/home/svn/biojava/biojava-live/trunk/

Chris Dag.: if I should set up this directory structure, please add me
to the sudoers file.

>>>
>>> For the anonymous access - who will set this up?
>>> I assume there
>>> will be a commit hook in the developers repository which will do
>>> a svnsync with the anonymous repository?
>>>
>
> Don't know that one; it will be the next step I'm assuming. Not
> sure if we're keeping a sync'ed read-only CVS either as it appears
> to be overly problematic.

I see. So what are the other options to provide anonymous access?

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                   Hinxton, Cambridge CB10 1SA, UK
                   +44 (0) 1223 49 6891

-----------------------------------------------------------------------

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From markjschreiber at gmail.com Thu Dec 27 20:48:14 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 28 Dec 2007 09:48:14 +0800
Subject: [Biojava-dev] Biojava - svn migration
In-Reply-To: <011016B2-FA19-4A33-BC95-2C230A658D29@duke.edu>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
    <6DE4F8BC-D0C6-4AD8-8DAA-F3470CD3E42F@sanger.ac.uk>
    <36C67E02-5A8A-4694-BC77-28A9BEA97AFD@duke.edu>
    <97DEB521-7692-4F35-B149-373070B18EF1@uiuc.edu>
    <5EFA6EC9-E279-48A6-98C2-4DE224F0A817@duke.edu>
    <9C4CB1D4-EC49-4CE1-84FE-F1AFAAD217D0@uiuc.edu>
    <84BD3DC3-06A4-4176-84F3-9C540B4015F4@bioperl.org>
    <011016B2-FA19-4A33-BC95-2C230A658D29@duke.edu>
Message-ID: <93b45ca50712271748x3019ce27m2d45008c8ce13ece@mail.gmail.com>

Hi all -

I am trying to do an SVN checkout with NetBeans but the connection just
seems to hang without doing anything (same happens with command line?).
I am using the path specified in the biojava wiki but the conversation
below suggests that the actual path may be different. What then is the
'home' path?

Also, does the SSH+SVN command need to be prepended by an SSH command
(or plink -ssh on Windows)?

- Mark

On Dec 28, 2007 5:37 AM, Hilmar Lapp wrote:
> I see. Makes sense. -hilmar
>
>
> On Dec 27, 2007, at 4:30 PM, Jason Stajich wrote:
>
> > My idea was that just like
> > /home/reposiitory/biojava
> > we'd put the SVN in
> > /home/svn-repository/biojava
> >
> > So each of the biojava SVN sub-projects would be
> > /home/svn-repository/biojava/biojava-live
> > /home/svn-repository/biojava/biojava-ensj
> >
> > /home/svn is really the homedir for the svn user and some utility
> > stuff like the svn login passwords so I think it is better not to
> > put the repos in there.
> >
> > -jason
> > On Dec 27, 2007, at 1:31 PM, Hilmar Lapp wrote:
> >
> >>
> >> On Dec 27, 2007, at 12:32 PM, Chris Fields wrote:
> >>
> >>>
> >>> I agree, but there is already a /home/svn directory which appears
> >>> related to blipkit. We would need to move the blipkit stuff into
> >>> its own subdir and go from there.
> >>>
> >>
> >>
> >> I.e., it was set up such that blipkit would be the only project
> >> within it? I'd assume that was by mistake. Also, the last update
> >> that I have is that blipkit hasn't been active for a while, though
> >> I'm not sure.
> >>
> >> ChrisM - do you recall the decisions leading to the blipkit svn
> >> setup? Could it be moved to a subdirectory as ChrisF suggests?
> >>
> >> -hilmar
> >> --
> >> ===========================================================
> >> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
> >> ===========================================================
> >>
> >>
> >>
> >
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
> ===========================================================
>
>
>
>

From heuermh at acm.org Thu Dec 27 23:34:54 2007
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 27 Dec 2007 23:34:54 -0500 (EST)
Subject: [Biojava-dev] Biojava - svn migration
In-Reply-To: <93b45ca50712271748x3019ce27m2d45008c8ce13ece@mail.gmail.com>
Message-ID:

Hello Mark, Andreas

I was able to check out on Linux with command-line subversion and in
Eclipse with the following URL:

svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk

michael

On Fri, 28 Dec 2007, Mark Schreiber wrote:

> Hi all -
>
> I am trying to do an SVN checkout with NetBeans but the connection
> just seems to hang without doing anything (same happens with command
> line?). I am using the path specified in the biojava wiki but the
> conversation below suggests that the actual path may be different.
> What then is the 'home' path?
>
> Also, does the SSH+SVN command need to be prepended by an SSH command
> (or plink -ssh on Windows)?
>
> - Mark
>
> On Dec 28, 2007 5:37 AM, Hilmar Lapp wrote:
> > I see. Makes sense.
> > -hilmar
> >
> >
> > On Dec 27, 2007, at 4:30 PM, Jason Stajich wrote:
> >
> > > My idea was that just like
> > > /home/reposiitory/biojava
> > > we'd put the SVN in
> > > /home/svn-repository/biojava
> > >
> > > So each of the biojava SVN sub-projects would be
> > > /home/svn-repository/biojava/biojava-live
> > > /home/svn-repository/biojava/biojava-ensj
> > >
> > > /home/svn is really the homedir for the svn user and some utility
> > > stuff like the svn login passwords so I think it is better not to
> > > put the repos in there.
> > >
> > > -jason
> > > On Dec 27, 2007, at 1:31 PM, Hilmar Lapp wrote:
> > >
> > >>
> > >> On Dec 27, 2007, at 12:32 PM, Chris Fields wrote:
> > >>
> > >>>
> > >>> I agree, but there is already a /home/svn directory which appears
> > >>> related to blipkit. We would need to move the blipkit stuff into
> > >>> its own subdir and go from there.
> > >>>
> > >>
> > >>
> > >> I.e., it was set up such that blipkit would be the only project
> > >> within it? I'd assume that was by mistake. Also, the last update
> > >> that I have is that blipkit hasn't been active for a while, though
> > >> I'm not sure.
> > >>
> > >> ChrisM - do you recall the decisions leading to the blipkit svn
> > >> setup? Could it be moved to a subdirectory as ChrisF suggests?
> > >>
> > >> -hilmar
> > >> --
> > >> ===========================================================
> > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
> > >> ===========================================================
> > >>
> > >>
> > >>
> > >
> >
> > --
> > ===========================================================
> > : Hilmar Lapp -:- Durham, NC -:- hlapp at duke dot edu :
> > ===========================================================
> >
> >
> >
> >
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From hlapp at duke.edu Wed Dec 26 18:41:54 2007
From: hlapp at duke.edu (Hilmar Lapp)
Date: Wed, 26 Dec 2007 18:41:54 -0500
Subject: [Biojava-dev] Biojava - svn migration was : bioperl like blastparser
In-Reply-To: <6DE4F8BC-D0C6-4AD8-8DAA-F3470CD3E42F@sanger.ac.uk>
References: <6994d82b0712200454r743db6bas248779885dd45f7c@mail.gmail.com>
    <93b45ca50712202359o2886762bk5f2e26d214014d95@mail.gmail.com>
    <6994d82b0712230722h7a7d3da1vb694a96740fc8e93@mail.gmail.com>
    <93b45ca50712231632s51ffe0d4v861a5d16abb536e9@mail.gmail.com>
    <6994d82b0712240029o14dabb3at241f49361681345a@mail.gmail.com>
    <93b45ca50712251344t2a27ffc3o5d6e7defee116326@mail.gmail.com>
    <6721E1DC-5BE9-4CE0-BEF5-1D998F4BF0C4@bioperl.org>
    <6DE4F8BC-D0C6-4AD8-8DAA-F3470CD3E42F@sanger.ac.uk>
Message-ID: <36C67E02-5A8A-4694-BC77-28A9BEA97AFD@duke.edu>

On Dec 26, 2007, at 6:29 PM, Andreas Prlic wrote:

>
>> You just need to put the repositor(ies) in
>> /home/svn-repositories/biojava
>
> Thanks for the info. I now have the new biojava svn repository for
> developers running

Great, congrats!

> and it is possible to check out (and do commits) via
>
> svn co svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-svn/biojava-live/trunk/ ./biojava-svn

Is this the directory structure template we should all mirror for the
different projects?
I suppose some consistency isn't bad ...

The URL looks awfully long - did we choose not to use /home/svn
(instead of /home/svn-repositories), and is the intervening
'biojava-svn' directory needed?

I.e., is the root URL of the repository at
/home/svn-repositories/biojava/biojava-svn/biojava-live/, or
/home/svn-repositories/biojava/ ?

-hilmar

>
> I am just running final tests to see if all is fine. Access should
> work for other biojava developers as well.
>
>
> For the anonymous access - who will set this up? I assume there
> will be a commit hook in the developers repository which will do a
> svnsync with the anonymous repository?
>
> Andreas
>
>
>
>
>
>>
>> anyone in the biojava group can write there.
>> you'll want to delete the existing biojava-live that is in there.
>>
>> I'm traveling most of 26th and will be on vacation most of the
>> week, but will check in when I have a chance.
>>
>> -jason
>>
>> On Dec 25, 2007, at 3:42 PM, Andreas Prlic wrote:
>>
>>> Hi Mark,
>>>
>>> Unfortunately the biojava svn repository is not ready yet.
>>>
>>> George has converted our CVS to an initial svn dump, which I
>>> tested and fixed some details.
>>> This dump has been ready since December 17th. - (see
>>> dev.open-bio.org:~andreas/biojava-final.svndump.bz2)
>>> The next step is to load this into the public open-bio
>>> repository, after which (and some more testing) the new biojava
>>> repository would be ready for new commits.
>>>
>>> At present I am waiting for somebody who has admin rights on
>>> the open-bio servers to do these final steps.
>>> (or to delegate and give permissions to somebody else).
>>>
>>> I tried to contact support at open-bio, root-l, as well as mailing
>>> several people directly,
>>> but so far I did not get a response. Could be that the holiday
>>> season is slowing response times down...
>>>
>>> Andreas
>>>
>>>
>>>
>>> On 25 Dec 2007, at 21:44, Mark Schreiber wrote:
>>>
>>>> Hi -
>>>>
>>>> When will the subversion system be ready for checkin?
>>>>
>>>> - Mark
>>>>
>>>> On Dec 24, 2007 4:29 PM, Michael Gang
>>>> wrote:
>>>>> OK,
>>>>> I made four changes,
>>>>> in the package org.biojava.bio.program.sax; at class
>>>>> BlastSaxParser
>>>>> 1) at line 86 I added the variable
>>>>>      private String oQueryLength;
>>>>> 2) at the method private void interpret(String poLine) throws
>>>>> SAXException
>>>>> in the if "if (iState == IN_HEADER) {"
>>>>> at line 209 I added
>>>>>
>>>>>      if (poLine.startsWith("(", 9) && poLine.endsWith("letters)") ) {
>>>>>          StringTokenizer st = new StringTokenizer(poLine);
>>>>>          oQueryLength = st.nextToken().substring(1);
>>>>>      }
>>>>> 3) at the function private void emitHeaderIds() throws
>>>>> SAXException {
>>>>> at line 564 I added
>>>>>      oAttQName.setQName("queryLength");
>>>>>      oAtts.addAttribute(oAttQName.getURI(),
>>>>>          oAttQName.getLocalName(),
>>>>>          oAttQName.getQName(),
>>>>>          "CDATA", oQueryLength);
>>>>>
>>>>> at the package org.biojava.bio.program.ssbind; in
>>>>> HeaderStAXHandler.java
>>>>> 4) at the private class QueryIDStAXHandler at line 95 I changed the
>>>>> method startElement
>>>>>
>>>>>      public void startElement(String uri,
>>>>>          String localName,
>>>>>          String qName,
>>>>>          Attributes attr,
>>>>>          DelegationManager dm)
>>>>>          throws SAXException
>>>>>      {
>>>>>          ssContext.getSearchContentHandler().setQueryID(attr.getValue("id"));
>>>>>          if (attr.getValue("queryLength") != null)
>>>>>          {
>>>>>              ssContext.getSearchContentHandler
>>>>>              ().add