From chapmanb at 50mail.com Fri Apr 2 09:07:06 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 2 Apr 2010 09:07:06 -0400
Subject: [Biojava-dev] BOSC and OpenBio solution challenge reminder -- April 15th
Message-ID: <20100402130706.GJ36623@sobchak.mgh.harvard.edu>

Hello all;
A friendly reminder that the deadline for the Bioinformatics Open Source Conference (BOSC) is coming up on April 15th:

http://www.open-bio.org/wiki/BOSC_2010

This is a great opportunity to discuss code and biology with fellow developers. One session which I'd like to emphasize is the OpenBio Solution Challenge, a section of talks that describes how to solve practical problems in bioinformatics using a variety of approaches:

http://www.open-bio.org/wiki/SolutionChallenge

Any toolkit developers who are interested in giving a talk are encouraged to submit an abstract for the challenge. We have some initial project ideas on the page and welcome your feedback for other useful workflows that would emphasize the advantages of using open source toolkits to solve biological problems.
Please copy messages to the OpenBio mailing list as a central point for discussion and questions:

http://lists.open-bio.org/mailman/listinfo/open-bio-l

Looking forward to seeing everyone in July,
Brad

BOSC contact and dates:

Date: July 9-10, 2010
Location: Boston, Massachusetts, USA
BOSC 2010 web site: http://www.open-bio.org/wiki/BOSC_2010
Abstract submission via Open Conference System site: http://events.open-bio.org/BOSC2010/openconf.php
E-mail: bosc at open-bio.org
Bosc-announce list: http://lists.open-bio.org/mailman/listinfo/bosc-announce

Important Dates

April 15: Abstract deadline
May 5: Notification of accepted abstracts
May 28: Early Registration Discount Cut-off date
July 8-9: Codefest 2010
July 9-10: BOSC 2010
August 15: Manuscript deadline for BOSC 2010 Proceedings published in BMC Bioinformatics

From andreas at sdsc.edu Fri Apr 2 13:25:39 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 2 Apr 2010 10:25:39 -0700
Subject: [Biojava-dev] BOSC
Message-ID:

Hi,

who is going to BOSC this year and who wants to present a BioJava talk?

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From holland at eaglegenomics.com Fri Apr 2 15:37:06 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 2 Apr 2010 20:37:06 +0100
Subject: [Biojava-dev] BOSC
In-Reply-To:
References:
Message-ID:

I will be there but for various reasons I can't talk this year.

On 2 Apr 2010, at 18:25, Andreas Prlic wrote:

> Hi,
>
> who is going to BOSC this year and who wants to present a BioJava talk?
>
> Andreas
>
> --
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From heuermh at acm.org Fri Apr 2 23:23:15 2010
From: heuermh at acm.org (Michael Heuer)
Date: Fri, 2 Apr 2010 23:23:15 -0400 (EDT)
Subject: [Biojava-dev] BOSC
In-Reply-To:
Message-ID:

Andreas Prlic wrote:

> who is going to BOSC this year and who wants to present a BioJava talk?

I will be there. I'm probably not the best person to present this time around.

michael

From aradwen at gmail.com Sat Apr 3 06:18:40 2010
From: aradwen at gmail.com (Radwen Aniba)
Date: Sat, 3 Apr 2010 12:18:40 +0200
Subject: [Biojava-dev] Protein sequence composition
Message-ID:

Hello,

I'm writing an application that treats protein sequences, and I am using Biojava for a couple of things. One of these processings is to parse protein multifasta files, and treat the sequences one after the other. One of my purposes is to calculate composition.
By composition I mean that I am interested to know in a given protein sequence what is the mean and the standard deviation composition of these groups :

PAGST
EDNQ
LIVM
KRH
C

example :

protein fasta file :

>SEQ1
DVSFRLSGATSSSYGVFISNLRKALPNERKLYDIPLLRSSLPGSQRYALI
HLTNYADETISVAIDVTNVYIMGYRAGDTSYFFNEASATEAAKYVFKDAM
RKVTLPYSGNYERLQTAAGKIRENIPLGLPALDSAITTLFYYNANSAASA
LMVLIQSTSEAARYKFIEQQIGKRVDKTFLPSLAIISLENSWSALSKQIQ
IASTNNGQFESPVVLINAQNQRVTITNVDAGVVTSNIALLLNRNNMA
>SEQ2
IFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVG
LPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQED
AEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISA
LYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAP
DPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIP
IIALMVYRCAPPPSSQF

I would like to
1/ parse SEQ1 to calculate the composition mean of PAGST residues for example (number of residues / length of the sequence)
2/ do same thing for SEQ2
3/ return the average mean of both sequences
4/ Return standard deviation of these values.

I can do it writing a standard java code, but I would like to know (as I am using biojava already) if this is possible or not (Which class / instances to use)

Cheers

From chapman at cs.wisc.edu Sat Apr 3 09:08:23 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Sat, 03 Apr 2010 08:08:23 -0500
Subject: [Biojava-dev] Protein sequence composition
In-Reply-To:
References:
Message-ID: <4BB73DC7.2020000@cs.wisc.edu>

Hi Radwen,

The example below solves most of what you asked for. It may not be the most elegant solution, but it should get you started in the right direction.
Saving the proteins into sample.fasta and running the following command:

> java ProteinComposition sample.fasta PAGST EDNQ LIVM KRH C

produces the output:

SEQ1  247:  0.36032388  0.19433199  0.25506073  0.097165994  0.0
SEQ2  267:  0.3445693   0.20224719  0.23595506  0.101123594  0.007490637

Take care,
Mark

-- ProteinComposition.java --

import java.io.*;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.SequenceDB;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.symbol.*;

@SuppressWarnings("deprecation")
public class ProteinComposition {

    /**
     * Determines the composition of proteins in a Fasta file.
     * @param args ...
     *  : file name of the Fasta file
     *  : group(s) of one or more amino acid residues, statistics are printed out for each group
     */
    public static void main(String[] args) {
        try {
            // load Fasta file into memory
            BufferedInputStream is = new BufferedInputStream(new FileInputStream(args[0]));
            Alphabet alpha = AlphabetManager.alphabetForName("PROTEIN");
            SequenceDB db = SeqIOTools.readFasta(is, alpha);

            // load command line arguments into memory
            SymbolList[] res = new SymbolList[args.length-1];
            for (int a = 1; a < args.length; a++)
                res[a-1] = ProteinTools.createProtein(args[a]);

            // store length and composition of each protein
            int[] lengths = new int[db.ids().size()];
            int[][] counts = new int[lengths.length][res.length];
            float[][] means = new float[lengths.length][res.length];

            // iterate over proteins in Fasta file
            SequenceIterator sI = db.sequenceIterator();
            for (int s = 0; sI.hasNext(); s++) {
                Sequence seq = sI.nextSequence();
                lengths[s] = seq.length();

                // iterate over each amino acid
                for (Object sr : seq.toList())
                    // check for amino acid in each residue group
                    for (int a = 1; a < args.length; a++)
                        // iterate over each residue in group
                        for (Object r : res[a-1].toList())
                            // increment count if amino acid has a match in residue group
                            if (((Symbol) r).getMatches().contains((Symbol) sr)) {
                                counts[s][a-1]++;
                                break;
                            }

                // print "name length: composition" for each protein
                System.out.print(seq.getName() + "\t" + seq.length() + ":");
                for (int a = 1; a < args.length; a++)
                    System.out.print("\t" + (means[s][a-1] = (float) counts[s][a-1] / lengths[s]));
                System.out.println();
            }
        } catch (FileNotFoundException ex) {
            System.err.println("Problem reading file...");
            ex.printStackTrace();
        } catch (BioException ex) {
            System.err.println("File not in fasta format or wrong alphabet...");
            ex.printStackTrace();
        } catch (NoSuchElementException ex) {
            System.err.println("No fasta sequences in the file...");
            ex.printStackTrace();
        }
    }
}

On 4/3/2010 5:18 AM, Radwen Aniba wrote:
> Hello,
>
> I'm writing an application that treats protein sequences, and I am using
> Biojava for a couple of things.
> One of these processings is to parse protein multifasta files, and treat the
> sequences one after the other. One of my purposes is to calculate
> composition. By composition I mean that I am interested to know in a given
> protein sequence what is the mean and the standard deviation composition of
> these groups :
>
> PAGST
> EDNQ
> LIVM
> KRH
> C
>
> example :
>
> protein fasta file :
>
>> SEQ1
>
> DVSFRLSGATSSSYGVFISNLRKALPNERKLYDIPLLRSSLPGSQRYALI
> HLTNYADETISVAIDVTNVYIMGYRAGDTSYFFNEASATEAAKYVFKDAM
> RKVTLPYSGNYERLQTAAGKIRENIPLGLPALDSAITTLFYYNANSAASA
> LMVLIQSTSEAARYKFIEQQIGKRVDKTFLPSLAIISLENSWSALSKQIQ
> IASTNNGQFESPVVLINAQNQRVTITNVDAGVVTSNIALLLNRNNMA
>
>> SEQ2
>
> IFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVG
> LPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQED
> AEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISA
> LYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAP
> DPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIP
> IIALMVYRCAPPPSSQF
>
> I would like to
> 1/ parse SEQ1 to calculate the composition mean of PAGST residues for
> example ( number of residus/ length of the sequence)
> 2/ do same thing for SEQ2
> 3 / return the average
mean of both sequences
> 4/ Return standard deviation of these values.
>
>
> I can do it writing a standard java code, but I would like to know (as I am
> using biojava already) if this is possible or not ( Which class / instances
> to use)
>
> Cheers

From andreas.prlic at gmail.com Sat Apr 3 11:20:58 2010
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sat, 3 Apr 2010 08:20:58 -0700
Subject: [Biojava-dev] BOSC
In-Reply-To:
References:
Message-ID: <2E000562-884F-4172-A94D-61488C605A9F@gmail.com>

I am planning to attend 3d-SIG this year and will be mentioning biojava there...

Andreas

On 2 Apr 2010, at 20:23, Michael Heuer wrote:

> Andreas Prlic wrote:
>
>> who is going to BOSC this year and who wants to present a BioJava
>> talk?
>
> I will be there. I'm probably not the best person to present this
> time around.
>
> michael
>

From sheoran143 at gmail.com Sun Apr 11 15:16:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 14:16:29 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
Message-ID: <4BC2200D.8000109@gmail.com>

Hi,

There is a very fundamental issue in the SimpleNCBITaxon class, because of which it is producing the wrong taxonomy hierarchy. I am explaining what I have found; let me know what you guys think of it, and please suggest how to fix it.

1) Columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the parent's ncbi_taxon_id for the current ncbi_taxon_id value, but that is not true. The value that "parent_taxon_id" holds is the "taxon_id" of the row which has the parent's ncbi_taxon_id for the current ncbi_taxon_id.
----- it's not correct: column parent_taxon_id stores the taxon_id of the row which has the parent ncbi_taxon_id for the current entry

Thanks
Deepak Sheoran

From holland at eaglegenomics.com Sun Apr 11 15:53:06 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sun, 11 Apr 2010 20:53:06 +0100
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC2200D.8000109@gmail.com>
References: <4BC2200D.8000109@gmail.com>
Message-ID:

I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).

thanks,
Richard

On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:

> Hi,
>
> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it.
>
> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
>
> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>
> Thanks
> Deepak Sheoran
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From sheoran143 at gmail.com Sun Apr 11 17:08:22 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 16:08:22 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To:
References: <4BC2200D.8000109@gmail.com>
Message-ID: <4BC23A46.7090304@gmail.com>

I am using the same table with the biojava and bioperl taxon programs, and the output I get is below:

*Biojava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii.

Biojava process of finding names:
11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240 (wrong way of doing things)

*Bioperl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.

Bioperl process of finding names:
11876==>353825==>153057==>327045==>11632 (Right way of doing things)

Hint: biojava searches the ncbi_taxon_id column with a value from parent_taxon_id, whereas bioperl searches the taxon_id column with a value from parent_taxon_id.
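
A minimal sketch of the two lookup strategies described in the hint, written in plain Java with no BioJava or BioPerl code involved. The class and method names (TaxonLineageSketch, Row, find, lineage) are hypothetical, and the rows are copied from the table in this message. It only illustrates how matching parent_taxon_id against ncbi_taxon_id versus taxon_id leads to the two lineages reported above; it is not the actual SimpleNCBITaxon or BioPerl implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class TaxonLineageSketch {

    /** One row of the joined taxon / taxon_name data quoted in this thread (hypothetical holder type). */
    static final class Row {
        final int taxonId;
        final int ncbiTaxonId;
        final int parentTaxonId;
        final String name;

        Row(int taxonId, int ncbiTaxonId, int parentTaxonId, String name) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
            this.name = name;
        }
    }

    // Rows copied from the table in the message: taxon_id, ncbi_taxon_id, parent_taxon_id, name.
    static final List<Row> ROWS = Arrays.asList(
        new Row(2901, 3609, 276240, "Rhamnus"),
        new Row(3610, 4403, 3609, "Platanus occidentalis"),
        new Row(29052, 48579, 4403, "Suillus placidus"),
        new Row(114412, 143975, 48579, "Diadasia australis"),
        new Row(143976, 176516, 143975, "Arnicastrum guerrerense"),
        new Row(30680, 50447, 176516, "Labiduridae"),
        new Row(254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
        new Row(9394, 11632, 17394, "Retroviridae"),
        new Row(277861, 327045, 9394, "Orthoretrovirinae"),
        new Row(122448, 153057, 277861, "Alpharetrovirus"),
        new Row(301952, 353825, 122448, "unclassified Alpharetrovirus"),
        new Row(9584, 11876, 301952, "Avian sarcoma virus"));

    /** Returns the row whose ncbi_taxon_id (matchNcbiId = true) or taxon_id (false) equals id, or null. */
    static Row find(int id, boolean matchNcbiId) {
        for (Row r : ROWS) {
            int key = matchNcbiId ? r.ncbiTaxonId : r.taxonId;
            if (key == id) {
                return r;
            }
        }
        return null;
    }

    /**
     * Starts at the row with the given ncbi_taxon_id and follows parent_taxon_id links,
     * resolving each link against the column selected by matchNcbiId.
     * Returns the ancestor names, most distant ancestor first.
     */
    static List<String> lineage(int startNcbiId, boolean matchNcbiId) {
        List<String> names = new ArrayList<String>();
        Row current = find(startNcbiId, true);
        Row parent = (current == null) ? null : find(current.parentTaxonId, matchNcbiId);
        while (parent != null) {
            names.add(parent.name);
            parent = find(parent.parentTaxonId, matchNcbiId);
        }
        Collections.reverse(names);
        return names;
    }

    public static void main(String[] args) {
        // Link resolved against ncbi_taxon_id (the behaviour attributed to biojava in the hint).
        System.out.println("match on ncbi_taxon_id: " + lineage(11876, true));
        // Link resolved against taxon_id (the behaviour attributed to bioperl in the hint).
        System.out.println("match on taxon_id:      " + lineage(11876, false));
    }
}
```

With these twelve rows, the first call walks into unrelated entries while the second stays within the retrovirus entries, which is the mismatch reported in this message.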
*Taxon and taxon_name table content relevant to this discussion:*

taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name

Thanks
Deepak

On 4/11/2010 2:53 PM, Richard Holland wrote:
> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>
> thanks,
> Richard
>
> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>
>
>> Hi,
>>
>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it.
>>
>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true.
The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
>>
>> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>>
>> Thanks
>> Deepak Sheoran
>>
>>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>

From sheoran143 at gmail.com Sun Apr 11 18:48:00 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 17:48:00 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC23A46.7090304@gmail.com>
References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com>
Message-ID: <4BC251A0.4090602@gmail.com>

If we don't want to change the current code in biojava and still want to fix this bug, I have found a way:

1) We can do this by changing one of the Hibernate files, "Taxon.hbm.xml", and replacing the line

with

By changing the above Hibernate setting I am able to get the correct lineage for ncbi_taxon_id = 11876 (Avian sarcoma virus), which is
Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.

2) The possible issue we might run into is with the taxonomy loader class, which wants to insert something for the parent taxon_id into the taxon table; I think that won't be possible if we make this change to the Hibernate config file.

Deepak Sheoran

On 4/11/2010 4:08 PM, Deepak Sheoran wrote:
> I am using same table with biojava and bioperl taxon program and the
> output I get is below:
>
> *Biojava:*
> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the
> lineage i get is
> Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia
> australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum
> var. haydenii.
> > Biojava process of finding names: > 11876==>3019252==>50447==>176516==>143975==>48579==>4403==>3609==>276240 > (wrong way of doing things) > > *Bioperl:* > For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the > lineage i get is > Retroviridae; Orthoretrovirinae; Alpharetrovirus; > unclassified Alpharetrovirus. > > Bioperl process of finding names: > 11876==>353825==>153057==>327045==>11632 (Right way of doing things) > > Hint: biojava search ncbi_taxon_id column with a value from > parent_taxon_id where bioperl search taxon_id column with a value from > parent_taxon_id. > > *Taxon and Taxon_name Table content which is being relevant in > discussion:* > > taxon_id ncbi_taxon_id parent_taxon_id node_rank name name_class > 2901 3609 276240 genus Rhamnus scientific name > 3610 4403 3609 species Platanus occidentalis scientific name > 29052 48579 4403 species Suillus placidus scientific name > 114412 143975 48579 species Diadasia australis scientific name > 143976 176516 143975 species Arnicastrum guerrerense scientific name > 30680 50447 176516 family Labiduridae scientific name > 254757 301952 50447 varietas Oreostemma alpigenum var. haydenii > scientific name > 9394 11632 17394 family Retroviridae scientific name > 277861 327045 9394 subfamily Orthoretrovirinae scientific name > 122448 153057 277861 genus Alpharetrovirus scientific name > 301952 353825 122448 no rank unclassified Alpharetrovirus > scientific name > 9584 > 11876 > 301952 > species > Avian sarcoma virus > scientifice name > > > Thanks > Deepak > > On 4/11/2010 2:53 PM, Richard Holland wrote: >> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead). 
>>
>> thanks,
>> Richard
>>
>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>>
>>
>>> Hi,
>>>
>>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it.
>>>
>>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
>>>
>>> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>>>
>>> Thanks
>>> Deepak Sheoran
>>>
>>>
>>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>>
>

From holland at eaglegenomics.com Mon Apr 12 03:07:55 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 12 Apr 2010 08:07:55 +0100
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To:
References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com>
Message-ID:

Incidentally, BioJava's approach matches the description in the BioSQL docs at:

http://biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME

(first example SQL statement - find the taxon id of the parent taxon for 'Homo sapiens' using a self-join)

The BioPerl/BioSQL load_ncbi_taxonomy.pl script however does not match this description.

cheers,
Richard

On 12 Apr 2010, at 07:57, Richard Holland wrote:

> Thanks Deepak.
>
> I've had a look at the code and I believe its due to the different ways in which BioJava and BioPerl load the taxon table.
> > BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the values from the NCBI taxonomy file. The taxon_id column in BioJava is a meaningless auto-generated value that is never used. > > BioPerl however is generating taxon_id values and linking them by setting parent_taxon_id to the generated value. The parent value from the NCBI taxonomy file is therefore replaced with the BioPerl generated parent ID, meaning that instead of linking from parent_taxon_id to ncbi_taxon_id as per BioJava, the link is to taxon_id instead. (I'm basing this comment on looking at load_ncbi_taxonomy.pl from the BioSQL archives.) > > I believe if you load the taxonomy table using BioJava, you should see BioJava giving correct behaviour. Likewise if you load it using BioPerl, BioPerl will behave correctly. But if you load with one then query with the other, you'll get incorrect results. > > This sounds like a case for discussion on both lists - a matter of standardisation between the two projects. Not quickly/easily solvable for now. > > cheers, > Richard > > On 11 Apr 2010, at 22:08, Deepak Sheoran wrote: > >> I am using same table with biojava and bioperl taxon program and the output I get is below: >> >> Biojava: >> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is >> Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii. >> >> Biojava process of finding names: 11876==>3019252==>50447==>176516==>143975==>48579==>4403==>3609==>276240 (wrong way of doing things) >> >> Bioperl: >> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is >> Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus. 
>> >> Bioperl process of finding names: 11876==>353825==>153057==>327045==>11632 (Right way of doing things) >> >> Hint: biojava search ncbi_taxon_id column with a value from parent_taxon_id where bioperl search taxon_id column with a value from parent_taxon_id. >> >> Taxon and Taxon_name Table content which is being relevant in discussion: >> >> taxon_id ncbi_taxon_id parent_taxon_id node_rank name name_class >> 2901 3609 276240 genus Rhamnus scientific name >> 3610 4403 3609 species Platanus occidentalis scientific name >> 29052 48579 4403 species Suillus placidus scientific name >> 114412 143975 48579 species Diadasia australis scientific name >> 143976 176516 143975 species Arnicastrum guerrerense scientific name >> 30680 50447 176516 family Labiduridae scientific name >> 254757 301952 50447 varietas Oreostemma alpigenum var. haydenii scientific name >> 9394 11632 17394 family Retroviridae scientific name >> 277861 327045 9394 subfamily Orthoretrovirinae scientific name >> 122448 153057 277861 genus Alpharetrovirus scientific name >> 301952 353825 122448 no rank unclassified Alpharetrovirus scientific name >> 9584 >> 11876 >> 301952 >> species >> Avian sarcoma virus >> scientifice name >> >> Thanks >> Deepak >> >> On 4/11/2010 2:53 PM, Richard Holland wrote: >>> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead). >>> >>> thanks, >>> Richard >>> >>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote: >>> >>> >>> >>>> Hi, >>>> >>>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it. 
>>>>
>>>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>>>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
>>>>
>>>> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>>>>
>>>> Thanks
>>>> Deepak Sheoran
>>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Operations and Delivery Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>>
>>
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Mon Apr 12 02:57:57 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 12 Apr 2010 07:57:57 +0100
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To: <4BC23A46.7090304@gmail.com>
References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com>
Message-ID:

Thanks Deepak.

I've had a look at the code and I believe it's due to the different ways in which BioJava and BioPerl load the taxon table.

BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the values from the NCBI taxonomy file. The taxon_id column in BioJava is a meaningless auto-generated value that is never used.

BioPerl however is generating taxon_id values and linking them by setting parent_taxon_id to the generated value. The parent value from the NCBI taxonomy file is therefore replaced with the BioPerl generated parent ID, meaning that instead of linking from parent_taxon_id to ncbi_taxon_id as per BioJava, the link is to taxon_id instead. (I'm basing this comment on looking at load_ncbi_taxonomy.pl from the BioSQL archives.)

I believe if you load the taxonomy table using BioJava, you should see BioJava giving correct behaviour. Likewise if you load it using BioPerl, BioPerl will behave correctly. But if you load with one then query with the other, you'll get incorrect results.

This sounds like a case for discussion on both lists - a matter of standardisation between the two projects. Not quickly/easily solvable for now.

cheers,
Richard

On 11 Apr 2010, at 22:08, Deepak Sheoran wrote:

> I am using same table with biojava and bioperl taxon program and the output I get is below:
>
> Biojava:
> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is
> Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii.
>
> Biojava process of finding names: 11876==>3019252==>50447==>176516==>143975==>48579==>4403==>3609==>276240 (wrong way of doing things)
>
> Bioperl:
> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is
> Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.
>
> Bioperl process of finding names: 11876==>353825==>153057==>327045==>11632 (Right way of doing things)
>
> Hint: biojava search ncbi_taxon_id column with a value from parent_taxon_id where bioperl search taxon_id column with a value from parent_taxon_id.
> > Taxon and Taxon_name Table content which is being relevant in discussion: > > taxon_id ncbi_taxon_id parent_taxon_id node_rank name name_class > 2901 3609 276240 genus Rhamnus scientific name > 3610 4403 3609 species Platanus occidentalis scientific name > 29052 48579 4403 species Suillus placidus scientific name > 114412 143975 48579 species Diadasia australis scientific name > 143976 176516 143975 species Arnicastrum guerrerense scientific name > 30680 50447 176516 family Labiduridae scientific name > 254757 301952 50447 varietas Oreostemma alpigenum var. haydenii scientific name > 9394 11632 17394 family Retroviridae scientific name > 277861 327045 9394 subfamily Orthoretrovirinae scientific name > 122448 153057 277861 genus Alpharetrovirus scientific name > 301952 353825 122448 no rank unclassified Alpharetrovirus scientific name > 9584 > 11876 > 301952 > species > Avian sarcoma virus > scientifice name > > Thanks > Deepak > > On 4/11/2010 2:53 PM, Richard Holland wrote: >> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead). >> >> thanks, >> Richard >> >> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote: >> >> >> >>> Hi, >>> >>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it. >>> >>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue) >>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. 
The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id.
>>>
>>> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>>>
>>> Thanks
>>> Deepak Sheoran
>>>
>> --
>> Richard Holland, BSc MBCS
>> Operations and Delivery Director, Eagle Genomics Ltd
>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>> http://www.eaglegenomics.com/
>>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From trevor.paterson at roslin.ed.ac.uk Tue Apr 13 07:41:01 2010
From: trevor.paterson at roslin.ed.ac.uk (trevor paterson (RI))
Date: Tue, 13 Apr 2010 12:41:01 +0100
Subject: [Biojava-dev] Biojava3 structure
In-Reply-To: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com>
Message-ID: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk>

Andreas

I am trying to do an anonymous checkout of the whole bio-java 3 trunk and it is failing on the structure module.

I can't even do a copy command.

The src/main tree seems corrupted, throwing an error:
Error: Decompression of svndiff data failed

Trevor Paterson PhD
new email trevor.paterson at roslin.ed.ac.uk

Bioinformatics
The Roslin Institute
The Royal (Dick) School of Veterinary Studies
University of Edinburgh
Scotland EH25 9PS
phone +44 (0)131 5274197
http://www.roslin.ed.ac.uk
http://www.resspecies.org
http://www.thearkdb.org

Please consider the environment before printing this e-mail

The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336
Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.
> -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of > Andreas Prlic > Sent: 29 March 2010 03:03 > To: Scooter Willis > Cc: biojava-dev > Subject: Re: [Biojava-dev] Biojava3 structure > > Hi Scooter, > > at the present the structure modules depend on the alignment > module and on the (old) core module. This is for aligning > ATOM and SEQRES residues in the PDB files, and for the Smith > Waterman alignment based 3D structure superposition. If we > target a release of biojava 3 in about a month, I don't think > it will be possible to break this out, mainly because the > alignment module is still based on the biojava 1 code base. > Overall I think that the core module probably should still be > part of the BioJava 3 release. Any opinions on that? > > Andreas > > On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis > wrote: > > > Andreas > > > > I needed to do some work with a PDB file so started to use the > > structure library. It looks like it depends on all the old biojava > > code. Mainly the structure exceptions that extend > bioexception is the > > first thing tripping me up. Should the biojava3-structure > module have > > any external dependencies or am I working with the wrong package? > > > > Thanks > > > > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From andreas at sdsc.edu Tue Apr 13 10:04:20 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 13 Apr 2010 07:04:20 -0700 Subject: [Biojava-dev] Biojava3 structure In-Reply-To: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk> References: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com> <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk> Message-ID: Hi Trevor, I can confirm the same behaviour from our anonymous SVN. 
Developer SVN seems to be ok and I also ran an svnadmin verify without problems. I suppose we are having issues with the anonymous SVN server again... I'll ask the OBF helpdesk to take another look ...

Can you try and let us know if checkout from svn/git from github works for you in the meantime?
e.g.
svn co http://svn.github.com/biojava/biojava.git ./biojava

Thanks,
Andreas

On Tue, Apr 13, 2010 at 4:41 AM, trevor paterson (RI) wrote:
> Andreas
>
> I am trying to do an anonymous checkout of the whole bio-java 3 trunk and it is failing on the structure module
>
> I can't even do a copy command
>
> the src/main tree seems corrupted - throwing an error
> Error: Decompression of svndiff data failed
>
> Trevor Paterson PhD
> new email trevor.paterson at roslin.ed.ac.uk
>
> Bioinformatics
> The Roslin Institute
> The Royal (Dick) School of Veterinary Studies
> University of Edinburgh
> Scotland EH25 9PS
> phone +44 (0)131 5274197
> http://www.roslin.ed.ac.uk
> http://www.resspecies.org
> http://www.thearkdb.org
> Please consider the environment before printing this e-mail
>
> The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336
> Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.
>
>> -----Original Message-----
>> From: biojava-dev-bounces at lists.open-bio.org
>> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of
>> Andreas Prlic
>> Sent: 29 March 2010 03:03
>> To: Scooter Willis
>> Cc: biojava-dev
>> Subject: Re: [Biojava-dev] Biojava3 structure
>>
>> Hi Scooter,
>>
>> at the present the structure modules depend on the alignment
>> module and on the (old) core module. This is for aligning
>> ATOM and SEQRES residues in the PDB files, and for the Smith
>> Waterman alignment based 3D structure superposition.
If we >> target a release of biojava 3 in about a month, I don't think >> it will be possible to break this out, mainly because the >> alignment module is still based on the biojava 1 code base. >> Overall I think that the core module probably should still be >> part of the BioJava 3 release. Any opinions on that? >> >> Andreas >> >> On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis >> wrote: >> >> > Andreas >> > >> > I needed to do some work with a PDB file so started to use the >> > structure library. It looks like it depends on all the old biojava >> > code. Mainly the structure exceptions that extend >> bioexception is the >> > first thing tripping me up. Should the biojava3-structure >> module have >> > any external dependencies or am I working with the wrong package? >> > >> > Thanks >> > >> > Scooter >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From biopython at maubp.freeserve.co.uk Thu Apr 15 13:54:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Apr 2010 18:54:56 +0100 Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class In-Reply-To: References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com> Message-ID: Hi, I've CC'd this to the BioSQL mailing list for cross project discussion. On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland wrote: > Thanks Deepak. > > I've had a look at the code and I believe its due to the > different ways in which BioJava and BioPerl load the > taxon table. > > BioJava sets the ncbi_taxon_id and parent_taxon_id > columns based on the values from the NCBI taxonomy > file. 
The taxon_id column in BioJava is a meaningless > auto-generated value that is never used. > > BioPerl however is generating taxon_id values and > linking them by setting parent_taxon_id to the > generated value. The parent value from the NCBI > taxonomy file is therefore replaced with the BioPerl > generated parent ID, meaning that instead of linking > from parent_taxon_id to ncbi_taxon_id as per BioJava, > the link is to taxon_id instead. (I'm basing this > comment on looking at load_ncbi_taxonomy.pl from > the BioSQL archives.) Note that old versions of load_ncbi_taxonomy.pl (which is part of BioSQL, not part of BioPerl) would set taxon_id equal to ncbi_taxon_id, see: http://bugzilla.open-bio.org/show_bug.cgi?id=2470 This may help explain the confusion. > I believe if you load the taxonomy table using BioJava, > you should see BioJava giving correct behaviour. > Likewise if you load it using BioPerl, BioPerl will > behave correctly. But if you load with one then query > with the other, you'll get incorrect results. > > This sounds like a case for discussion on both lists - > a matter of standardisation between the two projects. > Not quickly/easily solvable for now. Its not just two projects (BioPerl & BioJava) (grin). Its at least five projects (BioSQL itself plus BioRuby and Biopython). I'm not sure about BioRuby's implementation, but currently I think BioJava is the odd one out - BioPerl, Biopython, and the BioSQL's load_ncbi_taxonomy.pl all make entries in parent_taxon_id reference the automatically generated taxon_id (please correct me if I am wrong). My personal view is that bioperl-db is the reference implementation and should be followed in the event of any ambiguity within BioSQL. In this particular case, there is actually a BioSQL script to check against too (load_ncbi_taxonomy.pl). Hopefully Hilmar can give us an official verdict... 
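To make the two conventions concrete, here is a minimal Java sketch that walks the lineage of Avian sarcoma virus using the sample rows Deepak posted earlier in this thread. It is not BioJava, BioPerl or BioSQL code: the class TaxonParentSketch, the method lineage and the flag parentIsTaxonId are made up for illustration, and the two lookups only mirror the two behaviours as they are described in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: hypothetical names, not part of BioJava or BioSQL. */
class TaxonParentSketch {

    /** One taxon row joined with its scientific name. */
    static final class Taxon {
        final int taxonId;
        final int ncbiTaxonId;
        final int parentTaxonId;
        final String name;

        Taxon(int taxonId, int ncbiTaxonId, int parentTaxonId, String name) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
            this.name = name;
        }
    }

    // Rows copied from the sample table posted in this thread.
    static final List<Taxon> ROWS = Arrays.asList(
        new Taxon(2901, 3609, 276240, "Rhamnus"),
        new Taxon(3610, 4403, 3609, "Platanus occidentalis"),
        new Taxon(29052, 48579, 4403, "Suillus placidus"),
        new Taxon(114412, 143975, 48579, "Diadasia australis"),
        new Taxon(143976, 176516, 143975, "Arnicastrum guerrerense"),
        new Taxon(30680, 50447, 176516, "Labiduridae"),
        new Taxon(254757, 301952, 50447, "Oreostemma alpigenum var. haydenii"),
        new Taxon(9394, 11632, 17394, "Retroviridae"),
        new Taxon(277861, 327045, 9394, "Orthoretrovirinae"),
        new Taxon(122448, 153057, 277861, "Alpharetrovirus"),
        new Taxon(301952, 353825, 122448, "unclassified Alpharetrovirus"),
        new Taxon(9584, 11876, 301952, "Avian sarcoma virus"));

    /**
     * Follows parent_taxon_id upwards from the row with the given taxon_id.
     * If parentIsTaxonId is true, parent_taxon_id is matched against taxon_id;
     * otherwise it is matched against ncbi_taxon_id.
     * The walk stops when no matching row is present in ROWS.
     */
    static List<String> lineage(int startTaxonId, boolean parentIsTaxonId) {
        Map<Integer, Taxon> byTaxonId = new HashMap<>();
        Map<Integer, Taxon> byNcbiTaxonId = new HashMap<>();
        for (Taxon t : ROWS) {
            byTaxonId.put(t.taxonId, t);
            byNcbiTaxonId.put(t.ncbiTaxonId, t);
        }
        Map<Integer, Taxon> parentIndex = parentIsTaxonId ? byTaxonId : byNcbiTaxonId;

        List<String> names = new ArrayList<>();
        Taxon current = byTaxonId.get(startTaxonId);
        while (current != null) {
            names.add(current.name);
            current = parentIndex.get(current.parentTaxonId);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("parent_taxon_id -> taxon_id:      " + lineage(9584, true));
        System.out.println("parent_taxon_id -> ncbi_taxon_id: " + lineage(9584, false));
    }
}
```

With these rows, matching against taxon_id climbs through the Alpharetrovirus entries to Retroviridae, while matching against ncbi_taxon_id wanders from the virus into plants, fungi and insects. That is the symptom of loading the table with one convention and querying it with the other.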
Peter

From andreas at sdsc.edu Fri Apr 16 13:39:37 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 16 Apr 2010 10:39:37 -0700
Subject: [Biojava-dev] Biojava3-genetics
In-Reply-To: <4BC806F4.3090302@wur.nl>
References: <4BC806F4.3090302@wur.nl>
Message-ID:

Hi Richard,

any contribution is welcome. What do you have in mind in particular? Perhaps there is already something there along those lines...

Andreas

On Thu, Apr 15, 2010 at 11:43 PM, Richard Finkers wrote:

> Dear List,
>
> I would be interested in adding a module for genetic analysis to the
> biojava3 project. Are there others who are interested in this as well and
> with whom should I discuss this further?
>
> Thanks,
> Richard
>
>
> --
> Dr. Richard Finkers
> Researcher Plant Breeding
> Wageningen UR Plant Breeding
> P.O. Box 16, 6700 AA, Wageningen, The Netherlands
> Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB
> Wageningen, The Netherlands
> Tel. +31-317-484165 Fax +31-317-418094
> http://www.plantbreeding.wur.nl/
> https://www.eu-sol.wur.nl/
> https://cbsgdbase.wur.nl/
> http://solgenomics.wur.nl/
> http://www.disclaimer-uk.wur.nl/
>
>

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From sheoran143 at gmail.com Fri Apr 16 14:43:59 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 16 Apr 2010 13:43:59 -0500
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To:
References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com>
Message-ID: <4BC8AFEF.70107@gmail.com>

In my experience, we should make use of taxon_id here, because it is a unique key within a local instance of BioSQL.
ncbi_taxon_id should be used only for mapping, so that a person can map a local taxon_id to an ncbi_taxon_id; otherwise it defeats the whole purpose of having taxon_id as the primary key of the taxon table. I think the main goal when BioSQL was designed was to make it independent of any other organization such as GenBank or NCBI, while still offering, as a feature, a way to map a number given by a known authority (ncbi_taxon_id) to a local number (taxon_id).

Deepak Sheoran

On 4/15/2010 12:54 PM, Peter wrote:
> Hi,
>
> I've CC'd this to the BioSQL mailing list for cross project
> discussion.
>
> On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland wrote:
>
>> Thanks Deepak.
>>
>> I've had a look at the code and I believe it's due to the
>> different ways in which BioJava and BioPerl load the
>> taxon table.
>>
>> BioJava sets the ncbi_taxon_id and parent_taxon_id
>> columns based on the values from the NCBI taxonomy
>> file. The taxon_id column in BioJava is a meaningless
>> auto-generated value that is never used.
>>
>> BioPerl however is generating taxon_id values and
>> linking them by setting parent_taxon_id to the
>> generated value. The parent value from the NCBI
>> taxonomy file is therefore replaced with the BioPerl
>> generated parent ID, meaning that instead of linking
>> from parent_taxon_id to ncbi_taxon_id as per BioJava,
>> the link is to taxon_id instead. (I'm basing this
>> comment on looking at load_ncbi_taxonomy.pl from
>> the BioSQL archives.)
>>
> Note that old versions of load_ncbi_taxonomy.pl
> (which is part of BioSQL, not part of BioPerl) would
> set taxon_id equal to ncbi_taxon_id, see:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470
>
> This may help explain the confusion.
>
>
>> I believe if you load the taxonomy table using BioJava,
>> you should see BioJava giving correct behaviour.
>> Likewise if you load it using BioPerl, BioPerl will
>> behave correctly. But if you load with one then query
>> with the other, you'll get incorrect results.
>>
>> This sounds like a case for discussion on both lists -
>> a matter of standardisation between the two projects.
>> Not quickly/easily solvable for now.
>>
> It's not just two projects (BioPerl & BioJava) (grin).
> It's at least five projects (BioSQL itself plus BioRuby
> and Biopython).
>
> I'm not sure about BioRuby's implementation, but
> currently I think BioJava is the odd one out - BioPerl,
> Biopython, and the BioSQL's load_ncbi_taxonomy.pl
> all make entries in parent_taxon_id reference the
> automatically generated taxon_id (please correct
> me if I am wrong).
>
> My personal view is that bioperl-db is the reference
> implementation and should be followed in the event
> of any ambiguity within BioSQL. In this particular
> case, there is actually a BioSQL script to check
> against too (load_ncbi_taxonomy.pl).
>
> Hopefully Hilmar can give us an official verdict...
>
> Peter
>

From sylvain.foisy at diploide.net Sat Apr 17 10:00:07 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Sat, 17 Apr 2010 10:00:07 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
Message-ID: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>

Hi,

Again, I feel stupid asking these newbie questions... I finally got my hands on a new MacBook Pro and am re-installing the apps to get stuff moving. As usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a checkout of the developer's tree.

I have installed the latest Subversion and Maven plugins. When I want to create a new project, I try the following:

1) I right click to select "New > Other..."
in the Navigator panel;

2) I select "SVN > Project from SVN", which leads me to a window where the location of the developer's tree is in svn+ssh; in the window that comes up next, I use this URL to get the "Finish" button activated:

svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk

3) After that, I choose the "Check out as a project configured using the New Project Wizard", which pops up the window where I select "Maven > Maven Project".

4) I get a "New Maven Project" window where I select the default. The window then changes to a "Select archetype" where I also use the default selections.

5) This is where I can't seem to be moving forward... The window that pops up asks me for an Artifact ID. I am clueless about what to put... The process stops there :-(

Maven is probably a cool tool but its learning curve is pretty steep... Shouldn't all this be automatic after "Maven > Maven Project"?

Thanks in advance. I'll put the solution into the wiki ;-)

Sylvain

===================================================================

Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics
Diploide.net - TI pour la vie / IT for Life

Courriel: sylvain.foisy at diploide.net
Web: http://www.diploide.net

===================================================================

From heuermh at acm.org Sun Apr 18 23:33:00 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 18 Apr 2010 23:33:00 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
Message-ID:

Sylvain Foisy wrote:

> Again, I feel stupid asking these newbie questions... I finally got my hands
> on a new MacBook Pro and am re-installing the apps to get stuff moving. As
> usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a
> checkout of the developer's tree.
>
> I have installed the latest Subversion and Maven plugins.
When I want to > create a new project, I try the following: > > 1) I right click to select "New > Other..." in the Navigator panel; > > 2) I select "SVN > Project from SVN", which leads me to a window where the > location of the developer's tree is in svn+ssh; in the window that comes > up next, I use this URL to get the "Finish" button activated: > > svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk > > 3) After that, I choose the "Check out as a project configured using the New > Project Wizard", which pop the window where I select "Maven > Maven > Project". > > 4) I get a "New Maven Project" window where I select the default. The window > then changes to a "Select archetype" where I also use the default > selections. This last step doesn't sound right, here Eclipse is creating a brand new Maven project for you instead of creating a Maven-based project from the metadata already in subversion. In the SVN window you should see "Check out as Maven Project" when you right-click, unless that has changed with newer versions of the maven plugin. michael From andreas at sdsc.edu Mon Apr 19 00:17:17 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 18 Apr 2010 21:17:17 -0700 Subject: [Biojava-dev] Eclipse + maven woes... In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net> References: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net> Message-ID: Hi Sylvain, The place to start the checkout in eclipse is the SVN repository browser. There you can do a right-click on the biojava/trunk folder and check out as a Maven project. Andreas On Sat, Apr 17, 2010 at 7:00 AM, Sylvain Foisy wrote: > Hi, > > Again, I feel stupid asking these newbie questions... I finally got my and > on a new MacBook Pro and re-installing the apps to get stuff moving. As > usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a > checkout of the developer's tree. 
> > I have installed the latest Subversion and Maven plugins. When I want to > create a new project, I try the following: > > 1) I right click to select "New > Other..." in the Navigator panel; > > 2) I select "SVN > Project from SVN", which leads me to a window where the > location of the developer's tree is in svn+ssh; in the window that comes > up next, I use this URL to get the "Finish" button activated: > > svn+ssh:// > dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk > > 3) After that, I choose the "Check out as a project configured using the > New > Project Wizard", which pop the window where I select "Maven > Maven > Project". > > 4) I get a "New Maven Project" window where I select the default. The > window > then changes to a "Select archetype" where I also use the default > selections. > > 5) This is where I can't seem to be moving forward... The window that pops > out ask me for an Artefact ID. I am clueless about what to put... The > process stops there :-( > > Maven is probably a cool tool but its learning curve is pretty steep... > Shouldn't all this be automatic after "Maven > Maven Project" > > Thanks in advance. I'll put the solution into the wiki ;-) > > Sylvain > > =================================================================== > > Sylvain Foisy, Ph. D. > Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > > =================================================================== > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From sylvain.foisy at diploide.net Mon Apr 19 16:47:43 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Mon, 19 Apr 2010 16:47:43 -0400
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To:
Message-ID:

Hi Andreas,

I finally got something working but it wasn't automatic... Switching to the SVN Repositories perspective, I right-clicked on trunk and selected "Checkout..." After downloading the code, I had to right-click the biojava-live project that was now found in the Java Browsing perspective and select "m2 Maven > Enable Dependency Management" to have it working.

If I tried the "Check out as..." option, I would have a window popping up with "Check out Maven projects with SCM" pre-selected and I would be stuck in the Group ID/Artifact ID mayhem.

Thanks for the time. Back to coding ;-)

Sylvain

On 19/04/10 00:17, "[NAME]" <[ADDRESS]> wrote:

> Hi Sylvain,
>
> The place to start the checkout in eclipse is the SVN repository browser.
> There you can do a right-click on the biojava/trunk folder and check out as a
> Maven project.
>
> Andreas
>
> On Sat, Apr 17, 2010 at 7:00 AM, Sylvain Foisy
> wrote:
>> Hi,
>>
>> Again, I feel stupid asking these newbie questions... I finally got my hands
>> on a new MacBook Pro and am re-installing the apps to get stuff moving. As
>> usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a
>> checkout of the developer's tree.
>>
>> I have installed the latest Subversion and Maven plugins. When I want to
>> create a new project, I try the following:
>>
>> 1) I right click to select "New > Other..."
>> in the Navigator panel;
>>
>> 2) I select "SVN > Project from SVN", which leads me to a window where the
>> location of the developer's tree is in svn+ssh; in the window that comes
>> up next, I use this URL to get the "Finish" button activated:
>>
>> svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk
>>
>>
>> 3) After that, I choose the "Check out as a project configured using the New
>> Project Wizard", which pops up the window where I select "Maven > Maven
>> Project".
>>
>> 4) I get a "New Maven Project" window where I select the default. The window
>> then changes to a "Select archetype" where I also use the default
>> selections.
>>
>> 5) This is where I can't seem to be moving forward... The window that pops
>> up asks me for an Artifact ID. I am clueless about what to put... The
>> process stops there :-(
>>
>> Maven is probably a cool tool but its learning curve is pretty steep...
>> Shouldn't all this be automatic after "Maven > Maven Project"?
>>
>> Thanks in advance. I'll put the solution into the wiki ;-)
>>
>> Sylvain
>>
>> ===================================================================
>>
>> Sylvain Foisy, Ph. D.
>> Consultant Bio-informatique / Bioinformatics
>> Diploide.net - TI pour la vie / IT for Life
>>
>> Courriel: sylvain.foisy at diploide.net
>> Web: http://www.diploide.net
>>
>> ===================================================================
>>
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>

From andreas at sdsc.edu Tue Apr 27 01:33:51 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 26 Apr 2010 22:33:51 -0700
Subject: [Biojava-dev] accepted GSoC projects
Message-ID:

Dear all,

Google has released the results for GSoC: Congratulations to Mark Chapman and Jianjiong Gao for having been accepted to work on the MSA and PTM projects for BioJava!
Let's start the "community bonding" process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we are all looking forward to working with you on this during the summer. The mentors and co-mentors will be Peter Rose for the PTM project and Scooter Willis and Kyle Ellrott for the MSA project (and me).

I want to thank all of you who submitted proposals or showed interest in other ways for the Google Summer of Code. We hope you are not too disappointed if your application did not get accepted this time. We had a large number (52) of applications and the overall quality of the submissions was very high. We would like to stay in touch with you and we hope that you are interested in BioJava also beyond the scope of GSoC. There are a number of different ways to contribute: we are always looking for people who provide code and patches to further improve our library, help out with the documentation on the Wiki page, or answer questions on the mailing lists.

Let's all give Mark and Jianjiong a warm welcome to the BioJava community. For those of you who are interested in following the progress of the projects, as usual, the development-related discussions are going to be on the biojava-dev list.

Happy coding!

Andreas

From jianjiong.gao at gmail.com Tue Apr 27 15:13:12 2010
From: jianjiong.gao at gmail.com (Jianjiong Gao)
Date: Tue, 27 Apr 2010 14:13:12 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To:
References:
Message-ID:

Dear Dr. Prlic and Everyone,

Thanks for the warm welcome. I am so glad that I have the chance to work with the BioJava community this summer.

I would like to briefly introduce myself. My name is Jianjiong (JJ) Gao. I am a PhD student in Computer Science at the University of Missouri, Columbia. My studies focus on bioinformatics, specifically computational proteomics and PTMs.
I came across BioJava about two years ago when I was working on a plugin for Cytoscape, and was attracted by the idea of providing a generic Java API for bioinformatics applications. I was thinking maybe someday I could do some coding for BioJava. And now I got the chance :)

Best Regards,
-JJ

On Tue, Apr 27, 2010 at 12:33 AM, Andreas Prlic wrote:
> Dear all,
>
> Google has released the results for GSoC: Congratulations to Mark Chapman
> and Jianjiong Gao for having been accepted to work on the MSA and PTM
> projects for BioJava! Let's start the "community bonding" process (
> http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we all are
> looking forward to work with you on this during the summer. The Mentors and
> co-mentors will be Peter Rose for the PTM and Scooter Willis and Kyle
> Ellrott for the MSA project (and me).
>
> I want to thank all of you who submitted proposals or showed interest in
> other ways for the Google Summer of Code. We hope you are not too
> disappointed if your application did not get accepted this time. We had a
> large number (52) of applications and the overall quality of the
> submissions was very high. We would like to stay in touch with you and we
> hope that you are interested in BioJava also beyond the scope of GSoC. There
> are a number of different ways how to contribute: We are always looking for
> people who provide code and patches to further improve our library, help out
> with the documentation on the Wiki page, or answer questions on the mailing
> lists.
>
> Let's all give Mark and Jianjiong a warm welcome to the BioJava community.
> For those of you who are interested in following the progress of the
> projects, as usually, the development related discussions are going to be on
> the biojava-dev list.
>
> Happy coding!
> > Andreas > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From chapman at cs.wisc.edu Wed Apr 28 00:18:25 2010 From: chapman at cs.wisc.edu (Mark Chapman) Date: Tue, 27 Apr 2010 23:18:25 -0500 Subject: [Biojava-dev] accepted GSoC projects In-Reply-To: References: Message-ID: <4BD7B711.9090108@cs.wisc.edu> Hi all, Thank you to Google, Open Bioinformatics Foundation, BioJava, and my mentors for this opportunity. As a short introduction, I am Mark Chapman, a graduate student in Computer Sciences at the University of Wisconsin - Madison. My focus is in artificial intelligence and bioinformatics. This summer, I will add a Multiple Sequence Alignment module to BioJava. My first task will be to update the alignment module to BioJava3 and to design the interface for MSA. My second goal is to implement a progressive MSA styled after clustalw. After that, I will add alternative routines for each step. Any ideas for the MSA project as well as more sources of programming wisdom are quite welcome. For example, Andreas suggested a series about Java parallelism and lazy execution (http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/). I also noted a useful tip for iterative development (http://en.flossmanuals.net/GSoCMentoring/Workflow). Thanks again, Mark On 4/27/2010 12:33 AM, Andreas Prlic wrote: > Dear all, > > Google has released the results for GSoC: Congratulations to Mark > Chapman and Jianjiong Gao for having been accepted to work on the MSA > and PTM projects for BioJava! Let's start the "community bonding" > process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we > all are looking forward to work with you on this during the summer. The > Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis > and Kyle Ellrott for the MSA project (and me). 
> > I want to thank all of of you who submitted proposals or showed interest > in other ways for the Google Summer of Code. We hope you are not too > disappointed if your application did not get accepted this time. We had > a large number (52) applications and the the overall quality of the > submissions was very high. We would like to stay in touch with you and > we hope that you are interested in BioJava also beyond the scope of > GSoC. There are a number of different ways how to contribute: We are > always looking for people who provide code and patches to further > improve our library, help out with the documentation on the Wiki page, > or answer questions on the mailing lists. > > Let's all give Mark and Jianjiong a warm welcome to the BioJava > community. For those of you who are interested in following the > progress of the projects, as usually, the development related > discussions are going to be on the biojava-dev list. > > Happy coding! > > Andreas > > From andreas at sdsc.edu Wed Apr 28 13:31:58 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 28 Apr 2010 10:31:58 -0700 Subject: [Biojava-dev] accepted GSoC projects In-Reply-To: <4BD7B711.9090108@cs.wisc.edu> References: <4BD7B711.9090108@cs.wisc.edu> Message-ID: > Any ideas for the MSA project as well as more sources of programming wisdom > are quite welcome. For example, Andreas suggested a series about Java > parallelism and lazy execution ( > http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/). > credits for the links go to Scooter, who recommended those ;-) My general recommendation is to read Joshua Bloch's "Effective Java". http://java.sun.com/docs/books/effective/ It is a collection of rules that should help in avoiding some frequently made mistakes... Andreas > I also noted a useful tip for iterative development ( > http://en.flossmanuals.net/GSoCMentoring/Workflow). 
> > Thanks again, > Mark > > > > On 4/27/2010 12:33 AM, Andreas Prlic wrote: > >> Dear all, >> >> Google has released the results for GSoC: Congratulations to Mark >> Chapman and Jianjiong Gao for having been accepted to work on the MSA >> and PTM projects for BioJava! Let's start the "community bonding" >> process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we >> all are looking forward to work with you on this during the summer. The >> Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis >> and Kyle Ellrott for the MSA project (and me). >> >> I want to thank all of of you who submitted proposals or showed interest >> in other ways for the Google Summer of Code. We hope you are not too >> disappointed if your application did not get accepted this time. We had >> a large number (52) applications and the the overall quality of the >> submissions was very high. We would like to stay in touch with you and >> we hope that you are interested in BioJava also beyond the scope of >> GSoC. There are a number of different ways how to contribute: We are >> always looking for people who provide code and patches to further >> improve our library, help out with the documentation on the Wiki page, >> or answer questions on the mailing lists. >> >> Let's all give Mark and Jianjiong a warm welcome to the BioJava >> community. For those of you who are interested in following the >> progress of the projects, as usually, the development related >> discussions are going to be on the biojava-dev list. >> >> Happy coding! >> >> Andreas >> >> >> -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From HWillis at scripps.edu Wed Apr 28 13:57:14 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Wed, 28 Apr 2010 13:57:14 -0400 Subject: [Biojava-dev] accepted GSoC projects In-Reply-To: References: <4BD7B711.9090108@cs.wisc.edu> Message-ID: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu> Andreas Those links were sent to me by Mark Southern who sits a couple doors down and a past BioJava contributor for the sequence viewer. We should avoid bringing in any external parallel frameworks but at minimum give ourselves enough abstraction with a backend multi-threaded job-processing approach to take advantage of a multi-processor box and a cluster via Terracotta. If the abstraction of the jobs and the mapping of resources is generic enough then that allows different implementations in various cluster environments for those who have found the next best thing in parallel computing! Scooter On Apr 28, 2010, at 1:31 PM, Andreas Prlic wrote: >> Any ideas for the MSA project as well as more sources of programming wisdom >> are quite welcome. For example, Andreas suggested a series about Java >> parallelism and lazy execution ( >> http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/). >> > > > credits for the links go to Scooter, who recommended those ;-) My general > recommendation is to read Joshua Bloch's "Effective Java". > http://java.sun.com/docs/books/effective/ It is a collection of rules that > should help in avoiding some frequently made mistakes... > > Andreas > > > > > > >> I also noted a useful tip for iterative development ( >> http://en.flossmanuals.net/GSoCMentoring/Workflow). 
>> >> Thanks again, >> Mark >> >> >> >> On 4/27/2010 12:33 AM, Andreas Prlic wrote: >> >>> Dear all, >>> >>> Google has released the results for GSoC: Congratulations to Mark >>> Chapman and Jianjiong Gao for having been accepted to work on the MSA >>> and PTM projects for BioJava! Let's start the "community bonding" >>> process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we >>> all are looking forward to work with you on this during the summer. The >>> Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis >>> and Kyle Ellrott for the MSA project (and me). >>> >>> I want to thank all of of you who submitted proposals or showed interest >>> in other ways for the Google Summer of Code. We hope you are not too >>> disappointed if your application did not get accepted this time. We had >>> a large number (52) applications and the the overall quality of the >>> submissions was very high. We would like to stay in touch with you and >>> we hope that you are interested in BioJava also beyond the scope of >>> GSoC. There are a number of different ways how to contribute: We are >>> always looking for people who provide code and patches to further >>> improve our library, help out with the documentation on the Wiki page, >>> or answer questions on the mailing lists. >>> >>> Let's all give Mark and Jianjiong a warm welcome to the BioJava >>> community. For those of you who are interested in following the >>> progress of the projects, as usually, the development related >>> discussions are going to be on the biojava-dev list. >>> >>> Happy coding! >>> >>> Andreas >>> >>> >>> > > > -- > ----------------------------------------------------------------------- > Dr. 
Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From quantum7 at gmail.com Wed Apr 28 15:06:40 2010 From: quantum7 at gmail.com (Spencer Bliven) Date: Wed, 28 Apr 2010 12:06:40 -0700 Subject: [Biojava-dev] accepted GSoC projects In-Reply-To: <4BD7B711.9090108@cs.wisc.edu> References: <4BD7B711.9090108@cs.wisc.edu> Message-ID: Mark- Welcome to the Biojava community! Adding multiple sequence alignments will be a nice feature for the library. One suggestion I have is to make any data structures for multiple alignments you create as general as possible, and to think about whether the special cases can still be represented. For instance, can you store an alignment where some of the sequence is unknown (eg {ABCD, ABXD})? Can you store an alignment where only a subset of the sequences are defined? I recently had to represent an alignment like this: ABCD EFGH EFGH ABCD This sort of alignment can't be written using just gaps; I had to make a new structure to store pairs {(A,A), (B,B), ...} and rewrite much of the existing alignment functionality based on that. Anyway, I don't mean to get bogged down in specific examples or exceptions. I just wanted to point out that there are a lot of methods which can be used to define some sort of alignment between a set of sequences, and it would be nice if the BioJava alignment package was general enough to accommodate such methods in the future without reinventing the wheel. Cheers! Spencer P.S. I ran into such weird alignments while working on structural alignments, which are not well behaved like traditional multiple sequence alignments. 
Andreas knows all about both types of alignment, and can probably judge better than I how much generality is worth spending your time on. On Tue, Apr 27, 2010 at 9:18 PM, Mark Chapman wrote: > Hi all, > > Thank you to Google, Open Bioinformatics Foundation, BioJava, and my > mentors for this opportunity. As a short introduction, I am Mark Chapman, a > graduate student in Computer Sciences at the University of Wisconsin - > Madison. My focus is in artificial intelligence and bioinformatics. This > summer, I will add a Multiple Sequence Alignment module to BioJava. > > My first task will be to update the alignment module to BioJava3 and to > design the interface for MSA. My second goal is to implement a progressive > MSA styled after clustalw. After that, I will add alternative routines for > each step. > > Any ideas for the MSA project as well as more sources of programming wisdom > are quite welcome. For example, Andreas suggested a series about Java > parallelism and lazy execution ( > http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/). > I also noted a useful tip for iterative development ( > http://en.flossmanuals.net/GSoCMentoring/Workflow). > > Thanks again, > Mark > > > > On 4/27/2010 12:33 AM, Andreas Prlic wrote: > >> Dear all, >> >> Google has released the results for GSoC: Congratulations to Mark >> Chapman and Jianjiong Gao for having been accepted to work on the MSA >> and PTM projects for BioJava! Let's start the "community bonding" >> process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we >> all are looking forward to work with you on this during the summer. The >> Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis >> and Kyle Ellrott for the MSA project (and me). >> >> I want to thank all of of you who submitted proposals or showed interest >> in other ways for the Google Summer of Code. We hope you are not too >> disappointed if your application did not get accepted this time. 
We had
>> a large number (52) applications and the the overall quality of the
>> submissions was very high. We would like to stay in touch with you and
>> we hope that you are interested in BioJava also beyond the scope of
>> GSoC. There are a number of different ways how to contribute: We are
>> always looking for people who provide code and patches to further
>> improve our library, help out with the documentation on the Wiki page,
>> or answer questions on the mailing lists.
>>
>> Let's all give Mark and Jianjiong a warm welcome to the BioJava
>> community. For those of you who are interested in following the
>> progress of the projects, as usually, the development related
>> discussions are going to be on the biojava-dev list.
>>
>> Happy coding!
>>
>> Andreas
>>
>>
>> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From chapman at cs.wisc.edu Wed Apr 28 21:09:07 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Wed, 28 Apr 2010 20:09:07 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
References: <4BD7B711.9090108@cs.wisc.edu> <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
Message-ID: <4BD8DC33.7010607@cs.wisc.edu>

Here is a summary of the concurrency lessons I learned that are useful
with or without the functional programming paradigm --
1: implement Callable to submit tasks for concurrent/parallel/lazy execution
   - call() methods just wrap a call to the computation intensive method
2: share a fixed size thread pool with task queue to avoid
   - overhead of thread creation/destruction,
   - too many simultaneous threads, and
   - most blocking issues
3: place thread blocking Future.get() calls within tasks later in the queue
   - while(!Future.isDone()) Thread.yield(); may also help keep the pool active
4: execution in a task queue also enables easier logging and progress listening

There are two obvious places concurrent execution will fit in the MSA module --
1: building the distance matrix
   - queue pairwise alignment/scoring tasks in loop over all sequence pairs
2: progressive alignment
   - queue profile-profile alignment tasks in postfix traversal of guide
     tree (from leaves to root)

All our library copies of "Effective Java" are checked out, so I ordered
a copy for my personal library. The sample chapter on generics sold me.

Mark

On 4/28/2010 12:57 PM, Scooter Willis wrote:
> Andreas
>
> Those links were sent to me by Mark Southern who sits a couple doors down and a past BioJava contributor for the sequence viewer. We should avoid bringing in any external parallel frameworks but at minimum give ourselves enough abstraction with a backend multi-threaded job-processing approach to take advantage of a multi-processor box and a cluster via Terracotta. If the abstraction of the jobs and the mapping of resources is generic enough then that allows different implementations in various cluster environments for those who have found the next best thing in parallel computing!
>
> Scooter
>
> On Apr 28, 2010, at 1:31 PM, Andreas Prlic wrote:
>
>>> Any ideas for the MSA project as well as more sources of programming wisdom
>>> are quite welcome. For example, Andreas suggested a series about Java
>>> parallelism and lazy execution (
>>> http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/).
>>>
>>
>>
>> credits for the links go to Scooter, who recommended those ;-) My general
>> recommendation is to read Joshua Bloch's "Effective Java".
>> http://java.sun.com/docs/books/effective/ It is a collection of rules that
>> should help in avoiding some frequently made mistakes...
>>
>> Andreas
>>
>>
>>
>>
>>
>>
>>> I also noted a useful tip for iterative development (
>>> http://en.flossmanuals.net/GSoCMentoring/Workflow).
>>> >>> Thanks again, >>> Mark >>> >>> >>> >>> On 4/27/2010 12:33 AM, Andreas Prlic wrote: >>> >>>> Dear all, >>>> >>>> Google has released the results for GSoC: Congratulations to Mark >>>> Chapman and Jianjiong Gao for having been accepted to work on the MSA >>>> and PTM projects for BioJava! Let's start the "community bonding" >>>> process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we >>>> all are looking forward to work with you on this during the summer. The >>>> Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis >>>> and Kyle Ellrott for the MSA project (and me). >>>> >>>> I want to thank all of of you who submitted proposals or showed interest >>>> in other ways for the Google Summer of Code. We hope you are not too >>>> disappointed if your application did not get accepted this time. We had >>>> a large number (52) applications and the the overall quality of the >>>> submissions was very high. We would like to stay in touch with you and >>>> we hope that you are interested in BioJava also beyond the scope of >>>> GSoC. There are a number of different ways how to contribute: We are >>>> always looking for people who provide code and patches to further >>>> improve our library, help out with the documentation on the Wiki page, >>>> or answer questions on the mailing lists. >>>> >>>> Let's all give Mark and Jianjiong a warm welcome to the BioJava >>>> community. For those of you who are interested in following the >>>> progress of the projects, as usually, the development related >>>> discussions are going to be on the biojava-dev list. >>>> >>>> Happy coding! >>>> >>>> Andreas >>>> >>>> >>>> >> >> >> -- >> ----------------------------------------------------------------------- >> Dr. 
Andreas Prlic
>> Senior Scientist, RCSB PDB Protein Data Bank
>> University of California, San Diego
>> (+1) 858.246.0526
>> -----------------------------------------------------------------------
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From andreas at sdsc.edu Fri Apr 30 11:29:03 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:29:03 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD8DC33.7010607@cs.wisc.edu>
References: <4BD7B711.9090108@cs.wisc.edu> <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu> <4BD8DC33.7010607@cs.wisc.edu>
Message-ID:

Hi Mark and Jianjiong,

By now you should have received your login info for the development
SVN server. I suggest the following things as next steps:

*) If you have not done so already, sign up to the biojava-l and
biojava-dev mailing lists

*) Get a biojava checkout from the development SVN server.

*) Add the LGPL license javadoc header
http://www.biojava.org/wiki/BioJava3_license to the templates in your IDE.

*) Take a look at the JUnit tests and add a new test for something that is
related to your projects

*) Take a look at the Wiki pages (e.g.
http://www.biojava.org/wiki/BioJava:CookBook ), get an account on the wiki
and improve one of the documentation pages

*) Take a look at the javadocs at
http://www.biojava.org/docs/api/index.html

Andreas

From andreas at sdsc.edu Fri Apr 30 11:44:25 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:44:25 -0700
Subject: [Biojava-dev] biojava SVN
Message-ID:

Hi,

The BioJava SVN has not been fully compiling ever since the Hackathon. I
guess things were quite in flux over the last months and it is now time to make
sure SVN fully compiles again. There are a few things we need to figure out
in order to do that:

* Jar files for libraries that are not in a public Maven repository. Jules:
at some point you indicated that we might be able to get such jar files
hosted by the EBI Maven repository. Do you think that is still a
possibility and could you get a few libraries into that? In particular that
would be Jmol, Astex, and probably one or two other Jar files. That would
make the BioJava checkout process much smoother and not require a developer
to manually install jars for full functionality.

* We have a couple of modules that are fragmented and broken. This is due to
historic leftovers from when we started the re-factoring process. If all the
functionality has been moved into the new biojava3-core module, I would vote
for removing the modules starting with sequence*

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From ayates at ebi.ac.uk Fri Apr 30 11:48:01 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Apr 2010 16:48:01 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To:
References:
Message-ID: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>

Does anyone know how hard it would be to get these into the public Maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on?

Andy

On 30 Apr 2010, at 16:44, Andreas Prlic wrote:

> Hi,
>
> The BioJava SVN has not been fully compiling ever since the Hackathon. I
> guess things were quite in flux the last months and it is now time to make
> sure SVN fully compiles again. There is a few things we need to figure out
> in order for that:
>
> * Jar files for libraries that are not in a public Maven repository. Jules :
> at some point you indicated that we might be able to get such jar files
> hosted by the EBI Maven repository.
Do you think that is still an > possibility and could you get a few libraries into that? In particular that > would be Jmol, Astex, and probably one or two other Jar files. That would > make the BioJava checkout process much smoother and not require a developer > to manually install jars for full functionality. > > * We have a couple of modules that are fragmented and broken. This is due to > historic leftovers from when we started the re-factoring process. If all the > functionality has been moved into the new biojava3-core module, I would vote > for removing the modules starting with sequence* > > Andreas > > > -- > ----------------------------------------------------------------------- > Dr. Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From ayates at ebi.ac.uk Fri Apr 30 11:57:12 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Fri, 30 Apr 2010 16:57:12 +0100 Subject: [Biojava-dev] biojava SVN In-Reply-To: References: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk> Message-ID: <3C3AAC8F-5C03-44C1-B121-7808C0612A65@ebi.ac.uk> As far as I remember you 'can' have one setup manually. I think I offered one hand-developed from one of my projects. Infact FYI: http://code.google.com/p/dbcon/source/browse/#svn/trunk/maven-repo It just requires the correct structure in place & it works. I went for it being hosted in SVN because there's a HTTP interface to it offered by Google. The EBI Maven repo is just a public HTTP directory. 
It's been some years since I did a deployment there but it's not hard to do & we should be able to do it locally & sync it to SVN Andy On 30 Apr 2010, at 16:50, Richard Holland wrote: > Could a small MVN repo be set up at OBF? > > On 30 Apr 2010, at 16:48, Andy Yates wrote: > >> Does anyone know how hard it would be to get these into the public maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on? >> >> Andy >> >> On 30 Apr 2010, at 16:44, Andreas Prlic wrote: >> >>> Hi, >>> >>> The BioJava SVN has not been fully compiling ever since the Hackathon. I >>> guess things were quite in flux the last months and it is now time to make >>> sure SVN fully compiles again. There is a few things we need to figure out >>> in order for that: >>> >>> * Jar files for libraries that are not in a public Maven repository. Jules : >>> at some point you indicated that we might be able to get such jar files >>> hosted by the EBI Maven repository. Do you think that is still an >>> possibility and could you get a few libraries into that? In particular that >>> would be Jmol, Astex, and probably one or two other Jar files. That would >>> make the BioJava checkout process much smoother and not require a developer >>> to manually install jars for full functionality. >>> >>> * We have a couple of modules that are fragmented and broken. This is due to >>> historic leftovers from when we started the re-factoring process. If all the >>> functionality has been moved into the new biojava3-core module, I would vote >>> for removing the modules starting with sequence* >>> >>> Andreas >>> >>> >>> -- >>> ----------------------------------------------------------------------- >>> Dr. 
Andreas Prlic >>> Senior Scientist, RCSB PDB Protein Data Bank >>> University of California, San Diego >>> (+1) 858.246.0526 >>> ----------------------------------------------------------------------- >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> >> -- >> Andrew Yates Ensembl Genomes Engineer >> EMBL-EBI Tel: +44-(0)1223-492538 >> Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 >> Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ >> >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > -- Andrew Yates Ensembl Genomes Engineer EMBL-EBI Tel: +44-(0)1223-492538 Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ From andreas at sdsc.edu Fri Apr 30 12:27:09 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 30 Apr 2010 09:27:09 -0700 Subject: [Biojava-dev] biojava SVN In-Reply-To: References: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk> Message-ID: > Could a small MVN repo be set up at OBF? I am pretty sure we could do that. Anybody volunteering? I can help with getting the necessary permissions... Anybody knows some good docu for how to set this up? Andreas On Fri, Apr 30, 2010 at 8:50 AM, Richard Holland wrote: > Could a small MVN repo be set up at OBF? > > On 30 Apr 2010, at 16:48, Andy Yates wrote: > > > Does anyone know how hard it would be to get these into the public maven > repository? The EBI repo is all well & good but updating it relies on > BioJava always having a committer at the EBI. 
Now I know that is a very > likely statement but is it something we can rely on? > > > > Andy > > > > On 30 Apr 2010, at 16:44, Andreas Prlic wrote: > > > >> Hi, > >> > >> The BioJava SVN has not been fully compiling ever since the Hackathon. I > >> guess things were quite in flux the last months and it is now time to > make > >> sure SVN fully compiles again. There is a few things we need to figure > out > >> in order for that: > >> > >> * Jar files for libraries that are not in a public Maven repository. > Jules : > >> at some point you indicated that we might be able to get such jar files > >> hosted by the EBI Maven repository. Do you think that is still an > >> possibility and could you get a few libraries into that? In particular > that > >> would be Jmol, Astex, and probably one or two other Jar files. That > would > >> make the BioJava checkout process much smoother and not require a > developer > >> to manually install jars for full functionality. > >> > >> * We have a couple of modules that are fragmented and broken. This is > due to > >> historic leftovers from when we started the re-factoring process. If all > the > >> functionality has been moved into the new biojava3-core module, I would > vote > >> for removing the modules starting with sequence* > >> > >> Andreas > >> > >> > >> -- > >> ----------------------------------------------------------------------- > >> Dr. 
Andreas Prlic > >> Senior Scientist, RCSB PDB Protein Data Bank > >> University of California, San Diego > >> (+1) 858.246.0526 > >> ----------------------------------------------------------------------- > >> _______________________________________________ > >> biojava-dev mailing list > >> biojava-dev at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > > > -- > > Andrew Yates Ensembl Genomes Engineer > > EMBL-EBI Tel: +44-(0)1223-492538 > > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > > > > > > > _______________________________________________ > > biojava-dev mailing list > > biojava-dev at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From holland at eaglegenomics.com Fri Apr 30 11:50:52 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 30 Apr 2010 16:50:52 +0100 Subject: [Biojava-dev] biojava SVN In-Reply-To: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk> References: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk> Message-ID: Could a small MVN repo be set up at OBF? On 30 Apr 2010, at 16:48, Andy Yates wrote: > Does anyone know how hard it would be to get these into the public maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on? 
> > Andy > > On 30 Apr 2010, at 16:44, Andreas Prlic wrote: > >> Hi, >> >> The BioJava SVN has not been fully compiling ever since the Hackathon. I >> guess things were quite in flux the last months and it is now time to make >> sure SVN fully compiles again. There is a few things we need to figure out >> in order for that: >> >> * Jar files for libraries that are not in a public Maven repository. Jules : >> at some point you indicated that we might be able to get such jar files >> hosted by the EBI Maven repository. Do you think that is still an >> possibility and could you get a few libraries into that? In particular that >> would be Jmol, Astex, and probably one or two other Jar files. That would >> make the BioJava checkout process much smoother and not require a developer >> to manually install jars for full functionality. >> >> * We have a couple of modules that are fragmented and broken. This is due to >> historic leftovers from when we started the re-factoring process. If all the >> functionality has been moved into the new biojava3-core module, I would vote >> for removing the modules starting with sequence* >> >> Andreas >> >> >> -- >> ----------------------------------------------------------------------- >> Dr. 
Andreas Prlic >> Senior Scientist, RCSB PDB Protein Data Bank >> University of California, San Diego >> (+1) 858.246.0526 >> ----------------------------------------------------------------------- >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > > -- > Andrew Yates Ensembl Genomes Engineer > EMBL-EBI Tel: +44-(0)1223-492538 > Wellcome Trust Genome Campus Fax: +44-(0)1223-494468 > Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/ > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/
From heuermh at acm.org Sat Apr 3 03:23:15 2010 From: heuermh at acm.org (Michael Heuer) Date: Fri, 2 Apr 2010 23:23:15 -0400 (EDT) Subject: [Biojava-dev] BOSC In-Reply-To: Message-ID: Andreas Prlic wrote: > who is going to BOSC this year and who wants to present a BioJava talk? I will be there. I'm probably not the best person to present this time around. michael From aradwen at gmail.com Sat Apr 3 10:18:40 2010 From: aradwen at gmail.com (Radwen Aniba) Date: Sat, 3 Apr 2010 12:18:40 +0200 Subject: [Biojava-dev] Protein sequence composition Message-ID: Hello, I'm writing an application that treats protein sequences, and I am using Biojava for a couple of things. One of these processings is to parse protein multifasta files, and treat the sequences one after the other. One of my purposes is to calculate composition.
By composition I mean that I am interested to know in a given protein sequence what is the mean and the standard deviation composition of these groups : PAGST EDNQ LIVM KRH C example : protein fasta file : >SEQ1 DVSFRLSGATSSSYGVFISNLRKALPNERKLYDIPLLRSSLPGSQRYALI HLTNYADETISVAIDVTNVYIMGYRAGDTSYFFNEASATEAAKYVFKDAM RKVTLPYSGNYERLQTAAGKIRENIPLGLPALDSAITTLFYYNANSAASA LMVLIQSTSEAARYKFIEQQIGKRVDKTFLPSLAIISLENSWSALSKQIQ IASTNNGQFESPVVLINAQNQRVTITNVDAGVVTSNIALLLNRNNMA >SEQ2 IFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVG LPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQED AEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISA LYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAP DPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIP IIALMVYRCAPPPSSQF I would like to 1/ parse SEQ1 to calculate the composition mean of PAGST residues for example ( number of residus/ length of the sequence) 2/ do same thing for SEQ2 3 / return the average mean of both sequences 4/ Return standard deviation of these values. I can do it writing a standard java code, but I would like to know (as I am using biojava already) if this is possible or not ( Which class / instances to use) Cheers From chapman at cs.wisc.edu Sat Apr 3 13:08:23 2010 From: chapman at cs.wisc.edu (Mark Chapman) Date: Sat, 03 Apr 2010 08:08:23 -0500 Subject: [Biojava-dev] Protein sequence composition In-Reply-To: References: Message-ID: <4BB73DC7.2020000@cs.wisc.edu> Hi Radwen, The example below solves most of what you asked for. It may not be the most elegant solution, but it should get you started in the right direction. 
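Steps 3 and 4 of the request (the average and the standard deviation of the per-sequence fractions) do not need any BioJava call. The following is only a minimal plain-Java sketch under stated assumptions: the class name CompositionStats and its two methods are hypothetical and not part of BioJava, and the sample standard deviation (n - 1 denominator) is assumed because the request does not say which variant is wanted. The two values in main are the PAGST fractions that this message reports for SEQ1 and SEQ2.

```java
import java.util.Locale;

/** Hypothetical helper (not a BioJava class): summary statistics over per-sequence fractions. */
final class CompositionStats {

    /** Arithmetic mean of the given values. */
    static double mean(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sample standard deviation (n - 1 denominator, an assumption); 0 for fewer than two values. */
    static double sampleStdDev(double[] values) {
        if (values.length < 2) {
            return 0.0;
        }
        double m = mean(values);
        double sumSquares = 0.0;
        for (double v : values) {
            double d = v - m;
            sumSquares += d * d;
        }
        return Math.sqrt(sumSquares / (values.length - 1));
    }

    public static void main(String[] args) {
        // PAGST fractions of SEQ1 and SEQ2 as reported in this message
        double[] pagst = {0.36032388, 0.3445693};
        System.out.printf(Locale.ROOT, "PAGST mean=%.4f sd=%.4f%n",
                mean(pagst), sampleStdDev(pagst));
    }
}
```

Passing one array per residue group (PAGST, EDNQ, LIVM, KRH, C) gives the per-group mean and deviation; dividing by n instead of n - 1 yields the population standard deviation if that is preferred.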
Saving the proteins into sample.fasta and running the following command:

> java ProteinComposition sample.fasta PAGST EDNQ LIVM KRH C

produces the output:

SEQ1 247: 0.36032388 0.19433199 0.25506073 0.097165994 0.0
SEQ2 267: 0.3445693 0.20224719 0.23595506 0.101123594 0.007490637

Take care,
Mark

-- ProteinComposition.java --

import java.io.*;
import java.util.NoSuchElementException;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.SequenceDB;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.symbol.*;

@SuppressWarnings("deprecation")
public class ProteinComposition {

    /**
     * Determines the composition of proteins in a Fasta file.
     * @param args ...
     *  : file name of the Fasta file
     *  : group(s) of one or more amino acid residues, statistics are printed out for each group
     */
    public static void main(String[] args) {
        try {

            // load Fasta file into memory
            BufferedInputStream is = new BufferedInputStream(new FileInputStream(args[0]));
            Alphabet alpha = AlphabetManager.alphabetForName("PROTEIN");
            SequenceDB db = SeqIOTools.readFasta(is, alpha);

            // load command line arguments into memory
            SymbolList[] res = new SymbolList[args.length-1];
            for (int a = 1; a < args.length; a++)
                res[a-1] = ProteinTools.createProtein(args[a]);

            // store length and composition of each protein
            int[] lengths = new int[db.ids().size()];
            int[][] counts = new int[lengths.length][res.length];
            float[][] means = new float[lengths.length][res.length];

            // iterate over proteins in Fasta file
            SequenceIterator sI = db.sequenceIterator();
            for (int s = 0; sI.hasNext(); s++) {
                Sequence seq = sI.nextSequence();
                lengths[s] = seq.length();

                // iterate over each amino acid
                for (Object sr : seq.toList())

                    // check for amino acid in each residue group
                    for (int a = 1; a < args.length; a++)

                        // iterate over each residue in group
                        for (Object r : res[a-1].toList())

                            // increment count if amino acid has a match in residue group
                            if (((Symbol) r).getMatches().contains((Symbol) sr)) {
                                counts[s][a-1]++;
                                break;
                            }

                // print "name length: composition" for each protein
                System.out.print(seq.getName() + "\t" + seq.length() + ":");
                for (int a = 1; a < args.length; a++)
                    System.out.print("\t" + (means[s][a-1] = (float) counts[s][a-1] / lengths[s]));
                System.out.println();
            }

        } catch (FileNotFoundException ex) {
            System.err.println("Problem reading file...");
            ex.printStackTrace();
        } catch (BioException ex) {
            System.err.println("File not in fasta format or wrong alphabet...");
            ex.printStackTrace();
        } catch (NoSuchElementException ex) {
            System.err.println("No fasta sequences in the file...");
            ex.printStackTrace();
        }
    }
}

On 4/3/2010 5:18 AM, Radwen Aniba wrote:
> Hello,
>
> I'm writing an application that treats protein sequences, and I am using
> Biojava for a couple of things.
> One of these processings is to parse protein multifasta files, and treat the
> sequences one after the other. One of my purposes is to calculate
> composition. By composition I mean that I am interested to know in a given
> protein sequence what is the mean and the standard deviation composition of
> these groups :
>
> PAGST
> EDNQ
> LIVM
> KRH
> C
>
> example :
>
> protein fasta file :
>
>> SEQ1
>
> DVSFRLSGATSSSYGVFISNLRKALPNERKLYDIPLLRSSLPGSQRYALI
> HLTNYADETISVAIDVTNVYIMGYRAGDTSYFFNEASATEAAKYVFKDAM
> RKVTLPYSGNYERLQTAAGKIRENIPLGLPALDSAITTLFYYNANSAASA
> LMVLIQSTSEAARYKFIEQQIGKRVDKTFLPSLAIISLENSWSALSKQIQ
> IASTNNGQFESPVVLINAQNQRVTITNVDAGVVTSNIALLLNRNNMA
>
>> SEQ2
>
> IFPKQYPIINFTTAGATVQSYTNFIRAVRGRLTTGADVRHEIPVLPNRVG
> LPINQRFILVELSNHAELSVTLALDVTNAYVVGYRAGNSAYFFHPDNQED
> AEAITHLFTDVQNRYTFAFGGNYDRLEQLAGNLRENIELGNGPLEEAISA
> LYYYSTGGTQLPTLARSFIICIQMISEAARFQYIEGEMRTRIRYNRRSAP
> DPSVITLENSWGRLSTAIQESNQGAFASPIQLQRRNGSKFSVYDVSILIP
> IIALMVYRCAPPPSSQF
>
> I would like to
> 1/ parse SEQ1 to calculate the composition mean of PAGST residues for
> example ( number of residus/ length of the sequence)
> 2/ do same thing for SEQ2
> 3 / return the average
mean of both sequences
> 4/ Return standard deviation of these values.
>
>
> I can do it writing a standard java code, but I would like to know (as I am
> using biojava already) if this is possible or not ( Which class / instances
> to use)
>
> Cheers
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From andreas.prlic at gmail.com Sat Apr 3 15:20:58 2010
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Sat, 3 Apr 2010 08:20:58 -0700
Subject: [Biojava-dev] BOSC
In-Reply-To:
References:
Message-ID: <2E000562-884F-4172-A94D-61488C605A9F@gmail.com>

I am planning to attend 3d-SIG this year and will be mentioning biojava there...

Andreas

On 2 Apr 2010, at 20:23, Michael Heuer wrote:
> Andreas Prlic wrote:
>
>> who is going to BOSC this year and who wants to present a BioJava
>> talk?
>
> I will be there. I'm probably not the best person to present this
> time
> around.
>
> michael
>

From sheoran143 at gmail.com Sun Apr 11 19:16:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 14:16:29 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
Message-ID: <4BC2200D.8000109@gmail.com>

Hi,

There is a very fundamental issue in the SimpleNCBITaxon class, because of which it is producing the wrong taxonomy hierarchy. I am explaining what I have found; let me know what you think of it, and please suggest how to fix it.

1) The columns in the taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
2) In the class SimpleNCBITaxon we assume that "parent_taxon_id" holds the parent's ncbi_taxon_id for the current ncbi_taxon_id, but that is not true. The value stored in "parent_taxon_id" is the "taxon_id" of the row that holds the parent's ncbi_taxon_id.
----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry Thanks Deepak Sheoran From holland at eaglegenomics.com Sun Apr 11 19:53:06 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Sun, 11 Apr 2010 20:53:06 +0100 Subject: [Biojava-dev] Issue with SimpleNCBITaxon class In-Reply-To: <4BC2200D.8000109@gmail.com> References: <4BC2200D.8000109@gmail.com> Message-ID: I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead). thanks, Richard On 11 Apr 2010, at 20:16, Deepak Sheoran wrote: > Hi, > > Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it. > > 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue) > 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id. 
>
>
>
>
>
>
>
> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry
>
> Thanks
> Deepak Sheoran
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From sheoran143 at gmail.com Sun Apr 11 21:08:22 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Sun, 11 Apr 2010 16:08:22 -0500
Subject: [Biojava-dev] Issue with SimpleNCBITaxon class
In-Reply-To:
References: <4BC2200D.8000109@gmail.com>
Message-ID: <4BC23A46.7090304@gmail.com>

I am using the same table with the BioJava and BioPerl taxon programs, and the output I get is below:

*Biojava:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii.

Biojava process of finding names:
11876==>301952==>50447==>176516==>143975==>48579==>4403==>3609==>276240 (wrong way of doing things)

*Bioperl:*
For example, for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage I get is
Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus.

Bioperl process of finding names:
11876==>353825==>153057==>327045==>11632 (right way of doing things)

Hint: BioJava searches the ncbi_taxon_id column with a value from parent_taxon_id, whereas BioPerl searches the taxon_id column with a value from parent_taxon_id.
*Taxon and Taxon_name table content relevant to this discussion:*

taxon_id  ncbi_taxon_id  parent_taxon_id  node_rank  name                                name_class
2901      3609           276240           genus      Rhamnus                             scientific name
3610      4403           3609             species    Platanus occidentalis               scientific name
29052     48579          4403             species    Suillus placidus                    scientific name
114412    143975         48579            species    Diadasia australis                  scientific name
143976    176516         143975           species    Arnicastrum guerrerense             scientific name
30680     50447          176516           family     Labiduridae                         scientific name
254757    301952         50447            varietas   Oreostemma alpigenum var. haydenii  scientific name
9394      11632          17394            family     Retroviridae                        scientific name
277861    327045         9394             subfamily  Orthoretrovirinae                   scientific name
122448    153057         277861           genus      Alpharetrovirus                     scientific name
301952    353825         122448           no rank    unclassified Alpharetrovirus        scientific name
9584      11876          301952           species    Avian sarcoma virus                 scientific name

Thanks
Deepak

On 4/11/2010 2:53 PM, Richard Holland wrote:
> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead).
>
> thanks,
> Richard
>
> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote:
>
>
>> Hi,
>>
>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it.
>>
>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue)
>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true.
The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id. >> >> >> >> >> >> >> >> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry >> >> Thanks >> Deepak Sheoran >> >> >> > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > From sheoran143 at gmail.com Sun Apr 11 22:48:00 2010 From: sheoran143 at gmail.com (Deepak Sheoran) Date: Sun, 11 Apr 2010 17:48:00 -0500 Subject: [Biojava-dev] Issue with SimpleNCBITaxon class In-Reply-To: <4BC23A46.7090304@gmail.com> References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com> Message-ID: <4BC251A0.4090602@gmail.com> If we don't want to change the current code in biojava and still want to fix this bug I have found a way, 1) we can do this by changing one of hibernate files called "Taxon.hbm.xml" and replace the line with by changing the above setting in hibernate setting I am able to get the correct linage for ncbi_taxon_id = 11876(Avian sarcoma virus) which is Viruses; Retro-transcribing viruses; Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus. 2) But the possible issue which we might get is with Taxonomy loader class which want to insert something for parent taxon_id into taxon table which I think won't be possible if we do this change to hibernate con-fig file. Deepak Sheoran On 4/11/2010 4:08 PM, Deepak Sheoran wrote: > I am using same table with biojava and bioperl taxon program and the > output I get is below: > > *Biojava:* > For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the > lineage i get is > Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia > australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum > var. haydenii. 
> > Biojava process of finding names: > 11876==>3019252==>50447==>176516==>143975==>48579==>4403==>3609==>276240 > (wrong way of doing things) > > *Bioperl:* > For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the > lineage i get is > Retroviridae; Orthoretrovirinae; Alpharetrovirus; > unclassified Alpharetrovirus. > > Bioperl process of finding names: > 11876==>353825==>153057==>327045==>11632 (Right way of doing things) > > Hint: biojava search ncbi_taxon_id column with a value from > parent_taxon_id where bioperl search taxon_id column with a value from > parent_taxon_id. > > *Taxon and Taxon_name Table content which is being relevant in > discussion:* > > taxon_id ncbi_taxon_id parent_taxon_id node_rank name name_class > 2901 3609 276240 genus Rhamnus scientific name > 3610 4403 3609 species Platanus occidentalis scientific name > 29052 48579 4403 species Suillus placidus scientific name > 114412 143975 48579 species Diadasia australis scientific name > 143976 176516 143975 species Arnicastrum guerrerense scientific name > 30680 50447 176516 family Labiduridae scientific name > 254757 301952 50447 varietas Oreostemma alpigenum var. haydenii > scientific name > 9394 11632 17394 family Retroviridae scientific name > 277861 327045 9394 subfamily Orthoretrovirinae scientific name > 122448 153057 277861 genus Alpharetrovirus scientific name > 301952 353825 122448 no rank unclassified Alpharetrovirus > scientific name > 9584 > 11876 > 301952 > species > Avian sarcoma virus > scientifice name > > > Thanks > Deepak > > On 4/11/2010 2:53 PM, Richard Holland wrote: >> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead). 
>> >> thanks, >> Richard >> >> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote: >> >> >>> Hi, >>> >>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it. >>> >>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue) >>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id. >>> >>> >>> >>> >>> >>> >>> >>> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry >>> >>> Thanks >>> Deepak Sheoran >>> >>> >>> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> > From holland at eaglegenomics.com Mon Apr 12 07:07:55 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 12 Apr 2010 08:07:55 +0100 Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class In-Reply-To: References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com> Message-ID: Incidentally, BioJava's approach matches the description in the BioSQL docs at: http://biosql.org/wiki/Schema_Overview#TAXON.2C_TAXON_NAME (first example SQL statement - find the taxon id of the parent taxon for 'Homo sapiens' using a self-join) The BioPerl/BioSQL load_ncbi_taxonomy.pl script however does not match this description. cheers, Richard On 12 Apr 2010, at 07:57, Richard Holland wrote: > Thanks Deepak. > > I've had a look at the code and I believe its due to the different ways in which BioJava and BioPerl load the taxon table. 
> > BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the values from the NCBI taxonomy file. The taxon_id column in BioJava is a meaningless auto-generated value that is never used. > > BioPerl however is generating taxon_id values and linking them by setting parent_taxon_id to the generated value. The parent value from the NCBI taxonomy file is therefore replaced with the BioPerl generated parent ID, meaning that instead of linking from parent_taxon_id to ncbi_taxon_id as per BioJava, the link is to taxon_id instead. (I'm basing this comment on looking at load_ncbi_taxonomy.pl from the BioSQL archives.) > > I believe if you load the taxonomy table using BioJava, you should see BioJava giving correct behaviour. Likewise if you load it using BioPerl, BioPerl will behave correctly. But if you load with one then query with the other, you'll get incorrect results. > > This sounds like a case for discussion on both lists - a matter of standardisation between the two projects. Not quickly/easily solvable for now. > > cheers, > Richard > > On 11 Apr 2010, at 22:08, Deepak Sheoran wrote: > >> I am using same table with biojava and bioperl taxon program and the output I get is below: >> >> Biojava: >> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is >> Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii. >> >> Biojava process of finding names: 11876==>3019252==>50447==>176516==>143975==>48579==>4403==>3609==>276240 (wrong way of doing things) >> >> Bioperl: >> For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is >> Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus. 
>> >> Bioperl process of finding names: 11876==>353825==>153057==>327045==>11632 (Right way of doing things) >> >> Hint: biojava search ncbi_taxon_id column with a value from parent_taxon_id where bioperl search taxon_id column with a value from parent_taxon_id. >> >> Taxon and Taxon_name Table content which is being relevant in discussion: >> >> taxon_id ncbi_taxon_id parent_taxon_id node_rank name name_class >> 2901 3609 276240 genus Rhamnus scientific name >> 3610 4403 3609 species Platanus occidentalis scientific name >> 29052 48579 4403 species Suillus placidus scientific name >> 114412 143975 48579 species Diadasia australis scientific name >> 143976 176516 143975 species Arnicastrum guerrerense scientific name >> 30680 50447 176516 family Labiduridae scientific name >> 254757 301952 50447 varietas Oreostemma alpigenum var. haydenii scientific name >> 9394 11632 17394 family Retroviridae scientific name >> 277861 327045 9394 subfamily Orthoretrovirinae scientific name >> 122448 153057 277861 genus Alpharetrovirus scientific name >> 301952 353825 122448 no rank unclassified Alpharetrovirus scientific name >> 9584 >> 11876 >> 301952 >> species >> Avian sarcoma virus >> scientifice name >> >> Thanks >> Deepak >> >> On 4/11/2010 2:53 PM, Richard Holland wrote: >>> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead). >>> >>> thanks, >>> Richard >>> >>> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote: >>> >>> >>> >>>> Hi, >>>> >>>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it. 
>>>> >>>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue) >>>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry >>>> >>>> Thanks >>>> Deepak Sheoran >>>> >>>> >>>> >>>> >>> -- >>> Richard Holland, BSc MBCS >>> Operations and Delivery Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: >>> holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> >>> >>> >>> >> > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Mon Apr 12 06:57:57 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 12 Apr 2010 07:57:57 +0100 Subject: [Biojava-dev] Issue with SimpleNCBITaxon class In-Reply-To: <4BC23A46.7090304@gmail.com> References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com> Message-ID: Thanks Deepak. I've had a look at the code and I believe its due to the different ways in which BioJava and BioPerl load the taxon table. BioJava sets the ncbi_taxon_id and parent_taxon_id columns based on the values from the NCBI taxonomy file. 
The taxon_id column in BioJava is a meaningless auto-generated value that is never used. BioPerl however is generating taxon_id values and linking them by setting parent_taxon_id to the generated value. The parent value from the NCBI taxonomy file is therefore replaced with the BioPerl generated parent ID, meaning that instead of linking from parent_taxon_id to ncbi_taxon_id as per BioJava, the link is to taxon_id instead. (I'm basing this comment on looking at load_ncbi_taxonomy.pl from the BioSQL archives.) I believe if you load the taxonomy table using BioJava, you should see BioJava giving correct behaviour. Likewise if you load it using BioPerl, BioPerl will behave correctly. But if you load with one then query with the other, you'll get incorrect results. This sounds like a case for discussion on both lists - a matter of standardisation between the two projects. Not quickly/easily solvable for now. cheers, Richard On 11 Apr 2010, at 22:08, Deepak Sheoran wrote: > I am using same table with biojava and bioperl taxon program and the output I get is below: > > Biojava: > For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is > Rhamnus; Platanus occidentalis; Suillus placidus; Diadasia australis; Arnicastrum guerrerense; Labiduridae; Oreostemma alpigenum var. haydenii. > > Biojava process of finding names: 11876==>3019252==>50447==>176516==>143975==>48579==>4403==>3609==>276240 (wrong way of doing things) > > Bioperl: > For example for ncbi_taxon_id = 11876 (Avian sarcoma virus), the lineage i get is > Retroviridae; Orthoretrovirinae; Alpharetrovirus; unclassified Alpharetrovirus. > > Bioperl process of finding names: 11876==>353825==>153057==>327045==>11632 (Right way of doing things) > > Hint: biojava search ncbi_taxon_id column with a value from parent_taxon_id where bioperl search taxon_id column with a value from parent_taxon_id. 
> > Taxon and Taxon_name Table content which is being relevant in discussion: > > taxon_id ncbi_taxon_id parent_taxon_id node_rank name name_class > 2901 3609 276240 genus Rhamnus scientific name > 3610 4403 3609 species Platanus occidentalis scientific name > 29052 48579 4403 species Suillus placidus scientific name > 114412 143975 48579 species Diadasia australis scientific name > 143976 176516 143975 species Arnicastrum guerrerense scientific name > 30680 50447 176516 family Labiduridae scientific name > 254757 301952 50447 varietas Oreostemma alpigenum var. haydenii scientific name > 9394 11632 17394 family Retroviridae scientific name > 277861 327045 9394 subfamily Orthoretrovirinae scientific name > 122448 153057 277861 genus Alpharetrovirus scientific name > 301952 353825 122448 no rank unclassified Alpharetrovirus scientific name > 9584 > 11876 > 301952 > species > Avian sarcoma virus > scientifice name > > Thanks > Deepak > > On 4/11/2010 2:53 PM, Richard Holland wrote: >> I'm sorry but I don't understand your example. Could you provide a real example of correct values for each column from a sample taxon entry in NCBI, plus an example of what BioJava is doing wrong? (i.e. give a sample record to use as reference, then point out the correct value of parent_taxon_id, and point out what value BioJava is using instead). >> >> thanks, >> Richard >> >> On 11 Apr 2010, at 20:16, Deepak Sheoran wrote: >> >> >> >>> Hi, >>> >>> Their is very fundamental issue in SimpleNCBITaxon class becuase of which it is producing wrong taxonomy hierarchy. I am explaing what I have found let me what you guys think of it, and me suggest how to fix it. >>> >>> 1) Columns in taxon table are (taxon_id, ncbi_taxon_id, parent_taxon_id, nodeRank, geneticCode, mitoGeneticCode, leftValue, rightValue) >>> 2) In the class SimpleNCBITaxon we are thinking "parent_taxon_id" to have parent ncbi_taxon_id for current ncbi_taxon_id value, but its not true. 
The value which "parent_taxon_id" have is "taxon_id" which have parent_ncbi_taxon_id of current ncbi_taxon_id. >>> >>> >>> >>> >>> >>> >>> >>> ----- its not correct column parent_taxon_id stores the taxon_id which have parent_ncbi_taxon_id for current entry >>> >>> Thanks >>> Deepak Sheoran >>> >>> >>> >>> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: >> holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> >> >> >> > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From trevor.paterson at roslin.ed.ac.uk Tue Apr 13 11:41:01 2010 From: trevor.paterson at roslin.ed.ac.uk (trevor paterson (RI)) Date: Tue, 13 Apr 2010 12:41:01 +0100 Subject: [Biojava-dev] Biojava3 structure In-Reply-To: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com> Message-ID: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk> Andreas I am trying to do an anoymous checkout of the whole bio-java 3 trunk and it is failing on the structure module I cant even do a copy command the src/main tree seems corrupted - throwing an error Error: Decompression of svndiff data failed Trevor Paterson PhD new email trevor.paterson at roslin.ed.ac.uk Bioinformatics The Roslin Institute The Royal (Dick) School of Veterinary Studies University of Edinburgh Scotland EH25 9PS phone +44 (0)131 5274197 http://www.roslin.ed.ac.uk http://www.resspecies.org http://www.thearkdb.org Please consider the environment before printing this e-mail The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336 Disclaimer:This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. 
> -----Original Message----- > From: biojava-dev-bounces at lists.open-bio.org > [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of > Andreas Prlic > Sent: 29 March 2010 03:03 > To: Scooter Willis > Cc: biojava-dev > Subject: Re: [Biojava-dev] Biojava3 structure > > Hi Scooter, > > at the present the structure modules depend on the alignment > module and on the (old) core module. This is for aligning > ATOM and SEQRES residues in the PDB files, and for the Smith > Waterman alignment based 3D structure superposition. If we > target a release of biojava 3 in about a month, I don't think > it will be possible to break this out, mainly because the > alignment module is still based on the biojava 1 code base. > Overall I think that the core module probably should still be > part of the BioJava 3 release. Any opinions on that? > > Andreas > > On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis > wrote: > > > Andreas > > > > I needed to do some work with a PDB file so started to use the > > structure library. It looks like it depends on all the old biojava > > code. Mainly the structure exceptions that extend > bioexception is the > > first thing tripping me up. Should the biojava3-structure > module have > > any external dependencies or am I working with the wrong package? > > > > Thanks > > > > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From andreas at sdsc.edu Tue Apr 13 14:04:20 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 13 Apr 2010 07:04:20 -0700 Subject: [Biojava-dev] Biojava3 structure In-Reply-To: <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk> References: <59a41c431003281902ic2c5ed3h4a2383899f465a8@mail.gmail.com> <050C9A545DC1D84BAC7A678B76A56C3C254D5F9552@ebrcexch1.ebrc.bbsrc.ac.uk> Message-ID: Hi Trevor, I can confirm the same behaviour from our anonymous SVN. 
Developer SVN seems to be ok and I also ran an svnadmin verify without problems. I suppose we are having issues with the anonymous SVN server again... I'll ask the OBF helpdesk to take a another look ... Can you try and let us know if checkout from svn/git from github works for you in the meanwhile ? e.g. svn co http://svn.github.com/biojava/biojava.git ./biojava Thanks, Andreas On Tue, Apr 13, 2010 at 4:41 AM, trevor paterson (RI) wrote: > Andreas > > I am trying to do an anoymous checkout of the whole bio-java 3 trunk ?and it is failing on the structure module > > I cant even do a copy command > > the src/main tree seems corrupted - throwing an error > Error: Decompression of svndiff data failed > > Trevor Paterson PhD > new email trevor.paterson at roslin.ed.ac.uk > > Bioinformatics > The Roslin Institute > The Royal (Dick) School of Veterinary Studies > University of Edinburgh > Scotland EH25 9PS > phone +44 (0)131 5274197 > http://www.roslin.ed.ac.uk > http://www.resspecies.org > http://www.thearkdb.org > Please consider the environment before printing this e-mail > > The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336 > Disclaimer:This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > >> -----Original Message----- >> From: biojava-dev-bounces at lists.open-bio.org >> [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of >> Andreas Prlic >> Sent: 29 March 2010 03:03 >> To: Scooter Willis >> Cc: biojava-dev >> Subject: Re: [Biojava-dev] Biojava3 structure >> >> Hi Scooter, >> >> at the present the structure modules depend on the alignment >> module and on the (old) core module. ?This is for aligning >> ATOM and SEQRES residues in the PDB files, and for the Smith >> Waterman alignment based 3D structure superposition. 
If we >> target a release of biojava 3 in about a month, I don't think >> it will be possible to break this out, mainly because the >> alignment module is still based on the biojava 1 code base. >> Overall I think that the core module probably should still be >> part of the BioJava 3 release. Any opinions on that? >> >> Andreas >> >> On Sun, Mar 28, 2010 at 3:06 PM, Scooter Willis >> wrote: >> >> > Andreas >> > >> > I needed to do some work with a PDB file so started to use the >> > structure library. It looks like it depends on all the old biojava >> > code. Mainly the structure exceptions that extend >> bioexception is the >> > first thing tripping me up. Should the biojava3-structure >> module have >> > any external dependencies or am I working with the wrong package? >> > >> > Thanks >> > >> > Scooter >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From biopython at maubp.freeserve.co.uk Thu Apr 15 17:54:56 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 15 Apr 2010 18:54:56 +0100 Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class In-Reply-To: References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com> Message-ID: Hi, I've CC'd this to the BioSQL mailing list for cross project discussion. On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland wrote: > Thanks Deepak. > > I've had a look at the code and I believe its due to the > different ways in which BioJava and BioPerl load the > taxon table. > > BioJava sets the ncbi_taxon_id and parent_taxon_id > columns based on the values from the NCBI taxonomy > file. 
The taxon_id column in BioJava is a meaningless > auto-generated value that is never used. > > BioPerl however is generating taxon_id values and > linking them by setting parent_taxon_id to the > generated value. The parent value from the NCBI > taxonomy file is therefore replaced with the BioPerl > generated parent ID, meaning that instead of linking > from parent_taxon_id to ncbi_taxon_id as per BioJava, > the link is to taxon_id instead. (I'm basing this > comment on looking at load_ncbi_taxonomy.pl from > the BioSQL archives.) Note that old versions of load_ncbi_taxonomy.pl (which is part of BioSQL, not part of BioPerl) would set taxon_id equal to ncbi_taxon_id, see: http://bugzilla.open-bio.org/show_bug.cgi?id=2470 This may help explain the confusion. > I believe if you load the taxonomy table using BioJava, > you should see BioJava giving correct behaviour. > Likewise if you load it using BioPerl, BioPerl will > behave correctly. But if you load with one then query > with the other, you'll get incorrect results. > > This sounds like a case for discussion on both lists - > a matter of standardisation between the two projects. > Not quickly/easily solvable for now. Its not just two projects (BioPerl & BioJava) (grin). Its at least five projects (BioSQL itself plus BioRuby and Biopython). I'm not sure about BioRuby's implementation, but currently I think BioJava is the odd one out - BioPerl, Biopython, and the BioSQL's load_ncbi_taxonomy.pl all make entries in parent_taxon_id reference the automatically generated taxon_id (please correct me if I am wrong). My personal view is that bioperl-db is the reference implementation and should be followed in the event of any ambiguity within BioSQL. In this particular case, there is actually a BioSQL script to check against too (load_ncbi_taxonomy.pl). Hopefully Hilmar can give us an official verdict... 
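
To make the two conventions discussed above concrete, here is a minimal sketch in plain Java. It is not BioJava, BioPerl or BioSQL code: the class `TaxonParentLookup`, the `TaxonRow` type, both lookup methods and the two sample rows are made up for illustration, and the table is cut down to the three columns under discussion. It assumes the table was written by a loader that stores the parent's local taxon_id in parent_taxon_id, and then reads it back both ways.

```java
import java.util.Arrays;
import java.util.List;

public class TaxonParentLookup {

    /** Simplified, hypothetical taxon row: only the three columns discussed in this thread. */
    static final class TaxonRow {
        final int taxonId;           // local, auto-generated primary key
        final int ncbiTaxonId;       // identifier assigned by NCBI
        final Integer parentTaxonId; // null for a root; meaning depends on the loader

        TaxonRow(int taxonId, int ncbiTaxonId, Integer parentTaxonId) {
            this.taxonId = taxonId;
            this.ncbiTaxonId = ncbiTaxonId;
            this.parentTaxonId = parentTaxonId;
        }
    }

    /** Reader that assumes parent_taxon_id refers to the parent's local taxon_id. */
    static TaxonRow parentViaTaxonId(List<TaxonRow> table, TaxonRow child) {
        if (child.parentTaxonId == null) {
            return null;
        }
        for (TaxonRow row : table) {
            if (row.taxonId == child.parentTaxonId.intValue()) {
                return row;
            }
        }
        return null;
    }

    /** Reader that assumes parent_taxon_id holds the parent's ncbi_taxon_id. */
    static TaxonRow parentViaNcbiTaxonId(List<TaxonRow> table, TaxonRow child) {
        if (child.parentTaxonId == null) {
            return null;
        }
        for (TaxonRow row : table) {
            if (row.ncbiTaxonId == child.parentTaxonId.intValue()) {
                return row;
            }
        }
        return null;
    }

    /** Two illustrative rows, written the way a taxon_id-linking loader would write them. */
    static List<TaxonRow> sampleTable() {
        return Arrays.asList(
                new TaxonRow(1, 9605, null), // genus Homo, treated as root of this tiny table
                new TaxonRow(2, 9606, 1));   // Homo sapiens, parent = local taxon_id 1
    }

    private static String describe(TaxonRow parent) {
        return parent == null ? "no parent found" : "parent has ncbi_taxon_id " + parent.ncbiTaxonId;
    }

    public static void main(String[] args) {
        List<TaxonRow> table = sampleTable();
        TaxonRow human = table.get(1);
        System.out.println("read as taxon_id:      " + describe(parentViaTaxonId(table, human)));
        System.out.println("read as ncbi_taxon_id: " + describe(parentViaNcbiTaxonId(table, human)));
    }
}
```

The first reader finds the genus row; the second looks for a row whose ncbi_taxon_id is 1, finds none in this table, and loses the parent. That is the "load with one, query with the other" mismatch in miniature.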
Peter

From andreas at sdsc.edu Fri Apr 16 17:39:37 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 16 Apr 2010 10:39:37 -0700
Subject: [Biojava-dev] Biojava3-genetics
In-Reply-To: <4BC806F4.3090302@wur.nl>
References: <4BC806F4.3090302@wur.nl>
Message-ID:

Hi Richard,

any contribution is welcome. What do you have in mind in particular? Perhaps there is already something there along those lines...

Andreas

On Thu, Apr 15, 2010 at 11:43 PM, Richard Finkers wrote:
> Dear List,
>
> I would be interested in adding a module for genetic analysis to the
> biojava3 project. Are there others who are interested in this as well and
> with whom should I discuss this further?
>
> Thanks,
> Richard
>
>
> --
> Dr. Richard Finkers
> Researcher Plant Breeding
> Wageningen UR Plant Breeding
> P.O. Box 16, 6700 AA, Wageningen, The Netherlands
> Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB
> Wageningen, The Netherlands
> Tel. +31-317-484165 Fax +31-317-418094
> http://www.plantbreeding.wur.nl/
> https://www.eu-sol.wur.nl/
> https://cbsgdbase.wur.nl/
> http://solgenomics.wur.nl/
> http://www.disclaimer-uk.wur.nl/
>
>

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From sheoran143 at gmail.com Fri Apr 16 18:43:59 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 16 Apr 2010 13:43:59 -0500
Subject: [Biojava-dev] [Biojava-l] Issue with SimpleNCBITaxon class
In-Reply-To:
References: <4BC2200D.8000109@gmail.com> <4BC23A46.7090304@gmail.com>
Message-ID: <4BC8AFEF.70107@gmail.com>

In my experience, we should make use of taxon_id on this issue, because it is a unique key in a local instance of BioSQL.
ncbi_taxon_id should be used for mapping purposes only, so that a person can map a local taxon_id to an ncbi_taxon_id; otherwise it defeats the whole purpose of having taxon_id as the primary key of the taxon table. I think the main goal when BioSQL was designed was to make it independent of any other organization such as GenBank or NCBI, while still offering, as a feature, a way to map a number given by a known authority (ncbi_taxon_id) to a local number (taxon_id).

Deepak Sheoran

On 4/15/2010 12:54 PM, Peter wrote:
> Hi,
>
> I've CC'd this to the BioSQL mailing list for cross project
> discussion.
>
> On Mon, Apr 12, 2010 at 7:57 AM, Richard Holland wrote:
>
>> Thanks Deepak.
>>
>> I've had a look at the code and I believe its due to the
>> different ways in which BioJava and BioPerl load the
>> taxon table.
>>
>> BioJava sets the ncbi_taxon_id and parent_taxon_id
>> columns based on the values from the NCBI taxonomy
>> file. The taxon_id column in BioJava is a meaningless
>> auto-generated value that is never used.
>>
>> BioPerl however is generating taxon_id values and
>> linking them by setting parent_taxon_id to the
>> generated value. The parent value from the NCBI
>> taxonomy file is therefore replaced with the BioPerl
>> generated parent ID, meaning that instead of linking
>> from parent_taxon_id to ncbi_taxon_id as per BioJava,
>> the link is to taxon_id instead. (I'm basing this
>> comment on looking at load_ncbi_taxonomy.pl from
>> the BioSQL archives.)
>>
> Note that old versions of load_ncbi_taxonomy.pl
> (which is part of BioSQL, not part of BioPerl) would
> set taxon_id equal to ncbi_taxon_id, see:
> http://bugzilla.open-bio.org/show_bug.cgi?id=2470
>
> This may help explain the confusion.
>
>
>> I believe if you load the taxonomy table using BioJava,
>> you should see BioJava giving correct behaviour.
>> Likewise if you load it using BioPerl, BioPerl will
>> behave correctly. But if you load with one then query
>> with the other, you'll get incorrect results.
>>
>> This sounds like a case for discussion on both lists -
>> a matter of standardisation between the two projects.
>> Not quickly/easily solvable for now.
>>
> Its not just two projects (BioPerl & BioJava) (grin).
> Its at least five projects (BioSQL itself plus BioRuby
> and Biopython).
>
> I'm not sure about BioRuby's implementation, but
> currently I think BioJava is the odd one out - BioPerl,
> Biopython, and the BioSQL's load_ncbi_taxonomy.pl
> all make entries in parent_taxon_id reference the
> automatically generated taxon_id (please correct
> me if I am wrong).
>
> My personal view is that bioperl-db is the reference
> implementation and should be followed in the event
> of any ambiguity within BioSQL. In this particular
> case, there is actually a BioSQL script to check
> against too (load_ncbi_taxonomy.pl).
>
> Hopefully Hilmar can give us an official verdict...
>
> Peter
>

From sylvain.foisy at diploide.net Sat Apr 17 14:00:07 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Sat, 17 Apr 2010 10:00:07 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
Message-ID: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>

Hi,

Again, I feel stupid asking these newbie questions... I finally got my hands on a new MacBook Pro and am re-installing the apps to get stuff moving. As usual (I am sorry to say...), Eclipse and Maven are giving me a fit when I try to do a checkout of the developer's tree.

I have installed the latest Subversion and Maven plugins. When I want to create a new project, I try the following:

1) I right click to select "New > Other..."
in the Navigator panel;

2) I select "SVN > Project from SVN", which leads me to a window where the location of the developer's tree is in svn+ssh; in the window that comes up next, I use this URL to get the "Finish" button activated:

svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk

3) After that, I choose the "Check out as a project configured using the New Project Wizard", which pops up the window where I select "Maven > Maven Project".

4) I get a "New Maven Project" window where I select the default. The window then changes to a "Select archetype" where I also use the default selections.

5) This is where I can't seem to be moving forward... The window that pops up asks me for an Artefact ID. I am clueless about what to put... The process stops there :-(

Maven is probably a cool tool but its learning curve is pretty steep... Shouldn't all this be automatic after "Maven > Maven Project"?

Thanks in advance. I'll put the solution into the wiki ;-)

Sylvain

===================================================================

Sylvain Foisy, Ph. D.
Consultant Bio-informatique / Bioinformatics
Diploide.net - TI pour la vie / IT for Life

Courriel: sylvain.foisy at diploide.net
Web: http://www.diploide.net

===================================================================

From heuermh at acm.org Mon Apr 19 03:33:00 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 18 Apr 2010 23:33:00 -0400 (EDT)
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net>
Message-ID:

Sylvain Foisy wrote:

> Again, I feel stupid asking these newbie questions... I finally got my hands
> on a new MacBook Pro and re-installing the apps to get stuff moving. As
> usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a
> checkout of the developer's tree.
>
> I have installed the latest Subversion and Maven plugins.
When I want to > create a new project, I try the following: > > 1) I right click to select "New > Other..." in the Navigator panel; > > 2) I select "SVN > Project from SVN", which leads me to a window where the > location of the developer's tree is in svn+ssh; in the window that comes > up next, I use this URL to get the "Finish" button activated: > > svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk > > 3) After that, I choose the "Check out as a project configured using the New > Project Wizard", which pop the window where I select "Maven > Maven > Project". > > 4) I get a "New Maven Project" window where I select the default. The window > then changes to a "Select archetype" where I also use the default > selections. This last step doesn't sound right, here Eclipse is creating a brand new Maven project for you instead of creating a Maven-based project from the metadata already in subversion. In the SVN window you should see "Check out as Maven Project" when you right-click, unless that has changed with newer versions of the maven plugin. michael From andreas at sdsc.edu Mon Apr 19 04:17:17 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 18 Apr 2010 21:17:17 -0700 Subject: [Biojava-dev] Eclipse + maven woes... In-Reply-To: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net> References: <55335.76.10.128.89.1271512807.squirrel@humboldt.cyberlogic.net> Message-ID: Hi Sylvain, The place to start the checkout in eclipse is the SVN repository browser. There you can do a right-click on the biojava/trunk folder and check out as a Maven project. Andreas On Sat, Apr 17, 2010 at 7:00 AM, Sylvain Foisy wrote: > Hi, > > Again, I feel stupid asking these newbie questions... I finally got my and > on a new MacBook Pro and re-installing the apps to get stuff moving. As > usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a > checkout of the developer's tree. 
> > I have installed the latest Subversion and Maven plugins. When I want to > create a new project, I try the following: > > 1) I right click to select "New > Other..." in the Navigator panel; > > 2) I select "SVN > Project from SVN", which leads me to a window where the > location of the developer's tree is in svn+ssh; in the window that comes > up next, I use this URL to get the "Finish" button activated: > > svn+ssh:// > dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk > > 3) After that, I choose the "Check out as a project configured using the > New > Project Wizard", which pop the window where I select "Maven > Maven > Project". > > 4) I get a "New Maven Project" window where I select the default. The > window > then changes to a "Select archetype" where I also use the default > selections. > > 5) This is where I can't seem to be moving forward... The window that pops > out ask me for an Artefact ID. I am clueless about what to put... The > process stops there :-( > > Maven is probably a cool tool but its learning curve is pretty steep... > Shouldn't all this be automatic after "Maven > Maven Project" > > Thanks in advance. I'll put the solution into the wiki ;-) > > Sylvain > > =================================================================== > > Sylvain Foisy, Ph. D. > Consultant Bio-informatique / Bioinformatics > Diploide.net - TI pour la vie / IT for Life > > Courriel: sylvain.foisy at diploide.net > Web: http://www.diploide.net > > =================================================================== > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From sylvain.foisy at diploide.net Mon Apr 19 20:47:43 2010
From: sylvain.foisy at diploide.net (Sylvain Foisy)
Date: Mon, 19 Apr 2010 16:47:43 -0400
Subject: [Biojava-dev] Eclipse + maven woes...
In-Reply-To:
Message-ID:

Hi Andreas,

I finally got something working but it wasn't automatic... Switching to the SVN Repositories perspective, I right-clicked on trunk and selected "Checkout...". After downloading the code, I had to right-click the biojava-live project that was now found in the Java Browsing perspective and select "m2 Maven > Enable Dependency Management" to get it working.

If I tried the "Check out as..." option, I would get a window popping up with "Check out Maven projects with SCM" pre-selected and I would be stuck in the Group ID/Artefact ID mayhem.

Thanks for the time. Back to coding ;-)

Sylvain

On 19/04/10 00:17, "[NAME]" <[ADDRESS]> wrote:

> Hi Sylvain,
>
> The place to start the checkout in eclipse is the SVN repository browser.
> There you can do a right-click on the biojava/trunk folder and check out as a
> Maven project.
>
> Andreas
>
> On Sat, Apr 17, 2010 at 7:00 AM, Sylvain Foisy
> wrote:
>> Hi,
>>
>> Again, I feel stupid asking these newbie questions... I finally got my hands
>> on a new MacBook Pro and re-installing the apps to get stuff moving. As
>> usual (I am sorry to say...), Eclipse and Maven are giving me a fit to do a
>> checkout of the developer's tree.
>>
>> I have installed the latest Subversion and Maven plugins. When I want to
>> create a new project, I try the following:
>>
>> 1) I right click to select "New > Other..."
in the Navigator panel;
>>
>> 2) I select "SVN > Project from SVN", which leads me to a window where the
>> location of the developer's tree is in svn+ssh; in the window that comes
>> up next, I use this URL to get the "Finish" button activated:
>>
>> svn+ssh://dev.open-bio.org/home/svn-repositories/biojava/biojava-live/trunk
>>
>>
>> 3) After that, I choose the "Check out as a project configured using the New
>> Project Wizard", which pop the window where I select "Maven > Maven
>> Project".
>>
>> 4) I get a "New Maven Project" window where I select the default. The window
>> then changes to a "Select archetype" where I also use the default
>> selections.
>>
>> 5) This is where I can't seem to be moving forward... The window that pops
>> out ask me for an Artefact ID. I am clueless about what to put... The
>> process stops there :-(
>>
>> Maven is probably a cool tool but its learning curve is pretty steep...
>> Shouldn't all this be automatic after "Maven > Maven Project"
>>
>> Thanks in advance. I'll put the solution into the wiki ;-)
>>
>> Sylvain
>>
>> ===================================================================
>>
>> Sylvain Foisy, Ph. D.
>> Consultant Bio-informatique / Bioinformatics
>> Diploide.net - TI pour la vie / IT for Life
>>
>> Courriel: sylvain.foisy at diploide.net
>> Web: http://www.diploide.net
>>
>> ===================================================================
>>
>>
>>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>

From andreas at sdsc.edu Tue Apr 27 05:33:51 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 26 Apr 2010 22:33:51 -0700
Subject: [Biojava-dev] accepted GSoC projects
Message-ID:

Dear all,

Google has released the results for GSoC: Congratulations to Mark Chapman and Jianjiong Gao for having been accepted to work on the MSA and PTM projects for BioJava!
Let's start the "community bonding" process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ); we are all looking forward to working with you on this during the summer. The mentors and co-mentors will be Peter Rose for the PTM project and Scooter Willis and Kyle Ellrott for the MSA project (and me).

I want to thank all of you who submitted proposals or showed interest in other ways for the Google Summer of Code. We hope you are not too disappointed if your application did not get accepted this time. We had a large number (52) of applications and the overall quality of the submissions was very high. We would like to stay in touch with you and we hope that you are interested in BioJava also beyond the scope of GSoC. There are a number of different ways to contribute: We are always looking for people who provide code and patches to further improve our library, help out with the documentation on the Wiki page, or answer questions on the mailing lists.

Let's all give Mark and Jianjiong a warm welcome to the BioJava community. For those of you who are interested in following the progress of the projects, as usual, the development related discussions are going to be on the biojava-dev list.

Happy coding!

Andreas

From jianjiong.gao at gmail.com Tue Apr 27 19:13:12 2010
From: jianjiong.gao at gmail.com (Jianjiong Gao)
Date: Tue, 27 Apr 2010 14:13:12 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To:
References:
Message-ID:

Dear Dr. Prlic and Everyone,

Thanks for the warm welcome. I am so glad that I have the chance to work with the BioJava community this summer.

I would like to briefly introduce myself. My name is Jianjiong (JJ) Gao. I am a PhD student in Computer Science at University of Missouri, Columbia. My study is focusing on Bioinformatics, specifically computational proteomics and PTMs.
I came across BioJava about two years ago when I was working on a plugin for Cytoscape, and was attracted by the idea of providing a generic Java API for bioinformatics applications. I was thinking maybe someday I could do some coding for BioJava. And now I have the chance :)

Best Regards,
-JJ

On Tue, Apr 27, 2010 at 12:33 AM, Andreas Prlic wrote:
> Dear all,
>
> Google has released the results for GSoC: Congratulations to Mark Chapman
> and Jianjiong Gao for having been accepted to work on the MSA and PTM
> projects for BioJava! Let's start the "community bonding" process (
> http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we all are
> looking forward to work with you on this during the summer. The Mentors and
> co-mentors will be Peter Rose for the PTM and Scooter Willis and Kyle
> Ellrott for the MSA project (and me).
>
> I want to thank all of you who submitted proposals or showed interest in
> other ways for the Google Summer of Code. We hope you are not too
> disappointed if your application did not get accepted this time. We had a
> large number (52) applications and the overall quality of the
> submissions was very high. We would like to stay in touch with you and we
> hope that you are interested in BioJava also beyond the scope of GSoC. There
> are a number of different ways how to contribute: We are always looking for
> people who provide code and patches to further improve our library, help out
> with the documentation on the Wiki page, or answer questions on the mailing
> lists.
>
> Let's all give Mark and Jianjiong a warm welcome to the BioJava community.
> For those of you who are interested in following the progress of the
> projects, as usually, the development related discussions are going to be on
> the biojava-dev list.
>
> Happy coding!
> > Andreas > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From chapman at cs.wisc.edu Wed Apr 28 04:18:25 2010 From: chapman at cs.wisc.edu (Mark Chapman) Date: Tue, 27 Apr 2010 23:18:25 -0500 Subject: [Biojava-dev] accepted GSoC projects In-Reply-To: References: Message-ID: <4BD7B711.9090108@cs.wisc.edu> Hi all, Thank you to Google, Open Bioinformatics Foundation, BioJava, and my mentors for this opportunity. As a short introduction, I am Mark Chapman, a graduate student in Computer Sciences at the University of Wisconsin - Madison. My focus is in artificial intelligence and bioinformatics. This summer, I will add a Multiple Sequence Alignment module to BioJava. My first task will be to update the alignment module to BioJava3 and to design the interface for MSA. My second goal is to implement a progressive MSA styled after clustalw. After that, I will add alternative routines for each step. Any ideas for the MSA project as well as more sources of programming wisdom are quite welcome. For example, Andreas suggested a series about Java parallelism and lazy execution (http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/). I also noted a useful tip for iterative development (http://en.flossmanuals.net/GSoCMentoring/Workflow). Thanks again, Mark On 4/27/2010 12:33 AM, Andreas Prlic wrote: > Dear all, > > Google has released the results for GSoC: Congratulations to Mark > Chapman and Jianjiong Gao for having been accepted to work on the MSA > and PTM projects for BioJava! Let's start the "community bonding" > process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we > all are looking forward to work with you on this during the summer. The > Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis > and Kyle Ellrott for the MSA project (and me). 
> > I want to thank all of of you who submitted proposals or showed interest > in other ways for the Google Summer of Code. We hope you are not too > disappointed if your application did not get accepted this time. We had > a large number (52) applications and the the overall quality of the > submissions was very high. We would like to stay in touch with you and > we hope that you are interested in BioJava also beyond the scope of > GSoC. There are a number of different ways how to contribute: We are > always looking for people who provide code and patches to further > improve our library, help out with the documentation on the Wiki page, > or answer questions on the mailing lists. > > Let's all give Mark and Jianjiong a warm welcome to the BioJava > community. For those of you who are interested in following the > progress of the projects, as usually, the development related > discussions are going to be on the biojava-dev list. > > Happy coding! > > Andreas > > From andreas at sdsc.edu Wed Apr 28 17:31:58 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 28 Apr 2010 10:31:58 -0700 Subject: [Biojava-dev] accepted GSoC projects In-Reply-To: <4BD7B711.9090108@cs.wisc.edu> References: <4BD7B711.9090108@cs.wisc.edu> Message-ID: > Any ideas for the MSA project as well as more sources of programming wisdom > are quite welcome. For example, Andreas suggested a series about Java > parallelism and lazy execution ( > http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/). > credits for the links go to Scooter, who recommended those ;-) My general recommendation is to read Joshua Bloch's "Effective Java". http://java.sun.com/docs/books/effective/ It is a collection of rules that should help in avoiding some frequently made mistakes... Andreas > I also noted a useful tip for iterative development ( > http://en.flossmanuals.net/GSoCMentoring/Workflow). 
> > Thanks again, > Mark > > > > On 4/27/2010 12:33 AM, Andreas Prlic wrote: > >> Dear all, >> >> Google has released the results for GSoC: Congratulations to Mark >> Chapman and Jianjiong Gao for having been accepted to work on the MSA >> and PTM projects for BioJava! Let's start the "community bonding" >> process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we >> all are looking forward to work with you on this during the summer. The >> Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis >> and Kyle Ellrott for the MSA project (and me). >> >> I want to thank all of of you who submitted proposals or showed interest >> in other ways for the Google Summer of Code. We hope you are not too >> disappointed if your application did not get accepted this time. We had >> a large number (52) applications and the the overall quality of the >> submissions was very high. We would like to stay in touch with you and >> we hope that you are interested in BioJava also beyond the scope of >> GSoC. There are a number of different ways how to contribute: We are >> always looking for people who provide code and patches to further >> improve our library, help out with the documentation on the Wiki page, >> or answer questions on the mailing lists. >> >> Let's all give Mark and Jianjiong a warm welcome to the BioJava >> community. For those of you who are interested in following the >> progress of the projects, as usually, the development related >> discussions are going to be on the biojava-dev list. >> >> Happy coding! >> >> Andreas >> >> >> -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From HWillis at scripps.edu Wed Apr 28 17:57:14 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Wed, 28 Apr 2010 13:57:14 -0400
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To:
References: <4BD7B711.9090108@cs.wisc.edu>
Message-ID: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>

Andreas

Those links were sent to me by Mark Southern, who sits a couple of doors down and is a past BioJava contributor (the sequence viewer).

We should avoid bringing in any external parallel frameworks, but at a minimum we should give ourselves enough abstraction, with a backend multi-threaded job-processing approach, to take advantage of a multi-processor box and of a cluster via Terracotta. If the abstraction of the jobs and the mapping of resources is generic enough, then that allows different implementations in various cluster environments for those who have found the next best thing in parallel computing!

Scooter

On Apr 28, 2010, at 1:31 PM, Andreas Prlic wrote:

>> Any ideas for the MSA project as well as more sources of programming wisdom
>> are quite welcome. For example, Andreas suggested a series about Java
>> parallelism and lazy execution (
>> http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/).
>>
>
>
> credits for the links go to Scooter, who recommended those ;-) My general
> recommendation is to read Joshua Bloch's "Effective Java".
> http://java.sun.com/docs/books/effective/ It is a collection of rules that
> should help in avoiding some frequently made mistakes...
>
> Andreas
>
>
>
>
>
>
>> I also noted a useful tip for iterative development (
>> http://en.flossmanuals.net/GSoCMentoring/Workflow).
>> >> Thanks again, >> Mark >> >> >> >> On 4/27/2010 12:33 AM, Andreas Prlic wrote: >> >>> Dear all, >>> >>> Google has released the results for GSoC: Congratulations to Mark >>> Chapman and Jianjiong Gao for having been accepted to work on the MSA >>> and PTM projects for BioJava! Let's start the "community bonding" >>> process ( http://en.flossmanuals.net/GSoCMentoring/MindtheGap ) and we >>> all are looking forward to work with you on this during the summer. The >>> Mentors and co-mentors will be Peter Rose for the PTM and Scooter Willis >>> and Kyle Ellrott for the MSA project (and me). >>> >>> I want to thank all of of you who submitted proposals or showed interest >>> in other ways for the Google Summer of Code. We hope you are not too >>> disappointed if your application did not get accepted this time. We had >>> a large number (52) applications and the the overall quality of the >>> submissions was very high. We would like to stay in touch with you and >>> we hope that you are interested in BioJava also beyond the scope of >>> GSoC. There are a number of different ways how to contribute: We are >>> always looking for people who provide code and patches to further >>> improve our library, help out with the documentation on the Wiki page, >>> or answer questions on the mailing lists. >>> >>> Let's all give Mark and Jianjiong a warm welcome to the BioJava >>> community. For those of you who are interested in following the >>> progress of the projects, as usually, the development related >>> discussions are going to be on the biojava-dev list. >>> >>> Happy coding! >>> >>> Andreas >>> >>> >>> > > > -- > ----------------------------------------------------------------------- > Dr. 
Andreas Prlic > Senior Scientist, RCSB PDB Protein Data Bank > University of California, San Diego > (+1) 858.246.0526 > ----------------------------------------------------------------------- > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From quantum7 at gmail.com Wed Apr 28 19:06:40 2010 From: quantum7 at gmail.com (Spencer Bliven) Date: Wed, 28 Apr 2010 12:06:40 -0700 Subject: [Biojava-dev] accepted GSoC projects In-Reply-To: <4BD7B711.9090108@cs.wisc.edu> References: <4BD7B711.9090108@cs.wisc.edu> Message-ID: Mark- Welcome to the Biojava community! Adding multiple sequence alignments will be a nice feature for the library. One suggestion I have is to make any data structures for multiple alignments you create as general as possible, and to think about whether the special cases can still be represented. For instance, can you store an alignment where some of the sequence is unknown (eg {ABCD, ABXD})? Can you store an alignment where only a subset of the sequences are defined? I recently had to represent an alignment like this: ABCD EFGH EFGH ABCD This sort of alignment can't be written using just gaps; I had to make a new structure to store pairs {(A,A), (B,B), ...} and rewrite much of the existing alignment functionality based on that. Anyway, I don't mean to get bogged down in specific examples or exceptions. I just wanted to point out that there are a lot of methods which can be used to define some sort of alignment between a set of sequences, and it would be nice if the BioJava alignment package was general enough to accommodate such methods in the future without reinventing the wheel. Cheers! Spencer P.S. I ran into such weird alignments while working on structural alignments, which are not well behaved like traditional multiple sequence alignments. 
Andreas knows all about both types of alignment, and can probably judge better than I how much generality is worth spending your time on.

On Tue, Apr 27, 2010 at 9:18 PM, Mark Chapman wrote:

> Hi all,
>
> Thank you to Google, Open Bioinformatics Foundation, BioJava, and my
> mentors for this opportunity. As a short introduction, I am Mark Chapman, a
> graduate student in Computer Sciences at the University of Wisconsin -
> Madison. My focus is in artificial intelligence and bioinformatics. This
> summer, I will add a Multiple Sequence Alignment module to BioJava.
>
> My first task will be to update the alignment module to BioJava3 and to
> design the interface for MSA. My second goal is to implement a progressive
> MSA styled after clustalw. After that, I will add alternative routines for
> each step.
>
> Any ideas for the MSA project as well as more sources of programming wisdom
> are quite welcome. For example, Andreas suggested a series about Java
> parallelism and lazy execution (
> http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/).
> I also noted a useful tip for iterative development (
> http://en.flossmanuals.net/GSoCMentoring/Workflow).
>
> Thanks again,
> Mark

From chapman at cs.wisc.edu Thu Apr 29 01:09:07 2010
From: chapman at cs.wisc.edu (Mark Chapman)
Date: Wed, 28 Apr 2010 20:09:07 -0500
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
References: <4BD7B711.9090108@cs.wisc.edu> <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu>
Message-ID: <4BD8DC33.7010607@cs.wisc.edu>

Here is a summary of the concurrency lessons I learned that are useful with or without the functional programming paradigm --

1: implement Callable to submit tasks for concurrent/parallel/lazy execution
   - call() methods just wrap a call to the computation-intensive method
2: share a fixed-size thread pool with task queue to avoid
   - overhead of thread creation/destruction,
   - too many simultaneous threads, and
   - most blocking issues
3: place thread-blocking Future.get() calls within tasks later in the queue
   - while(!Future.isDone()) Thread.yield(); may also help keep the pool active
4: execution in a task queue also enables easier logging and progress
listening

There are two obvious places concurrent execution will fit in the MSA module --

1: building the distance matrix
   - queue pairwise alignment/scoring tasks in a loop over all sequence pairs
2: progressive alignment
   - queue profile-profile alignment tasks in postfix traversal of the guide tree (from leaves to root)

All our library copies of "Effective Java" are checked out, so I ordered a copy for my personal library. The sample chapter on generics sold me.

Mark

On 4/28/2010 12:57 PM, Scooter Willis wrote:
> Andreas
>
> Those links were sent to me by Mark Southern who sits a couple doors down and a past BioJava contributor for the sequence viewer. We should avoid bringing in any external parallel frameworks but at minimum give ourselves enough abstraction with a backend multi-threaded job-processing approach to take advantage of a multi-processor box and a cluster via Terracotta. If the abstraction of the jobs and the mapping of resources is generic enough then that allows different implementations in various cluster environments for those who have found the next best thing in parallel computing!
>
> Scooter
>
> On Apr 28, 2010, at 1:31 PM, Andreas Prlic wrote:
>
>>> Any ideas for the MSA project as well as more sources of programming wisdom
>>> are quite welcome. For example, Andreas suggested a series about Java
>>> parallelism and lazy execution (
>>> http://apocalisp.wordpress.com/2008/06/18/parallel-strategies-and-the-callable-monad/).
>>
>> credits for the links go to Scooter, who recommended those ;-) My general
>> recommendation is to read Joshua Bloch's "Effective Java".
>> http://java.sun.com/docs/books/effective/ It is a collection of rules that
>> should help in avoiding some frequently made mistakes...
>>
>> Andreas

From andreas at sdsc.edu Fri Apr 30 15:29:03 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:29:03 -0700
Subject: [Biojava-dev] accepted GSoC projects
In-Reply-To: <4BD8DC33.7010607@cs.wisc.edu>
References: <4BD7B711.9090108@cs.wisc.edu> <6C3A102F-AF2B-4E29-9C84-BB6B881BD083@scripps.edu> <4BD8DC33.7010607@cs.wisc.edu>
Message-ID:

Hi Mark and Jianjiong,

By now you should have received your login info for the development SVN server. I suggest the following as next steps:

*) If you have not done so already, sign up to the biojava-l and biojava-dev mailing lists.
*) Get a BioJava checkout from the development SVN server.
*) Add the LGPL license javadoc header ( http://www.biojava.org/wiki/BioJava3_license ) to the templates in your IDE.
*) Take a look at the JUnit tests and add a new test for something that is related to your projects.
*) Take a look at the Wiki pages (e.g. http://www.biojava.org/wiki/BioJava:CookBook ), get an account on the wiki and improve one of the documentation pages.
*) Take a look at the javadocs at http://www.biojava.org/docs/api/index.html

Andreas

From andreas at sdsc.edu Fri Apr 30 15:44:25 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 08:44:25 -0700
Subject: [Biojava-dev] biojava SVN
Message-ID:

Hi,

The BioJava SVN has not been fully compiling ever since the Hackathon. I guess things were quite in flux over the last months, and it is now time to make sure SVN fully compiles again. There are a few things we need to figure out in order to do that:

* Jar files for libraries that are not in a public Maven repository.
Jules : at some point you indicated that we might be able to get such jar files hosted by the EBI Maven repository. Do you think that is still a possibility and could you get a few libraries into that? In particular that would be Jmol, Astex, and probably one or two other Jar files. That would make the BioJava checkout process much smoother and not require a developer to manually install jars for full functionality.

* We have a couple of modules that are fragmented and broken. This is due to historic leftovers from when we started the re-factoring process. If all the functionality has been moved into the new biojava3-core module, I would vote for removing the modules starting with sequence*

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From ayates at ebi.ac.uk Fri Apr 30 15:48:01 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Apr 2010 16:48:01 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To:
References:
Message-ID: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>

Does anyone know how hard it would be to get these into the public Maven repository? The EBI repo is all well & good but updating it relies on BioJava always having a committer at the EBI. Now I know that is a very likely statement but is it something we can rely on?

Andy

--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/

From ayates at ebi.ac.uk Fri Apr 30 15:57:12 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 30 Apr 2010 16:57:12 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To:
References: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
Message-ID: <3C3AAC8F-5C03-44C1-B121-7808C0612A65@ebi.ac.uk>

As far as I remember you 'can' have one set up manually. I think I offered one hand-developed from one of my projects. In fact, FYI:

http://code.google.com/p/dbcon/source/browse/#svn/trunk/maven-repo

It just requires the correct structure in place & it works. I went for it being hosted in SVN because there's an HTTP interface to it offered by Google. The EBI Maven repo is just a public HTTP directory.
It's been some years since I did a deployment there, but it's not hard to do & we should be able to do it locally & sync it to SVN.

Andy

--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/

From andreas at sdsc.edu Fri Apr 30 16:27:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 30 Apr 2010 09:27:09 -0700
Subject: [Biojava-dev] biojava SVN
In-Reply-To:
References: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
Message-ID:

> Could a small MVN repo be set up at OBF?

I am pretty sure we could do that. Anybody volunteering? I can help with getting the necessary permissions... Does anybody know of some good documentation on how to set this up?

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From holland at eaglegenomics.com Fri Apr 30 15:50:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 30 Apr 2010 16:50:52 +0100
Subject: [Biojava-dev] biojava SVN
In-Reply-To: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
References: <475FBD45-F4B8-4E06-B479-92319D48C06F@ebi.ac.uk>
Message-ID:

Could a small MVN repo be set up at OBF?

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/
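
Mark's concurrency notes earlier in this thread (Callable tasks, a shared fixed-size thread pool, results collected through Future.get()) and his first target, the distance matrix, might look roughly like the sketch below. Everything named here is hypothetical: DistanceMatrixSketch, build and distance are not BioJava API, and distance is a placeholder mismatch fraction standing in for a real pairwise alignment score. Unlike point 3 in his list, this version calls Future.get() from the submitting thread only after all tasks are queued, which is one simple way to keep pool threads from blocking on each other.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch (not BioJava API): pairwise distances computed as tasks on a shared pool. */
final class DistanceMatrixSketch {

    /** Placeholder score, NOT a real alignment: fraction of positions that differ or are unpaired. */
    static double distance(String a, String b) {
        int shared = Math.min(a.length(), b.length());
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 0.0;
        }
        int mismatches = longest - shared; // the length difference counts as mismatches
        for (int i = 0; i < shared; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                mismatches++;
            }
        }
        return (double) mismatches / longest;
    }

    /** Queues one Callable per sequence pair, then collects the results into a symmetric matrix. */
    static double[][] build(List<String> seqs, int threads) {
        int n = seqs.size();
        double[][] matrix = new double[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every pair first so the pool stays busy.
            List<Future<Double>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    final String a = seqs.get(i);
                    final String b = seqs.get(j);
                    Callable<Double> task = () -> distance(a, b); // call() just wraps the expensive method
                    futures.add(pool.submit(task));
                }
            }
            // Collect in the same order the tasks were submitted.
            int next = 0;
            for (int i = 0; i < n; i++) {
                for (int j = i + 1; j < n; j++) {
                    double d = futures.get(next++).get(); // blocks only the submitting thread
                    matrix[i][j] = d;
                    matrix[j][i] = d;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a distance task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return matrix;
    }
}
```

Calling DistanceMatrixSketch.build(sequences, 2) returns the full symmetric matrix. The profile-profile steps of a progressive alignment could be queued on the same kind of pool, though their ordering along the guide tree would need extra care that this sketch does not attempt.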
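
Spencer's point about alignments that gaps alone cannot express can be illustrated with a small sketch that stores an alignment as explicit index pairs. It assumes his example is a circular permutation (ABCDEFGH against EFGHABCD); the class PairAlignmentSketch and all of its methods are made up for illustration and are not part of BioJava.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch (not BioJava API): an alignment kept as explicit residue index pairs. */
final class PairAlignmentSketch {

    /** One aligned position: an index into sequence A and an index into sequence B. */
    static final class Pair {
        final int a;
        final int b;

        Pair(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    final String seqA;
    final String seqB;
    final List<Pair> pairs = new ArrayList<>();

    PairAlignmentSketch(String seqA, String seqB) {
        this.seqA = seqA;
        this.seqB = seqB;
    }

    /** Records that residue i of A is aligned with residue j of B. */
    void align(int i, int j) {
        if (i < 0 || i >= seqA.length() || j < 0 || j >= seqB.length()) {
            throw new IndexOutOfBoundsException("pair (" + i + "," + j + ") is outside the sequences");
        }
        pairs.add(new Pair(i, j));
    }

    /** True if positions in B strictly increase along A, i.e. gaps alone could express the alignment. */
    boolean isSequential() {
        List<Pair> sorted = new ArrayList<>(pairs);
        sorted.sort(Comparator.comparingInt((Pair p) -> p.a));
        for (int k = 1; k < sorted.size(); k++) {
            Pair prev = sorted.get(k - 1);
            Pair cur = sorted.get(k);
            if (cur.a == prev.a || cur.b <= prev.b) {
                return false;
            }
        }
        return true;
    }

    /** Number of pairs whose two residues are the same letter. */
    int identities() {
        int count = 0;
        for (Pair p : pairs) {
            if (seqA.charAt(p.a) == seqB.charAt(p.b)) {
                count++;
            }
        }
        return count;
    }

    /** Builds the assumed example: residue i of A pairs with residue (i + shift) mod n of B. */
    static PairAlignmentSketch circular(String a, String b, int shift) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("sequences must have equal length");
        }
        PairAlignmentSketch aln = new PairAlignmentSketch(a, b);
        int n = a.length();
        for (int i = 0; i < n; i++) {
            aln.align(i, (i + shift) % n);
        }
        return aln;
    }
}
```

With PairAlignmentSketch.circular("ABCDEFGH", "EFGHABCD", 4), every pair matches letter for letter, yet isSequential() is false because the pairing wraps around, which is the situation a gapped-string representation cannot capture. A general MSA structure would need more than two sequences; the pair list here only shows the idea.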