From ayates at ebi.ac.uk Mon Jan 4 04:47:58 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Mon, 4 Jan 2010 09:47:58 +0000
Subject: [Biojava-dev] biojava hackathon
In-Reply-To: <59a41c430912210820p59ae6b46wa239ae78328098d8@mail.gmail.com>
References: <59a41c430912210820p59ae6b46wa239ae78328098d8@mail.gmail.com>
Message-ID: <31FC6A52-DE16-47B1-86E7-D5C49EA6C6FA@ebi.ac.uk>

Other hotels available near the campus are:

http://www.redlionhinxton.co.uk/ (the nearest non-WT-owned accommodation)
http://www.johnbarleycorn.co.uk/ (in the next village along)
http://www.duxfordlodgehotel.co.uk/ (see above)
http://www.wtconference.org/ (owned by WT & sometimes will offer accommodation)

However, one problem with all of these is the cost; if you can get into the
Travelodge Andreas mentioned, it is normally quite a bit cheaper, but do check
carefully. Spending £5 more per night to get into one of the other places can
be worth it.

Andy

On 21 Dec 2009, at 16:20, Andreas Prlic wrote:

> A few people asked me where to stay and I suggested to book the
> Travelodge in Hills road. It is a plain and cheap hotel (for
> Cambridge), walking distance to the train station, town center (20
> min) and close to one of the stops of the Genome Campus bus routes.
>
> Andreas
>
> On Mon, Dec 21, 2009 at 7:23 AM, Johan Henriksson wrote:
>> Hello,
>> has anyone done their homework and figured out which hotels are the best.
>> maybe someone working there knows the "standard" ones?
>>
>> it seems I can join in. have not decided exactly what I will work on but
>> surely not the protein stuff. either improving the core or the
>> algorithms/data structures. I have experimented a bit with alternative ways
>> of dealing with annotation and sequence data, maybe it can be of use.
>>
>> /Johan
>>
>> --
>> -----------------------------------------------------------
>> Johan Henriksson
>> PhD student, Karolinska Institutet
>> http://mahogny.areta.org http://www.endrov.net
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From holland at eaglegenomics.com Mon Jan 4 05:57:54 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 4 Jan 2010 10:57:54 +0000
Subject: [Biojava-dev] BioJava 2010 Hackathon
Message-ID:

Hi all,

For those attending the 2010 BioJava hackathon at the Wellcome Trust Genome
Campus, near Cambridge, all details can be found on this wiki:

http://biojava.org/wiki/BioJava:Hackathon2010

If your name is not listed on the wiki but you are planning on attending,
please let us know by Jan 12th at the latest. Due to security arrangements at
the campus it is not possible to just turn up on the day.

Subjects for the hackathon are open for discussion on the wiki.

Looking forward to seeing you all there!

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jolyon.holdstock at ogt.co.uk Wed Jan 13 07:49:17 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Wed, 13 Jan 2010 12:49:17 -0000
Subject: [Biojava-dev] Script for cookbook
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02C64D8C@EUCLID.internal.ogtip.com>

Hi,

I was going to add the following script to the annotation section of the
cookbook.

It takes an EMBL or Genbank file and outputs information about each CDS
feature.

I just wanted to check this was OK and also is this the most efficient way of
doing this?
Cheers,

Jolyon

[CODE]
import java.io.*;
import java.util.*;
import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.io.*;
import org.biojavax.*;
import org.biojavax.ontology.*;
import org.biojavax.bio.*;
import org.biojavax.bio.seq.*;

public class ExtractInformation {
  //Create the RichSequence object
  RichSequence richSeq;

  public ExtractInformation(String fileName){
    //Load the sequence file
    try {
      richSeq = RichSequence.IOTools.readGenbankDNA(new BufferedReader(new FileReader(fileName)),null).nextRichSequence();
    }
    catch(FileNotFoundException fnfe){
      System.out.println("FilwNotFoundException: " + fnfe);
    }
    catch(BioException bioe1){
      System.err.println("Not a Genbank sequence trying EMBL");
      try {
        richSeq = RichSequence.IOTools.readGenbankDNA(new BufferedReader(new FileReader(fileName)),null).nextRichSequence();
      }
      catch(BioException bioe2){
        System.err.println("Not an EMBL sequence either");
        System.exit(1);
      }
      catch(FileNotFoundException fnfe){
        System.out.println("FilwNotFoundException: " + fnfe);
      }
    }
    //Filter the sequence on CDS features
    FeatureFilter ff = new FeatureFilter.ByType("CDS");
    FeatureHolder fh = richSeq.filter(ff);

    //Iterate through the CDS features
    for (Iterator<RichFeature> i = fh.features(); i.hasNext();){
      RichFeature rf = i.next();

      //Get the strand orientation of the feature
      char featureStrand = rf.getStrand().getToken();

      //Get the location of the feature
      String featureLocation = rf.getLocation().toString();

      //Get the annotation of the feature
      RichAnnotation ra = (RichAnnotation)rf.getAnnotation();

      //Create the required ComparableTerms
      ComparableTerm geneTerm = RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene");
      ComparableTerm locusTerm = RichObjectFactory.getDefaultOntology().getOrCreateTerm("locus_tag");
      ComparableTerm productTerm = RichObjectFactory.getDefaultOntology().getOrCreateTerm("product");
      ComparableTerm synonymTerm = RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene_synonym");
      ComparableTerm proteinIDTerm = RichObjectFactory.getDefaultOntology().getOrCreateTerm("protein_id");

      //Create empty strings
      String gene = "";
      String locus = "";
      String product = "";
      String geneSynonym = "";
      String proteinID = "";

      //Iterate through the notes in the annotation
      for (Iterator<Note> it = ra.getNoteSet().iterator(); it.hasNext();){
        Note note = it.next();

        //Check each note to see if it matches one of the required ComparableTerms
        if(note.getTerm().equals(locusTerm)){
          locus = note.getValue().toString();
        }
        if(note.getTerm().equals(productTerm)){
          product = note.getValue().toString();
        }
        if(note.getTerm().equals(geneTerm)){
          gene = note.getValue().toString();
        }
        if(note.getTerm().equals(synonymTerm)){
          geneSynonym = note.getValue().toString();
        }
        if(note.getTerm().equals(proteinIDTerm)){
          proteinID = note.getValue().toString();
        }
      }
      //Output the feature information
      System.out.println(locus + " " + gene + " " + geneSynonym + " " + proteinID + " " + product + " " + featureStrand + " Chr1:" + featureLocation);
    }
  }

  public static void main(String args []){
    if (args.length != 1){
      System.out.println("Usage: java ExtractInformation ");
      System.exit(1);
    }
    else {
      new ExtractInformation(args[0]);
    }
  }
}
[/CODE]

Dr. Jolyon Holdstock
Senior Computational Biologist,
Oxford Gene Technology,
Begbroke Science Park,
Sandy Lane, Yarnton,
Oxford, OX5 1PF, UK.

T: +44 (0)1865 856 852
F: +44 (0)1865 842 116
E: jolyon.holdstock at ogt.co.uk
W: www.ogt.co.uk

Oxford Gene Technology (Operations) Ltd. Registered in England No: 03845432
Begbroke Science Park Sandy Lane Yarnton Oxford OX5 1PF

Confidentiality Notice: The contents of this email from Oxford Gene
Technology are confidential and intended solely for the person to whom it is
addressed. It may contain privileged and confidential information. If you are
not the intended recipient you must not read, copy, distribute, discuss or
take any action in reliance on it. If you have received this email in error
please advise the sender so that we can arrange for proper delivery. Then
please delete the message from your inbox. Thank you.

From holland at eaglegenomics.com Wed Jan 13 08:22:16 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 13 Jan 2010 13:22:16 +0000
Subject: [Biojava-dev] Script for cookbook
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F02C64D8C@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02C64D8C@EUCLID.internal.ogtip.com>
Message-ID: <982845F6-0845-474A-BEBD-C5D336C03F42@eaglegenomics.com>

Just a few minor points, otherwise all good:

 1. Your 'trying EMBL' code just tries Genbank again!
 2. Typo in the FileNotFoundException message.
 3. Check whether the ComparableTerms you retrieve via RichObjectFactory are
    already defined as constants in RichSequence.Terms or
    GenbankFormat.Terms. If they are, use the constants instead as this makes
    the code clearer.
 4. Your System.out line that prints the results has Chr1 hardcoded - should
    this be a parameter, or read from the file maybe?

cheers,
Richard

On 13 Jan 2010, at 12:49, Jolyon Holdstock wrote:

> Hi,
>
> I was going to add the following script to the annotation section of the
> cookbook.
>
> It takes an EMBL or Genbank file and outputs information about each CDS
> feature.
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From genjasp at gmail.com Wed Jan 13 09:14:26 2010
From: genjasp at gmail.com (Alessandro Cipriani)
Date: Wed, 13 Jan 2010 15:14:26 +0100
Subject: [Biojava-dev] Script for cookbook
In-Reply-To: <982845F6-0845-474A-BEBD-C5D336C03F42@eaglegenomics.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02C64D8C@EUCLID.internal.ogtip.com> <982845F6-0845-474A-BEBD-C5D336C03F42@eaglegenomics.com>
Message-ID: <46b9a2151001130614i1d1b057xfdffa8f0051d0b89@mail.gmail.com>

Hi,

I think it is not good practice:
- to nest a try/catch block inside another try/catch block
- to import every library, even those that are not required
- to make all methods public

Otherwise it seems to be OK!

Regards
Alex

2010/1/13 Richard Holland :
> Just a few minor points, otherwise all good:
>
>  1. Your 'trying EMBL' code just tries Genbank again!
>  2. Typo in the FileNotFoundException message.
>  3. Check to see that your ComparableTerm retrieval statements using
>     RichObjectFactory are not already defined as constants in
>     RichSequence.Terms or GenbankFormat.Terms. If they are, use the
>     constants instead as this makes the code clearer.
>  4. Your system.out line that prints the results has Chr1 hardcoded -
>     should this be a parameter, or read from the file maybe?
>
> cheers,
> Richard
>
> On 13 Jan 2010, at 12:49, Jolyon Holdstock wrote:
>
>> Hi,
>>
>> I was going to add the following script to the annotation section of the
>> cookbook.
>>
>> It takes an EMBL or Genbank file and outputs information about each CDS
>> feature.
>>
>> I just wanted to check this was OK and also is this the most efficient
>> way of doing this?
>>
>

From mahogny at areta.org Sun Jan 17 12:52:45 2010
From: mahogny at areta.org (Johan Henriksson)
Date: Sun, 17 Jan 2010 18:52:45 +0100
Subject: [Biojava-dev] biojava hackathon
In-Reply-To:
References:
Message-ID:

sorry, I will not be able to attend the hackathon this year for a whole lot
of reasons. in addition, sorry for telling you this so late. I'm looking
forward to the next one though; happy hacking to the rest of you!

/Johan

--
-----------------------------------------------------------
Johan Henriksson
PhD student, Karolinska Institutet
http://mahogny.areta.org http://www.endrov.net

From heuermh at acm.org Mon Jan 18 23:54:21 2010
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 18 Jan 2010 23:54:21 -0500 (EST)
Subject: [Biojava-dev] hackathon task list, pom refactoring, &c.
In-Reply-To:
Message-ID:

Hello Hackathon participants,

Might we start a task list for the hackathon tomorrow, on the biojava wiki
perhaps? It would be nice to be able to contribute to the list of tasks and
to claim tasks to work on from off-site.

Today I refactored and cleaned up all the maven poms. Build and test work
for me on linux, win, and mac with commandline maven. Eclipse via m2eclipse
is good for almost everything, but has trouble firing off the
maven-jaxb2-plugin code generation step.

If you want to give it a try, Maven 3.0-alpha-6 is a lot more picky about
reproducible builds than earlier versions, and complained a ton when I first
started. Now the only warning is the deprecated maven1 repo link at java.net.

I have set up cia.vc to post svn commit notifications via IRC to #biojava on
irc.freenode.net. I'll also be on there most of the day tomorrow, or
available on google chat.

Ah, with the blogs and the twitter and stuff, I can't keep up. ;)

   michael

From heuermh at acm.org Tue Jan 19 00:02:28 2010
From: heuermh at acm.org (Michael Heuer)
Date: Tue, 19 Jan 2010 00:02:28 -0500 (EST)
Subject: [Biojava-dev] [Biojava-l] BioJava Hackathon - Day 1
In-Reply-To: <95988839-75A3-4087-8071-5A19BEECDD04@eaglegenomics.com>
Message-ID:

On Mon, 18 Jan 2010, Richard Holland wrote:

> ...
>
> Also by making the new SymbolList implement the standard List
> interface from Collections, it can be used in nice Java shortcuts such
> as the new foreach loops, and standard iterators can be used (instead of
> the current SymbolListIterator method). Collections API can also be used
> then to do things like subsets or reverses.

Is this a good idea? It might preclude some efficient implementation
strategies, such as ParallelArray from JSR166

http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/extra166y/ParallelArray.html

Would Iterable be sufficient?

Also, List indexing is zero-based, which is not compatible with typical
sequence feature indexing.

   michael

From holland at eaglegenomics.com Tue Jan 19 00:45:14 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 19 Jan 2010 05:45:14 +0000
Subject: [Biojava-dev] [Biojava-l] BioJava Hackathon - Day 1
In-Reply-To:
References:
Message-ID: <56B4AA4E-FF88-4D4E-BD95-49B76194FA82@eaglegenomics.com>

Good point - Iterable would probably be fine.

On 19 Jan 2010, at 05:02, Michael Heuer wrote:

> On Mon, 18 Jan 2010, Richard Holland wrote:
>
>> ...
>>
>> Also by making the new SymbolList implement the standard List
>> interface from Collections, it can be used in nice Java shortcuts such
>> as the new foreach loops, and standard iterators can be used (instead of
>> the current SymbolListIterator method). Collections API can also be used
>> then to do things like subsets or reverses.
>
> Is this a good idea? It might preclude some efficient implementation
> strategies, such as ParallelArray from JSR166
>
> http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/extra166y/ParallelArray.html
>
> Would Iterable be sufficient enough?
>
> Also, List indexing is not compatible with typical sequence feature
> indexing.
>
> michael

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From ayates at ebi.ac.uk Tue Jan 19 03:50:49 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 19 Jan 2010 08:50:49 +0000
Subject: [Biojava-dev] [Biojava-l] BioJava Hackathon - Day 1
In-Reply-To: <56B4AA4E-FF88-4D4E-BD95-49B76194FA82@eaglegenomics.com>
References: <56B4AA4E-FF88-4D4E-BD95-49B76194FA82@eaglegenomics.com>
Message-ID: <25714175-4B75-4D30-AA42-E0176060DCAB@ebi.ac.uk>

This touches on what this new version should be about: making the most of
Java 5+ features (Iterable & generics being the first things that spring to
mind).

On 19 Jan 2010, at 05:45, Richard Holland wrote:

> Good point - Iterable would probably be fine.

From andy.law at roslin.ed.ac.uk Tue Jan 19 04:23:26 2010
From: andy.law at roslin.ed.ac.uk (Andy Law (RI))
Date: Tue, 19 Jan 2010 09:23:26 +0000
Subject: [Biojava-dev] hackathon task list, pom refactoring, &c.
In-Reply-To:
References:
Message-ID: <6B943C9E-98E1-4788-8860-2B6AC266421C@roslin.ed.ac.uk>

On 19 Jan 2010, at 04:54, Michael Heuer wrote:

> I have set up cia.vc to post svn commit notification via IRC to #biojava
> on irc.freenode.net. I'll also be on there most of the day tomorrow, or
> available on google chat.
>
> Ah, with the blogs and the twitter and stuff, I can't keep up. ;)

Sheesh! What was ever wrong with posting occasional notes via email?

Later,

Andy

--------
Yada, yada, yada...

The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336

Disclaimer: This e-mail and any attachments are confidential and intended
solely for the use of the recipient(s) to whom they are addressed. If you
have received it in error, please destroy all copies and inform the sender.

From jw12 at sanger.ac.uk Tue Jan 19 05:41:12 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 19 Jan 2010 10:41:12 +0000
Subject: [Biojava-dev] DAS Workshop Registrations now Open (workshop date 7-9 April 2010)
Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk>

If you don't know about DAS and wish to know how to distribute your latest
biological annotation to the world, then the upcoming DAS workshop may be for
you.

If you know about DAS and are perhaps a DAS client developer, then the
upcoming DAS workshop is for you (as you will need to know about the
upcoming DAS 1.6 Specification and how it may affect your software).

For information on the workshop and registration please go to:
http://www.ebi.ac.uk/training/handson/DAS_070410.html

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited,
a charity registered in England with number 1021457 and a company registered
in England with number 2742969, whose registered office is 215 Euston Road,
London, NW1 2BE.
From heuermh at acm.org Tue Jan 19 19:52:29 2010 From: heuermh at acm.org (Michael Heuer) Date: Tue, 19 Jan 2010 19:52:29 -0500 (EST) Subject: [Biojava-dev] sequence-core tests currently do not compile In-Reply-To: <6B943C9E-98E1-4788-8860-2B6AC266421C@roslin.ed.ac.uk> Message-ID: Module sequence-core tests currently do not compile [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project sequence-core: Compilation failure: Compilation failure: C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[8,7] class DNATests is public, should be declared in a file named DNATests.java C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[45,9] cannot find symbol symbol : class Sequence location: class org.biojava3.core.DNATests C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[45,28] cannot find symbol symbol : class DNACompound location: class org.biojava3.core.DNATests C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[40,17] cannot find symbol symbol : class BadSequence location: class org.biojava3.core.DNATests C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[12,15] getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to () C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[18,15] getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to () C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[24,15] getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to ()
C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[37,16] cannot find symbol symbol : variable toString location: class java.lang.String C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[47,15] cannot find symbol symbol : class DNASequence location: class org.biojava3.core.DNATests From ayates at ebi.ac.uk Wed Jan 20 04:55:31 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Wed, 20 Jan 2010 09:55:31 +0000 Subject: [Biojava-dev] sequence-core tests currently do not compile In-Reply-To: References: Message-ID: <9981AAC1-1197-4DF7-B347-5ADE4E0C03C0@ebi.ac.uk> Hmmm interesting. That should have all been commented out I thought. I will fix & commit back On 20 Jan 2010, at 00:52, Michael Heuer wrote: > > Module sequence-core tests currently do not compile > > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile > (default-testCompile) on project sequence-core: Compilation failure: Compilation failure: > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[8,7] > class DNATests > is public, should be declared in a file named DNATests.java > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[45,9] > cannot find symbol > symbol : class Sequence > location: class org.biojava3.core.DNATests > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[45,28] > cannot find symbol > symbol : class DNACompound > location: class org.biojava3.core.DNATests > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[40,17] > cannot find symbol > symbol : class BadSequence > location: class org.biojava3.core.DNATests
> > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[12,15] > getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to () > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[18,15] > getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to () > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[24,15] > getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to () > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[37,16] > cannot find symbol > symbol : variable toString > location: class java.lang.String > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[47,15] > cannot find symbol > symbol : class DNASequence > location: class org.biojava3.core.DNATests > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From HWillis at scripps.edu Tue Jan 26 13:09:36 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 26 Jan 2010 13:09:36 -0500 Subject: [Biojava-dev] Code Update Message-ID: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> I checked in updates with test cases for Fasta fileparsing where the main focus is on the fasta header. The test cases are based on the wikipedia examples so results will vary with actual files. It is very easy now to do a custom header parser so we have lots of flexibility. I also started the code for the file pointer sequence proxy where the key usage is creating a sequence with the header and storing a reference to the file and offset in the file for the start of the sequence.
When a method is called related to getting a sequence/subsequence the init() method is called to load the sequence data via RandomAccessFile with a seek to the offset. It turns out that none of the java io classes will actually return an offset index of the actual bytes read. This also gets complicated with the readLine() methods where the CR and/or LF is stripped off when the string is returned so you can't keep track of it externally. I copied the BufferedReader.java class to BufferedReaderBytesRead.java and keep track of the file pointer internally. This code still needs to be tested. This should be a great way to load large data sets with minimal memory. To complete this approach I will probably do a collection that is proxy aware that can go through and free up storage by returning a sequence to its proxy state. I will work this week on getting some wiki pages created to give examples on using the header parsing interface and proxy sequences. How do we want to organize wiki pages related to biojava3 work? Thanks Scooter From andreas at sdsc.edu Tue Jan 26 13:45:59 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 26 Jan 2010 10:45:59 -0800 Subject: [Biojava-dev] Code Update In-Reply-To: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> Message-ID: <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> the cookbook approach seems to work quite well. You could start a new "Chapter" in the book and make it clear that this will be only available once biojava 3 has been released (or via SVN checkout) Andreas On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis wrote: > > I checked in updates with test cases for Fasta fileparsing where the main focus is on the fasta header. The test cases are based on the wikipedia examples so results will vary with actual files. It is very easy now to do a custom header parser so we have lots of flexibility.
I also started the code for the file pointer sequence proxy where the key usage is creating a sequence with the header and storing a reference to the file and offset in the file for the start of the sequence. When a method is called related to getting a sequence/subsequence the init() method is called to load the sequence data via RandomAccessFile with a seek to the offset. It turns out that none of the java io classes will actually return an offset index of the actual bytes read. This also gets complicated with the readLine() methods where the CR and/or LF is stripped off when the string is returned so you can't keep track of it externally. I copied the BufferedReader.java class to BufferedReaderBytesRead.java and keep track of the file pointer internally. This code still needs to be tested. This should be a great way to load large data sets with minimal memory. To complete this approach I will probably do a collection that is proxy aware that can go through and free up storage by returning a sequence to its proxy state. > > I will work this week on getting some wiki pages created to give examples on using the header parsing interface and proxy sequences. How do we want to organize wiki pages related to biojava3 work? > > Thanks > > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From ayates at ebi.ac.uk Tue Jan 26 14:58:43 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 26 Jan 2010 19:58:43 +0000 Subject: [Biojava-dev] Code Update In-Reply-To: <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> Message-ID: <7075E1DA-E19E-4AB0-A6DC-DF2A62D7DDE8@ebi.ac.uk> Talking about code updates I've got DNA -> RNA -> Peptide working quite well.
It's about a day or two of tinkering away from being in a sensible state. There's also some utilities I've gone & created; they've gone into org.biojava3.core.util ... anyone got any better suggestions as to where they should live? Andy On 26 Jan 2010, at 18:45, Andreas Prlic wrote: > the cookbook approach seems to work quite well. You could start a new > "Chapter" in the book and make it clear that this will be only > available once biojava 3 has been released (or via SVN checkout) > > Andreas > > On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis > wrote: >> >> I checked in updates with test cases for Fasta fileparsing where >> the main focus is on the fasta header. The test cases are based on >> the wikipedia examples so results will vary with actual files. It >> is very easy now to do a custom header parser so we have lots of >> flexibility. I also started the code for the file pointer sequence >> proxy where the key usage is creating a sequence with the header >> and storing a reference to the file and offset in the file for the >> start of the sequence. When a method is called related to getting a >> sequence/subsequence the init() method is called to load the >> sequence data via RandomAccessFile with a seek to the offset. It >> turns out that none of the java io classes will actually return an >> offset index of the actual bytes read. This also gets complicated >> with the readLine() methods where the CR and/or LF is stripped off >> when the string is returned so you can't keep track of it >> externally. I copied the BufferedReader.java class to >> BufferedReaderBytesRead.java and keep track of the file pointer internally. This code >> still needs to be tested. This should be a great way to load large >> data sets with minimal memory. To complete this approach I will >> probably do a collection that is proxy aware that can go through >> and free up storage by returning a sequence to its proxy state.
>> >> I will work this week on getting some wiki pages created to give >> examples on using the header parsing interface and proxy sequences. >> How do we want to organize wiki pages related to biojava3 work? >> >> Thanks >> >> Scooter >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From HWillis at scripps.edu Tue Jan 26 15:17:47 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 26 Jan 2010 15:17:47 -0500 Subject: [Biojava-dev] Code Update In-Reply-To: <7075E1DA-E19E-4AB0-A6DC-DF2A62D7DDE8@ebi.ac.uk> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> <7075E1DA-E19E-4AB0-A6DC-DF2A62D7DDE8@ebi.ac.uk> Message-ID: <56D4673E-EB91-43BA-BE0F-99BCD202752A@scripps.edu> Andy Let me know when you have that in a healthy state and I will work on the gtf/gff3 parser->create gene->transcript->(exon)->to protein code. Scooter On Jan 26, 2010, at 2:58 PM, Andy Yates wrote: > Talking about code updates I've got DNA -> RNA -> Peptide working > quite well. It's about a day or two of tinkering away from being in a > sensible state. There's also some utilities I've gone & created; > they've gone into org.biojava3.core.util ... anyone got any better > suggestions as to where they should live? > > Andy > > On 26 Jan 2010, at 18:45, Andreas Prlic wrote: > >> the cookbook approach seems to work quite well. 
You could start a new >> "Chapter" in the book and make it clear that this will be only >> available once biojava 3 has been released (or via SVN checkout) >> >> Andreas >> >> On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis >> wrote: >>> >>> I checked in updates with test cases for Fasta fileparsing where >>> the main focus is on the fasta header. The test cases are based on >>> the wikipedia examples so results will vary with actual files. It >>> is very easy now to do a custom header parser so we have lots of >>> flexibility. I also started the code for the file pointer sequence >>> proxy where the key usage is creating a sequence with the header >>> and storing a reference to the file and offset in the file for the >>> start of the sequence. When a method is called related to getting a >>> sequence/subsequence the init() method is called to load the >>> sequence data via RandomAccessFile with a seek to the offset. It >>> turns out that none of the java io classes will actually return an >>> offset index of the actual bytes read. This also gets complicated >>> with the readLine() methods where the CR and/or LF is stripped off >>> when the string is returned so you can't keep track of it >>> externally. I copied the BufferedReader.java class to >>> BufferedReaderBytesRead.java and keep track of the file pointer internally. This code >>> still needs to be tested. This should be a great way to load large >>> data sets with minimal memory. To complete this approach I will >>> probably do a collection that is proxy aware that can go through >>> and free up storage by returning a sequence to its proxy state. >>> >>> I will work this week on getting some wiki pages created to give >>> examples on using the header parsing interface and proxy sequences. >>> How do we want to organize wiki pages related to biojava3 work?
>>> >>> Thanks >>> >>> Scooter >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > From jacobsen at ebi.ac.uk Wed Jan 27 06:32:31 2010 From: jacobsen at ebi.ac.uk (Jules Jacobsen) Date: Wed, 27 Jan 2010 11:32:31 +0000 Subject: [Biojava-dev] Code Update In-Reply-To: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> Message-ID: <4B60244F.2070603@ebi.ac.uk> Cool - I've a sudden need for a SELEX reader/parser so will try and knock one of those together in the immediate future - might lift some code from Jalview for this purpose and see how the new MultipleSequenceAlignment class behaves in real life. Plus I've just tweaked the FastaReader and FastaWriter to be setup-neutral. Jules On 26/01/2010 18:09, Scooter Willis wrote: > I checked in updates with test cases for Fasta fileparsing where the main focus is on the fasta header. The test cases are based on the wikipedia examples so results will vary with actual files. It is very easy now to do a custom header parser so we have lots of flexibility. I also started the code for the file pointer sequence proxy where the key usage is creating a sequence with the header and storing a reference to the file and offset in the file for the start of the sequence. When a method is called related to getting a sequence/subsequence the init() method is called to load the sequence data via RandomAccessFile with a seek to the offset. It turns out that none of the java io classes will actually return an offset index of the actual bytes read. 
This also gets complicated with the readLine() methods where the CR and/or LF is stripped off when the string is returned so you can't keep track of it externally. I copied the BufferedReader.java class to BufferedReaderBytesRead.java and keep track of the file pointer internally. This code still needs to be tested. This should be a great way to load large data sets with minimal memory. To complete this approach I will probably do a collection that is proxy aware that can go through and free up storage by returning a sequence to its proxy state. > > I will work this week on getting some wiki pages created to give examples on using the header parsing interface and proxy sequences. How do we want to organize wiki pages related to biojava3 work? > > Thanks > > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From chapmanb at 50mail.com Thu Jan 28 15:35:05 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 28 Jan 2010 15:35:05 -0500 Subject: [Biojava-dev] OpenBio solution challenge: Project updates at BOSC 2010 Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Hello all; The BOSC 2010 organizing committee is hard at work getting prepared for this July's meeting in Boston: http://www.open-bio.org/wiki/BOSC_2010 One of the items we've traditionally had at the conference is a project update from each of the OpenBio affiliated groups. This year, we're thinking about organizing these talks around a central theme: the OpenBio solution challenge. We start with a biological question of general interest, and each of the project talks would focus around how you would solve that problem using your toolkit and programming language.
This is meant to provide a challenge for OpenBio contributors, a nice tutorial style overview of various projects and approaches for other programmers, and a fun opportunity to compete and learn from other projects. Conference attendees will vote on their favorite solution, with the winner receiving fame and fortune (warning: fortune not guaranteed). For this to be successful, it of course requires interest and enthusiasm from y'all fine folks involved with the projects. Specifically: - Is there interest from your group in participating in the challenge? You'll want at least a few people to work on it, and someone to give a presentation at BOSC. - Do you have suggestions on a good theme or specific biological problem to tackle? We'll hope to pick something in a sweet spot that is challenging enough to be of interest, yet reasonable for presentation and preparation. Let's discuss ideas and get this together. Since the schedule for BOSC is developing rapidly, please give us an idea if you're interested by February 12th, and copy responses to the BOSC mailing list as a central place for discussion. bosc at open-bio.org Thanks, Brad, Michael, and the BOSC organizing committee From markw at illuminae.com Thu Jan 28 16:17:44 2010 From: markw at illuminae.com (Mark Wilkinson) Date: Thu, 28 Jan 2010 13:17:44 -0800 Subject: [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu> References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: Brad, this sounds exciting! One thing strikes me, though - by asking for the sub-projects to propose the "grand challenge" themselves the one thing you can guarantee is that the "grand challenge" is solvable (or more likely, already solved!) Other "grand challenge" kinds of meetings have an independent third party pose the problem that has to be solved, and then all groups work toward a solution and compare their results. 
This would, IMO, be more revealing of the "state of the art" in each Open-Bio project, and point out where the weaknesses are that we should be focusing on... Someone (for example, you!) could act as the moderator to ensure that the "grand challenge" was at least a reasonable one, within the scope of what an Open-Bio project *should* be able to solve... Just my CAD $0.02 Mark On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman wrote: > Hello all; > The BOSC 2010 organizing committee is hard at work getting prepared for > this > July's meeting in Boston: > > http://www.open-bio.org/wiki/BOSC_2010 > > One of the items we've traditionally had at the conference is a project > update from each of the OpenBio affiliated groups. This year, we're > thinking > about organizing these talks around a central theme: the OpenBio solution > challenge. We start with a biological question of general interest, and > each > of the project talks would focus around how you would solve that problem > using your toolkit and programming language. > > This is meant to provide a challenge for OpenBio contributors, a nice > tutorial > style overview of various projects and approaches for other programmers, > and a > fun opportunity to compete and learn from other projects. Conference > attendees > will vote on their favorite solution, with the winner receiving fame and > fortune (warning: fortune not guaranteed). > > For this to be successful, it of course requires interest and enthusiasm > from > y'all fine folks involved with the projects. Specifically: > > - Is there interest from your group in participating in the challenge? > You'll > want at least a few people to work on it, and someone to give a > presentation > at BOSC. > > - Do you have suggestions on a good theme or specific biological problem > to > tackle? We'll hope to pick something in a sweet spot that is > challenging > enough to be of interest, yet reasonable for presentation and > preparation. 
> > Let's discuss ideas and get this together. Since the schedule for BOSC is > developing rapidly, please give us an idea if you're interested by > February 12th, and copy responses to the BOSC mailing list as a central > place for discussion. > > bosc at open-bio.org > > Thanks, > Brad, Michael, and the BOSC organizing committee > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev -- Mark D Wilkinson, PI Bioinformatics Assistant Professor, Medical Genetics The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research Providence Heart + Lung Institute University of British Columbia - St. Paul's Hospital Vancouver, BC, Canada From HWillis at scripps.edu Thu Jan 28 20:03:10 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 28 Jan 2010 20:03:10 -0500 Subject: [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu> Brad I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and each group would then pick a problem to present. Could be a little more interesting to the audience as you are exposed to different problems and the various strengths of each toolkit. This could also help guide future development in the other toolkits as you would benefit from learning about the api and/or programming language. Each group would register a problem that they are going to present. From the group of problems not picked that becomes the surprise challenge where each group has 24 hours to either put together a presentation or an actual solution. Scooter On Jan 28, 2010, at 4:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! 
> > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results. This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark > > > > On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman > wrote: > >> Hello all; >> The BOSC 2010 organizing committee is hard at work getting prepared for >> this >> July's meeting in Boston: >> >> http://www.open-bio.org/wiki/BOSC_2010 >> >> One of the items we've traditionally had at the conference is a project >> update from each of the OpenBio affiliated groups. This year, we're >> thinking >> about organizing these talks around a central theme: the OpenBio solution >> challenge. We start with a biological question of general interest, and >> each >> of the project talks would focus around how you would solve that problem >> using your toolkit and programming language. >> >> This is meant to provide a challenge for OpenBio contributors, a nice >> tutorial >> style overview of various projects and approaches for other programmers, >> and a >> fun opportunity to compete and learn from other projects. Conference >> attendees >> will vote on their favorite solution, with the winner receiving fame and >> fortune (warning: fortune not guaranteed). 
>> >> For this to be successful, it of course requires interest and enthusiasm >> from >> y'all fine folks involved with the projects. Specifically: >> >> - Is there interest from your group in participating in the challenge? >> You'll >> want at least a few people to work on it, and someone to give a >> presentation >> at BOSC. >> >> - Do you have suggestions on a good theme or specific biological problem >> to >> tackle? We'll hope to pick something in a sweet spot that is >> challenging >> enough to be of interest, yet reasonable for presentation and >> preparation. >> >> Let's discuss ideas and get this together. Since the schedule for BOSC is >> developing rapidly, please give us an idea if you're interested by >> February 12th, and copy responses to the BOSC mailing list as a central >> place for discussion. >> >> bosc at open-bio.org >> >> Thanks, >> Brad, Michael, and the BOSC organizing committee >> _______________________________________________ >> MOBY-dev mailing list >> MOBY-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/moby-dev > > > -- > Mark D Wilkinson, PI Bioinformatics > Assistant Professor, Medical Genetics > The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research > Providence Heart + Lung Institute > University of British Columbia - St. 
Paul's Hospital > Vancouver, BC, Canada > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From biopython at maubp.freeserve.co.uk Fri Jan 29 05:36:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 29 Jan 2010 10:36:40 +0000 Subject: [Biojava-dev] [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com> Hi all, This is a great topic but should we continue it on just the one mailing list? Is there a suitable BOSC list, or how about the general Open Bio list? On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results. This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark One possible problem with having Brad act as moderator is his ties to Biopython (plus it would be a shame if we'd be one man down for trying to solve the challenges - grin). Having a project representative "sign off" on the challenge might work - or simply the whole of the BOSC committee which is quite balanced.
Alternatively some kind of panel of challenges does seem a good way to reduce individual project bias (as suggest by Scooter), but there will still need to be a judging committee. I'm curious what kind of challenges the BOSC committee had in mind - would something like taking a newly sequence bacteria and producing an automated annotation as a GenBank, EMBL, or GFF file be too ambitious for example? There are already several major projects to do this e.g. RAST http://rast.nmpdr.org/ Peter (@Biopython)
Check to see that your ComparableTerm retrieval statements using RichObjectFactory are not already defined as constants in RichSequence.Terms or GenbankFormat.Terms. If they are, use the constants instead as this makes the code clearer. 4. Your system.out line that prints the results has Chr1 hardcoded - should this be a parameter, or read from the file maybe? cheers, Richard On 13 Jan 2010, at 12:49, Jolyon Holdstock wrote: > Hi, > > > > I was going to add the following script to the annotation section of the > cookbook. > > It takes an EMBL or Genbank file and outputs information about each CDS > feature. > > > > I just wanted to check this was OK and also is this the most efficient > way of doing this? > > > > Cheers, > > > > Jolyon > > > > [CODE] > > import java.io.*; > > import java.util.*; > > import org.biojava.bio.*; > > import org.biojava.bio.seq.*; > > import org.biojava.bio.seq.io.*; > > import org.biojavax.*; > > import org.biojavax.ontology.*; > > import org.biojavax.bio.*; > > import org.biojavax.bio.seq.*; > > > > public class ExtractInformation { > > //Create the RichSequence object > > RichSequence richSeq; > > public ExtractInformation(String fileName){ > > //Load the sequence file > > try { > > richSeq = RichSequence.IOTools.readGenbankDNA(new > BufferedReader(new FileReader(fileName)),null).nextRichSequence(); > > } > > catch(FileNotFoundException fnfe){ > > System.out.println("FilwNotFoundException: " + fnfe); > > } > > catch(BioException bioe1){ > > System.err.println("Not a Genbank sequence trying EMBL"); > > try { > > richSeq = RichSequence.IOTools.readGenbankDNA(new > BufferedReader(new FileReader(fileName)),null).nextRichSequence(); > > } > > catch(BioException bioe2){ > > System.err.println("Not an EMBL sequence either"); > > System.exit(1); > > } > > catch(FileNotFoundException fnfe){ > > System.out.println("FilwNotFoundException: " + fnfe); > > } > > } > > //Filter the sequence on CDS features > > FeatureFilter ff = new 
FeatureFilter.ByType("CDS"); > > FeatureHolder fh = richSeq.filter(ff); > > > > //Iterate through the CDS features > > for (Iterator i = fh.features(); i.hasNext();){ > > RichFeature rf = i.next(); > > > > //Get the strand orientation of the feature > > char featureStrand = rf.getStrand().getToken(); > > > > //Get the location of the feature > > String featureLocation = rf.getLocation().toString(); > > > > //Get the annotation of the feature > > RichAnnotation ra = (RichAnnotation)rf.getAnnotation(); > > > > //Create the required ComparableTerms > > ComparableTerm geneTerm = > RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene"); > > ComparableTerm locusTerm = > RichObjectFactory.getDefaultOntology().getOrCreateTerm("locus_tag"); > > ComparableTerm productTerm = > RichObjectFactory.getDefaultOntology().getOrCreateTerm("product"); > > ComparableTerm synonymTerm = > RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene_synonym"); > > ComparableTerm proteinIDTerm = > RichObjectFactory.getDefaultOntology().getOrCreateTerm("protein_id"); > > > > //Create empty strings > > String gene = ""; > > String locus = ""; > > String product = ""; > > String geneSynonym = ""; > > String proteinID = ""; > > > > //Iterate through the notes in the annotation > > for (Iterator it = ra.getNoteSet().iterator(); > it.hasNext();){ > > Note note = it.next(); > > > > //Check each note to see if it matches one of the required > ComparableTerms > > if(note.getTerm().equals(locusTerm)){ > > locus = note.getValue().toString(); > > } > > if(note.getTerm().equals(productTerm)){ > > product = note.getValue().toString(); > > } > > if(note.getTerm().equals(geneTerm)){ > > gene = note.getValue().toString(); > > } > > if(note.getTerm().equals(synonymTerm)){ > > geneSynonym = note.getValue().toString(); > > } > > if(note.getTerm().equals(proteinIDTerm)){ > > proteinID = note.getValue().toString(); > > } > > } > > //Outout the feature information > > System.out.println(locus + " " + 
gene + " " + geneSynonym + " " > + proteinID + " " + product +" " + featureStrand +" Chr1:" + > featureLocation); > > } > > } > > > > public static void main(String args []){ > > if (args.length != 1){ > > System.out.println("Usage: java ExtractInformation Genbank or EMBL format>"); > > System.exit(1); > > } > > else { > > new ExtractInformation(args[0]); > > } > > } > > } > > [/CODE] > > > > Dr. Jolyon Holdstock > Senior Computational Biologist, > > Oxford Gene Technology, > Begbroke Science Park, > Sandy Lane, Yarnton, > Oxford, OX5 1PF, UK. > > T: +44 (0)1865 856 852 > F: +44 (0)1865 842 116 > E: jolyon.holdstock at ogt.co.uk > > W: www.ogt.co.uk > > > > Looking to outsource your microarray studies? Look no further. > Click here to tour our facilities > > > Click here to request a quotation > > > > > Scientific pedigree delivering high quality microarray results to you: > > * Service capacity >1000 samples per week > > * Rigorous QC from sample to > result > > * Applications available > include aCGH, CNV, methylation studies and miRNA > > > > Oxford Gene Technology (Operations) Ltd. Registered in England No: > 03845432 Begbroke Science Park Sandy Lane Yarnton Oxford OX5 1PF > > Confidentiality Notice: The contents of this email from Oxford Gene > Technology are confidential and intended solely for the person to whom > it is addressed. It may contain privileged and confidential information. > If you are not the intended recipient you must not read, copy, > distribute, discuss or take any action in reliance on it. If you have > received this email in error please advise the sender so that we can > arrange for proper delivery. Then please delete the message from your > inbox. Thank you. 
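To illustrate point 1 of the review above, here is a minimal sketch of a format fallback that really does try a second format, and does so in a flat loop without nesting one try/catch inside another. The `Parser` interface, `FormatFallback` class and the toy parsers are hypothetical stand-ins so the example runs without BioJava on the classpath. In the script itself the two attempts would presumably be `RichSequence.IOTools.readGenbankDNA` and `RichSequence.IOTools.readEMBLDNA`, each given a freshly opened reader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FormatFallback {

    /** Hypothetical stand-in for a format-specific reader; throws if the text is not in its format. */
    public interface Parser {
        String parse(String text) throws Exception;
    }

    /** Tries each named parser in insertion order; returns "format:result" from the first that succeeds. */
    public static String parseWithFallback(String text, Map<String, Parser> parsers) {
        for (Map.Entry<String, Parser> entry : parsers.entrySet()) {
            try {
                return entry.getKey() + ":" + entry.getValue().parse(text);
            } catch (Exception e) {
                System.err.println("Not a " + entry.getKey() + " sequence, trying the next format");
            }
        }
        throw new IllegalArgumentException("No parser recognised the input");
    }

    /** Toy parsers keyed on the first line of each flat-file format (illustration only). */
    public static Map<String, Parser> defaultParsers() {
        Map<String, Parser> parsers = new LinkedHashMap<>();
        parsers.put("Genbank", text -> requirePrefix(text, "LOCUS"));
        parsers.put("EMBL", text -> requirePrefix(text, "ID"));
        return parsers;
    }

    /** Returns the text after the prefix, or throws if the prefix is missing. */
    static String requirePrefix(String text, String prefix) throws Exception {
        if (!text.startsWith(prefix)) {
            throw new Exception("expected " + prefix);
        }
        return text.substring(prefix.length()).trim();
    }

    public static void main(String[] args) {
        // An EMBL-style first line: the Genbank parser fails, the EMBL parser succeeds.
        System.out.println(parseWithFallback("ID   X56734; SV 1; linear; mRNA", defaultParsers()));
    }
}
```

Keeping the parsers in an ordered map means a third format can be added with one more `put` call, and each failure is reported against the format that was actually attempted.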
From genjasp at gmail.com Wed Jan 13 14:14:26 2010
From: genjasp at gmail.com (Alessandro Cipriani)
Date: Wed, 13 Jan 2010 15:14:26 +0100
Subject: [Biojava-dev] Script for cookbook
In-Reply-To: <982845F6-0845-474A-BEBD-C5D336C03F42@eaglegenomics.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02C64D8C@EUCLID.internal.ogtip.com> <982845F6-0845-474A-BEBD-C5D336C03F42@eaglegenomics.com>
Message-ID: <46b9a2151001130614i1d1b057xfdffa8f0051d0b89@mail.gmail.com>

Hi,

I think it is not good practice:

- to put a try/catch block inside another try/catch block
- to import whole libraries, even parts that are not required
- to make all methods public

Otherwise it seems to be OK!

Regards
Alex

From mahogny at areta.org Sun Jan 17 17:52:45 2010
From: mahogny at areta.org (Johan Henriksson)
Date: Sun, 17 Jan 2010 18:52:45 +0100
Subject: [Biojava-dev] biojava hackathon
In-Reply-To:
References:
Message-ID:

Sorry, I will not be able to attend the hackathon this year for a whole lot of reasons. In addition, sorry for telling you this so late. I'm looking forward to the next one though; happy hacking to the rest of you!

/Johan

--
-----------------------------------------------------------
Johan Henriksson
PhD student, Karolinska Institutet
http://mahogny.areta.org http://www.endrov.net

From heuermh at acm.org Tue Jan 19 04:54:21 2010
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 18 Jan 2010 23:54:21 -0500 (EST)
Subject: [Biojava-dev] hackathon task list, pom refactoring, &c.
In-Reply-To:
Message-ID:

Hello Hackathon participants,

Might we start a task list for the hackathon tomorrow, on the biojava wiki perhaps? It would be nice to be able to contribute to the list of tasks and to claim tasks to work on from off-site.

Today I refactored and cleaned up all the maven poms. Build and test works for me on linux, win, and mac with commandline maven. Eclipse via m2eclipse is good for almost everything, but has trouble firing off the maven-jaxb2-plugin code generation step.
If you want to give it a try, maven version 3.0-alpha-6 is a lot more picky about reproducible builds than earlier versions, and complained a ton when I first started. Now the only warning is the deprecated maven1 repo link at java.net.

I have set up cia.vc to post svn commit notification via IRC to #biojava on irc.freenode.net. I'll also be on there most of the day tomorrow, or available on google chat.

Ah, with the blogs and the twitter and stuff, I can't keep up. ;)

michael

From heuermh at acm.org Tue Jan 19 05:02:28 2010
From: heuermh at acm.org (Michael Heuer)
Date: Tue, 19 Jan 2010 00:02:28 -0500 (EST)
Subject: [Biojava-dev] [Biojava-l] BioJava Hackathon - Day 1
In-Reply-To: <95988839-75A3-4087-8071-5A19BEECDD04@eaglegenomics.com>
Message-ID:

On Mon, 18 Jan 2010, Richard Holland wrote:

> ...
>
> Also by making the new SymbolList implement the standard List
> interface from Collections, it can be used in nice Java shortcuts such
> as the new foreach loops, and standard iterators can be used (instead of
> the current SymbolListIterator method). Collections API can also be used
> then to do things like subsets or reverses.

Is this a good idea? It might preclude some efficient implementation strategies, such as ParallelArray from JSR166

http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/extra166y/ParallelArray.html

Would Iterable be sufficient?

Also, List indexing is not compatible with typical sequence feature indexing.

michael

From holland at eaglegenomics.com Tue Jan 19 05:45:14 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 19 Jan 2010 05:45:14 +0000
Subject: [Biojava-dev] [Biojava-l] BioJava Hackathon - Day 1
In-Reply-To:
References:
Message-ID: <56B4AA4E-FF88-4D4E-BD95-49B76194FA82@eaglegenomics.com>

Good point - Iterable would probably be fine.

From ayates at ebi.ac.uk Tue Jan 19 08:50:49 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Tue, 19 Jan 2010 08:50:49 +0000
Subject: [Biojava-dev] [Biojava-l] BioJava Hackathon - Day 1
In-Reply-To: <56B4AA4E-FF88-4D4E-BD95-49B76194FA82@eaglegenomics.com>
References: <56B4AA4E-FF88-4D4E-BD95-49B76194FA82@eaglegenomics.com>
Message-ID: <25714175-4B75-4D30-AA42-E0176060DCAB@ebi.ac.uk>

This somewhat targets what this new version should be about: making the most of Java5+ features (Iterable & generics being the first things that spring to mind).
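The trade-off raised in this thread can be sketched roughly as follows: a sequence type that implements only `Iterable` still works in foreach loops, while positional access stays 1-based (as sequence coordinates conventionally are) instead of inheriting the 0-based `get(int)` contract of `List`. `SimpleSymbolList` and its methods are hypothetical names chosen for illustration, not BioJava API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sequence type: iterable, but indexed from 1 like sequence coordinates. */
public final class SimpleSymbolList implements Iterable<Character> {
    private final String symbols;

    public SimpleSymbolList(String symbols) {
        this.symbols = symbols;
    }

    public int length() {
        return symbols.length();
    }

    /** 1-based access: symbolAt(1) is the first symbol, unlike List.get(0). */
    public char symbolAt(int position) {
        if (position < 1 || position > symbols.length()) {
            throw new IndexOutOfBoundsException("position " + position + " outside 1.." + symbols.length());
        }
        return symbols.charAt(position - 1);
    }

    @Override
    public Iterator<Character> iterator() {
        return new Iterator<Character>() {
            private int position = 1; // next 1-based position to hand out

            @Override
            public boolean hasNext() {
                return position <= length();
            }

            @Override
            public Character next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return symbolAt(position++);
            }

            @Override
            public void remove() {
                throw new UnsupportedOperationException("read-only sequence");
            }
        };
    }

    public static void main(String[] args) {
        SimpleSymbolList dna = new SimpleSymbolList("GATTACA");
        StringBuilder seen = new StringBuilder();
        for (char symbol : dna) { // foreach only requires Iterable, not List
            seen.append(symbol);
        }
        System.out.println(seen + " first=" + dna.symbolAt(1));
    }
}
```

Because nothing here promises random access through the `List` interface, the backing store could later be swapped for a packed or file-backed representation without changing callers that only iterate.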
From andy.law at roslin.ed.ac.uk Tue Jan 19 09:23:26 2010
From: andy.law at roslin.ed.ac.uk (Andy Law (RI))
Date: Tue, 19 Jan 2010 09:23:26 +0000
Subject: [Biojava-dev] hackathon task list, pom refactoring, &c.
In-Reply-To:
References:
Message-ID: <6B943C9E-98E1-4788-8860-2B6AC266421C@roslin.ed.ac.uk>

On 19 Jan 2010, at 04:54, Michael Heuer wrote:

> I have set up cia.vc to post svn commit notification via IRC to #biojava
> on irc.freenode.net. I'll also be on there most of the day tomorrow, or
> available on google chat.
>
> Ah, with the blogs and the twitter and stuff, I can't keep up. ;)

Sheesh! What was ever wrong with posting occasional notes via email?

Later,

Andy

--------
Yada, yada, yada...

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336

Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender.

From jw12 at sanger.ac.uk Tue Jan 19 10:41:12 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 19 Jan 2010 10:41:12 +0000
Subject: [Biojava-dev] DAS Workshop Registrations now Open (workshop date 7-9 April 2010)
Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk>

If you don't know about DAS and wish to know how to distribute your latest biological annotation to the world then the upcoming DAS workshop may be for you.

If you know about DAS and are maybe a DAS client developer then the upcoming DAS workshop is for you (as you will need to know about the upcoming DAS 1.6 Specification and how it may affect your software).

For information on the workshop and registration please go to:
http://www.ebi.ac.uk/training/handson/DAS_070410.html

Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk

--
The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.
From heuermh at acm.org Wed Jan 20 00:52:29 2010
From: heuermh at acm.org (Michael Heuer)
Date: Tue, 19 Jan 2010 19:52:29 -0500 (EST)
Subject: [Biojava-dev] sequence-core tests currently do not compile
In-Reply-To: <6B943C9E-98E1-4788-8860-2B6AC266421C@roslin.ed.ac.uk>
Message-ID:

Module sequence-core tests currently do not compile

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.0.2:testCompile (default-testCompile) on project sequence-core: Compilation failure: Compilation failure:
C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[8,7] class DNATests is public, should be declared in a file named DNATests.java

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[45,9] cannot find symbol
symbol  : class Sequence
location: class org.biojava3.core.DNATests

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[45,28] cannot find symbol
symbol  : class DNACompound
location: class org.biojava3.core.DNATests

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[40,17] cannot find symbol
symbol  : class BadSequence
location: class org.biojava3.core.DNATests

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[12,15] getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to ()

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[18,15] getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to ()

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[24,15] getSeq(java.lang.String) in org.biojava3.core.DNATests cannot be applied to ()

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[37,16] cannot find symbol
symbol  : variable toString
location: class java.lang.String

C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence-core\src\test\java\org\biojava3\core\DnaTests.java:[47,15] cannot find symbol
symbol  : class DNASequence
location: class org.biojava3.core.DNATests

From ayates at ebi.ac.uk Wed Jan 20 09:55:31 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 20 Jan 2010 09:55:31 +0000
Subject: [Biojava-dev] sequence-core tests currently do not compile
In-Reply-To:
References:
Message-ID: <9981AAC1-1197-4DF7-B347-5ADE4E0C03C0@ebi.ac.uk>

Hmmm interesting. That should have all been commented out I thought. I will fix & commit back
> > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence- > core\src\test\java\org\biojava3\core\DnaTests.java:[12,15] > getSeq(java.l > ang.String) in org.biojava3.core.DNATests cannot be applied to () > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence- > core\src\test\java\org\biojava3\core\DnaTests.java:[18,15] > getSeq(java.l > ang.String) in org.biojava3.core.DNATests cannot be applied to () > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence- > core\src\test\java\org\biojava3\core\DnaTests.java:[24,15] > getSeq(java.l > ang.String) in org.biojava3.core.DNATests cannot be applied to () > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence- > core\src\test\java\org\biojava3\core\DnaTests.java:[37,16] > cannot find s > ymbol > symbol : variable toString > location: class java.lang.String > > C:\cygwin\home\heuermh\working\biojava-live-trunk\sequence\sequence- > core\src\test\java\org\biojava3\core\DnaTests.java:[47,15] > cannot find s > ymbol > symbol : class DNASequence > location: class org.biojava3.core.DNATests > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From HWillis at scripps.edu Tue Jan 26 18:09:36 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 26 Jan 2010 13:09:36 -0500 Subject: [Biojava-dev] Code Update Message-ID: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> I checked in updates with test cases for Fasta fileparsing where the main focus is on the fasta header. The test cases are based on the wikipedia examples so results will vary with actual files. It is very easy now to do a custom header parser so we have lots of flexibility. I also started the code for the file pointer sequence proxy where the key usage is creating a sequence with the header and storing a reference to the file and offset in the file for the start of the sequence. 
When a method related to getting a sequence/subsequence is called, the init() method loads the sequence data via RandomAccessFile with a seek to the offset. It turns out that none of the java io classes will actually return an offset index of the actual bytes read. This also gets complicated with the readLine() methods where the CR and/or LF is stripped off when the string is returned so you can't keep track of it externally. I copied the BufferedReader.java class to BufferedReaderBytesRead.java and keep track of the file pointer internally. This code still needs to be tested. This should be a great way to load large data sets with minimal memory. To complete this approach I will probably do a collection that is proxy aware that can go through and free up storage by returning a sequence to its proxy state. I will work this week on getting some wiki pages created to give examples on using the header parsing interface and proxy sequences. How do we want to organize wiki pages related to biojava3 work? Thanks Scooter From andreas at sdsc.edu Tue Jan 26 18:45:59 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 26 Jan 2010 10:45:59 -0800 Subject: [Biojava-dev] Code Update In-Reply-To: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> Message-ID: <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> the cookbook approach seems to work quite well. You could start a new "Chapter" in the book and make it clear that this will be only available once biojava 3 has been released (or via SVN checkout) Andreas On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis wrote: > > I checked in updates with test cases for Fasta fileparsing where the main focus is on the fasta header. The test cases are based on the wikipedia examples so results will vary with actual files. It is very easy now to do a custom header parser so we have lots of flexibility.
I also started the code for the file pointer sequence proxy where the key usage is creating a sequence with the header and storing a reference to the file and offset in the file for the start of the sequence. When a method is called related to getting a sequence/subsequence the init() method is called to load the sequence data via RandomAccessFile with a seek to the offset. It turns out that none of the java io classes will actually return an offset index of the actual bytes read. This also gets complicated with the readLine() methods where the CR and/or LF is stripped off when the string is returned so you can't keep track of it externally. I copied the BufferedReader.java class to BufferedReaderBytesRead.java and keep track of the file pointer internally. This code still needs to be tested. This should be a great way to load large data sets with minimal memory. To complete this approach I will probably do a collection that is proxy aware that can go through and free up storage by returning a sequence to its proxy state. > > I will work this week on getting some wiki pages created to give examples on using the header parsing interface and proxy sequences. How do we want to organize wiki pages related to biojava3 work? > > Thanks > > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From ayates at ebi.ac.uk Tue Jan 26 19:58:43 2010 From: ayates at ebi.ac.uk (Andy Yates) Date: Tue, 26 Jan 2010 19:58:43 +0000 Subject: [Biojava-dev] Code Update In-Reply-To: <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> Message-ID: <7075E1DA-E19E-4AB0-A6DC-DF2A62D7DDE8@ebi.ac.uk> Talking about code updates I've got DNA -> RNA -> Peptide working quite well.
It's about a day or two of tinkering away from being in a sensible state. There are also some utilities I've gone & created; they've gone into org.biojava3.core.util ... anyone got any better suggestions as to where they should live? Andy On 26 Jan 2010, at 18:45, Andreas Prlic wrote: > the cookbook approach seems to work quite well. You could start a new > "Chapter" in the book and make it clear that this will be only > available once biojava 3 has been released (or via SVN checkout) > > Andreas > > On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis > wrote: >> >> I checked in updates with test cases for Fasta fileparsing where >> the main focus is on the fasta header. The test cases are based on >> the wikipedia examples so results will vary with actual files. It >> is very easy now to do a custom header parser so we have lots of >> flexibility. I also started the code for the file pointer sequence >> proxy where the key usage is creating a sequence with the header >> and storing a reference to the file and offset in the file for the >> start of the sequence. When a method is called related to getting a >> sequence/subsequence the init() method is called to load the >> sequence data via RandomAccessFile with a seek to the offset. It >> turns out that none of the java io classes will actually return an >> offset index of the actual bytes read. This also gets complicated >> with the readLine() methods where the CR and/or LF is stripped off >> when the string is returned so you can't keep track of it >> externally. I copied the BufferedReader.java class to >> BufferedReaderBytesRead.java and keep track of the file pointer internally. This code >> still needs to be tested. This should be a great way to load large >> data sets with minimal memory. To complete this approach I will >> probably do a collection that is proxy aware that can go through >> and free up storage by returning a sequence to its proxy state.
>> >> I will work this week on getting some wiki pages created to give >> examples on using the header parsing interface and proxy sequences. >> How do we want to organize wiki pages related to biojava3 work? >> >> Thanks >> >> Scooter >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev >> > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From HWillis at scripps.edu Tue Jan 26 20:17:47 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Tue, 26 Jan 2010 15:17:47 -0500 Subject: [Biojava-dev] Code Update In-Reply-To: <7075E1DA-E19E-4AB0-A6DC-DF2A62D7DDE8@ebi.ac.uk> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> <59a41c431001261045g301d8a48hbd55999cafa5601c@mail.gmail.com> <7075E1DA-E19E-4AB0-A6DC-DF2A62D7DDE8@ebi.ac.uk> Message-ID: <56D4673E-EB91-43BA-BE0F-99BCD202752A@scripps.edu> Andy Let me know when you have that in a healthy state and I will work on the gtf/gff3 parser->create gene->transcript->(exon)->to protein code. Scooter On Jan 26, 2010, at 2:58 PM, Andy Yates wrote: > Talking about code updates I've got DNA -> RNA -> Peptide working > quite well. It's about a day or two of tinkering away from being in a > sensible state. There's also some utilities I've gone & created; > they've gone into org.biojava3.core.util ... anyone got any better > suggestions as to where they should live? > > Andy > > On 26 Jan 2010, at 18:45, Andreas Prlic wrote: > >> the cookbook approach seems to work quite well. 
You could start a new >> "Chapter" in the book and make it clear that this will be only >> available once biojava 3 has been released (or via SVN checkout) >> >> Andreas >> >> On Tue, Jan 26, 2010 at 10:09 AM, Scooter Willis >> wrote: >>> >>> I checked in updates with test cases for Fasta fileparsing where >>> the main focus is on the fasta header. The test cases are based on >>> the wikipedia examples so results will vary with actual files. It >>> is very easy now to do a custom header parser so we have lots of >>> flexibility. I also started the code for the file pointer sequence >>> proxy where the key usage is creating a sequence with the header >>> and storing a reference to the file and offset in the file for the >>> start of the sequence. When a method is called related to getting a >>> sequence/subsequence the init() method is called to load the >>> sequence data via RandomAccessFile with a seek to the offset. It >>> turns out that none of the java io classes will actually return an >>> offset index of the actual bytes read. This also gets complicated >>> with the readLine() methods where the CR and/or LF is stripped off >>> when the string is returned so you can't keep track of it >>> externally. I copied the BufferedReader.java class to >>> BufferedReaderBytesRead.java and keep track of the file pointer internally. This code >>> still needs to be tested. This should be a great way to load large >>> data sets with minimal memory. To complete this approach I will >>> probably do a collection that is proxy aware that can go through >>> and free up storage by returning a sequence to its proxy state. >>> >>> I will work this week on getting some wiki pages created to give >>> examples on using the header parsing interface and proxy sequences. >>> How do we want to organize wiki pages related to biojava3 work?
>>> >>> Thanks >>> >>> Scooter >>> _______________________________________________ >>> biojava-dev mailing list >>> biojava-dev at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-dev >>> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > From jacobsen at ebi.ac.uk Wed Jan 27 11:32:31 2010 From: jacobsen at ebi.ac.uk (Jules Jacobsen) Date: Wed, 27 Jan 2010 11:32:31 +0000 Subject: [Biojava-dev] Code Update In-Reply-To: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> References: <5DC3EABB-E571-4D23-BDA7-D74A0CCAD804@scripps.edu> Message-ID: <4B60244F.2070603@ebi.ac.uk> Cool - I've a sudden need for a SELEX reader/parser so will try and knock one of those together in the immediate future - might lift some code from Jalview for this purpose and see how the new MultipleSequenceAlignment class behaves in real life. Plus I've just tweaked the FastaReader and FastaWriter to be setup-neutral. Jules On 26/01/2010 18:09, Scooter Willis wrote: > I checked in updates with test cases for Fasta fileparsing where the main focus is on the fasta header. The test cases are based on the wikipedia examples so results will vary with actual files. It is very easy now to do a custom header parser so we have lots of flexibility. I also started the code for the file pointer sequence proxy where the key usage is creating a sequence with the header and storing a reference to the file and offset in the file for the start of the sequence. When a method is called related to getting a sequence/subsequence the init() method is called to load the sequence data via RandomAccessFile with a seek to the offset. It turns out that none of the java io classes will actually return an offset index of the actual bytes read. 
This also gets complicated with the readLine() methods where the CR and/or LF is stripped off when the string is returned so you can't keep track of it externally. I copied the BufferedReader.java class to BufferedReaderBytesRead.java and keep track of the file pointer internally. This code still needs to be tested. This should be a great way to load large data sets with minimal memory. To complete this approach I will probably do a collection that is proxy aware that can go through and free up storage by returning a sequence to its proxy state. > > I will work this week on getting some wiki pages created to give examples on using the header parsing interface and proxy sequences. How do we want to organize wiki pages related to biojava3 work? > > Thanks > > Scooter > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From chapmanb at 50mail.com Thu Jan 28 20:35:05 2010 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 28 Jan 2010 15:35:05 -0500 Subject: [Biojava-dev] OpenBio solution challenge: Project updates at BOSC 2010 Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Hello all; The BOSC 2010 organizing committee is hard at work getting prepared for this July's meeting in Boston: http://www.open-bio.org/wiki/BOSC_2010 One of the items we've traditionally had at the conference is a project update from each of the OpenBio affiliated groups. This year, we're thinking about organizing these talks around a central theme: the OpenBio solution challenge. We start with a biological question of general interest, and each of the project talks would focus around how you would solve that problem using your toolkit and programming language.
This is meant to provide a challenge for OpenBio contributors, a nice tutorial style overview of various projects and approaches for other programmers, and a fun opportunity to compete and learn from other projects. Conference attendees will vote on their favorite solution, with the winner receiving fame and fortune (warning: fortune not guaranteed). For this to be successful, it of course requires interest and enthusiasm from y'all fine folks involved with the projects. Specifically: - Is there interest from your group in participating in the challenge? You'll want at least a few people to work on it, and someone to give a presentation at BOSC. - Do you have suggestions on a good theme or specific biological problem to tackle? We'll hope to pick something in a sweet spot that is challenging enough to be of interest, yet reasonable for presentation and preparation. Let's discuss ideas and get this together. Since the schedule for BOSC is developing rapidly, please give us an idea if you're interested by February 12th, and copy responses to the BOSC mailing list as a central place for discussion. bosc at open-bio.org Thanks, Brad, Michael, and the BOSC organizing committee From markw at illuminae.com Thu Jan 28 21:17:44 2010 From: markw at illuminae.com (Mark Wilkinson) Date: Thu, 28 Jan 2010 13:17:44 -0800 Subject: [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu> References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: Brad, this sounds exciting! One thing strikes me, though - by asking for the sub-projects to propose the "grand challenge" themselves the one thing you can guarantee is that the "grand challenge" is solvable (or more likely, already solved!) Other "grand challenge" kinds of meetings have an independent third party pose the problem that has to be solved, and then all groups work toward a solution and compare their results. 
This would, IMO, be more revealing of the "state of the art" in each Open-Bio project, and point out where the weaknesses are that we should be focusing on... Someone (for example, you!) could act as the moderator to ensure that the "grand challenge" was at least a reasonable one, within the scope of what an Open-Bio project *should* be able to solve... Just my CAD $0.02 Mark On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman wrote: > Hello all; > The BOSC 2010 organizing committee is hard at work getting prepared for > this > July's meeting in Boston: > > http://www.open-bio.org/wiki/BOSC_2010 > > One of the items we've traditionally had at the conference is a project > update from each of the OpenBio affiliated groups. This year, we're > thinking > about organizing these talks around a central theme: the OpenBio solution > challenge. We start with a biological question of general interest, and > each > of the project talks would focus around how you would solve that problem > using your toolkit and programming language. > > This is meant to provide a challenge for OpenBio contributors, a nice > tutorial > style overview of various projects and approaches for other programmers, > and a > fun opportunity to compete and learn from other projects. Conference > attendees > will vote on their favorite solution, with the winner receiving fame and > fortune (warning: fortune not guaranteed). > > For this to be successful, it of course requires interest and enthusiasm > from > y'all fine folks involved with the projects. Specifically: > > - Is there interest from your group in participating in the challenge? > You'll > want at least a few people to work on it, and someone to give a > presentation > at BOSC. > > - Do you have suggestions on a good theme or specific biological problem > to > tackle? We'll hope to pick something in a sweet spot that is > challenging > enough to be of interest, yet reasonable for presentation and > preparation. 
> > Let's discuss ideas and get this together. Since the schedule for BOSC is > developing rapidly, please give us an idea if you're interested by > February 12th, and copy responses to the BOSC mailing list as a central > place for discussion. > > bosc at open-bio.org > > Thanks, > Brad, Michael, and the BOSC organizing committee > _______________________________________________ > MOBY-dev mailing list > MOBY-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/moby-dev -- Mark D Wilkinson, PI Bioinformatics Assistant Professor, Medical Genetics The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research Providence Heart + Lung Institute University of British Columbia - St. Paul's Hospital Vancouver, BC, Canada From HWillis at scripps.edu Fri Jan 29 01:03:10 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Thu, 28 Jan 2010 20:03:10 -0500 Subject: [Biojava-dev] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu> Brad I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and each group would then pick a problem to present. Could be a little more interesting to the audience as you are exposed to different problems and the various strengths of each toolkit. This could also help guide future development in the other toolkits as you would benefit from learning about the api and/or programming language. Each group would register a problem that they are going to present. From the group of problems not picked that becomes the surprise challenge where each group has 24 hours to either put together a presentation or an actual solution. Scooter On Jan 28, 2010, at 4:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! 
> > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results. This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark > > > > On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman > wrote: > >> Hello all; >> The BOSC 2010 organizing committee is hard at work getting prepared for >> this >> July's meeting in Boston: >> >> http://www.open-bio.org/wiki/BOSC_2010 >> >> One of the items we've traditionally had at the conference is a project >> update from each of the OpenBio affiliated groups. This year, we're >> thinking >> about organizing these talks around a central theme: the OpenBio solution >> challenge. We start with a biological question of general interest, and >> each >> of the project talks would focus around how you would solve that problem >> using your toolkit and programming language. >> >> This is meant to provide a challenge for OpenBio contributors, a nice >> tutorial >> style overview of various projects and approaches for other programmers, >> and a >> fun opportunity to compete and learn from other projects. Conference >> attendees >> will vote on their favorite solution, with the winner receiving fame and >> fortune (warning: fortune not guaranteed). 
>> >> For this to be successful, it of course requires interest and enthusiasm >> from >> y'all fine folks involved with the projects. Specifically: >> >> - Is there interest from your group in participating in the challenge? >> You'll >> want at least a few people to work on it, and someone to give a >> presentation >> at BOSC. >> >> - Do you have suggestions on a good theme or specific biological problem >> to >> tackle? We'll hope to pick something in a sweet spot that is >> challenging >> enough to be of interest, yet reasonable for presentation and >> preparation. >> >> Let's discuss ideas and get this together. Since the schedule for BOSC is >> developing rapidly, please give us an idea if you're interested by >> February 12th, and copy responses to the BOSC mailing list as a central >> place for discussion. >> >> bosc at open-bio.org >> >> Thanks, >> Brad, Michael, and the BOSC organizing committee >> _______________________________________________ >> MOBY-dev mailing list >> MOBY-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/moby-dev > > > -- > Mark D Wilkinson, PI Bioinformatics > Assistant Professor, Medical Genetics > The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research > Providence Heart + Lung Institute > University of British Columbia - St. 
Paul's Hospital > Vancouver, BC, Canada > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From biopython at maubp.freeserve.co.uk Fri Jan 29 10:36:40 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 29 Jan 2010 10:36:40 +0000 Subject: [Biojava-dev] [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project updates at BOSC 2010 In-Reply-To: References: <20100128203505.GG40046@sobchak.mgh.harvard.edu> Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com> Hi all, This is a great topic but should we continue it on just the one mailing list? Is there a suitable BOSC list, or how about the general Open Bio list? On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson wrote: > > Brad, this sounds exciting! > > One thing strikes me, though - by asking for the sub-projects to propose > the "grand challenge" themselves the one thing you can guarantee is that > the "grand challenge" is solvable (or more likely, already solved!) > > Other "grand challenge" kinds of meetings have an independent third party > pose the problem that has to be solved, and then all groups work toward a > solution and compare their results. This would, IMO, be more revealing of > the "state of the art" in each Open-Bio project, and point out where the > weaknesses are that we should be focusing on... Someone (for example, > you!) could act as the moderator to ensure that the "grand challenge" was > at least a reasonable one, within the scope of what an Open-Bio project > *should* be able to solve... > > Just my CAD $0.02 > > Mark One possible problem with having Brad act as moderator is his ties to Biopython (plus it would be a shame if we'd be one man down for trying to solve the challenges - grin). Having a project representative "sign off" on the challenge might work - or simply the whole of the BOSC committee which is quite balanced.
Alternatively some kind of panel of challenges does seem a good way to reduce individual project bias (as suggested by Scooter), but there will still need to be a judging committee. I'm curious what kind of challenges the BOSC committee had in mind - would something like taking a newly sequenced bacterium and producing an automated annotation as a GenBank, EMBL, or GFF file be too ambitious for example? There are already several major projects to do this e.g. RAST http://rast.nmpdr.org/ Peter (@Biopython)