From holland at ebi.ac.uk Mon Mar 3 03:41:58 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Mon, 03 Mar 2008 08:41:58 +0000
Subject: [Biojava-l] RichLocation.Tools.merge(Collection members) method
In-Reply-To: <47C839BC.8030506@ebi.ac.uk>
References: <47C839BC.8030506@ebi.ac.uk>
Message-ID: <47CBB9D6.8090705@ebi.ac.uk>

Arnaud - you are correct.

cheers,
Richard

Arnaud Kerhornou wrote:
> Hi everyone,
>
> I don't think the RichLocation.Tools.merge(Collection members) method is
> doing it right.
>
> e.g. Input:
> biojavax:join:[1157624..1158025,1158025..1158420,1158420..1158893]
> Expected output:1157624..1158895
>
> But I get: join:[1157624..1158420,1158420..1158894]
>
> I think the code should have the extra line: parent = union;
> just after c=p; statement line 18 (See source code below),
> otherwise it doesn't take into account the newly generated location.
>
> Is that right ?
>
> Thanks
> Arnaud
>
> Source code:
>
>  1 public static Collection merge(Collection members) {
>  2     // flatten them out first so we don't end up recursing
>  3     List membersList = new ArrayList(flatten(members));
>  4     // all members are now singles so we can use single vs single union operations
>  5     if (membersList.size()>1) {
>  6         for (int p = 0; p < (membersList.size()-1); p++) {
>  7             RichLocation parent = (RichLocation)membersList.get(p);
>  8             for (int c = p+1; c < membersList.size(); c++) {
>  9                 RichLocation child = (RichLocation)membersList.get(c);
> 10                 RichLocation union = (RichLocation)parent.union(child);
> 11                 // if parent can merge with child
> 12                 if (union.isContiguous()) {
> 13                     // replace parent with union
> 14                     membersList.set(p,union);
> 15                     // remove child
> 16                     membersList.remove(c);
> 17                     // check all children again
> 18                     c=p;
> 19                 }
> 20             }
> 21         }
> 22     }
> 23     return membersList;
> 24 }
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/

From markjschreiber at gmail.com Tue Mar 4 21:16:13 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 5 Mar 2008 10:16:13 +0800
Subject: [Biojava-l] biojava group on linkedin
Message-ID: <93b45ca50803041816s28728b2eme99edcda863d252d@mail.gmail.com>

Hi -

BioJava is now a group on the networking site, linkedin
(www.linkedin.com). If you have been involved in biojava in some way
and want to join you can find it on the list of available groups when
editing your profile.

Best regards,

- Mark

From markjschreiber at gmail.com Tue Mar 4 21:20:52 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 5 Mar 2008 10:20:52 +0800
Subject: [Biojava-l] RichLocation.Tools.merge(Collection members) method
In-Reply-To: <47CBB9D6.8090705@ebi.ac.uk>
References: <47C839BC.8030506@ebi.ac.uk> <47CBB9D6.8090705@ebi.ac.uk>
Message-ID: <93b45ca50803041820i38b06a68m9308e00eefb2824c@mail.gmail.com>

Just to follow up...

Has there been a fix checked in for this?

- Mark
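
The one-line change proposed in this thread (adding parent = union; after c=p;) can be checked with a small stand-alone sketch. This is not the BioJava source: the Range class below is a hypothetical stand-in for a simple RichLocation, and its union() returns null where the real union would yield a non-contiguous location. Only the loop structure is taken from the code quoted in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeSketch {

    /** Hypothetical stand-in for a single-span location (not a BioJava class). */
    static final class Range {
        final int min;
        final int max;

        Range(int min, int max) {
            this.min = min;
            this.max = max;
        }

        /** Returns the merged range, or null when the two share no position. */
        Range union(Range other) {
            if (min > other.max || other.min > max) {
                return null;
            }
            return new Range(Math.min(min, other.min), Math.max(max, other.max));
        }

        @Override
        public String toString() {
            return min + ".." + max;
        }
    }

    /** Same loop as the quoted merge(); applyFix toggles the proposed extra line. */
    static List<Range> merge(List<Range> members, boolean applyFix) {
        List<Range> list = new ArrayList<>(members);
        for (int p = 0; p < list.size() - 1; p++) {
            Range parent = list.get(p);
            for (int c = p + 1; c < list.size(); c++) {
                Range union = parent.union(list.get(c));
                if (union != null) {
                    list.set(p, union);   // replace parent with union
                    list.remove(c);       // remove child (index-based remove)
                    c = p;                // loop increment restarts at p + 1
                    if (applyFix) {
                        parent = union;   // the proposed fix: keep comparing against the merged range
                    }
                }
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Range> input = Arrays.asList(
                new Range(1157624, 1158025),
                new Range(1158025, 1158420),
                new Range(1158420, 1158893));
        System.out.println("without fix: " + merge(input, false));
        System.out.println("with fix:    " + merge(input, true));
    }
}
```

Without the extra line the third range is compared against the stale first range and is left unmerged; with it, the three ranges from the report collapse into a single range.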

From holland at ebi.ac.uk Wed Mar 5 03:35:24 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Wed, 05 Mar 2008 08:35:24 +0000
Subject: [Biojava-l] RichLocation.Tools.merge(Collection members) method
In-Reply-To: <93b45ca50803041820i38b06a68m9308e00eefb2824c@mail.gmail.com>
References: <47C839BC.8030506@ebi.ac.uk> <47CBB9D6.8090705@ebi.ac.uk> <93b45ca50803041820i38b06a68m9308e00eefb2824c@mail.gmail.com>
Message-ID: <47CE5B4C.3080506@ebi.ac.uk>

Not that I'm aware of.

From markjschreiber at gmail.com Thu Mar 6 01:24:32 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 6 Mar 2008 14:24:32 +0800
Subject: [Biojava-l] proposal to drop BioSQL Singapore version support from BJ version 1.6 onwards
Message-ID: <93b45ca50803052224y5434d6d9t35fa180f33db35ca@mail.gmail.com>

Hi all -

With the formal release of BioSQL v1.0 (Tokyo) and the new Hibernate
ORM mapping to BioSQL offered in BioJava 1.5 (BioJavaX), the older JDBC
mappings to BioSQL (Singapore) are now outdated. We have not been
supporting these for some time.

Unless people are still actively using these mappings I would propose
that we drop them from the upcoming BioJava 1.6. This would remove
some cruft from the code base and would also mean we can drop about 4
jar files from the lib (commons-pool etc).

Are there any strong arguments for or against?

- Mark

From ayates at ebi.ac.uk Thu Mar 6 04:10:25 2008
From: ayates at ebi.ac.uk (Andy Yates)
Date: Thu, 6 Mar 2008 09:10:25 +0000
Subject: [Biojava-l] [Biojava-dev] proposal to drop BioSQL Singapore version support from BJ version 1.6 onwards
In-Reply-To: <93b45ca50803052224y5434d6d9t35fa180f33db35ca@mail.gmail.com>
References: <93b45ca50803052224y5434d6d9t35fa180f33db35ca@mail.gmail.com>
Message-ID:

+1

Maintaining two APIs is always a bad idea.

From awollhe at gwdg.de Thu Mar 6 07:41:59 2008
From: awollhe at gwdg.de (Antje Wollherr)
Date: Thu, 06 Mar 2008 13:41:59 +0100
Subject: [Biojava-l] Problem with EMBL and RichStreamReader
Message-ID: <1204807319.23262.11.camel@antje-desktop>

Hello biojava people,

I am new to this list and also not very familiar with BioJava. I was
trying to parse an EMBL file and extract the DNA sequence with
RichSequence.IOTools.readEMBLDNA(br, ns);

For most of the EMBL files this isn't a problem, but the file with
accession number AL009126 (Bacillus subtilis 168) caused a BioException
saying the sequence could not be read.

Can somebody tell me where the problem is or how it can be solved?
I'm using BioJava 1.5.

Thank you a lot

Antje

Here is the error message:

Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a
bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.EMBLFormat
Accession=AL009126
Id=not set
Comments=Unable to handle contig assemblies just yet
Parse_block=
Stack trace follows ....

	at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:730)
	at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
	at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
	... 2 more

From holland at ebi.ac.uk Thu Mar 6 08:29:30 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 06 Mar 2008 13:29:30 +0000
Subject: [Biojava-l] Problem with EMBL and RichStreamReader
In-Reply-To: <1204807319.23262.11.camel@antje-desktop>
References: <1204807319.23262.11.camel@antje-desktop>
Message-ID: <47CFF1BA.7060003@ebi.ac.uk>

The error message says it all:

Comments=Unable to handle contig assemblies just yet

BioJava does not yet support EMBL contig files such as AL009126.

cheers,
Richard

From awollhe at gwdg.de Thu Mar 6 08:57:39 2008
From: awollhe at gwdg.de (Antje Wollherr)
Date: Thu, 06 Mar 2008 14:57:39 +0100
Subject: [Biojava-l] Problem with EMBL and RichStreamReader
In-Reply-To: <47CFF1BA.7060003@ebi.ac.uk>
References: <1204807319.23262.11.camel@antje-desktop> <47CFF1BA.7060003@ebi.ac.uk>
Message-ID: <1204811859.24352.16.camel@antje-desktop>

Hallo Richard,

thank you for the fast response. Now I understand what the error
message means. Sorry for asking stupid questions, but sometimes I don't
see the wood for the trees. ;)

Cheers,
Antje

--
Antje Wollherr, Diplom-Bioinformatikerin
Göttinger Genomlabor
Institut für Mikrobiologie und Genetik
Grisebachstraße 8
37077 Göttingen
Email: awollhe at gwdg.de
Tel.: 0551 393843
Fax: 0551 394195

From alex.johansson1 at gmail.com Fri Mar 7 15:22:45 2008
From: alex.johansson1 at gmail.com (alex johansson)
Date: Fri, 7 Mar 2008 21:22:45 +0100
Subject: [Biojava-l] Dp newbie question!!
Message-ID: <33380be30803071222l13357a33l4bf2c9e009d3d7fc@mail.gmail.com>

Hi,

I am a cell biology student with a growing interest in BioJava. I have
only very basic BioJava experience, but the cookbook examples make it
easy to get around the API. My question is very basic and might sound
very stupid: I followed the cookbook example on creating a HMMER-like
profile HMM, made a profile with a set of 12 training sequences (19 bp),
and tested it with a test sequence in which the motif occurs twice.
Below is the output from the program:

Log Odds = 43.786769243019506
m-1 m-2 m-3 m-4 m-5 m-6 d-7 m-8 m-9 m-10 m-11 m-12 m-13 m-14 m-15 m-16 m-17
d-18 i-18 d-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19
i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 m-20 i-20

My question is how to interpret these results. How do I know whether the
motif occurs twice, and where it is located in the test sequence? I know
that 'm' stands for match and 'i' and 'd' stand for insert and delete
transitions in the path from start to end.

I'd certainly appreciate it if the BioJava gurus out there could spend
some of their valuable time explaining this.

Thank you for your time,

Cheers,
Alex J

From markjschreiber at gmail.com Sat Mar 8 09:41:44 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 8 Mar 2008 22:41:44 +0800
Subject: [Biojava-l] Dp newbie question!!
In-Reply-To: <33380be30803071222l13357a33l4bf2c9e009d3d7fc@mail.gmail.com>
References: <33380be30803071222l13357a33l4bf2c9e009d3d7fc@mail.gmail.com>
Message-ID: <93b45ca50803080641v51a8111egbd3a8c34a8941c86@mail.gmail.com>

Hi Alex -

Good to know that the cookbook is helpful in getting you started.

You are correct about the state path of matches and deletes etc. The
limitation of the model you probably used is that it doesn't loop back
on itself and can by definition find only one match to a repeated
motif. There are two ways to deal with this.
One would be to wire up the model (set the transition alphabets and
probs) so that the model can repeat (or at least repeat the motif
part). The other would be to apply the model to a sliding window. The
second approach requires less understanding of the DP package but is
much less efficient, and if you are interested in interpreting the
forwards and backwards probs it would be a bit hard to correct for the
sliding window.

Hope this helps.

- Mark
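
The sliding-window approach described above can be sketched without any BioJava classes. In this sketch the scorer argument is a placeholder for whatever scores one window with the profile HMM (for example a log-odds score); matchFraction is a toy scorer invented purely for the demonstration, and the threshold is an assumption that would need tuning for a real model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class SlidingWindowScan {

    /**
     * Scores every window of the given length and returns the 1-based start
     * positions whose score reaches the threshold.
     */
    static List<Integer> scan(String seq, int window,
                              ToDoubleFunction<String> scorer, double threshold) {
        List<Integer> hits = new ArrayList<>();
        for (int start = 0; start + window <= seq.length(); start++) {
            String sub = seq.substring(start, start + window);
            if (scorer.applyAsDouble(sub) >= threshold) {
                hits.add(start + 1);
            }
        }
        return hits;
    }

    /** Toy scorer (an assumption for this demo): fraction of positions matching a consensus. */
    static double matchFraction(String window, String consensus) {
        int same = 0;
        for (int i = 0; i < consensus.length(); i++) {
            if (window.charAt(i) == consensus.charAt(i)) {
                same++;
            }
        }
        return same / (double) consensus.length();
    }

    public static void main(String[] args) {
        String motif = "ACGTAC";
        String seq = "TTT" + motif + "GGGGG" + motif + "TT";
        List<Integer> hits = scan(seq, motif.length(),
                w -> matchFraction(w, motif), 1.0);
        System.out.println(hits); // [4, 15]
    }
}
```

With a real model, neighbouring windows around a true hit may also pass the threshold, so overlapping hits would normally be collapsed to the best-scoring one.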

From pwrose at ucsd.edu Tue Mar 11 18:34:33 2008
From: pwrose at ucsd.edu (Peter Rose)
Date: Tue, 11 Mar 2008 15:34:33 -0700
Subject: [Biojava-l] [Job] Scientific Software Developer - RCSB PDB La Jolla, CA
Message-ID: <006701c883c8$14b0fb60$3e12f220$@edu>

The RCSB Protein Data Bank at the University of California San Diego has an
open position for a junior Scientific Software Developer:

http://joblink.ucsd.edu/bulletin/job.html?cat=new&job_id=45537

Please contact Dr. Peter Rose at pwrose at sdsc.edu or Dr. Phil Bourne at
bourne at sdsc.edu about this position.

From pwrose at ucsd.edu Wed Mar 12 20:21:48 2008
From: pwrose at ucsd.edu (Peter Rose)
Date: Wed, 12 Mar 2008 17:21:48 -0700
Subject: [Biojava-l] [Job] Lead Web Architect - RCSB PDB at UCSD, La Jolla, CA
Message-ID: <001901c884a0$3a4826e0$aed874a0$@edu>

The RCSB Protein Data Bank has an exciting position for a Lead Web Architect
to help shape the future presentation layer of the website. A detailed
description of the job is available:

http://joblink.ucsd.edu/bulletin/job.html?cat=new&job_id=44789

Please contact Dr. Peter Rose at pwrose at sdsc.edu or Dr. Phil Bourne at
bourne at sdsc.edu about this position.

From adf at ncgr.org Thu Mar 13 15:06:40 2008
From: adf at ncgr.org (Andrew Farmer)
Date: Thu, 13 Mar 2008 13:06:40 -0600
Subject: [Biojava-l] using phred calls with ChromatogramGraphic produced from ab1 files
Message-ID: <47D97B40.2040100@ncgr.org>

Hi all-
I have been trying to use the ChromatogramGraphic class to display ABI
chromatogram data, whilst relating this to alignments of sequences
called from these trace files with phred. For example, if the user
clicks a putatively polymorphic base in an alignment viewer, I would
like to scroll to and highlight the region of the ChromatogramGraphic
corresponding to the base call. However, I seem to be having some
difficulty in establishing correspondences between the phred base calls
and the information shown in the graphic.

As far as I understand what is being displayed by ChromatogramGraphic,
it is drawing "callboxes" around peaks corresponding to calls that are
stored by Chromatogram, which in turn is storing information about the
base calls that was encoded in the ABI file. These tend to differ
substantially (e.g. in lower-quality areas) from the calls made by
phred; e.g. an untrimmed phred-called sequence might have 1300 bases
to the abi-called version's 900 bases. So, I am trying to find some
way to get the ChromatogramGraphic callboxes to reflect the calls made
by phred.

Has anyone else encountered this type of situation before? It appears
that phred's phd output encodes a trace offset for each of its calls,
so I would guess that one could conceivably overlay the phred calls
into a chromatogram produced by the abi parser in order to get the
callboxes to reflect phred's interpretation of the trace data.

I could be way off-base (no pun intended) in my interpretation, and
would appreciate any insights from the gurus out there. And if this is
more or less correct, and there is not yet a canned solution, any
advice on how to go about coding it in a way that could be contributed
back to the project would be great.

Thanks in advance
--

Andrew Farmer
adf at ncgr.org
(505) 995-4464
Database Administrator/Software Developer
National Center for Genome Resources

---
"To live in the presence of great truths and eternal laws,
to be led by permanent ideals-
that is what keeps a man patient when the world ignores him,
and calm and unspoiled when the world praises him."
-Balzac
---

From adf at ncgr.org Thu Mar 13 19:27:46 2008
From: adf at ncgr.org (Andrew Farmer)
Date: Thu, 13 Mar 2008 17:27:46 -0600
Subject: [Biojava-l] jRe: using phred calls with ChromatogramGraphic produced from ab1 files
In-Reply-To:
References: <47D97B40.2040100@ncgr.org>
Message-ID: <47D9B872.4040308@ncgr.org>

Eric-
thanks very much for your insights, the phred -c trick might in fact be
exactly what I need. If not, I will check out the other code you have
sent and follow up off-list if I have further questions.

Andrew

Eric Haugen wrote:
>
> Hi Andrew,
>
> It looks like what I did three years ago was turn off the
> ChromatogramGraphic's call boxes:
>
> graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_SEPARATORS, Boolean.FALSE);
> graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_A, Boolean.FALSE);
> graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_C, Boolean.FALSE);
> graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_G, Boolean.FALSE);
> graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_T, Boolean.FALSE);
> graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_OTHER, Boolean.FALSE);
>
> then set the range based on the chromat positions in the phred file:
>
> graphic.setOption(ChromatogramGraphic.Option.FROM_TRACE_SAMPLE, startIndex);
> graphic.setOption(ChromatogramGraphic.Option.TO_TRACE_SAMPLE, endIndex);
>
> and finally just draw Phred's base calls myself based on the chromat
> position scale.
>
> I don't know if you'll find it useful, but I've attached
> PhdSequence.java and support code which I use as an alternative to
> PhredSequence currently in biojava, which I think ignores chromat
> positions.
>
> But the easiest solution may be to have "phred -c" convert your ABI
> chromats to SCF files containing the phred base calls.
>
> --
> Eric Haugen
> Software Engineer
> University of Washington Genome Center
> ehaugen at u.washington.edu
> (206) 616-7582

From markjschreiber at gmail.com Fri Mar 14 08:48:59 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 14 Mar 2008 20:48:59 +0800
Subject: [Biojava-l] jRe: using phred calls with ChromatogramGraphic produced from ab1 files
In-Reply-To: <47D9B872.4040308@ncgr.org>
References: <47D97B40.2040100@ncgr.org> <47D9B872.4040308@ncgr.org>
Message-ID: <93b45ca50803140548y1fc37831m1f071c7bc4f95370@mail.gmail.com>

Some code examples like this would be great on the biojava.org
cookbook. If people could add their GUI code that would be excellent.

- Mark
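
As a starting point for the approach discussed in this thread (using the chromat positions from phred's phd output), here is a rough stand-alone sketch. It assumes the usual phd layout in which each line between BEGIN_DNA and END_DNA holds a base, a quality value and a trace position. PhdCalls, Call and traceWindow are hypothetical names made up for this illustration and are not part of BioJava or of the PhdSequence.java code mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhdCalls {

    /** One phred base call: base, quality and trace sample position (hypothetical helper type). */
    static final class Call {
        final char base;
        final int quality;
        final int tracePos;

        Call(char base, int quality, int tracePos) {
            this.base = base;
            this.quality = quality;
            this.tracePos = tracePos;
        }
    }

    /** Reads the "base quality position" lines between BEGIN_DNA and END_DNA. */
    static List<Call> parse(List<String> lines) {
        List<Call> calls = new ArrayList<>();
        boolean inDna = false;
        for (String line : lines) {
            String t = line.trim();
            if (t.equals("BEGIN_DNA")) {
                inDna = true;
                continue;
            }
            if (t.equals("END_DNA")) {
                break;
            }
            if (!inDna || t.isEmpty()) {
                continue;
            }
            String[] f = t.split("\\s+");
            calls.add(new Call(f[0].charAt(0),
                    Integer.parseInt(f[1]), Integer.parseInt(f[2])));
        }
        return calls;
    }

    /**
     * Returns {fromSample, toSample} covering the call at the given 0-based
     * index plus up to 'flank' calls on each side.
     */
    static int[] traceWindow(List<Call> calls, int index, int flank) {
        int lo = Math.max(0, index - flank);
        int hi = Math.min(calls.size() - 1, index + flank);
        return new int[] { calls.get(lo).tracePos, calls.get(hi).tracePos };
    }

    public static void main(String[] args) {
        List<String> phd = Arrays.asList(
                "BEGIN_SEQUENCE example",
                "BEGIN_DNA",
                "a 9 6",
                "c 12 18",
                "g 30 31",
                "t 35 43",
                "END_DNA",
                "END_SEQUENCE");
        List<Call> calls = parse(phd);
        int[] w = traceWindow(calls, 2, 1);
        System.out.println(w[0] + " to " + w[1]); // 18 to 43
    }
}
```

The two values returned by traceWindow are the kind of start and end indices that could be passed to the FROM_TRACE_SAMPLE and TO_TRACE_SAMPLE options shown in Eric's message; drawing the phred calls on top would still need custom painting code.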
> > > Andrew > > > Eric Haugen wrote: > > > > Hi Andrew, > > > > It looks like what I did three years ago was turn off the > > ChromatogramGraphic's call boxes: > > > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_SEPARATORS, > > Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_A, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_C, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_G, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_T, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_OTHER, > > Boolean.FALSE ); > > > > then set the range based on the chromat positions in the phred file: > > > > graphic.setOption(ChromatogramGraphic.Option.FROM_TRACE_SAMPLE, > > startIndex ); > > graphic.setOption(ChromatogramGraphic.Option.TO_TRACE_SAMPLE, endIndex ); > > > > and finally just draw Phred's base calls myself based on the chromat > > position scale. > > > > I don't know if you'll find it useful, but I've attached > > PhdSequence.java and support code which I use as an alternative to > > PhredSequence currently in biojava, which I think ignores chromat > > positions. > > > > But the easiest solution may be to have "phred -c" convert your ABI > > chromats to SCF files containing the phred base calls. > > > > -- > > Eric Haugen > > Software Engineer > > University of Washington Genome Center > > ehaugen at u.washington.edu > > (206) 616-7582 > > > > On Thu, 13 Mar 2008, Andrew Farmer wrote: > > > >> Hi all- > >> I have been trying to use the ChromatogramGraphic class to display ABI > >> chromatogram data, whilst relating this to alignments of sequences > >> called from these trace files with phred. For example, if the user > >> clicks a putatively polymorphic base in an alignment viewer, to scroll > >> and highlight the region of the ChromatogramGraphic corresponding to > >> the ase call. 
However, I seem to be having some difficulty in > >> establishing correspondences between the phred base calls and the > >> information shown in the graphic. > >> > >> As far as I understand what is being displayed by ChromatogramGraphic, > >> it is drawing "callboxes" around peaks corresponding to calls that are > >> stored by Chromatogram, which in turn is storing information about the > >> base calls that was encoded in the ABI file. These tend to differ > >> substantially (e.g. in lower-quality areas) from the calls made by > >> phred- e.g. an untrimmed phred-called sequence might have 1300 bases > >> to the abi-called version's 900 bases. So, I am trying to find some > >> way to get the ChromatogramGraphic callboxes to reflect the calls made > >> by phred. > >> > >> Has anyone else encountered this type of situation before? It appears > >> that phred's phd output encodes a trace offset for each of its calls, > >> so I would guess that one could conceivably overlay the phred calls > >> into a chromatogram produced by the abi parser in order to get the > >> callboxes to reflect phred's interpretation of the trace data. > >> > >> I could be way off-base (no pun intended) in my interpretation, and > >> would appreciate any insights from the gurus out there. And if this is > >> more or less correct, and there is not yet a canned solution, any > >> advice on how to go about coding it in a way that could be contributed > >> back to the project would be great. > >> > >> Thanks in advance > >> -- > >> > >> Andrew Farmer > >> adf at ncgr.org > >> (505) 995-4464 > >> Database Administrator/Software Developer > >> National Center for Genome Resources > >> > >> --- > >> "To live in the presence of great truths and eternal laws, > >> to be led by permanent ideals- > >> that is what keeps a man patient when the world ignores him, > >> and calm and unspoiled when the world praises him." 
> >> -Balzac
> >> ---
> >> _______________________________________________
> >> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
>
> --
>
> Andrew Farmer
> adf at ncgr.org
> (505) 995-4464
> Database Administrator/Software Developer
> National Center for Genome Resources
>
> ---
> "To live in the presence of great truths and eternal laws,
> to be led by permanent ideals-
> that is what keeps a man patient when the world ignores him,
> and calm and unspoiled when the world praises him."
> -Balzac
> ---
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From yangkuan81 at gmail.com Sun Mar 23 02:42:14 2008
From: yangkuan81 at gmail.com (Kuan Yang)
Date: Sun, 23 Mar 2008 02:42:14 -0400
Subject: [Biojava-l] About ORFs
Message-ID:

Hi Guys,

I am new to biojava and don't know how to do a lot of things with it.
One of them is how to get ORF prediction with it. Any help will be
appreciated.
BTW, is there a detailed manual for it?

Thanks

Kuan

From yangkuan81 at gmail.com Sun Mar 23 04:17:52 2008
From: yangkuan81 at gmail.com (Kuan Yang)
Date: Sun, 23 Mar 2008 04:17:52 -0400
Subject: [Biojava-l] How to translate RNA into Proteins with different codon tables?
Message-ID:

Hi all,

Sorry, another question: how do I translate an RNA sequence into protein
with a different codon table? All I can find is the "translate" method
that uses the standard codon table (1).

Thank you so much in advance!

Kuan

From markjschreiber at gmail.com Sun Mar 23 08:46:44 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sun, 23 Mar 2008 20:46:44 +0800
Subject: [Biojava-l] About ORFs
In-Reply-To:
References:
Message-ID: <93b45ca50803230546w5a8d61b3y7f184d95f0e0c45b@mail.gmail.com>

Hi -

For a 'manual' of biojava, take a look at the documentation section of
the biojava wiki (www.biojava.org).
Specifically, see the tutorial (http://biojava.org/wiki/BioJava:Tutorial)
and the cookbook (http://biojava.org/wiki/BioJava:CookBook).

There is more than one way to get an ORF prediction. One way would be to
do a six-frame translation
(http://biojava.org/wiki/BioJava:Cookbook:Translation:SixFrames). A more
sophisticated way would be to build a Hidden Markov Model that
distinguishes coding ORFs from non-coding ORFs. You could do this using
the DP package. There is no specific tutorial, but you can get some hints
from the Dynamic Programming part of the cookbook.

One of the interesting things about BioJava is that it is much more
low-level than something like EMBOSS. Rather than doing lots of things
itself, it enables you to write your own programs to do things.

Hope this helps. Welcome to BioJava.

- Mark

On Sun, Mar 23, 2008 at 2:42 PM, Kuan Yang wrote:

> Hi Guys,
>
> I am new to biojava and don't know how to do a lot of things with it.
> One of them is how to get ORF prediction with it. Any help will be
> appreciated.
> BTW, Is there a detailed manual for it?
>
> Thanks
>
> Kuan
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From martin.jones at ed.ac.uk Thu Mar 27 09:45:50 2008
From: martin.jones at ed.ac.uk (Martin Jones)
Date: Thu, 27 Mar 2008 13:45:50 +0000
Subject: [Biojava-l] how to get properties of a Feature from a GenBank file
Message-ID:

Hi,

I'm just getting started with BioJava so this may be a simple
question. I'm reading a RichSequence from a GenBank file and want to
get the gene name of each CDS feature. The following code gets hold
of the features I'm interested in

    for (Object o : mySeq.getFeatureSet()){
        RichFeature f = (RichFeature) o;
        if (f.getType().equals("CDS")){
            //get gene name here
        }
    }

but I'm not sure how best to get the gene name.
The following seems to work:

    for (Object o2 : f.getNoteSet()){
        Note n = (Note) o2;
        if (n.getTerm().getName().equals("gene")){
            System.out.println("gene name is " + n.getValue());
        }
    }

but seems overly verbose - ideally I'd like to be able to pass the
Feature to another part of my program, but writing the above whenever
I want to get the name seems like overkill. Is there a shorter way -
something along the lines of

    String name = f.getNoteByName("gene").getValue();

Thanks in advance for any help.

PS one more question - is there a reason why e.g. getNoteSet returns
a Set rather than a Set<Note>, which makes it necessary to do all
the type casts?

Thanks,

Martin

From holland at ebi.ac.uk Thu Mar 27 10:11:26 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Mar 2008 14:11:26 +0000
Subject: [Biojava-l] how to get properties of a Feature from a GenBank file
In-Reply-To:
References:
Message-ID: <47EBAB0E.3020305@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Your code for getting the Note named "gene" is correct. I agree, a
shorthand way of doing this would be lovely, but doesn't currently
exist. (Such a method would have to do the same thing internally
anyway). If you'd like to write one and add it in then you'd be most
welcome! :)

At the time that the BioJavaX extensions were written, compatibility with
Java 1.4 was still required, and so we could not make use of any new Java
features that were introduced in Java 1.5. Set<Note>, being an
example of Generics, is one of these.

Future versions of BioJava will require the user to install Java 1.6 or
later and so we will be able to use these newer features in both new and
existing code, depending on feasibility (for instance it is not always
possible to convert older code to use Generics in a sensible manner, and
it is not always possible to write Generics code that can interface
sensibly with older non-Generics modules).
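
A minimal sketch of the kind of shorthand discussed in this thread, written as a separate tools class so that no existing interface has to change. Everything here is an assumption for illustration: `SimpleNote` is a stand-in for the real BioJavaX `Note` (which exposes the name through `getTerm().getName()`), and `NoteTools.firstValueByTermName` is a hypothetical helper, not an existing BioJava method. As noted in the reply, it simply does the same loop internally.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a BioJavaX Note: just a term name and a value.
final class SimpleNote {
    private final String termName;
    private final String value;

    SimpleNote(String termName, String value) {
        this.termName = termName;
        this.value = value;
    }

    String getTermName() { return termName; }
    String getValue() { return value; }
}

// Hypothetical tools class holding the convenience lookup.
final class NoteTools {
    private NoteTools() {}

    // Returns the value of the first note whose term name matches,
    // or null when no such note is present.
    static String firstValueByTermName(Set<?> notes, String termName) {
        for (Object o : notes) {
            // The set is untyped (pre-generics style), so check before casting.
            if (o instanceof SimpleNote) {
                SimpleNote n = (SimpleNote) o;
                if (termName.equals(n.getTermName())) {
                    return n.getValue();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Set<Object> notes = new LinkedHashSet<Object>();
        notes.add(new SimpleNote("product", "hypothetical protein"));
        notes.add(new SimpleNote("gene", "abcD"));
        System.out.println("gene name is " + firstValueByTermName(notes, "gene"));
    }
}
```

Returning null for a missing note keeps the sketch short; a caller that chains `.getValue()`-style calls, as in the wished-for `getNoteByName`, would need to handle that case.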
cheers, Richard Martin Jones wrote: > Hi, > > I'm just getting started with BioJava so this may be a simple > question. I'm reading a RichSequence from a GenBank file and want to > get the gene name of each CDS feature. The following code gets hold > of the features I'm interested in > > for (Object o : mySeq.getFeatureSet()){ > RichFeature f = (RichFeature) o; > if (f.getType().equals("CDS")){ > //get gene name here > } > } > > but I'm not sure how best to get the gene name. The following seems to work: > > for (Object o2 : f.getNoteSet()){ > Note n = (Note) o2; > if (n.getTerm().getName().equals("gene")){ > System.out.println("gene name is " + n.getValue()); > } > } > > but seems overly verbose - ideally I'd like to be able to pass the > Feature to another part of my program, but writing the above whenever > I want to get the name seems like overkill. Is there a shorter way - > something along the lines of > > String name = f.getNoteByName("gene").getValue(); > > Thanks in advance for any help. > > PS one more question - is there a reason why e.g. getNoteSet returns > a Set rather than a Set, which makes it necessary to do all > the type casts? > > Thanks, > > Martin > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. 
+44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH66sO4C5LeMEKA/QRAmq9AJ4qyMw4eGVYIMZjVf5jcADVRQmzpQCeOXej mak90aLUhSF60DrWeRtM8o0= =0EOE -----END PGP SIGNATURE----- From su24 at st-andrews.ac.uk Thu Mar 27 13:46:38 2008 From: su24 at st-andrews.ac.uk (Saif Ur-Rehman) Date: Thu, 27 Mar 2008 17:46:38 +0000 Subject: [Biojava-l] Problem Message-ID: <1206639998.47ebdd7e5f32c@webmail.st-andrews.ac.uk> Dear All, I am attempting to split up a Fasta file of an entire genomes amino acid sequences into separate files for each individual gene. I am simply reading the Fasta file of the entire genome as a Sequence Db and then iterating around it creating a new file for each Sequence and writing it out to that file. However out of a Fasta file containing 13465 genes only 12945 are written out to their own individual files. This does not occur in files which lack the termination symbol i.e do not use the Alphabet ("PROTEIN_TERM"). I was wondering if you could suggest any reason why this might occur as I am completely mystified. Thanking you in advance, Saif ------------------------------------------------------------------------------- Saif Ur-Rehman Research Student The Centre for Evolution, Genes & Genomics (CEGG) Dyers Brae School of Biology The University of St Andrews St Andrews, Fife Scotland,UK ------------------------------------------------------------------ University of St Andrews Webmail: https://webmail.st-andrews.ac.uk From su24 at st-andrews.ac.uk Thu Mar 27 13:46:40 2008 From: su24 at st-andrews.ac.uk (Saif Ur-Rehman) Date: Thu, 27 Mar 2008 17:46:40 +0000 Subject: [Biojava-l] Problem Message-ID: <1206640000.47ebdd802a753@webmail.st-andrews.ac.uk> Dear All, I am attempting to split up a Fasta file of an entire genomes amino acid sequences into separate files for each individual gene. 
I am simply reading the Fasta file of the entire genome as a Sequence Db and then iterating around it creating a new file for each Sequence and writing it out to that file. However out of a Fasta file containing 13465 genes only 12945 are written out to their own individual files. This does not occur in files which lack the termination symbol i.e do not use the Alphabet ("PROTEIN_TERM"). I was wondering if you could suggest any reason why this might occur as I am completely mystified. Thanking you in advance, Saif ------------------------------------------------------------------------------- Saif Ur-Rehman Research Student The Centre for Evolution, Genes & Genomics (CEGG) Dyers Brae School of Biology The University of St Andrews St Andrews, Fife Scotland,UK ------------------------------------------------------------------ University of St Andrews Webmail: https://webmail.st-andrews.ac.uk From holland at ebi.ac.uk Thu Mar 27 14:16:41 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Thu, 27 Mar 2008 18:16:41 -0000 (GMT) Subject: [Biojava-l] Problem In-Reply-To: <1206640000.47ebdd802a753@webmail.st-andrews.ac.uk> References: <1206640000.47ebdd802a753@webmail.st-andrews.ac.uk> Message-ID: <42786.80.42.15.154.1206641801.squirrel@webmail.ebi.ac.uk> I'm afraid we can't be much help unless we can see the actual code you have written to do this job! cheers, Richard On Thu, March 27, 2008 5:46 pm, Saif Ur-Rehman wrote: > > Dear All, > > I am attempting to split up a Fasta file of an entire genomes amino acid > sequences into separate files for each individual gene. I am simply > reading the > Fasta file of the entire genome as a Sequence Db and then iterating around > it > creating a new file for each Sequence and writing it out to that file. > However > out of a Fasta file containing 13465 genes only 12945 are written out to > their > own individual files. 
This does not occur in files which lack the > termination > symbol i.e do not use the Alphabet ("PROTEIN_TERM"). I was wondering if > you > could suggest any reason why this might occur as I am completely > mystified. > > Thanking you in advance, > > Saif > > > ------------------------------------------------------------------------------- > Saif Ur-Rehman > Research Student > The Centre for Evolution, Genes & Genomics (CEGG) > Dyers Brae > School of Biology > The University of St Andrews > St Andrews, > Fife > Scotland,UK > > ------------------------------------------------------------------ > University of St Andrews Webmail: https://webmail.st-andrews.ac.uk > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland BioMart (http://www.biomart.org/) EMBL-EBI Hinxton, Cambridgeshire CB10 1SD, UK From markjschreiber at gmail.com Thu Mar 27 21:55:38 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Fri, 28 Mar 2008 09:55:38 +0800 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file In-Reply-To: <47EBAB0E.3020305@ebi.ac.uk> References: <47EBAB0E.3020305@ebi.ac.uk> Message-ID: <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> These are good points. Can we generify interfaces without breaking them? Certainly the addition of generics to the biojavax packages would remove a lot of nasty casting. Adding convenience methods would certainly also help biojavax code does get a bit verbose. Again this would break interfaces unless we put them in some kind of tools class. These are certainly good points to remember for future designs (eg biojava2) usability should be a test criteria as well. BioJavaX gives excellent ORM with BioSQL and great capture of detail in it's parsers but the coding style is a bit unwieldy. 
2 out of 3 is not bad though : ) - Mark On Thu, Mar 27, 2008 at 10:11 PM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > > Your code for getting the Note named "gene" is correct. I agree, a > shorthand way of doing this would be lovely, but doesn't currently > exist. (Such a method would have to do the same thing internally > anyway). If you'd like to write one and add it in then you'd be most > welcome! :) > > At the time that the BioJavaX extensions were written compatibility with > Java 1.4 was still required and so we could not make use of any new Java > features that were introduced in Java 1.5. Set, being an > example of Generics, is one of these. > > Future versions of BioJava will require the user to install Java 1.6 or > later and so we will be able to use these newer features in both new and > existing code, depending on feasibility (for instance it is not always > possible to convert older code to use Generics in a sensible manner, and > it is not always possible to write Generics code that can interface > sensibly with older non-Generics modules). > > cheers, > Richard > > > > Martin Jones wrote: > > Hi, > > > > I'm just getting started with BioJava so this may be a simple > > question. I'm reading a RichSequence from a GenBank file and want to > > get the gene name of each CDS feature. The following code gets hold > > of the features I'm interested in > > > > for (Object o : mySeq.getFeatureSet()){ > > RichFeature f = (RichFeature) o; > > if (f.getType().equals("CDS")){ > > //get gene name here > > } > > } > > > > but I'm not sure how best to get the gene name. 
The following seems to > work: > > > > for (Object o2 : f.getNoteSet()){ > > Note n = (Note) o2; > > if (n.getTerm().getName().equals("gene")){ > > System.out.println("gene name is " + n.getValue()); > > } > > } > > > > but seems overly verbose - ideally I'd like to be able to pass the > > Feature to another part of my program, but writing the above whenever > > I want to get the name seems like overkill. Is there a shorter way - > > something along the lines of > > > > String name = f.getNoteByName("gene").getValue(); > > > > Thanks in advance for any help. > > > > PS one more question - is there a reason why e.g. getNoteSet returns > > a Set rather than a Set, which makes it necessary to do all > > the type casts? > > > > Thanks, > > > > Martin > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFH66sO4C5LeMEKA/QRAmq9AJ4qyMw4eGVYIMZjVf5jcADVRQmzpQCeOXej > mak90aLUhSF60DrWeRtM8o0= > =0EOE > -----END PGP SIGNATURE----- > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From ayates at ebi.ac.uk Fri Mar 28 05:17:54 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Fri, 28 Mar 2008 09:17:54 +0000 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file In-Reply-To: <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> References: <47EBAB0E.3020305@ebi.ac.uk> <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> Message-ID: <47ECB7C2.3070800@ebi.ac.uk> As with most things the answer is a yes & no. It shouldn't break interfaces as in the contract setup between the interface & a consumer since generics are erased after compilation. However if we're looking at class incompatibility errors then it's quite likely that the new interface will have a different signature to the original one & may cause runtime errors in classes which haven't been recompiled against the new interface. A solution in some other projects is to have to sets of interfaces; one with generics & one without but that can be quite a nightmare to maintain. Andy Mark Schreiber wrote: > These are good points. Can we generify interfaces without breaking them? > Certainly the addition of generics to the biojavax packages would remove a > lot of nasty casting. Adding convenience methods would certainly also help > biojavax code does get a bit verbose. Again this would break interfaces > unless we put them in some kind of tools class. 
> > These are certainly good points to remember for future designs (eg biojava2) > usability should be a test criteria as well. BioJavaX gives excellent ORM > with BioSQL and great capture of detail in it's parsers but the coding style > is a bit unwieldy. 2 out of 3 is not bad though : ) > > - Mark > > On Thu, Mar 27, 2008 at 10:11 PM, Richard Holland wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Hello, >> >> Your code for getting the Note named "gene" is correct. I agree, a >> shorthand way of doing this would be lovely, but doesn't currently >> exist. (Such a method would have to do the same thing internally >> anyway). If you'd like to write one and add it in then you'd be most >> welcome! :) >> >> At the time that the BioJavaX extensions were written compatibility with >> Java 1.4 was still required and so we could not make use of any new Java >> features that were introduced in Java 1.5. Set, being an >> example of Generics, is one of these. >> >> Future versions of BioJava will require the user to install Java 1.6 or >> later and so we will be able to use these newer features in both new and >> existing code, depending on feasibility (for instance it is not always >> possible to convert older code to use Generics in a sensible manner, and >> it is not always possible to write Generics code that can interface >> sensibly with older non-Generics modules). >> >> cheers, >> Richard >> >> >> >> Martin Jones wrote: >>> Hi, >>> >>> I'm just getting started with BioJava so this may be a simple >>> question. I'm reading a RichSequence from a GenBank file and want to >>> get the gene name of each CDS feature. The following code gets hold >>> of the features I'm interested in >>> >>> for (Object o : mySeq.getFeatureSet()){ >>> RichFeature f = (RichFeature) o; >>> if (f.getType().equals("CDS")){ >>> //get gene name here >>> } >>> } >>> >>> but I'm not sure how best to get the gene name. 
The following seems to >> work: >>> for (Object o2 : f.getNoteSet()){ >>> Note n = (Note) o2; >>> if (n.getTerm().getName().equals("gene")){ >>> System.out.println("gene name is " + n.getValue()); >>> } >>> } >>> >>> but seems overly verbose - ideally I'd like to be able to pass the >>> Feature to another part of my program, but writing the above whenever >>> I want to get the name seems like overkill. Is there a shorter way - >>> something along the lines of >>> >>> String name = f.getNoteByName("gene").getValue(); >>> >>> Thanks in advance for any help. >>> >>> PS one more question - is there a reason why e.g. getNoteSet returns >>> a Set rather than a Set, which makes it necessary to do all >>> the type casts? >>> >>> Thanks, >>> >>> Martin >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> - -- >> Richard Holland (BioMart) >> EMBL EBI, Wellcome Trust Genome Campus, >> Hinxton, Cambridgeshire CB10 1SD, UK >> Tel. 
+44 (0)1223 494416 >> >> http://www.biomart.org/ >> http://www.biojava.org/ >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.2.2 (GNU/Linux) >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >> iD8DBQFH66sO4C5LeMEKA/QRAmq9AJ4qyMw4eGVYIMZjVf5jcADVRQmzpQCeOXej >> mak90aLUhSF60DrWeRtM8o0= >> =0EOE >> -----END PGP SIGNATURE----- >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From ap3 at sanger.ac.uk Fri Mar 28 06:50:34 2008 From: ap3 at sanger.ac.uk (Andreas Prlic) Date: Fri, 28 Mar 2008 10:50:34 +0000 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file In-Reply-To: <47ECB7C2.3070800@ebi.ac.uk> References: <47EBAB0E.3020305@ebi.ac.uk> <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> <47ECB7C2.3070800@ebi.ac.uk> Message-ID: <7E037431-F0AB-48CD-A628-A2236E8BA95D@sanger.ac.uk> >> These are good points. Can we generify interfaces without >> breaking them? I don;t think that adding generics will break anything, e.g. old code: public interface MyTest { public Set getFeatures() } then some code that uses this: public void myFoo(){ MyTest test = new MyTestImpl(); Set features = test.getFeatures(); } this call will not break, even if we change the MyTest interface to: public Set getFeatures() MyTestImpl will get some warnings (in my eclipse), to ensure the type safety, but that is all. 
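
The point about old callers surviving a generified interface can be sketched as follows. This is only an illustration under stated assumptions: the names `MyTest` and `MyTestImpl` follow the example in this message, while the element type `String` and the `LegacyCaller` class are made up for the sketch and are not BioJava types.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical interface after generification; before the change
// the method returned a raw Set.
interface MyTest {
    Set<String> getFeatures();
}

// Hypothetical implementation updated to match the generified interface.
final class MyTestImpl implements MyTest {
    public Set<String> getFeatures() {
        Set<String> features = new HashSet<String>();
        features.add("CDS");
        features.add("gene");
        return features;
    }
}

final class LegacyCaller {
    private LegacyCaller() {}

    // Pre-generics style caller: it still compiles against the new
    // interface, with a raw-type warning at most.
    @SuppressWarnings("rawtypes")
    static int countFeatures(MyTest test) {
        Set features = test.getFeatures();
        int count = 0;
        for (Object o : features) {
            String type = (String) o; // the cast the old code always needed
            if (type.length() > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("features: " + countFeatures(new MyTestImpl()));
    }
}
```

Because the erased return type is still `Set`, the method descriptor in the compiled class does not change in this simple case, so callers compiled against the old interface should keep linking. Changes that alter the erased signature would not have that property.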
Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From holland at ebi.ac.uk Fri Mar 28 07:47:38 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Fri, 28 Mar 2008 11:47:38 +0000 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file In-Reply-To: <7E037431-F0AB-48CD-A628-A2236E8BA95D@sanger.ac.uk> References: <47EBAB0E.3020305@ebi.ac.uk> <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> <47ECB7C2.3070800@ebi.ac.uk> <7E037431-F0AB-48CD-A628-A2236E8BA95D@sanger.ac.uk> Message-ID: <47ECDADA.4050800@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I was more worried about breaking concepts than breaking code. For instance, the concept of a SymbolList can be almost completely replaced by the use of a standard genericised colleciton, e.g. List or List . Likewise, the concept of a SequenceIterator is really just an Iterator (or maybe Iterator ? ). cheers, Richard Andreas Prlic wrote: > >>> These are good points. Can we generify interfaces without breaking >>> them? > > > > I don;t think that adding generics will break anything, e.g. > > old code: > > public interface MyTest { > public Set getFeatures() > } > > then some code that uses this: > > public void myFoo(){ > > MyTest test = new MyTestImpl(); > > Set features = test.getFeatures(); > } > > this call will not break, even if we change the MyTest interface to: > > public Set getFeatures() > > MyTestImpl will get some warnings (in my eclipse), to ensure the type > safety, but that is all. 
> > Andreas > > > > > > ----------------------------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > +44 (0) 1223 49 6891 > > ----------------------------------------------------------------------- > > > > > --The Wellcome Trust Sanger Institute is operated by Genome > ResearchLimited, a charity registered in England with number 1021457 and > acompany registered in England with number 2742969, whose > registeredoffice is 215 Euston Road, London, NW1 > 2BE._______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. +44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH7Nra4C5LeMEKA/QRAjW7AJ9r9RNv4ZaiqB7NsL1yrEGG6TawBwCfahDq 3paiRHHEIiuFxaRCAXYTFsA= =vh0r -----END PGP SIGNATURE----- From philipheller at comcast.net Fri Mar 28 11:46:33 2008 From: philipheller at comcast.net (philipheller at comcast.net) Date: Fri, 28 Mar 2008 15:46:33 +0000 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file Message-ID: <032820081546.8495.47ED12D9000D53B70000212F22007507849D0A04040A089F070407089F@comcast.net> An embedded and charset-unspecified text was scrubbed... Name: not available URL: From markjschreiber at gmail.com Sat Mar 1 02:30:42 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 1 Mar 2008 10:30:42 +0800 Subject: [Biojava-l] RichLocation.Tools.merge(Collection members) method In-Reply-To: <47C839BC.8030506@ebi.ac.uk> References: <47C839BC.8030506@ebi.ac.uk> Message-ID: <93b45ca50802291830l58628d0bqbed7a1feb0e85e89@mail.gmail.com> Hi - This could be a corner case. 
Can you provide the code that actually generates this error? - Mark On Sat, Mar 1, 2008 at 12:58 AM, Arnaud Kerhornou wrote: > Hi everyone, > > I don't think the RichLocation.Tools.merge(Collection members) method is > doing it right. > > e.g. Input: > biojavax:join:[1157624..1158025,1158025..1158420,1158420..1158893] > Expected output:1157624..1158895 > > But I get: join:[1157624..1158420,1158420..1158894] > > I think the code should have the extra line: parent = union; > just after c=p; statement line 18 (See source code below), > otherwise it doesn't take into account the newly generated location. > > Is that right ? > > Thanks > Arnaud > > Source code: > > 1 public static Collection merge(Collection members) { > 2 // flatten them out first so we don't end up recursing > 3 List membersList = new ArrayList(flatten(members)); > 4 // all members are now singles so we can use single vs > single union operations > 5 if (membersList.size()>1) { > 6 for (int p = 0; p < (membersList.size()-1); p++) { > 7 RichLocation parent = (RichLocation)membersList.get(p); > 8 for (int c = p+1; c < membersList.size(); c++) { > 9 RichLocation child = > (RichLocation)membersList.get(c); > 10 RichLocation union = > (RichLocation)parent.union(child); > 11 // if parent can merge with child > 12 if (union.isContiguous()) { > 13 // replace parent with union > 14 membersList.set(p,union); > 15 // remove child > 16 membersList.remove(c); > 17 // check all children again > 18 c=p; > 19 } > 20 } > 21 } > 22 } > 23 return membersList; > 24 } > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From holland at ebi.ac.uk Mon Mar 3 08:41:58 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Mon, 03 Mar 2008 08:41:58 +0000 Subject: [Biojava-l] RichLocation.Tools.merge(Collection members) method In-Reply-To: <47C839BC.8030506@ebi.ac.uk> References: <47C839BC.8030506@ebi.ac.uk> 
Message-ID: <47CBB9D6.8090705@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Arnaud - you are correct. cheers, Richard Arnaud Kerhornou wrote: > Hi everyone, > > I don't think the RichLocation.Tools.merge(Collection members) method is > doing it right. > > e.g. Input: > biojavax:join:[1157624..1158025,1158025..1158420,1158420..1158893] > Expected output:1157624..1158895 > > But I get: join:[1157624..1158420,1158420..1158894] > > I think the code should have the extra line: parent = union; > just after c=p; statement line 18 (See source code below), > otherwise it doesn't take into account the newly generated location. > > Is that right ? > > Thanks > Arnaud > > Source code: > > 1 public static Collection merge(Collection members) { > 2 // flatten them out first so we don't end up recursing > 3 List membersList = new ArrayList(flatten(members)); > 4 // all members are now singles so we can use single vs > single union operations > 5 if (membersList.size()>1) { > 6 for (int p = 0; p < (membersList.size()-1); p++) { > 7 RichLocation parent = > (RichLocation)membersList.get(p); > 8 for (int c = p+1; c < membersList.size(); c++) { > 9 RichLocation child = > (RichLocation)membersList.get(c); > 10 RichLocation union = > (RichLocation)parent.union(child); > 11 // if parent can merge with child > 12 if (union.isContiguous()) { > 13 // replace parent with union > 14 membersList.set(p,union); > 15 // remove child > 16 membersList.remove(c); > 17 // check all children again > 18 c=p; > 19 } > 20 } > 21 } > 22 } > 23 return membersList; > 24 } > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. 
+44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHy7m04C5LeMEKA/QRAj05AJ9SBrv6yz8qvhwbmTrLZVfmwBuHTACgiq57 +J0EpviSyp2Qq00m4A8xLUA= =re75 -----END PGP SIGNATURE----- From markjschreiber at gmail.com Wed Mar 5 02:16:13 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 5 Mar 2008 10:16:13 +0800 Subject: [Biojava-l] biojava group on linkedin Message-ID: <93b45ca50803041816s28728b2eme99edcda863d252d@mail.gmail.com> Hi - BioJava is now a group on the networking site, linkedin (www.linkedin.com). If you have been involved in biojava in some way and want to join you can find it on the list of available groups when editing your profile. Best regards, - Mark From markjschreiber at gmail.com Wed Mar 5 02:20:52 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 5 Mar 2008 10:20:52 +0800 Subject: [Biojava-l] RichLocation.Tools.merge(Collection members) method In-Reply-To: <47CBB9D6.8090705@ebi.ac.uk> References: <47C839BC.8030506@ebi.ac.uk> <47CBB9D6.8090705@ebi.ac.uk> Message-ID: <93b45ca50803041820i38b06a68m9308e00eefb2824c@mail.gmail.com> Just to follow up... Has there been a fix checked in for this? - Mark On Mon, Mar 3, 2008 at 4:41 PM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Arnaud - you are correct. > > cheers, > Richard > > > > Arnaud Kerhornou wrote: > > Hi everyone, > > > > I don't think the RichLocation.Tools.merge(Collection members) method is > > doing it right. > > > > e.g. 
Input: > > biojavax:join:[1157624..1158025,1158025..1158420,1158420..1158893] > > Expected output:1157624..1158895 > > > > But I get: join:[1157624..1158420,1158420..1158894] > > > > I think the code should have the extra line: parent = union; > > just after c=p; statement line 18 (See source code below), > > otherwise it doesn't take into account the newly generated location. > > > > Is that right ? > > > > Thanks > > Arnaud > > > > Source code: > > > > 1 public static Collection merge(Collection members) { > > 2 // flatten them out first so we don't end up recursing > > 3 List membersList = new ArrayList(flatten(members)); > > 4 // all members are now singles so we can use single vs > > single union operations > > 5 if (membersList.size()>1) { > > 6 for (int p = 0; p < (membersList.size()-1); p++) { > > 7 RichLocation parent = > > (RichLocation)membersList.get(p); > > 8 for (int c = p+1; c < membersList.size(); c++) { > > 9 RichLocation child = > > (RichLocation)membersList.get(c); > > 10 RichLocation union = > > (RichLocation)parent.union(child); > > 11 // if parent can merge with child > > 12 if (union.isContiguous()) { > > 13 // replace parent with union > > 14 membersList.set(p,union); > > 15 // remove child > > 16 membersList.remove(c); > > 17 // check all children again > > 18 c=p; > > 19 } > > 20 } > > 21 } > > 22 } > > 23 return membersList; > > 24 } > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHy7m04C5LeMEKA/QRAj05AJ9SBrv6yz8qvhwbmTrLZVfmwBuHTACgiq57 > +J0EpviSyp2Qq00m4A8xLUA= > =re75 > -----END PGP SIGNATURE----- > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From holland at ebi.ac.uk Wed Mar 5 08:35:24 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Wed, 05 Mar 2008 08:35:24 +0000 Subject: [Biojava-l] RichLocation.Tools.merge(Collection members) method In-Reply-To: <93b45ca50803041820i38b06a68m9308e00eefb2824c@mail.gmail.com> References: <47C839BC.8030506@ebi.ac.uk> <47CBB9D6.8090705@ebi.ac.uk> <93b45ca50803041820i38b06a68m9308e00eefb2824c@mail.gmail.com> Message-ID: <47CE5B4C.3080506@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Not that I'm aware of. Mark Schreiber wrote: > Just to follow up... > > Has there been a fix checked in for this? > > - Mark > > On Mon, Mar 3, 2008 at 4:41 PM, Richard Holland wrote: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Arnaud - you are correct. >> >> cheers, >> Richard >> >> >> >> Arnaud Kerhornou wrote: >> > Hi everyone, >> > >> > I don't think the RichLocation.Tools.merge(Collection members) method is >> > doing it right. >> > >> > e.g. Input: >> > biojavax:join:[1157624..1158025,1158025..1158420,1158420..1158893] >> > Expected output:1157624..1158895 >> > >> > But I get: join:[1157624..1158420,1158420..1158894] >> > >> > I think the code should have the extra line: parent = union; >> > just after c=p; statement line 18 (See source code below), >> > otherwise it doesn't take into account the newly generated location. >> > >> > Is that right ? 
>> > >> > Thanks >> > Arnaud >> > >> > Source code: >> > >> > 1 public static Collection merge(Collection members) { >> > 2 // flatten them out first so we don't end up recursing >> > 3 List membersList = new ArrayList(flatten(members)); >> > 4 // all members are now singles so we can use single vs >> > single union operations >> > 5 if (membersList.size()>1) { >> > 6 for (int p = 0; p < (membersList.size()-1); p++) { >> > 7 RichLocation parent = >> > (RichLocation)membersList.get(p); >> > 8 for (int c = p+1; c < membersList.size(); c++) { >> > 9 RichLocation child = >> > (RichLocation)membersList.get(c); >> > 10 RichLocation union = >> > (RichLocation)parent.union(child); >> > 11 // if parent can merge with child >> > 12 if (union.isContiguous()) { >> > 13 // replace parent with union >> > 14 membersList.set(p,union); >> > 15 // remove child >> > 16 membersList.remove(c); >> > 17 // check all children again >> > 18 c=p; >> > 19 } >> > 20 } >> > 21 } >> > 22 } >> > 23 return membersList; >> > 24 } >> > _______________________________________________ >> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> > >> >> - -- >> Richard Holland (BioMart) >> EMBL EBI, Wellcome Trust Genome Campus, >> Hinxton, Cambridgeshire CB10 1SD, UK >> Tel. +44 (0)1223 494416 >> >> http://www.biomart.org/ >> http://www.biojava.org/ >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.2.2 (GNU/Linux) >> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org >> >> iD8DBQFHy7m04C5LeMEKA/QRAj05AJ9SBrv6yz8qvhwbmTrLZVfmwBuHTACgiq57 >> +J0EpviSyp2Qq00m4A8xLUA= >> =re75 >> -----END PGP SIGNATURE----- >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. 
+44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHzltM4C5LeMEKA/QRAnk9AKCP7/0jmWk7h7rGd4+jwPkmUK9qUgCfe9Oz j+UWAU+q9orPHtpWgg48N70= =lBUD -----END PGP SIGNATURE----- From markjschreiber at gmail.com Thu Mar 6 06:24:32 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 6 Mar 2008 14:24:32 +0800 Subject: [Biojava-l] proposal to drop BioSQL Singapore version support from BJ version 1.6 onwards Message-ID: <93b45ca50803052224y5434d6d9t35fa180f33db35ca@mail.gmail.com> Hi all - With the formal release of BioSQL v1.0 (Tokyo) and the new Hibernate ORM mapping to BioSQL offered in BioJava 1.5 (BioJavaX) the older JDBC mappings to the BioSQL (Singapore) are now outdated. We have not been supporting these for some time. Unless people are still actively using these mappings I would propose that we drop them from the upcoming BioJava 1.6. This would remove some cruft from the code base and would also mean we can drop about 4 jar files from the lib (commons-pool etc). Are there any strong arguments for or against? - Mark From ayates at ebi.ac.uk Thu Mar 6 09:10:25 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Thu, 6 Mar 2008 09:10:25 +0000 Subject: [Biojava-l] [Biojava-dev] proposal to drop BioSQL Singapore version support from BJ version 1.6 onwards In-Reply-To: <93b45ca50803052224y5434d6d9t35fa180f33db35ca@mail.gmail.com> References: <93b45ca50803052224y5434d6d9t35fa180f33db35ca@mail.gmail.com> Message-ID: +1 Maintaining two apis is always a bad idea On 6 Mar 2008, at 06:24, Mark Schreiber wrote: > Hi all - > > With the formal release of BioSQL v1.0 (Tokyo) and the new Hibernate > ORM mapping to BioSQL offered in BioJava 1.5 (BioJavaX) the older JDBC > mappings to the BioSQL (Singapore) are now outdated. We have not been > supporting these for some time. 
> > Unless people are still actively using these mappings I would propose > that we drop them from the upcoming BioJava 1.6. This would remove > some cruft from the code base and would also mean we can drop about 4 > jar files from the lib (commons-pool etc). > > Are there any strong arguments for or against? > > - Mark > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From awollhe at gwdg.de Thu Mar 6 12:41:59 2008 From: awollhe at gwdg.de (Antje Wollherr) Date: Thu, 06 Mar 2008 13:41:59 +0100 Subject: [Biojava-l] Problem with EMBL and RichStreamReader Message-ID: <1204807319.23262.11.camel@antje-desktop> Hello biojava people, I am new to this list and also not very familiar with BioJava. I was trying to parse an EMBL file and extract the DNA sequence with RichSequence.IOTools.readEMBLDNA(br, ns); . For most of the EMBL files this isn't a problem, but the file with accession number AL009126 (Bacillus subtilis 168) caused a BioException saying the sequence could not be read. Can somebody tell me where the problem is or how it can be solved? I'm using BioJava 1.5. Thank you a lot Antje Here is the error message: Exception Has Occurred During Parsing. Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/ Format_object=org.biojavax.bio.seq.io.EMBLFormat Accession=AL009126 Id=not set Comments=Unable to handle contig assemblies just yet Parse_block= Stack trace follows .... at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:730) at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) ...
2 more From holland at ebi.ac.uk Thu Mar 6 13:29:30 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Thu, 06 Mar 2008 13:29:30 +0000 Subject: [Biojava-l] Problem with EMBL and RichStreamReader In-Reply-To: <1204807319.23262.11.camel@antje-desktop> References: <1204807319.23262.11.camel@antje-desktop> Message-ID: <47CFF1BA.7060003@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 The error message says it all: Comments=Unable to handle contig assemblies just yet BioJava does not yet support EMBL contig files such as AL009126. cheers, Richard Antje Wollherr wrote: > Hello biojava people, > > I am new to this list and also not very familiar with BioJava. I was > trying to parse an EMBL File and extract the dna sequence with > RichSequence.IOTools.readEMBLDNA(br, ns); . > > For most of the embl files this isn't a problem, but the file with > accssion number AL009126 (Bacillus subtilis 168) caused an BioExcpetion > saying the sequence could not be read. > > Can somebody tell where the problem is or how it can be solved? > I'm using BioJava 1.5. > > Thank you a lot > > Antje > > Here is the error message: > > Exception Has Occurred During Parsing. > Please submit the details that follow to biojava-l at biojava.org or post a > bug report to http://bugzilla.open-bio.org/ > > Format_object=org.biojavax.bio.seq.io.EMBLFormat > Accession=AL009126 > Id=not set > Comments=Unable to handle contig assemblies just yet > Parse_block= > Stack trace follows .... > > > at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:730) > at > org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284) > at > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > ... 
2 more > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. +44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFHz/G64C5LeMEKA/QRAn5tAJ9i3ocloh8HoPXW2BP4sINhkV1+jgCff44m DJIY1k+vicFOMoX4GCwdwGs= =8JwO -----END PGP SIGNATURE----- From awollhe at gwdg.de Thu Mar 6 13:57:39 2008 From: awollhe at gwdg.de (Antje Wollherr) Date: Thu, 06 Mar 2008 14:57:39 +0100 Subject: [Biojava-l] Problem with EMBL and RichStreamReader In-Reply-To: <47CFF1BA.7060003@ebi.ac.uk> References: <1204807319.23262.11.camel@antje-desktop> <47CFF1BA.7060003@ebi.ac.uk> Message-ID: <1204811859.24352.16.camel@antje-desktop> Hallo Richard, thank you for the fast response. Now I understand, what the error message means. Sorry for asking stupid questions but sometimes I don't see the wood for the trees. ;) Cheers, Antje On Thu, 2008-03-06 at 13:29 +0000, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > The error message says it all: > > Comments=Unable to handle contig assemblies just yet > > BioJava does not yet support EMBL contig files such as AL009126. > > cheers, > Richard > > Antje Wollherr wrote: > > Hello biojava people, > > > > I am new to this list and also not very familiar with BioJava. I was > > trying to parse an EMBL File and extract the dna sequence with > > RichSequence.IOTools.readEMBLDNA(br, ns); . > > > > For most of the embl files this isn't a problem, but the file with > > accssion number AL009126 (Bacillus subtilis 168) caused an BioExcpetion > > saying the sequence could not be read. > > > > Can somebody tell where the problem is or how it can be solved? 
> > I'm using BioJava 1.5. > > > > Thank you a lot > > > > Antje > > > > Here is the error message: > > > > Exception Has Occurred During Parsing. > > Please submit the details that follow to biojava-l at biojava.org or post a > > bug report to http://bugzilla.open-bio.org/ > > > > Format_object=org.biojavax.bio.seq.io.EMBLFormat > > Accession=AL009126 > > Id=not set > > Comments=Unable to handle contig assemblies just yet > > Parse_block= > > Stack trace follows .... > > > > > > at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:730) > > at > > org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284) > > at > > org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110) > > ... 2 more > > > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. +44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFHz/G64C5LeMEKA/QRAn5tAJ9i3ocloh8HoPXW2BP4sINhkV1+jgCff44m > DJIY1k+vicFOMoX4GCwdwGs= > =8JwO > -----END PGP SIGNATURE----- > -- Antje Wollherr, Diplom-Bioinformatikerin Göttinger Genomlabor Institut für Mikrobiologie und Genetik Grisebachstraße 8 37077 Göttingen Email: awollhe at gwdg.de Tel.: 0551 393843 Fax: 0551 394195 From alex.johansson1 at gmail.com Fri Mar 7 20:22:45 2008 From: alex.johansson1 at gmail.com (alex johansson) Date: Fri, 7 Mar 2008 21:22:45 +0100 Subject: [Biojava-l] Dp newbie question!!
Message-ID: <33380be30803071222l13357a33l4bf2c9e009d3d7fc@mail.gmail.com> Hi, Iam a Cell biology student with a growing interest in biojava, although i have a very basic biojava experience but the cookbook examples makes it easy to get around with api. My question is very basic and might sound very stupid, i followed the cookbook example on creating a HMMER like profileHMM and made a profile with a set 12 training sequences (19bp) and tested it with a test sequence with motif occuring twice.Below is the output from the program: Log Odds = 43.786769243019506 m-1 m-2 m-3 m-4 m-5 m-6 d-7 m-8 m-9 m-10 m-11 m-12 m-13 m-14 m-15 m-16 m-17 d-18 i-18 d-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 m-20 i-20 My Question is how to interpret these results?how do i know if the motif is occuring twice and its location in the test sequence?? I know that 'm' stands for match and 'i' and 'd' stands for insert and delete transitions in the path from start to end. I'd certainly appreciate if the biojava gurus out there could spend some of their valuable time in explaining this. Thank you for your time, Cheers, Alex J From markjschreiber at gmail.com Sat Mar 8 14:41:44 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sat, 8 Mar 2008 22:41:44 +0800 Subject: [Biojava-l] Dp newbie question!! In-Reply-To: <33380be30803071222l13357a33l4bf2c9e009d3d7fc@mail.gmail.com> References: <33380be30803071222l13357a33l4bf2c9e009d3d7fc@mail.gmail.com> Message-ID: <93b45ca50803080641v51a8111egbd3a8c34a8941c86@mail.gmail.com> Hi Alex - Good to know that the cookbook is helpful in getting you started. You are correct about the state path of matches and deletes etc. The limitation of the model you probably used is that it doesn't loop back on itself and can by definition find only one match to a repeated motif. There are two ways to deal with this. 
One would be to wire up the model (set the transition alphabets and probs) so that the model can repeat (or at least repeat the motif part). The other would be to apply the model to a sliding window. The second approach requires less understanding of the DP package but is much less efficient and if you interested in interpreting the forwards and backwards probs it would be a bit hard to correct for the sliding window. Hope this helps. - Mark On Sat, Mar 8, 2008 at 4:22 AM, alex johansson wrote: > Hi, > > Iam a Cell biology student with a growing interest in biojava, although i > have a very basic biojava experience but the cookbook > examples makes it easy to get around with api. My question is very basic and > might sound very stupid, i followed the cookbook example on creating a HMMER > like profileHMM and made a profile with a set 12 training sequences (19bp) > and tested it with a test sequence with motif occuring twice.Below is the > output from the program: > > Log Odds = 43.786769243019506 > m-1 m-2 m-3 m-4 m-5 m-6 d-7 m-8 m-9 m-10 m-11 m-12 m-13 m-14 m-15 m-16 m-17 > d-18 i-18 d-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 > i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 i-19 m-20 i-20 > > My Question is how to interpret these results?how do i know if the motif is > occuring twice and its location in the test sequence?? I know that 'm' > stands for match and 'i' and 'd' stands for insert and delete transitions in > the path from start to end. > > I'd certainly appreciate if the biojava gurus out there could spend some of > their valuable time in explaining this. 
> > Thank you for your time, > > Cheers, > Alex J > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From pwrose at ucsd.edu Tue Mar 11 22:34:33 2008 From: pwrose at ucsd.edu (Peter Rose) Date: Tue, 11 Mar 2008 15:34:33 -0700 Subject: [Biojava-l] [Job] Scientific Software Developer - RCSB PDB La Jolla, CA Message-ID: <006701c883c8$14b0fb60$3e12f220$@edu> The RCSB Protein Data Bank at the University of California San Diego has an open position for a junior Scientific Software Developer: http://joblink.ucsd.edu/bulletin/job.html?cat=new&job_id=45537 Please contact Dr. Peter Rose at pwrose at sdsc.edu or Dr. Phil Bourne at bourne at sdsc.edu about this position. From pwrose at ucsd.edu Thu Mar 13 00:21:48 2008 From: pwrose at ucsd.edu (Peter Rose) Date: Wed, 12 Mar 2008 17:21:48 -0700 Subject: [Biojava-l] [Job] Lead Web Architect - RCSB PDB at UCSD, La Jolla, CA Message-ID: <001901c884a0$3a4826e0$aed874a0$@edu> The RCSB Protein Data Bank has an exciting position for a Lead Web Architect to help shape the future presentation layer of the website. A detailed description of the job is available: http://joblink.ucsd.edu/bulletin/job.html?cat=new&job_id=44789 Please contact Dr. Peter Rose at pwrose at sdsc.edu or Dr. Phil Bourne at bourne at sdsc.edu about this position. From adf at ncgr.org Thu Mar 13 19:06:40 2008 From: adf at ncgr.org (Andrew Farmer) Date: Thu, 13 Mar 2008 13:06:40 -0600 Subject: [Biojava-l] using phred calls with ChromatogramGraphic produced from ab1 files Message-ID: <47D97B40.2040100@ncgr.org> Hi all- I have been trying to use the ChromatogramGraphic class to display ABI chromatogram data, whilst relating this to alignments of sequences called from these trace files with phred. 
For example, if the user clicks a putatively polymorphic base in an alignment viewer, I would like to scroll to and highlight the region of the ChromatogramGraphic corresponding to the base call. However, I seem to be having some difficulty in establishing correspondences between the phred base calls and the information shown in the graphic. As far as I understand what is being displayed by ChromatogramGraphic, it is drawing "callboxes" around peaks corresponding to calls that are stored by Chromatogram, which in turn is storing information about the base calls that was encoded in the ABI file. These tend to differ substantially (e.g. in lower-quality areas) from the calls made by phred; for example, an untrimmed phred-called sequence might have 1300 bases to the ABI-called version's 900 bases. So, I am trying to find some way to get the ChromatogramGraphic callboxes to reflect the calls made by phred. Has anyone else encountered this type of situation before? It appears that phred's phd output encodes a trace offset for each of its calls, so I would guess that one could conceivably overlay the phred calls into a chromatogram produced by the ABI parser in order to get the callboxes to reflect phred's interpretation of the trace data. I could be way off-base (no pun intended) in my interpretation, and would appreciate any insights from the gurus out there. And if this is more or less correct, and there is not yet a canned solution, any advice on how to go about coding it in a way that could be contributed back to the project would be great. Thanks in advance -- Andrew Farmer adf at ncgr.org (505) 995-4464 Database Administrator/Software Developer National Center for Genome Resources --- "To live in the presence of great truths and eternal laws, to be led by permanent ideals- that is what keeps a man patient when the world ignores him, and calm and unspoiled when the world praises him."
-Balzac --- From adf at ncgr.org Thu Mar 13 23:27:46 2008 From: adf at ncgr.org (Andrew Farmer) Date: Thu, 13 Mar 2008 17:27:46 -0600 Subject: [Biojava-l] jRe: using phred calls with ChromatogramGraphic produced from ab1 files In-Reply-To: References: <47D97B40.2040100@ncgr.org> Message-ID: <47D9B872.4040308@ncgr.org> Eric- thanks very much for your insights, the phred -c trick might in fact be exactly what I need. If not, I will check out the other code you have sent and follow up off-list if I have further questions. Andrew Eric Haugen wrote: > > Hi Andrew, > > It looks like what I did three years ago was turn off the > ChromatogramGraphic's call boxes: > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_SEPARATORS, > Boolean.FALSE ); > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_A, Boolean.FALSE ); > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_C, Boolean.FALSE ); > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_G, Boolean.FALSE ); > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_T, Boolean.FALSE ); > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_OTHER, > Boolean.FALSE ); > > then set the range based on the chromat positions in the phred file: > > graphic.setOption(ChromatogramGraphic.Option.FROM_TRACE_SAMPLE, > startIndex ); > graphic.setOption(ChromatogramGraphic.Option.TO_TRACE_SAMPLE, endIndex ); > > and finally just draw Phred's base calls myself based on the chromat > position scale. > > I don't know if you'll find it useful, but I've attached > PhdSequence.java and support code which I use as an alternative to > PhredSequence currently in biojava, which I think ignores chromat > positions. > > But the easiest solution may be to have "phred -c" convert your ABI > chromats to SCF files containing the phred base calls. 
> > -- > Eric Haugen > Software Engineer > University of Washington Genome Center > ehaugen at u.washington.edu > (206) 616-7582 > > On Thu, 13 Mar 2008, Andrew Farmer wrote: > >> Hi all- >> I have been trying to use the ChromatogramGraphic class to display ABI >> chromatogram data, whilst relating this to alignments of sequences >> called from these trace files with phred. For example, if the user >> clicks a putatively polymorphic base in an alignment viewer, to scroll >> and highlight the region of the ChromatogramGraphic corresponding to >> the ase call. However, I seem to be having some difficulty in >> establishing correspondences between the phred base calls and the >> information shown in the graphic. >> >> As far as I understand what is being displayed by ChromatogramGraphic, >> it is drawing "callboxes" around peaks corresponding to calls that are >> stored by Chromatogram, which in turn is storing information about the >> base calls that was encoded in the ABI file. These tend to differ >> substantially (e.g. in lower-quality areas) from the calls made by >> phred- e.g. an untrimmed phred-called sequence might have 1300 bases >> to the abi-called version's 900 bases. So, I am trying to find some >> way to get the ChromatogramGraphic callboxes to reflect the calls made >> by phred. >> >> Has anyone else encountered this type of situation before? It appears >> that phred's phd output encodes a trace offset for each of its calls, >> so I would guess that one could conceivably overlay the phred calls >> into a chromatogram produced by the abi parser in order to get the >> callboxes to reflect phred's interpretation of the trace data. >> >> I could be way off-base (no pun intended) in my interpretation, and >> would appreciate any insights from the gurus out there. And if this is >> more or less correct, and there is not yet a canned solution, any >> advice on how to go about coding it in a way that could be contributed >> back to the project would be great. 
>> >> Thanks in advance >> -- >> >> Andrew Farmer >> adf at ncgr.org >> (505) 995-4464 >> Database Administrator/Software Developer >> National Center for Genome Resources >> >> --- >> "To live in the presence of great truths and eternal laws, >> to be led by permanent ideals- >> that is what keeps a man patient when the world ignores him, >> and calm and unspoiled when the world praises him." >> -Balzac >> --- >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> -- Andrew Farmer adf at ncgr.org (505) 995-4464 Database Administrator/Software Developer National Center for Genome Resources --- "To live in the presence of great truths and eternal laws, to be led by permanent ideals- that is what keeps a man patient when the world ignores him, and calm and unspoiled when the world praises him." -Balzac --- From markjschreiber at gmail.com Fri Mar 14 12:48:59 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Fri, 14 Mar 2008 20:48:59 +0800 Subject: [Biojava-l] jRe: using phred calls with ChromatogramGraphic produced from ab1 files In-Reply-To: <47D9B872.4040308@ncgr.org> References: <47D97B40.2040100@ncgr.org> <47D9B872.4040308@ncgr.org> Message-ID: <93b45ca50803140548y1fc37831m1f071c7bc4f95370@mail.gmail.com> Some code examples like this would be great on the biojava.org cookbook. If people could add their GUI code that would be excellent. - Mark On Fri, Mar 14, 2008 at 7:27 AM, Andrew Farmer wrote: > Eric- > thanks very much for your insights, the phred -c trick might in fact be > exactly what I need. If not, I will check out the other code you have > sent and follow up off-list if I have further questions. 
> > > Andrew > > > Eric Haugen wrote: > > > > Hi Andrew, > > > > It looks like what I did three years ago was turn off the > > ChromatogramGraphic's call boxes: > > > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_SEPARATORS, > > Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_A, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_C, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_G, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_T, Boolean.FALSE ); > > graphic.setOption(ChromatogramGraphic.Option.DRAW_CALL_OTHER, > > Boolean.FALSE ); > > > > then set the range based on the chromat positions in the phred file: > > > > graphic.setOption(ChromatogramGraphic.Option.FROM_TRACE_SAMPLE, > > startIndex ); > > graphic.setOption(ChromatogramGraphic.Option.TO_TRACE_SAMPLE, endIndex ); > > > > and finally just draw Phred's base calls myself based on the chromat > > position scale. > > > > I don't know if you'll find it useful, but I've attached > > PhdSequence.java and support code which I use as an alternative to > > PhredSequence currently in biojava, which I think ignores chromat > > positions. > > > > But the easiest solution may be to have "phred -c" convert your ABI > > chromats to SCF files containing the phred base calls. > > > > -- > > Eric Haugen > > Software Engineer > > University of Washington Genome Center > > ehaugen at u.washington.edu > > (206) 616-7582 > > > > On Thu, 13 Mar 2008, Andrew Farmer wrote: > > > >> Hi all- > >> I have been trying to use the ChromatogramGraphic class to display ABI > >> chromatogram data, whilst relating this to alignments of sequences > >> called from these trace files with phred. For example, if the user > >> clicks a putatively polymorphic base in an alignment viewer, to scroll > >> and highlight the region of the ChromatogramGraphic corresponding to > >> the ase call. 
However, I seem to be having some difficulty in > >> establishing correspondences between the phred base calls and the > >> information shown in the graphic. > >> > >> As far as I understand what is being displayed by ChromatogramGraphic, > >> it is drawing "callboxes" around peaks corresponding to calls that are > >> stored by Chromatogram, which in turn is storing information about the > >> base calls that was encoded in the ABI file. These tend to differ > >> substantially (e.g. in lower-quality areas) from the calls made by > >> phred- e.g. an untrimmed phred-called sequence might have 1300 bases > >> to the abi-called version's 900 bases. So, I am trying to find some > >> way to get the ChromatogramGraphic callboxes to reflect the calls made > >> by phred. > >> > >> Has anyone else encountered this type of situation before? It appears > >> that phred's phd output encodes a trace offset for each of its calls, > >> so I would guess that one could conceivably overlay the phred calls > >> into a chromatogram produced by the abi parser in order to get the > >> callboxes to reflect phred's interpretation of the trace data. > >> > >> I could be way off-base (no pun intended) in my interpretation, and > >> would appreciate any insights from the gurus out there. And if this is > >> more or less correct, and there is not yet a canned solution, any > >> advice on how to go about coding it in a way that could be contributed > >> back to the project would be great. > >> > >> Thanks in advance > >> -- > >> > >> Andrew Farmer > >> adf at ncgr.org > >> (505) 995-4464 > >> Database Administrator/Software Developer > >> National Center for Genome Resources > >> > >> --- > >> "To live in the presence of great truths and eternal laws, > >> to be led by permanent ideals- > >> that is what keeps a man patient when the world ignores him, > >> and calm and unspoiled when the world praises him." 
> >> -Balzac > >> --- > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > > -- > > Andrew Farmer > adf at ncgr.org > (505) 995-4464 > Database Administrator/Software Developer > National Center for Genome Resources > > --- > "To live in the presence of great truths and eternal laws, > to be led by permanent ideals- > that is what keeps a man patient when the world ignores him, > and calm and unspoiled when the world praises him." > -Balzac > --- > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From yangkuan81 at gmail.com Sun Mar 23 06:42:14 2008 From: yangkuan81 at gmail.com (Kuan Yang) Date: Sun, 23 Mar 2008 02:42:14 -0400 Subject: [Biojava-l] About ORFs Message-ID: Hi Guys, I am new to biojava and don't know how to do a lot of things with it. One of them is how to get ORF prediction with it. Any help will be appreciated. BTW, Is there a detailed manual for it? Thanks Kuan From yangkuan81 at gmail.com Sun Mar 23 08:17:52 2008 From: yangkuan81 at gmail.com (Kuan Yang) Date: Sun, 23 Mar 2008 04:17:52 -0400 Subject: [Biojava-l] How to translate RNA into Proteins with different codon tables? Message-ID: Hi all, Sorry, another question, how to translate a RNA into Protein with different codon table? All I can find is the "translate" method that uses the standard codon table (1). Thanks you so much in advance!!! Kuan From markjschreiber at gmail.com Sun Mar 23 12:46:44 2008 From: markjschreiber at gmail.com (Mark Schreiber) Date: Sun, 23 Mar 2008 20:46:44 +0800 Subject: [Biojava-l] About ORFs In-Reply-To: References: Message-ID: <93b45ca50803230546w5a8d61b3y7f184d95f0e0c45b@mail.gmail.com> Hi - For a 'manual' of biojava, take a look at the documentation section of the biojava wiki (www.biojava.org). 
Specifically, see the tutorial (http://biojava.org/wiki/BioJava:Tutorial)
and the cookbook (http://biojava.org/wiki/BioJava:CookBook).

There is more than one way to get an ORF prediction. One way would be to do
a six frame translation
(http://biojava.org/wiki/BioJava:Cookbook:Translation:SixFrames). A more
sophisticated way would be to make a Hidden Markov Model that distinguishes
coding ORFs from non-coding ORFs. You could do this using the DP package.
There is no specific tutorial but you can get some hints from the Dynamic
Programming part of the cookbook.

One of the interesting things about BioJava is that it is much more low
level than something like EMBOSS. Rather than doing lots of things for you,
it enables you to write your own programs to do things.

Hope this helps. Welcome to BioJava.

- Mark

On Sun, Mar 23, 2008 at 2:42 PM, Kuan Yang wrote:
> Hi Guys,
>
> I am new to biojava and don't know how to do a lot of things with it.
> One of them is how to get ORF prediction with it. Any help will be
> appreciated.
> BTW, Is there a detailed manual for it?
>
> Thanks
>
> Kuan
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From martin.jones at ed.ac.uk Thu Mar 27 13:45:50 2008
From: martin.jones at ed.ac.uk (Martin Jones)
Date: Thu, 27 Mar 2008 13:45:50 +0000
Subject: [Biojava-l] how to get properties of a Feature from a GenBank file
Message-ID:

Hi,

I'm just getting started with BioJava so this may be a simple question. I'm
reading a RichSequence from a GenBank file and want to get the gene name of
each CDS feature. The following code gets hold of the features I'm
interested in

    for (Object o : mySeq.getFeatureSet()) {
        RichFeature f = (RichFeature) o;
        if (f.getType().equals("CDS")) {
            // get gene name here
        }
    }

but I'm not sure how best to get the gene name.
The following seems to work:

    for (Object o2 : f.getNoteSet()) {
        Note n = (Note) o2;
        if (n.getTerm().getName().equals("gene")) {
            System.out.println("gene name is " + n.getValue());
        }
    }

but seems overly verbose - ideally I'd like to be able to pass the Feature
to another part of my program, but writing the above whenever I want to get
the name seems like overkill. Is there a shorter way - something along the
lines of

    String name = f.getNoteByName("gene").getValue();

Thanks in advance for any help.

PS one more question - is there a reason why e.g. getNoteSet returns a Set
rather than a Set<Note>, which makes it necessary to do all the type casts?

Thanks,

Martin

From holland at ebi.ac.uk Thu Mar 27 14:11:26 2008
From: holland at ebi.ac.uk (Richard Holland)
Date: Thu, 27 Mar 2008 14:11:26 +0000
Subject: [Biojava-l] how to get properties of a Feature from a GenBank file
In-Reply-To:
References:
Message-ID: <47EBAB0E.3020305@ebi.ac.uk>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

Your code for getting the Note named "gene" is correct. I agree, a shorthand
way of doing this would be lovely, but it doesn't currently exist. (Such a
method would have to do the same thing internally anyway.) If you'd like to
write one and add it in then you'd be most welcome! :)

At the time that the BioJavaX extensions were written, compatibility with
Java 1.4 was still required, and so we could not make use of any new Java
features that were introduced in Java 1.5. Set<Note>, being an example of
Generics, is one of these.

Future versions of BioJava will require the user to install Java 1.6 or
later, and so we will be able to use these newer features in both new and
existing code, depending on feasibility (for instance, it is not always
possible to convert older code to use Generics in a sensible manner, and it
is not always possible to write Generics code that can interface sensibly
with older non-Generics modules).
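
For illustration only, the kind of shorthand discussed above might look
roughly like the sketch below. It is not part of BioJava: SimpleNote is a
hypothetical stand-in for a note (a term name plus a value) so that the
example compiles on its own, and NoteTools.firstValue is an invented helper
name. A real version would presumably loop over f.getNoteSet() and compare
n.getTerm().getName(), exactly as in the code quoted in this thread.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a note: a term name plus a value.
// This is NOT the real BioJavaX Note interface.
final class SimpleNote {
    private final String termName;
    private final String value;

    SimpleNote(String termName, String value) {
        this.termName = termName;
        this.value = value;
    }

    String getTermName() { return termName; }

    String getValue() { return value; }
}

// Hypothetical tools class holding the convenience lookup.
final class NoteTools {
    private NoteTools() {}

    // Returns the value of the first note whose term name equals termName,
    // or null if no such note is present.
    static String firstValue(Set<SimpleNote> notes, String termName) {
        for (SimpleNote n : notes) {
            if (n.getTermName().equals(termName)) {
                return n.getValue();
            }
        }
        return null;
    }
}

class NoteToolsDemo {
    public static void main(String[] args) {
        Set<SimpleNote> notes = new LinkedHashSet<SimpleNote>();
        notes.add(new SimpleNote("product", "hypothetical protein"));
        notes.add(new SimpleNote("gene", "abcA"));
        System.out.println("gene name is " + NoteTools.firstValue(notes, "gene"));
    }
}
```

Keeping the lookup in a separate tools class, rather than adding a method to
an existing interface, means existing implementations would not need to
change. Callers do have to handle the null returned when no note matches.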
cheers, Richard Martin Jones wrote: > Hi, > > I'm just getting started with BioJava so this may be a simple > question. I'm reading a RichSequence from a GenBank file and want to > get the gene name of each CDS feature. The following code gets hold > of the features I'm interested in > > for (Object o : mySeq.getFeatureSet()){ > RichFeature f = (RichFeature) o; > if (f.getType().equals("CDS")){ > //get gene name here > } > } > > but I'm not sure how best to get the gene name. The following seems to work: > > for (Object o2 : f.getNoteSet()){ > Note n = (Note) o2; > if (n.getTerm().getName().equals("gene")){ > System.out.println("gene name is " + n.getValue()); > } > } > > but seems overly verbose - ideally I'd like to be able to pass the > Feature to another part of my program, but writing the above whenever > I want to get the name seems like overkill. Is there a shorter way - > something along the lines of > > String name = f.getNoteByName("gene").getValue(); > > Thanks in advance for any help. > > PS one more question - is there a reason why e.g. getNoteSet returns > a Set rather than a Set, which makes it necessary to do all > the type casts? > > Thanks, > > Martin > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. 
+44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH66sO4C5LeMEKA/QRAmq9AJ4qyMw4eGVYIMZjVf5jcADVRQmzpQCeOXej mak90aLUhSF60DrWeRtM8o0= =0EOE -----END PGP SIGNATURE----- From su24 at st-andrews.ac.uk Thu Mar 27 17:46:40 2008 From: su24 at st-andrews.ac.uk (Saif Ur-Rehman) Date: Thu, 27 Mar 2008 17:46:40 +0000 Subject: [Biojava-l] Problem Message-ID: <1206640000.47ebdd802a753@webmail.st-andrews.ac.uk> Dear All, I am attempting to split up a Fasta file of an entire genomes amino acid sequences into separate files for each individual gene.
I am simply reading the Fasta file of the entire genome as a Sequence Db and then iterating around it creating a new file for each Sequence and writing it out to that file. However out of a Fasta file containing 13465 genes only 12945 are written out to their own individual files. This does not occur in files which lack the termination symbol i.e do not use the Alphabet ("PROTEIN_TERM"). I was wondering if you could suggest any reason why this might occur as I am completely mystified. Thanking you in advance, Saif ------------------------------------------------------------------------------- Saif Ur-Rehman Research Student The Centre for Evolution, Genes & Genomics (CEGG) Dyers Brae School of Biology The University of St Andrews St Andrews, Fife Scotland,UK ------------------------------------------------------------------ University of St Andrews Webmail: https://webmail.st-andrews.ac.uk From holland at ebi.ac.uk Thu Mar 27 18:16:41 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Thu, 27 Mar 2008 18:16:41 -0000 (GMT) Subject: [Biojava-l] Problem In-Reply-To: <1206640000.47ebdd802a753@webmail.st-andrews.ac.uk> References: <1206640000.47ebdd802a753@webmail.st-andrews.ac.uk> Message-ID: <42786.80.42.15.154.1206641801.squirrel@webmail.ebi.ac.uk> I'm afraid we can't be much help unless we can see the actual code you have written to do this job! cheers, Richard On Thu, March 27, 2008 5:46 pm, Saif Ur-Rehman wrote: > > Dear All, > > I am attempting to split up a Fasta file of an entire genomes amino acid > sequences into separate files for each individual gene. I am simply > reading the > Fasta file of the entire genome as a Sequence Db and then iterating around > it > creating a new file for each Sequence and writing it out to that file. > However > out of a Fasta file containing 13465 genes only 12945 are written out to > their > own individual files. 
This does not occur in files which lack the
> termination
> symbol i.e do not use the Alphabet ("PROTEIN_TERM"). I was wondering if
> you
> could suggest any reason why this might occur as I am completely
> mystified.
>
> Thanking you in advance,
>
> Saif
>
>
> -------------------------------------------------------------------------------
> Saif Ur-Rehman
> Research Student
> The Centre for Evolution, Genes & Genomics (CEGG)
> Dyers Brae
> School of Biology
> The University of St Andrews
> St Andrews,
> Fife
> Scotland,UK
>
> ------------------------------------------------------------------
> University of St Andrews Webmail: https://webmail.st-andrews.ac.uk
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

--
Richard Holland
BioMart (http://www.biomart.org/)
EMBL-EBI
Hinxton, Cambridgeshire CB10 1SD, UK

From markjschreiber at gmail.com Fri Mar 28 01:55:38 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Fri, 28 Mar 2008 09:55:38 +0800
Subject: [Biojava-l] how to get properties of a Feature from a GenBank file
In-Reply-To: <47EBAB0E.3020305@ebi.ac.uk>
References: <47EBAB0E.3020305@ebi.ac.uk>
Message-ID: <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com>

These are good points. Can we generify interfaces without breaking them?
Certainly the addition of generics to the biojavax packages would remove a
lot of nasty casting. Adding convenience methods would certainly also help,
as biojavax code does get a bit verbose. Again, this would break interfaces
unless we put them in some kind of tools class.

These are certainly good points to remember for future designs (e.g.
biojava2); usability should be a test criterion as well. BioJavaX gives
excellent ORM with BioSQL and great capture of detail in its parsers, but
the coding style is a bit unwieldy.
2 out of 3 is not bad though : ) - Mark On Thu, Mar 27, 2008 at 10:11 PM, Richard Holland wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hello, > > Your code for getting the Note named "gene" is correct. I agree, a > shorthand way of doing this would be lovely, but doesn't currently > exist. (Such a method would have to do the same thing internally > anyway). If you'd like to write one and add it in then you'd be most > welcome! :) > > At the time that the BioJavaX extensions were written compatibility with > Java 1.4 was still required and so we could not make use of any new Java > features that were introduced in Java 1.5. Set, being an > example of Generics, is one of these. > > Future versions of BioJava will require the user to install Java 1.6 or > later and so we will be able to use these newer features in both new and > existing code, depending on feasibility (for instance it is not always > possible to convert older code to use Generics in a sensible manner, and > it is not always possible to write Generics code that can interface > sensibly with older non-Generics modules). > > cheers, > Richard > > > > Martin Jones wrote: > > Hi, > > > > I'm just getting started with BioJava so this may be a simple > > question. I'm reading a RichSequence from a GenBank file and want to > > get the gene name of each CDS feature. The following code gets hold > > of the features I'm interested in > > > > for (Object o : mySeq.getFeatureSet()){ > > RichFeature f = (RichFeature) o; > > if (f.getType().equals("CDS")){ > > //get gene name here > > } > > } > > > > but I'm not sure how best to get the gene name. 
The following seems to > work: > > > > for (Object o2 : f.getNoteSet()){ > > Note n = (Note) o2; > > if (n.getTerm().getName().equals("gene")){ > > System.out.println("gene name is " + n.getValue()); > > } > > } > > > > but seems overly verbose - ideally I'd like to be able to pass the > > Feature to another part of my program, but writing the above whenever > > I want to get the name seems like overkill. Is there a shorter way - > > something along the lines of > > > > String name = f.getNoteByName("gene").getValue(); > > > > Thanks in advance for any help. > > > > PS one more question - is there a reason why e.g. getNoteSet returns > > a Set rather than a Set, which makes it necessary to do all > > the type casts? > > > > Thanks, > > > > Martin > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > - -- > Richard Holland (BioMart) > EMBL EBI, Wellcome Trust Genome Campus, > Hinxton, Cambridgeshire CB10 1SD, UK > Tel. 
+44 (0)1223 494416 > > http://www.biomart.org/ > http://www.biojava.org/ > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org > > iD8DBQFH66sO4C5LeMEKA/QRAmq9AJ4qyMw4eGVYIMZjVf5jcADVRQmzpQCeOXej > mak90aLUhSF60DrWeRtM8o0= > =0EOE > -----END PGP SIGNATURE----- > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From ayates at ebi.ac.uk Fri Mar 28 09:17:54 2008 From: ayates at ebi.ac.uk (Andy Yates) Date: Fri, 28 Mar 2008 09:17:54 +0000 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file In-Reply-To: <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> References: <47EBAB0E.3020305@ebi.ac.uk> <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> Message-ID: <47ECB7C2.3070800@ebi.ac.uk> As with most things the answer is a yes & no. It shouldn't break interfaces as in the contract setup between the interface & a consumer since generics are erased after compilation. However if we're looking at class incompatibility errors then it's quite likely that the new interface will have a different signature to the original one & may cause runtime errors in classes which haven't been recompiled against the new interface. A solution in some other projects is to have to sets of interfaces; one with generics & one without but that can be quite a nightmare to maintain. Andy Mark Schreiber wrote: > These are good points. Can we generify interfaces without breaking them? > Certainly the addition of generics to the biojavax packages would remove a > lot of nasty casting. Adding convenience methods would certainly also help > biojavax code does get a bit verbose. Again this would break interfaces > unless we put them in some kind of tools class. 
> > These are certainly good points to remember for future designs (eg biojava2) > usability should be a test criteria as well. BioJavaX gives excellent ORM > with BioSQL and great capture of detail in it's parsers but the coding style > is a bit unwieldy. 2 out of 3 is not bad though : ) > > - Mark > > On Thu, Mar 27, 2008 at 10:11 PM, Richard Holland wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Hello, >> >> Your code for getting the Note named "gene" is correct. I agree, a >> shorthand way of doing this would be lovely, but doesn't currently >> exist. (Such a method would have to do the same thing internally >> anyway). If you'd like to write one and add it in then you'd be most >> welcome! :) >> >> At the time that the BioJavaX extensions were written compatibility with >> Java 1.4 was still required and so we could not make use of any new Java >> features that were introduced in Java 1.5. Set, being an >> example of Generics, is one of these. >> >> Future versions of BioJava will require the user to install Java 1.6 or >> later and so we will be able to use these newer features in both new and >> existing code, depending on feasibility (for instance it is not always >> possible to convert older code to use Generics in a sensible manner, and >> it is not always possible to write Generics code that can interface >> sensibly with older non-Generics modules). >> >> cheers, >> Richard >> >> >> >> Martin Jones wrote: >>> Hi, >>> >>> I'm just getting started with BioJava so this may be a simple >>> question. I'm reading a RichSequence from a GenBank file and want to >>> get the gene name of each CDS feature. The following code gets hold >>> of the features I'm interested in >>> >>> for (Object o : mySeq.getFeatureSet()){ >>> RichFeature f = (RichFeature) o; >>> if (f.getType().equals("CDS")){ >>> //get gene name here >>> } >>> } >>> >>> but I'm not sure how best to get the gene name. 
The following seems to >> work: >>> for (Object o2 : f.getNoteSet()){ >>> Note n = (Note) o2; >>> if (n.getTerm().getName().equals("gene")){ >>> System.out.println("gene name is " + n.getValue()); >>> } >>> } >>> >>> but seems overly verbose - ideally I'd like to be able to pass the >>> Feature to another part of my program, but writing the above whenever >>> I want to get the name seems like overkill. Is there a shorter way - >>> something along the lines of >>> >>> String name = f.getNoteByName("gene").getValue(); >>> >>> Thanks in advance for any help. >>> >>> PS one more question - is there a reason why e.g. getNoteSet returns >>> a Set rather than a Set, which makes it necessary to do all >>> the type casts? >>> >>> Thanks, >>> >>> Martin >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> - -- >> Richard Holland (BioMart) >> EMBL EBI, Wellcome Trust Genome Campus, >> Hinxton, Cambridgeshire CB10 1SD, UK >> Tel. 
+44 (0)1223 494416
>>
>> http://www.biomart.org/
>> http://www.biojava.org/
>> -----BEGIN PGP SIGNATURE-----
>> Version: GnuPG v1.4.2.2 (GNU/Linux)
>> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>>
>> iD8DBQFH66sO4C5LeMEKA/QRAmq9AJ4qyMw4eGVYIMZjVf5jcADVRQmzpQCeOXej
>> mak90aLUhSF60DrWeRtM8o0=
>> =0EOE
>> -----END PGP SIGNATURE-----
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From ap3 at sanger.ac.uk Fri Mar 28 10:50:34 2008
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Fri, 28 Mar 2008 10:50:34 +0000
Subject: [Biojava-l] how to get properties of a Feature from a GenBank file
In-Reply-To: <47ECB7C2.3070800@ebi.ac.uk>
References: <47EBAB0E.3020305@ebi.ac.uk> <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> <47ECB7C2.3070800@ebi.ac.uk>
Message-ID: <7E037431-F0AB-48CD-A628-A2236E8BA95D@sanger.ac.uk>

>> These are good points. Can we generify interfaces without
>> breaking them?

I don't think that adding generics will break anything, e.g.

old code:

    public interface MyTest {
        public Set getFeatures();
    }

then some code that uses this:

    public void myFoo() {
        MyTest test = new MyTestImpl();
        Set features = test.getFeatures();
    }

this call will not break, even if we change the MyTest interface to:

    public Set getFeatures()

MyTestImpl will get some warnings (in my eclipse), to ensure the type
safety, but that is all.
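
A small self-contained sketch of the point above, for anyone who wants to try
it. The archive appears to have dropped the type parameter from the changed
interface, so the element type String used here is purely an assumption, as
are the class names LegacyImpl and LegacyCaller. The sketch shows an
interface whose method returns a parameterised Set, an implementation that
still returns the raw type, and a caller that still assigns the result to a
raw Set. Both old-style classes should compile, at the cost of
rawtypes/unchecked warnings (suppressed here).

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Generified version of the interface; the element type is an assumption.
interface MyTest {
    Set<String> getFeatures();
}

// Implementation already updated to use generics.
class MyTestImpl implements MyTest {
    public Set<String> getFeatures() {
        Set<String> features = new LinkedHashSet<String>();
        features.add("CDS");
        features.add("gene");
        return features;
    }
}

// Implementation left in pre-generics style: it still returns a raw Set.
// The compiler accepts this override with an unchecked warning.
class LegacyImpl implements MyTest {
    @SuppressWarnings({"rawtypes", "unchecked"})
    public Set getFeatures() {
        Set features = new LinkedHashSet();
        features.add("CDS");
        features.add("gene");
        return features;
    }
}

// Caller left in pre-generics style: assigns the result to a raw Set.
class LegacyCaller {
    @SuppressWarnings("rawtypes")
    static int countFeatures(MyTest test) {
        Set features = test.getFeatures();
        return features.size();
    }
}
```

Calling LegacyCaller.countFeatures with either implementation behaves the
same way at run time, since the type arguments are erased during
compilation.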
Andreas ----------------------------------------------------------------------- Andreas Prlic Wellcome Trust Sanger Institute Hinxton, Cambridge CB10 1SA, UK +44 (0) 1223 49 6891 ----------------------------------------------------------------------- -- The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From holland at ebi.ac.uk Fri Mar 28 11:47:38 2008 From: holland at ebi.ac.uk (Richard Holland) Date: Fri, 28 Mar 2008 11:47:38 +0000 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file In-Reply-To: <7E037431-F0AB-48CD-A628-A2236E8BA95D@sanger.ac.uk> References: <47EBAB0E.3020305@ebi.ac.uk> <93b45ca50803271855r12190578l137c4bcc723bfb9b@mail.gmail.com> <47ECB7C2.3070800@ebi.ac.uk> <7E037431-F0AB-48CD-A628-A2236E8BA95D@sanger.ac.uk> Message-ID: <47ECDADA.4050800@ebi.ac.uk> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 I was more worried about breaking concepts than breaking code. For instance, the concept of a SymbolList can be almost completely replaced by the use of a standard genericised colleciton, e.g. List or List . Likewise, the concept of a SequenceIterator is really just an Iterator (or maybe Iterator ? ). cheers, Richard Andreas Prlic wrote: > >>> These are good points. Can we generify interfaces without breaking >>> them? > > > > I don;t think that adding generics will break anything, e.g. > > old code: > > public interface MyTest { > public Set getFeatures() > } > > then some code that uses this: > > public void myFoo(){ > > MyTest test = new MyTestImpl(); > > Set features = test.getFeatures(); > } > > this call will not break, even if we change the MyTest interface to: > > public Set getFeatures() > > MyTestImpl will get some warnings (in my eclipse), to ensure the type > safety, but that is all. 
> > Andreas > > > > > > ----------------------------------------------------------------------- > > Andreas Prlic Wellcome Trust Sanger Institute > Hinxton, Cambridge CB10 1SA, UK > +44 (0) 1223 49 6891 > > ----------------------------------------------------------------------- > > > > > --The Wellcome Trust Sanger Institute is operated by Genome > ResearchLimited, a charity registered in England with number 1021457 and > acompany registered in England with number 2742969, whose > registeredoffice is 215 Euston Road, London, NW1 > 2BE._______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > - -- Richard Holland (BioMart) EMBL EBI, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SD, UK Tel. +44 (0)1223 494416 http://www.biomart.org/ http://www.biojava.org/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFH7Nra4C5LeMEKA/QRAjW7AJ9r9RNv4ZaiqB7NsL1yrEGG6TawBwCfahDq 3paiRHHEIiuFxaRCAXYTFsA= =vh0r -----END PGP SIGNATURE----- From philipheller at comcast.net Fri Mar 28 15:46:33 2008 From: philipheller at comcast.net (philipheller at comcast.net) Date: Fri, 28 Mar 2008 15:46:33 +0000 Subject: [Biojava-l] how to get properties of a Feature from a GenBank file Message-ID: <032820081546.8495.47ED12D9000D53B70000212F22007507849D0A04040A089F070407089F@comcast.net> An embedded and charset-unspecified text was scrubbed... Name: not available URL: