From harryzs1981 at gmail.com Wed May 6 09:13:42 2009
From: harryzs1981 at gmail.com (sheng zhao)
Date: Wed, 6 May 2009 15:13:42 +0200
Subject: [Biojava-l] Biojava-doc in chm forma
Message-ID: <3d23b1eb0905060613m643adf87sdef55a05a083dd51@mail.gmail.com>

Hi

Where can I find Biojava-doc in chm format??

Thanks !
harry

From andreas at sdsc.edu Thu May 7 08:59:32 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 7 May 2009 05:59:32 -0700
Subject: [Biojava-l] SVDSuperimposer
In-Reply-To: <4A02D2E7.9070705@wp.pl>
References: <4A02D2E7.9070705@wp.pl>
Message-ID: <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com>

Hi Michal,

The SVD superimposer actually does the same Singular Value
Decomposition in both BioJava and BioPython. The difference is that
BioJava also contains a complete alignment algorithm that can
calculate a structure alignment, if the region of similarity between 2
proteins is not known a priori.

See also here:
http://biojava.org/wiki/BioJava:CookBook:PDB:align and
http://biojava.org/wiki/BioJava:CookBook:PDB:aboutalign

Andreas

On Thu, May 7, 2009 at 5:24 AM, Michal Lorenc wrote:
> Hello Andreas,
> SVDSuperimposer from BioPython works only with proteins which have the same
> length, but in BioJava it does not matter.
>
> Have you published how SVDSuperimposer in BioJava works, or do you know where
> I could find more information?
>
> Thank you in advance.
>
> Best regards,
>
> Michal
>

From raphael.andre.bauer at gmail.com Fri May 8 09:13:44 2009
From: raphael.andre.bauer at gmail.com (Raphael André Bauer)
Date: Fri, 8 May 2009 15:13:44 +0200
Subject: [Biojava-l] SVDSuperimposer
In-Reply-To: <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com>
References: <4A02D2E7.9070705@wp.pl> <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com>
Message-ID: <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com>

On Thu, May 7, 2009 at 2:59 PM, Andreas Prlic wrote:
> Hi Michal,
>
> The SVD superimposer actually does the same Singular Value
> Decomposition in both BioJava and BioPython. The difference is that
> BioJava also contains a complete alignment algorithm that can
> calculate a structure alignment, if the region of similarity between 2
> proteins is not known a priori.
>
> See also here:
> http://biojava.org/wiki/BioJava:CookBook:PDB:align and
> http://biojava.org/wiki/BioJava:CookBook:PDB:aboutalign

Maybe it is interesting for you to know that we developed a generic
algorithm for the structural alignment of RNA and proteins called
"LaJolla". The reference implementation is built using BioJava and
released as open source at http://lajolla.sourceforge.net. I think
LaJolla is straightforward to use and modify if you know how to use
Maven and the BioJava core classes.

Btw. the algorithm for the alignment of proteins used by BioJava
sometimes yielded strange results - but I did not do an exhaustive
evaluation. But maybe LaJolla is an alternative in those cases.

Additionally -- if the BioJava project is interested in integrating
LaJolla into BioJava this is really not a problem -- but I do not know
if it blurs the clean toolkit/library character of BioJava.

We are open to all suggestions regarding the algorithm and also
regarding a cooperation with BioJava...

Thanks,

Raphael

@Michal: LaJolla is also partly made in Poland -- so maybe you know some
people by chance..
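On the length question raised in this thread: a superposition step operates on paired coordinates, so both inputs need the same number of points; deciding which residues pair up when two proteins differ in length is the job of a separate alignment step. The sketch below only illustrates that pairing requirement in plain Java. It is not the BioJava or BioPython code: the class `PairedRmsd` and its methods are hypothetical names, and the rotation step (the SVD part) is left out, so it only removes the translation and measures RMSD over pairs.

```java
/** Hypothetical helper for illustration; not part of BioJava or BioPython. */
final class PairedRmsd {

    /** Returns the centroid (mean x, y, z) of a set of 3D points. */
    static double[] centroid(double[][] points) {
        double[] c = new double[3];
        for (double[] p : points) {
            for (int k = 0; k < 3; k++) {
                c[k] += p[k];
            }
        }
        for (int k = 0; k < 3; k++) {
            c[k] /= points.length;
        }
        return c;
    }

    /**
     * RMSD over paired points after moving both sets to a common centroid.
     * The rotation step (the SVD part of a superimposer) is deliberately omitted.
     */
    static double centeredRmsd(double[][] a, double[][] b) {
        // Point i of a is compared with point i of b, so the lengths must match.
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException(
                "need two non-empty coordinate sets of equal length, got "
                    + a.length + " and " + b.length);
        }
        double[] ca = centroid(a);
        double[] cb = centroid(b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = (a[i][k] - ca[k]) - (b[i][k] - cb[k]);
                sum += d * d;
            }
        }
        return Math.sqrt(sum / a.length);
    }
}
```

Calling `PairedRmsd.centeredRmsd(a, b)` on a coordinate set and a translated copy of it should give a value close to zero, while arrays of different length are rejected rather than silently truncated.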
>
> Andreas
>
> On Thu, May 7, 2009 at 5:24 AM, Michal Lorenc wrote:
>> Hello Andreas,
>> SVDSuperimposer from BioPython works only with proteins which have the same
>> length, but in BioJava it does not matter.
>>
>> Have you published how SVDSuperimposer in BioJava works, or do you know where
>> I could find more information?
>>
>> Thank you in advance.
>>
>> Best regards,
>>
>> Michal
>>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From andreas at sdsc.edu Fri May 8 13:37:41 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 8 May 2009 10:37:41 -0700
Subject: [Biojava-l] SVDSuperimposer
In-Reply-To: <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com>
References: <4A02D2E7.9070705@wp.pl> <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com> <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com>
Message-ID: <59a41c430905081037n5b2ce541nc132dbfb2ea49b0a@mail.gmail.com>

Hi Raphael,

Thanks for posting the link to "LaJolla" (funny that you chose this
name, it is the part of the town where my university is located...).
This is very interesting and exactly the kind of application we are
happy to see BioJava being used for.

Do you want to provide a link back to your project site at
http://biojava.org/wiki/BioJava:BioJavaInside and also add your paper
to the list of papers that are using BioJava?

About the BioJava algorithm: If you have an example where the
BioJava structure comparison does not give the result you would
expect, could you send it to me so I can take a look?

Thanks,
Andreas

From andreas at sdsc.edu Mon May 11 00:30:30 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 10 May 2009 21:30:30 -0700
Subject: [Biojava-l] Plans for next biojava release - modularization
Message-ID: <59a41c430905102130v6e878c6bgda582268e02824fe@mail.gmail.com>

Hi,

In case you are not subscribed to biojava-dev: we are having a
discussion about how to organize the code base for the next release of
biojava on the biojava-dev mailing list. If you want to follow this
discussion, but are not subscribed yet, you can do so here:

http://www.biojava.org/mailman/listinfo/biojava-dev

Andreas

From raphael.andre.bauer at gmail.com Mon May 11 05:10:48 2009
From: raphael.andre.bauer at gmail.com (Raphael André Bauer)
Date: Mon, 11 May 2009 11:10:48 +0200
Subject: [Biojava-l] SVDSuperimposer
In-Reply-To: <59a41c430905081037n5b2ce541nc132dbfb2ea49b0a@mail.gmail.com>
References: <4A02D2E7.9070705@wp.pl> <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com> <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com> <59a41c430905081037n5b2ce541nc132dbfb2ea49b0a@mail.gmail.com>
Message-ID: <9b46aa30905110210j5799359s6d72f50dccb3c74f@mail.gmail.com>

On Fri, May 8, 2009 at 7:37 PM, Andreas Prlic wrote:
> Hi Raphael,
>
> Thanks for posting the link to "LaJolla" (funny that you chose this
> name, it is the part of the town where my university is located...).
> This is very interesting and exactly the kind of application we are
> happy to see BioJava being used for.

Well. Phil Bourne and the PDB are there (and I think you are also
somehow connected to both)...
I also heard that LaJolla might have nice beaches, sunny
weather and the surf... So LaJolla definitely is a good name for a
boring structural RNA and protein alignment algorithm...

>
> Do you want to provide a link back to your project site at
> http://biojava.org/wiki/BioJava:BioJavaInside and also add your paper
> to the list of papers that are using BioJava?

done

>
> About the BioJava algorithm : If you have an example where the
> BioJava structure comparison does not give the result you would
> expect, could you send it to me so I can take a look ?

I did not write down the examples (I thought that the protein
alignment of BioJava is somehow alpha because of "personal
communication of..."). But the examples were simple. If I rediscover
them I will let you know.

Maybe it would also help to mention in the source code or on the
webpage of the cookbook: "If our protein alignment alg delivers strange
results please let us know..." or so, so that users are aware that
they can submit strange results...

Thanks,

Raphael

From jp at javaclass.co.uk Mon May 11 06:05:43 2009
From: jp at javaclass.co.uk (JP)
Date: Mon, 11 May 2009 11:05:43 +0100
Subject: [Biojava-l] FASTA header, loc attribute question
Message-ID: <4adc29060905110305l392a9792p2aa40590e97c9891@mail.gmail.com>

Hi there at Biojava,

I have two FASTA files - one containing amino acid sequences and the other
containing dna sequences.

In the AA FASTA file I have something like :

>FBpp0077713 type=protein; loc=2L:join(384551..384894,385701..385746,386308..386576,386703..387270); ID=FBpp0077713; name=al-PA; parent=FBgn0000061,FBtr0078053; dbxref=FlyBase:FBpp0077713,GB_protein:AAF51505.1,GB_protein:AAF51505,FlyBase_Annotation_IDs:CG3935-PA,REFSEQ:NP_722629; MD5=64a866db3e2913b97a2158c2de9d02f6; length=408; release=r5.9; species=Dmel;
MGISEEIKLEELPQEAKLAHPDAVVLVDRAPGSSAASAGAALTVSMSVSG
GAPSGASGASGGTNSPVSDGNSDCEADEYAPKRKQRRYRTTFTSFQLEEL...
etc etc etc

I would like to parse this header line in particular the loc attribute and
extract it from the entry in the DNA FASTA file (so I get the genomic data
for the protein)

>FBgn0000061 type=gene; loc=2L:378116..387439; ID=FBgn0000061; name=al; dbxref=FlyBase:FBgn0000061,FlyBase:FBan0003935,FlyBase_Annotation_IDs:CG3935,GB:AE003589,GB_protein:AAF51505,GB:AY121696,GB_protein:AAM52023,GB:BI485174,GB:CZ486795,GB:L08401,GB_protein:AAA28840,UniProt/Swiss-Prot:Q06453,INTERPRO:IPR000047,INTERPRO:IPR001356,INTERPRO:IPR003654,INTERPRO:IPR009057,INTERPRO:IPR012287,bdgpinsituexpr:al,dedb:5830,drsc:FBgn0000061,flight:FBgn0000061,flyatlas:FBgn0000061,flyexpress:FBgn0000061,flygrid:59464,flymine:FBgn0000061,geo:FBgn0000061,hdri:FBgn0000061,if:/gene/aristal.htm,orthologs:ensANOGA:ENSANGP00000011877,orthologs:ensBOSTA:ENSBTAP00000015907,orthologs:ensCANFA:ENSCAFP00000009888,orthologs:ensGALGA:ENSGALP00000005255,orthologs:ensHOMSA:ENSP00000298420,orthologs:ensMACMU:ENSMMUP00000007349,orthologs:ensMONDO:ENSMODP00000008388,orthologs:ensPANTR:ENSPTRP00000004281,orthologs:ensRATNO:ENSRNOP00000027186,orthologs:ensTETNI:GSTENP00015517001,orthologs:graORYSA:Q6YYB8,orthologs:graORYSA:Q8W0T5,orthologs:modCAEEL:WBGene00044330,orthologs:modDANRE:ZDB-GENE-990415-15,orthologs:modMUSMU:MGI:1097716,panther:FBgn0000061; cyto_range=21C1-21C1; gbunit=AE014134; MD5=0f5568cf13aeb2c7076f11b1ce3d6b2f; length=9324; release=r5.9; species=Dmel;
GTAGTTTGCTGCCGGCTCTGGAACAGCCCGGTCATCTCGTCGCGTTCGGT
TCCGATTCCGATTCGAATAGTCGAGCTGGGGATACATTGTTGTTTCCGGG
etc etc etc

I understand this is not exactly conventional, but does biojava support the
parsing of the loc attribute ? (join, complement etc.)

Many Thanks
JP

From holland at eaglegenomics.com Mon May 11 06:27:23 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 11 May 2009 10:27:23 +0000
Subject: [Biojava-l] FASTA header, loc attribute question
In-Reply-To: <4adc29060905110305l392a9792p2aa40590e97c9891@mail.gmail.com>
References: <4adc29060905110305l392a9792p2aa40590e97c9891@mail.gmail.com>
Message-ID: <1242037643.15828.12.camel@buzzybee>

Short answer - no, not directly.

Longer answer - if you can write some code to snip out the Loc string
from the FASTA description line then there is existing code which can
convert the snipped Loc string into a RichLocation, which you can then
apply to the parsed FASTA sequence in order to extract the required
location.

The Loc string parser is GenbankLocationParser, part of the biojavax
packages. This assumes that the Loc string conforms to Genbank format
location definitions.

cheers,
Richard

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From thomas.covello at gmail.com Mon May 11 16:33:43 2009
From: thomas.covello at gmail.com (Thomas Covello)
Date: Mon, 11 May 2009 14:33:43 -0600
Subject: [Biojava-l] NullPointerException when using Viterbi algorithm in HMM.
Message-ID: <2e367d7b0905111333o2d074f81if9a34545acb95b5b@mail.gmail.com>

Hello,

I am trying to use a Hidden Markov Model to classify a sequence, and I am
encountering a NullPointerException when I try to use the Viterbi algorithm.
My test case follows - if you want the code I am actually using I can post
it.

The stack trace:

java.lang.NullPointerException
        at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:648)
        at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:512)
        at oxytricha_crap.Main.main(Main.java:69)

File: Main.java

package oxytricha_crap;

import org.biojava.bio.seq.*;
import org.biojava.bio.*;
import org.biojava.bio.symbol.*;
import org.biojava.bio.dp.*;
import org.biojava.bio.dist.*;
import java.util.*;
import org.biojava.bio.symbol.SymbolList;

public class Main {

    private static Alphabet DNA = DNATools.getDNA();
    private static State IES;
    private static State MDS;
    private static MarkovModel hmm = new SimpleMarkovModel(1, DNA, "MyHMM");

    static {
        try {
            Distribution IES_dist = DistributionFactory.DEFAULT.createDistribution(DNA);
            Distribution MDS_dist = DistributionFactory.DEFAULT.createDistribution(DNA);
            IES = new SimpleEmissionState("IES", Annotation.EMPTY_ANNOTATION, new int [] {1}, IES_dist);
            MDS = new SimpleEmissionState("MDS", Annotation.EMPTY_ANNOTATION, new int [] {1}, MDS_dist);
            hmm.addState(IES);
            hmm.addState(MDS);
            hmm.createTransition(MDS, hmm.magicalState());
            hmm.createTransition(hmm.magicalState(), MDS);
            hmm.createTransition(MDS, IES);
            hmm.createTransition(IES, MDS);
            hmm.createTransition(MDS, MDS);
            hmm.createTransition(IES, IES);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            State currState;
            State lastState = hmm.magicalState();
            Iterator<State> states = Arrays.asList(new State [] { MDS, MDS, MDS, IES, IES, IES, MDS, MDS, IES, IES, MDS, MDS, MDS, MDS }).iterator();
            SymbolList sl = DNATools.createDNA("acgtgtctgagaga");
            HMMTrainer trainer = new SimpleHMMTrainer(hmm);
            trainer.startCycle();
            for (Symbol sym : (List<Symbol>) sl.toList()) {
                currState = states.next();
                if (currState != lastState && lastState != null)
                    trainer.recordTransition(lastState, currState, 1.0);
                trainer.recordEmittedSymbol(currState, sym, 1.0);
                lastState = currState;
            }
            trainer.recordTransition(MDS, hmm.magicalState(), 1.0);
            trainer.completeCycle();
            DP dp = DPFactory.DEFAULT.createDP(hmm);
            SymbolList unknown = DNATools.createDNA("acgtcgtacgtacgtacacaacga");
            StatePath spath = dp.viterbi(new SymbolList [] { unknown }, ScoreType.PROBABILITY);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}

From mauleemacx at gmail.com Wed May 13 07:21:57 2009
From: mauleemacx at gmail.com (mac LEE)
Date: Wed, 13 May 2009 19:21:57 +0800
Subject: [Biojava-l] biojava on Maven2 repository
Message-ID:

Dear All,

Hello. Is it possible to place the biojava jar into the maven 2 repository?

regards,

jack Lee

--
Somewhere between heaven and hell
mauleemacx at gmail.com

From raphael.andre.bauer at gmail.com Wed May 13 09:12:11 2009
From: raphael.andre.bauer at gmail.com (Raphael André Bauer)
Date: Wed, 13 May 2009 15:12:11 +0200
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To:
References:
Message-ID: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com>

On Wed, May 13, 2009 at 1:21 PM, mac LEE wrote:
> Dear All,
> Hello. Is it possible to place the biojava jar into the maven 2 repository?

You have to install the biojava.jar manually into your local repository:
http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html

LaJolla is an application that uses that approach too... Maybe the
pom.xml of LaJolla is worth a look to see how it works...:
http://lajolla.svn.sourceforge.net/viewvc/lajolla/trunk/lajolla/pom.xml?revision=350&view=markup

Regards,

Raphael

From helpmedicine.savelife at gmail.com Wed May 13 10:04:18 2009
From: helpmedicine.savelife at gmail.com (helpmedicine savelife)
Date: Wed, 13 May 2009 10:04:18 -0400
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com>
References: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com>
Message-ID: <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com>

I agree it is possible by setting it up in a local repository (I did the same
for the project I am currently working on), but now that biojava is being used
by many, I guess it is a good time to make it accessible through maven2
by providing the project metadata information to Maven2.

Biojava Team - Is this possible?

Thanks,
Sandya

From mauleemacx at gmail.com Thu May 14 01:50:51 2009
From: mauleemacx at gmail.com (mac LEE)
Date: Thu, 14 May 2009 13:50:51 +0800
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To: <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com>
References: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com> <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com>
Message-ID:

Dear All,

I think putting Biojava on the central Maven2 repository would not only help
current users but also attract new users; just think of how successful
the BioPerl project on CPAN is. I strongly hope that there will
be a central repository soon.

regards,

jack LEE

--
Somewhere between heaven and hell
mauleemacx at gmail.com

From andreas at sdsc.edu Thu May 14 09:38:08 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 14 May 2009 06:38:08 -0700
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To: <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com>
References: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com> <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com>
Message-ID: <59a41c430905140638r74554c10wb9c97dddd8f05b6b@mail.gmail.com>

I will have a look at this, once we have mavenized the biojava build
process. In the meantime, perhaps somebody could post instructions
on how to add biojava to a local maven repository on the wiki?

Andreas

From niall at sgenomics.org Thu May 14 09:55:31 2009
From: niall at sgenomics.org (Niall Haslam)
Date: Thu, 14 May 2009 15:55:31 +0200
Subject: [Biojava-l] Webservices clients in bio-java
Message-ID: <200905141555.31967.niall@sgenomics.org>

Hi all,

I was wondering if there was any interest in creating a set of maintained
clients for popular webservices. Depending on how it goes and how feasible it
is these could be included in biojava. I assume there are plenty of people
writing and rewriting clients to these things. In the first instance we could
just set up a repository for example code on using the servers. I'm sure this
would be a useful resource for java programmers. Unless someone wants to
point me towards a pre-existing resource!

Niall.

From heuermh at acm.org Thu May 14 12:34:53 2009
From: heuermh at acm.org (Michael Heuer)
Date: Thu, 14 May 2009 12:34:53 -0400 (EDT)
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To: <59a41c430905140638r74554c10wb9c97dddd8f05b6b@mail.gmail.com>
Message-ID:

Hello Andreas,

The biojava jars can be manually uploaded to the central repository. I
will do this for the 1.7 release.

Once the build migrates to maven, we can set up our own repository that
is mirrored to the central repository. This is required if we have a
multi-module build, for example. There may be a maven repo hosted at
open-bio.org already, I think the biomoby folks use one.

   michael

From andreas at sdsc.edu Thu May 14 23:27:05 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 14 May 2009 20:27:05 -0700
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To:
References: <59a41c430905140638r74554c10wb9c97dddd8f05b6b@mail.gmail.com>
Message-ID: <59a41c430905142027g2de44f57s6ef0c48f8c877c17@mail.gmail.com>

That would be helpful, thanks,

Andreas

From andreas at sdsc.edu Sat May 16 15:10:56 2009
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 16 May 2009 12:10:56 -0700
Subject: [Biojava-l] Webservices clients in bio-java
In-Reply-To: <200905141555.31967.niall@sgenomics.org>
References: <200905141555.31967.niall@sgenomics.org>
Message-ID: <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com>

Hi Niall.

Which webservices did you have in mind? Something that was recently
discussed on biojava-dev is support for Blast webservices...

Andreas

On Thu, May 14, 2009 at 6:55 AM, Niall Haslam wrote:
> Hi all,
>
> I was wondering if there was any interest in creating a set of maintained
> clients for popular webservices. Depending on how it goes and how feasible it
> is these could be included in biojava. I assume there are plenty of people
> writing and rewriting clients to these things.
In the first instance we could > just set up a repository for example code on using the servers. > > I'm sure this would be a useful resource for java programmers. Unless someone > wants to point me towards a pre-existing resource! > > Niall. > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From niall at sgenomics.org Tue May 19 03:36:38 2009 From: niall at sgenomics.org (Niall Haslam) Date: Tue, 19 May 2009 09:36:38 +0200 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> Message-ID: <200905190936.38541.niall@sgenomics.org> On Saturday 16 May 2009 21:10, Andreas Prlic wrote: > Which webservices did you have in mind? Something that got recently > discussed on biojava-dev is support Blast webservices... I think BLAST would obviously be one of the main ones to get started with. I have a client at the moment which goes as far as fetching the job-id and generating a link to the results, which is as much as I need. If I remember correctly, the BLAST parser at the moment doesn't deal with the responses from the EBI webservices (which I used). Mark makes a valid point about the choice of technology. I would also recommend jax-ws and probably axis2 as the frameworks, but note that some clients will probably require axis1_4. Although its not clear which e-mail discussion you are referring to - is it one on the biojava-dev mailing list? A lot of the client code is autogenerated, and I guess one question would be should we include this in any repository that we make? I have been using the wtp plugin in eclipse and this takes some of the pain out of generating clients. Though it does generate some rather large source files. Niall. 
From markjschreiber at gmail.com Tue May 19 10:05:11 2009
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 19 May 2009 22:05:11 +0800
Subject: [Biojava-l] Webservices clients in bio-java
In-Reply-To: <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com>
References: <200905141555.31967.niall@sgenomics.org>
 <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com>
 <200905190936.38541.niall@sgenomics.org>
 <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com>
Message-ID: <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com>

Hi Niall -

There was a recent discussion of generated code on the list. A good solution
for webservices may be to put the WSDL and the high-level API in the
repository and let the JAX-WS tools autogenerate the client boilerplate code.

Mark

On 19 May 2009, 3:39 PM, "Niall Haslam" wrote:

On Saturday 16 May 2009 21:10, Andreas Prlic wrote:
> Which webservices did you have in mind? Someth...

I think BLAST would obviously be one of the main ones to get started with. I
have a client at the moment which goes as far as fetching the job-id and
generating a link to the results, which is as much as I need. If I remember
correctly, the BLAST parser at the moment doesn't deal with the responses from
the EBI webservices (which I used).

Mark makes a valid point about the choice of technology. I would also
recommend jax-ws and probably axis2 as the frameworks, but note that some
clients will probably require axis1_4. Although its not clear which e-mail
discussion you are referring to - is it one on the biojava-dev mailing list?

A lot of the client code is autogenerated, and I guess one question would be
should we include this in any repository that we make? I have been using the
wtp plugin in eclipse and this takes some of the pain out of generating
clients. Though it does generate some rather large source files.

Niall.
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.o...
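
The split Mark describes (a hand-written high-level API in the repository, with the client stubs generated from the WSDL) could look roughly like the sketch below. Everything here is an assumption made for illustration: GeneratedBlastPort stands in for whatever port type a JAX-WS tool would generate, and SequenceSearchService and BlastPortAdapter are hypothetical names, not existing BioJava classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a port type that a JAX-WS tool would generate from a WSDL (hypothetical). */
interface GeneratedBlastPort {
    String runBlast(String program, String database, String sequence, String email);
}

/** Hand-written, provider-neutral API (hypothetical; not an existing BioJava type). */
interface SequenceSearchService {
    /** Submits one sequence and returns the provider's job identifier. */
    String submit(String sequence);
}

/** Glue code: the only class that needs to know about the generated stub. */
final class BlastPortAdapter implements SequenceSearchService {
    private final GeneratedBlastPort port;
    private final String program;
    private final String database;
    private final String email;

    BlastPortAdapter(GeneratedBlastPort port, String program, String database, String email) {
        this.port = port;
        this.program = program;
        this.database = database;
        this.email = email;
    }

    @Override
    public String submit(String sequence) {
        if (sequence == null || sequence.trim().isEmpty()) {
            throw new IllegalArgumentException("sequence must not be empty");
        }
        // Translate the high-level call into the parameters the stub expects.
        return port.runBlast(program, database, sequence.trim(), email);
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        // A fake port records calls instead of contacting a real service.
        List<String> calls = new ArrayList<>();
        GeneratedBlastPort fakePort = (program, db, seq, email) -> {
            calls.add(program + "/" + db + "/" + seq + "/" + email);
            return "job-" + calls.size();
        };
        SequenceSearchService service =
                new BlastPortAdapter(fakePort, "blastp", "uniprot", "user@example.org");
        System.out.println(service.submit(" MKTAYIAKQR "));
        System.out.println(calls);
    }
}
```

Because callers only see SequenceSearchService, the generated classes can be rebuilt from the WSDL at any time without touching user code, which is the main argument for not checking them in.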
From aaron_b_a at yahoo.com Tue May 19 21:12:28 2009
From: aaron_b_a at yahoo.com (Aaron Brata Aditama)
Date: Tue, 19 May 2009 18:12:28 -0700 (PDT)
Subject: [Biojava-l] How to write pairwiseAlignment output in my own format?
Message-ID: <248630.13325.qm@web56107.mail.re3.yahoo.com>

There is a default output method in the NeedlemanWunsch class, and it writes
output in its own format. How can I write the output of a pairwise alignment
in my own format? For example, I want to get the aligned sequences and the
score only.

Aaron Brata Aditama

From mike.thon at gmail.com Thu May 21 14:06:56 2009
From: mike.thon at gmail.com (Michael Thon)
Date: Thu, 21 May 2009 20:06:56 +0200
Subject: [Biojava-l] CRC64Checksum example
Message-ID: <8EEAB8FC-4D9D-4417-90EC-DC9E4329648C@gmail.com>

Hi all - I need to compute a crc64 for some sequences and I thought
that org.biojavax.utils.CRC64Checksum could do this for me. Java
throws a NullPointerException when I try to do:
crc.update(seq);
where crc is a CRC64Checksum object and seq is a String containing a
protein sequence.

Does anyone have some example code showing how to compute a crc using
this class?
Many thanks
Mike Thon

From holland at eaglegenomics.com Thu May 21 15:11:14 2009
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 21 May 2009 20:11:14 +0100
Subject: [Biojava-l] CRC64Checksum example
In-Reply-To: <8EEAB8FC-4D9D-4417-90EC-DC9E4329648C@gmail.com>
References: <8EEAB8FC-4D9D-4417-90EC-DC9E4329648C@gmail.com>
Message-ID: <1242933074.29986.16.camel@buzzybee>

Could you provide the line you used to instantiate the CRC64Checksum
object, the value that you're passing to update(seq), and the full
stacktrace you get when it fails? That'll all help us work out what's
wrong.

cheers,
Richard

On Thu, 2009-05-21 at 20:06 +0200, Michael Thon wrote:
> Hi all - I need to compute a crc64 for some sequences and I thought
> that org.biojavax.utils.CRC64Checksum could do this for me.
Java > throws a NullPointerException when I try to do: > crc.update(seq); > where crc is a CRC64Checksum object and seq is a String containing a > protein sequence. > > Does anyone have some example code showing how to compute a crc using > this class? > Many thanks > Mike Thon > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From paolo.romano at istge.it Wed May 27 12:15:09 2009 From: paolo.romano at istge.it (Paolo Romano) Date: Wed, 27 May 2009 18:15:09 +0200 Subject: [Biojava-l] NETTAB 2009: Last Call for Participation Message-ID: <200905271630.n4RGU87M030436@ibm43p.biotech.ist.unige.it> Last Call for Participation NETTAB 2009 Workshop on "Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development" with a Special Session on: "Methods and Tools for RNA Structure and Functional Analysis" June 10-13, 2009 Department of Computer Science, University of Catania, Italy http://www.nettab.org/2009/ KEYNOTE TALKS RNA WikiProject: Community annotation of RNA families Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. Who are you? Managing collaborative digital identities in bioinformatics with myExperiment Duncan Hull, School of Chemistry, University of Manchester, Manchester, UK. Semantically Integrated eCommunities in Biomedicine: Next-Generation Models of Biomedical Communication Tim Clark, Director of Informatics, MassGeneral Institute for Neurodegenerative Disease Neurology Research Department, Massachusetts General Hospital, Boston, USA. Bacterial Phylogeny and Taxonomy in the High-Throughput Sequencing World Gabriel Valiente, Technical University of Catalonia, Department of Software, Barcelona, Spain. 
Non coding RNAs (title to be confirmed) Doron Betel, MSKCC - Computational Biology Center, New York, USA . SELECTED ORAL COMMUNICATIONS The DC-THERA Directory: A Knowledge Management System to Support Collaboration on Dendritic Cell and Immunology Research Michaela Guendel, Ciro Scognamiglio, Marco Brandizi and Andrea Splendiani Bioinformatics Experimentation in a New Agent-Based Infrastructure OpenKnowledge Dietlind Gerloff, Xueping Quan, Adrian de Pinninck, Paolo Besana, Siu-wai Leung, Marco Schorlemmer and David Robertson Abstraction based cooperation for the design of bioinformatics workflows Fr?d?ric Cadier and Philippe Picouet Biomedical applications in the EELA-2 Project Leandro N Ciuffo and Rafael Mayo Make Histri: collaborative curation of exchange histories of bacterial and archaeal type strains Bert Verslyppe, Paul De Vos, Bernard De Baets and Peter Dawyndt SBMM Assistant: Social Pathway Annotation Ismael Navas-Delgado, Alejandro del Real-Chicharro, Francisca S?nchez-Jim?nez and Jose F Aldana Montes GePh-CARD: an information exchange application for an Hub & Spoke Network for Skeletal Dysplasias Marina Mordenti and Luca Sangiorgi ProDaMa-C: a collaborative web application to generate specialized protein structure datasets Giuliano Armano and Andrea Manconi RNA tertiary structure prediction with ModeRNA Magdalena Musielak, Kristian Rother, Tomasz Puton and Janusz M. Bujnicki Improved heuristic for pairwise RNA secondary structure prediction Olivier Perriquet and Pedro Barahona Analysing the microRNA-17-92/Myc/E2F/RB Compound Toggle Switch by Theorem Proving Giampaolo Bella and Pietro Li? Mapping miRNA genes on human fragile sites and translocation breakpoints Alfredo Ferro, rosalba giugno, Alessandro Lagan?, Alfredo Pulvirenti and Francesco Russo MicroRNA.gr: A suite of web based tools for elucidating microRNA function Giorgio L. Papadopoulos, Panagiotis Alexiou, Manolis Maragkakis, Martin Reczko and Artemis G. 
Hatzigeorgiou miRScape: A Cytoscape Plugin to Annotate Biological Networks with microRNAs Alfredo Ferro, rosalba giugno, Alessandro Lagan?, Misael Mongiov?, Giuseppe Pigola, Alfredo Pulvirenti, Gary Bader and Dennis Shasha POSTERS A web site supporting collaborative activities within the Italian Network for Oncology Bioinformatics Silvia Giuliani, Elda Rossi, Stefania Parodi and Paolo Romano Prediction of human targets for viral encoded microRNAs Alessandro Lagan?, Stefano Forte, Rosalba Giugno, Alfredo Pulvirenti and Alfredo Ferro Genome-wide microRNA target gene identification and annotation using multiple prediction algorithms Dario Corrada, Luciano Milanesi and Daniele Catalucci Design of highly specific synthetic miRNAs Alessandro Lagan?, Stefano Forte, Angela Papa, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha and Alfredo Ferro NETTAB '09 - Ninth International Workshop on Network Tools and Applications in Biology 10-13 June 2009, Catania, Italy http://www.nettab.org/2009/ Paolo Romano (paolo.romano at istge.it) Bioinformatics National Cancer Research Institute (IST) From jimp at compbio.dundee.ac.uk Thu May 28 08:29:19 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Thu, 28 May 2009 13:29:19 +0100 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> Message-ID: <4A1E839F.2050700@compbio.dundee.ac.uk> Hi All. Mark Schreiber wrote: > There was a recent discussion of generated code on the list. A good solution > for webservices may be to put the wsdl and high level api in the repository > and let jax-ws tools autogenerate the client biolerplate code. 
Just to clarify - by high-level API you mean the interface to the code that takes biojava objects as arguments and translates them into parameters for the service and vice versa for the results ? With regard to the recent(ish) discussion about modularisation in BJ3, I'd suggest that a module is created for each distinct WSDL containing the glue code for connecting BJ objects with the boilerplate code. The glue code implements factory methods to generate instances of appropriate BJ service APIs, which are defined in a core BJ module. This means that the WSDL + API implementation is decoupled from the BJ service API. Does that make sense (and/or am I simply stating the obvious here) ? Jim. -- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. From willishf at ufl.edu Thu May 28 09:35:49 2009 From: willishf at ufl.edu (Scooter Willis) Date: Thu, 28 May 2009 09:35:49 -0400 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <4A1E839F.2050700@compbio.dundee.ac.uk> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> Message-ID: <7ceb4beb0905280635g7bbb028dw9c5d69a9290b5a16@mail.gmail.com> Jim I agree. I am planning on doing some testing of a couple BLAST web services interfaces(assuming more than one exists) and see what they truly have in common and see how that would impact a BJ3 front end to multiple providers. My assumption is that they will be the same. 
I noticed on the NCBI BLAST implementations that the user was required to
pass their email address as part of the web service call. They are concerned
with abuse from external processes and they only allow one sequence per
request.

From Wikipedia, the following are listed as BLAST resources, more than one of
which may offer a web service interface. Should BioJava3 try to support more
than one?

Thanks

Scooter

Variations of BLAST

- WU-BLAST: the original gapping BLAST with statistics, developed and
  maintained by Warren Gish at Washington University in St. Louis
- EBI's BLAST Services: EBI's main BLAST services page.
- FSA-BLAST: a new, faster but still accurate version of NCBI BLAST based on
  recently published algorithmic improvements
- NBIC mpiBLAST: at the Netherlands Bioinformatics Centre
- Parallel BLAST: a dual scheduling BLAST tested on the Blue Gene/L
- mpiBLAST: open-source parallel BLAST
- A/G BLAST: implementation for PowerPC G4/G5 processors and Mac OS X, from
  Apple Computer's Advanced Computation Group and Genentech.
- STRAP: the protein workbench STRAP contains a comfortable BLAST front-end
  with a cache for BLAST results

Commercial versions

- ThermoBLAST by DNA Software Inc.: scans entire genomes quickly and
  accurately, combining the power of BLAST with the most advanced
  thermodynamics parameters
- PatternHunter: an alternative software which provides similar functionality
  to BLAST while claiming increased speed and sensitivity
- KoriBlast: a reliable graphical environment dedicated to sequence data
  mining. KoriBlast combines Blast searches with advanced data management
  capabilities and a state-of-the-art graphical user interface.
- microbial identification BLAST: a quality controlled database for in-vitro
  diagnostics. SepsiTest combines broad-range-PCR using ultra-pure reagents
  with Blast searches in a quality controlled environment.

On Thu, May 28, 2009 at 8:29 AM, James Procter wrote:
> Hi All.
> > Mark Schreiber wrote: > > There was a recent discussion of generated code on the list. A good > solution > > for webservices may be to put the wsdl and high level api in the > repository > > and let jax-ws tools autogenerate the client biolerplate code. > > Just to clarify - by high-level API you mean the interface to the code > that takes biojava objects as arguments and translates them into > parameters for the service and vice versa for the results ? > > With regard to the recent(ish) discussion about modularisation in BJ3, > I'd suggest that a module is created for each distinct WSDL containing > the glue code for connecting BJ objects with the boilerplate code. The > glue code implements factory methods to generate instances of > appropriate BJ service APIs, which are defined in a core BJ module. This > means that the WSDL + API implementation is decoupled from the BJ > service API. > > Does that make sense (and/or am I simply stating the obvious here) ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. SC015096. 
> _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From markjschreiber at gmail.com Thu May 28 10:11:36 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 28 May 2009 22:11:36 +0800 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <93b45ca50905280709q6896f96fg598276de20222440@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> <93b45ca50905280709q6896f96fg598276de20222440@mail.gmail.com> Message-ID: <93b45ca50905280711q222a0c90ob93a3098ff1b97c7@mail.gmail.com> By high level API I mean anything not machine generated. That could be classes that use biojava objects or a more humanized or smarter interface to the boilerplate code. - Mark On 28 May 2009, 2:53 PM, "James Procter" wrote: Hi All. Mark Schreiber wrote: > There was a recent discussion of generated code on the list. A good solutio... Just to clarify - by high-level API you mean the interface to the code that takes biojava objects as arguments and translates them into parameters for the service and vice versa for the results ? With regard to the recent(ish) discussion about modularisation in BJ3, I'd suggest that a module is created for each distinct WSDL containing the glue code for connecting BJ objects with the boilerplate code. The glue code implements factory methods to generate instances of appropriate BJ service APIs, which are defined in a core BJ module. This means that the WSDL + API implementation is decoupled from the BJ service API. Does that make sense (and/or am I simply stating the obvious here) ? Jim. 
-- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.... From andreas at sdsc.edu Thu May 28 10:59:17 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 28 May 2009 07:59:17 -0700 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <4A1E839F.2050700@compbio.dundee.ac.uk> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> Message-ID: <59a41c430905280759i503719eaxd2ac64436154be2d@mail.gmail.com> makes sense and fits to the more detailed discussion we are having on biojava-dev right now. A On Thu, May 28, 2009 at 5:29 AM, James Procter wrote: > Hi All. > > Mark Schreiber wrote: >> There was a recent discussion of generated code on the list. A good solution >> for webservices may be to put the wsdl and high level api in the repository >> and let jax-ws tools autogenerate the client biolerplate code. > > Just to clarify - by high-level API you mean the interface to the code > that takes biojava objects as arguments and translates them into > parameters for the service and vice versa for the results ? > > With regard to the recent(ish) discussion about modularisation in BJ3, > I'd suggest that a module is created for each distinct WSDL containing > the glue code for connecting BJ objects with the boilerplate code. The > glue code implements factory methods to generate instances of > appropriate BJ service APIs, which are defined in a core BJ module. 
This > means that the WSDL + API implementation is decoupled from the BJ > service API. > > Does that make sense (and/or am I simply stating the obvious here) ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter ?(ENFIN/VAMSAS) ?Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 ?http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. SC015096. > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From jimp at compbio.dundee.ac.uk Thu May 28 11:14:51 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Thu, 28 May 2009 16:14:51 +0100 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <7ceb4beb0905280635g7bbb028dw9c5d69a9290b5a16@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> <7ceb4beb0905280635g7bbb028dw9c5d69a9290b5a16@mail.gmail.com> Message-ID: <4A1EAA6B.7020202@compbio.dundee.ac.uk> Hi Mark, and Scooter. Mark Schreiber wrote: > By high level API I mean anything not machine generated. That could be > classes that use biojava objects or a more humanized or smarter > interface to the boilerplate code. ah. yes. that's what I thought. I'd really advocate keeping the Biojava object adaptor glue for the autogenerated code separate from the core. This is because (as I'm sure you know) there are no real object design standards for bioinformatic services (apart from BioMOBY..) - so the adaptor code is always going to be specific to the WSDL. Scooter Willis wrote: > I agree. 
I am planning on doing some testing of a couple BLAST web > services interfaces(assuming more than one exists) and see what they > truly have in common and see how that would impact a BJ3 front end to > multiple providers. My assumption is that they will be the same. Of course, the actual functionality provided by all the servers will be more or less the same (i.e. inputs and outputs), but servers will differ in how they wrap the input and result data, how they support parameters, and how they deal with polling and result retrieval (if they are at all asynchronous). I > noticed on the NCBI Blast implementations the user was required to pass > their email address as part of the web service call. They are concerned > with abuse from external processes and they only allow one sequence per > request. Yes. The NCBI has an automatic blacklisting system - too many requests in a short space of time from a single IP will result in denial of service for X (24?) hours. I'm not sure how something like Taverna manages this kind of thing, but I expect this kind of user attribute will have to be supported in the core web service model - that could also provide authentication management support too (when users are using secure or authenticated HTTP connections). > From wikipedia the following are listed as BLAST resources where more > than one may offer a web service interface. Should BioJava3 try and > support more than one? I would pick the two or three most popular/capable/geographically widespread services to implement clients for, and then choose your favourite one to start with (EBI,NCBI, and perhaps one in Japan - e.g. http://xml.nig.ac.jp/wsdl/Blast.wsdl). Implementing clients for two or three allows users a nice choice, and should also capture most variation in the input/output and polling model that is used. If any more need to be supported, then by then it should be pretty clear how to go about implementing an API instance for a new WSDL. 
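
Since the providers differ mainly in how jobs are polled and how aggressively they rate-limit clients, a provider-neutral polling helper with a growing delay between status checks is one way to keep that logic out of the per-WSDL glue code. The following is only a sketch under that assumption: JobPoller, its Status enum, and the doubling back-off with a 60-second cap are all invented for illustration and are not part of BioJava or of any provider's API.

```java
import java.util.function.Supplier;

final class JobPoller {
    enum Status { RUNNING, DONE, FAILED }

    /**
     * Calls statusCheck until it reports something other than RUNNING or
     * maxAttempts checks have been made. The wait doubles after every check
     * (capped at 60 s) so that a slow job does not hammer the server.
     */
    static Status pollUntilFinished(Supplier<Status> statusCheck,
                                    int maxAttempts,
                                    long initialDelayMillis) {
        long delay = initialDelayMillis;
        Status status = Status.RUNNING;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            status = statusCheck.get();
            if (status != Status.RUNNING) {
                return status;
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException e) {
                    // Preserve the interrupt for the caller and stop polling.
                    Thread.currentThread().interrupt();
                    return status;
                }
                delay = Math.min(delay * 2, 60_000L);
            }
        }
        return status;
    }

    public static void main(String[] args) {
        // Fake job that finishes on the third status check.
        int[] checks = {0};
        Status result = pollUntilFinished(
                () -> ++checks[0] < 3 ? Status.RUNNING : Status.DONE, 10, 10L);
        System.out.println(result + " after " + checks[0] + " checks");
    }
}
```

Each provider module would then only supply the status check (a call into its own generated stub), while the retry policy stays in one place.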
You might also want to consider generalising the high-level interface to
non-BLAST sequence search services, since they all take and return more or
less the same kind of data and results.

I suspect this discussion should move to the biojava-dev list!

Jim.

From jp at javaclass.co.uk Thu May 28 14:54:42 2009
From: jp at javaclass.co.uk (JP)
Date: Thu, 28 May 2009 19:54:42 +0100
Subject: [Biojava-l] dN/dS ratio calculation biojava
Message-ID: <4adc29060905281154i6db465edt8c154bb30863296f@mail.gmail.com>

Hi there - is there a class or method which finds the dN/dS ratio
(nonsynonymous/synonymous rate ratio) for two sequences ?

Many Thanks
JP

From ayates at ebi.ac.uk Fri May 29 04:23:34 2009
From: ayates at ebi.ac.uk (Andy Yates)
Date: Fri, 29 May 2009 09:23:34 +0100
Subject: [Biojava-l] dN/dS ratio calculation biojava
In-Reply-To: <4adc29060905281154i6db465edt8c154bb30863296f@mail.gmail.com>
References: <4adc29060905281154i6db465edt8c154bb30863296f@mail.gmail.com>
Message-ID: <4A1F9B86.6080309@ebi.ac.uk>

Hi JP,

No, there isn't a class to calculate dN/dS. At a basic level dN/dS is an easy
ratio to calculate if you take the simple approach of comparing sequences of
exactly the same length. The basics of it are available from:

http://pubmlst.org/software/analysis/start/manual/dsdn.shtml

The best package I know of to calculate the rates though is codeml. It's a
very awkward program to run but it can be done & will produce the best
results. I can say for sure that the Ensembl Compara team use it to calculate
dN/dS rates for their homologous protein sets.

Hope that helps,

Andy

P.S. I say all of this from personal experience: I once wrote a very basic
calculator using BioJava (and I don't know where the code got to). Given the
amount of work needed to produce that code, it is simply more cost effective
to run a third-party application.

JP wrote:
> Hi there - is there a class or method which finds the dN/dS ratio
> (nonsynonymous/synonymous rate ratio) for two sequences ?
> > Many Thanks > JP > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From jogoodma at indiana.edu Tue May 12 16:40:40 2009 From: jogoodma at indiana.edu (Josh Goodman) Date: Tue, 12 May 2009 20:40:40 -0000 Subject: [Biojava-l] FASTA parsing bug ? In-Reply-To: <49F822C6.1020809@eaglegenomics.com> References: <4adc29060904280701q5d3dc760mb018f6b38a9e056f@mail.gmail.com> <49F71258.3060103@eaglegenomics.com> <4adc29060904280759q4b27a4eembd28974d46199532@mail.gmail.com> <49F71EF5.90702@eaglegenomics.com> <49F822C6.1020809@eaglegenomics.com> Message-ID: <4A09D8DC.1050307@indiana.edu> Hi all, I apologize for missing the "next day or so" window but here is my patch. I'm attaching a patch file for FastaFormat.java (1.7 tagged branch), the full source file, and a test class. It seems to work and performance is on par with the previous approach in my measurements. One problem that I couldn't quite figure out a way around is with the guessSymbolTokenization(BufferedInputStream stream) method. If the first sequence of the stream has a header length plus first line of sequence length longer than 2000 characters it will fail to reset properly. Cheers, Josh Richard Holland wrote: > I'd love to see a proper solution to this that doesn't involve upping > the read-ahead limit. I was aware that it might be the issue, but had no > idea why it was not failing for other similar long sequences. I look > forward to seeing your suggested fix! > > thanks, > Richard > > Josh Goodman wrote: >> Hi Richard and JP, >> >> I think I can be of some help as I'm the FlyBase developer responsible for >> generating these troublesome FASTA files :-). The cause of this problem >> appears to be the description line length for the record FBpp0145470. >> >> The trouble lies in org.biojavax.bio.seq.io.FastaFormat in the while loop >> at line 196. 
Biojava correctly reads in FBpp0145468 but throws an error >> when trying to parse FBpp0145469. There is nothing wrong in FBpp0145469 >> but when biojava reaches the end of the sequence it reads in the header >> for the next record (FBpp0145470). It then tries to reset the >> BufferedReader to the start of FBpp0145470 but that is where the exception >> is thrown because line 197 sets the read ahead limit to 500 characters and >> the reader.readLine() command exceeds that limit. >> >> What isn't obvious to me is why other large definition lines that precede >> that line don't throw the same error (e.g. FBpp0157909). I guess the >> javadoc on BufferedReader.mark() does say "may fail" but I assumed it >> would be more predictable than that. >> >> The file in question can be downloaded from >> ftp://ftp.flybase.net/genomes/Drosophila_grimshawi/dgri_r1.3_FB2008_07/fasta/dgri-all-translation-r1.3.fasta.gz. >> >> If there is interest in a solution that doesn't involve simply upping the >> read ahead limit I can put a patch file together in the next day or so. >> >> Cheers, >> Josh >> >> On Tue, 28 Apr 2009, Richard Holland wrote: >> >>> You're right, doesn't look like newlines. >>> >>> The "Mark invalid" happens when the parser looks too far ahead in the >>> file attempting to seek out the next valid sequence to parse. I'm not >>> sure why this is happening. >>> >>> I don't have the time to test right now but if you could post the link >>> to where someone could download the same FASTA as you're using, then it >>> would make it possible for someone else to investigate in more detail. >>> >>> thanks, >>> Richard >>> >>> JP wrote: >>>> Thanks Richard for your prompt reply. >>>> >>>> I will not attach the fasta file I am parsing (12MB) its >>>> dgri-all-translation-r1.3.fasta from the flybase project. >>>> >>>> If the file had any extra new lines I would see them when I loaded it in >>>> a text editor - no ? 
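The "Mark invalid" failure discussed in this thread can be reproduced with the JDK alone. The sketch below is not BioJava code; the class and method names (MarkInvalidDemo, canResetAfterLine, longHeaderRecord) and the buffer sizes are made up for illustration. It marks a BufferedReader with a read-ahead limit of 500 characters, reads one line longer than that, and then calls reset(). Whether reset() fails depends on whether the reader had to refill its internal buffer after the limit was passed, which is one plausible reason (an assumption, not something confirmed in this thread) why some long headers parse fine and others do not.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class MarkInvalidDemo {

    /** Builds a FASTA-like record whose header line has the given number of filler characters. */
    static String longHeaderRecord(int headerLength) {
        StringBuilder sb = new StringBuilder(">");
        for (int i = 0; i < headerLength; i++) {
            sb.append('x');
        }
        return sb.append("\nACGT\n").toString();
    }

    /** Marks the stream, reads one line, then reports whether reset() still works. */
    static boolean canResetAfterLine(String text, int bufferSize, int readAheadLimit)
            throws IOException {
        try (BufferedReader reader = new BufferedReader(new StringReader(text), bufferSize)) {
            reader.mark(readAheadLimit);
            reader.readLine();
            try {
                reader.reset();
                return true;
            } catch (IOException e) {
                // BufferedReader reports an invalidated mark here ("Mark invalid").
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        String record = longHeaderRecord(600);
        // Large buffer: the whole line is already buffered, so no refill happens.
        System.out.println(canResetAfterLine(record, 8192, 500));
        // Small buffer: the reader has to refill after passing the 500-character limit.
        System.out.println(canResetAfterLine(record, 64, 500));
        // Same small buffer, but a read-ahead limit larger than the line.
        System.out.println(canResetAfterLine(record, 64, 1000));
    }
}
```

With the JDK's reference implementation the first call succeeds because the whole line already sits in the 8192-character buffer, while the second fails because the reader must refill after more than 500 characters have been consumed since the mark. A limit at least as long as the longest expected line avoids the failure, at the cost of a larger buffer.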
>>>>
>>>> I implemented the whole thing without using Biojava (for this part)
>>>>
>>>>     fr = new FileReader(fastaProteinFileName);
>>>>     br = new BufferedReader(fr);
>>>>     String fastaLine;
>>>>     String startAccession = '>' + accessionId.trim();
>>>>     String fastaEntry = "";
>>>>     boolean record = false;
>>>>     while ((fastaLine = br.readLine()) != null) {
>>>>         fastaLine = fastaLine.trim() + '\n';
>>>>         if (fastaLine.startsWith(startAccession)) {
>>>>             record = true;
>>>>         } else if (record && fastaLine.startsWith(">")) {
>>>>             record = false;
>>>>             break;
>>>>         }
>>>>         if (record) {
>>>>             fastaEntry += fastaLine;
>>>>         }
>>>>     }
>>>>
>>>>
>>>> Notice - I do not use regex - since I'd need to read the whole file and
>>>> then regex upon it (if the record is the first one - I just read that one).
>>>>
>>>> Cheers
>>>> JP
>>>>
>>>>
>>>> On Tue, Apr 28, 2009 at 3:27 PM, Richard Holland wrote:
>>>>
>>>> The "Mark invalid" exception is indicating that the parser has gone too
>>>> far ahead in the file looking for a valid header. I'm not sure why but
>>>> looking at your original query, there may be extra newlines embedded
>>>> into your FASTA header line? That would definitely confuse it.
>>>>
>>>> The parser is not able to currently pull out just one sequence - in
>>>> effect this is a search facility, which it doesn't have. :(
>>>>
>>>> thanks,
>>>> Richard
>>>>
>>>> JP wrote:
>>>> > Hi all at BioJava,
>>>> >
>>>> > I am trying to parse several FASTA files using the following code:
>>>> >
>>>> > fr = new FileReader(fastaProteinFileName);
>>>> >> br = new BufferedReader(fr);
>>>> >>
>>>> >> RichSequenceIterator protIter = IOTools.readFastaProtein(br, null);
>>>> >> while (protIter.hasNext()) {
>>>> >>     BioEntry bioEntry = protIter.nextBioEntry();
>>>> >>     System.out.println (fastaProteinFileName + " == " + accessionId + " = " + bioEntry.getAccession());
>>>> >> }
>>>> >
>>>> >
>>>> > At particular points in my fasta file - I get the following exception:
>>>> >
>>>> > 14:53:42,546 ERROR FastaFileProcessing - File parsing exception (from
>>>> >> biojava library)
>>>> >> org.biojava.bio.BioException: Could not read sequence
>>>> >>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>>>> >>     at org.biojavax.bio.seq.io.RichStreamReader.nextBioEntry(RichStreamReader.java:99)
>>>> >>     at edu.imperial.msc.orthologue.fasta.FastaFileProcessing.getProteinSequenceFromFASTAFile(FastaFileProcessing.java:60)
>>>> >>     at edu.imperial.msc.orthologue.core.OrthologueFinder.getFASTAEntries(OrthologueFinder.java:64)
>>>> >>     at edu.imperial.msc.orthologue.core.OrthologueFinder.<init>(OrthologueFinder.java:51)
>>>> >>     at edu.imperial.msc.orthologue.launcher.OrthologueFinderLauncher.main(OrthologueFinderLauncher.java:60)
>>>> >> Caused by: java.io.IOException: Mark invalid
>>>> >>     at java.io.BufferedReader.reset(Unknown Source)
>>>> >>     at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:202)
>>>> >>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>>> >>     ... 5 more
>>>> >
>>>> >
>>>> > Interestingly if I delete the header portion of the header line (from
>>>> > type=protein... till the end of the line ...Dgri;)
>>>> >
>>>> >> FBpp0145468 type=protein;
>>>> >> loc=scaffold_15252:join(13219687..13219727,13219972..13220279,13220507..13220798,13220861..13221180,13221286..13221467,13222258..13222629,13226331..13226463,13226531..13226658);
>>>> >> ID=FBpp0145468; name=Dgri\GH11562-PA; parent=FBgn0119042,FBtr0146976;
>>>> >> dbxref=FlyBase:FBpp0145468,FlyBase_Annotation_IDs:GH11562-PA;
>>>> >> MD5=c8dc38c7197a0d3c93c78b08059e2604; length=591; release=r1.3;
>>>> >> species=Dgri;
>>>> >>
>>>> >
>>>> > It works - but I have a number of these exceptions (and I do not want to
>>>> > edit the original data). Mind you I have longer headers in my file which
>>>> > are parsed OK (strange!).
>>>> >
>>>> > Any ideas anyone ? Alternatively - is there a better way how to get ONE
>>>> > SINGLE sequence from the whole fasta file give that I have the accession id
>>>> > (FBpp0145468) ?
>>>> >
>>>> > Many Thanks
>>>> > JP
>>>> > _______________________________________________
>>>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>> >
>>>>
>>>> --
>>>> Richard Holland, BSc MBCS
>>>> Finance Director, Eagle Genomics Ltd
>>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>>>
>>>> http://www.eaglegenomics.com/
>>>>
>>>>
>>> --
>>> Richard Holland, BSc MBCS
>>> Finance Director, Eagle Genomics Ltd
>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
>>> http://www.eaglegenomics.com/
>>> _______________________________________________
>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FastaFormat.patch
Type: text/x-patch
Size: 3402 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: FastaFormat.java Type: text/x-java Size: 11394 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: FastaFormatTest.java Type: text/x-java Size: 3517 bytes Desc: not available URL: From harryzs1981 at gmail.com Wed May 6 13:13:42 2009 From: harryzs1981 at gmail.com (sheng zhao) Date: Wed, 6 May 2009 15:13:42 +0200 Subject: [Biojava-l] Biojava-doc in chm forma Message-ID: <3d23b1eb0905060613m643adf87sdef55a05a083dd51@mail.gmail.com> Hi Where can I find Biojava-doc in chm format?? Thanks ! harry From andreas at sdsc.edu Thu May 7 12:59:32 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 7 May 2009 05:59:32 -0700 Subject: [Biojava-l] SVDSuperimposer In-Reply-To: <4A02D2E7.9070705@wp.pl> References: <4A02D2E7.9070705@wp.pl> Message-ID: <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com> Hi Michal, The SVD superimposer actually does the same Singular Value Decomposition in both BioJava and BioPython. The difference is that BioJava also contains a complete alignment algorithm that can calculate a structure alignment, if the region of similarity between 2 proteins is not known a priori. See also here: http://biojava.org/wiki/BioJava:CookBook:PDB:align and http://biojava.org/wiki/BioJava:CookBook:PDB:aboutalign Andreas On Thu, May 7, 2009 at 5:24 AM, Michal Lorenc wrote: > Hello Andreas, > SVDSuperimposer from BioPython works only with proteins which have the same > length, but in BioJava it does not matter. > > Do you published how SVDSuperimposer in BioJava works or do you know where > could I found more information? > > Thank you in advance. 
> > Best regards, > > Michal > From raphael.andre.bauer at gmail.com Fri May 8 13:13:44 2009 From: raphael.andre.bauer at gmail.com (=?UTF-8?Q?Raphael_Andr=C3=A9_Bauer?=) Date: Fri, 8 May 2009 15:13:44 +0200 Subject: [Biojava-l] SVDSuperimposer In-Reply-To: <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com> References: <4A02D2E7.9070705@wp.pl> <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com> Message-ID: <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com> On Thu, May 7, 2009 at 2:59 PM, Andreas Prlic wrote: > Hi Michal, > > The SVD superimposer actually does the same Singular Value > Decomposition in both BioJava and BioPython. The difference is that > BioJava also contains a complete alignment algorithm that can > calculate a structure alignment, if the region of similarity between 2 > proteins is not known a priori. > > See also here: > http://biojava.org/wiki/BioJava:CookBook:PDB:align ?and > http://biojava.org/wiki/BioJava:CookBook:PDB:aboutalign Maybe it is interesting for you to know that we developed a generic algorithm for the structural alignment of RNA and proteins called "LaJolla". The reference implementation is built using BioJava and released as open source at http://lajolla.sourceforge.net. I think LaJolla is straight forward to use and modify if you know how to use Maven and the BioJava core classes. Btw. The algorithm for the alignment of proteins used by BioJava yielded sometimes strange results - but I did not do an exhaustive evaluation. But maybe LaJolla is an alternative in those cases. Additionally -- if the BioJava project is interested in integrating LaJolla into BioJava this is really not a problem -- but I do not know if it blurs the clean toolkit/library character of BioJava. We are open for all suggestions regarding the algorithm and also regarding a cooperation with BioJava... Thanks, Raphael @Michal: LaJolla is also partly made in Poland -- so mabye you some people by chance.. 
> > Andreas > > On Thu, May 7, 2009 at 5:24 AM, Michal Lorenc wrote: >> Hello Andreas, >> SVDSuperimposer from BioPython works only with proteins which have the same >> length, but in BioJava it does not matter. >> >> Do you published how SVDSuperimposer in BioJava works or do you know where >> could I found more information? >> >> Thank you in advance. >> >> Best regards, >> >> Michal >> > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Fri May 8 17:37:41 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 8 May 2009 10:37:41 -0700 Subject: [Biojava-l] SVDSuperimposer In-Reply-To: <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com> References: <4A02D2E7.9070705@wp.pl> <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com> <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com> Message-ID: <59a41c430905081037n5b2ce541nc132dbfb2ea49b0a@mail.gmail.com> Hi Raphael, Thanks for posting the link to "LaJolla" (funny that you chose this name, it is the part of the town where my university is located...). This is very interesting and exactly the kind of applications we are happy to see BioJava being used for. Do you want to provide a link back to your project site at http://biojava.org/wiki/BioJava:BioJavaInside and also add your paper to the list of papers that are using BioJava? About the BioJava algorithm : If you have an example where the BioJava structure comparison does not give the result you would expect, could you send it to me so I can take a look ? Thanks, Andreas On Fri, May 8, 2009 at 6:13 AM, Raphael Andr? Bauer wrote: > On Thu, May 7, 2009 at 2:59 PM, Andreas Prlic wrote: >> Hi Michal, >> >> The SVD superimposer actually does the same Singular Value >> Decomposition in both BioJava and BioPython. 
The difference is that >> BioJava also contains a complete alignment algorithm that can >> calculate a structure alignment, if the region of similarity between 2 >> proteins is not known a priori. >> >> See also here: >> http://biojava.org/wiki/BioJava:CookBook:PDB:align ?and >> http://biojava.org/wiki/BioJava:CookBook:PDB:aboutalign > > Maybe it is interesting for you to know that we developed a generic > algorithm for the structural alignment of RNA and proteins called > "LaJolla". The reference implementation is built using BioJava and > released as open source at http://lajolla.sourceforge.net. I think > LaJolla is straight forward to use and modify if you know how to use > Maven and the BioJava core classes. > > Btw. The algorithm for the alignment of proteins used by BioJava > yielded sometimes strange results - but I did not do an exhaustive > evaluation. But maybe LaJolla is an alternative in those cases. > > Additionally -- if the BioJava project is interested in integrating > LaJolla into BioJava this is really not a problem -- but I do not know > if it blurs the clean toolkit/library character of BioJava. > > We are open for all suggestions regarding the algorithm and also > regarding a cooperation with BioJava... > > Thanks, > > Raphael > @Michal: LaJolla is also partly made in Poland -- so mabye you some > people by chance.. > >> >> Andreas >> >> On Thu, May 7, 2009 at 5:24 AM, Michal Lorenc wrote: >>> Hello Andreas, >>> SVDSuperimposer from BioPython works only with proteins which have the same >>> length, but in BioJava it does not matter. >>> >>> Do you published how SVDSuperimposer in BioJava works or do you know where >>> could I found more information? >>> >>> Thank you in advance. 
>>> >>> Best regards, >>> >>> Michal >>> >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Mon May 11 04:30:30 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sun, 10 May 2009 21:30:30 -0700 Subject: [Biojava-l] Plans for next biojava release - modularization Message-ID: <59a41c430905102130v6e878c6bgda582268e02824fe@mail.gmail.com> Hi, In case you are not subscribed to biojava-dev: we are having a discussion about how to organize the code base for the next release of biojava on the biojava-dev mailing list. If you want to follow this discussion, but are not subscribed yet, you can do so here: http://www.biojava.org/mailman/listinfo/biojava-dev Andreas From raphael.andre.bauer at gmail.com Mon May 11 09:10:48 2009 From: raphael.andre.bauer at gmail.com (=?UTF-8?Q?Raphael_Andr=C3=A9_Bauer?=) Date: Mon, 11 May 2009 11:10:48 +0200 Subject: [Biojava-l] SVDSuperimposer In-Reply-To: <59a41c430905081037n5b2ce541nc132dbfb2ea49b0a@mail.gmail.com> References: <4A02D2E7.9070705@wp.pl> <59a41c430905070559v238e0df2p28478d5108dae4a@mail.gmail.com> <9b46aa30905080613o2ea91751sdd086c5d880cc435@mail.gmail.com> <59a41c430905081037n5b2ce541nc132dbfb2ea49b0a@mail.gmail.com> Message-ID: <9b46aa30905110210j5799359s6d72f50dccb3c74f@mail.gmail.com> On Fri, May 8, 2009 at 7:37 PM, Andreas Prlic wrote: > Hi Raphael, > > Thanks for posting the link to "LaJolla" (funny that you chose this > name, it is the part of the town where my university is located...). > This is very interesting and exactly the kind of applications we are > happy to see BioJava being used for. Well. Phil Bourne and the PDB are there (and I think you are also somehow connected to both)... 
I also heard that LaJolla might have nice beaches, sunny weather and the surf?... So LaJolla definitely is a good name for a boring structural RNA and protein alignment algorithm... > > Do you want to provide a link back to your project site at > http://biojava.org/wiki/BioJava:BioJavaInside and also add your paper > to the list of papers that are using BioJava? done > > About the ?BioJava algorithm : If you have an example where the > BioJava structure comparison does not give the result you would > expect, could you send it to me so I can take a look ? I did not write down the examples (I thought, that the protein alignment of BioJava is somehow alpha because of "personal communication of..."). But the examples where simple. If I rediscover them I will let you know. Maybe it is also of help to mention in the source code or on the webpage of the cookbook: "If our protein alignment alg delivers strange results please let us now..." or so.. So that users are aware of submitting strange results... Thanks, Raphael > > Thanks, > Andreas > > > On Fri, May 8, 2009 at 6:13 AM, Raphael Andr? Bauer > wrote: >> On Thu, May 7, 2009 at 2:59 PM, Andreas Prlic wrote: >>> Hi Michal, >>> >>> The SVD superimposer actually does the same Singular Value >>> Decomposition in both BioJava and BioPython. The difference is that >>> BioJava also contains a complete alignment algorithm that can >>> calculate a structure alignment, if the region of similarity between 2 >>> proteins is not known a priori. >>> >>> See also here: >>> http://biojava.org/wiki/BioJava:CookBook:PDB:align ?and >>> http://biojava.org/wiki/BioJava:CookBook:PDB:aboutalign >> >> Maybe it is interesting for you to know that we developed a generic >> algorithm for the structural alignment of RNA and proteins called >> "LaJolla". The reference implementation is built using BioJava and >> released as open source at http://lajolla.sourceforge.net. 
I think >> LaJolla is straight forward to use and modify if you know how to use >> Maven and the BioJava core classes. >> >> Btw. The algorithm for the alignment of proteins used by BioJava >> yielded sometimes strange results - but I did not do an exhaustive >> evaluation. But maybe LaJolla is an alternative in those cases. >> >> Additionally -- if the BioJava project is interested in integrating >> LaJolla into BioJava this is really not a problem -- but I do not know >> if it blurs the clean toolkit/library character of BioJava. >> >> We are open for all suggestions regarding the algorithm and also >> regarding a cooperation with BioJava... >> >> Thanks, >> >> Raphael >> @Michal: LaJolla is also partly made in Poland -- so mabye you some >> people by chance.. >> >>> >>> Andreas >>> >>> On Thu, May 7, 2009 at 5:24 AM, Michal Lorenc wrote: >>>> Hello Andreas, >>>> SVDSuperimposer from BioPython works only with proteins which have the same >>>> length, but in BioJava it does not matter. >>>> >>>> Do you published how SVDSuperimposer in BioJava works or do you know where >>>> could I found more information? >>>> >>>> Thank you in advance. >>>> >>>> Best regards, >>>> >>>> Michal >>>> >>> _______________________________________________ >>> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > From jp at javaclass.co.uk Mon May 11 10:05:43 2009 From: jp at javaclass.co.uk (JP) Date: Mon, 11 May 2009 11:05:43 +0100 Subject: [Biojava-l] FASTA header, loc attribute question Message-ID: <4adc29060905110305l392a9792p2aa40590e97c9891@mail.gmail.com> Hi there at Biojava, I have two FASTA files - one containing amino acid sequences and the other containing dna sequences. 
In the AA FASTA file I have something like : >FBpp0077713 type=protein; loc=2L:join(384551..384894,385701..385746,386308..386576,386703..387270); ID=FBpp0077713; name=al-PA; parent=FBgn0000061,FBtr0078053; dbxref=FlyBase:FBpp0077713,GB_protein:AAF51505.1,GB_protein:AAF51505,FlyBase_Annotation_IDs:CG3935-PA,REFSEQ:NP_722629; MD5=64a866db3e2913b97a2158c2de9d02f6; length=408; release=r5.9; species=Dmel; MGISEEIKLEELPQEAKLAHPDAVVLVDRAPGSSAASAGAALTVSMSVSG GAPSGASGASGGTNSPVSDGNSDCEADEYAPKRKQRRYRTTFTSFQLEEL... etc etc etc I would like to parse this header line in particular the loc attribute and extract it from the entry in the DNA FASTA file (so I get the genomic data for the protein) >FBgn0000061 type=gene; loc=2L:378116..387439; ID=FBgn0000061; name=al; dbxref=FlyBase:FBgn0000061,FlyBase:FBan0003935,FlyBase_Annotation_IDs:CG3935,GB:AE003589,GB_protein:AAF51505,GB:AY121696,GB_protein:AAM52023,GB:BI485174,GB:CZ486795,GB:L08401,GB_protein:AAA28840,UniProt/Swiss-Prot:Q06453,INTERPRO:IPR000047,INTERPRO:IPR001356,INTERPRO:IPR003654,INTERPRO:IPR009057,INTERPRO:IPR012287,bdgpinsituexpr:al,dedb:5830,drsc:FBgn0000061,flight:FBgn0000061,flyatlas:FBgn0000061,flyexpress:FBgn0000061,flygrid:59464,flymine:FBgn0000061,geo:FBgn0000061,hdri:FBgn0000061,if:/gene/aristal.htm,orthologs:ensANOGA:ENSANGP00000011877,orthologs:ensBOSTA:ENSBTAP00000015907,orthologs:ensCANFA:ENSCAFP00000009888,orthologs:ensGALGA:ENSGALP00000005255,orthologs:ensHOMSA:ENSP00000298420,orthologs:ensMACMU:ENSMMUP00000007349,orthologs:ensMONDO:ENSMODP00000008388,orthologs:ensPANTR:ENSPTRP00000004281,orthologs:ensRATNO:ENSRNOP00000027186,orthologs:ensTETNI:GSTENP00015517001,orthologs:graORYSA:Q6YYB8,orthologs:graORYSA:Q8W0T5,orthologs:modCAEEL:WBGene00044330,orthologs:modDANRE:ZDB-GENE-990415-15,orthologs:modMUSMU:MGI:1097716,panther:FBgn0000061; cyto_range=21C1-21C1; gbunit=AE014134; MD5=0f5568cf13aeb2c7076f11b1ce3d6b2f; length=9324; release=r5.9; species=Dmel; GTAGTTTGCTGCCGGCTCTGGAACAGCCCGGTCATCTCGTCGCGTTCGGT 
TCCGATTCCGATTCGAATAGTCGAGCTGGGGATACATTGTTGTTTCCGGG etc etc etc I understand this is not exactly conventional, but does biojava support the parsing of the loc attribute ? (join, complement etc.) Many Thanks JP From holland at eaglegenomics.com Mon May 11 10:27:23 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 11 May 2009 10:27:23 +0000 Subject: [Biojava-l] FASTA header, loc attribute question In-Reply-To: <4adc29060905110305l392a9792p2aa40590e97c9891@mail.gmail.com> References: <4adc29060905110305l392a9792p2aa40590e97c9891@mail.gmail.com> Message-ID: <1242037643.15828.12.camel@buzzybee> Short answer - no, not directly. Longer answer - if you can write some code to snip out the Loc string from the FASTA description line then there is existing code which can convert the snipped Loc string into a RichLocation, which you can then apply to the parsed FASTA sequence in order to extract the required location. The Loc string parser is GenbankLocationParser, part of the biojavax packages. This assumes that the Loc string conforms to Genbank format location definitions. cheers, Richard On Mon, 2009-05-11 at 11:05 +0100, JP wrote: > Hi there at Biojava, > > I have two FASTA files - one containing amino acid sequences and the other > containing dna sequences. > > In the AA FASTA file I have something like : > > >FBpp0077713 type=protein; > loc=2L:join(384551..384894,385701..385746,386308..386576,386703..387270); > ID=FBpp0077713; name=al-PA; parent=FBgn0000061,FBtr0078053; > dbxref=FlyBase:FBpp0077713,GB_protein:AAF51505.1,GB_protein:AAF51505,FlyBase_Annotation_IDs:CG3935-PA,REFSEQ:NP_722629; > MD5=64a866db3e2913b97a2158c2de9d02f6; length=408; release=r5.9; > species=Dmel; > MGISEEIKLEELPQEAKLAHPDAVVLVDRAPGSSAASAGAALTVSMSVSG > GAPSGASGASGGTNSPVSDGNSDCEADEYAPKRKQRRYRTTFTSFQLEEL... 
> etc etc etc > > I would like to parse this header line in particular the loc attribute and > extract it from the entry in the DNA FASTA file (so I get the genomic data > for the protein) > > >FBgn0000061 type=gene; loc=2L:378116..387439; ID=FBgn0000061; name=al; > dbxref=FlyBase:FBgn0000061,FlyBase:FBan0003935,FlyBase_Annotation_IDs:CG3935,GB:AE003589,GB_protein:AAF51505,GB:AY121696,GB_protein:AAM52023,GB:BI485174,GB:CZ486795,GB:L08401,GB_protein:AAA28840,UniProt/Swiss-Prot:Q06453,INTERPRO:IPR000047,INTERPRO:IPR001356,INTERPRO:IPR003654,INTERPRO:IPR009057,INTERPRO:IPR012287,bdgpinsituexpr:al,dedb:5830,drsc:FBgn0000061,flight:FBgn0000061,flyatlas:FBgn0000061,flyexpress:FBgn0000061,flygrid:59464,flymine:FBgn0000061,geo:FBgn0000061,hdri:FBgn0000061,if:/gene/aristal.htm,orthologs:ensANOGA:ENSANGP00000011877,orthologs:ensBOSTA:ENSBTAP00000015907,orthologs:ensCANFA:ENSCAFP00000009888,orthologs:ensGALGA:ENSGALP00000005255,orthologs:ensHOMSA:ENSP00000298420,orthologs:ensMACMU:ENSMMUP00000007349,orthologs:ensMONDO:ENSMODP00000008388,orthologs:ensPANTR:ENSPTRP00000004281,orthologs:ensRATNO:ENSRNOP00000027186,orthologs:ensTETNI:GSTENP00015517001,orthologs:graORYSA:Q6YYB8,orthologs:graORYSA:Q8W0T5,orthologs:modCAEEL:WBGene00044330,orthologs:modDA! > NRE:ZDB-GENE-990415-15,orthologs:modMUSMU:MGI:1097716,panther:FBgn0000061; > cyto_range=21C1-21C1; gbunit=AE014134; MD5=0f5568cf13aeb2c7076f11b1ce3d6b2f; > length=9324; release=r5.9; species=Dmel; > GTAGTTTGCTGCCGGCTCTGGAACAGCCCGGTCATCTCGTCGCGTTCGGT > TCCGATTCCGATTCGAATAGTCGAGCTGGGGATACATTGTTGTTTCCGGG > etc etc etc > > I understand this is not exactly conventional, but does biojava support the > parsing of the loc attribute ? (join, complement etc.) 
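The first step suggested above, snipping the loc string out of the FASTA description line, needs nothing beyond the JDK. The following is a sketch under assumptions: it relies only on the "key=value;" layout visible in the FlyBase headers quoted in this thread, the class and method names (FlyBaseHeader, parseAttributes, splitLoc) and the "accession" key are invented for the example, and it deliberately stops before handing the location text to BioJava's Genbank location parser, whose exact call is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FlyBaseHeader {

    /** Splits a header such as ">FBpp0077713 type=protein; loc=...; ID=...;" into its attributes. */
    static Map<String, String> parseAttributes(String headerLine) {
        Map<String, String> attrs = new LinkedHashMap<>();
        String line = headerLine.startsWith(">") ? headerLine.substring(1) : headerLine;
        line = line.trim();
        int firstSpace = line.indexOf(' ');
        if (firstSpace < 0) {
            attrs.put("accession", line); // header without any attributes
            return attrs;
        }
        attrs.put("accession", line.substring(0, firstSpace));
        // Attributes are separated by ';' and each one is "key=value".
        for (String field : line.substring(firstSpace + 1).split(";")) {
            int eq = field.indexOf('=');
            if (eq > 0) {
                attrs.put(field.substring(0, eq).trim(), field.substring(eq + 1).trim());
            }
        }
        return attrs;
    }

    /** Splits "2L:join(1..5,9..12)" into {"2L", "join(1..5,9..12)"}; no colon means an empty contig name. */
    static String[] splitLoc(String loc) {
        int colon = loc.indexOf(':');
        if (colon < 0) {
            return new String[] {"", loc};
        }
        return new String[] {loc.substring(0, colon), loc.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String header = ">FBpp0077713 type=protein; "
            + "loc=2L:join(384551..384894,385701..385746,386308..386576,386703..387270); "
            + "ID=FBpp0077713; name=al-PA;";
        String[] loc = splitLoc(parseAttributes(header).get("loc"));
        System.out.println("contig:   " + loc[0]);
        System.out.println("location: " + loc[1]);
    }
}
```

The second element returned by splitLoc is the Genbank-style location text (join, complement and so on) that the reply above proposes converting into a RichLocation.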
>
> Many Thanks
> JP
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From thomas.covello at gmail.com Mon May 11 20:33:43 2009
From: thomas.covello at gmail.com (Thomas Covello)
Date: Mon, 11 May 2009 14:33:43 -0600
Subject: [Biojava-l] NullPointerException when using Viterbi algorithm in HMM.
Message-ID: <2e367d7b0905111333o2d074f81if9a34545acb95b5b@mail.gmail.com>

Hello,

I am trying to use a Hidden Markov Model to classify a sequence, and I am
encountering a NullPointerException when I try to use the Viterbi algorithm.
My test case follows - if you want the code I am actually using I can post it.

The stack trace:

java.lang.NullPointerException
        at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:648)
        at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:512)
        at oxytricha_crap.Main.main(Main.java:69)

File: Main.java

package oxytricha_crap;

import org.biojava.bio.seq.*;
import org.biojava.bio.*;
import org.biojava.bio.symbol.*;
import org.biojava.bio.dp.*;
import org.biojava.bio.dist.*;
import java.util.*;
import org.biojava.bio.symbol.SymbolList;

public class Main {

    private static Alphabet DNA = DNATools.getDNA();
    private static State IES;
    private static State MDS;
    private static MarkovModel hmm = new SimpleMarkovModel(1, DNA, "MyHMM");

    static {
        try {
            Distribution IES_dist = DistributionFactory.DEFAULT.createDistribution(DNA);
            Distribution MDS_dist = DistributionFactory.DEFAULT.createDistribution(DNA);
            IES = new SimpleEmissionState("IES", Annotation.EMPTY_ANNOTATION, new int [] {1}, IES_dist);
            MDS = new SimpleEmissionState("MDS", Annotation.EMPTY_ANNOTATION, new int [] {1}, MDS_dist);
            hmm.addState(IES);
            hmm.addState(MDS);
            hmm.createTransition(MDS, hmm.magicalState());
            hmm.createTransition(hmm.magicalState(), MDS);
            hmm.createTransition(MDS, IES);
            hmm.createTransition(IES, MDS);
            hmm.createTransition(MDS, MDS);
            hmm.createTransition(IES, IES);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        try {
            State currState;
            State lastState = hmm.magicalState();
            Iterator<State> states = Arrays.asList(new State [] {
                MDS, MDS, MDS, IES, IES, IES, MDS, MDS, IES, IES, MDS, MDS, MDS, MDS
            }).iterator();
            SymbolList sl = DNATools.createDNA("acgtgtctgagaga");
            HMMTrainer trainer = new SimpleHMMTrainer(hmm);
            trainer.startCycle();
            for (Symbol sym : (List<Symbol>)sl.toList()) {
                currState = states.next();
                if (currState != lastState && lastState != null)
                    trainer.recordTransition(lastState, currState, 1.0);
                trainer.recordEmittedSymbol(currState, sym, 1.0);
                lastState = currState;
            }
            trainer.recordTransition(MDS, hmm.magicalState(), 1.0);
            trainer.completeCycle();
            DP dp = DPFactory.DEFAULT.createDP(hmm);
            SymbolList unknown = DNATools.createDNA("acgtcgtacgtacgtacacaacga");
            StatePath spath = dp.viterbi(new SymbolList [] { unknown }, ScoreType.PROBABILITY);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.exit(0);
    }
}

From mauleemacx at gmail.com Wed May 13 11:21:57 2009
From: mauleemacx at gmail.com (mac LEE)
Date: Wed, 13 May 2009 19:21:57 +0800
Subject: [Biojava-l] biojava on Maven2 repository
Message-ID:

Dear All,
Hello. Is it possible to place the biojava jar into the maven 2 repository?

regards,

jack Lee

--
Somewhere between heaven and hell
mauleemacx at gmail.com

From raphael.andre.bauer at gmail.com Wed May 13 13:12:11 2009
From: raphael.andre.bauer at gmail.com (=?UTF-8?Q?Raphael_Andr=C3=A9_Bauer?=)
Date: Wed, 13 May 2009 15:12:11 +0200
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To:
References:
Message-ID: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com>

On Wed, May 13, 2009 at 1:21 PM, mac LEE wrote:
> Dear All,
> Hello. Is it possible to place the biojava jar into the maven 2 repository?

You have to install the biojava.jar manually into your local repository:
http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html

LaJolla is an application that uses that approach too... Maybe the
pom.xml of LaJolla is worth a look to see how it works...:
http://lajolla.svn.sourceforge.net/viewvc/lajolla/trunk/lajolla/pom.xml?revision=350&view=markup

Regards,

Raphael

>
> regards,
>
> jack Lee
>
> --
> Somewhere between heaven and hell
> mauleemacx at gmail.com
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From helpmedicine.savelife at gmail.com Wed May 13 14:04:18 2009
From: helpmedicine.savelife at gmail.com (helpmedicine savelife)
Date: Wed, 13 May 2009 10:04:18 -0400
Subject: [Biojava-l] biojava on Maven2 repository
In-Reply-To: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com>
References: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com>
Message-ID: <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com>

I agree it is possible by setting up in local repository (I did the same for
my project I am currently working), but now that biojava is being used by
many, I guess it it is a good time to to make it accessible through maven2
by providing the project metadata information to Maven2.

Biojava Team - Is this possible?

Thanks,
Sandya

On Wed, May 13, 2009 at 9:12 AM, Raphael André Bauer <
raphael.andre.bauer at gmail.com> wrote:

> On Wed, May 13, 2009 at 1:21 PM, mac LEE wrote:
> > Dear All,
> > Hello. Is it possible to place the biojava jar into the maven 2
> repository?
>
> You have to install the biojava.jar manually into your local repository:
> http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html
>
> LaJolla is an application that uses that approach too...
Maybe the > pom.xml of LaJolla is worth a look to see how it works...: > > http://lajolla.svn.sourceforge.net/viewvc/lajolla/trunk/lajolla/pom.xml?revision=350&view=markup > > Regards, > > Raphael > > > > > regards, > > > > jack Lee > > > > -- > > Somewhere between heaven and hell > > mauleemacx at gmail.com > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From mauleemacx at gmail.com Thu May 14 05:50:51 2009 From: mauleemacx at gmail.com (mac LEE) Date: Thu, 14 May 2009 13:50:51 +0800 Subject: [Biojava-l] biojava on Maven2 repository In-Reply-To: <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com> References: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com> <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com> Message-ID: Dear All, I think putting Biojava on central Maven2 repository not only facilitate current users. It also attracts new users as well, just think of how success is the BioPerl project resided in CPAN is. I strongly hope that there will be central repository soon. regards, jack LEE On Wed, May 13, 2009 at 10:04 PM, helpmedicine savelife < helpmedicine.savelife at gmail.com> wrote: > I agree it is possible by setting up in local repository (I did the same > for > my project I am currently working), but now that biojava is being used by > many, I guess it it is a good time to to make it accessible through maven2 > by providing the project metadata information to Maven2. > > Biojava Team - Is this possible? > > Thanks, > Sandya > > On Wed, May 13, 2009 at 9:12 AM, Raphael Andr? Bauer < > raphael.andre.bauer at gmail.com> wrote: > > > On Wed, May 13, 2009 at 1:21 PM, mac LEE wrote: > > > Dear All, > > > Hello. 
Is it possible to place the biojava jar into the maven 2 > > repository? > > > > You have to install the biojava.jar manually into your local repository: > > http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html > > > > LaJolla is an application that uses that approach too... Maybe the > > pom.xml of LaJolla is worth a look to see how it works...: > > > > > http://lajolla.svn.sourceforge.net/viewvc/lajolla/trunk/lajolla/pom.xml?revision=350&view=markup > > > > Regards, > > > > Raphael > > > > > > > > regards, > > > > > > jack Lee > > > > > > -- > > > Somewhere between heaven and hell > > > mauleemacx at gmail.com > > > _______________________________________________ > > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Somewhere between heaven and hell mauleemacx at gmail.com From andreas at sdsc.edu Thu May 14 13:38:08 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 14 May 2009 06:38:08 -0700 Subject: [Biojava-l] biojava on Maven2 repository In-Reply-To: <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com> References: <9b46aa30905130612g194a022eu4fdb30114773b70e@mail.gmail.com> <3d0f70fd0905130704h57eb09b6k33bf067b6f729768@mail.gmail.com> Message-ID: <59a41c430905140638r74554c10wb9c97dddd8f05b6b@mail.gmail.com> I will have a look at this, once we have mavenized the biojava build process. In the meanwhile, perhaps somebody could post instructions how to add biojava to a local maven repository on the wiki? 
Andreas On Wed, May 13, 2009 at 7:04 AM, helpmedicine savelife wrote: > I agree it is possible by setting up in local repository (I did the same for > my project I am currently working), but now that biojava is being used by > many, I guess it it is a good time to to make it accessible through maven2 > by providing the project metadata information to Maven2. > > Biojava Team - Is this possible? > > Thanks, > Sandya > > On Wed, May 13, 2009 at 9:12 AM, Raphael Andr? Bauer < > raphael.andre.bauer at gmail.com> wrote: > >> On Wed, May 13, 2009 at 1:21 PM, mac LEE wrote: >> > Dear All, >> > Hello. Is it possible to place the biojava jar into the maven 2 >> repository? >> >> You have to install the biojava.jar manually into your local repository: >> http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html >> >> LaJolla is an application that uses that approach too... Maybe the >> pom.xml of LaJolla is worth a look to see how it works...: >> >> http://lajolla.svn.sourceforge.net/viewvc/lajolla/trunk/lajolla/pom.xml?revision=350&view=markup >> >> Regards, >> >> Raphael >> >> > >> > regards, >> > >> > jack Lee >> > >> > -- >> > Somewhere between heaven and hell >> > mauleemacx at gmail.com >> > _______________________________________________ >> > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> > >> >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > _______________________________________________ > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From niall at sgenomics.org Thu May 14 13:55:31 2009 From: niall at sgenomics.org (Niall Haslam) Date: Thu, 14 May 2009 15:55:31 +0200 Subject: [Biojava-l] Webservices clients in bio-java Message-ID: <200905141555.31967.niall@sgenomics.org> Hi all, I was 
wondering if there was any interest in creating a set of maintained clients for popular webservices. Depending on how it goes and how feasible it is these could be included in biojava. I assume there are plenty of people writing and rewriting clients to these things. In the first instance we could just set up a repository for example code on using the servers. I'm sure this would be a useful resource for java programmers. Unless someone wants to point me towards a pre-existing resource! Niall. From heuermh at acm.org Thu May 14 16:34:53 2009 From: heuermh at acm.org (Michael Heuer) Date: Thu, 14 May 2009 12:34:53 -0400 (EDT) Subject: [Biojava-l] biojava on Maven2 repository In-Reply-To: <59a41c430905140638r74554c10wb9c97dddd8f05b6b@mail.gmail.com> Message-ID: Hello Andreas, The biojava jars can be manually uploaded to the central repository. I will do this for the 1.7 release. Once the build migrates to maven, we can set up our own repository that is mirrored to the central repository. This is required if we have a multi-module build, for example. There may be a maven repo hosted at open-bio.org already, I think the biomoby folks use one. michael On Thu, 14 May 2009, Andreas Prlic wrote: > I will have a look at this, once we have mavenized the biojava build > process. In the meanwhile, perhaps somebody could post instructions > how to add biojava to a local maven repository on the wiki? > > Andreas > > > > On Wed, May 13, 2009 at 7:04 AM, helpmedicine savelife > wrote: > > I agree it is possible by setting up in local repository (I did the same for > > my project I am currently working), but now that biojava is being used by > > many, I guess it it is a good time to to make it accessible through maven2 > > by providing the project metadata information to Maven2. > > > > Biojava Team - Is this possible? > > > > Thanks, > > Sandya > > > > On Wed, May 13, 2009 at 9:12 AM, Raphael Andr? 
Bauer < > > raphael.andre.bauer at gmail.com> wrote: > > > >> On Wed, May 13, 2009 at 1:21 PM, mac LEE wrote: > >> > Dear All, > >> > Hello. Is it possible to place the biojava jar into the maven 2 > >> repository? > >> > >> You have to install the biojava.jar manually into your local repository: > >> http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html > >> > >> LaJolla is an application that uses that approach too... Maybe the > >> pom.xml of LaJolla is worth a look to see how it works...: > >> > >> http://lajolla.svn.sourceforge.net/viewvc/lajolla/trunk/lajolla/pom.xml?revision=350&view=markup > >> > >> Regards, > >> > >> Raphael > >> > >> > > >> > regards, > >> > > >> > jack Lee > >> > > >> > -- > >> > Somewhere between heaven and hell > >> > mauleemacx at gmail.com > >> > _______________________________________________ > >> > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > >> > http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > > >> > >> _______________________________________________ > >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > > > > _______________________________________________ > > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From andreas at sdsc.edu Fri May 15 03:27:05 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 14 May 2009 20:27:05 -0700 Subject: [Biojava-l] biojava on Maven2 repository In-Reply-To: References: <59a41c430905140638r74554c10wb9c97dddd8f05b6b@mail.gmail.com> Message-ID: <59a41c430905142027g2de44f57s6ef0c48f8c877c17@mail.gmail.com> That would be helpful, thanks, Andreas On Thu, May 14, 2009 at 9:34 AM, Michael Heuer wrote: > Hello Andreas, > > The biojava jars can be 
manually uploaded to the central repository. ?I > will do this for the 1.7 release. > > Once the build migrates to maven, we can set up our own repository that > is mirrored to the central repository. ? This is required if we have a > multi-module build, for example. ?There may be a maven repo hosted at > open-bio.org already, I think the biomoby folks use one. > > ? michael > > > On Thu, 14 May 2009, Andreas Prlic wrote: > >> I will have a look at this, once we have mavenized the biojava build >> process. In the meanwhile, perhaps somebody could post instructions >> how to add biojava to a local maven repository on the wiki? >> >> Andreas >> >> >> >> On Wed, May 13, 2009 at 7:04 AM, helpmedicine savelife >> wrote: >> > I agree it is possible by setting up in local repository (I did the same for >> > my project I am currently working), but now that biojava is being used by >> > many, I guess it it is a good time to to make it accessible through maven2 >> > by providing the project metadata information to Maven2. >> > >> > Biojava Team - Is this possible? >> > >> > Thanks, >> > Sandya >> > >> > On Wed, May 13, 2009 at 9:12 AM, Raphael Andr? Bauer < >> > raphael.andre.bauer at gmail.com> wrote: >> > >> >> On Wed, May 13, 2009 at 1:21 PM, mac LEE wrote: >> >> > Dear All, >> >> > Hello. Is it possible to place the biojava jar into the maven 2 >> >> repository? >> >> >> >> You have to install the biojava.jar manually into your local repository: >> >> http://maven.apache.org/guides/mini/guide-3rd-party-jars-local.html >> >> >> >> LaJolla is an application that uses that approach too... 
Maybe the >> >> pom.xml of LaJolla is worth a look to see how it works...: >> >> >> >> http://lajolla.svn.sourceforge.net/viewvc/lajolla/trunk/lajolla/pom.xml?revision=350&view=markup >> >> >> >> Regards, >> >> >> >> Raphael >> >> >> >> > >> >> > regards, >> >> > >> >> > jack Lee >> >> > >> >> > -- >> >> > Somewhere between heaven and hell >> >> > mauleemacx at gmail.com >> >> > _______________________________________________ >> >> > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> > >> >> >> >> _______________________________________________ >> >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> >> > >> > _______________________________________________ >> > Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> > http://lists.open-bio.org/mailman/listinfo/biojava-l >> > >> >> _______________________________________________ >> Biojava-l mailing list ?- ?Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > From andreas at sdsc.edu Sat May 16 19:10:56 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Sat, 16 May 2009 12:10:56 -0700 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <200905141555.31967.niall@sgenomics.org> References: <200905141555.31967.niall@sgenomics.org> Message-ID: <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> Hi Niall. Which webservices did you have in mind? Something that got recently discussed on biojava-dev is support Blast webservices... Andreas On Thu, May 14, 2009 at 6:55 AM, Niall Haslam wrote: > Hi all, > > I was wondering if there was any interest in creating a set of maintained > clients for popular webservices. Depending on how it goes and how feasible it > is these could be included in biojava. I assume there are plenty of people > writing and rewriting clients to these things. 
In the first instance we could > just set up a repository for example code on using the servers. > > I'm sure this would be a useful resource for java programmers. Unless someone > wants to point me towards a pre-existing resource! > > Niall. > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From niall at sgenomics.org Tue May 19 07:36:38 2009 From: niall at sgenomics.org (Niall Haslam) Date: Tue, 19 May 2009 09:36:38 +0200 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> Message-ID: <200905190936.38541.niall@sgenomics.org> On Saturday 16 May 2009 21:10, Andreas Prlic wrote: > Which webservices did you have in mind? Something that got recently > discussed on biojava-dev is support for BLAST webservices... I think BLAST would obviously be one of the main ones to get started with. I have a client at the moment which goes as far as fetching the job-id and generating a link to the results, which is as much as I need. If I remember correctly, the BLAST parser at the moment doesn't deal with the responses from the EBI webservices (which I used). Mark makes a valid point about the choice of technology. I would also recommend JAX-WS and probably Axis2 as the frameworks, but note that some clients will probably require Axis 1.4. It's not clear, though, which e-mail discussion you are referring to - is it one on the biojava-dev mailing list? A lot of the client code is autogenerated, and I guess one question is whether we should include this in any repository that we make. I have been using the WTP plugin in Eclipse and this takes some of the pain out of generating clients, though it does generate some rather large source files. Niall.
From markjschreiber at gmail.com Tue May 19 14:05:11 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 19 May 2009 22:05:11 +0800 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> Message-ID: <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> Hi Niall - There was a recent discussion of generated code on the list. A good solution for webservices may be to put the WSDL and the high-level API in the repository and let the JAX-WS tools autogenerate the client boilerplate code. Mark On 19 May 2009, 3:39 PM, "Niall Haslam" wrote: On Saturday 16 May 2009 21:10, Andreas Prlic wrote: > Which webservices did you have in mind? Someth... I think BLAST would obviously be one of the main ones to get started with. I have a client at the moment which goes as far as fetching the job-id and generating a link to the results, which is as much as I need. If I remember correctly, the BLAST parser at the moment doesn't deal with the responses from the EBI webservices (which I used). Mark makes a valid point about the choice of technology. I would also recommend JAX-WS and probably Axis2 as the frameworks, but note that some clients will probably require Axis 1.4. It's not clear, though, which e-mail discussion you are referring to - is it one on the biojava-dev mailing list? A lot of the client code is autogenerated, and I guess one question is whether we should include this in any repository that we make. I have been using the WTP plugin in Eclipse and this takes some of the pain out of generating clients, though it does generate some rather large source files. Niall. _______________________________________________ Biojava-l mailing list - Biojava-l at lists.o...
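The split Mark describes (generated boilerplate on one side, a small hand-written high-level API on the other) might look roughly like the sketch below. Everything in it is an assumption made for illustration: `GeneratedBlastPort` only stands in for whatever interface a JAX-WS tool would generate from a real WSDL, and `BlastFacade`, `FakeBlastPort`, the method names and the `FINISHED` status string are hypothetical, not part of BioJava or of any actual BLAST service.

```java
import java.util.Objects;

/** Stand-in for a port interface that a JAX-WS tool would generate from a WSDL (hypothetical). */
interface GeneratedBlastPort {
    String submitJob(String program, String database, String sequence, String email);
    String getStatus(String jobId);
}

/** Hand-written high-level API: the part that would be kept in the repository next to the WSDL. */
final class BlastFacade {
    private final GeneratedBlastPort port;
    private final String email;

    BlastFacade(GeneratedBlastPort port, String email) {
        this.port = Objects.requireNonNull(port, "port");
        this.email = Objects.requireNonNull(email, "email");
    }

    /** Validates and normalises the query, then delegates to the generated stub; returns the job id. */
    String submitProtein(String sequence, String database) {
        if (sequence == null || sequence.trim().isEmpty()) {
            throw new IllegalArgumentException("sequence must not be empty");
        }
        return port.submitJob("blastp", database, sequence.trim().toUpperCase(), email);
    }

    /** Hides the provider's status vocabulary behind a boolean. */
    boolean isFinished(String jobId) {
        return "FINISHED".equals(port.getStatus(jobId));
    }
}

/** In-memory fake so the sketch runs without any network access. */
final class FakeBlastPort implements GeneratedBlastPort {
    String lastSequence; // records what the facade actually sent

    public String submitJob(String program, String database, String sequence, String email) {
        lastSequence = sequence;
        return program + "-" + database + "-1";
    }

    public String getStatus(String jobId) {
        return "FINISHED";
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        BlastFacade blast = new BlastFacade(new FakeBlastPort(), "user@example.org");
        String jobId = blast.submitProtein(" mkvl ", "uniprot");
        System.out.println(jobId + " finished=" + blast.isFinished(jobId));
    }
}
```

With this layout only `BlastFacade` and the WSDL would need to be versioned; the stub behind `GeneratedBlastPort` could be regenerated at build time.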
From aaron_b_a at yahoo.com Wed May 20 01:12:28 2009 From: aaron_b_a at yahoo.com (Aaron Brata Aditama) Date: Tue, 19 May 2009 18:12:28 -0700 (PDT) Subject: [Biojava-l] How to write pairwiseAlignment output in my own format? Message-ID: <248630.13325.qm@web56107.mail.re3.yahoo.com> There is a default output method in the NeedlemanWunsch class, and it writes output in its own format. How can I write the output of a pairwise alignment in my own format? For example, I want to get the aligned sequences and the score only. Aaron Brata Aditama From mike.thon at gmail.com Thu May 21 18:06:56 2009 From: mike.thon at gmail.com (Michael Thon) Date: Thu, 21 May 2009 20:06:56 +0200 Subject: [Biojava-l] CRC64Checksum example Message-ID: <8EEAB8FC-4D9D-4417-90EC-DC9E4329648C@gmail.com> Hi all - I need to compute a crc64 for some sequences and I thought that org.biojavax.utils.CRC64Checksum could do this for me. Java throws a NullPointerException when I try to do: crc.update(seq); where crc is a CRC64Checksum object and seq is a String containing a protein sequence. Does anyone have some example code showing how to compute a crc using this class? Many thanks Mike Thon From holland at eaglegenomics.com Thu May 21 19:11:14 2009 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 21 May 2009 20:11:14 +0100 Subject: [Biojava-l] CRC64Checksum example In-Reply-To: <8EEAB8FC-4D9D-4417-90EC-DC9E4329648C@gmail.com> References: <8EEAB8FC-4D9D-4417-90EC-DC9E4329648C@gmail.com> Message-ID: <1242933074.29986.16.camel@buzzybee> Could you provide the line you used to instantiate the CRC64Checksum object, the value that you're passing to update(seq), and the full stacktrace you get when it fails? That'll all help us work out what's wrong. cheers, Richard On Thu, 2009-05-21 at 20:06 +0200, Michael Thon wrote: > Hi all - I need to compute a crc64 for some sequences and I thought > that org.biojavax.utils.CRC64Checksum could do this for me.
Java > throws a NullPointerException when I try to do: > crc.update(seq); > where crc is a CRC64Checksum object and seq is a String containing a > protein sequence. > > Does anyone have some example code showing how to compute a crc using > this class? > Many thanks > Mike Thon > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From paolo.romano at istge.it Wed May 27 16:15:09 2009 From: paolo.romano at istge.it (Paolo Romano) Date: Wed, 27 May 2009 18:15:09 +0200 Subject: [Biojava-l] NETTAB 2009: Last Call for Participation Message-ID: <200905271630.n4RGU87M030436@ibm43p.biotech.ist.unige.it> Last Call for Participation NETTAB 2009 Workshop on "Technologies, Tools and Applications for Collaborative and Social Bioinformatics Research and Development" with a Special Session on: "Methods and Tools for RNA Structure and Functional Analysis" June 10-13, 2009 Department of Computer Science, University of Catania, Italy http://www.nettab.org/2009/ KEYNOTE TALKS RNA WikiProject: Community annotation of RNA families Alex Bateman, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK. Who are you? Managing collaborative digital identities in bioinformatics with myExperiment Duncan Hull, School of Chemistry, University of Manchester, Manchester, UK. Semantically Integrated eCommunities in Biomedicine: Next-Generation Models of Biomedical Communication Tim Clark, Director of Informatics, MassGeneral Institute for Neurodegenerative Disease Neurology Research Department, Massachusetts General Hospital, Boston, USA. Bacterial Phylogeny and Taxonomy in the High-Throughput Sequencing World Gabriel Valiente, Technical University of Catalonia, Department of Software, Barcelona, Spain. 
Non coding RNAs (title to be confirmed) Doron Betel, MSKCC - Computational Biology Center, New York, USA. SELECTED ORAL COMMUNICATIONS The DC-THERA Directory: A Knowledge Management System to Support Collaboration on Dendritic Cell and Immunology Research Michaela Guendel, Ciro Scognamiglio, Marco Brandizi and Andrea Splendiani Bioinformatics Experimentation in a New Agent-Based Infrastructure OpenKnowledge Dietlind Gerloff, Xueping Quan, Adrian de Pinninck, Paolo Besana, Siu-wai Leung, Marco Schorlemmer and David Robertson Abstraction based cooperation for the design of bioinformatics workflows Frédéric Cadier and Philippe Picouet Biomedical applications in the EELA-2 Project Leandro N Ciuffo and Rafael Mayo Make Histri: collaborative curation of exchange histories of bacterial and archaeal type strains Bert Verslyppe, Paul De Vos, Bernard De Baets and Peter Dawyndt SBMM Assistant: Social Pathway Annotation Ismael Navas-Delgado, Alejandro del Real-Chicharro, Francisca Sánchez-Jiménez and Jose F Aldana Montes GePh-CARD: an information exchange application for an Hub & Spoke Network for Skeletal Dysplasias Marina Mordenti and Luca Sangiorgi ProDaMa-C: a collaborative web application to generate specialized protein structure datasets Giuliano Armano and Andrea Manconi RNA tertiary structure prediction with ModeRNA Magdalena Musielak, Kristian Rother, Tomasz Puton and Janusz M. Bujnicki Improved heuristic for pairwise RNA secondary structure prediction Olivier Perriquet and Pedro Barahona Analysing the microRNA-17-92/Myc/E2F/RB Compound Toggle Switch by Theorem Proving Giampaolo Bella and Pietro Liò Mapping miRNA genes on human fragile sites and translocation breakpoints Alfredo Ferro, Rosalba Giugno, Alessandro Laganà, Alfredo Pulvirenti and Francesco Russo MicroRNA.gr: A suite of web based tools for elucidating microRNA function Giorgio L. Papadopoulos, Panagiotis Alexiou, Manolis Maragkakis, Martin Reczko and Artemis G.
Hatzigeorgiou miRScape: A Cytoscape Plugin to Annotate Biological Networks with microRNAs Alfredo Ferro, Rosalba Giugno, Alessandro Laganà, Misael Mongiovì, Giuseppe Pigola, Alfredo Pulvirenti, Gary Bader and Dennis Shasha POSTERS A web site supporting collaborative activities within the Italian Network for Oncology Bioinformatics Silvia Giuliani, Elda Rossi, Stefania Parodi and Paolo Romano Prediction of human targets for viral encoded microRNAs Alessandro Laganà, Stefano Forte, Rosalba Giugno, Alfredo Pulvirenti and Alfredo Ferro Genome-wide microRNA target gene identification and annotation using multiple prediction algorithms Dario Corrada, Luciano Milanesi and Daniele Catalucci Design of highly specific synthetic miRNAs Alessandro Laganà, Stefano Forte, Angela Papa, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha and Alfredo Ferro NETTAB '09 - Ninth International Workshop on Network Tools and Applications in Biology 10-13 June 2009, Catania, Italy http://www.nettab.org/2009/ Paolo Romano (paolo.romano at istge.it) Bioinformatics National Cancer Research Institute (IST) From jimp at compbio.dundee.ac.uk Thu May 28 12:29:19 2009 From: jimp at compbio.dundee.ac.uk (James Procter) Date: Thu, 28 May 2009 13:29:19 +0100 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> Message-ID: <4A1E839F.2050700@compbio.dundee.ac.uk> Hi All. Mark Schreiber wrote: > There was a recent discussion of generated code on the list. A good solution > for webservices may be to put the WSDL and the high-level API in the repository > and let the JAX-WS tools autogenerate the client boilerplate code.
Just to clarify - by high-level API you mean the interface to the code that takes biojava objects as arguments and translates them into parameters for the service and vice versa for the results ? With regard to the recent(ish) discussion about modularisation in BJ3, I'd suggest that a module is created for each distinct WSDL containing the glue code for connecting BJ objects with the boilerplate code. The glue code implements factory methods to generate instances of appropriate BJ service APIs, which are defined in a core BJ module. This means that the WSDL + API implementation is decoupled from the BJ service API. Does that make sense (and/or am I simply stating the obvious here) ? Jim. -- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. From willishf at ufl.edu Thu May 28 13:35:49 2009 From: willishf at ufl.edu (Scooter Willis) Date: Thu, 28 May 2009 09:35:49 -0400 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <4A1E839F.2050700@compbio.dundee.ac.uk> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> Message-ID: <7ceb4beb0905280635g7bbb028dw9c5d69a9290b5a16@mail.gmail.com> Jim I agree. I am planning on doing some testing of a couple BLAST web services interfaces(assuming more than one exists) and see what they truly have in common and see how that would impact a BJ3 front end to multiple providers. My assumption is that they will be the same. 
I noticed that in the NCBI BLAST implementations the user was required to pass their email address as part of the web service call. They are concerned with abuse from external processes, and they only allow one sequence per request. From Wikipedia, the following are listed as BLAST resources, more than one of which may offer a web service interface. Should BioJava3 try to support more than one? Thanks Scooter

Variations of BLAST
- WU-BLAST: the original gapping BLAST with statistics, developed and maintained by Warren Gish at Washington University in St. Louis
- EBI's BLAST Services: EBI's main blast services page.
- FSA-BLAST: a new, faster but still accurate version of NCBI BLAST based on recently published algorithmic improvements
- NBIC mpiBLAST: at the Netherlands Bioinformatics Centre
- Parallel BLAST: a dual scheduling BLAST tested on the Blue Gene/L
- mpiBLAST: open-source parallel BLAST
- A/G BLAST: implementation for PowerPC G4/G5 processors and Mac OS X, from Apple Computer's Advanced Computation Group and Genentech.
- STRAP: the protein workbench STRAP contains a comfortable BLAST front-end with a cache for BLAST results

Commercial versions
- ThermoBLAST by DNA Software Inc.: scans entire genomes quickly and accurately, combining the power of BLAST with the most advanced thermodynamics parameters
- PatternHunter: an alternative software which provides similar functionality to BLAST while claiming increased speed and sensitivity
- KoriBlast: a reliable graphical environment dedicated to sequence data mining. KoriBlast combines Blast searches with advanced data management capabilities and a state-of-the-art graphical user interface.
- microbial identification BLAST: a quality controlled database for in-vitro diagnostics. SepsiTest combines broad-range-PCR using ultra-pure reagents with Blast searches in a quality controlled environment.

On Thu, May 28, 2009 at 8:29 AM, James Procter wrote: > Hi All.
> > Mark Schreiber wrote: > > There was a recent discussion of generated code on the list. A good > solution > > for webservices may be to put the wsdl and high level api in the > repository > > and let jax-ws tools autogenerate the client biolerplate code. > > Just to clarify - by high-level API you mean the interface to the code > that takes biojava objects as arguments and translates them into > parameters for the service and vice versa for the results ? > > With regard to the recent(ish) discussion about modularisation in BJ3, > I'd suggest that a module is created for each distinct WSDL containing > the glue code for connecting BJ objects with the boilerplate code. The > glue code implements factory methods to generate instances of > appropriate BJ service APIs, which are defined in a core BJ module. This > means that the WSDL + API implementation is decoupled from the BJ > service API. > > Does that make sense (and/or am I simply stating the obvious here) ? > > Jim. > > -- > ------------------------------------------------------------------- > J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group > Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk > The University of Dundee is a Scottish Registered Charity, No. SC015096. 
> _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From markjschreiber at gmail.com Thu May 28 14:11:36 2009 From: markjschreiber at gmail.com (Mark Schreiber) Date: Thu, 28 May 2009 22:11:36 +0800 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <93b45ca50905280709q6896f96fg598276de20222440@mail.gmail.com> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> <93b45ca50905280709q6896f96fg598276de20222440@mail.gmail.com> Message-ID: <93b45ca50905280711q222a0c90ob93a3098ff1b97c7@mail.gmail.com> By high level API I mean anything not machine generated. That could be classes that use biojava objects or a more humanized or smarter interface to the boilerplate code. - Mark On 28 May 2009, 2:53 PM, "James Procter" wrote: Hi All. Mark Schreiber wrote: > There was a recent discussion of generated code on the list. A good solutio... Just to clarify - by high-level API you mean the interface to the code that takes biojava objects as arguments and translates them into parameters for the service and vice versa for the results ? With regard to the recent(ish) discussion about modularisation in BJ3, I'd suggest that a module is created for each distinct WSDL containing the glue code for connecting BJ objects with the boilerplate code. The glue code implements factory methods to generate instances of appropriate BJ service APIs, which are defined in a core BJ module. This means that the WSDL + API implementation is decoupled from the BJ service API. Does that make sense (and/or am I simply stating the obvious here) ? Jim. 
-- ------------------------------------------------------------------- J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk The University of Dundee is a Scottish Registered Charity, No. SC015096. _______________________________________________ Biojava-l mailing list - Biojava-l at lists.open-bio.... From andreas at sdsc.edu Thu May 28 14:59:17 2009 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 28 May 2009 07:59:17 -0700 Subject: [Biojava-l] Webservices clients in bio-java In-Reply-To: <4A1E839F.2050700@compbio.dundee.ac.uk> References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> Message-ID: <59a41c430905280759i503719eaxd2ac64436154be2d@mail.gmail.com> makes sense and fits to the more detailed discussion we are having on biojava-dev right now. A On Thu, May 28, 2009 at 5:29 AM, James Procter wrote: > Hi All. > > Mark Schreiber wrote: >> There was a recent discussion of generated code on the list. A good solution >> for webservices may be to put the wsdl and high level api in the repository >> and let jax-ws tools autogenerate the client biolerplate code. > > Just to clarify - by high-level API you mean the interface to the code > that takes biojava objects as arguments and translates them into > parameters for the service and vice versa for the results ? > > With regard to the recent(ish) discussion about modularisation in BJ3, > I'd suggest that a module is created for each distinct WSDL containing > the glue code for connecting BJ objects with the boilerplate code. The > glue code implements factory methods to generate instances of > appropriate BJ service APIs, which are defined in a core BJ module. 
> This means that the WSDL + API implementation is decoupled from the BJ
> service API.
>
> Does that make sense (and/or am I simply stating the obvious here) ?
>
> Jim.
>
> --
> -------------------------------------------------------------------
> J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
> Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
> The University of Dundee is a Scottish Registered Charity, No. SC015096.
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From jimp at compbio.dundee.ac.uk Thu May 28 15:14:51 2009
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Thu, 28 May 2009 16:14:51 +0100
Subject: [Biojava-l] Webservices clients in bio-java
In-Reply-To: <7ceb4beb0905280635g7bbb028dw9c5d69a9290b5a16@mail.gmail.com>
References: <200905141555.31967.niall@sgenomics.org> <59a41c430905161210u2b79a402m643eb47704bc06e7@mail.gmail.com> <200905190936.38541.niall@sgenomics.org> <93b45ca50905190702l2169d7f4qf200ed04f33c17b5@mail.gmail.com> <93b45ca50905190705qec04c48yb7a81c9b0715a50e@mail.gmail.com> <4A1E839F.2050700@compbio.dundee.ac.uk> <7ceb4beb0905280635g7bbb028dw9c5d69a9290b5a16@mail.gmail.com>
Message-ID: <4A1EAA6B.7020202@compbio.dundee.ac.uk>

Hi Mark, and Scooter.

Mark Schreiber wrote:
> By high level API I mean anything not machine generated. That could be
> classes that use biojava objects or a more humanized or smarter
> interface to the boilerplate code.

ah. yes. that's what I thought. I'd really advocate keeping the Biojava
object adaptor glue for the autogenerated code separate from the core.
This is because (as I'm sure you know) there are no real object design
standards for bioinformatic services (apart from BioMOBY..) - so the
adaptor code is always going to be specific to the WSDL.

Scooter Willis wrote:
> I agree.
> I am planning on doing some testing of a couple BLAST web
> services interfaces (assuming more than one exists) and see what they
> truly have in common and see how that would impact a BJ3 front end to
> multiple providers. My assumption is that they will be the same.

Of course, the actual functionality provided by all the servers will be
more or less the same (i.e. inputs and outputs), but servers will differ
in how they wrap the input and result data, how they support parameters,
and how they deal with polling and result retrieval (if they are at all
asynchronous).

> I noticed on the NCBI Blast implementations the user was required to pass
> their email address as part of the web service call. They are concerned
> with abuse from external processes and they only allow one sequence per
> request.

Yes. The NCBI has an automatic blacklisting system - too many requests in
a short space of time from a single IP will result in denial of service
for X (24?) hours. I'm not sure how something like Taverna manages this
kind of thing, but I expect this kind of user attribute will have to be
supported in the core web service model - that could also provide
authentication management support too (when users are using secure or
authenticated HTTP connections).

> From wikipedia the following are listed as BLAST resources where more
> than one may offer a web service interface. Should BioJava3 try and
> support more than one?

I would pick the two or three most popular/capable/geographically
widespread services to implement clients for, and then choose your
favourite one to start with (EBI,NCBI, and perhaps one in Japan - e.g.
http://xml.nig.ac.jp/wsdl/Blast.wsdl). Implementing clients for two or
three allows users a nice choice, and should also capture most variation
in the input/output and polling model that is used. If any more need to
be supported, then by then it should be pretty clear how to go about
implementing an API instance for a new WSDL.
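
To make the decoupling concrete, here is a rough Java sketch of a provider-neutral service interface with a submit/poll/fetch model. Every name in it (SequenceSearchService, SequenceSearchServiceFactory, StubSearchService, SearchClient) is hypothetical and exists neither in BioJava nor in any provider's WSDL. The stub stands in for the JAX-WS generated client code that a real per-WSDL module would wrap, and the hit it returns is placeholder text, not real search output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical interface that would live in a core module. */
interface SequenceSearchService {
    /** Submits a query; the email is the per-user attribute some providers require. */
    String submit(String sequence, String userEmail);
    boolean isFinished(String jobId);
    List<String> fetchHits(String jobId);
}

/** Hypothetical factory that each per-WSDL module would implement. */
interface SequenceSearchServiceFactory {
    String providerName();
    SequenceSearchService create();
}

/** Stand-in adapter; a real one would delegate to generated stubs. */
class StubSearchService implements SequenceSearchService {
    private final Map<String, String> submitted = new HashMap<>();
    private final Map<String, Integer> pollsLeft = new HashMap<>();
    private int nextId = 0;

    public String submit(String sequence, String userEmail) {
        if (userEmail == null || userEmail.isEmpty()) {
            throw new IllegalArgumentException("this provider requires an email address");
        }
        String id = "job-" + (nextId++);
        submitted.put(id, sequence);
        pollsLeft.put(id, 2); // pretend the job is still running for two polls
        return id;
    }

    public boolean isFinished(String jobId) {
        int left = pollsLeft.get(jobId);
        if (left > 0) {
            pollsLeft.put(jobId, left - 1);
        }
        return left == 0;
    }

    public List<String> fetchHits(String jobId) {
        List<String> hits = new ArrayList<>();
        hits.add("stub-hit-for:" + submitted.get(jobId)); // placeholder result
        return hits;
    }
}

/** Client code sees only the core interface, never the provider-specific classes. */
class SearchClient {
    static List<String> runBlocking(SequenceSearchService service, String sequence,
                                    String email, int maxPolls) {
        String jobId = service.submit(sequence, email);
        for (int i = 0; i < maxPolls; i++) {
            // A real client would wait between polls to respect provider rate limits.
            if (service.isFinished(jobId)) {
                return service.fetchHits(jobId);
            }
        }
        throw new IllegalStateException("job " + jobId + " did not finish in time");
    }
}
```

Under these assumptions a new provider only needs its own implementation of the two interfaces; SearchClient stays untouched.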
You might also want to consider generalising the high-level interface to non-Blast sequence search services, since they all take and return more or less the same kind of data and results. I suspect this discussion should move to the biojava-dev list! Jim. From jp at javaclass.co.uk Thu May 28 18:54:42 2009 From: jp at javaclass.co.uk (JP) Date: Thu, 28 May 2009 19:54:42 +0100 Subject: [Biojava-l] dN/dS ratio calculation biojava Message-ID: <4adc29060905281154i6db465edt8c154bb30863296f@mail.gmail.com> Hi there - is there a class or method which finds the dN/dS ratio (nonsynonymous/synonymous rate ratio) for two sequences ? Many Thanks JP From ayates at ebi.ac.uk Fri May 29 08:23:34 2009 From: ayates at ebi.ac.uk (Andy Yates) Date: Fri, 29 May 2009 09:23:34 +0100 Subject: [Biojava-l] dN/dS ratio calculation biojava In-Reply-To: <4adc29060905281154i6db465edt8c154bb30863296f@mail.gmail.com> References: <4adc29060905281154i6db465edt8c154bb30863296f@mail.gmail.com> Message-ID: <4A1F9B86.6080309@ebi.ac.uk> Hi JP, No there isn't a class to calculate dN/dS. At a basic level dN/dS is a very easy ratio to calculate if you take the very basic approach of sequences of exact lengths. The basics of it are available from: http://pubmlst.org/software/analysis/start/manual/dsdn.shtml The best package I know of to calculate the rates though is codeml. It's a very awkward program to run but it can be done & will produce the best results. I can say for sure that the Ensembl Compara team use it to calculate dN/dS rates for their homologous protein sets. Hope that helps, Andy P.S. I say all of this from personal experience when I wrote a very basic calculator using BioJava (and I don't know where the code got to). The amount of work to produce that code vs. running a third party application just makes it more cost effective to go for the 3rd party app JP wrote: > Hi there - is there a class or method which finds the dN/dS ratio > (nonsynonymous/synonymous rate ratio) for two sequences ? 
> > Many Thanks > JP > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From jogoodma at indiana.edu Tue May 12 20:40:40 2009 From: jogoodma at indiana.edu (Josh Goodman) Date: Tue, 12 May 2009 20:40:40 -0000 Subject: [Biojava-l] FASTA parsing bug ? In-Reply-To: <49F822C6.1020809@eaglegenomics.com> References: <4adc29060904280701q5d3dc760mb018f6b38a9e056f@mail.gmail.com> <49F71258.3060103@eaglegenomics.com> <4adc29060904280759q4b27a4eembd28974d46199532@mail.gmail.com> <49F71EF5.90702@eaglegenomics.com> <49F822C6.1020809@eaglegenomics.com> Message-ID: <4A09D8DC.1050307@indiana.edu> Hi all, I apologize for missing the "next day or so" window but here is my patch. I'm attaching a patch file for FastaFormat.java (1.7 tagged branch), the full source file, and a test class. It seems to work and performance is on par with the previous approach in my measurements. One problem that I couldn't quite figure out a way around is with the guessSymbolTokenization(BufferedInputStream stream) method. If the first sequence of the stream has a header length plus first line of sequence length longer than 2000 characters it will fail to reset properly. Cheers, Josh Richard Holland wrote: > I'd love to see a proper solution to this that doesn't involve upping > the read-ahead limit. I was aware that it might be the issue, but had no > idea why it was not failing for other similar long sequences. I look > forward to seeing your suggested fix! > > thanks, > Richard > > Josh Goodman wrote: >> Hi Richard and JP, >> >> I think I can be of some help as I'm the FlyBase developer responsible for >> generating these troublesome FASTA files :-). The cause of this problem >> appears to be the description line length for the record FBpp0145470. >> >> The trouble lies in org.biojavax.bio.seq.io.FastaFormat in the while loop >> at line 196. 
Biojava correctly reads in FBpp0145468 but throws an error >> when trying to parse FBpp0145469. There is nothing wrong in FBpp0145469 >> but when biojava reaches the end of the sequence it reads in the header >> for the next record (FBpp0145470). It then tries to reset the >> BufferedReader to the start of FBpp0145470 but that is where the exception >> is thrown because line 197 sets the read ahead limit to 500 characters and >> the reader.readLine() command exceeds that limit. >> >> What isn't obvious to me is why other large definition lines that precede >> that line don't throw the same error (e.g. FBpp0157909). I guess the >> javadoc on BufferedReader.mark() does say "may fail" but I assumed it >> would be more predictable than that. >> >> The file in question can be downloaded from >> ftp://ftp.flybase.net/genomes/Drosophila_grimshawi/dgri_r1.3_FB2008_07/fasta/dgri-all-translation-r1.3.fasta.gz. >> >> If there is interest in a solution that doesn't involve simply upping the >> read ahead limit I can put a patch file together in the next day or so. >> >> Cheers, >> Josh >> >> On Tue, 28 Apr 2009, Richard Holland wrote: >> >>> You're right, doesn't look like newlines. >>> >>> The "Mark invalid" happens when the parser looks too far ahead in the >>> file attempting to seek out the next valid sequence to parse. I'm not >>> sure why this is happening. >>> >>> I don't have the time to test right now but if you could post the link >>> to where someone could download the same FASTA as you're using, then it >>> would make it possible for someone else to investigate in more detail. >>> >>> thanks, >>> Richard >>> >>> JP wrote: >>>> Thanks Richard for your prompt reply. >>>> >>>> I will not attach the fasta file I am parsing (12MB) its >>>> dgri-all-translation-r1.3.fasta from the flybase project. >>>> >>>> If the file had any extra new lines I would see them when I loaded it in >>>> a text editor - no ? 
>>>>
>>>> I implemented the whole thing without using Biojava (for this part)
>>>>
>>>> fr = new FileReader(fastaProteinFileName);
>>>> br = new BufferedReader(fr);
>>>> String fastaLine;
>>>> String startAccession = '>' + accessionId.trim();
>>>> String fastaEntry = "";
>>>> boolean record = false;
>>>> while ((fastaLine = br.readLine()) != null) {
>>>>     fastaLine = fastaLine.trim() + '\n';
>>>>     if (fastaLine.startsWith(startAccession)) {
>>>>         record = true;
>>>>     } else if (record && fastaLine.startsWith(">")) {
>>>>         record = false;
>>>>         break;
>>>>     }
>>>>     if (record) {
>>>>         fastaEntry += fastaLine;
>>>>     }
>>>> }
>>>>
>>>>
>>>> Notice - I do not use regex - since I'd need to read the whole file and
>>>> then regex upon it (if the record is the first one - I just read that one).
>>>>
>>>> Cheers
>>>> JP
>>>>
>>>>
>>>> On Tue, Apr 28, 2009 at 3:27 PM, Richard Holland wrote:
>>>>
>>>> The "Mark invalid" exception is indicating that the parser has gone too
>>>> far ahead in the file looking for a valid header. I'm not sure why but
>>>> looking at your original query, there may be extra newlines embedded
>>>> into your FASTA header line? That would definitely confuse it.
>>>>
>>>> The parser is not able to currently pull out just one sequence - in
>>>> effect this is a search facility, which it doesn't have.
>>>> :(
>>>>
>>>> thanks,
>>>> Richard
>>>>
>>>> JP wrote:
>>>> > Hi all at BioJava,
>>>> >
>>>> > I am trying to parse several FASTA files using the following code:
>>>> >
>>>> >> fr = new FileReader(fastaProteinFileName);
>>>> >> br = new BufferedReader(fr);
>>>> >>
>>>> >> RichSequenceIterator protIter = IOTools.readFastaProtein(br, null);
>>>> >> while (protIter.hasNext()) {
>>>> >>     BioEntry bioEntry = protIter.nextBioEntry();
>>>> >>     System.out.println (fastaProteinFileName + " == " + accessionId + " = " + bioEntry.getAccession());
>>>> >> }
>>>> >
>>>> >
>>>> > At particular points in my fasta file - I get the following exception:
>>>> >
>>>> >> 14:53:42,546 ERROR FastaFileProcessing - File parsing exception (from biojava library)
>>>> >> org.biojava.bio.BioException: Could not read sequence
>>>> >>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
>>>> >>     at org.biojavax.bio.seq.io.RichStreamReader.nextBioEntry(RichStreamReader.java:99)
>>>> >>     at edu.imperial.msc.orthologue.fasta.FastaFileProcessing.getProteinSequenceFromFASTAFile(FastaFileProcessing.java:60)
>>>> >>     at edu.imperial.msc.orthologue.core.OrthologueFinder.getFASTAEntries(OrthologueFinder.java:64)
>>>> >>     at edu.imperial.msc.orthologue.core.OrthologueFinder.(OrthologueFinder.java:51)
>>>> >>     at edu.imperial.msc.orthologue.launcher.OrthologueFinderLauncher.main(OrthologueFinderLauncher.java:60)
>>>> >> Caused by: java.io.IOException: Mark invalid
>>>> >>     at java.io.BufferedReader.reset(Unknown Source)
>>>> >>     at org.biojavax.bio.seq.io.FastaFormat.readRichSequence(FastaFormat.java:202)
>>>> >>     at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
>>>> >>     ... 5 more
>>>> >
>>>> >
>>>> > Interestingly if I delete the header portion of the header line (from
>>>> > type=protein...
till the end of the line ...Dgri;) >>>> > >>>> >> FBpp0145468 type=protein; >>>> >> >>>> loc=scaffold_15252:join(13219687..13219727,13219972..13220279,13220507..13220798,13220861..13221180,13221286..13221467,13222258..13222629,13226331..13226463,13226531..13226658); >>>> >> ID=FBpp0145468; name=Dgri\GH11562-PA; parent=FBgn0119042,FBtr0146976; >>>> >> dbxref=FlyBase:FBpp0145468,FlyBase_Annotation_IDs:GH11562-PA; >>>> >> MD5=c8dc38c7197a0d3c93c78b08059e2604; length=591; release=r1.3; >>>> >> species=Dgri; >>>> >> >>>> > >>>> > It works - but I have a number of these exceptions (and I do not >>>> want to >>>> > edit the original data). Mind you I have longer headers in my >>>> file which >>>> > are parsed OK (strange!). >>>> > >>>> > Any ideas anyone ? Alternatively - is there a better way how to >>>> get ONE >>>> > SINGLE sequence from the whole fasta file give that I have the >>>> accession id >>>> > (FBpp0145468) ? >>>> > >>>> > Many Thanks >>>> > JP >>>> > _______________________________________________ >>>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >>>> >>>> > http://lists.open-bio.org/mailman/listinfo/biojava-l >>>> > >>>> >>>> -- >>>> Richard Holland, BSc MBCS >>>> Finance Director, Eagle Genomics Ltd >>>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>>> >>>> http://www.eaglegenomics.com/ >>>> >>>> >>> -- >>> Richard Holland, BSc MBCS >>> Finance Director, Eagle Genomics Ltd >>> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >>> http://www.eaglegenomics.com/ >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > -------------- next part -------------- A non-text attachment was scrubbed... Name: FastaFormat.patch Type: text/x-patch Size: 3402 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: FastaFormat.java Type: text/x-java Size: 11394 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: FastaFormatTest.java Type: text/x-java Size: 3517 bytes Desc: not available URL:
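
For readers following the "Mark invalid" discussion in this thread: one way to avoid depending on a mark()/reset() read-ahead limit is to keep the header line that was read while finishing the previous record and hand it to the next one. The Java sketch below illustrates that idea only. It is not the attached FastaFormat.patch, whose contents are not shown in this archive, and LookaheadFastaReader is a made-up class rather than a BioJava API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.NoSuchElementException;

/** Hypothetical helper: reads FASTA records without BufferedReader.mark()/reset(). */
class LookaheadFastaReader {
    private final BufferedReader in;
    // Header of the next record, already consumed while finishing the previous one.
    private String pendingHeader;

    LookaheadFastaReader(Reader reader) {
        this.in = new BufferedReader(reader);
        // Skip anything before the first '>' line.
        this.pendingHeader = readUntilHeader(null);
    }

    boolean hasNext() {
        return pendingHeader != null;
    }

    /** Returns {header without the leading '>', concatenated sequence}. */
    String[] next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException("no more FASTA records");
        }
        String header = pendingHeader.substring(1).trim();
        StringBuilder sequence = new StringBuilder();
        // Reading stops at the next header, which is kept instead of being "un-read".
        pendingHeader = readUntilHeader(sequence);
        return new String[] { header, sequence.toString() };
    }

    /** Appends sequence lines to sink (if not null) and returns the next header line, or null at EOF. */
    private String readUntilHeader(StringBuilder sink) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line;
                }
                if (sink != null) {
                    sink.append(line.trim());
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String fasta = ">seq1 first\nMKV\nLLA\n>seq2 second\nMQQ\n";
        LookaheadFastaReader reader = new LookaheadFastaReader(new StringReader(fasta));
        while (reader.hasNext()) {
            String[] record = reader.next();
            System.out.println(record[0] + " -> " + record[1]);
        }
    }
}
```

Because the look-ahead line is held in a field, the length of a description line no longer matters. The trade-off is that the reader owns the stream: other code cannot resume reading from the same BufferedReader at a record boundary.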