From gwaldon at geneinfinity.org Mon Aug 9 15:26:28 2010
From: gwaldon at geneinfinity.org (George Waldon)
Date: Mon, 09 Aug 2010 15:26:28 -0400
Subject: [Biojava-dev] build problem
Message-ID: <20100809192628.24899.qmail@mxw1102.verio-web.com>

Hi,

I am getting the following failures in org.biojava.bio.structure.align.benchmark.MultipleAlignmentTest:

Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1hcy.ent.gz
writing to \tmp\hc\pdb1hcy.ent.gz
java.io.FileNotFoundException: \tmp\hc\pdb1hcy.ent.gz (The system cannot find the path specified)
...

and

Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1nls.ent.gz
writing to \tmp\nl\pdb1nls.ent.gz
java.io.FileNotFoundException: \tmp\nl\pdb1nls.ent.gz (The system cannot find the path specified)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
    at org.biojava.bio.structure.io.PDBFileReader.downloadPDB(PDBFileReader.java:430)
...

Does anyone have a solution for this? I am building from within NetBeans.

Thanks,
George

From andreas at sdsc.edu Mon Aug 9 15:35:56 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 9 Aug 2010 12:35:56 -0700
Subject: [Biojava-dev] build problem
In-Reply-To: <20100809192628.24899.qmail@mxw1102.verio-web.com>
References: <20100809192628.24899.qmail@mxw1102.verio-web.com>
Message-ID:

If you update the class, this should be fixed now. It was a problem with a hard-coded /tmp path...

Andreas

--
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From HWillis at scripps.edu Mon Aug 9 15:33:36 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Mon, 9 Aug 2010 15:33:36 -0400
Subject: [Biojava-dev] build problem
In-Reply-To: <20100809192628.24899.qmail@mxw1102.verio-web.com>
References: <20100809192628.24899.qmail@mxw1102.verio-web.com>
Message-ID:

George

Try creating the directories \tmp\hc and \tmp\nl and see if that fixes the problem. Not sure if the test cases build the directory structure for copying files. If that doesn't work then Andreas will need to figure out the problem. You can comment out the test case if you want a quick workaround.

Thanks

Scooter

From gwaldon at geneinfinity.org Mon Aug 9 16:52:39 2010
From: gwaldon at geneinfinity.org (George Waldon)
Date: Mon, 09 Aug 2010 16:52:39 -0400
Subject: [Biojava-dev] build problem
Message-ID: <20100809205239.19040.qmail@mxw1102.verio-web.com>

Thanks to all for the fix.

The build took 19 minutes and 33 seconds, of which 16 min and 32 s were for the structure modules! This sounds a bit long to me. Is this expected?

George

From andreas at sdsc.edu Mon Aug 9 17:27:27 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 9 Aug 2010 14:27:27 -0700
Subject: [Biojava-dev] build problem
In-Reply-To: <20100809205239.19040.qmail@mxw1102.verio-web.com>
References: <20100809205239.19040.qmail@mxw1102.verio-web.com>
Message-ID:

That seems like a very long time... On the automated build server the times look like below.

If the structure module takes so much time, I wonder if you are behind a very slow network? Some of the JUnit tests fetch PDB files from a public FTP server and I wonder if there is a networking issue. Having said this, the two slowest modules are structure and protmod. We will try to cut down the time spent on those tests...

Andreas

[INFO] biojava ............................................... SUCCESS [21.430s]
[INFO] bytecode .............................................. SUCCESS [44.778s]
[INFO] core .................................................. SUCCESS [4:54.691s]
[INFO] alignment ............................................. SUCCESS [48.048s]
[INFO] blast ................................................. SUCCESS [1:20.263s]
[INFO] biojava3-structure .................................... SUCCESS [8:12.412s]
[INFO] das ................................................... SUCCESS [1:13.103s]
[INFO] sequence .............................................. SUCCESS [18.176s]
[INFO] sequence-core ......................................... SUCCESS [47.424s]
[INFO] sequence-rna .......................................... SUCCESS [28.770s]
[INFO] sequence-biosql ....................................... SUCCESS [47.032s]
[INFO] sequence-fasta ........................................ SUCCESS [29.075s]
[INFO] sequence-blastxml ..................................... SUCCESS [44.392s]
[INFO] sequencing ............................................ SUCCESS [49.520s]
[INFO] phylo ................................................. SUCCESS [48.459s]
[INFO] biosql ................................................ SUCCESS [57.503s]
[INFO] gui ................................................... SUCCESS [58.610s]
[INFO] biojava3-core ......................................... SUCCESS [1:05.398s]
[INFO] biojava3-phylo ........................................ SUCCESS [46.623s]
[INFO] biojava3-structure-gui ................................ SUCCESS [1:06.737s]
[INFO] biojava3-alignment .................................... SUCCESS [54.682s]
[INFO] biojava3-genome ....................................... SUCCESS [49.844s]
[INFO] biojava3-protmod ...................................... SUCCESS [10:05.171s]
[INFO] biojava3-ws ........................................... SUCCESS [44.379s]
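The FileNotFoundException reported at the start of this thread is what FileOutputStream throws when the parent directory of the target file does not exist. The two remedies mentioned in the replies (no hard-coded /tmp, and creating the sub-directory before writing) can be sketched roughly as follows. This is only an illustration under stated assumptions: DownloadTarget, targetFor and openForWriting are hypothetical names and are not the code in BioJava's PDBFileReader, and the two-letter sub-directory rule is inferred from the paths in the log (1hcy -> hc, 1nls -> nl).

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper for illustration; not part of BioJava.
class DownloadTarget {

    // Builds <baseDir>/<2nd and 3rd character of the id>/pdb<id>.ent.gz,
    // the layout suggested by the log output (assumption).
    static File targetFor(File baseDir, String pdbId) {
        String id = pdbId.toLowerCase();
        String sub = id.substring(1, 3); // "1hcy" -> "hc"
        return new File(new File(baseDir, sub), "pdb" + id + ".ent.gz");
    }

    // Creates missing parent directories before opening the stream, so that
    // FileOutputStream does not fail on a path that does not exist yet.
    static OutputStream openForWriting(File target) throws IOException {
        File parent = target.getParentFile();
        if (parent != null && !parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("Could not create directory " + parent);
        }
        return new FileOutputStream(target);
    }

    public static void main(String[] args) throws IOException {
        // Resolve the base directory from the platform instead of hard-coding /tmp.
        // "download-sketch" is a made-up sub-directory to keep the demo isolated.
        File base = new File(System.getProperty("java.io.tmpdir"), "download-sketch");
        File target = targetFor(base, "1hcy");
        try (OutputStream out = openForWriting(target)) {
            out.write(new byte[0]); // a real caller would copy the downloaded bytes here
        }
        System.out.println("writable: " + target.getAbsolutePath());
    }
}
```

Checking `mkdirs()` first and `isDirectory()` second keeps the helper safe when the directory already exists, since `mkdirs()` returns false in that case.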
From gwaldon at geneinfinity.org Mon Aug 9 18:20:27 2010
From: gwaldon at geneinfinity.org (George Waldon)
Date: Mon, 09 Aug 2010 18:20:27 -0400
Subject: [Biojava-dev] build problem
Message-ID: <20100809222027.95361.qmail@mxw1102.verio-web.com>

Here is my summary; some of the timings are pretty good (simple HP laptop with a dual core):

------------------------------------------------------------------------
Reactor Summary:
------------------------------------------------------------------------
biojava ............................................... SUCCESS [3.820s]
bytecode .............................................. SUCCESS [6.022s]
core .................................................. SUCCESS [52.822s]
alignment ............................................. SUCCESS [4.332s]
blast ................................................. SUCCESS [27.453s]
biojava3-structure .................................... SUCCESS [7:33.228s]
das ................................................... SUCCESS [18.456s]
sequence .............................................. SUCCESS [0.122s]
sequence-core ......................................... SUCCESS [3.974s]
sequence-rna .......................................... SUCCESS [0.313s]
sequence-biosql ....................................... SUCCESS [2.735s]
sequence-fasta ........................................ SUCCESS [0.324s]
sequence-blastxml ..................................... SUCCESS [3.339s]
sequencing ............................................ SUCCESS [5.637s]
phylo ................................................. SUCCESS [5.084s]
biosql ................................................ SUCCESS [6.344s]
gui ................................................... SUCCESS [6.552s]
biojava3-core ......................................... SUCCESS [8.477s]
biojava3-phylo ........................................ SUCCESS [3.596s]
biojava3-structure-gui ................................ SUCCESS [8.283s]
biojava3-alignment .................................... SUCCESS [6.002s]
biojava3-genome ....................................... SUCCESS [3.933s]
biojava3-protmod ...................................... SUCCESS [8:59.048s]
biojava3-ws ........................................... SUCCESS [1.367s]
------------------------------------------------------------------------
------------------------------------------------------------------------
BUILD SUCCESSFUL
------------------------------------------------------------------------
Total time: 19 minutes 33 seconds
Finished at: Mon Aug 09 13:48:00 PDT 2010
Final Memory: 116M/363M
------------------------------------------------------------------------

The long time apparently comes from fetching all these files. I tried the build last week on a different network after removing the faulty test as suggested by Scott and I had similar timing. This could be an issue with NetBeans in fact. If someone has experienced such long delays, it would be interesting to know under which conditions this occurred.

Thanks again,

George
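The figure of 16 min 32 s quoted earlier for "the structure modules" matches the sum of the biojava3-structure and biojava3-protmod entries in the summary above (reading it that way is an assumption). A small sketch of the arithmetic, with ReactorTimes and toSeconds as hypothetical names:

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Maven or BioJava.
class ReactorTimes {

    // Converts a reactor duration such as "52.822s" or "7:33.228s" to seconds.
    static double toSeconds(String text) {
        String t = text.trim();
        if (t.endsWith("s")) {
            t = t.substring(0, t.length() - 1);
        }
        double seconds = 0;
        for (String part : t.split(":")) {
            seconds = seconds * 60 + Double.parseDouble(part);
        }
        return seconds;
    }

    public static void main(String[] args) {
        // biojava3-structure + biojava3-protmod from the summary above
        double total = toSeconds("7:33.228s") + toSeconds("8:59.048s");
        long minutes = (long) (total / 60);
        System.out.printf(Locale.ROOT, "%d min %.3f s%n", minutes, total - minutes * 60);
    }
}
```

The two modules add up to roughly 992 seconds, so they account for most of the 19 minutes 33 seconds reported as total time.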
From andreas at sdsc.edu Mon Aug 9 18:48:18 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 9 Aug 2010 15:48:18 -0700
Subject: [Biojava-dev] build problem
In-Reply-To: <20100809222027.95361.qmail@mxw1102.verio-web.com>
References: <20100809222027.95361.qmail@mxw1102.verio-web.com>
Message-ID:

Seems your times are comparable with the build machine. The files are downloaded and stored in local temporary directories (as provided by System.getProperty("java.io.tmpdir")). If that tmp directory changes, the files will have to be downloaded again; otherwise they will be reused and the code runs quicker. Seems the VM is changing the tmp dir location frequently. Does anybody have a suggestion for how to define a more "stable" tmp dir location? Otherwise I will put all required test files into the test resources dir ...

Andreas
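One way to get a more "stable" location than java.io.tmpdir is to let the caller name the cache directory explicitly and fall back to a fixed folder under the user's home directory. The following is a minimal sketch of that idea, not what BioJava does: the class StableCacheDir, the system property example.pdb.cache and the folder .example-pdb-cache are all made-up names for illustration.

```java
import java.io.File;

// Hypothetical helper for illustration; not part of BioJava.
class StableCacheDir {

    // Made-up property name; a real project would pick its own.
    static final String PROPERTY = "example.pdb.cache";

    // Resolution order: explicit system property, then a fixed folder in the
    // user's home directory, then java.io.tmpdir as a last resort.
    static File resolve() {
        String configured = System.getProperty(PROPERTY);
        File dir;
        if (configured != null && !configured.trim().isEmpty()) {
            dir = new File(configured);
        } else {
            dir = new File(System.getProperty("user.home"), ".example-pdb-cache");
        }
        if (!dir.mkdirs() && !dir.isDirectory()) {
            dir = new File(System.getProperty("java.io.tmpdir"));
        }
        return dir;
    }

    // A file counts as cached only if it exists and is not empty, so an
    // interrupted download that left a zero-byte file is fetched again.
    static boolean isCached(File cacheDir, String fileName) {
        File f = new File(cacheDir, fileName);
        return f.isFile() && f.length() > 0;
    }

    public static void main(String[] args) {
        File dir = resolve(); // note: creates the directory if it is missing
        System.out.println("cache dir: " + dir.getAbsolutePath());
        System.out.println("pdb1hcy.ent.gz cached: " + isCached(dir, "pdb1hcy.ent.gz"));
    }
}
```

With this layout the location can be pinned per machine, for example with `java -Dexample.pdb.cache=/some/stable/path ...`, while the alternative mentioned in the message (shipping the files as test resources) avoids the network entirely.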
From gwaldon at geneinfinity.org Mon Aug 9 20:04:39 2010
From: gwaldon at geneinfinity.org (George Waldon)
Date: Mon, 09 Aug 2010 20:04:39 -0400
Subject: [Biojava-dev] build problem
Message-ID: <20100810000439.77852.qmail@mxw1102.verio-web.com>

I think you are right and these files are not downloaded again. Here are the tests that consume significant time; maybe you can figure out which process is slow:

Running org.biojava.bio.structure.align.fatcat.TestFlexibleRotationMatrices
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.145 sec
Running org.biojava.bio.structure.align.FlipAFPChainTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.688 sec
Running org.biojava.bio.structure.align.fatcat.TestOutputStrings
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.81 sec
Running org.biojava.bio.structure.align.fatcat.AFPChainSerialisationTest
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 131.016 sec
Running org.biojava3.protmod.structure.ProteinModificationParserTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 530.036 sec

George
>>>ja >>>>>va:179) >>>>>> at >>>>>java.io.FileOutputStream.(FileOutputStrea >m. >>>ja >>>>>va:131) >>>>>> at >>>>>org.biojava.bio.structure.io.PDBFileReader.down >lo >>>ad >>>>>PDB(PDBFileReader.java:430) >>>>>> ... >>>>>> >>>>>> Does anyone has a solution for this? I am >>>>>building from within NetBeans. >>>>>> >>>>>> Thanks, >>>>>> George >>>>>> >>>>> >>>>> >>>>> >>>>>-- >>>>>----------------------------------------------- >-- >>>-- >>>>>-------------------- >>>>>Dr. Andreas Prlic >>>>>Senior Scientist, RCSB PDB Protein Data Bank >>>>>University of California, San Diego >>>>>(+1) 858.246.0526 >>>>>----------------------------------------------- >-- >>>-- >>>>>-------------------- >>>> >>> >>> >>> >>>-- >>>------------------------------------------------- >-- >>>-------------------- >>>Dr. Andreas Prlic >>>Senior Scientist, RCSB PDB Protein Data Bank >>>University of California, San Diego >>>(+1) 858.246.0526 >>>------------------------------------------------- >-- >>>-------------------- >> > > > >-- >--------------------------------------------------- >-------------------- >Dr. Andreas Prlic >Senior Scientist, RCSB PDB Protein Data Bank >University of California, San Diego >(+1) 858.246.0526 >--------------------------------------------------- >-------------------- From andreas at sdsc.edu Tue Aug 10 22:24:08 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 10 Aug 2010 19:24:08 -0700 Subject: [Biojava-dev] biojava3 sequence tools Message-ID: Hi, just wondering if we have already a class that can accept any protein or DNA sequence as input and can return a Sequence object of the correct type ? 
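
A rough sketch of the kind of helper being asked about, in Java. This is not an existing BioJava class: the names SequenceKindGuesser, Kind and guessKind are made up for illustration, and the 0.85 cutoff is an assumption borrowed from the BioPerl logic quoted further down this thread. Any fixed cutoff will misclassify some inputs, for example a short peptide made up only of A, C, G and T.

```java
/** Hypothetical helper (not part of BioJava): guesses the alphabet of a raw sequence string. */
final class SequenceKindGuesser {

    enum Kind { DNA, RNA, PROTEIN, UNKNOWN }

    // Assumed cutoff: fraction of letters that must look like nucleotides.
    private static final double THRESHOLD = 0.85;

    static Kind guessKind(String sequence) {
        if (sequence == null) {
            return Kind.UNKNOWN;
        }
        int total = 0;  // letters seen (whitespace, digits and gap symbols are skipped)
        int acgtn = 0;  // A, C, G, T or N, in either case
        int u = 0;      // U, in either case
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (!Character.isLetter(c)) {
                continue;
            }
            total++;
            if ("ACGTN".indexOf(c) >= 0) {
                acgtn++;
            } else if (c == 'U') {
                u++;
            }
        }
        if (total == 0) {
            return Kind.UNKNOWN;
        }
        if ((double) acgtn / total > THRESHOLD) {
            return Kind.DNA;
        }
        if ((double) (acgtn + u) / total > THRESHOLD) {
            return Kind.RNA;
        }
        return Kind.PROTEIN;
    }

    public static void main(String[] args) {
        System.out.println(guessKind("ATGCGTACGTTAGC"));
        System.out.println(guessKind("AUGCGUACGUUAGC"));
        System.out.println(guessKind("MKVLAAGIVGLLLAQPSF"));
    }
}
```

A caller that only has to choose between a nucleotide and a protein search could treat DNA and RNA alike; what to do with UNKNOWN (empty input) is left to the caller.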

Andreas

From holland at eaglegenomics.com Wed Aug 11 00:05:55 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 11 Aug 2010 05:05:55 +0100
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To:
References:
Message-ID: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>

You mean an auto-detector that takes a String input, guesses based on
content what it is, and returns a Sequence object of the appropriate
type, being Protein or DNA etc.? Not that I know of. A bit hard too - if
all the letters in the String are a valid subset from two or more
alphabets (e.g. ATCG are all in the Protein alphabet as well as being
DNA), how do we know which one it is?

On 11 Aug 2010, at 03:24, Andreas Prlic wrote:

> Hi,
>
> just wondering if we have already a class that can accept any protein
> or DNA sequence as input and can return a Sequence object of the
> correct type ?
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com Wed Aug 11 00:46:32 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 11 Aug 2010 12:46:32 +0800
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>
Message-ID:

I think SeqIOTools had a method for this, possibly also available in
RichSequence.IOTools.

As Richard says, not guaranteed to work in all cases.

From ayates at ebi.ac.uk Wed Aug 11 04:26:31 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 11 Aug 2010 09:26:31 +0100
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To:
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>
Message-ID: <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>

Building a Sequence object which can contain AminoAcidCompound or
NucleotideCompound is easy; the return type makes this incredibly hard
since we'd have to return Sequence, which forces the user to start
casting to a more useful type. Every auto detector I've known gets it
wrong since they all apply arbitrary thresholds to decide the switch.

However if the need is there (which I'm sure there is for writing some
interfaces) something can be knocked up quickly I think.

--
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From andreas at sdsc.edu Wed Aug 11 11:58:15 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 11 Aug 2010 08:58:15 -0700
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To: <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com> <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>
Message-ID:

thanks for the replies. I was trying to see how to improve a web form
into which the user can paste any type of sequence and the server
selects the correct version of blast to run... I will probably check
what percentage of the sequence looks like nucleotides. It is unlikely
to find a longer protein sequence that consists only of ATCGs ...

Andreas

From SMarkel at accelrys.com Wed Aug 11 12:51:09 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Wed, 11 Aug 2010 09:51:09 -0700
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To:
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com> <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC97701C034BA42@EXCH1-COLO.accelrys.net>

Andreas,

You might want to look at the _guess_alphabet subroutine in BioPerl's
Bio::PrimarySeq module.

Here's the core logic.

   my $u = ($str =~ tr/Uu//);
   # The assumption here is that most of sequences comprised of mainly
   # ATGC, with some N, will be 'dna' despite the fact that N could
   # also be Asparagine
   my $atgc = ($str =~ tr/ATGCNatgcn//);

   if( ($atgc / $total) > 0.85 ) {
       $type = 'dna';
   } elsif( (($atgc + $u) / $total) > 0.85 ) {
       $type = 'rna';
   } else {
       $type = 'protein';
   }

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From andreas at sdsc.edu Wed Aug 11 13:58:14 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 11 Aug 2010 10:58:14 -0700
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC97701C034BA42@EXCH1-COLO.accelrys.net>
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com> <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk> <5ACBA19439E77B43A06F4CAB897EC97701C034BA42@EXCH1-COLO.accelrys.net>
Message-ID:

thanks, Scott, here are similar utility methods in Java ...

Andreas

    // Letters treated as nucleotides (upper case only, so lower-case input is not counted).
    protected static final String NUCLEOTIDE_LETTERS = "GCTAUX";

    // Returns the percentage (0-100) of characters that are in NUCLEOTIDE_LETTERS.
    public static int percentNucleotideSequence(String sequence) {
        if (sequence == null || sequence.length() == 0)
            return 0;

        int l = sequence.length();
        int n = 0;

        for (int i = 0; i < l; i++) {
            if (NUCLEOTIDE_LETTERS.indexOf(sequence.charAt(i)) < 0) {
                continue;
            }
            n++;
        }
        return (100 * n) / l;
    }

    // Returns true only if every character is in NUCLEOTIDE_LETTERS.
    public static boolean isNucleotideSequence(String sequence) {
        if (sequence == null || sequence.length() == 0)
            return false;

        int l = sequence.length();
        for (int i = 0; i < l; i++) {
            if (NUCLEOTIDE_LETTERS.indexOf(sequence.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

From andreas at sdsc.edu Wed Aug 11 18:49:28 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 11 Aug 2010 15:49:28 -0700
Subject: [Biojava-dev] build problem
In-Reply-To: <20100811224434.27093.qmail@mxw1102.verio-web.com>
References: <20100811224434.27093.qmail@mxw1102.verio-web.com>
Message-ID:

thanks, still working on making it even faster...

A

On Wed, Aug 11, 2010 at 3:44 PM, George Waldon wrote:
> Andreas:
>
> Just for your information, here are my build times from today. I am
> down to 9 min 21 s. This has improved a lot. Thank you very much.
>
> George
>
> ------------------------------------------------------------------------
> biojava ............................................... SUCCESS [3.938s]
> bytecode .............................................. SUCCESS [6.657s]
> core .................................................. SUCCESS [47.960s]
> alignment ............................................. SUCCESS [3.024s]
> blast ................................................. SUCCESS [24.915s]
> biojava3-structure .................................... SUCCESS [5:09.755s]
> das ................................................... SUCCESS [40.210s]
> sequence .............................................. SUCCESS [0.126s]
> sequence-core ......................................... SUCCESS [3.359s]
> sequence-rna .......................................... SUCCESS [0.364s]
> sequence-biosql ....................................... SUCCESS [1.763s]
> sequence-fasta ........................................ SUCCESS [0.324s]
> sequence-blastxml ..................................... SUCCESS [2.905s]
> sequencing ............................................ SUCCESS [5.447s]
> phylo ................................................. SUCCESS [4.051s]
> biosql ................................................ SUCCESS [4.305s]
> gui ................................................... SUCCESS [2.787s]
> biojava3-core ......................................... SUCCESS [6.502s]
> biojava3-phylo ........................................ SUCCESS [3.124s]
> biojava3-structure-gui ................................ SUCCESS [5.339s]
> biojava3-alignment .................................... SUCCESS [4.594s]
> biojava3-genome ....................................... SUCCESS [2.657s]
> biojava3-protmod ...................................... SUCCESS [1:13.620s]
> biojava3-ws ........................................... SUCCESS [2.078s]
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> BUILD SUCCESSFUL
> ------------------------------------------------------------------------
> Total time: 9 minutes 21 seconds
> Finished at: Wed Aug 11 15:29:57 PDT 2010
> Final Memory: 83M/370M
>
>>----- ------- Original Message ------- -----
>>From: Andreas Prlic
>>To: George Waldon
>>Sent: Mon, 9 Aug 2010 15:48:18
>>
>>Seems your times are comparable with the build machine. The files are
>>downloaded and stored in local temporary directories (as provided by
>>System.getProperty("java.io.tmpdir")). If that tmp directory changes,
>>the files will have to be downloaded again; otherwise they will be
>>re-used and the code runs quicker. It seems the VM is changing the tmp
>>dir location frequently. Does anybody have a suggestion for how to
>>define a more "stable" tmp dir location?
>>Otherwise I will put all required test files into the test resources
>>dir ...
>>
>>Andreas
>>
>>On Mon, Aug 9, 2010 at 3:20 PM, George Waldon wrote:
>>> Here is my summary and some of the timings are pretty good (simple HP
>>> laptop with a dual core):
>>>
>>> ------------------------------------------------------------------------
>>> Reactor Summary:
>>> ------------------------------------------------------------------------
>>> biojava ............................................... SUCCESS [3.820s]
>>> bytecode .............................................. SUCCESS [6.022s]
>>> core .................................................. SUCCESS [52.822s]
>>> alignment ............................................. SUCCESS [4.332s]
>>> blast ................................................. SUCCESS [27.453s]
>>> biojava3-structure .................................... SUCCESS [7:33.228s]
>>> das ................................................... SUCCESS [18.456s]
>>> sequence .............................................. SUCCESS [0.122s]
>>> sequence-core ......................................... SUCCESS [3.974s]
>>> sequence-rna .......................................... SUCCESS [0.313s]
>>> sequence-biosql ....................................... SUCCESS [2.735s]
>>> sequence-fasta ........................................ SUCCESS [0.324s]
>>> sequence-blastxml ..................................... SUCCESS [3.339s]
>>> sequencing ............................................ SUCCESS [5.637s]
>>> phylo ................................................. SUCCESS [5.084s]
>>> biosql ................................................ SUCCESS [6.344s]
>>> gui ................................................... SUCCESS [6.552s]
>>> biojava3-core ......................................... SUCCESS [8.477s]
>>> biojava3-phylo ........................................ SUCCESS [3.596s]
>>> biojava3-structure-gui ................................ SUCCESS [8.283s]
>>> biojava3-alignment .................................... SUCCESS [6.002s]
>>> biojava3-genome ....................................... SUCCESS [3.933s]
>>> biojava3-protmod ...................................... SUCCESS [8:59.048s]
>>> biojava3-ws ........................................... SUCCESS [1.367s]
>>> ------------------------------------------------------------------------
>>> ------------------------------------------------------------------------
>>> BUILD SUCCESSFUL
>>> ------------------------------------------------------------------------
>>> Total time: 19 minutes 33 seconds
>>> Finished at: Mon Aug 09 13:48:00 PDT 2010
>>> Final Memory: 116M/363M
>>> ------------------------------------------------------------------------
>>>
>>> The long time comes apparently from fetching all these files. I tried
>>> the build last week on a different network after removing the faulty
>>> test as suggested by Scott and I had similar timing. This could be an
>>> issue with NetBeans in fact. If someone has experienced such long
>>> delays, it would be interesting to know in which conditions this
>>> occurred.
>>>
>>> Thanks again,
>>>
>>> George
>>>
>>>>----- ------- Original Message ------- -----
>>>>From: Andreas Prlic
>>>>To: George Waldon
>>>>Sent: Mon, 9 Aug 2010 14:27:27
>>>>
>>>>That seems like a very long time... on the automated build server the
>>>>times look like below:
>>>>
>>>>If the structure module takes so much time, I wonder if you are behind
>>>>a very slow network? Some of the junit tests fetch PDB files from a
>>>>public ftp server and I wonder if there is a networking issue. Having
>>>>said this, the two slowest modules are structure and protmod. We will
>>>>try to cut down the time spent on those tests...
>>>>
>>>>Andreas
>>>>
>>>>[INFO] biojava ............................................... SUCCESS [21.430s]
>>>>[INFO] bytecode .............................................. SUCCESS [44.778s]
>>>>[INFO] core .................................................. SUCCESS [4:54.691s]
>>>>[INFO] alignment ............................................. SUCCESS [48.048s]
>>>>[INFO] blast ................................................. SUCCESS [1:20.263s]
>>>>[INFO] biojava3-structure .................................... SUCCESS [8:12.412s]
>>>>[INFO] das ................................................... SUCCESS [1:13.103s]
>>>>[INFO] sequence .............................................. SUCCESS [18.176s]
>>>>[INFO] sequence-core ......................................... SUCCESS [47.424s]
>>>>[INFO] sequence-rna .......................................... SUCCESS [28.770s]
>>>>[INFO] sequence-biosql ....................................... SUCCESS [47.032s]
>>>>[INFO] sequence-fasta ........................................ SUCCESS [29.075s]
>>>>[INFO] sequence-blastxml ..................................... SUCCESS [44.392s]
>>>>[INFO] sequencing ............................................ SUCCESS [49.520s]
>>>>[INFO] phylo ................................................. SUCCESS [48.459s]
>>>>[INFO] biosql ................................................ SUCCESS [57.503s]
>>>>[INFO] gui ................................................... SUCCESS [58.610s]
>>>>[INFO] biojava3-core ......................................... SUCCESS [1:05.398s]
>>>>[INFO] biojava3-phylo ........................................ SUCCESS [46.623s]
>>>>[INFO] biojava3-structure-gui ................................ SUCCESS [1:06.737s]
>>>>[INFO] biojava3-alignment .................................... SUCCESS [54.682s]
>>>>[INFO] biojava3-genome ....................................... SUCCESS [49.844s]
>>>>[INFO] biojava3-protmod ...................................... SUCCESS [10:05.171s]
>>>>[INFO] biojava3-ws ........................................... SUCCESS [44.379s]
>>>>
>>>>On Mon, Aug 9, 2010 at 1:52 PM, George Waldon wrote:
>>>>> Thanks to all for the fixing.
>>>>>
>>>>> The build took 19 minutes and 33 seconds, of which 16 min and 32 s
>>>>> were for the structure modules! This sounds a bit long to me. Is
>>>>> this expected?
>>>>>
>>>>> George
>>>>>
>>>>>>----- ------- Original Message ------- -----
>>>>>>From: Andreas Prlic
>>>>>>To: George Waldon
>>>>>>Sent: Mon, 9 Aug 2010 12:35:56
>>>>>>
>>>>>>If you update the class, this should be fixed now. Was a problem
>>>>>>with a hard coded /tmp path...
>>>>>>
>>>>>>Andreas
>>>>>>
>>>>>>On Mon, Aug 9, 2010 at 12:26 PM, George Waldon wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am getting the following failures in
>>>>>>> org.biojava.bio.structure.align.benchmark.MultipleAlignmentTest:
>>>>>>>
>>>>>>> Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1hcy.ent.gz
>>>>>>> writing to \tmp\hc\pdb1hcy.ent.gz
>>>>>>> java.io.FileNotFoundException: \tmp\hc\pdb1hcy.ent.gz (The system cannot find the path specified)
>>>>>>> ...
>>>>>>>
>>>>>>> and
>>>>>>>
>>>>>>> Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1nls.ent.gz
>>>>>>> writing to \tmp\nl\pdb1nls.ent.gz
>>>>>>> java.io.FileNotFoundException: \tmp\nl\pdb1nls.ent.gz (The system cannot find the path specified)
>>>>>>> at java.io.FileOutputStream.open(Native Method)
>>>>>>> at
>>>>>>java.io.FileOutputStream.(FileOutputStrea
>>m.
>>>>ja >>>>>>va:131) >>>>>>> at >>>>>>org.biojava.bio.structure.io.PDBFileReader.down >>lo >>>>ad >>>>>>PDB(PDBFileReader.java:430) >>>>>>> ... >>>>>>> >>>>>>> Does anyone has a solution for this? I am >>>>>>building from within NetBeans. >>>>>>> >>>>>>> Thanks, >>>>>>> George >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>>-- >>>>>>----------------------------------------------- >>-- >>>>-- >>>>>>-------------------- >>>>>>Dr. Andreas Prlic >>>>>>Senior Scientist, RCSB PDB Protein Data Bank >>>>>>University of California, San Diego >>>>>>(+1) 858.246.0526 >>>>>>----------------------------------------------- >>-- >>>>-- >>>>>>-------------------- >>>>> >>>> >>>> >>>> >>>>-- >>>>------------------------------------------------- >>-- >>>>-------------------- >>>>Dr. Andreas Prlic >>>>Senior Scientist, RCSB PDB Protein Data Bank >>>>University of California, San Diego >>>>(+1) 858.246.0526 >>>>------------------------------------------------- >>-- >>>>-------------------- >>> >> >> >> >>-- >>--------------------------------------------------- >>-------------------- >>Dr. Andreas Prlic >>Senior Scientist, RCSB PDB Protein Data Bank >>University of California, San Diego >>(+1) 858.246.0526 >>--------------------------------------------------- >>-------------------- > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From bugzilla-daemon at portal.open-bio.org Thu Aug 12 01:12:22 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 01:12:22 -0400 Subject: [Biojava-dev] [Bug 2565] Cannot parse uniprot P02768 In-Reply-To: Message-ID: <201008120512.o7C5CMAV026392@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2565 gwaldon at geneinfinity.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #1 from gwaldon at geneinfinity.org 2010-08-12 01:12 EST ------- Cannot reproduce error with today's build. Must have been fixed or record has changed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 12 01:28:35 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 01:28:35 -0400 Subject: [Biojava-dev] [Bug 2541] Exception is thrown when trying to parse a valid GenBank file In-Reply-To: Message-ID: <201008120528.o7C5SZ1a026843@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2541 gwaldon at geneinfinity.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |INVALID ------- Comment #3 from gwaldon at geneinfinity.org 2010-08-12 01:28 EST ------- As suggested by Peter, adding the sequence termination // solves the problem. This was most likely invalid. Please reopen if necessary. 

From andreas at sdsc.edu Fri Aug 13 21:30:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 13 Aug 2010 18:30:45 -0700
Subject: [Biojava-dev] biojava-structure now depending on biojava3-core and biojava3-alignment
Message-ID:

Hi,

I just committed a major update to the biojava-structure modules. They
no longer depend on anything biojava 1.7 related, but only on
biojava3-core and biojava3-alignment.

That's one step closer to getting ready for a new 3.0 release...

Andreas

From HWillis at scripps.edu Fri Aug 13 23:22:39 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 13 Aug 2010 23:22:39 -0400
Subject: [Biojava-dev] biojava-structure now depending on biojava3-core and biojava3-alignment
Message-ID:

Andreas

I figured you were burning some brain cells before going on vacation.
The only major core element missing is a good design for features. I
want it to be transparent and descriptive without adding a ton of
methods. I will check out the latest and do some testing.

Scooter

From bugzilla-daemon at portal.open-bio.org Tue Aug 17 12:44:46 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Aug 2010 12:44:46 -0400
Subject: [Biojava-dev] [Bug 3132] New: SITE records in PDBFileReader
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3132

           Summary: SITE records in PDBFileReader
           Product: BioJava
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: structure
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: darnells at dnastar.com

Feature request from the BioJava community:

http://lists.open-bio.org/pipermail/biojava-l/2010-August/007254.html
I am interested in parsing SITE records from a PDB file. [...] would it
be possible to add this capability to PDBFileReader and the Structure
class?

http://lists.open-bio.org/pipermail/biojava-l/2010-August/007260.html
REMARK 800 provides a very useful SITE_DESCRIPTION for each
SITE_IDENTIFIER code in use in the SITE records. Could the site name
also be associated with the site identifier and residues? There is
precedent for parsing REMARK records in BioJava (e.g. experiment type,
resolution), but this is a special case where REMARK 800 and SITE
records are dependent on one another and physically separated in the
header.

Reply from Andreas Prlic:

http://lists.open-bio.org/pipermail/biojava-l/2010-August/007257.html
- Take a look at PDBFileParser.java and at
  http://www.wwpdb.org/documentation/format32/sect7.html
- It needs a new Handler method for the Site records that builds up the
  data containers.
- Create a new bean that will contain the data for the SITE record
- Instead of having fields for insertion code, residue nr and chain IDs,
  you can use the new PDBResidueNumber.java class to group this together.
- Add a get/set method for the Site beans to the Structure class
- Create a junit test that makes sure the parsing works ok.
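
The handler-plus-bean steps above can be sketched roughly as follows.
This is only an illustration, assuming the fixed-column SITE layout of
the PDB format 3.2 document linked above (site name in columns 12-14,
residue count in 16-17, up to four residue slots of 11 columns each
starting at column 19). The names SiteRecordSketch, SiteLine and
SiteResidue are made up for this example and are not BioJava classes;
in BioJava the residue fields would presumably be grouped with
PDBResidueNumber instead, whose API is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of parsing one PDB SITE line; not BioJava code. */
final class SiteRecordSketch {

    /** One residue slot of a SITE line (illustrative bean). */
    static final class SiteResidue {
        final String resName;
        final char chainId;
        final int seqNum;
        final char insCode;

        SiteResidue(String resName, char chainId, int seqNum, char insCode) {
            this.resName = resName;
            this.chainId = chainId;
            this.seqNum = seqNum;
            this.insCode = insCode;
        }
    }

    /** The parsed content of a single SITE line (illustrative bean). */
    static final class SiteLine {
        final String siteId;
        final int declaredCount; // residues in the whole site, across all of its lines
        final List<SiteResidue> residues;

        SiteLine(String siteId, int declaredCount, List<SiteResidue> residues) {
            this.siteId = siteId;
            this.declaredCount = declaredCount;
            this.residues = residues;
        }
    }

    static SiteLine parse(String line) {
        if (!line.startsWith("SITE")) {
            throw new IllegalArgumentException("Not a SITE record: " + line);
        }
        // Pad to 80 columns so short lines can be sliced safely.
        String padded = String.format("%-80s", line);
        String siteId = padded.substring(11, 14).trim();               // columns 12-14
        int count = Integer.parseInt(padded.substring(15, 17).trim()); // columns 16-17
        List<SiteResidue> residues = new ArrayList<>();
        for (int slot = 0; slot < 4; slot++) {
            int start = 18 + 11 * slot;                                // columns 19, 30, 41, 52
            String resName = padded.substring(start, start + 3).trim();
            if (resName.isEmpty()) {
                continue;                                              // unused slot
            }
            char chain = padded.charAt(start + 4);
            int seq = Integer.parseInt(padded.substring(start + 5, start + 9).trim());
            char insCode = padded.charAt(start + 9);
            residues.add(new SiteResidue(resName, chain, seq, insCode));
        }
        return new SiteLine(siteId, count, residues);
    }

    public static void main(String[] args) {
        SiteLine s = parse("SITE     1 AC1  3 HIS A  94  HIS A  96  HIS A 119");
        System.out.println(s.siteId + " declares " + s.declaredCount
                + " residues, " + s.residues.size() + " on this line");
    }
}
```

A real handler would also have to merge continuation lines that share a
site name and attach the matching REMARK 800 description, which this
sketch leaves out.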

From andreas at sdsc.edu Wed Aug 18 14:26:23 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 18 Aug 2010 11:26:23 -0700
Subject: [Biojava-dev] Last week of Google Summer of Code
Message-ID:

Hi,

This is the last week of this year's Google Summer of Code project and
I am happy to announce that our two students Mark Chapman and Jianjiong
Gao did an amazing job on their two projects "All Java Multiple
Sequence Alignment" (MSA) and "Identification and Classification of
Posttranslational Modification of Proteins" (PTM).

For Multiple Sequence Alignments we now have a flexible and
multi-threaded MSA implementation that works in linear space and that,
as an option, allows the users to define anchors that are used in the
build up of the multiple alignment. The code is available as part of
the new biojava3-alignment module.

The Posttranslational Modification module (biojava3-protmod) can detect
three different types of protein modifications in protein structures.
It comes with an XML file & Java data structures to store information
about different types of protein modifications, and contains entries
from RESID, PDBCC and PSI-MOD. There is also a visualisation component
to display cross linked PTM on a sequence viewer.

Both Mark and Jianjiong have expressed their interest in maintaining
and further developing their modules and I am looking forward to
interacting more with them in the future.

I want to thank the Mentors and Co-Mentors Peter Rose, Kyle Ellrott and
Scooter Willis for their help and guidance for the projects, without
them this would not have been possible. Thanks also to Robert Buels and
the Open Bioinformatics Foundation for organizing our applications for
GSoC and last, but not least, Google for sponsoring this Summer of Code.

Happy BioJava-ing,

Andreas

From darnells at dnastar.com Mon Aug 23 12:21:41 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Mon, 23 Aug 2010 11:21:41 -0500
Subject: [Biojava-dev] Structures test failures
Message-ID:

Greetings,

I am attempting to build biojava from the anonymous svn server at
github; I cannot even connect to the anonymous svn server at open-bio.
I am unable to package/install the structure module and I suspect it is
caused by two failures that occur in the unit tests:

Failed tests:
  testOldSecOutput(org.biojava.bio.structure.TestSECalignment)
  testParsePairs(org.biojava.bio.structure.align.TestAlignDBSearchPairs)

Tests run: 61, Failures: 2, Errors: 0, Skipped: 0

The build log has no indication that the structure module was packaged
or installed.

I have yet to configure Eclipse to build biojava, so this is how I
attempted to build it on the command line:

$ svn co http://svn.github.com/biojava/biojava.git ./biojava
$ cd biojava
$ mvn clean install > biojava.log& tail -f biojava.log
$ cd ~/.m2/repository/org/biojava
$ ls
% alignment
% biojava
% biojava3-alignment
% biojava3-core
% biojava3-phylo
% blast
% bytecode
% core

I am new to maven, so it is likely I am overlooking something obvious.
I have reproduced this outcome on OSX 10.6.4 and Ubuntu 10.04. I would
appreciate any suggestions that the dev list might have.

Best regards,
Steve Darnell

From jacobsen at ebi.ac.uk Tue Aug 24 08:41:12 2010
From: jacobsen at ebi.ac.uk (Jules Jacobsen)
Date: Tue, 24 Aug 2010 13:41:12 +0100
Subject: [Biojava-dev] Structures test failures
In-Reply-To:
References:
Message-ID: <4C73BDE8.4050708@ebi.ac.uk>

Hi Steve,

Unless you're a staunch Eclipse user I can recommend Netbeans for maven
as it works very well with it. One issue I've had is that Netbeans 6.9
is generally slower and more cumbersome than 6.8; worse still, 6.9.1
seems incapable of connecting to the repository at all. So stick with
6.8 and it should be fine.

I can confirm that the most recent dev version of biojava-structure
does not fail any tests, although I just tried checking out the
read-only public version from
svn://code.open-bio.org/biojava/biojava-live/trunk and encountered a
'rev-props' error and now I'm unable to connect at all...

Your error isn't likely to be maven related - if everything else ran
fine it's probably just that you have an out-of-date version of the
source, as the structure module is independent of the other modules,
but I can't tell for sure without seeing the error messages produced by
the failed tests.

Regards,

Jules

From HWillis at scripps.edu Tue Aug 24 09:08:11 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 24 Aug 2010 09:08:11 -0400
Subject: [Biojava-dev] Structures test failures
In-Reply-To: <4C73BDE8.4050708@ebi.ac.uk>
References: <4C73BDE8.4050708@ebi.ac.uk>
Message-ID: <45B9BED2-4713-455B-A660-A855E7CBE39C@scripps.edu>

Jules

I am not having any problems with Netbeans 6.9 with the latest patches.
I do like the improved debug GUI in 6.9 for inspecting collections. I
haven't done a fresh install of 6.9.1.

Scooter

From darnells at dnastar.com Tue Aug 24 16:19:57 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Tue, 24 Aug 2010 15:19:57 -0500
Subject: [Biojava-dev] Structures test failures
In-Reply-To: <4C73BDE8.4050708@ebi.ac.uk>
References: <4C73BDE8.4050708@ebi.ac.uk>
Message-ID:

Jules,

Thanks for the suggestion. In the interim, I was able to work around my
Eclipse and svn problems. It's not a fully integrated solution, but it
works.

The Subclipse svn plug-in from Tigris would fail during checkout with
an exception stating "RA layer request failed svn: REPORT of '[...]'
200 OK" for both Windows 7 and OSX 10.6. For Windows, there are reports
of the Windows indexing service, antivirus scanners, or the Subclipse
JavaHL adapter causing svn problems. Resolving these issues did not
help.

In the end, I used a git client on Windows to clone the biojava project
from github and then imported it into Eclipse as an existing Maven
project with the m2eclipse plug-in. I was able to successfully build
and install biojava3-structure and its dependencies without test
failures.

To the list,

If anyone has had similar problems setting up Eclipse, I would be
grateful to hear how you resolved the problem. If not, my solution is
good enough for now.

Regards,
Steve

From bugzilla-daemon at portal.open-bio.org Mon Aug 30 13:44:56 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Aug 2010 13:44:56 -0400
Subject: [Biojava-dev] [Bug 3132] SITE records in PDBFileReader
In-Reply-To:
Message-ID: <201008301744.o7UHiu26023875@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3132

amr_alhossary at hotmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

From bugzilla-daemon at portal.open-bio.org Mon Aug 30 14:22:45 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 30 Aug 2010 14:22:45 -0400
Subject: [Biojava-dev] [Bug 3132] SITE records in PDBFileReader
In-Reply-To:
Message-ID: <201008301822.o7UIMjMK025040@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=3132

amr_alhossary at hotmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

------- Comment #1 from amr_alhossary at hotmail.com 2010-08-30 14:22 EST -------
Done! SITE records are now parsed.

From deniz.koellhofer at cambia.org Tue Aug 31 02:46:30 2010
From: deniz.koellhofer at cambia.org (Deniz Koellhofer)
Date: Tue, 31 Aug 2010 16:46:30 +1000
Subject: [Biojava-dev] biojava3 BLAST parser
Message-ID:

Hi,

I wanted to find out the current state of blast parsing efforts in
biojava3 - especially for ncbi blastxml output?

I had a quick look and found some DOM based code fragments in
org.biojava3.genome.query.BlastXMLQuery. Is there already anybody
working on a more comprehensive SAX parser?

The biojava1.7.1 blastxml parser seems to work fine, however some of
the tags in NCBI-BLASTN 2.2.23+ output like Hsp_midline,
BlastOutput_param don't seem to get parsed properly in
org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.

Cheers,
Deniz

-- 
Deniz Koellhofer
Cambia
Initiative for Open Innovation (IOI)
Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia

From HWillis at scripps.edu Tue Aug 31 06:43:00 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 31 Aug 2010 06:43:00 -0400
Subject: [Biojava-dev] biojava3 BLAST parser
In-Reply-To:
References:
Message-ID:

Deniz

Can you provide some requirements regarding parsing the Blast XML? I
tend to use XPATH and the DOM object to get to the data elements of
interest, so you already have the ability to load the Blast XML and
work with the data. The difficulty of "parsing" is not an issue with
XML. The BlastXMLQuery is an example of searching the Blast XML to get
results. Are you wanting the XML elements translated to Java classes?

Thanks

Scooter

From deniz.koellhofer at cambia.org Tue Aug 31 08:49:21 2010
From: deniz.koellhofer at cambia.org (Deniz Koellhofer)
Date: Tue, 31 Aug 2010 22:49:21 +1000
Subject: [Biojava-dev] biojava3 BLAST parser
In-Reply-To:
References:
Message-ID:

Hi Scooter,

Thanks for the reply. I guess the BlastXMLQuery is a good example to
show how to quickly extract information from a BLAST result.

But in my opinion biojava3 should also have a Blast parser that
generates java beans containing the complete Blast result set - similar
to what biojava1.7.1 was doing. So yeah, I'm after translating the XML
elements to Java classes.

Would something like that fit into one of the biojava3 modules?
homology, I/O?

Thanks,
Deniz

-- 
Deniz Koellhofer
Cambia
Initiative for Open Innovation (IOI)
Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia

From HWillis at scripps.edu Tue Aug 31 10:11:30 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 31 Aug 2010 10:11:30 -0400
Subject: [Biojava-dev] biojava3 BLAST parser
In-Reply-To:
References:
Message-ID: <47FD5948-0439-45C6-A1AB-22E7CC8D17A6@scripps.edu>

Deniz

It would be great to formalize the XML blast results as Java classes.
Do you have any interest in taking on the project? Capturing the blast
alignment using the new alignment classes would be a very nice feature.
I like using XPATH as the query language to select for hits of
interest, which should allow for a SAX based approach to minimize the
impact of very large XML files.
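
The XPATH-over-DOM style mentioned in this thread might look roughly
like the following. This is a minimal sketch using only the JDK's
javax.xml APIs, not the BlastXMLQuery class itself; the class name
BlastXPathSketch, the method hitIdsBelow and the two-hit sample document
are invented for illustration, while the element names (Hit, Hit_id,
Hit_hsps, Hsp, Hsp_evalue) are those of NCBI BLAST XML output. The
e-value is compared in Java rather than inside the XPath expression
because XPath 1.0 number conversion does not accept exponent notation
such as 1e-50.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical sketch: select BLAST hits from a DOM with XPath. */
final class BlastXPathSketch {

    /** Tiny made-up document that mimics the shape of BLAST XML output. */
    static final String SAMPLE =
          "<BlastOutput><BlastOutput_iterations><Iteration><Iteration_hits>"
        + "<Hit><Hit_id>gi|1|demo</Hit_id><Hit_hsps><Hsp>"
        + "<Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit_hsps></Hit>"
        + "<Hit><Hit_id>gi|2|demo</Hit_id><Hit_hsps><Hsp>"
        + "<Hsp_evalue>0.5</Hsp_evalue></Hsp></Hit_hsps></Hit>"
        + "</Iteration_hits></Iteration></BlastOutput_iterations></BlastOutput>";

    /** Returns the Hit_id of every hit whose first HSP has e-value <= maxEvalue. */
    static List<String> hitIdsBelow(String xml, double maxEvalue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate("//Hit", doc, XPathConstants.NODESET);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Node hit = hits.item(i);
                // Relative expressions are evaluated against the current Hit node.
                String id = xpath.evaluate("Hit_id", hit);
                String evalueText = xpath.evaluate("Hit_hsps/Hsp[1]/Hsp_evalue", hit);
                if (Double.parseDouble(evalueText) <= maxEvalue) {
                    ids.add(id);
                }
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("Could not query BLAST XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hitIdsBelow(SAMPLE, 1e-5));
    }
}
```

Because the whole document is loaded into a DOM first, this approach
does not address the large-file concern; that is where a SAX or other
streaming reader would come in.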
XPATH and SAX does appear to have some constraints (http://stackoverflow.com/questions/1863250/is-it-there-any-xpath-processor-for-sax-model) Probably makes sense to have a Blast module that would depend on core and alignment. Thanks Scooter On Aug 31, 2010, at 8:49 AM, Deniz Koellhofer wrote: Hi Scooter, Thanks for the reply. I guess the BlastXMLQuery is a good example to show how to quickly extract information from a BLAST result. But in my opinion biojava3 should alo have a Blast parser that generates java beans containing the complete Blast result set - similar to what biojava1.7.1 was doing. So yeah, I'm after translating the XML elements to Java classes. Would something like that fit into one of the biojava3 modules? homology, I/O? Thanks, Deniz On Tue, Aug 31, 2010 at 8:43 PM, Scooter Willis > wrote: Deniz Can you provide some requirements regarding parsing the Blast XML. I tend to use XPATH and the DOM object to get to the data elements of interest so you already have the ability to load the Blast XML and work with the data. The difficulty of "parsing" is not an issue with XML. The BlastXMLQuery is an example of searching the Blast XML to get results. Are you wanting the XML elements translated to Java classes? Thanks Scooter On Aug 31, 2010, at 2:46 AM, Deniz Koellhofer wrote: > Hi, > > I wanted to find out the current state of blast parsing efforts in biojava3 > - especially for ncbi blastxml output? > > I had a quick look and found some DOM based code fragments > in org.biojava3.genome.query.BlastXMLQuery. Is there already anybody working > on a more comprehensive SAX parser? > > The biojava1.7.1 blastxml parser seems to work fine, however some of the > tags in NCBI-BLASTN 2.2.23+ output like Hsp_midline, BlastOutput_param don't > seem to get parsed properly > in org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade. 
> > Cheers, > Deniz > > -- > Deniz Koellhofer > Cambia > Initiative for Open Innovation (IOI) > Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev -- -- Deniz Koellhofer Cambia Initiative for Open Innovation (IOI) Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia From sheoran143 at gmail.com Thu Aug 19 20:45:29 2010 From: sheoran143 at gmail.com (Deepak Sheoran) Date: Fri, 20 Aug 2010 00:45:29 -0000 Subject: [Biojava-dev] Required Correction in GenbankLocationParser class Message-ID: <4C6DD03C.1080909@gmail.com> There is a problem with the GenbankLocationParser class: it does not process the GenBank record with Accession: M32882. The parser fails at the following lines in the GenBank record: gene join((8298.8300)..10206,1..855) /gene="bcn" mRNA join((8298.8300)..10206,1..855) /gene="bcn" /note="alternative transcript" The exception stack trace is as follows: Could not understand position: 10206,1..855 org.biojava.bio.seq.io.ParseException: Could not understand position: 10206,1..855 at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:277) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:244) at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131) I investigated the matter and found the defect in the regular expression named "gp" in the GenbankLocationParser class. The error can be fixed by applying the attached patch. For testing, I have written a method which shows that the parser can now understand all the possible combinations of locations.
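The message "Could not understand position: 10206,1..855" suggests that the location string was cut at the wrong place, so that a comma belonging to join(...) ended up inside a single position. The sketch below is not the attached patch and not BioJava code. It is a hypothetical helper (LocationSplitSketch) that only shows one way to separate the members of a join at commas that are not nested inside inner parentheses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; this is not the attached patch.
class LocationSplitSketch {

    /** Returns the text between the outermost parentheses, e.g. of "join(...)". */
    static String arguments(String location) {
        int open = location.indexOf('(');
        int close = location.lastIndexOf(')');
        if (open < 0 || close < open) {
            throw new IllegalArgumentException("No operator arguments in: " + location);
        }
        return location.substring(open + 1, close);
    }

    /** Splits on commas that sit at parenthesis depth zero. */
    static List<String> splitTopLevel(String args) {
        List<String> parts = new ArrayList<>();
        int depth = 0;
        int start = 0;
        for (int i = 0; i < args.length(); i++) {
            char c = args.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(args.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(args.substring(start)); // last member has no trailing comma
        return parts;
    }
}
```

For the feature line quoted above, splitTopLevel(arguments("join((8298.8300)..10206,1..855)")) gives the two members "(8298.8300)..10206" and "1..855", each of which can then be parsed as a range on its own.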
This test class is also attached so that you can test my patch before and after its application. I don't have access to svn so please apply this patch for me, and let me know if you approve this patch or not. Thanks Deepak Sheoran -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: GenbankLocationParser.patch URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: LocationParserTest.java URL: From gwaldon at geneinfinity.org Mon Aug 9 19:26:28 2010 From: gwaldon at geneinfinity.org (George Waldon) Date: Mon, 09 Aug 2010 15:26:28 -0400 Subject: [Biojava-dev] build problem Message-ID: <20100809192628.24899.qmail@mxw1102.verio-web.com> Hi, I am getting the following failures in org.biojava.bio.structure.align.benchmark.MultipleAlignmentTest: Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1hcy.ent.gz writing to \tmp\hc\pdb1hcy.ent.gz java.io.FileNotFoundException: \tmp\hc\pdb1hcy.ent.gz (The system cannot find the path specified) ... and Fetching ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb1nls.ent.gz writing to \tmp\nl\pdb1nls.ent.gz java.io.FileNotFoundException: \tmp\nl\pdb1nls.ent.gz (The system cannot find the path specified) at java.io.FileOutputStream.open(Native Method) at java.io.FileOutputStream.(FileOutputStream.java:179) at java.io.FileOutputStream.(FileOutputStream.java:131) at org.biojava.bio.structure.io.PDBFileReader.downloadPDB(PDBFileReader.java:430) ... Does anyone has a solution for this? I am building from within NetBeans.
Thanks, George From andreas at sdsc.edu Mon Aug 9 19:35:56 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 9 Aug 2010 12:35:56 -0700 Subject: [Biojava-dev] build problem In-Reply-To: <20100809192628.24899.qmail@mxw1102.verio-web.com> References: <20100809192628.24899.qmail@mxw1102.verio-web.com> Message-ID: If you update the class, this should be fixed now. Was a problem with a hard coded /tmp path... Andreas -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From HWillis at scripps.edu Mon Aug 9 19:33:36 2010 From: HWillis at scripps.edu (Scooter Willis) Date: Mon, 9 Aug 2010 15:33:36 -0400 Subject: [Biojava-dev] build problem In-Reply-To: <20100809192628.24899.qmail@mxw1102.verio-web.com> References: <20100809192628.24899.qmail@mxw1102.verio-web.com> Message-ID: George Try creating the directory \tmp\hc and \tmp\nl and see if that fixes the problem. Not sure if the test cases build the directory structure for copying files. If that doesn't work then Andreas will need to figure out the problem. You can comment out the test case if you want a quick work around. Thanks Scooter From gwaldon at geneinfinity.org Mon Aug 9 20:52:39 2010 From: gwaldon at geneinfinity.org (George Waldon) Date: Mon, 09 Aug 2010 16:52:39 -0400 Subject: [Biojava-dev] build problem Message-ID: <20100809205239.19040.qmail@mxw1102.verio-web.com> Thanks to all for the fixing. The build took 19 minutes and 33 seconds, of which 16 min and 32 s were for the structure modules! This sounds a bit long to me. Is-this expected? George From andreas at sdsc.edu Mon Aug 9 21:27:27 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 9 Aug 2010 14:27:27 -0700 Subject: [Biojava-dev] build problem In-Reply-To: <20100809205239.19040.qmail@mxw1102.verio-web.com> References: <20100809205239.19040.qmail@mxw1102.verio-web.com> Message-ID: That seems like a very long time... on the automated build server the times look like below: If the structure module takes so much time, I wonder if you are behind a very slow network? Some of the junit tests fetch PDB files from a public ftp server and I wonder if there is a networking issue. Having said this, the two slowest modules are structure and protmod.
We will try to cut down the time spent on those tests... Andreas INFO] biojava ............................................... SUCCESS [21.430s] [INFO] bytecode .............................................. SUCCESS [44.778s] [INFO] core .................................................. SUCCESS [4:54.691s] [INFO] alignment ............................................. SUCCESS [48.048s] [INFO] blast ................................................. SUCCESS [1:20.263s] [INFO] biojava3-structure .................................... SUCCESS [8:12.412s] [INFO] das ................................................... SUCCESS [1:13.103s] [INFO] sequence .............................................. SUCCESS [18.176s] [INFO] sequence-core ......................................... SUCCESS [47.424s] [INFO] sequence-rna .......................................... SUCCESS [28.770s] [INFO] sequence-biosql ....................................... SUCCESS [47.032s] [INFO] sequence-fasta ........................................ SUCCESS [29.075s] [INFO] sequence-blastxml ..................................... SUCCESS [44.392s] [INFO] sequencing ............................................ SUCCESS [49.520s] [INFO] phylo ................................................. SUCCESS [48.459s] [INFO] biosql ................................................ SUCCESS [57.503s] [INFO] gui ................................................... SUCCESS [58.610s] [INFO] biojava3-core ......................................... SUCCESS [1:05.398s] [INFO] biojava3-phylo ........................................ SUCCESS [46.623s] [INFO] biojava3-structure-gui ................................ SUCCESS [1:06.737s] [INFO] biojava3-alignment .................................... SUCCESS [54.682s] [INFO] biojava3-genome ....................................... SUCCESS [49.844s] [INFO] biojava3-protmod ...................................... SUCCESS [10:05.171s] [INFO] biojava3-ws ........................................... SUCCESS [44.379s] -- ----------------------------------------------------------------------- Dr. Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From gwaldon at geneinfinity.org Mon Aug 9 22:20:27 2010 From: gwaldon at geneinfinity.org (George Waldon) Date: Mon, 09 Aug 2010 18:20:27 -0400 Subject: [Biojava-dev] build problem Message-ID: <20100809222027.95361.qmail@mxw1102.verio-web.com> Here is my summary and some of the timings are pretty good (simple HP laptop with a dual core): ------------------------------------------------------------------------ Reactor Summary: ------------------------------------------------------------------------ biojava ............................................... SUCCESS [3.820s] bytecode .............................................. SUCCESS [6.022s] core .................................................. SUCCESS [52.822s] alignment ............................................. SUCCESS [4.332s] blast ................................................. SUCCESS [27.453s] biojava3-structure .................................... SUCCESS [7:33.228s] das ................................................... SUCCESS [18.456s] sequence .............................................. SUCCESS [0.122s] sequence-core ......................................... SUCCESS [3.974s] sequence-rna .......................................... SUCCESS [0.313s] sequence-biosql ....................................... SUCCESS [2.735s] sequence-fasta ........................................ SUCCESS [0.324s] sequence-blastxml ..................................... SUCCESS [3.339s] sequencing ............................................ SUCCESS [5.637s] phylo ................................................. SUCCESS [5.084s] biosql ................................................ SUCCESS [6.344s] gui ................................................... SUCCESS [6.552s] biojava3-core ......................................... SUCCESS [8.477s] biojava3-phylo ........................................ SUCCESS [3.596s] biojava3-structure-gui ................................ SUCCESS [8.283s] biojava3-alignment .................................... SUCCESS [6.002s] biojava3-genome ....................................... SUCCESS [3.933s] biojava3-protmod ...................................... SUCCESS [8:59.048s] biojava3-ws ........................................... SUCCESS [1.367s] ------------------------------------------------------------------------ ------------------------------------------------------------------------ BUILD SUCCESSFUL ------------------------------------------------------------------------ Total time: 19 minutes 33 seconds Finished at: Mon Aug 09 13:48:00 PDT 2010 Final Memory: 116M/363M ------------------------------------------------------------------------ The long time comes apparently from fetching all these files. I tried the build last week on a different network after removing the faulty test as suggested by Scott and I had similar timing. This could be an issue with NetBeans in fact. If someone has experienced such long delays, it would be interesting to know in which conditions this occurred. Thanks again, George From andreas at sdsc.edu Mon Aug 9 22:48:18 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Mon, 9 Aug 2010 15:48:18 -0700 Subject: [Biojava-dev] build problem In-Reply-To: <20100809222027.95361.qmail@mxw1102.verio-web.com> References: <20100809222027.95361.qmail@mxw1102.verio-web.com> Message-ID: Seems your times are comparable with the build machine. The files are downloaded and store in local temporary directories (as provided by System.getProperty("java.io.tmpdir") ) If that tmp directory changes, the files will have to be re-loaded again, otherwise they will be re-used and the code runs quicker. Seems the VM is changing the tmp dir location frequently. Anybody has a suggestion how to define a more "stable" tmp dir locations? Otherwise I will put all required test files into the test resources dir ... Andreas -- ----------------------------------------------------------------------- Dr.
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From gwaldon at geneinfinity.org Tue Aug 10 00:04:39 2010 From: gwaldon at geneinfinity.org (George Waldon) Date: Mon, 09 Aug 2010 20:04:39 -0400 Subject: [Biojava-dev] build problem Message-ID: <20100810000439.77852.qmail@mxw1102.verio-web.com> I think you are right and these files are not downloaded again. Here are the tests that consume significant time; maybe you can figure out which process is slow: Running org.biojava.bio.structure.align.fatcat.TestFlexibleRotationMatrices Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.145 sec Running org.biojava.bio.structure.align.FlipAFPChainTest Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 35.688 sec Running org.biojava.bio.structure.align.fatcat.TestOutputStrings Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 129.81 sec Running org.biojava.bio.structure.align.fatcat.AFPChainSerialisationTest Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 131.016 sec Running org.biojava3.protmod.structure.ProteinModificationParserTest Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 530.036 sec George >----- ------- Original Message ------- ----- >From: Andreas Prlic >To: George Waldon >Sent: Mon, 9 Aug 2010 15:48:18 > >Seems your times are comparable with the build >machine. The files are >downloaded and store in local temporary directories >(as provided by >System.getProperty("java.io.tmpdir") ) If that tmp >directory changes, >the files will have to be re-loaded again, >otherwise they will be >re-used and the code runs quicker. Seems the VM is >changing the tmp >dir location frequently. Anybody has a suggestion >how to define a more >"stable" tmp dir locations? Otherwise I will put >all required test >files into the test resources dir ... 
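One possible answer to the question about a more "stable" download location is sketched below. It is only an illustration under stated assumptions: the class PdbCacheDirSketch, the system property name "pdb.cache.dir" and the ".pdb-cache" folder name are all hypothetical and are not BioJava settings. The sketch also creates missing parent directories before a file is written, which is the condition behind the FileNotFoundException reported at the start of this thread. The two-letter subdirectory follows the paths shown there (\tmp\hc\pdb1hcy.ent.gz).

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;

// Hypothetical sketch; "pdb.cache.dir" is an assumed property name, not a BioJava setting.
class PdbCacheDirSketch {

    /** Uses an explicitly configured directory if present, else a folder under user.home. */
    static File cacheDir() {
        String configured = System.getProperty("pdb.cache.dir");
        if (configured != null && !configured.isEmpty()) {
            return new File(configured);
        }
        return new File(System.getProperty("user.home"), ".pdb-cache");
    }

    /** Builds cache/<middle two characters>/pdb<id>.ent.gz and creates missing directories. */
    static File localFile(File cache, String pdbId) {
        if (pdbId == null || pdbId.length() != 4) {
            throw new IllegalArgumentException("Expected a 4-character PDB id: " + pdbId);
        }
        String id = pdbId.toLowerCase(Locale.ROOT);
        File dir = new File(cache, id.substring(1, 3));
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new UncheckedIOException(new IOException("Could not create " + dir));
        }
        return new File(dir, "pdb" + id + ".ent.gz");
    }
}
```

A test could then call localFile(cacheDir(), "1hcy") and download only when the returned file does not exist yet. Because the default location does not depend on java.io.tmpdir, files fetched in one build would still be found in the next.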
>
>Andreas
>
>
>
>On Mon, Aug 9, 2010 at 3:20 PM, George Waldon wrote:
>> Here is my summary and some of the timings are pretty good (simple HP laptop with a dual core):
>>
>> ------------------------------------------------------------------------
>> Reactor Summary:
>> ------------------------------------------------------------------------
>> biojava ............................................... SUCCESS [3.820s]
>> bytecode .............................................. SUCCESS [6.022s]
>> core .................................................. SUCCESS [52.822s]
>> alignment ............................................. SUCCESS [4.332s]
>> blast ................................................. SUCCESS [27.453s]
>> biojava3-structure .................................... SUCCESS [7:33.228s]
>> das ................................................... SUCCESS [18.456s]
>> sequence .............................................. SUCCESS [0.122s]
>> sequence-core ......................................... SUCCESS [3.974s]
>> sequence-rna .......................................... SUCCESS [0.313s]
>> sequence-biosql ....................................... SUCCESS [2.735s]
>> sequence-fasta ........................................ SUCCESS [0.324s]
>> sequence-blastxml ..................................... SUCCESS [3.339s]
>> sequencing ............................................ SUCCESS [5.637s]
>> phylo ................................................. SUCCESS [5.084s]
>> biosql ................................................ SUCCESS [6.344s]
>> gui ................................................... SUCCESS [6.552s]
>> biojava3-core ......................................... SUCCESS [8.477s]
>> biojava3-phylo ........................................ SUCCESS [3.596s]
>> biojava3-structure-gui ................................ SUCCESS [8.283s]
>> biojava3-alignment .................................... SUCCESS [6.002s]
>> biojava3-genome ....................................... SUCCESS [3.933s]
>> biojava3-protmod ...................................... SUCCESS [8:59.048s]
>> biojava3-ws ........................................... SUCCESS [1.367s]
>> ------------------------------------------------------------------------
>> ------------------------------------------------------------------------
>> BUILD SUCCESSFUL
>> ------------------------------------------------------------------------
>> Total time: 19 minutes 33 seconds
>> Finished at: Mon Aug 09 13:48:00 PDT 2010
>> Final Memory: 116M/363M
>> ------------------------------------------------------------------------
>>
>> The long time comes apparently from fetching all these files. I tried the build last week on a different network after removing the faulty test as suggested by Scott and I had similar timing. This could be an issue with NetBeans in fact. If someone has experienced such long delays, it would be interesting to know in which conditions this occurred.
>>
>> Thanks again,
>>
>> George
>>
>>>----- ------- Original Message ------- -----
>>>From: Andreas Prlic
>>>To: George Waldon
>>>Sent: Mon, 9 Aug 2010 14:27:27
>>>
>>>That seems like a very long time... on the automated build server the
>>>times look like below:
>>>
>>>If the structure module takes so much time, I wonder if you are behind
>>>a very slow network? Some of the junit tests fetch PDB files from a
>>>public ftp server and I wonder if there is a networking issue. Having
>>>said this, the two slowest modules are structure and protmod. We will
>>>try to cut down the time spent on those tests...
>>>
>>>Andreas
>>>
>>>
>>>[INFO] biojava ............................................... SUCCESS [21.430s]
>>>[INFO] bytecode .............................................. SUCCESS [44.778s]
>>>[INFO] core .................................................. SUCCESS [4:54.691s]
>>>[INFO] alignment ............................................. SUCCESS [48.048s]
>>>[INFO] blast ................................................. SUCCESS [1:20.263s]
>>>[INFO] biojava3-structure .................................... SUCCESS [8:12.412s]
>>>[INFO] das ................................................... SUCCESS [1:13.103s]
>>>[INFO] sequence .............................................. SUCCESS [18.176s]
>>>[INFO] sequence-core ......................................... SUCCESS [47.424s]
>>>[INFO] sequence-rna .......................................... SUCCESS [28.770s]
>>>[INFO] sequence-biosql ....................................... SUCCESS [47.032s]
>>>[INFO] sequence-fasta ........................................ SUCCESS [29.075s]
>>>[INFO] sequence-blastxml ..................................... SUCCESS [44.392s]
>>>[INFO] sequencing ............................................ SUCCESS [49.520s]
>>>[INFO] phylo ................................................. SUCCESS [48.459s]
>>>[INFO] biosql ................................................ SUCCESS [57.503s]
>>>[INFO] gui ................................................... SUCCESS [58.610s]
>>>[INFO] biojava3-core ......................................... SUCCESS [1:05.398s]
>>>[INFO] biojava3-phylo ........................................ SUCCESS [46.623s]
>>>[INFO] biojava3-structure-gui ................................ SUCCESS [1:06.737s]
>>>[INFO] biojava3-alignment .................................... SUCCESS [54.682s]
>>>[INFO] biojava3-genome ....................................... SUCCESS [49.844s]
>>>[INFO] biojava3-protmod ...................................... SUCCESS [10:05.171s]
>>>[INFO] biojava3-ws ........................................... SUCCESS [44.379s]

From andreas at sdsc.edu Wed Aug 11 02:24:08 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 10 Aug 2010 19:24:08 -0700
Subject: [Biojava-dev] biojava3 sequence tools
Message-ID:

Hi,

just wondering if we have already a class that can accept any protein
or DNA sequence as input and can return a Sequence object of the
correct type ?

Andreas

From holland at eaglegenomics.com Wed Aug 11 04:05:55 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 11 Aug 2010 05:05:55 +0100
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To:
References:
Message-ID: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>

You mean an auto-detector that takes a String input, guesses based on content what it is, and returns a Sequence object of the appropriate type, being Protein or DNA etc.? Not that I know of. A bit hard too - if all the letters in the String are a valid subset from two or more alphabets (e.g. ATCG are all in the Protein alphabet as well as being DNA), how do we know which one it is?
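The ambiguity described here can be made concrete with a few lines of Java. This is a minimal sketch, not BioJava code: the class name, the method names, and the two letter sets (the four unambiguous DNA bases and the twenty standard one-letter amino acid codes) are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class AlphabetAmbiguity {

    // One-letter codes used for this illustration only; not BioJava constants.
    static final String DNA_LETTERS = "ACGT";
    static final String PROTEIN_LETTERS = "ACDEFGHIKLMNPQRSTVWY";

    /** True if every character of the sequence occurs in the given alphabet (case-insensitive). */
    static boolean fits(String sequence, String alphabet) {
        for (int i = 0; i < sequence.length(); i++) {
            if (alphabet.indexOf(Character.toUpperCase(sequence.charAt(i))) < 0) {
                return false;
            }
        }
        return true;
    }

    /** Lists every alphabet whose letters cover the whole sequence. */
    static List<String> candidateAlphabets(String sequence) {
        List<String> result = new ArrayList<String>();
        if (fits(sequence, DNA_LETTERS)) {
            result.add("DNA");
        }
        if (fits(sequence, PROTEIN_LETTERS)) {
            result.add("protein");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(candidateAlphabets("ATCG"));   // prints [DNA, protein]
        System.out.println(candidateAlphabets("MKVLAT")); // prints [protein]
    }
}
```

A pure membership test therefore cannot decide for inputs like ATCG; any detector has to add a heuristic on top of it.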
On 11 Aug 2010, at 03:24, Andreas Prlic wrote:

> Hi,
>
> just wondering if we have already a class that can accept any protein
> or DNA sequence as input and can return a Sequence object of the
> correct type ?
>
> Andreas
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

-- 
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From markjschreiber at gmail.com Wed Aug 11 04:46:32 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 11 Aug 2010 12:46:32 +0800
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>
Message-ID:

I think SeqIOTools had a method for this, possibly also available in RichSequence.IOTools.

As Richard says, not guaranteed to work in all cases.

From ayates at ebi.ac.uk Wed Aug 11 08:26:31 2010
From: ayates at ebi.ac.uk (Andy Yates)
Date: Wed, 11 Aug 2010 09:26:31 +0100
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To:
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com>
Message-ID: <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>

Building a Sequence object which can contain AminoAcidCompound or NucleotideCompound is easy; the return type makes this incredibly hard, since we'd have to return Sequence, which forces the user to start casting to a more useful type. Every auto-detector I've known gets it wrong, since they all apply arbitrary thresholds to decide the switch.

However, if the need is there (and I'm sure there is for writing some interfaces), something can be knocked up quickly, I think.

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/

From andreas at sdsc.edu Wed Aug 11 15:58:15 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 11 Aug 2010 08:58:15 -0700
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To: <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com> <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>
Message-ID:

Thanks for the replies. I was trying to see how to improve a web form into which the user can paste any type of sequence, with the server selecting the correct version of BLAST to run... I will probably check what percentage of the sequence looks like nucleotides. It is unlikely that a longer protein sequence consists only of ATCGs ...
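The percentage check described above could be wired to the choice of BLAST program roughly as follows. This is a hedged sketch rather than the code used on the server: the class and method names, the letter set "ACGTUN", and the 90 percent cut-off are all assumptions, and the cut-off in particular is arbitrary. Only the program names blastn and blastp are standard BLAST names.

```java
public class BlastProgramChooser {

    // Letters counted as nucleotides; an assumption made for this sketch.
    static final String NUCLEOTIDE_LETTERS = "ACGTUN";

    // Hypothetical cut-off: at or above this percentage the input is treated as nucleotide.
    static final int NUCLEOTIDE_THRESHOLD_PERCENT = 90;

    /** Returns the percentage (0-100) of characters that are nucleotide letters, ignoring case. */
    static int percentNucleotides(String sequence) {
        if (sequence == null || sequence.isEmpty()) {
            return 0;
        }
        int hits = 0;
        for (int i = 0; i < sequence.length(); i++) {
            char c = Character.toUpperCase(sequence.charAt(i));
            if (NUCLEOTIDE_LETTERS.indexOf(c) >= 0) {
                hits++;
            }
        }
        return (100 * hits) / sequence.length();
    }

    /** Picks a nucleotide search for mostly-nucleotide input and a protein search otherwise. */
    static String chooseProgram(String sequence) {
        return percentNucleotides(sequence) >= NUCLEOTIDE_THRESHOLD_PERCENT ? "blastn" : "blastp";
    }

    public static void main(String[] args) {
        System.out.println(chooseProgram("ATGCATGCNN"));      // prints blastn
        System.out.println(chooseProgram("MKVLAAGIVGLLLAQ")); // prints blastp
    }
}
```

Input pasted into a form would normally need whitespace and FASTA header lines removed before the count; the sketch leaves that step out.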
Andreas

-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------

From SMarkel at accelrys.com Wed Aug 11 16:51:09 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Wed, 11 Aug 2010 09:51:09 -0700
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To:
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com> <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC97701C034BA42@EXCH1-COLO.accelrys.net>

Andreas,

You might want to look at the _guess_alphabet subroutine in BioPerl's Bio::PrimarySeq module.

Here's the core logic.

    my $u = ($str =~ tr/Uu//);
    # The assumption here is that most of sequences comprised of mainly
    # ATGC, with some N, will be 'dna' despite the fact that N could
    # also be Asparagine
    my $atgc = ($str =~ tr/ATGCNatgcn//);

    if( ($atgc / $total) > 0.85 ) {
        $type = 'dna';
    } elsif( (($atgc + $u) / $total) > 0.85 ) {
        $type = 'rna';
    } else {
        $type = 'protein';
    }

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics

From andreas at sdsc.edu Wed Aug 11 17:58:14 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 11 Aug 2010 10:58:14 -0700
Subject: [Biojava-dev] biojava3 sequence tools
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC97701C034BA42@EXCH1-COLO.accelrys.net>
References: <70AB59E7-52EA-47AD-A10C-734B928F1BED@eaglegenomics.com> <042B872B-35C2-4A50-BD29-606A45165D6A@ebi.ac.uk> <5ACBA19439E77B43A06F4CAB897EC97701C034BA42@EXCH1-COLO.accelrys.net>
Message-ID:

Thanks, Scott, here are similar utility methods in Java ...
Andreas

    // Letters treated as nucleotides; matching is case-sensitive.
    protected static final String NUCLEOTIDE_LETTERS = "GCTAUX";

    /** Returns the percentage (0-100) of characters that are nucleotide letters. */
    public static int percentNucleotideSequence(String sequence) {
        if (sequence == null || sequence.length() == 0) {
            return 0;
        }
        int l = sequence.length();
        int n = 0;
        for (int i = 0; i < l; i++) {
            if (NUCLEOTIDE_LETTERS.indexOf(sequence.charAt(i)) >= 0) {
                n++;
            }
        }
        return (100 * n) / l;
    }

    /** Returns true only if every character is a nucleotide letter. */
    public static boolean isNucleotideSequence(String sequence) {
        if (sequence == null || sequence.length() == 0) {
            return false;
        }
        int l = sequence.length();
        for (int i = 0; i < l; i++) {
            if (NUCLEOTIDE_LETTERS.indexOf(sequence.charAt(i)) < 0) {
                return false;
            }
        }
        return true;
    }

From andreas at sdsc.edu Wed Aug 11 22:49:28 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 11 Aug 2010 15:49:28 -0700
Subject: [Biojava-dev] build problem
In-Reply-To: <20100811224434.27093.qmail@mxw1102.verio-web.com>
References: <20100811224434.27093.qmail@mxw1102.verio-web.com>
Message-ID:

thanks, still working on making it even faster...

A

On Wed, Aug 11, 2010 at 3:44 PM, George Waldon wrote:
> Andreas:
>
> Just for your intention, here are my today's build times. I am down to 9 min 21 s. This has very improved. Thank you very much.
>
> George
>
> ------------------------------------------------------------------------
> biojava ............................................... SUCCESS [3.938s]
> bytecode .............................................. SUCCESS [6.657s]
> core .................................................. SUCCESS [47.960s]
> alignment ............................................. SUCCESS [3.024s]
> blast ................................................. SUCCESS [24.915s]
> biojava3-structure .................................... SUCCESS [5:09.755s]
> das ................................................... SUCCESS [40.210s]
> sequence .............................................. SUCCESS [0.126s]
> sequence-core ......................................... SUCCESS [3.359s]
> sequence-rna .......................................... SUCCESS [0.364s]
> sequence-biosql ....................................... SUCCESS [1.763s]
> sequence-fasta ........................................ SUCCESS [0.324s]
> sequence-blastxml ..................................... SUCCESS [2.905s]
> sequencing ............................................ SUCCESS [5.447s]
> phylo ................................................. SUCCESS [4.051s]
> biosql ................................................ SUCCESS [4.305s]
> gui ................................................... SUCCESS [2.787s]
> biojava3-core ......................................... SUCCESS [6.502s]
> biojava3-phylo ........................................ SUCCESS [3.124s]
> biojava3-structure-gui ................................ SUCCESS [5.339s]
> biojava3-alignment .................................... SUCCESS [4.594s]
> biojava3-genome ....................................... SUCCESS [2.657s]
> biojava3-protmod ...................................... SUCCESS [1:13.620s]
> biojava3-ws ........................................... SUCCESS [2.078s]
> ------------------------------------------------------------------------
> ------------------------------------------------------------------------
> BUILD SUCCESSFUL
> ------------------------------------------------------------------------
> Total time: 9 minutes 21 seconds
> Finished at: Wed Aug 11 15:29:57 PDT 2010
> Final Memory: 83M/370M
>
>
>>----- ------- Original Message ------- -----
>>From: Andreas Prlic
>>To: George Waldon
>>Sent: Mon, 9 Aug 2010 15:48:18
>>
>>Seems your times are comparable with the build machine. The files are
>>downloaded and store in local temporary directories (as provided by
>>System.getProperty("java.io.tmpdir") ) If that tmp directory changes,
>>the files will have to be re-loaded again, otherwise they will be
>>re-used and the code runs quicker. Seems the VM is changing the tmp
>>dir location frequently. Anybody has a suggestion how to define a more
>>"stable" tmp dir locations? Otherwise I will put all required test
>>files into the test resources dir ...
>>
>>Andreas
>>
>>
>>
>>On Mon, Aug 9, 2010 at 3:20 PM, George Waldon wrote:
>>> Here is my summary and some of the timings are pretty good (simple HP laptop with a dual core):
>>>
>>> ------------------------------------------------------------------------
>>> Reactor Summary:
>>> ------------------------------------------------------------------------
>>> biojava ............................................... SUCCESS [3.820s]
>>> bytecode .............................................. SUCCESS [6.022s]
>>> core .................................................. SUCCESS [52.822s]
>>> alignment ............................................. SUCCESS [4.332s]
>>> blast ................................................. SUCCESS [27.453s]
>>> biojava3-structure .................................... SUCCESS [7:33.228s]
>>> das ................................................... SUCCESS [18.456s]
>>> sequence .............................................. SUCCESS [0.122s]
>>> sequence-core ......................................... SUCCESS [3.974s]
>>> sequence-rna .......................................... SUCCESS [0.313s]
>>> sequence-biosql ....................................... SUCCESS [2.735s]
>>> sequence-fasta ........................................ SUCCESS [0.324s]
>>> sequence-blastxml ..................................... SUCCESS [3.339s]
>>> sequencing ............................................ SUCCESS [5.637s]
>>> phylo ................................................. SUCCESS [5.084s]
>>> biosql ................................................ SUCCESS [6.344s]
>>> gui ................................................... SUCCESS [6.552s]
>>> biojava3-core ......................................... SUCCESS [8.477s]
>>> biojava3-phylo ........................................ SUCCESS [3.596s]
>>> biojava3-structure-gui ................................ SUCCESS [8.283s]
>>> biojava3-alignment .................................... SUCCESS [6.002s]
>>> biojava3-genome ....................................... SUCCESS [3.933s]
>>> biojava3-protmod ...................................... SUCCESS [8:59.048s]
>>> biojava3-ws ........................................... SUCCESS [1.367s]
>>> ------------------------------------------------------------------------
>>> ------------------------------------------------------------------------
>>> BUILD SUCCESSFUL
>>> ------------------------------------------------------------------------
>>> Total time: 19 minutes 33 seconds
>>> Finished at: Mon Aug 09 13:48:00 PDT 2010
>>> Final Memory: 116M/363M
>>> ------------------------------------------------------------------------
>>>
>>> The long time comes apparently from fetching all these files. I tried the build last week on a different network after removing the faulty test as suggested by Scott and I had similar timing. This could be an issue with NetBeans in fact. If someone has experienced such long delays, it would be interesting to know in which conditions this occurred.
>>>
>>> Thanks again,
>>>
>>> George
>>>
>>>>----- ------- Original Message ------- -----
>>>>From: Andreas Prlic
>>>>To: George Waldon
>>>>Sent: Mon, 9 Aug 2010 14:27:27
>>>>
>>>>That seems like a very long time... on the automated build server the
>>>>times look like below:
>>>>
>>>>If the structure module takes so much time, I wonder if you are behind
>>>>a very slow network? Some of the junit tests fetch PDB files from a
>>>>public ftp server and I wonder if there is a networking issue. Having
>>>>said this, the two slowest modules are structure and protmod. We will
>>>>try to cut down the time spent on those tests...
>>>> >>>>Andreas >>>> >>>> >>>>INFO] biojava >>>>............................................... >>>>SUCCESS [21.430s] >>>>[INFO] bytecode >>>>.............................................. >>>>SUCCESS [44.778s] >>>>[INFO] core >>>>................................................. >>. >>>>SUCCESS >>>>[4:54.691s] >>>>[INFO] alignment >>>>............................................. >>>>SUCCESS [48.048s] >>>>[INFO] blast >>>>................................................. >>>>SUCCESS >>>>[1:20.263s] >>>>[INFO] biojava3-structure >>>>.................................... SUCCESS >>>>[8:12.412s] >>>>[INFO] das >>>>................................................. >>.. >>>>SUCCESS >>>>[1:13.103s] >>>>[INFO] sequence >>>>.............................................. >>>>SUCCESS [18.176s] >>>>[INFO] sequence-core >>>>......................................... SUCCESS >>>>[47.424s] >>>>[INFO] sequence-rna >>>>.......................................... >>SUCCESS >>>>[28.770s] >>>>[INFO] sequence-biosql >>>>....................................... SUCCESS >>>>[47.032s] >>>>[INFO] sequence-fasta >>>>........................................ SUCCESS >>>>[29.075s] >>>>[INFO] sequence-blastxml >>>>..................................... SUCCESS >>>>[44.392s] >>>>[INFO] sequencing >>>>............................................ >>>>SUCCESS [49.520s] >>>>[INFO] phylo >>>>................................................. >>>>SUCCESS [48.459s] >>>>[INFO] biosql >>>>................................................ >>>>SUCCESS [57.503s] >>>>[INFO] gui >>>>................................................. >>.. >>>>SUCCESS [58.610s] >>>>[INFO] biojava3-core >>>>......................................... SUCCESS >>>>[1:05.398s] >>>>[INFO] biojava3-phylo >>>>........................................ SUCCESS >>>>[46.623s] >>>>[INFO] biojava3-structure-gui >>>>................................ SUCCESS >>>>[1:06.737s] >>>>[INFO] biojava3-alignment >>>>.................................... 
SUCCESS >>>>[54.682s] >>>>[INFO] biojava3-genome >>>>....................................... SUCCESS >>>>[49.844s] >>>>[INFO] biojava3-protmod >>>>...................................... SUCCESS >>>>[10:05.171s] >>>>[INFO] biojava3-ws >>>>........................................... >>SUCCESS >>>>[44.379s] >>>> >>>> >>>>On Mon, Aug 9, 2010 at 1:52 PM, George Waldon >>>> wrote: >>>>> Thanks to all for the fixing. >>>>> >>>>> The build took 19 minutes and 33 seconds, of >>>>which 16 min and 32 s were for the structure >>>>modules! This sounds a bit long to me. Is-this >>>>expected? >>>>> >>>>> George >>>>> >>>>>>----- ------- Original Message ------- ----- >>>>>>From: Andreas Prlic >>>>>>To: George Waldon >>>>>>Sent: Mon, 9 Aug 2010 12:35:56 >>>>>> >>>>>>If you update the class, this should be fixed >>>>now. >>>>>>Was a problem with >>>>>>a hard coded /tmp path... >>>>>> >>>>>>Andreas >>>>>> >>>>>> >>>>>> >>>>>>On Mon, Aug 9, 2010 at 12:26 PM, George Waldon >>>>>> wrote: >>>>>>> Hi, >>>>>>> >>>>>>> I am getting the following failures in >>>>>>org.biojava.bio.structure.align.benchmark.Multi >>pl >>>>eA >>>>>>lignmentTest: >>>>>>> >>>>>>> Fetching >>>>>>ftp://ftp.wwpdb.org/pub/pdb/data/structures/all >>/p >>>>db >>>>>>/pdb1hcy.ent.gz >>>>>>> writing to \tmp\hc\pdb1hcy.ent.gz >>>>>>> java.io.FileNotFoundException: >>>>>>\tmp\hc\pdb1hcy.ent.gz (The system cannot find >>>>the >>>>>>path specified) >>>>>>> ... >>>>>>> >>>>>>> and >>>>>>> >>>>>>> Fetching >>>>>>ftp://ftp.wwpdb.org/pub/pdb/data/structures/all >>/p >>>>db >>>>>>/pdb1nls.ent.gz >>>>>>> writing to \tmp\nl\pdb1nls.ent.gz >>>>>>> java.io.FileNotFoundException: >>>>>>\tmp\nl\pdb1nls.ent.gz (The system cannot find >>>>the >>>>>>path specified) >>>>>>> at java.io.FileOutputStream.open(Native >>Method) >>>> >>>>>>> at >>>>>>java.io.FileOutputStream.(FileOutputStrea >>m. >>>>ja >>>>>>va:179) >>>>>>> at >>>>>>java.io.FileOutputStream.(FileOutputStrea >>m. 
>>>>ja >>>>>>va:131) >>>>>>> at >>>>>>org.biojava.bio.structure.io.PDBFileReader.down >>lo >>>>ad >>>>>>PDB(PDBFileReader.java:430) >>>>>>> ... >>>>>>> >>>>>>> Does anyone has a solution for this? I am >>>>>>building from within NetBeans. >>>>>>> >>>>>>> Thanks, >>>>>>> George >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>>-- >>>>>>----------------------------------------------- >>-- >>>>-- >>>>>>-------------------- >>>>>>Dr. Andreas Prlic >>>>>>Senior Scientist, RCSB PDB Protein Data Bank >>>>>>University of California, San Diego >>>>>>(+1) 858.246.0526 >>>>>>----------------------------------------------- >>-- >>>>-- >>>>>>-------------------- >>>>> >>>> >>>> >>>> >>>>-- >>>>------------------------------------------------- >>-- >>>>-------------------- >>>>Dr. Andreas Prlic >>>>Senior Scientist, RCSB PDB Protein Data Bank >>>>University of California, San Diego >>>>(+1) 858.246.0526 >>>>------------------------------------------------- >>-- >>>>-------------------- >>> >> >> >> >>-- >>--------------------------------------------------- >>-------------------- >>Dr. Andreas Prlic >>Senior Scientist, RCSB PDB Protein Data Bank >>University of California, San Diego >>(+1) 858.246.0526 >>--------------------------------------------------- >>-------------------- > -- ----------------------------------------------------------------------- Dr. 
Andreas Prlic Senior Scientist, RCSB PDB Protein Data Bank University of California, San Diego (+1) 858.246.0526 ----------------------------------------------------------------------- From bugzilla-daemon at portal.open-bio.org Thu Aug 12 05:12:22 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 01:12:22 -0400 Subject: [Biojava-dev] [Bug 2565] Cannot parse uniprot P02768 In-Reply-To: Message-ID: <201008120512.o7C5CMAV026392@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2565 gwaldon at geneinfinity.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |INVALID ------- Comment #1 from gwaldon at geneinfinity.org 2010-08-12 01:12 EST ------- Cannot reproduce error with today's build. Must have been fixed or record has changed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Aug 12 05:28:35 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 12 Aug 2010 01:28:35 -0400 Subject: [Biojava-dev] [Bug 2541] Exception is thrown when trying to parse a valid GenBank file In-Reply-To: Message-ID: <201008120528.o7C5SZ1a026843@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2541 gwaldon at geneinfinity.org changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |INVALID ------- Comment #3 from gwaldon at geneinfinity.org 2010-08-12 01:28 EST ------- As suggested by Peter, adding the sequence termination // solves the problem. This was most likely invalid. Please reopen if necessary. 
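The fix described in the comment above (a GenBank record has to end with a `//` terminator line) can be checked before a file is handed to a parser. The following is only a minimal sketch under that assumption; the class and method names are hypothetical and are not part of BioJava.

```java
// Hypothetical helper, not part of BioJava: reports whether the last
// non-blank line of a GenBank flat-file record is the "//" terminator.
class GenBankTerminatorCheck {

    static boolean endsWithTerminator(String record) {
        String lastNonBlank = null;
        // Accept both Unix and Windows line endings.
        for (String line : record.split("\r?\n")) {
            if (!line.trim().isEmpty()) {
                lastNonBlank = line.trim();
            }
        }
        return "//".equals(lastNonBlank);
    }

    public static void main(String[] args) {
        String truncated = "LOCUS       TEST\nORIGIN\n        1 acgt\n";
        String complete = truncated + "//\n";
        System.out.println("truncated record ok? " + endsWithTerminator(truncated));
        System.out.println("complete record ok?  " + endsWithTerminator(complete));
    }
}
```

A check like this only looks at the terminator; it says nothing about whether the rest of the record is well formed.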
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From andreas at sdsc.edu Sat Aug 14 01:30:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 13 Aug 2010 18:30:45 -0700
Subject: [Biojava-dev] biojava-structure now depending on biojava3-core and biojava3-alignment
Message-ID:

Hi,

I just committed a major update to the biojava-structure modules. They no longer depend on anything biojava 1.7 related, but only on biojava3-core and biojava3-alignment. That's one step closer to getting ready for a new 3.0 release...

Andreas

From HWillis at scripps.edu Sat Aug 14 03:22:39 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Fri, 13 Aug 2010 23:22:39 -0400
Subject: [Biojava-dev] biojava-structure now depending on biojava3-core and biojava3-alignment
Message-ID:

Andreas

I figured you were burning some brain cells before going on vacation. The only major core element missing is a good design for features. I want it to be transparent and descriptive without adding a ton of methods.

I will check out the latest and do some testing.

Scooter

----- Reply message -----
From: "Andreas Prlic"
Date: Fri, Aug 13, 2010 9:30 pm
Subject: biojava-structure now depending on biojava3-core and biojava3-alignment
To: "Scooter Willis", "Mark Chapman"
Cc: "biojava-dev"

Hi,

I just committed a major update to the biojava-structure modules. They no longer depend on anything biojava 1.7 related, but only on biojava3-core and biojava3-alignment. That's one step closer to getting ready for a new 3.0 release...
Andreas

From bugzilla-daemon at portal.open-bio.org Tue Aug 17 16:44:46 2010
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 17 Aug 2010 12:44:46 -0400
Subject: [Biojava-dev] [Bug 3132] New: SITE records in PDBFileReader
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=3132

           Summary: SITE records in PDBFileReader
           Product: BioJava
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: structure
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: darnells at dnastar.com

Feature request from the BioJava community:

http://lists.open-bio.org/pipermail/biojava-l/2010-August/007254.html
I am interested in parsing SITE records from a PDB file. [...] would it be possible to add this capability to PDBFileReader and the Structure class?

http://lists.open-bio.org/pipermail/biojava-l/2010-August/007260.html
REMARK 800 provides a very useful SITE_DESCRIPTION for each SITE_IDENTIFIER code in use in the SITE records. Could the site name also be associated with the site identifier and residues? There is precedent for parsing REMARK records in BioJava (e.g. experiment type, resolution), but this is a special case where REMARK 800 and SITE records are dependent on one another and physically separated in the header.

Reply from Andreas Prlic:

http://lists.open-bio.org/pipermail/biojava-l/2010-August/007257.html
- Take a look at PDBFileParser.java and at http://www.wwpdb.org/documentation/format32/sect7.html
- It needs a new Handler method for the Site records that builds up the data containers.
- Create a new bean that will contain the data for the SITE record
- Instead of having fields for insertion code, residue number and chain IDs, you can use the new PDBResidueNumber.java class to group this together.
- Add a get/set method for the Site beans to the Structure class
- Create a junit test that makes sure the parsing works ok.
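The handler and bean suggested in the list above could look roughly like the sketch below. It is not the code that was later committed to BioJava: the class names SiteResidue and SiteRecordSketch are hypothetical, and the column positions are an assumption taken from the SITE record layout in the wwPDB format 3.2 document linked above (site ID in columns 12-14, up to four residues per line starting at column 19, each with residue name, chain ID, sequence number and insertion code). In BioJava itself the chain, number and insertion code would be grouped in PDBResidueNumber, as suggested in the reply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bean for one residue listed in a SITE record.
class SiteResidue {
    final String resName;
    final char chainId;
    final int seqNum;
    final char insCode;

    SiteResidue(String resName, char chainId, int seqNum, char insCode) {
        this.resName = resName;
        this.chainId = chainId;
        this.seqNum = seqNum;
        this.insCode = insCode;
    }
}

// Hypothetical handler sketch; column offsets assume PDB format 3.2.
class SiteRecordSketch {

    // Site identifier, columns 12-14 (0-based indices 11..13).
    static String siteId(String line) {
        return String.format("%-80s", line).substring(11, 14).trim();
    }

    // Up to four residues per SITE line, each block 11 columns apart.
    static List<SiteResidue> parseSiteLine(String line) {
        String padded = String.format("%-80s", line); // pad short lines
        List<SiteResidue> residues = new ArrayList<SiteResidue>();
        for (int i = 0; i < 4; i++) {
            int base = 18 + 11 * i; // column 19, 30, 41, 52 (1-based)
            String resName = padded.substring(base, base + 3).trim();
            if (resName.isEmpty()) {
                break; // fewer than four residues on this line
            }
            char chainId = padded.charAt(base + 4);
            int seqNum = Integer.parseInt(padded.substring(base + 5, base + 9).trim());
            char insCode = padded.charAt(base + 9);
            residues.add(new SiteResidue(resName, chainId, seqNum, insCode));
        }
        return residues;
    }

    public static void main(String[] args) {
        // Example line assembled field by field so the columns stay visible.
        String line = "SITE  " + " " + "  1" + " " + "AC1" + " " + " 3" + " "
                + "HIS A  94 " + " " + "HIS A  96 " + " " + "HIS A 119 ";
        for (SiteResidue r : parseSiteLine(line)) {
            System.out.println(siteId(line) + ": " + r.resName + " " + r.chainId + " " + r.seqNum);
        }
    }
}
```

Associating the REMARK 800 description with each site would need a second pass keyed on the site identifier, which this sketch leaves out.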
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From andreas at sdsc.edu Wed Aug 18 18:26:23 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 18 Aug 2010 11:26:23 -0700
Subject: [Biojava-dev] Last week of Google Summer of Code
Message-ID:

Hi,

This is the last week of this year's Google Summer of Code project and I am happy to announce that our two students Mark Chapman and Jianjiong Gao did an amazing job on their two projects "All Java Multiple Sequence Alignment" (MSA) and "Identification and Classification of Posttranslational Modification of Proteins" (PTM).

For Multiple Sequence Alignments we now have a flexible and multi-threaded MSA implementation that works in linear space and that, as an option, allows the users to define anchors that are used in the build up of the multiple alignment. The code is available as part of the new biojava3-alignment module.

The Posttranslational Modification module (biojava3-protmod) can detect three different types of protein modifications in protein structures. It comes with an XML file & Java data structures to store information about different types of protein modifications, and contains entries from RESID, PDBCC and PSI-MOD. There is also a visualisation component to display cross linked PTM on a sequence viewer.

Both Mark and Jianjiong have expressed their interest in maintaining and further developing their modules and I am looking forward to interacting more with them in the future.

I want to thank the Mentors and Co-Mentors Peter Rose, Kyle Ellrott and Scooter Willis for their help and guidance for the projects, without them this would not have been possible. Thanks also to Robert Buels and the Open Bioinformatics Foundation for organizing our applications for GSoC and last, but not least, Google for sponsoring this Summer of Code.
Happy BioJava-ing,

Andreas

From darnells at dnastar.com Mon Aug 23 16:21:41 2010
From: darnells at dnastar.com (Steve Darnell)
Date: Mon, 23 Aug 2010 11:21:41 -0500
Subject: [Biojava-dev] Structures test failures
Message-ID:

Greetings,

I am attempting to build biojava from the anonymous svn server at github; I cannot even connect to the anonymous svn server at open-bio. I am unable to package/install the structure module and I suspect it is caused by two failures that occur in the unit tests:

Failed tests:
  testOldSecOutput(org.biojava.bio.structure.TestSECalignment)
  testParsePairs(org.biojava.bio.structure.align.TestAlignDBSearchPairs)

Tests run: 61, Failures: 2, Errors: 0, Skipped: 0

The build log has no indication that the structure module was packaged or installed.

I have yet to configure Eclipse to build biojava, so this is how I attempted to build it on the command line:

$ svn co http://svn.github.com/biojava/biojava.git ./biojava
$ cd biojava
$ mvn clean install > biojava.log& tail -f biojava.log
$ cd ~/.m2/repository/org/biojava
$ ls
% alignment
% biojava
% biojava3-alignment
% biojava3-core
% biojava3-phylo
% blast
% bytecode
% core

I am new to maven, so it is likely I am overlooking something obvious. I have reproduced this outcome on OSX 10.6.4 and Ubuntu 10.04. I would appreciate any suggestions that the dev list might have.

Best regards,
Steve Darnell

From jacobsen at ebi.ac.uk Tue Aug 24 12:41:12 2010
From: jacobsen at ebi.ac.uk (Jules Jacobsen)
Date: Tue, 24 Aug 2010 13:41:12 +0100
Subject: [Biojava-dev] Structures test failures
In-Reply-To:
References:
Message-ID: <4C73BDE8.4050708@ebi.ac.uk>

Hi Steve,

Unless you're a staunch Eclipse user I can recommend Netbeans for maven as it works very well with it. One issue I've had is that Netbeans 6.9 is generally slower and more cumbersome than 6.8, worse still 6.9.1 seems incapable of connecting to the repository at all. So stick with 6.8 and it should be fine.
I can confirm that the most recent dev version of biojava-structure does not fail any tests, although I just tried checking out the read-only public version from svn://code.open-bio.org/biojava/biojava-live/trunk and encountered a 'rev-props' error and now I'm unable to connect at all... Your error isn't likely to be maven related - if everything else ran fine it's probably just that you have an out-of date version of the source as the structure module is independent of the other modules, but I can't tell for sure without seeing the error messages produced by the failed tests. Regards, Jules On 23/08/2010 17:21, Steve Darnell wrote: > Greetings, > > > > I am attempting to build biojava from the anonymous svn server at > github; I cannot even connect to the anonymous svn server at open-bio. > I am unable to package/install the structure module and I suspect it is > caused by two failures that occur in the unit tests: > > > > Failed tests: > > testOldSecOutput(org.biojava.bio.structure.TestSECalignment) > > testParsePairs(org.biojava.bio.structure.align.TestAlignDBSearchPairs) > > > > Tests run: 61, Failures: 2, Errors: 0, Skipped: 0 > > > > The build log has no indication that the structure module was packaged > or installed. > > > > I have yet to configure Eclipse to build biojava, so this is how I > attempted to build it on the command line: > > > > $ svn co http://svn.github.com/biojava/biojava.git ./biojava > > $ cd biojava > > $ mvn clean install> biojava.log& tail -f biojava.log > > $ cd ~/.m2/repository/org/biojava > > $ ls > > % alignment > > % biojava > > % biojava3-alignment > > % biojava3-core > > % biojava3-phylo > > % blast > > % bytecode > > % core > > > > I am new to maven, so it is likely I am overlooking something obvious. > I have reproduced this outcome on OSX 10.6.4 and Ubuntu 10.04. I would > appreciate any suggestions that the dev list might have. 
>
>
>
> Best regards,
>
> Steve Darnell
>
>
>
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From HWillis at scripps.edu Tue Aug 24 13:08:11 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 24 Aug 2010 09:08:11 -0400
Subject: [Biojava-dev] Structures test failures
In-Reply-To: <4C73BDE8.4050708@ebi.ac.uk>
References: <4C73BDE8.4050708@ebi.ac.uk>
Message-ID: <45B9BED2-4713-455B-A660-A855E7CBE39C@scripps.edu>

Jules

I am not having any problem with Netbeans 6.9 with the latest patches. I do like the improved debug GUI in 6.9 for inspecting collections. I haven't done a fresh install of 6.9.1.

Scooter

On Aug 24, 2010, at 8:41 AM, Jules Jacobsen wrote:

> Hi Steve,
>
> Unless you're a staunch Eclipse user I can recommend Netbeans for maven
> as it works very well with it. One issue I've had is that Netbeans 6.9
> is generally slower and more cumbersome than 6.8, worse still 6.9.1
> seems incapable of connecting to the repository at all. So stick with
> 6.8 and it should be fine.
>
> I can confirm that the most recent dev version of biojava-structure does
> not fail any tests, although I just tried checking out the read-only
> public version from svn://code.open-bio.org/biojava/biojava-live/trunk
> and encountered a 'rev-props' error and now I'm unable to connect at all...
>
> Your error isn't likely to be maven related - if everything else ran
> fine it's probably just that you have an out-of date version of the
> source as the structure module is independent of the other modules, but
> I can't tell for sure without seeing the error messages produced by the
> failed tests.
>
> Regards,
>
> Jules
>
> On 23/08/2010 17:21, Steve Darnell wrote:
>> Greetings,
>>
>>
>>
>> I am attempting to build biojava from the anonymous svn server at
>> github; I cannot even connect to the anonymous svn server at open-bio.
>> I am unable to package/install the structure module and I suspect it is >> caused by two failures that occur in the unit tests: >> >> >> >> Failed tests: >> >> testOldSecOutput(org.biojava.bio.structure.TestSECalignment) >> >> testParsePairs(org.biojava.bio.structure.align.TestAlignDBSearchPairs) >> >> >> >> Tests run: 61, Failures: 2, Errors: 0, Skipped: 0 >> >> >> >> The build log has no indication that the structure module was packaged >> or installed. >> >> >> >> I have yet to configure Eclipse to build biojava, so this is how I >> attempted to build it on the command line: >> >> >> >> $ svn co http://svn.github.com/biojava/biojava.git ./biojava >> >> $ cd biojava >> >> $ mvn clean install> biojava.log& tail -f biojava.log >> >> $ cd ~/.m2/repository/org/biojava >> >> $ ls >> >> % alignment >> >> % biojava >> >> % biojava3-alignment >> >> % biojava3-core >> >> % biojava3-phylo >> >> % blast >> >> % bytecode >> >> % core >> >> >> >> I am new to maven, so it is likely I am overlooking something obvious. >> I have reproduced this outcome on OSX 10.6.4 and Ubuntu 10.04. I would >> appreciate any suggestions that the dev list might have. >> >> >> >> Best regards, >> >> Steve Darnell >> >> >> >> >> _______________________________________________ >> biojava-dev mailing list >> biojava-dev at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-dev > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev From darnells at dnastar.com Tue Aug 24 20:19:57 2010 From: darnells at dnastar.com (Steve Darnell) Date: Tue, 24 Aug 2010 15:19:57 -0500 Subject: [Biojava-dev] Structures test failures In-Reply-To: <4C73BDE8.4050708@ebi.ac.uk> References: <4C73BDE8.4050708@ebi.ac.uk> Message-ID: Jules, Thanks for the suggestion. In the interim, I was able to work around my Eclipse and svn problems. It's not a fully integrated solution, but it works. 
The Subclipse svn plug-in from Tigris would fail during checkout with an exception stating "RA layer request failed svn: REPORT of '[...]' 200 OK" for both Windows 7 and OSX 10.6. For Windows, there are reports of the Windows indexing service, antivirus scanners, or the Subclipse JavaHL adapter causing svn problems. Resolving these issues did not help.

In the end, I used a git client on Windows to clone the biojava project from github and then imported it into Eclipse as an existing Maven project with the m2eclipse plug-in. I was able to successfully build and install biojava3-structure and its dependencies without test failures.

To the list,

If anyone has had similar problems setting up Eclipse, I would be grateful to hear how you resolved the problem. If not, my solution is good enough for now.

Regards,
Steve

-----Original Message-----
From: biojava-dev-bounces at lists.open-bio.org [mailto:biojava-dev-bounces at lists.open-bio.org] On Behalf Of Jules Jacobsen
Sent: Tuesday, August 24, 2010 7:41 AM
To: biojava-dev at lists.open-bio.org
Subject: Re: [Biojava-dev] Structures test failures

Hi Steve,

Unless you're a staunch Eclipse user I can recommend Netbeans for maven as it works very well with it. One issue I've had is that Netbeans 6.9 is generally slower and more cumbersome than 6.8, worse still 6.9.1 seems incapable of connecting to the repository at all. So stick with 6.8 and it should be fine.

I can confirm that the most recent dev version of biojava-structure does not fail any tests, although I just tried checking out the read-only public version from svn://code.open-bio.org/biojava/biojava-live/trunk and encountered a 'rev-props' error and now I'm unable to connect at all...
Your error isn't likely to be maven related - if everything else ran fine it's probably just that you have an out-of date version of the source as the structure module is independent of the other modules, but I can't tell for sure without seeing the error messages produced by the failed tests. Regards, Jules On 23/08/2010 17:21, Steve Darnell wrote: > Greetings, > > I am attempting to build biojava from the anonymous svn server at > github; I cannot even connect to the anonymous svn server at open-bio. > I am unable to package/install the structure module and I suspect it is > caused by two failures that occur in the unit tests: > > Failed tests: > testOldSecOutput(org.biojava.bio.structure.TestSECalignment) > testParsePairs(org.biojava.bio.structure.align.TestAlignDBSearchPairs) > > Tests run: 61, Failures: 2, Errors: 0, Skipped: 0 > > The build log has no indication that the structure module was packaged > or installed. > > I have yet to configure Eclipse to build biojava, so this is how I > attempted to build it on the command line: > > $ svn co http://svn.github.com/biojava/biojava.git ./biojava > $ cd biojava > $ mvn clean install> biojava.log& tail -f biojava.log > $ cd ~/.m2/repository/org/biojava > $ ls > % alignment > % biojava > % biojava3-alignment > % biojava3-core > % biojava3-phylo > % blast > % bytecode > % core > > I am new to maven, so it is likely I am overlooking something obvious. > I have reproduced this outcome on OSX 10.6.4 and Ubuntu 10.04. I would > appreciate any suggestions that the dev list might have. 
> > Best regards, > Steve Darnell > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev _______________________________________________ biojava-dev mailing list biojava-dev at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/biojava-dev From bugzilla-daemon at portal.open-bio.org Mon Aug 30 17:44:56 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 30 Aug 2010 13:44:56 -0400 Subject: [Biojava-dev] [Bug 3132] SITE records in PDBFileReader In-Reply-To: Message-ID: <201008301744.o7UHiu26023875@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3132 amr_alhossary at hotmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Aug 30 18:22:45 2010 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 30 Aug 2010 14:22:45 -0400 Subject: [Biojava-dev] [Bug 3132] SITE records in PDBFileReader In-Reply-To: Message-ID: <201008301822.o7UIMjMK025040@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=3132 amr_alhossary at hotmail.com changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED ------- Comment #1 from amr_alhossary at hotmail.com 2010-08-30 14:22 EST ------- Done! SITE records are now parsed. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From deniz.koellhofer at cambia.org Tue Aug 31 06:46:30 2010
From: deniz.koellhofer at cambia.org (Deniz Koellhofer)
Date: Tue, 31 Aug 2010 16:46:30 +1000
Subject: [Biojava-dev] biojava3 BLAST parser
Message-ID:

Hi,

I wanted to find out the current state of blast parsing efforts in biojava3 - especially for ncbi blastxml output?

I had a quick look and found some DOM based code fragments in org.biojava3.genome.query.BlastXMLQuery. Is there already anybody working on a more comprehensive SAX parser?

The biojava1.7.1 blastxml parser seems to work fine, however some of the tags in NCBI-BLASTN 2.2.23+ output like Hsp_midline, BlastOutput_param don't seem to get parsed properly in org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.

Cheers,
Deniz

--
Deniz Koellhofer
Cambia
Initiative for Open Innovation (IOI)
Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia

From HWillis at scripps.edu Tue Aug 31 10:43:00 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 31 Aug 2010 06:43:00 -0400
Subject: [Biojava-dev] biojava3 BLAST parser
In-Reply-To:
References:
Message-ID:

Deniz

Can you provide some requirements regarding parsing the Blast XML? I tend to use XPATH and the DOM object to get to the data elements of interest so you already have the ability to load the Blast XML and work with the data. The difficulty of "parsing" is not an issue with XML. The BlastXMLQuery is an example of searching the Blast XML to get results. Are you wanting the XML elements translated to Java classes?

Thanks

Scooter

On Aug 31, 2010, at 2:46 AM, Deniz Koellhofer wrote:

> Hi,
>
> I wanted to find out the current state of blast parsing efforts in biojava3
> - especially for ncbi blastxml output?
>
> I had a quick look and found some DOM based code fragments
> in org.biojava3.genome.query.BlastXMLQuery. Is there already anybody working
> on a more comprehensive SAX parser?
>
> The biojava1.7.1 blastxml parser seems to work fine, however some of the
> tags in NCBI-BLASTN 2.2.23+ output like Hsp_midline, BlastOutput_param don't
> seem to get parsed properly
> in org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.
>
> Cheers,
> Deniz
>
> --
> Deniz Koellhofer
> Cambia
> Initiative for Open Innovation (IOI)
> Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

From deniz.koellhofer at cambia.org Tue Aug 31 12:49:21 2010
From: deniz.koellhofer at cambia.org (Deniz Koellhofer)
Date: Tue, 31 Aug 2010 22:49:21 +1000
Subject: [Biojava-dev] biojava3 BLAST parser
In-Reply-To:
References:
Message-ID:

Hi Scooter,

Thanks for the reply. I guess the BlastXMLQuery is a good example to show how to quickly extract information from a BLAST result.

But in my opinion biojava3 should also have a Blast parser that generates java beans containing the complete Blast result set - similar to what biojava1.7.1 was doing. So yeah, I'm after translating the XML elements to Java classes.

Would something like that fit into one of the biojava3 modules? homology, I/O?

Thanks,
Deniz

On Tue, Aug 31, 2010 at 8:43 PM, Scooter Willis wrote:

> Deniz
>
> Can you provide some requirements regarding parsing the Blast XML. I tend
> to use XPATH and the DOM object to get to the data elements of interest so
> you already have the ability to load the Blast XML and work with the data.
> The difficulty of "parsing" is not an issue with XML. The BlastXMLQuery is
> an example of searching the Blast XML to get results. Are you wanting the
> XML elements translated to Java classes?
>
> Thanks
>
> Scooter
>
> On Aug 31, 2010, at 2:46 AM, Deniz Koellhofer wrote:
>
> > Hi,
> >
> > I wanted to find out the current state of blast parsing efforts in biojava3
> > - especially for ncbi blastxml output?
> >
> > I had a quick look and found some DOM based code fragments
> > in org.biojava3.genome.query.BlastXMLQuery. Is there already anybody working
> > on a more comprehensive SAX parser?
> >
> > The biojava1.7.1 blastxml parser seems to work fine, however some of the
> > tags in NCBI-BLASTN 2.2.23+ output like Hsp_midline, BlastOutput_param don't
> > seem to get parsed properly
> > in org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.
> >
> > Cheers,
> > Deniz
> >
> > --
> > Deniz Koellhofer
> > Cambia
> > Initiative for Open Innovation (IOI)
> > Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
>

--
--
Deniz Koellhofer
Cambia
Initiative for Open Innovation (IOI)
Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia

From HWillis at scripps.edu Tue Aug 31 14:11:30 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Tue, 31 Aug 2010 10:11:30 -0400
Subject: [Biojava-dev] biojava3 BLAST parser
In-Reply-To:
References:
Message-ID: <47FD5948-0439-45C6-A1AB-22E7CC8D17A6@scripps.edu>

Deniz

It would be great to formalize the XML blast results as Java classes. Do you
have any interest in taking on the project? Capturing the blast alignment
using the new alignment classes would be a very nice feature. I like using
XPath as the query language to select hits of interest, which should allow
for a SAX-based approach to minimize the impact of very large XML files.
XPath and SAX do appear to have some constraints
(http://stackoverflow.com/questions/1863250/is-it-there-any-xpath-processor-for-sax-model).

It probably makes sense to have a Blast module that would depend on core and
alignment.

Thanks

Scooter

On Aug 31, 2010, at 8:49 AM, Deniz Koellhofer wrote:

Hi Scooter,

Thanks for the reply. I guess the BlastXMLQuery is a good example to show
how to quickly extract information from a BLAST result.

But in my opinion biojava3 should also have a Blast parser that generates
java beans containing the complete Blast result set - similar to what
biojava1.7.1 was doing. So yeah, I'm after translating the XML elements to
Java classes.

Would something like that fit into one of the biojava3 modules? homology,
I/O?

Thanks,
Deniz

On Tue, Aug 31, 2010 at 8:43 PM, Scooter Willis wrote:

Deniz

Can you provide some requirements regarding parsing the Blast XML. I tend to
use XPATH and the DOM object to get to the data elements of interest so you
already have the ability to load the Blast XML and work with the data. The
difficulty of "parsing" is not an issue with XML. The BlastXMLQuery is an
example of searching the Blast XML to get results. Are you wanting the XML
elements translated to Java classes?

Thanks

Scooter

On Aug 31, 2010, at 2:46 AM, Deniz Koellhofer wrote:

> Hi,
>
> I wanted to find out the current state of blast parsing efforts in biojava3
> - especially for ncbi blastxml output?
>
> I had a quick look and found some DOM based code fragments
> in org.biojava3.genome.query.BlastXMLQuery. Is there already anybody working
> on a more comprehensive SAX parser?
>
> The biojava1.7.1 blastxml parser seems to work fine, however some of the
> tags in NCBI-BLASTN 2.2.23+ output like Hsp_midline, BlastOutput_param don't
> seem to get parsed properly
> in org.biojava.bio.program.sax.blastxml.BlastXMLParserFacade.
>
> Cheers,
> Deniz
>
> --
> Deniz Koellhofer
> Cambia
> Initiative for Open Innovation (IOI)
> Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev

--
--
Deniz Koellhofer
Cambia
Initiative for Open Innovation (IOI)
Cambia at QUT, G301, 2 George Street, Brisbane Qld 4000, Australia

From sheoran143 at gmail.com Fri Aug 20 00:45:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Fri, 20 Aug 2010 00:45:29 -0000
Subject: [Biojava-dev] Required Correction in GenbankLocationParser class
Message-ID: <4C6DD03C.1080909@gmail.com>

There is a problem with the GenbankLocationParser class: it cannot process
the GenBank record with accession M32882. The location parser fails at the
following lines in the GenBank record:

     gene            join((8298.8300)..10206,1..855)
                     /gene="bcn"
     mRNA            join((8298.8300)..10206,1..855)
                     /gene="bcn"
                     /note="alternative transcript"

The exception stack trace is as follows:

Could not understand position: 10206,1..855
org.biojava.bio.seq.io.ParseException: Could not understand position: 10206,1..855
    at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parsePosition(GenbankLocationParser.java:285)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:277)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocString(GenbankLocationParser.java:244)
    at org.biojavax.bio.seq.io.GenbankLocationParser.parseLocation(GenbankLocationParser.java:131)

I investigated this and found that the defect is in the regular expression
named "gp" in the GenbankLocationParser class. The error can be fixed by
applying the attached patch. For testing, I have written a method which
demonstrates that the parser can now understand all the possible
combinations of location.
This test class is also attached so that you can test my patch before and
after its application. I don't have access to svn so please apply this patch
for me, and let me know if you approve this patch or not.

Thanks
Deepak Sheoran
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankLocationParser.patch
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LocationParserTest.java
URL:
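
The attachments above were scrubbed from the archive, so the actual patch is not shown here. As a rough illustration of the failure in the stack trace, where the position parser is handed "10206,1..855" as a single position, the sketch below separates the members of a join(...) only at commas that sit outside any parentheses. It is not the attached patch and not GenbankLocationParser code; the class and method names (LocationSplitSketch, unwrap, splitTopLevel) are made up for this example, and it assumes a well-formed location string.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only: not BioJava code and not the attached patch. */
public class LocationSplitSketch {

    /** Returns the argument text of an operator such as join(...), or the input unchanged. */
    public static String unwrap(String location, String operator) {
        String prefix = operator + "(";
        if (location.startsWith(prefix) && location.endsWith(")")) {
            return location.substring(prefix.length(), location.length() - 1);
        }
        return location;
    }

    /** Splits on commas at parenthesis depth zero, so "(8298.8300)" stays in one piece. */
    public static List<String> splitTopLevel(String args) {
        List<String> parts = new ArrayList<String>();
        int depth = 0;   // current nesting level of parentheses
        int start = 0;   // index where the current member begins
        for (int i = 0; i < args.length(); i++) {
            char c = args.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                parts.add(args.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(args.substring(start)); // last member has no trailing comma
        return parts;
    }

    public static void main(String[] args) {
        String location = "join((8298.8300)..10206,1..855)";
        for (String member : splitTopLevel(unwrap(location, "join"))) {
            System.out.println(member);
        }
    }
}
```

Run on the location from record M32882, this prints "(8298.8300)..10206" and "1..855" on separate lines, so each range could then be handed to a position parser on its own. A regular expression can reach the same result; the depth counter is used here only because it makes the nesting rule explicit.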