From dreher at mpiib-berlin.mpg.de Wed Feb 1 09:57:03 2006 From: dreher at mpiib-berlin.mpg.de (Felix Dreher) Date: Wed Feb 1 10:02:30 2006 Subject: [Biojava-l] BioSQL cvs versions In-Reply-To: References: Message-ID: <43E0CC3F.4010402@mpiib-berlin.mpg.de> Hello Mark, thank you very much for your fast reply. In the meantime I was busy with installing and configuring a new version of 'Sun Java Studio Creator', which I use to develop at the time. As you suggested, I would like to start using Hibernate. Is there any documentation right now about the Hibernate-BioSQL interaction? Thank you, Felix mark.schreiber@novartis.com wrote: >Dear Felix, > >We have found a number of deficiencies in biojava's support of biosql. >Therefore we have moved to a new model using hibernate to overcome several >problems. This will be officially released in biojava1.5. In the meantime >you can download the development version from CVS. > >Having said that, the best supported database versions in biojava 1.4 are >Oracle and MySQL. These have received the most testing and support. If you >have a chance (and cannot use Hibernate) I would suggest using one of >those. Although someone may offer a bug fix for this problem we do not >plan to support the old biojava/biosql mappings after 1.5 is released. >They have been deprecated in the CVS. The official way to interact with >biosql will be via Hibernate. 
> >- Mark > >Mark Schreiber >Research Investigator (Bioinformatics) > >Novartis Institute for Tropical Diseases (NITD) >10 Biopolis Road >#05-01 Chromos >Singapore 138670 >www.nitd.novartis.com > >phone +65 6722 2973 >fax +65 6722 2910 > > > > > >Felix Dreher >Sent by: biojava-l-bounces@portal.open-bio.org >01/20/2006 10:45 PM > > > To: biojava-l@biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] BioSQL cvs versions > > >Hello, >when I try to add a sequence to a BioSQL-DB, the following exception is >thrown: > >*Exception Details: * org.postgresql.util.PSQLException > ERROR: column "seqfeature_key_id" of relation "seqfeature" does not >exist > >|org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512) >org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297) >org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188) >org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430) >org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346) >org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250) >org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205) >org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205) >org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeature(FeaturesSQL.java:804) >org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:760) >org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:729) >org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:481) >org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:374) >. >. >. > >| >apparently the BioJava- and BioSQL-version don't really match. 
>I use the following cvs-version of the corresponding class: >/BioSQLSequenceDB.java/1.70/Fri Jun 10 07:48:11 2005// >Further I use the latest cvs-version of the BioSQL-script >'biosqldb-pg.sql' (it's from June 2005). >Are there any suggestions how this could be solved? > >Thank you, >Felix > > > > > > > > -- Felix Dreher Max-Planck-Institute for Infection Biology Campus Charit? Mitte Department of Immunology Mailing address: Schumannstra?e 21/22 Visitors: Virchowweg 12 10117 Berlin Germany Tel.: +49 (0)30 28460-254 / -494 Mobile: +49 (0)163 7542426 From mark.schreiber at novartis.com Wed Feb 1 20:02:17 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Feb 1 19:58:24 2006 Subject: [Biojava-l] BioSQL cvs versions Message-ID: Hello Felix - The best document is the BioJavaX docbook in the docs/ folder of the CVS distribution of biojava. - Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 Felix Dreher 02/01/2006 10:57 PM To: Mark Schreiber/GP/Novartis@PH cc: biojava-l@biojava.org Subject: Re: [Biojava-l] BioSQL cvs versions Hello Mark, thank you very much for your fast reply. In the meantime I was busy with installing and configuring a new version of 'Sun Java Studio Creator', which I use to develop at the time. As you suggested, I would like to start using Hibernate. Is there any documentation right now about the Hibernate-BioSQL interaction? Thank you, Felix mark.schreiber@novartis.com wrote: >Dear Felix, > >We have found a number of deficiencies in biojava's support of biosql. >Therefore we have moved to a new model using hibernate to overcome several >problems. This will be officially released in biojava1.5. In the meantime >you can download the development version from CVS. > >Having said that, the best supported database versions in biojava 1.4 are >Oracle and MySQL. 
These have received the most testing and support. If you >have a chance (and cannot use Hibernate) I would suggest using one of >those. Although someone may offer a bug fix for this problem we do not >plan to support the old biojava/biosql mappings after 1.5 is released. >They have been deprecated in the CVS. The official way to interact with >biosql will be via Hibernate. > >- Mark > >Mark Schreiber >Research Investigator (Bioinformatics) > >Novartis Institute for Tropical Diseases (NITD) >10 Biopolis Road >#05-01 Chromos >Singapore 138670 >www.nitd.novartis.com > >phone +65 6722 2973 >fax +65 6722 2910 > > > > > >Felix Dreher >Sent by: biojava-l-bounces@portal.open-bio.org >01/20/2006 10:45 PM > > > To: biojava-l@biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] BioSQL cvs versions > > >Hello, >when I try to add a sequence to a BioSQL-DB, the following exception is >thrown: > >*Exception Details: * org.postgresql.util.PSQLException > ERROR: column "seqfeature_key_id" of relation "seqfeature" does not >exist > >|org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512) >org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297) >org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188) >org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430) >org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346) >org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250) >org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205) >org.apache.commons.dbcp.DelegatingPreparedStatement.executeQuery(DelegatingPreparedStatement.java:205) >org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeature(FeaturesSQL.java:804) >org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:760) 
>org.biojava.bio.seq.db.biosql.FeaturesSQL.persistFeatures(FeaturesSQL.java:729) >org.biojava.bio.seq.db.biosql.BioSQLSequenceDB._addSequence(BioSQLSequenceDB.java:481) >org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.addSequence(BioSQLSequenceDB.java:374) >. >. >. > >| >apparently the BioJava- and BioSQL-version don't really match. >I use the following cvs-version of the corresponding class: >/BioSQLSequenceDB.java/1.70/Fri Jun 10 07:48:11 2005// >Further I use the latest cvs-version of the BioSQL-script >'biosqldb-pg.sql' (it's from June 2005). >Are there any suggestions how this could be solved? > >Thank you, >Felix > > > > > > > > -- Felix Dreher Max-Planck-Institute for Infection Biology Campus Charit? Mitte Department of Immunology Mailing address: Schumannstra?e 21/22 Visitors: Virchowweg 12 10117 Berlin Germany Tel.: +49 (0)30 28460-254 / -494 Mobile: +49 (0)163 7542426 From mark.schreiber at novartis.com Wed Feb 1 22:19:02 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Feb 1 22:15:11 2006 Subject: [Biojava-l] Help needed to add "Number of Bits" vertical and column number labeling to DistributionLogos Message-ID: >Actually, I am doing all of those things (Graphics2D object, BufferedImage, etc.). I would like to get the >code that draws vertical and horizontal labels and such. I have seen the results in Samiul Hasan's thesis >paper. Ah! I forgot that I can use modern technology as a visual aid..... > >I would like to be able to do this (copied from Samiul Hasan's thesis paper).......... take a look at http://www.sanger.ac.uk/Info/theses/ - Mark From heatkent at gmail.com Thu Feb 2 01:41:56 2006 From: heatkent at gmail.com (Heather Kent) Date: Thu Feb 2 04:18:07 2006 Subject: [Biojava-l] concatenating chromatograms Message-ID: I would like to write a small application that would concatenate abi or scf chromatograms and write out a new chromatogram file.. 
Has anyone done something similar to this or seen any code that would be helpful for me? I am new to programming and have been looking through the BioJava API.

From mark.schreiber at novartis.com Thu Feb 2 04:51:02 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Feb 2 04:49:26 2006
Subject: [Biojava-l] concatenating chromatograms
Message-ID:

Hi Heather,

If you look at the API docs under "Chromatogram and trace file support" (http://www.biojava.org/docs/api14/index.html), you will find the classes that biojava has to support traces. The best package to use is org.biojava.bio.chromatogram.

One possible way would be to do something like this...

    Chromatogram c1 = ChromatogramFactory.create(file1);
    Chromatogram c2 = ChromatogramFactory.create(file2);
    SimpleChromatogram mergeChrom = new SimpleChromatogram();

    //repeat all steps below for each DNA base, eg replace DNATools.a() with DNATools.g() etc
    int[] a1 = c1.getTrace(DNATools.a());
    int[] a2 = c2.getTrace(DNATools.a());
    int[] merged = new int[a1.length + a2.length];
    //use a loop here to copy a1 and a2 into merged

    //now set the DNATools.a() trace for mergeChrom
    mergeChrom.setTraceValues(DNATools.a(), merged, merged.length);

Hope this works!

- Mark

Heather Kent
Sent by: biojava-l-bounces@portal.open-bio.org
02/02/2006 02:41 PM

To: biojava-l@biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] concatenating chromatograms

I would like to write a small application that would concatenate abi or scf chromatograms and write out a new chromatogram file..
has anyone done something similar to this or seen any code that would be helpful for me, i am new at programming and have been looking through the Biojava API _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From russ at kepler-eng.com Thu Feb 2 09:23:30 2006 From: russ at kepler-eng.com (Russ Kepler) Date: Thu Feb 2 09:48:34 2006 Subject: [Biojava-l] concatenating chromatograms In-Reply-To: References: Message-ID: <200602020723.30627.russ@kepler-eng.com> On Wednesday 01 February 2006 11:41 pm, Heather Kent wrote: > I would like to write a small application that would concatenate abi or scf > chromatograms and write out a new chromatogram file.. > has anyone done something similar to this or seen any code that would be > helpful for me, i am new at programming > and have been looking through the Biojava API I'm familiar with the ABI trace code and what you want to do would not be difficult, but the result may not work the way that you want it to. A basecaller will likely be fooled in the transition between the traces and miscall or call no peaks for some time unless you match the local frequencies of each trace around the transition, and tagging the start of one run to the end of the other is a pretty good way to not do that. If you're not going to run things through a basecaller all you really need to do it is to catenate the trace and basecalls arrays and sequences. These are all exposed in gets(). If the data is coming from a newish AB instrument you may want to add code to handle the Q values from the KB caller and catenate those arrays as well. Writing the new file would be a new capability, but the existing reader should show you the way to do it. 
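A minimal sketch of the array concatenation Russ describes, using only plain Java arrays and no BioJava calls. The class and method names (TraceConcat, concat, concatOffsets) are hypothetical. Shifting the second trace's base-call offsets by the length of the first trace is an assumption about how the offsets are stored, not something stated in the thread. It does nothing about the basecaller problem at the join that Russ warns about.

```java
import java.util.Arrays;

// Hypothetical helper, not part of BioJava.
class TraceConcat {

    /** Returns a new array holding every sample of first followed by every sample of second. */
    static int[] concat(int[] first, int[] second) {
        int[] merged = Arrays.copyOf(first, first.length + second.length);
        System.arraycopy(second, 0, merged, first.length, second.length);
        return merged;
    }

    /**
     * Joins two base-call offset arrays. ASSUMPTION: each offset is an index into
     * its own trace, so offsets from the second trace are shifted by the number
     * of samples in the first trace to keep pointing at the same peaks.
     */
    static int[] concatOffsets(int[] firstOffsets, int[] secondOffsets, int firstTraceLength) {
        int[] merged = Arrays.copyOf(firstOffsets, firstOffsets.length + secondOffsets.length);
        for (int i = 0; i < secondOffsets.length; i++) {
            merged[firstOffsets.length + i] = secondOffsets[i] + firstTraceLength;
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] traceA = {10, 40, 12};
        int[] traceB = {8, 35};
        System.out.println(Arrays.toString(concat(traceA, traceB)));
        System.out.println(Arrays.toString(concatOffsets(new int[] {1}, new int[] {1}, traceA.length)));
    }
}
```

The same concat call would be repeated for each of the four base traces, and again for any quality-value arrays.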
From ady at sanger.ac.uk Thu Feb 2 11:23:54 2006 From: ady at sanger.ac.uk (Andy Yates) Date: Thu Feb 2 12:09:53 2006 Subject: [Biojava-l] concatenating chromatograms In-Reply-To: <200602020723.30627.russ@kepler-eng.com> References: <200602020723.30627.russ@kepler-eng.com> Message-ID: <43E2321A.2090804@sanger.ac.uk> Throwing my opinion into the ring on this I've got to agree with Russ here. I would think that SCF is a more sensible format for this kind of procedure but there is the added bonus that the SCF parser does not encode delta-delta values which the SCF specification is completely dependant on. SCF does have the advantage that nothing "really" assumes anything about them so you can fiddle about with the chromatogram and so long as the things you create in the output Chromatogram are normalised with respect to the cuts then everything should be hunky dory. If you're doing this for space concerns can I suggest passing the SCF files through a compression filter. You get the best results with a BZIP2 compression algorithm (the format was developed for bzip compression) but GZIP works really well and is the choice of compression format here at the Sanger Centre. Hope that helps, Andy Yates ~~~~~~~~~~~~~~~ Senior Computer Biologist, Cancer Genome Project. Wellcome Trust Sanger Institute, Hinxton, Cambridge Russ Kepler wrote: > On Wednesday 01 February 2006 11:41 pm, Heather Kent wrote: >> I would like to write a small application that would concatenate abi or scf >> chromatograms and write out a new chromatogram file.. >> has anyone done something similar to this or seen any code that would be >> helpful for me, i am new at programming >> and have been looking through the Biojava API > > I'm familiar with the ABI trace code and what you want to do would not be > difficult, but the result may not work the way that you want it to. 
A > basecaller will likely be fooled in the transition between the traces and > miscall or call no peaks for some time unless you match the local frequencies > of each trace around the transition, and tagging the start of one run to the > end of the other is a pretty good way to not do that. > > If you're not going to run things through a basecaller all you really need to > do it is to catenate the trace and basecalls arrays and sequences. These are > all exposed in gets(). If the data is coming from a newish AB instrument you > may want to add code to handle the Q values from the KB caller and catenate > those arrays as well. > > Writing the new file would be a new capability, but the existing reader should > show you the way to do it. > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l From mark.schreiber at novartis.com Thu Feb 2 21:37:36 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Feb 2 21:33:41 2006 Subject: [Biojava-l] biojava wikimedia based home page Message-ID: Hi all - The OBF is moving several of it's projects homepages to wikimedia based systems. There is a possibility that biojava will move to use this system too. I think this is a great way to establish a community based biojava web presence. The current home page suffers from the problem that only a few people can access and update it which creates a large burden and means it sometimes gets out of date. More hands will make things easier. The new look bioperl page is a great example of what can be done (www.bioperl.org). Before it is ready for prime-time some work needs to be done to copy content over from the current biojava site. We would like to ask for any volunteers who have some experience with Wikimedia who could help out. Please reply to me or to the list. Any help would be greatly appreciated. 
- Mark Mark Schreiber Research Investigator (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 From sylvain.foisy at bioneq.qc.ca Fri Feb 3 10:43:41 2006 From: sylvain.foisy at bioneq.qc.ca (sylvain.foisy@bioneq.qc.ca) Date: Fri Feb 3 13:01:31 2006 Subject: [Biojava-l] biojava wikimedia based home page In-Reply-To: References: Message-ID: <22065.132.204.82.34.1138981421.squirrel@mail.bioneq.qc.ca> Hi Mark Marvelous idea ;-) We (the Quebec BIoinformatics Network) are in the process of moving our "Bioinformatics KnowledgeBase" (http://apps.bioneq.qc.ca/twiki/bin/view/Knowledgebase/WebHome) from its current TWiki format toward MediaWiki. We are offering to put this (on going) experience to use on the migration of the Biojava website. Who should we contact on the OBF side to initiate the project? Chris D.? Shameless plug: if anyone want to contribute to our Bioinformatics KnowledgeBase, please feel free to do so!! This is all done in the Wiki spirit. Don't get scared by the french side of thing, we'll take care of it ;-) > The OBF is moving several of it's projects homepages to wikimedia based > systems. There is a possibility that biojava will move to use this system > too. I think this is a great way to establish a community based biojava > web presence. The current home page suffers from the problem that only a > few people can access and update it which creates a large burden and means > it sometimes gets out of date. More hands will make things easier. The new > look bioperl page is a great example of what can be done > (www.bioperl.org). > > Before it is ready for prime-time some work needs to be done to copy > content over from the current biojava site. We would like to ask for any > volunteers who have some experience with Wikimedia who could help out. From guedes at unisul.br Fri Feb 3 12:55:15 2006 From: guedes at unisul.br (Dickson S. 
Guedes)
Date: Fri Feb 3 14:02:44 2006
Subject: [Biojava-l] Re: [Biojava-dev] biojava wikimedia based home page
In-Reply-To:
References:
Message-ID: <43E39903.7050300@unisul.br>

Hi Mark,

I think that's very good, and I think I can help with it. That way I'll learn more too. :)

I don't know much about Wikimedia, but what I do know makes me think it isn't so hard or difficult. It's only another way to write hypertext. ;)

[]s
Guedes

mark.schreiber@novartis.com wrote:
> Hi all -
>
> The OBF is moving several of it's projects homepages to wikimedia based
> systems. There is a possibility that biojava will move to use this system
> too. I think this is a great way to establish a community based biojava
> web presence. The current home page suffers from the problem that only a
> few people can access and update it which creates a large burden and means
> it sometimes gets out of date. More hands will make things easier. The new
> look bioperl page is a great example of what can be done
> (www.bioperl.org).
>
> Before it is ready for prime-time some work needs to be done to copy
> content over from the current biojava site. We would like to ask for any
> volunteers who have some experience with Wikimedia who could help out.
>
> Please reply to me or to the list.
>
> Any help would be greatly appreciated.
>
> - Mark
>
> Mark Schreiber
> Research Investigator (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
>
> phone +65 6722 2973
> fax +65 6722 2910
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev@biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
>

--
--
:: Dickson S. Guedes (guedes at unisul dot br) ::
:: UNISUL - Universidade do Sul de Santa Catarina
:: ATI - Assessoria de Tecnologia da Informação
:: (0xx48) 621-3200 - http://www.unisul.br
--
"There are 10 types of people in the world: those who understand binary, and those who don't"

From foisys at sympatico.ca Fri Feb 3 11:34:30 2006
From: foisys at sympatico.ca (foisys@sympatico.ca)
Date: Fri Feb 3 14:33:48 2006
Subject: [Biojava-l] Re:biojava wikimedia based home page
Message-ID: <20060203163430.IOXX1601.tomts46-srv.bellnexxia.net@[209.226.175.82]>

Hi Mark

Marvelous idea ;-)

We (the Quebec Bioinformatics Network) are in the process of moving our "Bioinformatics KnowledgeBase" (http://apps.bioneq.qc.ca/twiki/bin/view/Knowledgebase/WebHome) from its current TWiki format toward MediaWiki. We are offering to put this (ongoing) experience to use on the migration of the Biojava website. Who should we contact on the OBF side to initiate the project? Chris D.?

Shameless plug: if anyone wants to contribute to our Bioinformatics KnowledgeBase, please feel free to do so!! This is all done in the Wiki spirit. Don't get scared by the French side of things, we'll take care of it ;-)

> The OBF is moving several of it's projects homepages to wikimedia based
> systems. There is a possibility that biojava will move to use this system
> too. I think this is a great way to establish a community based biojava
> web presence. The current home page suffers from the problem that only a
> few people can access and update it which creates a large burden and means
> it sometimes gets out of date. More hands will make things easier. The new
> look bioperl page is a great example of what can be done
> (www.bioperl.org).
>
> Before it is ready for prime-time some work needs to be done to copy
> content over from the current biojava site. We would like to ask for any
> volunteers who have some experience with Wikimedia who could help out.
From e.willighagen at science.ru.nl Sat Feb 4 04:10:10 2006 From: e.willighagen at science.ru.nl (Egon Willighagen) Date: Sat Feb 4 05:28:01 2006 Subject: [Biojava-l] biojava wikimedia based home page In-Reply-To: References: Message-ID: <200602041010.11238.e.willighagen@science.ru.nl> On Friday 03 February 2006 03:37, mark.schreiber@novartis.com wrote: > The OBF is moving several of it's projects homepages to wikimedia based > systems. There is a possibility that biojava will move to use this system > too. I think this is a great way to establish a community based biojava > web presence. The current home page suffers from the problem that only a > few people can access and update it which creates a large burden and means > it sometimes gets out of date. More hands will make things easier. The new > look bioperl page is a great example of what can be done > (www.bioperl.org). I have good experience with Wiki's in the past, for example for Jmol. I would like to point out that Jmol perfectly fits in to Wiki's systems, or at least 4 of them: http://wiki.jmol.org/JmolProcessor It's a great way to enhance Wiki's with live protein structures. Egon -- e.willighagen@science.ru.nl PhD student on Molecular Representation in Chemometrics Radboud University Nijmegen Blog: http://chem-bla-ics.blogspot.com/ http://www.cac.science.ru.nl/people/egonw/ GPG: 1024D/D6336BA6 From shameer at ncbs.res.in Sun Feb 5 05:27:15 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Sun Feb 5 06:19:12 2006 Subject: [Biojava-l] from C alpha trace to full co-ordinates In-Reply-To: References: Message-ID: <36486.192.168.1.176.1139135235.squirrel@192.168.1.176> Dear All, Any one is aware of a perl script / java code / class that can be used to construct full atomic coordinates of a protein from a given C(alpha) trace and optimizes side chain geometry. 
I tried the original program Maxsprout from Holms Group, But it is not giving me proper results (am getting errors like segmentation fault - backbonchain failed etc.) Since I need to use as a part of a web server - I would appreciate if any one could let me know about a perl script for the same. Thanks and cheers in advance, -- Mr. Shameer Khadar (JRF) Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://www.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." From shameer at ncbs.res.in Mon Feb 6 03:27:50 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon Feb 6 04:01:49 2006 Subject: [Biojava-l] Need a slogan for OBF In-Reply-To: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> References: <001001c62793$bef08f70$93656785@zhur> <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com> <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> Message-ID: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38> Dear All, As we are moving to the all new look wiki-style-web - why dont we think about a unique logo + slogan that can express our spirit and excitement ??? For Example we can have a logo with O|B|F its full form and the slogan - any body is interested - i would be happy to design logos once we have done with the logo. I have a couple of suggestions -I hope all OBF members can sent much more powerful slogans than mine 'Let's Code for Life' 'Let's Decode Life' 'Let's Recode Life' 'Code your Life ' Happy O|B|!!! -- Mr. Shameer Khadar (JRF) Dr. R. 
Sowdhamini's Lab (# 25)
The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road
Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." MM

From tjaart at tuks.co.za Mon Feb 6 12:09:57 2006
From: tjaart at tuks.co.za (Tjaart de Beer)
Date: Mon Feb 6 12:19:34 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
Message-ID: <43E782E5.6090502@tuks.co.za>

Hi

I am new to Biojava (and Java). I want to get the secondary structure assignment (located in the PDB file) for a PDB file. I have looked at the AminoAcid interfaces and classes but can't get the stuff to work (I have gotten some of the Structure interfaces to work...). Does someone maybe have an example of extracting assigned secondary structure from a PDB file?

An alternative is simply to write a class which reads all the lines starting with HELIX or SHEET and somehow parse them into meaningful results.

Any suggestions would be greatly appreciated!

--
Tjaart de Beer

The software required "Windows XP or better" ... so I installed Linux

From ap3 at sanger.ac.uk Tue Feb 7 16:44:33 2006
From: ap3 at sanger.ac.uk (Andreas Prlic)
Date: Tue Feb 7 17:02:53 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To: <43E782E5.6090502@tuks.co.za>
References: <43E782E5.6090502@tuks.co.za>
Message-ID:

Hi Tjaart,

The PDB parser currently does not parse all of the header - the HELIX and SHEET lines containing the author's secondary structure assignments are currently ignored.

> An alternative is simply to write a class which reads all the lines
> starting with HELIX or SHEET

for the moment this might be a solution

> and somehow parse them into meaningful results.

what is meaningful?
:-) - in case you want to add the data to the AminoAcid objects, you might want to have a look at the PDB file parser and write a patch that reads the lines and stores the data in the amino acids.

Let me know if you need help.

Cheers,
Andreas

-----------------------------------------------------------------------
Andreas Prlic
Wellcome Trust Sanger Institute
Hinxton, Cambridge CB10 1SA, UK
+44 (0) 1223 49 6891

From mark.schreiber at novartis.com Tue Feb 7 21:15:46 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Feb 7 21:11:42 2006
Subject: [Biojava-l] BioJava News site
Message-ID:

Dear subscribers -

BioJava has a new news and blog site based on WordPress. It can be found at http://biojava.open-bio.org/news/

I have copied some of the more recent news items over and added a few new ones. Feel free to subscribe and/or contribute. All major biojava announcements will be posted and archived here in future.

Thanks,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From mark.schreiber at novartis.com Tue Feb 7 22:06:03 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Feb 7 22:02:00 2006
Subject: [Biojava-l] BioJava wiki
Message-ID:

Dear BioJava users.

The biojava wiki website has been up for a few days at http://biojava.open-bio.org/wiki/Main_Page. Thanks to the amazing efforts of a few early volunteers, almost all of the content of the old site has been transferred to the new page.

I think that the best part is that, now it is wiki based, it will be much easier for the biojava community to contribute by adding new material, updating old material and fixing mistakes as they find them. This should make the site much more up to date and informative than it has been in the past.

We are currently lacking a logo.
We have a few suggestions at http://biojava.open-bio.org/wiki/BioJava:Logo. If you are an artistic type we would love to see your contributions. If you are more of a critic, add your comments about what you like and don't like. After a couple of weeks we will decide (somehow) on something official. As I'm based in Singapore I cannot guarantee the selection process will be entirely transparent : )

Your contributions are welcome.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From mheusel at gmail.com Wed Feb 8 02:06:46 2006
From: mheusel at gmail.com (Martin Heusel)
Date: Wed Feb 8 03:53:23 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To: <43E782E5.6090502@tuks.co.za>
References: <43E782E5.6090502@tuks.co.za>
Message-ID: <6127fc200602072306m7820bd12m@mail.gmail.com>

Hi Tjaart,

you can use DSSP to determine the secondary structure from a PDB.
http://swift.cmbi.ru.nl/gv/dssp/

or maybe better use Christoph's SecondaryStructure_Predictor with biojava
http://www.charite.de/bioinf/strap/biojavaInAnger_SecondaryStructure_Predictor.html

bye
Martin

From hotafin at gmail.com Wed Feb 8 08:33:24 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Feb 8 09:23:24 2006
Subject: [Biojava-l] Re: structureNMRImpl
In-Reply-To:
References:
Message-ID:

I forgot to mention that I use a not yet published, modified version of StructureImpl, which can parse an ArrayList. It has a new constructor for it, and the old BufferedReader method was changed so it generates the ArrayList for the parser method...

On 2/8/06, Tamas Horvath wrote:
>
> Hi!
> The structureNMRImpl class I've been working on is finally working.
> It is far from ready to be incorporated into BioJava just yet.
> It's a bit messy, lacks documentation, there are some naming convention
> issues with it, nevertheless I'd be happy to hear suggestions about it.
> In theory in the future it should use the same Structure interface as
> StructureImpl. At least that's my aim.
> Anyway... Tell me what u think!
>

From hotafin at gmail.com Wed Feb 8 10:53:12 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Feb 8 11:49:01 2006
Subject: [Biojava-l] warnings, errors, comments
Message-ID:

> What do u think about making a givewarning flag for PDBFileParser?

That would be nice - I would suggest doing it using the Java logging tool. Actually, that applies to all of biojava and should be suggested to the list. I find java.util.logging very helpful.

> By default it would be true, but parsing could be invoked so that it
> would give no warnings or comments.

This can be done by setting log levels: Level.SEVERE, Level.WARNING, Level.INFO, Level.FINEST, etc.

From joel at macresearcher.com Thu Feb 9 21:15:50 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Thu Feb 9 22:02:57 2006
Subject: [Biojava-l] MacResearch announces iPod giveaway contest
Message-ID:

Help MacResearch.org expand its Script Repository and you could win a black 2GB iPod Nano. Eligible contestants must submit a research-oriented script that can run natively (no emulators) on Mac OS X 10.3 or higher without modification before the contest end date. Scripts for all scientific domains are welcome, including scripts written for High Performance Computing (grid, cluster, etc) setup and management. If your script does not meet the aforementioned criteria then you will not be eligible to win the iPod Nano. Winners will be chosen by random drawing. The contest begins 2/8/2006 and ends 2/28/2006.
The ultimate goal of this contest, and the script repository in general, is
to create a valuable community resource that can be used to benefit
endeavors in research and education. Please don't be shy about your coding
style or lack of documentation. Your script will make someone's life easier.

MacResearch.org is the premier, non-profit community for scientists using
Mac OS X and related hardware in their research. To learn more about
MacResearch.org and the MacResearch.org Script Repository visit
http://www.macresearch.org and http://www.macresearch.org/script_repository.
For official contest rules see http://www.macresearch.org/ipod_contest

From toddri at eden.rutgers.edu Thu Feb 9 22:42:44 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu Feb 9 22:57:41 2006
Subject: [Biojava-l] Is FullHmmerProfileHMM (or FlatModel) Broken?
Message-ID: <43EC0BB4.9010105@eden.rutgers.edu>

Hello again,

I have attempted to move up from the ProfileHMM class to the HMMER classes
(FullHmmerProfileHMM and HmmerProfileHMM). However, I immediately get an
error when attempting to create the DP matrix passing in the
FullHmmerProfileHMM object.

java.lang.ClassCastException: org.biojava.bio.dp.SimpleDotState
    at org.biojava.bio.dp.FlatModel.<init>(FlatModel.java:185)
    at org.biojava.bio.dp.DP.flatView(DP.java:169)
    at org.biojava.bio.dp.DPFactory$DefaultFactory.createDP(DPFactory.java:52)
    at BioJavaHMM.trainHmm(BioJavaHMM.java:840)

Here is the code from FlatModel.java in the ModelInState section:

    if(t instanceof DotState) {
      DotStateWrapper dsw = new DotStateWrapper(t);
      addAState(dsw);
      inModel.put(t, flatM);
      toM.put(t, dsw);
      toM.put(((Wrapper) t).getWrapped(), dsw); <-------------line 185!!!!!!!!!
      //System.out.println("Added wrapped dot state " + dsw.getName());
    } else if(t instanceof EmissionState) {

This code is a bit confusing, but t appears to be of type SimpleDotState,
which I do not believe can be cast to type Wrapper. Also, should both lines
184 and 185 be executed?
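
The ClassCastException described above can be reproduced in miniature. What
follows is only a sketch under stated assumptions: State, DotState, Wrapper,
SimpleDotState, WrappedDotState and CastGuard are stand-in types defined
here for illustration. They are not the BioJava classes of the same names,
and this is not a proposed patch to FlatModel. The sketch shows that an
unguarded cast to Wrapper fails for a dot state that does not implement it,
while an instanceof check avoids the exception.

```java
// Stand-in types for illustration only; NOT the BioJava classes.
interface State { String getName(); }
interface DotState extends State { }
interface Wrapper { State getWrapped(); }

// A dot state that is not a Wrapper (the situation the stack trace suggests).
final class SimpleDotState implements DotState {
    private final String name;
    SimpleDotState(String name) { this.name = name; }
    public String getName() { return name; }
}

// A dot state that does wrap another state.
final class WrappedDotState implements DotState, Wrapper {
    private final State inner;
    WrappedDotState(State inner) { this.inner = inner; }
    public String getName() { return "wrapped-" + inner.getName(); }
    public State getWrapped() { return inner; }
}

final class CastGuard {
    // Unguarded cast: throws ClassCastException when t is not a Wrapper.
    static State unwrapUnchecked(State t) {
        return ((Wrapper) t).getWrapped();
    }

    // Guarded cast: unwraps only when t really is a Wrapper, else returns t.
    static State unwrapIfWrapped(State t) {
        return (t instanceof Wrapper) ? ((Wrapper) t).getWrapped() : t;
    }

    // Reports whether the unguarded cast fails for the given state.
    static boolean uncheckedFails(State t) {
        try {
            unwrapUnchecked(t);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }
}
```

Whether a guard like this would be the right fix for FlatModel depends on
what line 185 is meant to register, which only the original authors can
confirm.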
Also, I found this in the source code as well:

    //
    // FIXME -- Matthew broked this... <--------line 243!!!!!!!!!
    //

Does this mean that some functionality of FlatModel.java is broken? Should
the ModelInState (and thus the Hmmer classes) be avoided?

Any help would be greatly appreciated,

Todd

From toddri at eden.rutgers.edu Thu Feb 9 23:02:21 2006
From: toddri at eden.rutgers.edu (Todd Riley)
Date: Thu Feb 9 22:58:03 2006
Subject: [Biojava-l] Looking for a RandomAccessFile-like class for sequences
Message-ID: <43EC104D.4030202@eden.rutgers.edu>

Hello,

I am looking for a RandomAccessFile-like class that can read small,
arbitrary chunks of a very large (like 250K of human DNA) fasta file. (UCSC
chromosomal fasta files contain just 1 sequence for the whole chromosome).
I was hoping that someone may have already written a class that will take in
an alphabet and a range (maybe in the form of a RangeLocation object) and
will return the sequence in that range from the file. I would hate to spend
time re-inventing a wheel that may already exist.

Thanks,

Todd

From mark.schreiber at novartis.com Fri Feb 10 01:48:03 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Fri Feb 10 01:43:56 2006
Subject: [Biojava-l] Looking for a RandomAccessFile-like class for sequences
Message-ID:

Hello Todd,

This sounds a bit like what the biojava BioIndex code does. You may be able
to use that.

- Mark

Todd Riley
Sent by: biojava-l-bounces@portal.open-bio.org
02/10/2006 12:02 PM

        To: biojava-l@biojava.org
        cc: (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] Looking for a RandomAccessFile-like class for sequences

Hello,

I am looking for a RandomAccessFile-like class that can read small,
arbitrary chunks of a very large (like 250K of human DNA) fasta file. (UCSC
chromosomal fasta files contain just 1 sequence for the whole chromosome).
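
For readers who want to see what such random access could look like with
the standard library alone, here is a minimal sketch. It is not BioJava code
and not the BioIndex approach mentioned in the reply; FastaSlice and its
methods are hypothetical names. It assumes a single-record FASTA file in
ASCII, '\n' line endings, and the same number of residues on every full
sequence line, so the byte offset of a residue can be computed directly.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of BioJava.
final class FastaSlice {

    // Returns residues [start, end) (0-based, end exclusive) of a
    // single-record FASTA file without reading the whole file.
    static String read(Path fasta, long start, long end) throws IOException {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("need 0 <= start <= end");
        }
        try (RandomAccessFile raf = new RandomAccessFile(fasta.toFile(), "r")) {
            String header = raf.readLine();        // the '>' line and its newline
            if (header == null || !header.startsWith(">")) {
                throw new IOException("not a FASTA file: " + fasta);
            }
            long dataStart = raf.getFilePointer(); // first byte of sequence data
            if (start == end) {
                return "";
            }
            String firstLine = raf.readLine();     // read only to learn the line width
            if (firstLine == null || firstLine.isEmpty()) {
                throw new IOException("no sequence data in " + fasta);
            }
            long width = firstLine.length();
            // Residue i sits at dataStart + i + (i / width): one '\n' per full line before it.
            long from = dataStart + start + start / width;
            long to = dataStart + end + (end - 1) / width; // byte after the last wanted residue
            byte[] buf = new byte[(int) (to - from)];
            raf.seek(from);
            raf.readFully(buf);                    // EOFException if the range runs past the file
            StringBuilder sb = new StringBuilder(buf.length);
            for (byte b : buf) {
                if (b != '\n') {
                    sb.append((char) b);           // drop line breaks inside the range
                }
            }
            return sb.toString();
        }
    }

    // Small self-check: writes a three-line demo record and reads residues 8..12.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("fasta-slice-demo", ".fa");
            Files.write(tmp, ">chrDemo\nACGTACGTAC\nGGGGCCCCTT\nAA\n"
                    .getBytes(StandardCharsets.US_ASCII));
            String slice = read(tmp, 8, 13);
            Files.delete(tmp);
            return slice;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The string returned by read(...) is plain text; turning it into a sequence
object over a particular alphabet would be a separate step. Files with
Windows line endings or uneven line lengths would need an index instead of
the offset arithmetic used here.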
I was hoping that someone may have already written a class that will take in
an alphabet and a range (maybe in the form of a RangeLocation object) and
will return the sequence in that range from the file. I would hate to spend
time re-inventing a wheel that may already exist.

Thanks,

Todd

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From dreher at mpiib-berlin.mpg.de Fri Feb 10 12:55:28 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Feb 10 13:01:18 2006
Subject: [Biojava-l] BioJavaX-Hibernate: Namespace problem
Message-ID: <43ECD390.1010405@mpiib-berlin.mpg.de>

Hello,

I tried to create different virtual BioSQL-databases for the storage of
different types of sequences. For testing purposes, I created and saved a
new Namespace called 'mRNA'. I didn't find out though, how to save a newly
created sequence inside this namespace. I tried the following code block:

    Namespace nsp = new SimpleNamespace("mRNA");
    session.saveOrUpdate("Namespace", nsp);
    RichSequenceDB db = new BioSQLRichSequenceDB("mRNA", session);
    RichSequence seq = RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","test"));
    db.addRichSequence(seq);
    tx.commit();

The Namespace and the sequence are actually being saved in the database, but
the sequence is saved in the default namespace 'lcl' and not in the new
namespace 'mRNA'. Can someone tell me what I'm missing here?

Thanks in advance,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité
Mitte
Department of Immunology

Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany

Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From tjaart at tuks.co.za Sun Feb 12 11:51:57 2006
From: tjaart at tuks.co.za (Tjaart de Beer)
Date: Sun Feb 12 11:44:36 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To: <6127fc200602072306m7820bd12m@mail.gmail.com>
References: <43E782E5.6090502@tuks.co.za> <6127fc200602072306m7820bd12m@mail.gmail.com>
Message-ID: <43EF67AD.3030307@tuks.co.za>

Hi

Thanks for all the suggestions. Currently I just want to extract the
secondary structure as specified in the PDB file. I am having trouble
understanding how to utilize the AminoAcid class (after having looked at the
source...). Does anyone have an example of extracting the secondary
structure from a PDB file using the AminoAcid class in Biojava? Or any
example using the AminoAcid class to extract info from a PDB file?

Any help would be greatly appreciated!

Martin Heusel wrote:
> Hi Tjaart,
> you can use DSSP to determine the secondary structure from a PDB.
> http://swift.cmbi.ru.nl/gv/dssp/
> or maybe better use Christoph's SecondaryStructure_Predictor with biojava
> http://www.charite.de/bioinf/strap/biojavaInAnger_SecondaryStructure_Predictor.html
> bye
> Martin
> _______________________________________________
> Biojava-l mailing list - Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>

--
Tjaart de Beer

---------
The software required "Windows XP or better" ... so I installed Linux

From mark.schreiber at novartis.com Sun Feb 12 20:15:44 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Feb 12 20:11:34 2006
Subject: [Biojava-l] BioJavaX-Hibernate: Namespace problem
Message-ID:

Hello -

When you make a Sequence with DNATools it is not Rich and therefore has no
namespace. When you enrich it biojava will give it the default namespace
'lcl' or local.
Thus when you add it to the DB you get it added under the lcl namespace.

I would make a new SimpleRichSequence instead. Then you can specify its
namespace.

- Mark

Felix Dreher
Sent by: biojava-l-bounces@portal.open-bio.org
02/11/2006 01:55 AM

        To: biojava-l@biojava.org
        cc: (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] BioJavaX-Hibernate: Namespace problem

Hello,

I tried to create different virtual BioSQL-databases for the storage of
different types of sequences. For testing purposes, I created and saved a
new Namespace called 'mRNA'. I didn't find out though, how to save a newly
created sequence inside this namespace. I tried the following code block:

    Namespace nsp = new SimpleNamespace("mRNA");
    session.saveOrUpdate("Namespace", nsp);
    RichSequenceDB db = new BioSQLRichSequenceDB("mRNA", session);
    RichSequence seq = RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","test"));
    db.addRichSequence(seq);
    tx.commit();

The Namespace and the sequence are actually being saved in the database, but
the sequence is saved in the default namespace 'lcl' and not in the new
namespace 'mRNA'. Can someone tell me what I'm missing here?

Thanks in advance,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology

Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany

Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From mark.schreiber at novartis.com Mon Feb 13 03:34:06 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Feb 13 03:29:50 2006
Subject: [Biojava-l] easy createRichSequence methods
Message-ID:

Hi -

To make it easier to create a RichSequence I have added several overloaded
createRichSequence(...) methods to RichSequence.Tools.
These are similar to the createSequence methods found in DNATools and
RNATools and optionally allow you to specify the namespace as either a
String or a Namespace object.

Now available in CVS.

- Mark

From martin.eklund at farmbio.uu.se Mon Feb 13 11:06:38 2006
From: martin.eklund at farmbio.uu.se (Martin Eklund)
Date: Mon Feb 13 11:34:44 2006
Subject: [Biojava-l] Persist SingleDP object
Message-ID: <1139846798.8057.47.camel@pele>

Hi,

I'm wondering if there is some way of persisting SingleDP objects? As I see
it, serialization requires quite a lot of rewriting...or? Is there another
way?

Thank you!

Martin.

--
========================================
Martin Eklund
PhD Student
Department of Pharmaceutical Biosciences
Uppsala University, Sweden
Ph: +46-18-4714281
========================================

From mthomasc at vub.ac.be Mon Feb 13 15:36:59 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Mon Feb 13 15:54:06 2006
Subject: [Biojava-l] Genbank parser error [biojavax]
Message-ID: <43F0EDEB.1010801@vub.ac.be>

Hello,

I have tried biojavax today with a view to using the Genbank file parser.

My test file is a Genbank formatted file which has been produced by the
Ensembl export system.

The head of the file is as follows:

LOCUS       6 489671 bp DNA HTG 13-FEB-2006
DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
            52296503..52786173 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
VERSION     chromosome:NCBIM34:6:52296503:52786173:1

I used the code provided in biojavax docbook to parse this file.
I get the following error:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

I had a look at GenbankFormat.java, and I guess the problem comes from the
regular expression that does not recognize the LOCUS as a standard Genbank
file LOCUS tag.

Am I wrong? Has the biojavax Genbank parser been tested on Ensembl exported
files?

Morgane.

--
*************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From hotafin at gmail.com Mon Feb 13 13:05:38 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Mon Feb 13 16:44:25 2006
Subject: [Biojava-l] Newbie struggling with secondary structure
In-Reply-To:
References: <43E782E5.6090502@tuks.co.za> <6127fc200602072306m7820bd12m@mail.gmail.com> <43EF67AD.3030307@tuks.co.za> <43F04B82.7060409@tuks.co.za>
Message-ID:

Hi!
I have the needed modifications of PDBFileParser to read secondary structure
data.
The code is not final, there may be some changes before it is added to cvs.
(And I need Andreas to be around for that... )
But if you need the file right now, I can send it to you, and you can
recompile your biojava. Alternatively I can send u my compiled biojava.jar
as well. It's not the most current cvs, but not too late either...

On 2/13/06, Tamas Horvath wrote:
>
> You are right.... the secondary structure is not yet parsed.
But it's> quite easy, so if Andreas is around today, we may add the needed code...>> On 2/13/06, Tjaart de Beer wrote:> >> > Thanks for help! But I still have problems. Here is my code, please see> > if you can find anything wrong....> >> >> > import org.biojava.bio.structure.*;> > import org.biojava.bio.structure.io.*;> > .> > .> > .> > PDBFileReader file = new PDBFileReader();> > Structure structure = file.getStructure (filename); //I used 1eye.pdb> > Chain chain = structure.getChain(0); //Only get 1st chain> > ArrayList s = chain.getGroups("amino"); //Get amino acids> > AminoAcidImpl a = (AminoAcidImpl)s.get(28); //Make object of element 28> >> > Map secStruc = a.getSecStruc(); -> This returns an empty map...> > ???> >> >> > My chain variable contains the specifed chain and I can get a specific> > value for element 28. When I check the class of a element in the> > ArrayList it says "class org.biojava.bio.structure.AminoAcidImpl". Thus> > I now want to use the getSecStruc method on that element. Typing> > "System.out.println(s.get(28).getSecStruc())" does not give me anyhting> > but an empty array.> >> > Any help would be appreciated...> >> >> > Tamas Horvath wrote:> > > It's very easy really. Especially if you use cvs BioJava.> > >> > > You parse a PDB file using PDBFileParser.> > > That gives you a structure object.> > > Than you need to iterate through all models (if there are more than 1)> > > and all chains in them.> > > Once you have a chain object, you can iterate through it.> > > for example you can say findChain("A")> > > that gives you the "A" chain.> > > then you can say getGroups("amino").> > > That gives you a list of the aminoacids.> > > And every aminoacid object has a secondary structure attribute.> > >> > > On 2/12/06, *Tjaart de Beer* > > > wrote:> > >> > > Hi> > >> > > Thanks for all the suggstions. Currently I just want to extract> > the> > > secondary structure as specified in the PDB file. 
I am having> > trouble> > > understanding how to utilize the AminoAcid class (after having> > > looked at> > > the source...). Does anyone have an example of extracting the> > secondary> > > structure from a PDB file using the AminoAcid class in Biojava? Or> > any> > > example using the AminoAcid class to extract info from a PDB file?> >> > >> > > Any help would be greatly appreciated!> > >> > > Martin Heusel wrote:> > > > Hi Tjaart,> > > > you can use DSSP to determine the secondary structure from a> > PDB.> > > > http://swift.cmbi.ru.nl/gv/dssp/ <> > http://swift.cmbi.ru.nl/gv/dssp/>> > > > or maybe better use Christoph's SecondaryStructure_Predictor> > with> > > biojava> > > >> > > http://www.charite.de/bioinf/strap/biojavaInAnger_SecondaryStructure_Predictor.html> >> > > <> > http://www.charite.de/bioinf/strap/biojavaInAnger_SecondaryStructure_Predictor.html> > >> > > > bye> > > > Martin> > > > _______________________________________________> > > > Biojava-l mailing list - Biojava-l@biojava.org> > > > > > > http://biojava.org/mailman/listinfo/biojava-l> > > >> > >> > > --> > > Tjaart de Beer> > >> > >> > > ---------> > > The software required "Windows XP or better" ... so I installed> > Linux> > > _______________________________________________> > > Biojava-l mailing list - Biojava-l@biojava.org> > > > > > http://biojava.org/mailman/listinfo/biojava-l> > >> > >> >> > --> > Tjaart de Beer> > Bioinformatics and Computational Biology Unit> > Department Biochemistry> > FABI Square/Bioinformatics building> > Faculty of Natural Sciences> > University of Pretoria> > Lynwood rd> > Pretoria> > South Africa> > 0001> >> > Tel: +27 12 420 5802> > Cell: +27 83 504 7914> > Fax: +27 12 420 5800> > Email: tjaart@tuks.co.za> > tdebeer@gmail.com> >> > ---------> > The software required "Windows XP or better" ... 
so I installed Linux> >>> From mark.schreiber at novartis.com Mon Feb 13 20:11:07 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Feb 13 20:06:53 2006 Subject: [Biojava-l] Genbank parser error [biojavax] Message-ID: Hi Morgane - I have to say that doesn't look much like Genbank : ) The biojavax parser are possibly a bit brittle due to their use of regexps to recognize key elements. It should be fixable, I think the problem is that the parser expects a word after LOCUS not a number. This may not be the only problem though. Could you post the entire file? Or if it is large then a representative file of smaller size. - Mark Morgane THOMAS-CHOLLIER Sent by: biojava-l-bounces@portal.open-bio.org 02/14/2006 04:36 AM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Genbank parser error [biojavax] Hello, I have tried biojavax today with a view to use the Genbank file parser. My test file is a Genbank formatted file which has been produced by Ensembl export system. The head of the file is as follow : LOCUS 6 489671 bp DNA HTG 13-FEB-2006 DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence 52296503..52786173 reannotated via EnsEMBL ACCESSION chromosome:NCBIM34:6:52296503:52786173:1 VERSION chromosome:NCBIM34:6:52296503:52786173:1 I used the code provided in biojavax docbook to parse this file. I get the following error : Exception in thread "main" org.biojava.bio.BioException: Could not read sequence at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111) at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31) Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006 at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229) at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108) ... 
1 more I had a look at GenbankFormat.java, and I guess the problem comes from the regular expression that do not recognize the LOCUS as a standard Genbank file LOCUS tag. Am I wrong ? Have biojavax Genbank parser been tested on Ensembl exported files ? Morgane. -- ************************************* Morgane THOMAS-CHOLLIER, PHD Student Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From mark.schreiber at novartis.com Mon Feb 13 20:45:11 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Feb 13 20:40:56 2006 Subject: [Biojava-l] Persist SingleDP object Message-ID: Hi Martin - You could try the XmlMarkovModel class. It has readModel and writeModel to write markov models as XML. I have used this successfully for models in the past. - Mark Martin Eklund Sent by: biojava-l-bounces@portal.open-bio.org 02/14/2006 12:06 AM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Persist SingleDP object Hi, I'm wondering if there is some way of persisting SingleDP objects? As I see it, serialization requires quite a lot of rewriting...or? Is there another way? Thank you! Martin. 
-- ======================================== Martin Eklund PhD Student Department of Pharmaceutical Biosciences Uppsala University, Sweden Ph: +46-18-4714281 ======================================== _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From crackeur at comcast.net Mon Feb 13 20:47:42 2006 From: crackeur at comcast.net (Jimmy Zhang) Date: Mon Feb 13 20:53:07 2006 Subject: [Biojava-l] [ANN] VTD-XML Version 1.5 Released References: Message-ID: <006501c63108$a6979850$0d02a8c0@ximpleware> [ANN] VTD-XML Version 1.5 Released Eight years after the invention of XML, DOM and SAX, despite their respective issues, are still the mainstays of application developers. So is it the end of road for XML parsing innovation? The VTD-XML project team think not. We are proud to announce the availability of both C and Java version 1.5 of VTD-XML, the next generation open-source XML parser that goes beyond DOM and SAX in terms of performance, memory usage and ease of use. The technical highlights of VTD-XML are: * Performance: the world's fastest XML parser, between 5x~10x faster than DOM * Memory Usage: 3x to 5x less than DOM, 1.3x~1.5x XML document size * Random access with built-in XPath support * A simple and intuitive API Other advanced features include: * Buffer reuse * Large document support (2GByte) * Incremental update * Hardware acceleration * Native XML indexing. For demos, latest benchmarks, related articles and software downloads, please visit http://vtd-xml.sf.net. Also let us know your thoughts and suggestions and help us improve VTD-XML. From mark.schreiber at novartis.com Tue Feb 14 04:45:07 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Tue Feb 14 04:41:10 2006 Subject: [Biojava-l] Contributers Message-ID: Hi all - As you will know we are moving biojava to a wiki based format. 
On the BioJava community page I am trying to extend the list of people who
contributed to the project in some way. Currently the list is very
incomplete. If you have made any kind of contribution in the past please add
yourself to the list. It's simple enough to do. Follow these instructions.

1. Go to http://biojava.open-bio.org/wiki/BioJava:Community_Portal
2. Click the edit button next to the contributors heading (you might be
   prompted to log in; just give yourself a username and password and hey
   presto, you're editing)
3. Add a link for yourself, e.g. * [[Joe Bloggs|Joe Bloggs]] and save the
   page.
4. Joe Bloggs will now appear as a red link on the page. Click the link and
   start adding information about yourself and what you did (do) with
   biojava.
5. Add [[Category:People]] to the bottom of the page
6. Save the page.
7. By default you will have a [[User:MyUserName]] page. In this you should
   put #REDIRECT [[Joe Bloggs]]

Why should you do this?

1. Because you can.
2. It's really the only way people who contribute to biojava get any credit
   at all for their contributions to humanity.
3. It would be really great to keep some kind of record of who contributed
   what and how many people have contributed to biojava. This will be
   essential if we ever publish. Don't make me look through the @author
   tags!

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From mthomasc at vub.ac.be Tue Feb 14 08:33:02 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Tue Feb 14 08:28:58 2006
Subject: [Biojava-l] Genbank parser error [biojavax]
In-Reply-To:
References:
Message-ID: <43F1DC0E.7050809@vub.ac.be>

Hello Mark,

My file is indeed too large to be posted.
So I have exported a smaller sequence from Ensembl that I tested with the
parser. The behavior is the same.
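
The diagnosis suggested earlier in this thread (that the parser expects a
word, not a number, after LOCUS) can be illustrated with a small sketch.
Both patterns below are hypothetical and simplified; they are not the
regular expression used in GenbankFormat.java, and LocusLineCheck is a
made-up class name. The input is the text reported in the "Bad locus line
found" message, i.e. the LOCUS line with its tag already removed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not the pattern used by the biojavax parser.
final class LocusLineCheck {

    // Assumed strict form: the entry name must start with a letter.
    static final Pattern STRICT = Pattern.compile(
            "^([A-Za-z]\\w*)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)$");

    // Assumed lenient form: any run of non-blank characters is accepted as the name.
    static final Pattern LENIENT = Pattern.compile(
            "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)$");

    // True if the whole (trimmed) line matches the given pattern.
    static boolean matches(Pattern p, String line) {
        return p.matcher(line.trim()).matches();
    }

    // Entry name according to the lenient pattern, or null if the line does not match.
    static String name(String line) {
        Matcher m = LENIENT.matcher(line.trim());
        return m.matches() ? m.group(1) : null;
    }
}
```

With the Ensembl line "6 489671 bp DNA HTG 13-FEB-2006", the strict pattern
rejects the name "6" while the lenient one accepts it. If the real parser
behaves like the strict pattern, relaxing the name group would be one way to
accept Ensembl exports, although other fields of the file may still cause
problems.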
You will find below this "Genbank" formatted file enclosed. Thanks for your help, Morgane. LOCUS 6 3498 bp DNA HTG 14-FEB-2006 DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence 52305503..52309000 reannotated via EnsEMBL ACCESSION chromosome:NCBIM34:6:52305503:52309000:1 VERSION chromosome:NCBIM34:6:52305503:52309000:1 KEYWORDS . SOURCE House mouse ORGANISM Mus musculus Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi; Muridae; Murinae; Mus. COMMENT This sequence was annotated by the Ensembl system. Please visit the Ensembl web site, http://www.ensembl.org/ for more information. COMMENT All feature locations are relative to the first (5') base of the sequence in this file. The sequence presented is always the forward strand of the assembly. Features that lie outside of the sequence contained in this file have clonal location coordinates in the format: .:.. COMMENT The /gene indicates a unique id for a gene, /note="transcript_id=..." a unique id for a transcript, /protein_id a unique id for a peptide and note="exon_id=..." a unique id for an exon. These ids are maintained wherever possible between versions. COMMENT All the exons and transcripts in Ensembl are confirmed by similarity to either protein or cDNA sequences. 
FEATURES Location/Qualifiers source 1..3498 /organism="Mus musculus" /db_xref="taxon:10090" gene complement(506..2826) /gene=ENSMUSG00000014704 mRNA join(complement(2261..2826),complement(506..1620)) /gene="ENSMUSG00000014704" /note="transcript_id=ENSMUST00000014848" CDS join(complement(2261..2639),complement(881..1620)) /gene="ENSMUSG00000014704" /protein_id="ENSMUSP00000014848" /note="transcript_id=ENSMUST00000014848" /db_xref="MarkerSymbol:Hoxa2" /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE" /db_xref="RefSeq_peptide:NP_034581.1" /db_xref="RefSeq_dna:NM_010451.1" /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE" /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE" /db_xref="EntrezGene:15399" /db_xref="AgilentProbe:A_51_P501803" /db_xref="EMBL:AB039184" /db_xref="EMBL:AB039185" /db_xref="EMBL:AB039186" /db_xref="EMBL:AB039187" /db_xref="EMBL:AB039188" /db_xref="EMBL:AB039189" /db_xref="EMBL:AB039190" /db_xref="EMBL:AB039191" /db_xref="EMBL:AB039192" /db_xref="EMBL:AK134501" /db_xref="EMBL:M87801" /db_xref="EMBL:M93148" /db_xref="EMBL:M93292" /db_xref="EMBL:M95599" /db_xref="GO:GO:0003700" /db_xref="GO:GO:0005634" /db_xref="GO:GO:0006355" /db_xref="GO:GO:0007275" /db_xref="IPI:IPI00132242.1" /db_xref="UniGene:Mm.131" /db_xref="protein_id:AAA37827.1" /db_xref="protein_id:AAA37834.1" /db_xref="protein_id:AAA37835.1" /db_xref="protein_id:AAA37836.1" /db_xref="protein_id:BAB68708.1" /db_xref="protein_id:BAB68709.1" /db_xref="protein_id:BAB68710.1" /db_xref="protein_id:BAB68711.1" /db_xref="protein_id:BAB68712.1" /db_xref="protein_id:BAB68713.1" /db_xref="protein_id:BAB68714.1" /db_xref="protein_id:BAB68715.1" /db_xref="protein_id:BAB68716.1" /db_xref="protein_id:BAE22163.1" /db_xref="AFFY_MG_U74Av2:102643_at" 
/db_xref="AFFY_MG_U74Cv2:171063_at" /db_xref="AFFY_Mouse430A_2:1419602_at" /db_xref="AFFY_Mouse430_2:1419602_at" /translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY" exon complement(506..1620) /note="exon_id=ENSMUSE00000387033" exon complement(2261..2826) /note="exon_id=ENSMUSE00000193269" BASE COUNT 938 a 815 c 882 g 863 t ORIGIN 1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA ATTTTTGATA 61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC ACTCCACTCG 121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG CTTGGGCTAG 181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG TAACTCTATG GCCTGAGTCT 241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA AAAAAAAAAA 301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT TTGTTGCAGG 361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG TGACCAGACT 421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT TGAGAAAGAG 481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA CCAAAAATAC 541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG ACAATTTATG 601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA AGCTTGTTGG 661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC TTTAAAACTG 721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG GGTAGATCAA 781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA CCTGGTCAAA 841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC AGATGCTGTA 901 GGTCGATTGT GGTGAGTGTG TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG ATATCTACAG 961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC AGGCAGGAAT 1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG GGACTGTCAT 1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA 
ACAGTGGGTG 1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA ACTGGGAAAG 1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA TTTTGCTGAA 1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC TCAAAGAGTG 1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA AATTTCCCTT 1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC CGGTTCTGAA 1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC ACCCTGCGGG 1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC TGAGTGTTGG 1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT TCCAGGGATT 1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG GGTCCGAGCA 1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA AATGGCCGCC 1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG GGAAGCCCAG 1801 TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC ATCCGGGAGC 1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT AGCTGAGCAA 1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA CTAGACAAGA 1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG AAAGTGCCCC 2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC CCTCCACCAA 2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC TCTCTCCCCC 2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC CGGAGGGGGA 2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC GAGGCAGGCA 2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT CTTCTCCTTC 2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC GCGACTGCCC 2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG ACTGCCCGGG 2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG TGAAAGCGTC 2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA TGTCAGGCAC 2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC GTAATTCATG 2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG CTTTGGGGGG 2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG AAGATCGCTG 2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG CTACTATTAA 
2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA CATGATTGCT 2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT GATTGATCCA 2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC ACTTTTTTTC 3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC GTGGGGGGCG 3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA GTGTGTGTGT 3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG CCTCCCCCGC 3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA AATCATTTAA 3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT CAAAGTTTTG 3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG AAAGGAGCAG 3361 AGTGGGAGAG AGAGAGGAGA GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA GAGAGAGAGA 3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC TCTTCCTCCT 3481 CTTTTTCCAA AATCAGTT // mark.schreiber@novartis.com wrote: >Hi Morgane - > >I have to say that doesn't look much like Genbank : ) > >The biojavax parser are possibly a bit brittle due to their use of regexps >to recognize key elements. It should be fixable, I think the problem is >that the parser expects a word after LOCUS not a number. This may not be >the only problem though. Could you post the entire file? Or if it is large >then a representative file of smaller size. > >- Mark > > > > > >Morgane THOMAS-CHOLLIER >Sent by: biojava-l-bounces@portal.open-bio.org >02/14/2006 04:36 AM > > > To: biojava-l@biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] Genbank parser error [biojavax] > > >Hello, > >I have tried biojavax today with a view to use the Genbank file parser. > >My test file is a Genbank formatted file which has been produced by >Ensembl export system. 
> >The head of the file is as follow : > >LOCUS 6 489671 bp DNA HTG 13-FEB-2006 >DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence > 52296503..52786173 reannotated via EnsEMBL >ACCESSION chromosome:NCBIM34:6:52296503:52786173:1 >VERSION chromosome:NCBIM34:6:52296503:52786173:1 > >I used the code provided in biojavax docbook to parse this file. >I get the following error : > >Exception in thread "main" org.biojava.bio.BioException: Could not read >sequence > at >org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111) > at >org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31) >Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: >6 489671 bp DNA HTG 13-FEB-2006 > at >org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229) > at >org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108) > ... 1 more > >I had a look at GenbankFormat.java, and I guess the problem comes from >the regular expression that do not recognize the LOCUS as a standard >Genbank file LOCUS tag. > >Am I wrong ? Have biojavax Genbank parser been tested on Ensembl >exported files ? > >Morgane. > > > -- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium Tel : +32 2 629 15 22 ********************************************************** Stop Using Internet Explorer, choose FIREFOX ! 
http://emmanuel.clement.free.fr/navigateurs/comparatif.htm

From mthomasc at vub.ac.be Wed Feb 15 03:56:53 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed Feb 15 03:52:50 2006
Subject: [Biojava-l] Genbank parser error [biojavax]
In-Reply-To: <43F1DC0E.7050809@vub.ac.be>
References: <43F1DC0E.7050809@vub.ac.be>
Message-ID: <43F2ECD5.1070605@vub.ac.be>

Hello again,

I have continued using the Genbank parser, but this time with Genbank files coming from NCBI :)

I really appreciate the example from the documentation that converts a Genbank file into an EMBL file. I have to say, it is really easy to use.

I nevertheless have a question concerning the Organism and Source tags. Indeed, it is clear in the documentation that they are ignored by the parser. But I do not really understand why.

When I used the Genbank file of the accession numbers : AC147788 and DQ158013, I was unable to get the common name of the organism or use getNameHierarchy(), but I can get the taxon ID for both.

Is there a way to get the common name of the organism, without using a remote call to the NCBI with the taxonID ?

Thanks for your help,

Morgane.

--
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be)
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

From mark.schreiber at novartis.com Wed Feb 15 04:00:44 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Feb 15 03:56:32 2006
Subject: [Biojava-l] Genbank parser error [biojavax]
Message-ID:

Hi Morgane -

Turned out to be a problem with a greedy regexp parsing the LOCUS tag. This is fixed in CVS. Let me know if something else is a problem.

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From mthomasc at vub.ac.be Wed Feb 15 05:04:22 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Wed Feb 15 05:00:07 2006
Subject: [Biojava-l] Genbank parser error [biojavax]
In-Reply-To:
References:
Message-ID: <43F2FCA6.4040905@vub.ac.be>

Hi Mark,

I have downloaded the fixed version and tested it with my large file. Works great.

Thank you very much,

Morgane.

mark.schreiber@novartis.com wrote:
>Hi Morgane -
>
>Turned out to be a problem with a greedy regexp parsing the LOCUS tag.
>This is fixed in CVS. Let me know if something else is a problem.
>
>- Mark
>> >> >> >> >> > > > -- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium Tel : +32 2 629 15 22 ********************************************************** Stop Using Internet Explorer, choose FIREFOX ! http://emmanuel.clement.free.fr/navigateurs/comparatif.htm From mark.schreiber at novartis.com Wed Feb 15 07:20:13 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Feb 15 07:16:10 2006 Subject: [Biojava-l] Genbank parser error [biojavax] Message-ID: I think these properties should be going to the (Rich)Annotation bundle. - Mark Morgane THOMAS-CHOLLIER Sent by: biojava-l-bounces@portal.open-bio.org 02/15/2006 04:56 PM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: Re: [Biojava-l] Genbank parser error [biojavax] Hello again, I have continued using the Genbank parser, but this time with Genbank files coming from NCBI :) I really appreciate the example from the documentation that converts a Genbank file into an EMBL file. I have to say, it is really easy to use. I nevertheless have a question concerning the Organism and Source tags. Indeed, it is clear in the documentation that they are ignored by the parser. But I do not really understand why. When I used the Genbank file of the accession numbers : AC147788 and DQ158013, I was unable to get the common name of the organism or use getNameHierarchy(), but I can get the taxon ID for both. Is there a way to get the common name of the organism, without using a remote call to the NCBI with the taxonID ? Thanks for your help, Morgane. Morgane THOMAS-CHOLLIER wrote: > Hello Mark, > > My file is indeed too large to be posted. > So I have exported a smaller sequence from Ensembl that I tested with > the parser. The behavior is the same. > You will find below this "Genbank" formatted file enclosed. 
> > Thanks for your help, > > Morgane. > > LOCUS 6 3498 bp DNA HTG 14-FEB-2006 > DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence > 52305503..52309000 reannotated via EnsEMBL > ACCESSION chromosome:NCBIM34:6:52305503:52309000:1 > VERSION chromosome:NCBIM34:6:52305503:52309000:1 > KEYWORDS . > SOURCE House mouse > ORGANISM Mus musculus > Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; > Euteleostomi; > Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; > Sciurognathi; Muridae; Murinae; Mus. > COMMENT This sequence was annotated by the Ensembl system. Please > visit the > Ensembl web site, http://www.ensembl.org/ for more > information. > COMMENT All feature locations are relative to the first (5') base > of the > sequence in this file. The sequence presented is always the > forward strand of the assembly. Features that lie outside > of the > sequence contained in this file have clonal location > coordinates in > the format: .:.. > COMMENT The /gene indicates a unique id for a gene, > /note="transcript_id=..." a unique id for a transcript, > /protein_id > a unique id for a peptide and note="exon_id=..." a unique > id for an > exon. These ids are maintained wherever possible between > versions. > COMMENT All the exons and transcripts in Ensembl are confirmed by > similarity to either protein or cDNA sequences. 
> FEATURES Location/Qualifiers > source 1..3498 > /organism="Mus musculus" > /db_xref="taxon:10090" > gene complement(506..2826) > /gene=ENSMUSG00000014704 > mRNA join(complement(2261..2826),complement(506..1620)) > /gene="ENSMUSG00000014704" > /note="transcript_id=ENSMUST00000014848" > CDS join(complement(2261..2639),complement(881..1620)) > /gene="ENSMUSG00000014704" > /protein_id="ENSMUSP00000014848" > /note="transcript_id=ENSMUST00000014848" > /db_xref="MarkerSymbol:Hoxa2" > /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE" > /db_xref="RefSeq_peptide:NP_034581.1" > /db_xref="RefSeq_dna:NM_010451.1" > /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE" > /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE" > /db_xref="EntrezGene:15399" > /db_xref="AgilentProbe:A_51_P501803" > /db_xref="EMBL:AB039184" > /db_xref="EMBL:AB039185" > /db_xref="EMBL:AB039186" > /db_xref="EMBL:AB039187" > /db_xref="EMBL:AB039188" > /db_xref="EMBL:AB039189" > /db_xref="EMBL:AB039190" > /db_xref="EMBL:AB039191" > /db_xref="EMBL:AB039192" > /db_xref="EMBL:AK134501" > /db_xref="EMBL:M87801" > /db_xref="EMBL:M93148" > /db_xref="EMBL:M93292" > /db_xref="EMBL:M95599" > /db_xref="GO:GO:0003700" > /db_xref="GO:GO:0005634" > /db_xref="GO:GO:0006355" > /db_xref="GO:GO:0007275" > /db_xref="IPI:IPI00132242.1" > /db_xref="UniGene:Mm.131" > /db_xref="protein_id:AAA37827.1" > /db_xref="protein_id:AAA37834.1" > /db_xref="protein_id:AAA37835.1" > /db_xref="protein_id:AAA37836.1" > /db_xref="protein_id:BAB68708.1" > /db_xref="protein_id:BAB68709.1" > /db_xref="protein_id:BAB68710.1" > /db_xref="protein_id:BAB68711.1" > /db_xref="protein_id:BAB68712.1" > /db_xref="protein_id:BAB68713.1" > /db_xref="protein_id:BAB68714.1" > 
/db_xref="protein_id:BAB68715.1" > /db_xref="protein_id:BAB68716.1" > /db_xref="protein_id:BAE22163.1" > /db_xref="AFFY_MG_U74Av2:102643_at" > /db_xref="AFFY_MG_U74Cv2:171063_at" > /db_xref="AFFY_Mouse430A_2:1419602_at" > /db_xref="AFFY_Mouse430_2:1419602_at" > > /translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH > > STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE > > KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF > > NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK > > VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN > > EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD > ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY" > exon complement(506..1620) > /note="exon_id=ENSMUSE00000387033" > exon complement(2261..2826) > /note="exon_id=ENSMUSE00000193269" > BASE COUNT 938 a 815 c 882 g 863 t > ORIGIN > 1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA > ATTTTTGATA > 61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC > ACTCCACTCG > 121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG > CTTGGGCTAG > 181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG TAACTCTATG > GCCTGAGTCT > 241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA > AAAAAAAAAA > 301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT > TTGTTGCAGG > 361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG > TGACCAGACT > 421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT > TGAGAAAGAG > 481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA > CCAAAAATAC > 541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG > ACAATTTATG > 601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA > AGCTTGTTGG > 661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC > TTTAAAACTG > 721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG > GGTAGATCAA > 781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA > CCTGGTCAAA > 841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC > AGATGCTGTA > 901 GGTCGATTGT GGTGAGTGTG 
TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG > ATATCTACAG > 961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC > AGGCAGGAAT > 1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG > GGACTGTCAT > 1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA > ACAGTGGGTG > 1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA > ACTGGGAAAG > 1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA > TTTTGCTGAA > 1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC > TCAAAGAGTG > 1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA > AATTTCCCTT > 1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC > CGGTTCTGAA > 1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC > ACCCTGCGGG > 1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC > TGAGTGTTGG > 1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT > TCCAGGGATT > 1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG > GGTCCGAGCA > 1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA > AATGGCCGCC > 1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG > GGAAGCCCAG > 1801 TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC > ATCCGGGAGC > 1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT > AGCTGAGCAA > 1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA > CTAGACAAGA > 1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG > AAAGTGCCCC > 2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC > CCTCCACCAA > 2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC > TCTCTCCCCC > 2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC > CGGAGGGGGA > 2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC > GAGGCAGGCA > 2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT > CTTCTCCTTC > 2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC > GCGACTGCCC > 2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG > ACTGCCCGGG > 2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG > TGAAAGCGTC > 
2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA > TGTCAGGCAC > 2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC > GTAATTCATG > 2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG > CTTTGGGGGG > 2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG > AAGATCGCTG > 2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG > CTACTATTAA > 2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA > CATGATTGCT > 2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT > GATTGATCCA > 2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC > ACTTTTTTTC > 3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC > GTGGGGGGCG > 3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA > GTGTGTGTGT > 3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG > CCTCCCCCGC > 3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA > AATCATTTAA > 3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT > CAAAGTTTTG > 3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG > AAAGGAGCAG > 3361 AGTGGGAGAG AGAGAGGAGA GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA > GAGAGAGAGA > 3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC > TCTTCCTCCT > 3481 CTTTTTCCAA AATCAGTT > // > > > > > mark.schreiber@novartis.com wrote: > >> Hi Morgane - >> >> I have to say that doesn't look much like Genbank : ) >> >> The biojavax parser are possibly a bit brittle due to their use of >> regexps to recognize key elements. It should be fixable, I think the >> problem is that the parser expects a word after LOCUS not a number. >> This may not be the only problem though. Could you post the entire >> file? Or if it is large then a representative file of smaller size. 
>> >> - Mark >> >> >> >> >> >> Morgane THOMAS-CHOLLIER >> Sent by: biojava-l-bounces@portal.open-bio.org >> 02/14/2006 04:36 AM >> >> >> To: biojava-l@biojava.org >> cc: (bcc: Mark Schreiber/GP/Novartis) >> Subject: [Biojava-l] Genbank parser error [biojavax] >> >> >> Hello, >> >> I have tried biojavax today with a view to use the Genbank file parser. >> >> My test file is a Genbank formatted file which has been produced by >> Ensembl export system. >> >> The head of the file is as follow : >> >> LOCUS 6 489671 bp DNA HTG 13-FEB-2006 >> DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence >> 52296503..52786173 reannotated via EnsEMBL >> ACCESSION chromosome:NCBIM34:6:52296503:52786173:1 >> VERSION chromosome:NCBIM34:6:52296503:52786173:1 >> >> I used the code provided in biojavax docbook to parse this file. >> I get the following error : >> >> Exception in thread "main" org.biojava.bio.BioException: Could not >> read sequence >> at >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111) >> >> at >> org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31) >> >> Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line >> found: 6 489671 bp DNA HTG 13-FEB-2006 >> at >> org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229) >> >> at >> org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108) >> >> ... 1 more >> >> I had a look at GenbankFormat.java, and I guess the problem comes >> from the regular expression that do not recognize the LOCUS as a >> standard Genbank file LOCUS tag. >> >> Am I wrong ? Have biojavax Genbank parser been tested on Ensembl >> exported files ? >> >> Morgane. 
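The point raised in the quoted exchange, that a LOCUS pattern expecting a word-like name rejects the Ensembl name "6", can be illustrated with a small sketch. This is not the expression used in GenbankFormat.java; the pattern, the class name LocusLineSketch and the helper locusName are hypothetical, and the second input line in main is made up for illustration. The sketch only assumes that the locus name may be any run of non-whitespace characters, digits included.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LocusLineSketch {
    // Hypothetical pattern, NOT the one in GenbankFormat.java.
    // Groups: 1 name, 2 length, 3 unit, 4 molecule type,
    //         5 optional topology, 6 division, 7 date.
    static final Pattern LOCUS = Pattern.compile(
        "^LOCUS\\s+(\\S+)\\s+(\\d+)\\s+(bp|aa)\\s+(\\S+)\\s+"
        + "(?:(linear|circular)\\s+)?(\\S+)\\s+(\\d{2}-[A-Z]{3}-\\d{4})\\s*$");

    // Returns the locus name, or null when the line does not match.
    static String locusName(String line) {
        Matcher m = LOCUS.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // LOCUS line taken from the Ensembl export quoted in this thread.
        System.out.println(locusName("LOCUS 6 489671 bp DNA HTG 13-FEB-2006"));
        // Made-up line with a word-like name and a topology column.
        System.out.println(locusName("LOCUS EXAMPLE1 100 bp DNA linear PLN 01-JAN-2000"));
    }
}
```

Capturing the name with \S+ rather than a letter-led token is the only design choice that matters here; whether the remaining columns of an Ensembl export parse cleanly is a separate question that the sketch does not address.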
>> >> >> > -- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From dreher at mpiib-berlin.mpg.de Wed Feb 15 09:49:32 2006 From: dreher at mpiib-berlin.mpg.de (Felix Dreher) Date: Wed Feb 15 09:48:49 2006 Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation Message-ID: <43F33F7C.2070407@mpiib-berlin.mpg.de> Hello, I have a question regarding the BioSQL-schema-scripts. The tutorial on installing BioSQL (http://www.biojava.org/tutorials/biosql.html) says that three scripts are required: biosqldb-pg.sql biosql-accelerators-pg.sql biosqldb-assembly-pg.sql However, the 'assembly'-script can not be found on the CVS-server. Instead there is another script called 'biosqldb-views-pg.sql'. So I would like to know which scripts should be used. Furthermore I have a problem with adding an annotation (or also a feature) to a RichSequence. As it seems to be a problem with Hibernate and/or the BioSQL-schemas: I use BioJava-live (CVS) from 2 weeks ago and the latest CVS-BioSQL-scripts. When I try the following code, the following Exceptions are thrown (while the execution of line 2). 
1 RichSequence seq = (SimpleRichSequence) RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq")); 2 ComparableTerm ct = RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname"); 3 seq.getAnnotation().setProperty(ct, "project_25"); Exception in thread "main" java.lang.RuntimeException: Error while trying to call new class org.biojavax.ontology.SimpleComparableOntology(class java.lang.String) at org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:154) at org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:97) at org.biojavax.RichObjectFactory.getDefaultOntology(RichObjectFactory.java:178) at hibernatetest.Main.main(Main.java:246) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:138) ... 
3 more Caused by: org.hibernate.exception.SQLGrammarException: could not insert: [Ontology] at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:65) at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43) at org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:56) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:1994) at org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2405) at org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:37) at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243) at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:269) at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:167) at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:101) at org.hibernate.event.def.DefaultPersistEventListener.entityIsTransient(DefaultPersistEventListener.java:131) at org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:87) at org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:38) at org.hibernate.impl.SessionImpl.firePersist(SessionImpl.java:642) at org.hibernate.impl.SessionImpl.persist(SessionImpl.java:616) ... 
8 more Caused by: org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250) at org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:42) ... 20 more Thanks in advance! Greetings, Felix -- Felix Dreher Max-Planck-Institute for Infection Biology Campus Charité Mitte Department of Immunology Mailing address: Schumannstraße 21/22 Visitors: Virchowweg 12 10117 Berlin Germany Tel.: +49 (0)30 28460-254 / -494 Mobile: +49 (0)163 7542426 From mark.schreiber at novartis.com Wed Feb 15 21:44:31 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Feb 15 21:40:33 2006 Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation Message-ID: Wow, that tutorial is out of date! The assembly sql is not required any longer. It was specifically put in by David Huen (I think) to allow him to store assembly data in biosql. Can anyone comment on the need for the accelerators? As for your second point, I would discourage the use of the enrich method whenever possible. It does the best it can but cannot work miracles. If you get a new download from CVS, RichSequence.Tools has several createRichSequence methods to avoid the use of this 'anti-pattern'. 
RichSequence seq = (SimpleRichSequence)RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq")); As an aside there is no need to cast the return of enrich if you are assigning it to a RichSequence pointer. Hope this helps, - Mark
From dreher at mpiib-berlin.mpg.de Thu Feb 16 07:25:53 2006 From: dreher at mpiib-berlin.mpg.de (Felix Dreher) Date: Thu Feb 16 07:25:23 2006 Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation Message-ID: <43F46F51.5030002@mpiib-berlin.mpg.de> Hello Mark, thank you for the information. If I got it right, when using BioJavaX, the only BioSQL-script that is really needed for PostgreSQL is 'biosqldb-pg.sql' (plus possibly 'biosql-accelerators-pg.sql'). 
I tried the example again (with and without the accelerator-script) with the new CVS-RichSequence.Tools method (see below), but still the same exceptions are thrown: org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist org.hibernate.exception.SQLGrammarException: could not insert: [Ontology] Exception in thread "main" java.lang.RuntimeException: Error while trying to call new class org.biojavax.ontology.SimpleComparableOntology(class java.lang.String) I'm wondering if something with the Hibernate-Configuration is wrong, because in the log-file I found two suspicious entries: 2006-02-16 12:35:12,676 INFO [main] calling method: org.hibernate.transaction.TransactionManagerLookupFactory.getTransactionManagerLookup(TransactionManagerLookupFactory.java:33) No TransactionManagerLookup configured (in JTA environment, use of read-write or transactional second-level cache is not recommended) 2006-02-16 12:35:12,754 WARN [main] calling method: net.sf.ehcache.config.Configurator.configure(Configurator.java:126) No configuration found. Configuring ehcache from ehcache-failsafe.xml found in the classpath: jar:file:/home/dreher/Java/hibernate-3.1/lib/ehcache-1.1.jar!/ehcache-failsafe.xml Since I ran out of ideas, I hope maybe someone has a hint where I could search further. 
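For readers following the trace, the relation named in the first error has the shape PostgreSQL gives to the implicit sequence behind a SERIAL column, that is table name, column name and the suffix "_seq". The sketch below only rebuilds that name. It is an assumption, not something confirmed in this thread, that the Hibernate identity-style key retrieval visible in the stack trace looks for a sequence under this convention while the BioSQL script created its sequences under other names; the class SerialSequenceName and its helper are hypothetical.

```java
final class SerialSequenceName {
    // PostgreSQL names the implicit sequence of a SERIAL column
    // <table>_<column>_seq. Assumption: the failing lookup in the
    // trace above was built the same way.
    static String serialSequenceName(String table, String column) {
        return table + "_" + column + "_seq";
    }

    public static void main(String[] args) {
        // Table and column as they appear in the error message.
        System.out.println(serialSequenceName("ontology", "ontology_id"));
    }
}
```

If this reading is right, comparing the constructed name with the sequences actually present in the database (for example with \ds in psql) would show whether the schema and the mapping agree on sequence names.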
Thanks in advance,
Felix

P.S.: Here's the code-example:

public class HibernateTest {

    private static final Logger logger = PredictionLogger.getLogger(HibernateTest.class);

    public static void main(String[] args) {
        SessionFactory hibernateFactory = new Configuration().configure().buildSessionFactory();
        Session session = hibernateFactory.openSession();
        RichObjectFactory.connectToBioSQL(session);

        Transaction tx = session.beginTransaction();
        try {
            //create a RichSequence
            FiniteAlphabet dna = (FiniteAlphabet) AlphabetManager.alphabetForName("DNA");
            RichSequence seq = RichSequence.Tools.createRichSequence("targets", "testseq", "acgcttcatctgc", dna);

            //add an Annotation to that Sequence
            ComparableTerm ct = RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname");
            seq.getAnnotation().setProperty(ct, "bklf25");

            tx.commit();
            System.out.println("Annotation added.");
        } catch (Exception ex) {
            tx.rollback();
            System.out.println("Transaction Error.");
            logger.error("Changes rolled back.", ex);
        } finally {
            session.close();
        }
    }
}

mark.schreiber@novartis.com wrote:

>Wow, that tutorial is out of date!
>
>The assembly sql is not required any longer. It was specifically put in by
>David Huen (I think) to allow him to store assembly data in biosql. Can
>anyone comment on the need for the accelerators?
>
>As for your second point, I would discourage the use of the enrich method
>whenever possible. It does the best it can but cannot work miracles. If
>you get a new download from CVS, RichSequence.Tools has several
>createRichSequence methods to avoid the use of this 'anti-pattern'.
>
>RichSequence seq =
>(SimpleRichSequence)RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
>
>As an aside there is no need to cast the return of enrich if you are
>assigning it to a RichSequence pointer.
> >Hope this helps, > >- Mark > > > > > >Felix Dreher >Sent by: biojava-l-bounces@portal.open-bio.org >02/15/2006 10:49 PM > > > To: biojava-l@biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation > > >Hello, > >I have a question regarding the BioSQL-schema-scripts. >The tutorial on installing BioSQL >(http://www.biojava.org/tutorials/biosql.html) says that three scripts >are required: > >biosqldb-pg.sql >biosql-accelerators-pg.sql >biosqldb-assembly-pg.sql > >However, the 'assembly'-script can not be found on the CVS-server. >Instead there is another script called 'biosqldb-views-pg.sql'. >So I would like to know which scripts should be used. > > >Furthermore I have a problem with adding an annotation (or also a >feature) to a RichSequence. >As it seems to be a problem with Hibernate and/or the BioSQL-schemas: I >use BioJava-live (CVS) from 2 weeks ago and the latest CVS-BioSQL-scripts. > >When I try the following code, the following Exceptions are thrown >(while the execution of line 2). 
> >1 RichSequence seq = (SimpleRichSequence) >RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq")); >2 ComparableTerm ct = >RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname"); >3 seq.getAnnotation().setProperty(ct, "project_25"); > > > > >Exception in thread "main" java.lang.RuntimeException: Error while >trying to call new class >org.biojavax.ontology.SimpleComparableOntology(class java.lang.String) > at >org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:154) > at >org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:97) > at >org.biojavax.RichObjectFactory.getDefaultOntology(RichObjectFactory.java:178) > at hibernatetest.Main.main(Main.java:246) >Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at >org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:138) > ... 
3 more >Caused by: org.hibernate.exception.SQLGrammarException: could not >insert: [Ontology] > at >org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:65) > at >org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43) > at >org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:56) > at >org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:1994) > at >org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2405) > at >org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:37) > at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243) > at >org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:269) > at >org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:167) > at >org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:101) > at >org.hibernate.event.def.DefaultPersistEventListener.entityIsTransient(DefaultPersistEventListener.java:131) > at >org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:87) > at >org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:38) > at >org.hibernate.impl.SessionImpl.firePersist(SessionImpl.java:642) > at org.hibernate.impl.SessionImpl.persist(SessionImpl.java:616) > ... 
8 more
>Caused by: org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist
>    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
>    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
>    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
>    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
>    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
>    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
>    at org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:42)
>    ... 20 more
>
>Thanks in advance!
>
>Greetings,
>Felix
>

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com Thu Feb 16 07:45:04 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Feb 16 07:40:46 2006
Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation
Message-ID:

Looking further at your exception trace from your previous email it seems like an error somewhere in the Hibernate binding in one of the hbm.xml config files. Specifically,

org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist

The log files mean you have not configured a JTA transaction manager or cache. Not critical but recommended for any serious application.

- Mark

Felix Dreher
02/16/2006 08:25 PM

To: Mark Schreiber/GP/Novartis@PH, biojava-l@biojava.org
cc:
Subject: Re: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation

Hello Mark,

thank you for the information.
If I got it right, when using BioJavaX, the only BioSQL-script that is really needed for PostgreSQL is 'biosqldb-pg.sql' (plus possibly 'biosql-accelerators-pg.sql'). I tried the example again (with and without the accelerator-script) with the new CVS-RichSequence.Tools method (see below), but still the same exceptions are thrown: org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist org.hibernate.exception.SQLGrammarException: could not insert: [Ontology] Exception in thread "main" java.lang.RuntimeException: Error while trying to call new class org.biojavax.ontology.SimpleComparableOntology(class java.lang.String) I'm wondering if something with the Hibernate-Configuration is wrong, because in the log-file I found two suspicious entries: 2006-02-16 12:35:12,676 INFO [main] calling method: org.hibernate.transaction.TransactionManagerLookupFactory.getTransactionManagerLookup(TransactionManagerLookupFactory.java:33) No TransactionManagerLookup configured (in JTA environment, use of read-write or transactional second-level cache is not recommended) 2006-02-16 12:35:12,754 WARN [main] calling method: net.sf.ehcache.config.Configurator.configure(Configurator.java:126) No configuration found. Configuring ehcache from ehcache-failsafe.xml found in the classpath: jar:file:/home/dreher/Java/hibernate-3.1/lib/ehcache-1.1.jar!/ehcache-failsafe.xml Since I ran out of ideas, I hope maybe someone has a hint where I could search further. 
Thanks in advance,
Felix

P.S.: Here's the code example:

public class HibernateTest {

    private static final Logger logger = PredictionLogger.getLogger(HibernateTest.class);

    public static void main(String[] args) {
        SessionFactory hibernateFactory = new Configuration().configure().buildSessionFactory();
        Session session = hibernateFactory.openSession();
        RichObjectFactory.connectToBioSQL(session);
        Transaction tx = session.beginTransaction();
        try {
            // create a RichSequence
            FiniteAlphabet dna = (FiniteAlphabet) AlphabetManager.alphabetForName("DNA");
            RichSequence seq = RichSequence.Tools.createRichSequence("targets", "testseq", "acgcttcatctgc", dna);
            // add an Annotation to that Sequence
            ComparableTerm ct = RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname");
            seq.getAnnotation().setProperty(ct, "bklf25");
            tx.commit();
            System.out.println("Annotation added.");
        } catch (Exception ex) {
            tx.rollback();
            System.out.println("Transaction Error.");
            logger.error("Changes rolled back.", ex);
        } finally {
            session.close();
        }
    }
}

mark.schreiber@novartis.com wrote:
>Wow, that tutorial is out of date!
>
>The assembly sql is not required any longer. It was specifically put in by
>David Huen (I think) to allow him to store assembly data in biosql. Can
>anyone comment on the need for the accelerators?
>
>As for your second point I would discourage the use of the enrich method
>whenever possible. It does the best it can but cannot work miracles. If
>you get a new download of CVS RichSequence.Tools has several
>createRichSequence methods to avoid the use of this 'anti-pattern'.
>
>RichSequence seq = (SimpleRichSequence)RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq"));
>
>As an aside there is no need to cast the return of enrich if you are
>assigning it to a RichSequence pointer.
> >Hope this helps, > >- Mark > > > > > >Felix Dreher >Sent by: biojava-l-bounces@portal.open-bio.org >02/15/2006 10:49 PM > > > To: biojava-l@biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: [Biojava-l] Problem: BioSQL-cvs and/or RichSequence-Annotation > > >Hello, > >I have a question regarding the BioSQL-schema-scripts. >The tutorial on installing BioSQL >(http://www.biojava.org/tutorials/biosql.html) says that three scripts >are required: > >biosqldb-pg.sql >biosql-accelerators-pg.sql >biosqldb-assembly-pg.sql > >However, the 'assembly'-script can not be found on the CVS-server. >Instead there is another script called 'biosqldb-views-pg.sql'. >So I would like to know which scripts should be used. > > >Furthermore I have a problem with adding an annotation (or also a >feature) to a RichSequence. >As it seems to be a problem with Hibernate and/or the BioSQL-schemas: I >use BioJava-live (CVS) from 2 weeks ago and the latest CVS-BioSQL-scripts. > >When I try the following code, the following Exceptions are thrown >(while the execution of line 2). 
> >1 RichSequence seq = (SimpleRichSequence) >RichSequence.Tools.enrich(DNATools.createDNASequence("gattacagattaca","urn:local:seq")); >2 ComparableTerm ct = >RichObjectFactory.getDefaultOntology().getOrCreateTerm("projectname"); >3 seq.getAnnotation().setProperty(ct, "project_25"); > > > > >Exception in thread "main" java.lang.RuntimeException: Error while >trying to call new class >org.biojavax.ontology.SimpleComparableOntology(class java.lang.String) > at >org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:154) > at >org.biojavax.RichObjectFactory.getObject(RichObjectFactory.java:97) > at >org.biojavax.RichObjectFactory.getDefaultOntology(RichObjectFactory.java:178) > at hibernatetest.Main.main(Main.java:246) >Caused by: java.lang.reflect.InvocationTargetException > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:585) > at >org.biojavax.bio.db.HibernateRichObjectBuilder.buildObject(HibernateRichObjectBuilder.java:138) > ... 
3 more >Caused by: org.hibernate.exception.SQLGrammarException: could not >insert: [Ontology] > at >org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:65) > at >org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43) > at >org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:56) > at >org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:1994) > at >org.hibernate.persister.entity.AbstractEntityPersister.insert(AbstractEntityPersister.java:2405) > at >org.hibernate.action.EntityIdentityInsertAction.execute(EntityIdentityInsertAction.java:37) > at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:243) > at >org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:269) > at >org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:167) > at >org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:101) > at >org.hibernate.event.def.DefaultPersistEventListener.entityIsTransient(DefaultPersistEventListener.java:131) > at >org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:87) > at >org.hibernate.event.def.DefaultPersistEventListener.onPersist(DefaultPersistEventListener.java:38) > at >org.hibernate.impl.SessionImpl.firePersist(SessionImpl.java:642) > at org.hibernate.impl.SessionImpl.persist(SessionImpl.java:616) > ... 
8 more
>Caused by: org.postgresql.util.PSQLException: ERROR: relation "ontology_ontology_id_seq" does not exist
>    at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1512)
>    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1297)
>    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
>    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:430)
>    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:346)
>    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:250)
>    at org.hibernate.id.AbstractPostInsertGenerator.getGenerated(AbstractPostInsertGenerator.java:42)
>    ... 20 more
>
>Thanks in advance!
>
>Greetings,
>Felix
>

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mthomasc at vub.ac.be Fri Feb 17 05:16:05 2006
From: mthomasc at vub.ac.be (Morgane THOMAS-CHOLLIER)
Date: Fri Feb 17 05:37:21 2006
Subject: [Biojava-l] Genbank parser error [biojavax]
In-Reply-To:
References:
Message-ID: <43F5A265.7000605@vub.ac.be>

Hello Mark,

Thank you very much for your quick reply.
However, I could not find out how to get the organism information via the (Rich)Annotation.
Would it be possible for you to post a piece of code showing how I could retrieve the common name for the organism?

Sorry for insisting, but I really need this parser for my work, and I also really need to retrieve the organism info from the file :)

Thank you for your help,

Morgane.

mark.schreiber@novartis.com wrote:

>I think these properties should be going to the (Rich)Annotation bundle.
> >- Mark > > > > > >Morgane THOMAS-CHOLLIER >Sent by: biojava-l-bounces@portal.open-bio.org >02/15/2006 04:56 PM > > > To: biojava-l@biojava.org > cc: (bcc: Mark Schreiber/GP/Novartis) > Subject: Re: [Biojava-l] Genbank parser error [biojavax] > > >Hello again, > >I have continued using the Genbank parser, but this time with Genbank >files coming from NCBI :) > >I really appreciate the example from the documentation that converts a >Genbank file into an EMBL file. I have to say, it is really easy to use. > >I nevertheless have a question concerning the Organism and Source tags. >Indeed, it is clear in the documentation that they are ignored by the >parser. >But I do not really understand why. >When I used the Genbank file of the accession numbers : AC147788 and >DQ158013, I was unable to get the common name of the organism or use >getNameHierarchy(), but I can get the taxon ID for both. > >Is there a way to get the common name of the organism, without using a >remote call to the NCBI with the taxonID ? > >Thanks for your help, > >Morgane. > >Morgane THOMAS-CHOLLIER wrote: > > > >>Hello Mark, >> >>My file is indeed too large to be posted. >>So I have exported a smaller sequence from Ensembl that I tested with >>the parser. The behavior is the same. >>You will find below this "Genbank" formatted file enclosed. >> >>Thanks for your help, >> >>Morgane. >> >>LOCUS 6 3498 bp DNA HTG 14-FEB-2006 >>DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence >> 52305503..52309000 reannotated via EnsEMBL >>ACCESSION chromosome:NCBIM34:6:52305503:52309000:1 >>VERSION chromosome:NCBIM34:6:52305503:52309000:1 >>KEYWORDS . >>SOURCE House mouse >> ORGANISM Mus musculus >> Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; >>Euteleostomi; >> Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; >> Sciurognathi; Muridae; Murinae; Mus. >>COMMENT This sequence was annotated by the Ensembl system. 
Please >>visit the >> Ensembl web site, http://www.ensembl.org/ for more >>information. >>COMMENT All feature locations are relative to the first (5') base >>of the >> sequence in this file. The sequence presented is always the >> forward strand of the assembly. Features that lie outside >>of the >> sequence contained in this file have clonal location >>coordinates in >> the format: .:.. >>COMMENT The /gene indicates a unique id for a gene, >> /note="transcript_id=..." a unique id for a transcript, >>/protein_id >> a unique id for a peptide and note="exon_id=..." a unique >>id for an >> exon. These ids are maintained wherever possible between >>versions. >>COMMENT All the exons and transcripts in Ensembl are confirmed by >> similarity to either protein or cDNA sequences. >>FEATURES Location/Qualifiers >> source 1..3498 >> /organism="Mus musculus" >> /db_xref="taxon:10090" >> gene complement(506..2826) >> /gene=ENSMUSG00000014704 >> mRNA join(complement(2261..2826),complement(506..1620)) >> /gene="ENSMUSG00000014704" >> /note="transcript_id=ENSMUST00000014848" >> CDS join(complement(2261..2639),complement(881..1620)) >> /gene="ENSMUSG00000014704" >> /protein_id="ENSMUSP00000014848" >> /note="transcript_id=ENSMUST00000014848" >> /db_xref="MarkerSymbol:Hoxa2" >> /db_xref="Uniprot/SWISSPROT:HXA2_MOUSE" >> /db_xref="RefSeq_peptide:NP_034581.1" >> /db_xref="RefSeq_dna:NM_010451.1" >> /db_xref="Uniprot/SPTREMBL:Q3UYP9_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920T7_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920T9_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920U0_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920U1_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920U2_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920U3_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920U4_MOUSE" >> /db_xref="Uniprot/SPTREMBL:Q920U5_MOUSE" >> /db_xref="EntrezGene:15399" >> /db_xref="AgilentProbe:A_51_P501803" >> /db_xref="EMBL:AB039184" >> /db_xref="EMBL:AB039185" >> /db_xref="EMBL:AB039186" >> /db_xref="EMBL:AB039187" >> 
/db_xref="EMBL:AB039188" >> /db_xref="EMBL:AB039189" >> /db_xref="EMBL:AB039190" >> /db_xref="EMBL:AB039191" >> /db_xref="EMBL:AB039192" >> /db_xref="EMBL:AK134501" >> /db_xref="EMBL:M87801" >> /db_xref="EMBL:M93148" >> /db_xref="EMBL:M93292" >> /db_xref="EMBL:M95599" >> /db_xref="GO:GO:0003700" >> /db_xref="GO:GO:0005634" >> /db_xref="GO:GO:0006355" >> /db_xref="GO:GO:0007275" >> /db_xref="IPI:IPI00132242.1" >> /db_xref="UniGene:Mm.131" >> /db_xref="protein_id:AAA37827.1" >> /db_xref="protein_id:AAA37834.1" >> /db_xref="protein_id:AAA37835.1" >> /db_xref="protein_id:AAA37836.1" >> /db_xref="protein_id:BAB68708.1" >> /db_xref="protein_id:BAB68709.1" >> /db_xref="protein_id:BAB68710.1" >> /db_xref="protein_id:BAB68711.1" >> /db_xref="protein_id:BAB68712.1" >> /db_xref="protein_id:BAB68713.1" >> /db_xref="protein_id:BAB68714.1" >> /db_xref="protein_id:BAB68715.1" >> /db_xref="protein_id:BAB68716.1" >> /db_xref="protein_id:BAE22163.1" >> /db_xref="AFFY_MG_U74Av2:102643_at" >> /db_xref="AFFY_MG_U74Cv2:171063_at" >> /db_xref="AFFY_Mouse430A_2:1419602_at" >> /db_xref="AFFY_Mouse430_2:1419602_at" >> >>/translation="MNYEFEREIGFINSQPSLAECLTSFPPVADTFQSSSIKTSTLSH >> >>STLIPPPFEQTIPSLNPGSHPRHGAGVGGRPKSSPAGSRGSPVPAGALQPPEYPWMKE >> >>KKAAKKTALPPAAASTGPACLGHKESLEIADGSGGGSRRLRTAYTNTQLLELEKEFHF >> >>NKYLCRPRRVEIAALLDLTERQVKVWFQNRRMKHKRQTQCKENQNSEGKFKNLEDSDK >> >>VEEDEEEKSLFEQALSVSGALLEREGYTFQQNALSQQQAPNGHNGDSQTFPVSPLTSN >> >>EKNLKHFQHQSPTVPNCLSTMGQNCGAGLNNDSPEAIEVPSLQDFNVFSTDSCLQLSD >> ALSPSLPGSLDSPVDISADSFDFFTDTLTTIDLQHLNY" >> exon complement(506..1620) >> /note="exon_id=ENSMUSE00000387033" >> exon complement(2261..2826) >> /note="exon_id=ENSMUSE00000193269" >>BASE COUNT 938 a 815 c 882 g 863 t >>ORIGIN >> 1 AGGAAGAGTT GGAACGTAGA TGTTTGAAAC AAATGTGTAT AAATAAATGA >>ATTTTTGATA >> 61 ACTCCGTTAT TGACCTAGAA ACTAGCAGCT TGGTAAGGGA ACTCCATTCC >>ACTCCACTCG >> 121 TCCTAGAACT GGAAGTTTTT GTAGGCACTT TTCCTCTCCA CACTCAAAAG >>CTTGGGCTAG >> 181 GGCCAACTCA GGCTGCCCAA GCCCATTTCT ATTACTAATG 
TAACTCTATG >>GCCTGAGTCT >> 241 CAACACTGAA AACCAAATTC ATTCCCTTAG GGGGGAAAAA TCCAAAAAAA >>AAAAAAAAAA >> 301 AAGTCTTGCC AGAAGCCCTA GCACTTTCTG GTTTTCTTCT TTGTTGCTGT >>TTGTTGCAGG >> 361 CTTTGAACAT GCCACCCTAA TAAAATATAT TAAGATTGAA AAGTAAATTG >>TGACCAGACT >> 421 TTTATTTACC ATGTTAGACT AAAAGAAGTA TAAGAAATCA GTATGAGTCT >>TGAGAAAGAG >> 481 GGGAAGAAAA AAATAAGAAA GCTACTTATA GCAAAGGAGA ATTTATTCTA >>CCAAAAATAC >> 541 GCATGACAAT GCATTCTAAT GTGGTACAAA AATAAACAGA AAGTGACAAG >>ACAATTTATG >> 601 GTCACTTTCT TGCAGGCCTC CTGTTTTGTT TTTCAGGAAA ATCACATAGA >>AGCTTGTTGG >> 661 GTTCTGTGTA AAAACCACTT AGAACGCCAA CATAATTTGC AAGAGATGGC >>TTTAAAACTG >> 721 TGTCAGGGGA GAACATTAAA CGGAAAGTCC TCAACATTTG AGAGAGTAGG >>GGTAGATCAA >> 781 GAAGAAACTA AAACGAAAAT CAACTCCCAG AATAAAAGAA GGCAAAGCCA >>CCTGGTCAAA >> 841 GGCGTTTTGT TTTGTGAAGC TTTGTTTTGC TTTAATGTTC TTAGTAATTC >>AGATGCTGTA >> 901 GGTCGATTGT GGTGAGTGTG TCTGTAAAAA AGTCAAAGCT GTCAGCTGAG >>ATATCTACAG >> 961 GACTGTCCAG GGAGCCAGGC AAGCTGGGCG ACAGTGCATC TGAAAGCTGC >>AGGCAGGAAT >> 1021 CTGTGGAGAA AACATTGAAG TCCTGCAAAG AGGGGACCTC GATGGCCTCG >>GGACTGTCAT >> 1081 TGTTTAGGCC AGCTCCACAG TTCTGGCCCA TTGTTGACAA GCAGTTAGGA >>ACAGTGGGTG >> 1141 ACTGGTGCTG AAAATGTTTC AAATTTTTCT CATTGCTGGT TAAAGGCGAA >>ACTGGGAAAG >> 1201 TTTGGGAGTC GCCATTGTGT CCATTGGGAG CCTGCTGTTG AGAGAGCGCA >>TTTTGCTGAA >> 1261 AAGTGTACCC TTCCCTCTCC AGAAGGGCCC CGGAGACACT GAGGGCTTGC >>TCAAAGAGTG >> 1321 ACTTCTCTTC CTCGTCTTCC TCCACTTTGT CCGAGTCCTC CAGGTTTTTA >>AATTTCCCTT >> 1381 CGCTGTTTTG GTTCTCCTTG CACTGGGTTT GCCTCTTATG CTTCATTCTC >>CGGTTCTGAA >> 1441 ACCACACTTT CACTTGTCTC TCGGTCAAAT CCAGCAGCGC GGCGATTTCC >>ACCCTGCGGG >> 1501 GTCTGCAAAG GTACTTGTTG AAATGAAATT CCTTTTCCAG CTCCAAAAGC >>TGAGTGTTGG >> 1561 TGTACGCGGT TCTCAGACGC CTGGATCCCC CGCCGCTGCC ATCAGCTATT >>TCCAGGGATT >> 1621 CTGCAGAAAG GGAAACCAAC AAGAGACACA CATACAGTTG AAGGTGGAAG >>GGTCCGAGCA >> 1681 GGGTTATTCC ATTGGAGCAT AAATACAGCA GAAAAGATCA ACTGCAACAA >>AATGGCCGCC >> 1741 CCTGGATGCA GTGCAGCTAT TGTGCTGCCC TTCCTGGGAG CCCAGCCCGG >>GGAAGCCCAG >> 1801 
TCTCTTCCAC CTCCATCAAA TTCCTGCCTG TGGCTTCCCC CAACCTCTTC >>ATCCGGGAGC >> 1861 AAACTTTATA TTAGCTACAA CACAATTTAT AATTAATGCA TCAGCTGCTT >>AGCTGAGCAA >> 1921 GAGCGGTCTA TCACTCTTCA TTACTGTCAA AAAGCCAAAC TCTAGGACAA >>CTAGACAAGA >> 1981 GGAGGTCAGT TCCAACTCAA ATAAATCATC CTACATTACA CAAGTTAGGG >>AAAGTGCCCC >> 2041 CCCTCCTCAA AATATATATG TCTCATTGTG GGACTCGGGA TCTATTTTCC >>CCTCCACCAA >> 2101 ACCCACTCCT GAGACCACAG GGGCATGAGA CCCGCCACCA GGCATCTCTC >>TCTCTCCCCC >> 2161 TTCCCTCGAA GCTCATGGTC CCCTCCCCCA CAACCGCTCC TAGGGAAGCC >>CGGAGGGGGA >> 2221 CAAGGGTCCC CGAGACCTGG GGCCAAGTCT CCGGACTGAC CTTTGTGGCC >>GAGGCAGGCA >> 2281 GGGCCCGTGG AGGCGGCGGC GGGCGGCAGC GCGGTTTTCT TGGCCGCCTT >>CTTCTCCTTC >> 2341 ATCCAGGGAT ACTCAGGCGG CTGCAGGGCG CCGGCAGGCA CCGGGCTGCC >>GCGACTGCCC >> 2401 GCGGGGCTCG ACTTGGGGCG GCCGCCAACG CCAGCGCCGT GGCGAGGGTG >>ACTGCCCGGG >> 2461 TTCAGGCTGG GAATGGTCTG CTCAAAAGGA GGAGGAATCA GTGTCGAGTG >>TGAAAGCGTC >> 2521 GAGGTCTTGA TTGATGAACT TTGAAATGTA TCAGCGACAG GGGGAAAAGA >>TGTCAGGCAC >> 2581 TCAGCGAGCG ACGGCTGGCT ATTGATAAAA CCAATCTCTC GCTCAAATTC >>GTAATTCATG >> 2641 GCCTTCTCCT TGGAGCCCCC TCGGAGGAAA AGTTCCCTCT TTTGGAGGGG >>CTTTGGGGGG >> 2701 GCAAGGCCCA GGAAAAAGGC GAGCGCGAAG GAAAAAAAAA TCTATCATAG >>AAGATCGCTG >> 2761 CTGGGGTGTT TTTTTTCTAA TTCACTGATT ACAGCCGTAT GGGGACCGCG >>CTACTATTAA >> 2821 ACTATTGAAT TCATGGAGAC AAGGTTGAAA TTGGACCGAA TTGGCTGTCA >>CATGATTGCT >> 2881 TCTGCCCAAT GACAATTTGG GCTTTAATCA AAAGAAGCCA CTGTCTGTTT >>GATTGATCCA >> 2941 AAAAAGTCAG AAAGGAACGC CTCATTGGGG GCCATCGAGG CTTTATTTAC >>ACTTTTTTTC >> 3001 AGGGCAAAAA TACATATATG TGGGTGTGGA TGGCAATGCC CCGGGAGTGC >>GTGGGGGGCG >> 3061 AGAGTGCCTG TTTGCCTCCT GATCTGCAAG GATCTAGTGT GCTCCCTGGA >>GTGTGTGTGT >> 3121 GAGTGTGTGC GTGTGAGCCC TGCTGCCGTC CCGCCAGTGG CTGCCCTCTG >>CCTCCCCCGC >> 3181 ACACTCCGCG CATTGTTTGG GACTGTCGGG AAGACGCCTC GCACCTCACA >>AATCATTTAA >> 3241 GCACCTCAGC CTGACGCCTG CAGTCATTAA CAAAGTAATC CATTAATCTT >>CAAAGTTTTG >> 3301 ACACCCCAGG GCCCTGCATC TCAGCCACAT AAGTTCTGCT AAGGCAAGAG >>AAAGGAGCAG >> 3361 AGTGGGAGAG AGAGAGGAGA 
GAGGGAGAGA GGGAGAGAGG GAGAGAGAGA >>GAGAGAGAGA >> 3421 GAGAGAGAGA GAGAGAGAGA GAGAGAATGA ATATTGGGGT TCACCTTTCC >>TCTTCCTCCT >> 3481 CTTTTTCCAA AATCAGTT >>// >> >> >> >> >>mark.schreiber@novartis.com wrote: >> >> >> >>>Hi Morgane - >>> >>>I have to say that doesn't look much like Genbank : ) >>> >>>The biojavax parser are possibly a bit brittle due to their use of >>>regexps to recognize key elements. It should be fixable, I think the >>>problem is that the parser expects a word after LOCUS not a number. >>>This may not be the only problem though. Could you post the entire >>>file? Or if it is large then a representative file of smaller size. >>> >>>- Mark >>> >>> >>> >>> >>> >>>Morgane THOMAS-CHOLLIER >>>Sent by: biojava-l-bounces@portal.open-bio.org >>>02/14/2006 04:36 AM >>> >>> >>> To: biojava-l@biojava.org >>> cc: (bcc: Mark Schreiber/GP/Novartis) >>> Subject: [Biojava-l] Genbank parser error [biojavax] >>> >>> >>>Hello, >>> >>>I have tried biojavax today with a view to use the Genbank file parser. >>> >>>My test file is a Genbank formatted file which has been produced by >>>Ensembl export system. >>> >>>The head of the file is as follow : >>> >>>LOCUS 6 489671 bp DNA HTG 13-FEB-2006 >>>DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence >>> 52296503..52786173 reannotated via EnsEMBL >>>ACCESSION chromosome:NCBIM34:6:52296503:52786173:1 >>>VERSION chromosome:NCBIM34:6:52296503:52786173:1 >>> >>>I used the code provided in biojavax docbook to parse this file. 
>>>I get the following error : >>> >>>Exception in thread "main" org.biojava.bio.BioException: Could not >>>read sequence >>> at >>> >>> >>> >org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111) > > > >>> at >>> >>> >>> >org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31) > > > >>>Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line >>>found: 6 489671 bp DNA HTG 13-FEB-2006 >>> at >>> >>> >>> >org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229) > > > >>> at >>> >>> >>> >org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108) > > > >>> ... 1 more >>> >>>I had a look at GenbankFormat.java, and I guess the problem comes >>>from the regular expression that do not recognize the LOCUS as a >>>standard Genbank file LOCUS tag. >>> >>>Am I wrong ? Have biojavax Genbank parser been tested on Ensembl >>>exported files ? >>> >>>Morgane. >>> >>> >>> >>> >>> > > > -- ********************************************************** Morgane THOMAS-CHOLLIER, PHD Student (mthomasc@vub.ac.be) Vrije Universiteit Brussels (VUB) Laboratory of Cell Genetics Pleinlaan 2 1050 Brussels Belgium Tel : +32 2 629 15 22 ********************************************************** Stop Using Internet Explorer, choose FIREFOX ! From mark.schreiber at novartis.com Sun Feb 19 21:39:52 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Sun Feb 19 21:35:27 2006 Subject: [Biojava-l] RichAnnotation Message-ID: Hello - We recently had some questions on the list about getting information out of a RichAnnotation. This would be one way to do it ... RichAnnotation theAnnotation = (RichAnnotation)seq.getAnnotation(); Iterator notesIterator = theAnnotation.getNoteSet().iterator(); while (notesIterator.hasNext()) { System.out.println(); Note note = (Note)notesIterator.next(); System.out.println(note); } All notes have a Term and a value. 
The value is a String and the Term is an ontology term.

Term term = note.getTerm();
String value = note.getValue();

The term has a name which is also a String

String name = term.getName();

So to get the name, value pair of a note you would do this:

String name = note.getTerm().getName();
String value = note.getValue();

Which is pretty much what the Note toString() method does.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From mark.schreiber at novartis.com Sun Feb 19 22:24:03 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Feb 19 22:20:33 2006
Subject: [Biojava-l] release plan
Message-ID:

Hello -

There have been a few questions about when we are planning for a biojava 1.5 release. I have posted a release plan to the website http://biojava.open-bio.org/wiki/BioJava:1.5ReleasePlan

Please feel free to comment and modify. As always volunteers are required to help move this forward. Let me know if you can help at all with any of the tasks.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From mark.schreiber at novartis.com Sun Feb 19 22:36:54 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Feb 19 22:32:27 2006
Subject: [Biojava-l] last call for logos
Message-ID:

Hello -

Last chance to post a logo for the biojava logo (http://biojava.open-bio.org/wiki/BioJava:Logo). If you like one of the ones you see but think it could be improved why not modify it. This is open-source after all.
"Voting" will start soon :) - Mark From dreher at mpiib-berlin.mpg.de Mon Feb 20 12:34:22 2006 From: dreher at mpiib-berlin.mpg.de (Felix Dreher) Date: Mon Feb 20 12:33:44 2006 Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error Message-ID: <43F9FD9E.1050408@mpiib-berlin.mpg.de> Hello, in the constructor of BioSQLRichSequenceDB the following line is called: this.addCriteria = criteria.getMethod("add", new Class[]{Class.class}); where criteria is an instance of org.hibernate.Criteria The problem is, when I try to initialise a BioSQLRichSequenceDB, a "NoSuchMethodException" is thrown at this line. I searched for the getMethod- method and in fact it is not present in org.hibernate.Criteria. So does anyone know if this is an error in the BioJava-class or in the Hibernate-class? Greetings, Felix -- Felix Dreher Max-Planck-Institute for Infection Biology Campus Charit? Mitte Department of Immunology Mailing address: Schumannstra?e 21/22 Visitors: Virchowweg 12 10117 Berlin Germany Tel.: +49 (0)30 28460-254 / -494 Mobile: +49 (0)163 7542426 From mark.schreiber at novartis.com Wed Feb 22 07:45:21 2006 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Feb 22 07:40:52 2006 Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error Message-ID: I think the method is add(Class class) but I will look into it shortly. Please be aware that this is a very new untested and experimental class, unfortunately the author is in the process of moving countries so we may not be able to support you on this immediately. Apologies for the problem. 
- Mark

Felix Dreher
Sent by: biojava-l-bounces@portal.open-bio.org
02/21/2006 01:34 AM

To: biojava-l@biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error

Hello,

in the constructor of BioSQLRichSequenceDB the following line is called:

this.addCriteria = criteria.getMethod("add", new Class[]{Class.class});

where criteria is an instance of org.hibernate.Criteria

The problem is, when I try to initialise a BioSQLRichSequenceDB, a "NoSuchMethodException" is thrown at this line. I searched for the getMethod-method and in fact it is not present in org.hibernate.Criteria.
So does anyone know if this is an error in the BioJava-class or in the Hibernate-class?

Greetings,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology
Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany
Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From dreher at mpiib-berlin.mpg.de Fri Feb 24 07:53:45 2006
From: dreher at mpiib-berlin.mpg.de (Felix Dreher)
Date: Fri Feb 24 07:49:43 2006
Subject: [Biojava-l] BioJavaX docbook minor corrections
Message-ID: <43FF01D9.8040107@mpiib-berlin.mpg.de>

Hello,

just two suggestions/corrections about the BioJavaX-docbook.
In the section "Configuring your application to use Hibernate and BioSQL", complete example, I found two errors (or at least these parts don't work in my test-app).

1)
// print out all the sequences in the namespace
Query sq = session.createQuery("from BioEntry where namespace=?",ns);
--> should probably be:
Query sq = session.createQuery("from BioEntry where namespace=:nsp");
sq.setParameter("nsp",ns);

2)
// if the sequence is called bloggs, change its version to 99
be.setVersion(99);
--> can't use the method setVersion(int)
--> but e.g.
setDescription("XYZ");

Regards,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology

Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany

Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com Mon Feb 27 03:18:10 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Feb 27 03:13:30 2006
Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error
Message-ID:

Hi -

This should be resolved in CVS now. Let me know if it doesn't work.

Best regards,

- Mark

Felix Dreher
Sent by: biojava-l-bounces@portal.open-bio.org
02/21/2006 01:34 AM

        To: biojava-l@biojava.org
        cc: (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] BioSQLRichSequenceDB: initialisation error

Hello,

in the constructor of BioSQLRichSequenceDB the following line is called:

this.addCriteria = criteria.getMethod("add", new Class[]{Class.class});

where criteria is an instance of org.hibernate.Criteria

The problem is, when I try to initialise a BioSQLRichSequenceDB, a
"NoSuchMethodException" is thrown at this line.
I searched for the getMethod-method and in fact it is not present in
org.hibernate.Criteria.
So does anyone know if this is an error in the BioJava-class or in the
Hibernate-class?

Greetings,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology

Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany

Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From mark.schreiber at novartis.com Tue Feb 28 01:04:18 2006
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Tue Feb 28 00:59:40 2006
Subject: [Biojava-l] BioJavaX docbook minor corrections
Message-ID:

Thanks for pointing these out.
Corrected now in CVS. The use of named parameters (namespace=:nsp) is
preferred (and the previous syntax was incorrect). As you point out,
setVersion is a private method so you cannot access it (even in Hibernate,
which does some pretty odd things), so I changed this to your
setDescription suggestion.

- Mark

Felix Dreher
Sent by: biojava-l-bounces@portal.open-bio.org
02/24/2006 08:53 PM

        To: biojava-l@biojava.org
        cc: (bcc: Mark Schreiber/GP/Novartis)
        Subject: [Biojava-l] BioJavaX docbook minor corrections

Hello,

just two suggestions/corrections about the BioJavaX-docbook.
In the section "Configuring your application to use Hibernate and BioSQL",
complete example, I found two errors (or at least these parts don't work
in my test-app).

1)
// print out all the sequences in the namespace
Query sq = session.createQuery("from BioEntry where namespace=?",ns);

--> should probably be:
Query sq = session.createQuery("from BioEntry where namespace=:nsp");
sq.setParameter("nsp",ns);

2)
// if the sequence is called bloggs, change its version to 99
be.setVersion(99);

--> can't use the method setVersion(int)
--> but e.g. setDescription("XYZ");

Regards,
Felix

--
Felix Dreher
Max-Planck-Institute for Infection Biology
Campus Charité Mitte
Department of Immunology

Mailing address: Schumannstraße 21/22
Visitors: Virchowweg 12
10117 Berlin
Germany

Tel.: +49 (0)30 28460-254 / -494
Mobile: +49 (0)163 7542426

From shameer at ncbs.res.in Mon Feb 6 05:03:10 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed Mar 8 09:45:10 2006
Subject: [Biojava-l] OBF - logo+slogan sample
In-Reply-To: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
	<47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
	<2888.192.168.4.38.1139214470.squirrel@192.168.4.38>
Message-ID: <33354.192.168.1.1.1139218154.squirrel@192.168.1.1>

Dear All,

I am done with one. Have a look at it; please check the attachment.

S K

> Dear All,
>
> As we are moving to the all new look wiki-style-web - why don't we think
> about a unique logo + slogan that can express our spirit and excitement
> ???
>
> For example we can have a logo with O|B|F, its full form and the slogan -
> if anybody is interested - I would be happy to design logos once we have
> done with the logo.
>
> I have a couple of suggestions - I hope all OBF members can send much more
> powerful slogans than mine
>
> 'Let's Code for Life'
> 'Let's Decode Life'
> 'Let's Recode Life'
> 'Code your Life '
>
> Happy O|B|!!!
> --
> Mr. Shameer Khadar (JRF)
> Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241
> F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
> patiently seek divine and scientific truth."
> MM
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: obf-logo.gif
Type: image/gif
Size: 5370 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060206/6e9b166d/obf-logo.gif

From hotafin at gmail.com Wed Feb 8 08:50:25 2006
From: hotafin at gmail.com (Tamas Horvath)
Date: Wed Mar 8 09:45:13 2006
Subject: [Biojava-l] structureNMRImpl
Message-ID:

Skipped content of type multipart/alternative
-------------- next part --------------
A non-text attachment was scrubbed...
Name: structureNMRImpl.java
Type: application/octet-stream
Size: 10699 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-l/attachments/20060208/3f62c456/structureNMRImpl.obj