From andreas at sdsc.edu Fri Mar 5 11:56:40 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 5 Mar 2010 08:56:40 -0800
Subject: [Biojava-l] Google summer of code
Message-ID: <59a41c431003050856v17c83b80sf1fb59f2587c9cd1@mail.gmail.com>

Hi,

The Open Bioinformatics Foundation (BioJava's parent organisation) is preparing an application for the Google Summer of Code. If you are interested in becoming a mentor for a BioJava-related project, you can join us in the application. If you are a student and are interested in a project, please take a look at these pages:

http://www.open-bio.org/wiki/Google_Summer_of_Code
http://biojava.org/wiki/Google_Summer_of_Code

Andreas

From jeedward at yahoo.com Mon Mar 8 10:44:05 2010
From: jeedward at yahoo.com (John Edward)
Date: Mon, 8 Mar 2010 07:44:05 -0800 (PST)
Subject: [Biojava-l] Call for papers: BCBGC-10, USA, July 2010
Message-ID: <800341.81267.qm@web45915.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas.

Call for papers: BCBGC-10, USA, July 2010

The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org ) will be held on 12-14 July 2010 in Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference.

The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of the 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA.
The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields. The following conferences are planned to be organized as part of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun! Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World Resort, Universal Studios and Sea World Orlando. Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website http://www.PromoteResearch.org for more details.
Sincerely
John Edward

From sheoran143 at gmail.com Mon Mar 8 16:11:05 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Mon, 08 Mar 2010 15:11:05 -0600
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project
Message-ID: <4B9567E9.7080909@gmail.com>

Hi,

I was building a local copy of the current Maven project so that I could fix some reference-related bugs in BioJava. When I built the local version and tried to use it, I got an error in the method RichObjectFactory.connectToBioSql(Object session) of the current biojava-live version. Looking at the method, I saw this comment:

"// commenting out for the moment, since it prevents core from compiling.
// TODO: move to BioSql module"

I then uncommented the code and added these import statements to RichObjectFactory.java, which fixed the problem:

import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver;
import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder;
import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler;

After this, the BioSQL module compiled successfully, and so did the core module. I don't know whether this is the only reason the code was commented out; if it is, please uncomment these lines in the main SVN version, since I don't know how to do it.

Thanks
Deepak Sheoran

From andreas at sdsc.edu Tue Mar 9 12:28:25 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 9 Mar 2010 09:28:25 -0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project
In-Reply-To: <4B9567E9.7080909@gmail.com>
References: <4B9567E9.7080909@gmail.com>
Message-ID: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>

Hi Deepak,

thanks for spotting this. This factory method should clearly be moved to the biosql module and not be part of the core. Anybody who has a deeper knowledge of the biosql code: Where is the best place in the biosql module to move this to?
A workaround for the compile problem would be to use reflection to mask the calls to the methods in the other module, but it feels like a hack...

Andreas

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From sheoran143 at gmail.com Tue Mar 9 15:10:00 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Tue, 09 Mar 2010 14:10:00 -0600
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project
In-Reply-To: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
References: <4B9567E9.7080909@gmail.com> <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com>
Message-ID: <4B96AB18.908@gmail.com>

Hi Andreas,

I guess it should go in the "org.biojavax.bio.db.biosql" package; it makes sense to put this class there.

Deepak Sheoran

From holland at eaglegenomics.com Wed Mar 10 08:31:43 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Wed, 10 Mar 2010 21:31:43 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project
In-Reply-To: <4B96AB18.908@gmail.com>
References: <4B9567E9.7080909@gmail.com> <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com> <4B96AB18.908@gmail.com>
Message-ID:

The problem is that the RichObjectFactory is generic, but the connectToBioSQL method is BioSQL specific. What really needs to happen is to abstract out the connectToBioSQL method _only_ to a more specific class in the biosql module, and use (if necessary create) setters on RichObjectFactory for it to use.

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From mark.schreiber at novartis.com Wed Mar 10 22:14:54 2010
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 11 Mar 2010 11:14:54 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project
In-Reply-To:
Message-ID:

Could a subclass of the RichObjectFactory exist in the BioSQL module? If you want your RichObjects backed by BioSQL, you use the [BioSQL]RichObjectFactory from the BioSQL package?

- Mark

_________________________

CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you.

From holland at eaglegenomics.com Thu Mar 11 11:10:15 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 12 Mar 2010 00:10:15 +0800
Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project
In-Reply-To:
References:
Message-ID: <4E92965B-F9EA-43B1-9235-4FA7BAC09308@eaglegenomics.com>

Could do.

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From holland at eaglegenomics.com Mon Mar 15 06:34:14 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Mon, 15 Mar 2010 10:34:14 +0000
Subject: [Biojava-l] Hackathon in Boston, July 2010
Message-ID: <5FC2D8EC-5408-4126-9A7D-CB6B3500B61C@eaglegenomics.com>

Hi all,

Following the successful hackathon in Cambridge earlier this year, it was originally planned to hold a second one in Boston in conjunction with BOSC in order to give those who couldn't make it to the UK a chance to get involved.

However, OBF have beaten us to it by organising a cross-project CodeFest!

http://www.open-bio.org/wiki/Codefest_2010

It would be great for BioJava people to get involved with this cross-project hackathon effort, and it saves organising one of our own! :)

All relevant info is on the web page linked to above, and if you have any questions, ask Brad as detailed on the page.

cheers,
Richard

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From xuejiachen at gmail.com Mon Mar 15 19:09:50 2010
From: xuejiachen at gmail.com (Jiachen Xue)
Date: Mon, 15 Mar 2010 19:09:50 -0400
Subject: [Biojava-l] question about BLAST output parsing
Message-ID: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>

Hi,

Thanks in advance for your help.
For the following piece of text appearing in a BLAST output, how can I get the "Identities", "Positives" and "Gaps" fields, as well as the alignment information, such as "DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE" and the subject string?

>sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase; AltName: Full=UMP
           pyrophosphorylase; AltName: Full=UPRTase
          Length = 209

 Score = 32.0 bits (71), Expect = 9.7, Method: Compositional matrix adjust.
 Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)

Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
           DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165

From anjolou at hotmail.com Tue Mar 16 05:20:35 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:20:35 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID:

Hello,

I tried to use the BioJava BLAST parser myself, but I didn't find a way to get this information back. If your BLAST result can be in XML, you should try using JAXB to parse it (this is what I used). There is already some code for marshalling/unmarshalling in the biojava3 project. Here is the link, but it seems to be dead right now:

http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3
http://www.biojava.org/wiki/BioJava3_project

Have a nice day,
Louise

From anjolou at hotmail.com Tue Mar 16 05:23:37 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:23:37 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID:

Sorry, I forgot: there is an example of using the BLAST parser here:

http://biojava.org/wiki/BioJava:CookBook:Blast:Parser

It should be enough for what you want to do.

From andreas at sdsc.edu Tue Mar 16 11:19:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 08:19:45 -0700
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To:
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>

Yea, the BioJava Blast parser has not been maintained in quite a while. Probably parsing the XML output of Blast is the thing to do nowadays.

About Biojava3: the wiki documentation is a bit behind, the code is now in the main biojava-trunk and development has been quite active over the last months.

Andreas
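The advice in the BLAST parsing thread above (read BLAST's XML output rather than the text report) can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from BioJava and not the JAXB approach Louise describes: it uses the JDK's plain DOM parser instead. The class name BlastXmlHsp and the embedded sample are hypothetical, written by hand from the alignment in the question; the element names (Hsp_identity, Hsp_positive, Hsp_gaps, Hsp_midline, Hsp_hseq and so on) are those used in NCBI's BLAST XML output.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BlastXmlHsp {

    // Hypothetical sample: a single <Hsp> element written by hand from the
    // alignment in the question. A real BLAST XML file nests many of these
    // inside <BlastOutput>, <Iteration> and <Hit> elements.
    public static final String SAMPLE =
          "<Hsp>"
        + "<Hsp_bit-score>32.0</Hsp_bit-score>"
        + "<Hsp_evalue>9.7</Hsp_evalue>"
        + "<Hsp_identity>16</Hsp_identity>"
        + "<Hsp_positive>27</Hsp_positive>"
        + "<Hsp_gaps>2</Hsp_gaps>"
        + "<Hsp_align-len>42</Hsp_align-len>"
        + "<Hsp_qseq>DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE</Hsp_qseq>"
        + "<Hsp_hseq>DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE</Hsp_hseq>"
        + "<Hsp_midline>DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE</Hsp_midline>"
        + "</Hsp>";

    /** Returns the child elements of the first <Hsp> as a name -> text map. */
    public static Map<String, String> parseFirstHsp(String xml) {
        Map<String, String> fields = new LinkedHashMap<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList hsps = doc.getElementsByTagName("Hsp");
            if (hsps.getLength() == 0) {
                return fields; // no alignment in this document
            }
            NodeList children = hsps.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes between elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    fields.put(child.getNodeName(), child.getTextContent());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse BLAST XML", e);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> hsp = parseFirstHsp(SAMPLE);
        String len = hsp.get("Hsp_align-len");
        System.out.println("Identities = " + hsp.get("Hsp_identity") + "/" + len);
        System.out.println("Positives  = " + hsp.get("Hsp_positive") + "/" + len);
        System.out.println("Gaps       = " + hsp.get("Hsp_gaps") + "/" + len);
        System.out.println("Query:   " + hsp.get("Hsp_qseq"));
        System.out.println("         " + hsp.get("Hsp_midline"));
        System.out.println("Sbjct:   " + hsp.get("Hsp_hseq"));
    }
}
```

The percentages shown in the text report are not stored in the XML; they would have to be computed from the counts and Hsp_align-len. Real BLAST XML files also declare a DOCTYPE, so a parser may try to resolve the DTD when reading them, which this hand-written sample avoids.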
From hlapp at drycafe.net Tue Mar 16 16:03:50 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 16 Mar 2010 16:03:50 -0400
Subject: [Biojava-l] [OT] Job opportunity: Training coordinator and Bioinformatics Project Manager
Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net>

Hi all - first off, sorry for the cross-posting, we're trying to advertise this as widely as possible. Second, apologies if this is committing an offense and considered spam. I thought though that there might be some people around here who may be interested and suitable.

-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================

A unique position is available for a training coordinator and bioinformatics project manager at the U.S. National Evolutionary Synthesis Center in Durham, North Carolina (NESCent, http://nescent.org). NESCent is a National Science Foundation funded research center managed by Duke University, the University of North Carolina at Chapel Hill and North Carolina State University on behalf of the international evolutionary biology community.

NESCent facilitates synthetic research by bringing together diverse expertise, data, tools and concepts (Sidlauskas et al. 2009). In addition to a resident population of 20-30 scientists, the Center hosts over 800 visitors a year. An informatics staff is on-site to support resident and visiting scientists' needs in high-performance computing, electronic collaboration, scientific software and databases; this includes custom software development for a limited number of high-impact projects.
NESCent's informatics training program includes a rotating series of open-application summer courses, ad-hoc short courses for resident scientists, and remote internships (including past participation in the Google Summer of Code).

The training coordinator and bioinformatics project manager will provide oversight to the Center's training activities. The incumbent will also serve as the interface between scientists and software developers at NESCent. The position provides extensive opportunities for collaboration and intellectual engagement with both NESCent-sponsored scientists and informatics staff; however, this is not an independent research position. The incumbent will report to the Director, while overseeing the work of a small informatics team and coordinating activities among the Center's science, education and informatics programs.

Responsibilities:

* 50% - Consult with sponsored scientists (including scientists in residence and working group participants) about informatics resources and needs. Manage software product development by gathering requirements from scientists, participating in conceptual design, monitoring implementation progress and product quality, facilitating communication between software developers and scientists, and researching software solutions.

* 25% - Oversee NESCent's course curriculum by identifying opportunities for onsite or online informatics courses that satisfy demand for advanced training of resident and visiting scientists, recruiting instructors, providing guidance to instructors in developing course syllabi, coordinating logistical and technical support requirements, conducting assessments, and serving as a liaison to course organizers at other institutions.

* 25% - Assisting in the management of NESCent's summer informatics intern program, by coordinating the recruitment, application & review process for students, communicating expectations to students and mentors, monitoring student progress, documenting student outcomes, and performing assessments.

Education:
Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary biology, or an equivalent combination of relevant education and/or experience.

Experience:
Required: Excellent communication, interpersonal, and organizational skills. Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open source software. Organization, coordination, development and delivery of courses and workshops appropriate for graduate-level participants.

Terms of Employment: Salary will be competitive and commensurate with experience. As a full-time employee, the incumbent will receive Duke University's benefits package (http://hr.duke.edu/benefits/main.html). The position is available immediately and will remain open until filled. The position is currently funded through November 2014, contingent on annual renewal of the Center by the NSF.

How to Apply: Please send a C.V., including contact information for three references, and a brief statement of interest to Allen Rodrigo, Director, NESCent, at a.rodrigo at nescent.org. Inquiries about suitability for the position are welcome. Duke University is an Equal Opportunity/Affirmative Action employer.

Additional information about NESCent: http://www.nescent.org

References:
Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The Continuing Promise of Evolutionary Synthesis. Evolution.
http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x From markjschreiber at gmail.com Tue Mar 16 21:14:51 2010 From: markjschreiber at gmail.com (Mark Schreiber) Date: Wed, 17 Mar 2010 09:14:51 +0800 Subject: [Biojava-l] question about BLAST output parsing In-Reply-To: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com> References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com> <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com> Message-ID: <93b45ca51003161814y7196e3e8i8e329b79e612cf50@mail.gmail.com> I generally don't recommend parsing the standard BLAST output as it keeps changing subtly . Best to parse one of the tabular formats or the XML output. - Mark On Tue, Mar 16, 2010 at 11:19 PM, Andreas Prlic wrote: > Yea, the BioJava Blast parser has not been maintained in quite a while. > Probably parsing the XML output of Blast is the thing to do nowadays. About > Biojava3: the wiki documentation is a bit behind, the code is now in the > main biojava-trunk and development has been quite active over the last > months. > > Andreas > > On Tue, Mar 16, 2010 at 2:20 AM, Louise Ott wrote: > > > > > > > Hello, > > I tried to use the biojava blast parser myself but i didn't find a way to > > get back these informations.If your blast result can be in xml, you > should > > try to use jaxb to parse it (this is what i used).There are already some > > code for marshall/unmarshall in the biojava3 project.I give you the link, > > but it seems to be dead right now : > > > > > http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3 > > http://www.biojava.org/wiki/BioJava3_project > > Have a nice day, > > Louise > > > > > > > Date: Mon, 15 Mar 2010 19:09:50 -0400 > > > From: xuejiachen at gmail.com > > > To: biojava-l at lists.open-bio.org > > > Subject: [Biojava-l] question about BLAST output parsing > > > > > > Hi, > > > > > > Thanks advance for help. 
> > > > > > For the following piece of text appearing in a blast output. How can I > > get > > > the fields of "Identities", "Positives", "Gaps" as well as the > alignment > > > information, such as "DK V L+D + G +S + +++ +E GA+K+ L + AAPE" and > > > subject string? > > > > > > >sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase; > > > AltName: Full=UMP > > > pyrophosphorylase; AltName: Full=UPRTase > > > Length = 209 > > > > > > Score = 32.0 bits (71), Expect = 9.7, Method: Compositional matrix > > > adjust. > > > Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%) > > > > > > Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399 > > > DK V L+D + G +S + +++ +E GA+K+ L + AAPE > > > Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165 > > > _______________________________________________ > > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _________________________________________________________________ > > Hotmail arrive sur votre t?l?phone ! Compatible Iphone, Windows Phone, > > Blackberry, ? > > http://www.messengersurvotremobile.com/?d=Hotmail > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From Richard.Finkers at wur.nl Wed Mar 17 03:21:16 2010 From: Richard.Finkers at wur.nl (Richard Finkers) Date: Wed, 17 Mar 2010 08:21:16 +0100 Subject: [Biojava-l] SVN repository In-Reply-To: References: Message-ID: <4BA082EC.8010908@wur.nl> Hi, I would like to have a look at the BioJava 3 code (and perhaps in the future contribute to). However, I cannot access the SVN repository (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk). 
Is the repository down? Thanks, Richard From biopython at maubp.freeserve.co.uk Wed Mar 17 06:16:45 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Mar 2010 10:16:45 +0000 Subject: [Biojava-l] SVN repository In-Reply-To: <4BA082EC.8010908@wur.nl> References: <4BA082EC.8010908@wur.nl> Message-ID: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers wrote: > > Hi, > > I would like to have a look at the BioJava 3 code (and perhaps in the future > contribute to). However, I cannot access the SVN repository > (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk). > > Is the repository down? > > Thanks, > Richard Probably :( There have been problems discussed on the BioPerl mailing list (they use the same servers), and the OBF team are aware of it: http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html The code.open-bio.org repositories are a read only public mirror, while dev.open-bio.org is the master repository I think is fine (but not available for anonymous download). In the mean time BioPerl have also setup a read only mirror on github - perhaps BioJava could do the same? Meanwhile BioRuby and Biopython are just using github (not SVN or CVS). Peter From andreas at sdsc.edu Wed Mar 17 13:39:41 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Wed, 17 Mar 2010 10:39:41 -0700 Subject: [Biojava-l] SVN repository In-Reply-To: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> References: <4BA082EC.8010908@wur.nl> <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> Message-ID: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> I have just heard back from the OBF-helpdesk. The VM hosting the anonymous SVN is currently down. Depending on how big the problem turns out to be, it will be back at some point later today / should be back latest tomorrow. Sorry for this inconvenience. 
Andreas

On Wed, Mar 17, 2010 at 3:16 AM, Peter wrote:

> On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers wrote:
> >
> > Hi,
> >
> > I would like to have a look at the BioJava 3 code (and perhaps in the future
> > contribute to). However, I cannot access the SVN repository
> > (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
> >
> > Is the repository down?
> >
> > Thanks,
> > Richard
>
> Probably :(
>
> There have been problems discussed on the BioPerl mailing list
> (they use the same servers), and the OBF team are aware of it:
> http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html
>
> The code.open-bio.org repositories are a read only public mirror,
> while dev.open-bio.org is the master repository I think is fine
> (but not available for anonymous download).
>
> In the mean time BioPerl have also setup a read only mirror
> on github - perhaps BioJava could do the same? Meanwhile
> BioRuby and Biopython are just using github (not SVN or CVS).
>
> Peter
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From andreas at sdsc.edu Thu Mar 18 16:36:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 18 Mar 2010 13:36:38 -0700
Subject: [Biojava-l] Google summer of code
Message-ID: <59a41c431003181336i33d388aak4b5a26e11ee4161b@mail.gmail.com>

Hi,

It seems our (the Open Bioinformatics Foundation's) Google Summer of Code application has been accepted.

http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010

As such we are now looking for an interested and skilled student to work on the BioJava multiple sequence alignment project. Take a look at the project description, and if you think you are up for the challenge, send me an email with your application.
http://biojava.org/wiki/Google_Summer_of_Code Andreas From shakunb at uom.ac.mu Fri Mar 19 06:50:40 2010 From: shakunb at uom.ac.mu (Shakuntala baichoo) Date: Fri, 19 Mar 2010 14:50:40 +0400 Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9 In-Reply-To: References: Message-ID: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> Hi! I would like to know the interpretation of the scores after running the needleman-wunsch algorithm using the NUCC44.txt substitution matrix. Actually I have taken the named genes from a bacteria EMBL file and I am trying to compare each gene to the other genes in the lot, using the needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I would like to determine the % match for each pair but since I get mostly -ve and some positive values, I would like to know how to calculate the % match for a pair of genes. I would be grateful if anybody could help me. Thanks. Shakuntala On Thu, Mar 18, 2010 at 8:00 PM, wrote: > Send Biojava-l mailing list submissions to > biojava-l at lists.open-bio.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://lists.open-bio.org/mailman/listinfo/biojava-l > or, via email, send a message with subject or body 'help' to > biojava-l-request at lists.open-bio.org > > You can reach the person managing the list at > biojava-l-owner at lists.open-bio.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of Biojava-l digest..." > > > Today's Topics: > > 1. Re: SVN repository (Andreas Prlic) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 17 Mar 2010 10:39:41 -0700 > From: Andreas Prlic > Subject: Re: [Biojava-l] SVN repository > To: Richard Finkers > Cc: biojava-l at lists.open-bio.org > Message-ID: > <59a41c431003171039h4ca1267bibc45b0d7d270b2a9 at mail.gmail.com> > Content-Type: text/plain; charset=ISO-8859-1 > > I have just heard back from the OBF-helpdesk. 
The VM hosting the anonymous > SVN is currently down. Depending on how big the problem turns out to be, it > will be back at some point later today / should be back latest tomorrow. > > Sorry for this inconvenience. > Andreas > > > > > On Wed, Mar 17, 2010 at 3:16 AM, Peter >wrote: > > > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers > > > wrote: > > > > > > Hi, > > > > > > I would like to have a look at the BioJava 3 code (and perhaps in the > > future > > > contribute to). However, I cannot access the SVN repository > > > ( > > > http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk > > ). > > > > > > Is the repository down? > > > > > > Thanks, > > > Richard > > > > Probably :( > > > > There have been problems discussed on the BioPerl mailing list > > (they use the same servers), and the OBF team are aware of it: > > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html > > > > The code.open-bio.org repositories are a read only public mirror, > > while dev.open-bio.org is the master repository I think is fine > > (but not available for anonymous download). > > > > In the mean time BioPerl have also setup a read only mirror > > on github - perhaps BioJava could do the same? Meanwhile > > BioRuby and Biopython are just using github (not SVN or CVS). > > > > Peter > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > > ------------------------------ > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > End of Biojava-l Digest, Vol 86, Issue 9 > **************************************** > -- Best Regards Dr. (Mrs.) 
S.Baichoo Senior Lecturer CSE Dept, FoE University of Mauritius From andreas at sdsc.edu Fri Mar 19 13:42:44 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Fri, 19 Mar 2010 10:42:44 -0700 Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9 In-Reply-To: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> Message-ID: <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com> sorry, can you clarify: what do you mean with you "get mostly -ve" ? Andreas On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo wrote: > Hi! > I would like to know the interpretation of the scores after running the > needleman-wunsch algorithm using the NUCC44.txt substitution matrix. > Actually I have taken the named genes from a bacteria EMBL file and I am > trying to compare each gene to the other genes in the lot, using the > needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I > would like to determine the % match for each pair but since I get mostly > -ve > and some positive values, I would like to know how to calculate the % match > for a pair of genes. > I would be grateful if anybody could help me. > > Thanks. > Shakuntala > > On Thu, Mar 18, 2010 at 8:00 PM, >wrote: > > > Send Biojava-l mailing list submissions to > > biojava-l at lists.open-bio.org > > > > To subscribe or unsubscribe via the World Wide Web, visit > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > or, via email, send a message with subject or body 'help' to > > biojava-l-request at lists.open-bio.org > > > > You can reach the person managing the list at > > biojava-l-owner at lists.open-bio.org > > > > When replying, please edit your Subject line so it is more specific > > than "Re: Contents of Biojava-l digest..." > > > > > > Today's Topics: > > > > 1. 
Re: SVN repository (Andreas Prlic) > > > > > > ---------------------------------------------------------------------- > > > > Message: 1 > > Date: Wed, 17 Mar 2010 10:39:41 -0700 > > From: Andreas Prlic > > Subject: Re: [Biojava-l] SVN repository > > To: Richard Finkers > > Cc: biojava-l at lists.open-bio.org > > Message-ID: > > <59a41c431003171039h4ca1267bibc45b0d7d270b2a9 at mail.gmail.com> > > Content-Type: text/plain; charset=ISO-8859-1 > > > > I have just heard back from the OBF-helpdesk. The VM hosting the > anonymous > > SVN is currently down. Depending on how big the problem turns out to be, > it > > will be back at some point later today / should be back latest tomorrow. > > > > Sorry for this inconvenience. > > Andreas > > > > > > > > > > On Wed, Mar 17, 2010 at 3:16 AM, Peter > >wrote: > > > > > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers < > Richard.Finkers at wur.nl > > > > > > wrote: > > > > > > > > Hi, > > > > > > > > I would like to have a look at the BioJava 3 code (and perhaps in the > > > future > > > > contribute to). However, I cannot access the SVN repository > > > > ( > > > > > > http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk > > > ). > > > > > > > > Is the repository down? > > > > > > > > Thanks, > > > > Richard > > > > > > Probably :( > > > > > > There have been problems discussed on the BioPerl mailing list > > > (they use the same servers), and the OBF team are aware of it: > > > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html > > > > > > The code.open-bio.org repositories are a read only public mirror, > > > while dev.open-bio.org is the master repository I think is fine > > > (but not available for anonymous download). > > > > > > In the mean time BioPerl have also setup a read only mirror > > > on github - perhaps BioJava could do the same? Meanwhile > > > BioRuby and Biopython are just using github (not SVN or CVS). 
> > >
> > > Peter
> > > _______________________________________________
> > > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
> > >
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >
> >
> > End of Biojava-l Digest, Vol 86, Issue 9
> > ****************************************
> >
> >
>
> --
> Best Regards
>
> Dr. (Mrs.) S.Baichoo
> Senior Lecturer
> CSE Dept, FoE
> University of Mauritius
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From mitlox at op.pl Sat Mar 20 06:17:17 2010
From: mitlox at op.pl (xyz)
Date: Sat, 20 Mar 2010 20:17:17 +1000
Subject: [Biojava-l] sort fasta file
Message-ID: <20100320201718.4420a9b9@wp01>

Hello,

I would like to sort a multi-FASTA file by sequence length, i.e. from the read with the longest sequence to the read with the shortest sequence.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;

import org.biojava.bio.BioException;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class SortFasta {

    public static void main(String[] args) throws FileNotFoundException, BioException {
        BufferedReader br = new BufferedReader(new FileReader("sortfasta.fasta"));
        SimpleNamespace ns = new SimpleNamespace("biojava");
        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null, ns);

        // So far this only prints each record in file order; nothing is sorted yet.
        while (rsi.hasNext()) {
            RichSequence rs = rsi.nextRichSequence();
            System.out.println(rs.getName());
            System.out.println(rs.seqString());
        }
    }
}

I have tried to do it, but I do not know how to continue.

Thank you in advance.
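For reference, the sorting step on its own can be sketched without BioJava. This is only a minimal sketch under stated assumptions: it needs Java 8 or later, it reads plain FASTA text by hand rather than through RichSequence.IOTools, and the names FastaLengthSort, FastaEntry, read and sortLongestFirst are made up for the illustration and are not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FastaLengthSort {

    /** One FASTA record: header text (without '>') and the concatenated sequence. */
    static final class FastaEntry {
        final String header;
        final String sequence;

        FastaEntry(String header, String sequence) {
            this.header = header;
            this.sequence = sequence;
        }
    }

    /** Reads all records; sequence lines belonging to one header are joined. */
    static List<FastaEntry> read(BufferedReader in) throws IOException {
        List<FastaEntry> entries = new ArrayList<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (header != null) {
                    entries.add(new FastaEntry(header, seq.toString()));
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line);
            }
        }
        if (header != null) {
            entries.add(new FastaEntry(header, seq.toString()));
        }
        return entries;
    }

    /** Sorts in place, longest sequence first; List.sort is stable, so ties keep file order. */
    static void sortLongestFirst(List<FastaEntry> entries) {
        entries.sort(Comparator.comparingInt((FastaEntry e) -> e.sequence.length()).reversed());
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory example instead of a file, so the sketch runs as-is.
        String fasta = ">a\nACGT\n>b\nACGTACGT\nAC\n>c\nACGT\n";
        List<FastaEntry> entries = read(new BufferedReader(new StringReader(fasta)));
        sortLongestFirst(entries);
        for (FastaEntry e : entries) {
            System.out.println(">" + e.header);
            System.out.println(e.sequence);
        }
    }
}
```

A list plus a stable sort is used here rather than a sorted set keyed on length, because a set would treat two reads of the same length as duplicates and keep only one of them.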
Best regards,

From jswetnam at gmail.com Sun Mar 21 16:56:35 2010
From: jswetnam at gmail.com (James Swetnam)
Date: Sun, 21 Mar 2010 16:56:35 -0400
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100320201718.4420a9b9@wp01>
References: <20100320201718.4420a9b9@wp01>
Message-ID:

Just hacked this together, warning: I am new to both java and biojava.

import java.io.*;
import java.util.*;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;

public class SortFasta {

    // Orders sequences from longest to shortest.
    static private class RichSequenceComparator implements Comparator<RichSequence> {
        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq2.length() - seq1.length();
        }
    }

    // Usage: SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException, BioException {
        String fastaFile = args[0];
        BufferedReader br = new BufferedReader(new FileReader(fastaFile));
        SimpleNamespace ns = new SimpleNamespace("biojava");
        Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, protein.getTokenization("token"), ns);

        // A List keeps sequences of equal length; a TreeSet ordered by length alone would drop them.
        List<RichSequence> sorted = new ArrayList<RichSequence>();
        while (rsi.hasNext()) {
            sorted.add(rsi.nextRichSequence());
        }
        Collections.sort(sorted, new RichSequenceComparator());

        // Do whatever you want here with the list of RichSequences, longest first. I'll just print the lengths.
        for (RichSequence seq : sorted) {
            System.out.println(seq.length());
        }
    }
}

On Sat, Mar 20, 2010 at 6:17 AM, xyz wrote:

> Hello,
> I would like to sort multiple fasta file depends on the sequence length,
> ie. from the read with longest sequence to the read with the shortest
> sequence.
>
> import java.io.BufferedReader;
> import java.io.FileNotFoundException;
> import java.io.FileReader;
> import org.biojava.bio.BioException;
>
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.RichSequence;
> import org.biojavax.bio.seq.RichSequenceIterator;
>
> public class SortFasta {
>
> public static void main(String[] args) throws FileNotFoundException, BioException {
>
> BufferedReader br = new BufferedReader(new FileReader("sortfasta.fasta"));
> SimpleNamespace ns = new SimpleNamespace("biojava");
>
> RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null, ns);
>
> while (rsi.hasNext()) {
> RichSequence rs = rsi.nextRichSequence();
> System.out.println(rs.getName());
> System.out.println(rs.seqString());
> }
> }
> }
>
> I have tried to do it, but I do not how to continue.
>
> Thank you in advance.
>
> Best regards,
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From andreas at sdsc.edu Mon Mar 22 19:46:26 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 22 Mar 2010 16:46:26 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com> <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
Message-ID: <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>

Hi Shakuntala,

at present the NeedlemanWunsch implementation does not make it totally straightforward to access the %id. You could try parsing the result of the getAlignmentString() call and accessing the information from there ... Making the underlying data more accessible is on the TODO list for this module: http://biojava.org/wiki/BioJava:Modules

Andreas

2010/3/21 Shakuntala baichoo

> Hi Andreas!
> The problem is as follows. We have a bacteria file. There are about 565 > named genes/features there. We wish to compare each gene with the other 564 > genes. I am using needleman-wunsch from biojava to do so. For one specific > run, I am attaching the result. > The score after comparing Feature no. 0 with Feature no. 1 to Feature no. > 564 is displayed (along with the product name etc...). If I wish to > interpret these scores as a percentage homology, how do I do it? > > P.S. Most of the scores are -ve. Only one or a few is +ve. The comparison > is done using NUCC44.txt. > > Thanks > Kind Regards > Shakuntala > > > On Fri, Mar 19, 2010 at 9:42 PM, Andreas Prlic wrote: > >> sorry, can you clarify: what do you mean with you "get mostly -ve" ? >> >> Andreas >> >> >> On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo wrote: >> >>> Hi! >>> I would like to know the interpretation of the scores after running the >>> needleman-wunsch algorithm using the NUCC44.txt substitution matrix. >>> Actually I have taken the named genes from a bacteria EMBL file and I am >>> trying to compare each gene to the other genes in the lot, using the >>> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I >>> would like to determine the % match for each pair but since I get mostly >>> -ve >>> and some positive values, I would like to know how to calculate the % >>> match >>> for a pair of genes. >>> I would be grateful if anybody could help me. >>> >>> Thanks. 
>>> Shakuntala >>> >>> On Thu, Mar 18, 2010 at 8:00 PM, >> >wrote: >>> >>> > Send Biojava-l mailing list submissions to >>> > biojava-l at lists.open-bio.org >>> > >>> > To subscribe or unsubscribe via the World Wide Web, visit >>> > http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > or, via email, send a message with subject or body 'help' to >>> > biojava-l-request at lists.open-bio.org >>> > >>> > You can reach the person managing the list at >>> > biojava-l-owner at lists.open-bio.org >>> > >>> > When replying, please edit your Subject line so it is more specific >>> > than "Re: Contents of Biojava-l digest..." >>> > >>> > >>> > Today's Topics: >>> > >>> > 1. Re: SVN repository (Andreas Prlic) >>> > >>> > >>> > ---------------------------------------------------------------------- >>> > >>> > Message: 1 >>> > Date: Wed, 17 Mar 2010 10:39:41 -0700 >>> > From: Andreas Prlic >>> > Subject: Re: [Biojava-l] SVN repository >>> > To: Richard Finkers >>> > Cc: biojava-l at lists.open-bio.org >>> > Message-ID: >>> > <59a41c431003171039h4ca1267bibc45b0d7d270b2a9 at mail.gmail.com> >>> > Content-Type: text/plain; charset=ISO-8859-1 >>> > >>> > I have just heard back from the OBF-helpdesk. The VM hosting the >>> anonymous >>> > SVN is currently down. Depending on how big the problem turns out to >>> be, it >>> > will be back at some point later today / should be back latest >>> tomorrow. >>> > >>> > Sorry for this inconvenience. >>> > Andreas >>> > >>> > >>> > >>> > >>> > On Wed, Mar 17, 2010 at 3:16 AM, Peter < >>> biopython at maubp.freeserve.co.uk >>> > >wrote: >>> > >>> > > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers < >>> Richard.Finkers at wur.nl >>> > > >>> > > wrote: >>> > > > >>> > > > Hi, >>> > > > >>> > > > I would like to have a look at the BioJava 3 code (and perhaps in >>> the >>> > > future >>> > > > contribute to). 
However, I cannot access the SVN repository >>> > > > ( >>> > > >>> > >>> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk >>> > > ). >>> > > > >>> > > > Is the repository down? >>> > > > >>> > > > Thanks, >>> > > > Richard >>> > > >>> > > Probably :( >>> > > >>> > > There have been problems discussed on the BioPerl mailing list >>> > > (they use the same servers), and the OBF team are aware of it: >>> > > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html >>> > > >>> > > The code.open-bio.org repositories are a read only public mirror, >>> > > while dev.open-bio.org is the master repository I think is fine >>> > > (but not available for anonymous download). >>> > > >>> > > In the mean time BioPerl have also setup a read only mirror >>> > > on github - perhaps BioJava could do the same? Meanwhile >>> > > BioRuby and Biopython are just using github (not SVN or CVS). >>> > > >>> > > Peter >>> > > _______________________________________________ >>> > > Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> > > http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > > >>> > >>> > >>> > ------------------------------ >>> > >>> > _______________________________________________ >>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > >>> > >>> > End of Biojava-l Digest, Vol 86, Issue 9 >>> > **************************************** >>> > >>> >>> >>> >>> -- >>> Best Regards >>> >>> Dr. (Mrs.) S.Baichoo >>> Senior Lecturer >>> CSE Dept, FoE >>> University of Mauritius >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> > > > -- > Best Regards > > Dr. (Mrs.) 
S.Baichoo
> Senior Lecturer
> CSE Dept, FoE
> University of Mauritius
>

From zm19fitz at siena.edu Mon Mar 22 16:36:14 2010
From: zm19fitz at siena.edu (Fitzsimmons, Zachary)
Date: Mon, 22 Mar 2010 16:36:14 -0400
Subject: [Biojava-l] (no subject)
Message-ID: <3898DEB8D4D8E34EB622AC53CEFFA2680173D9476385@mb-1.siena.edu>

Hi,

I am currently a sophomore at Siena College and a Dual Major in Computer Science and Mathematics and I am writing you today to voice my interest in developing for BioJava this summer through Google's Summer of Code program. I did research at my own college last summer on the Netflix Prize Project with one of my computer science professors and I am very interested in diversifying my work this summer. Currently I am taking an upper-level computer science course in bioinformatics and I have always thought of this as a possible field of study when I attend graduate school. I have learned about different pairwise alignment algorithms such as Needleman-Wunsch and Smith-Waterman in class to match proteins and DNA sequences and later we are going to study the HP folding problem in-depth. I am well versed in the Java programming language, having taken all of the Java courses at my college, and confident in my abilities to contribute to the BioJava project.

I consider the All-Java Multiple Sequence Alignment project described in your wiki article [http://biojava.org/wiki/Google_Summer_of_Code] something within my abilities as an experienced Java programmer with past research experience and an interest in the field of bioinformatics. Updating the BioJava code to be newly compliant and eventually implementing a Clustal algorithm for multiple sequence alignment is well within my grasp especially on completion of my college's bioinformatics course and studying BioJava's documentation.

I would just like your feedback on my proposal for working on your project. I hope to hear from you soon and to apply for the position through Google.
Sincerely,
Zack Fitzsimmons

From andreas at sdsc.edu Tue Mar 23 20:33:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 23 Mar 2010 17:33:09 -0700
Subject: [Biojava-l] GSoC update
Message-ID: <59a41c431003231733t1e259753k55fbe0a8bfb801a3@mail.gmail.com>

Hi,

A quick update regarding the current status of our Google Summer of Code project: Several students have already expressed their interest. In fact the response was so good that I believe BioJava should try to run more than just one project. In the meantime we added another "mentor proposed" project to our GSoC page: http://biojava.org/wiki/Google_Summer_of_Code .

Identification and Classification of Posttranslational Modification of Proteins: Develop a Posttranslational Modification package for the BioJava project.

In general Google strongly encourages student-proposed projects, since historically those are often the most successful GSoC projects. It is recommended that students contact us / possible mentors prior to their application so we can match up students with suitable mentors and projects and we can help in solidifying your project ideas. In principle any BioJava contributor is suitable as a mentor.

Students can apply between March 22nd and April 9th via the Google web site.
http://socghop.appspot.com/

Andreas

From andreas at sdsc.edu Wed Mar 24 11:37:43 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 24 Mar 2010 08:37:43 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
 <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
 <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
 <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>
 <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
Message-ID: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>

Hi Shakuntala,

Whether the score is positive or negative depends only on the implementation and representation... I think most people expect the score to be positive, so the getAlignmentString method displays it as a positive value, while internally it is a bit different...

Andreas

On Wed, Mar 24, 2010 at 3:32 AM, Shakuntala baichoo wrote:

> Hello Andreas!
> Thanks for the quick reply.
> I tried the getAlignmentString. It provides a lot of information. However,
> I think there is a slight problem here. From the getAlignmentString call I
> see that the score after aligning a pair of dna strings is 2706.
> But when I view the return value from the method pairwiseAlignment (for the
> same set) then the score is -2706. Why?
>
> Thanks
> Shakuntala
>
>
> On Tue, Mar 23, 2010 at 3:46 AM, Andreas Prlic wrote:
>
>> Hi Shakuntala,
>>
>> at present the NeedlemanWunsch implementation does not make it totally
>> straightforward to access the %id. You could try parsing the result of the
>> getAlignmentString() call and accessing the information from there ...
>> Making the underlying data more accessible is on the TODO list for this
>> module: http://biojava.org/wiki/BioJava:Modules
>>
>> Andreas
>>
>> 2010/3/21 Shakuntala baichoo
>>
>> Hi Andreas!
>>> The problem is as follows.
We have a bacteria file. There are about 565
>>> named genes/features there. We wish to compare each gene with the other 564
>>> genes. I am using needleman-wunsch from biojava to do so. For one specific
>>> run, I am attaching the result.
>>> The score after comparing Feature no. 0 with Feature no. 1 to Feature no.
>>> 564 is displayed (along with the product name etc...). If I wish to
>>> interpret these scores as a percentage homology, how do I do it?
>>>
>>> P.S. Most of the scores are -ve. Only one or a few is +ve. The
>>> comparison is done using NUCC44.txt.
>>>
>>> Thanks
>>> Kind Regards
>>> Shakuntala
>>>
>>>
>>> On Fri, Mar 19, 2010 at 9:42 PM, Andreas Prlic wrote:
>>>
>>>> sorry, can you clarify: what do you mean with you "get mostly -ve" ?
>>>>
>>>> Andreas
>>>>
>>>>
>>>> On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo wrote:
>>>>
>>>>> Hi!
>>>>> I would like to know the interpretation of the scores after running the
>>>>> needleman-wunsch algorithm using the NUCC44.txt substitution matrix.
>>>>> Actually I have taken the named genes from a bacteria EMBL file and I am
>>>>> trying to compare each gene to the other genes in the lot, using the
>>>>> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. I
>>>>> would like to determine the % match for each pair but since I get mostly -ve
>>>>> and some positive values, I would like to know how to calculate the % match
>>>>> for a pair of genes.
>>>>> I would be grateful if anybody could help me.
>>>>>
>>>>> Thanks.
>>>>> Shakuntala
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Dr. (Mrs.) S.Baichoo
>>>>> Senior Lecturer
>>>>> CSE Dept, FoE
>>>>> University of Mauritius
>>>>> _______________________________________________
>>>>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Dr. (Mrs.) S.Baichoo
>>> Senior Lecturer
>>> CSE Dept, FoE
>>> University of Mauritius
>>>
>>
>>
>
>
> --
> Best Regards
>
> Dr. (Mrs.) S.Baichoo
> Senior Lecturer
> CSE Dept, FoE
> University of Mauritius
>

From jeedward at yahoo.com Wed Mar 24 20:27:28 2010
From: jeedward at yahoo.com (John Edward)
Date: Wed, 24 Mar 2010 17:27:28 -0700 (PDT)
Subject: [Biojava-l] Call for papers (Deadline Extended): BCBGC-10, USA, July 2010
Message-ID: <852924.28793.qm@web45911.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas.

Call for papers (Deadline Extended): BCBGC-10, USA, July 2010

The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in Orlando, FL, USA.
BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference.

The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields. The following conferences are planned to be organized as part of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun!
Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World Resort, Universal Studios and Sea World Orlando. Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website http://www.PromoteResearch.org for more details.

Sincerely
John Edward

From andreas.draeger at uni-tuebingen.de Thu Mar 25 10:19:02 2010
From: andreas.draeger at uni-tuebingen.de (Andreas Dräger)
Date: Thu, 25 Mar 2010 15:19:02 +0100
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
 <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>
 <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
 <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>
 <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
 <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
Message-ID: <4BAB70D6.5060309@uni-tuebingen.de>

Hi Andreas and Shakuntala,

The alignment classes have just been revised and can now be updated from the repository. As a major improvement, the alignment result has become much easier to use. So, if you're interested in computing something based on the score, you can now simply call the dedicated get method and don't have to care about parsing anymore.

I hope that helps.

Cheers
Andreas

--
Dipl.-Bioinform.
Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany

Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From mitlox at op.pl Thu Mar 25 09:23:37 2010
From: mitlox at op.pl (xyz)
Date: Thu, 25 Mar 2010 23:23:37 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To:
References: <20100320201718.4420a9b9@wp01>
Message-ID: <20100325232337.3021200a@wp01>

Hi James,

Thank you for the solution, but I get this

7
13
23
30

as output for this input file:

>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttccccccccccccccccccccccc

How is it possible to fix it, and why did you choose Comparator and not Comparable?

Thank you in advance.

Best regards,

On Sun, 21 Mar 2010 16:56:35 -0400 James Swetnam wrote:

> Just hacked this together, warning: I am new to both java and biojava.
>
> import java.io.*;
> import java.util.*;
>
> import org.biojava.bio.BioException;
> import org.biojava.bio.symbol.*;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.*;
>
> import java.util.Comparator;
>
> public class SortFasta {
>
>     static private class RichSequenceComparator implements
>             Comparator<RichSequence> {
>
>         public int compare(RichSequence seq1, RichSequence seq2)
>         {
>             return seq1.length() - seq2.length();
>         }
>
>
>     }
>
>     // Usage: SortFasta unsortedFile.fasta
>     public static void main(String[] args) throws
>             FileNotFoundException, BioException {
>
>         String fastaFile = args[0];
>
>         BufferedReader br = new BufferedReader(new FileReader(fastaFile));
>         SimpleNamespace ns = new SimpleNamespace("biojava");
>
>         Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
>
>         RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
>                 protein.getTokenization("token"),
>                 ns);
>
>         SortedSet sorted = new TreeSet( new
>                 SortFasta.RichSequenceComparator());
>
>         while (rsi.hasNext()) {
>             sorted.add(rsi.nextRichSequence());
>         }
>
>         Iterator sortedIt = sorted.iterator();
>
>         //Do whatever you want here with the ascending list of
>         //RichSequences by length, I'll just print them.
>         while(sortedIt.hasNext())
>         {
>             System.out.println(((RichSequence) sortedIt.next()).length());
>         }
>     }
> }
>

From holland at eaglegenomics.com Thu Mar 25 12:27:17 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 25 Mar 2010 16:27:17 +0000
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4BAABA21.4000301@gmail.com>
References: <4BAABA21.4000301@gmail.com>
Message-ID: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>

Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.

On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:

> I am writing this email again because I didn't get any response on whether these bugs have been patched or whether the patches got lost somewhere on the mailing list. I don't know how to apply the patches myself, so I am counting on you guys to apply them and reply to me so I know it's fixed.
>
>
>
> Thanks
> Deepak Sheoran
>
>
> Hi
> In response to the bug fix suggested by Richard I have created some patches. We need to apply these to stop biojava from processing the references of a genbank record in the wrong manner, which causes more hibernate exceptions. After applying the patch, the reference resolution code will test the pubmed or medline id, then if no match test author/title/location, then if still no match create a new reference. I even tested it with GenbankRelease 175 and I gained almost 3159 more records in my database.
>
> Can somebody please have a look at the second issue and fix it
> "
> 2. I think that's a bug (compound locations with null features) but not sure why.
Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> "
>
> Also, I am planning on making a bridge between biosql databases loaded using bioperl and biojava. Here is some of my investigation; can you guys suggest some direction on it?
> Have a look at the attached files
> 1) Biojava_BioPerl_Diff.xls ==> a view of the tables where a genbank record is stored in a biosql instance by bioperl and by biojava
> 2) GenbankRecord.doc ==> a word document with a genbank record showing where its information goes in biosql using bioperl and biojava
> 3) BioSqlRichobjectBuilder.patch ==> patch needed for the BioSqlRichObjectBuilder.java class
> 4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class
>
>
> Thanks
> Deepak Sheoran
>
>
>
> -------- Original Message --------
> Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema
> Date: Tue, 9 Feb 2010 20:34:32 +1300
> From: Richard Holland
> To: Deepak Sheoran
> CC: biojava-l at biojava.org
>
> Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.
>
> However, in answer to your two questions:
>
> 1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
>
> 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
>
> cheers,
> Richard
>
> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
>
> > Hi Richard
> >
> > Below is the email which I sent to the Biojava-l mailing list, but it never got posted on the mailing list server, nor did I get any response. So please have a look at this email and tell me what the solution to the problem described in the message could be.
> >
> >
> > Thanks
> > Deepak Sheoran
> > -------- Original Message --------
> > Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Wed, 03 Feb 2010 08:07:35 -0600
> > From: Deepak Sheoran
> > To: biojava-l at lists.open-bio.org
> >
> > Hi guys,
> >
> > A couple of days back I was having a problem with a hibernate exception, but that exception got resolved; the reference to that email is:
> > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> >
> > Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another error with hibernate and decided to investigate the matter. Below are some facts and information which I found, and I guess it is going to affect all of us.
> >
> > * The "Reference" table in the bioSql schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > This works well, but there are cases where the values of the columns "location", "title" and "authors" vary slightly while all these variations refer to the same PUBMED_ID.
Then we can't persist or create a richsequence object.
> >
> > Now, when you tie RichObjectFactory to an active hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the details of the object in the database. If it finds one it will return that object, else it will try to persist the new object into the database.
> >
> > But the problem is with the part of that method below:
> >
> > ... LineNumber: 114
> > else if (SimpleDocRef.class.isAssignableFrom(clazz))
> > {   queryType = "DocRef";
> >     // convert List constructor to String representation for query
> >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> >     if (ourParamsList.size()<3) {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> >     } else {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> >     }
> > }
> > ... LineNumber: 123
> >
> > Now when hibernate searches the database, it won't find the existing record in the "reference" table because the two records differ in a string comparison, so it will return a new object back to "GenbankFormat", to the following piece of code:
> >
> > ... LineNumber: 447
> > else {
> >     try {
> >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> >     } catch (ChangeVetoException e) {
> >         throw new ParseException(e+", accession:"+accession);
> >     }
> > }
> > ... LineNumber: 455
> >
> > Then we add that object to rlistener and move on to the next part of the genbank record. When biojava then searches for a new crossref in the database, hibernate tries to persist the pending object and we get a hibernate exception regarding violation of the "unique constraint on dbxref_id" column.
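
The lookup order Richard suggested for this case (test the ID, then authors/location/title, then create a new reference) could be sketched roughly as below. This is only an illustration under stated assumptions: `StoredReference` and `ReferenceResolver` are hypothetical names, not BioJava classes, and an in-memory map stands in for the Hibernate queries that the real BioSqlRichObjectBuilder would issue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for one row of the "reference" table; not a BioJava class.
class StoredReference {
    final String pubmedId; // null when the record carries no PubMed/Medline ID
    final String authors;
    final String location;
    final String title;    // may be null

    StoredReference(String pubmedId, String authors, String location, String title) {
        this.pubmedId = pubmedId;
        this.authors = authors;
        this.location = location;
        this.title = title;
    }
}

// Hypothetical in-memory resolver; it only illustrates the order of the three steps.
class ReferenceResolver {
    private final Map<String, StoredReference> byPubmedId = new HashMap<String, StoredReference>();
    private final List<StoredReference> all = new ArrayList<StoredReference>();

    StoredReference resolve(String pubmedId, String authors, String location, String title) {
        // Step 1: an external ID, when present, identifies the paper no matter
        // how authors, location or title happen to be spelled.
        if (pubmedId != null) {
            StoredReference byId = byPubmedId.get(pubmedId);
            if (byId != null) {
                return byId;
            }
        }
        // Step 2: fall back to an exact match on authors/location/title.
        for (StoredReference ref : all) {
            // Step 1 found no ID match, so a stored row that has an ID while the
            // query has one too must belong to a different paper.
            if (pubmedId != null && ref.pubmedId != null) {
                continue;
            }
            if (Objects.equals(ref.authors, authors)
                    && Objects.equals(ref.location, location)
                    && Objects.equals(ref.title, title)) {
                return ref;
            }
        }
        // Step 3: nothing matched, so create and remember a new reference.
        StoredReference created = new StoredReference(pubmedId, authors, location, title);
        all.add(created);
        if (pubmedId != null) {
            byPubmedId.put(pubmedId, created);
        }
        return created;
    }

    int size() {
        return all.size();
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        // Two spellings of the same location that share one PubMed ID.
        StoredReference first = resolver.resolve("18554304", "Sato,T.",
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)", "t");
        StoredReference second = resolver.resolve("18554304", "Sato,T.",
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)", "t");
        System.out.println(first == second);
    }
}
```

With the ID tested first, both spellings of the FEMS Microbiol. Ecol. location resolve to a single stored reference, so no second row competing for the same dbxref_id would be created.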
> >
> > The only ways to get these records into the database are:
> > * The very easy solution, and the way I did it for testing my theory, is to change the bioSql schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change allows us to store that info too. But this is something the BioSql people have to decide and I don't know how to approach them.
> > * The second solution is slightly more difficult to implement: change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database or not. I mean testing all possible string variations of authors, location and title of the DocRef we are searching for. This has many complications and may slow down the process of creating a richsequence object when RichObjectFactory is linked with an active hibernate session.
> >
> > Example: below is a sample of what I have in my local biosql schema, which has the modification suggested by me. (The dbxref_id column shows the Pubmed_id: I replaced the local dbxref_id which was present in this table in my database with the pubmed_id stored in the "dbxref" table, for easy reference with the outside world in this email.)
> >
> > Reference_id: 216
> > Dbxref_id:    18554304
> > Location:     FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > crc:          9E940E01F4BE3CD0
> >
> > Reference_id: 230
> > Dbxref_id:    18554304
> > Location:     FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > crc:          D3BC0C17F3F786C9
> >
> > Reference_id: 415
> > Dbxref_id:    16790744
> > Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> > Title:        Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> > Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > crc:          60AEDFA0CEEACC38
> >
> > Reference_id: 969
> > Dbxref_id:    16790744
> > Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> > Title:        Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> > Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > crc:          4B1232999F6E8130
> >
> > Reference_id: 929
> > Dbxref_id:    8688087
> > Location:     Science 273 (5278), 1058-1073 (1996)
> > Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > crc:          3E79B40DD2AAA2B7
> >
> > Reference_id: 932
> > Dbxref_id:    8688087
> > Location:     Science 273 (5278), 1058-1073 (1996)
> > Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > crc:          094EB3384F8D6DE8
> >
> > Reference_id: 1426
> > Dbxref_id:    10684935
> > Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > Authors:      Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > crc:          357648D8FD8C6C8A
> >
> > Reference_id: 1481
> > Dbxref_id:    10684935
> > Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > Authors:      Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > crc:          115411EB2DEE5654
> >
> > Reference_id: 1497
> > Dbxref_id:    14689165
> > Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> > Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > crc:          4D5D376EECCD186B
> >
> > Reference_id: 1501
> > Dbxref_id:    14689165
> > Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> > Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > crc:          4D57954EECDED66B
> >
> > Reference_id: 1556
> > Dbxref_id:    18060065
> > Location:     PLoS ONE 2 (12), E1271 (2007)
> > Title:        Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > crc:          698688FB6DB95247
> >
> > Reference_id: 1559
> > Dbxref_id:    18060065
> > Location:     PLoS ONE 2 (12), E1271 (2007)
> > Title:        Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > crc:          E25E1BA99DB18F3D
> >
> > * The second kind of error which I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> >   * This means that in the richsequence object some feature has a location object whose feature is set to null.
> >   * My observations:
> >     * It usually occurs when you try to persist a richsequence object to the database, and it occurs for those features which have a CompoundRichLocation, usually "joins" and "complement" in the cds region of a genbank record.
> >     *
After catching the hibernate exception I went through all the features, and either biojava or hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
> >     * Below are screen shots of one of my tests:
> >       * Settings before trying to persist the richsequence object to the database: [screenshot]
> >       * After trying to persist the richsequence object to the database, inside the hibernate exception catch: [screenshot]
> >     * So my question is why this is happening, and how to stop it or how to get these records into the database. I have no clue why this is happening.
> >   * Some extra information to make things clearer to you guys.
> >     * Below are some LOCUS lines from genbank records for which I know the location of the error (I mean the cds region causing the error) and the array index in the richsequence.feature arrayList object.
> >       * LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
> >         richSequence.feature Index : 2540 and line number in the genbank record : 22115
> >       * LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
> >         richSequence.feature Index : 127 and line number in the genbank record : 2137
> >       * LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
> >         richSequence.feature Index : 389 and line number in the genbank record : 3632
> >       * LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
> >         richSequence.feature Index : 47 and line number in the genbank record : 4841
> >       * LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
> >         richSequence.feature Index : 45 and line number in the genbank record : 442
> >   * The complete exception msg :
> > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> >     at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
> >     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
> >     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
> >     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
> >     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
> >     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
> >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
> >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
> >     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
> >     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
> >     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
> >     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
> >     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
> >     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
> >     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
> >     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
> >     at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
> >     at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
> >     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
> >     at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
> >     at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
> >     at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
> >
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
> http://www.eaglegenomics.com/
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From andreas at sdsc.edu Thu Mar 25 12:47:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 25 Mar 2010 09:47:45 -0700
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
References: <4BAABA21.4000301@gmail.com>
 <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
Message-ID: <59a41c431003250947g6ecd11cbw21c5be5858b9aa09@mail.gmail.com>

Excellent, thanks Richard and Deepak!

Andreas

On Thu, Mar 25, 2010 at 9:27 AM, Richard Holland wrote:

> Patched and in subversion on the head in the new Biojava 3 code.
I modified the code slightly to simplify it. There were also parallel changes
> required over in SimpleDocRef itself to enable it to continue working without
> being connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
> > I am writing this email again because I didn't get any response on whether
> > these bugs have been patched or whether the patches got lost somewhere on
> > the mailing list. I am not sure, which is why I am writing again. I don't
> > know how to apply these patches, so I am counting on you guys to apply them
> > and reply to me so I know it's fixed.
> >
> > Thanks
> > Deepak Sheoran
> >
> > Hi
> > In response to the bug fix suggested by Richard I have created some patches.
> > We need to apply these to stop BioJava from processing the references of a
> > GenBank record in the wrong manner, which causes more Hibernate exceptions.
> > After applying the patches, the reference resolution code will test the
> > PubMed or MEDLINE ID, then if there is no match test author/title/location,
> > and if there is still no match create a new reference. I even tested it
> > with GenBank release 175 and I gained almost 3159 more records in my
> > database.
> >
> > Can somebody please have a look at the second issue and fix it?
> > "
> > 2. I think that's a bug (compound locations with null features) but not
> > sure why. Could be that the process of constructing a CompoundRichLocation
> > is somehow losing the feature reference from the original
> > SimpleRichLocation. Again I can't investigate until March - can someone else
> > take a look at the code? (A good starting point would be to look at how a
> > CompoundRichLocation decides to select the feature from the
> > SimpleRichLocations it is made up from).
> > "
> >
> > Also, I am planning on making a bridge between BioSQL databases loaded
> > using BioPerl and BioJava. Here is some of my investigation; can you guys
> > suggest some direction on it?
> > Have a look at the attached files:
> > 1) Biojava_BioPerl_Diff.xls ==> a view of the tables where a GenBank
> >    record is stored in a BioSQL instance by BioPerl and by BioJava
> > 2) GenbankRecord.doc ==> a Word document containing a GenBank record,
> >    showing where its information goes in BioSQL using BioPerl and BioJava
> > 3) BioSqlRichobjectBuilder.patch ==> patch needed for the
> >    BioSqlRichObjectBuilder.java class
> > 4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class
> >
> > Thanks
> > Deepak Sheoran
> >
> > -------- Original Message --------
> > Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Tue, 9 Feb 2010 20:34:32 +1300
> > From: Richard Holland
> > To: Deepak Sheoran
> > CC: biojava-l at biojava.org
> >
> > Hi. It's possible that your original email didn't make it to the list
> > because it is HTML format, and the list only accepts plain text.
> >
> > However, in answer to your two questions:
> >
> > 1. The code that does the resolution of references might be better if
> > it looks up existing IDs rather than using author, title, location to
> > identify existing records. I would suggest modifying it to a three-step
> > process - test ID, then if no match then test author/title/location, then if
> > still no match create a new reference. Could someone do that? (I'm unable to
> > do anything until late March).
> >
> > 2. I think that's a bug (compound locations with null features) but not
> > sure why. Could be that the process of constructing a CompoundRichLocation
> > is somehow losing the feature reference from the original
> > SimpleRichLocation. Again I can't investigate until March - can someone else
> > take a look at the code? (A good starting point would be to look at how a
> > CompoundRichLocation decides to select the feature from the
> > SimpleRichLocations it is made up from).
> > > > cheers, > > Richard > > > > On 9 Feb 2010, at 20:21, Deepak Sheoran wrote: > > > > > > > > Hi Richard > > > > > > Below is the email which I sent to Biojava-1 mailing list but it never > get posted on the mailing list server neither do i got any response, so > please have a look on this email and tell what can be the solution of the > problem described in the message. > > > > > > > > > Thanks > > > Deepak Sheoran > > > -------- Original Message -------- > > > Subject: Hibernate Exception and suggestion for change in > BioSqlSchema > > > Date: Wed, 03 Feb 2010 08:07:35 -0600 > > > From: Deepak Sheoran > > > > > > > To: > > biojava-l at lists.open-bio.org > > > > > > > > Hi guys, > > > > > > A couple of days back I was having some problem with hibernate > exception but that exception got resolved and the reference to that email > is: > > > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html > > > > > On Richard suggestion in above link I am able to resolve some of > issues but then, I got stuck in to some other error with hibernate and then > decided to investigate the matter and below are some facts and information > which I found and I guess it is going to affect all of us. > > > ? The "Reference" table in bioSql schema have unique constraint on > "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)). > Which mean only one entry in reference table can use on dbxref_id. > > > This Works wells but in cases when you have little variation in value > of following column "location", "title", "authors" and all these variation > refers to same PUBMED_ID. Then we can't persist or create a richsequence > object . 
> > > Now when you tie RichObjectFactory to a active hibernate session then > the class "BioSqlRichObjectBuilder" have method called "buildObject(Class > clazz, List paramsList) " which is responsible for looking up details of > object in the database and if it find one then it will return that object, > else it will try to persist the new object into the database. > > > But problem is with below part of that method: > > > ?..LineNumber: 114 > > > else if (SimpleDocRef.class.isAssignableFrom(clazz)) > > > { queryType = "DocRef"; > > > // convert List constructor to String representation > for query > > > ourParamsList.set(0, > DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true)); > > > if (ourParamsList.size()<3) { > > > queryText = "from DocRef as cr where cr.authors > = ? and cr.location = ? and cr.title is null"; > > > } else { > > > queryText = "from DocRef as cr where cr.authors > = ? and cr.location = ? and cr.title = ?"; > > > } > > > } > > > ..LineNubmer: 123 > > > Now when hibernate search the database, it won't find any other record > in "reference" table because those two record are different in string > comparison, so it will return a new object back to "GenbankFormat" to > following piece of code > > > ?.LineNumber: 447 > > > else { > > > try { > > > CrossRef cr = > (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new > Object[]{dbname, raccession, new Integer(0)}); > > > RankedCrossRef rcr = new > SimpleRankedCrossRef(cr, ++rcrossrefCount); > > > > rlistener.getCurrentFeature().addRankedCrossRef(rcr); > > > } catch (ChangeVetoException e) > { > > > throw new > ParseException(e+", accession:"+accession); > > > } > > > } > > > ?..LineNumber:455 > > > Then we will add that object to rlistener. And move to next part of > genbank record and then biojava search for a new crossref in database and it > will try to persist the old one it get a hibernate exception regarding > violation of "unique constraint on dbxref_id" column. 
> > > > > > The only way to get these record in database is: > > > ? The very easy solution and the way I did it for testing > my theory is Change the bioSql schema so that it can allow many to one on > relation between "reference" and "dbxref" table. Which even make sense > because one paper can have many different variation of naming, and this > change allow us to store that info too. But this is something BioSql people > have decide and I don't know how to approach them. > > > ? Second solution is slightly difficult to implement, is to > change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List > paramsList)" make decision about weather a particular DocRef already exist > in database or not. I am mean testing all possible string variations of > authors, location, title of the docRef which we are searching. Which does > have many complications and may slow down process of creating a richsequence > object when link RichObjectFactory with a active hibernate session. > > > > > > Example:Below is a sample of what i have in my local biosql schema > which has modification suggested by me. (dbxref_id column have Pubmed_id , I > replaced the local dbxref_id which was present on this table in my database > with pubmed_id stored in "dbxref" table, for easy reference with outside > world in this email) > > > Reference_id > > > Dbxref_id > > > Location > > > Title > > > Authors > > > crc > > > 216 > > > 18554304 > > > FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 > (2008) > > > Isolation of lactate-utilizing butyrate-producing bacteria from human > feces and in vivo administration of Anaerostipes caccae strain L2 and > galacto-oligosaccharides in a rat model > > > Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., > Nomoto,K., Ito,M. and Sawada,H. > > > 9E940E01F4BE3CD0 > > > 230 > > > 18554304 > > > FEMS Microbiol. Ecol. 
66 (3), 528-536 (2008) > > > Isolation of lactate-utilizing butyrate-producing bacteria from human > feces and in vivo administration of Anaerostipes caccae strain L2 and > galacto-oligosaccharides in a rat model > > > Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., > Nomoto,K., Ito,M. and Sawada,H. > > > D3BC0C17F3F786C9 > > > 415 > > > 16790744 > > > Infect. Immun. 74 (7), 3715-3726 (2006) > > > Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is > Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via > Recombination with Repetitive Chromosomal Sequences > > > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and > Totten,P.A. > > > 60AEDFA0CEEACC38 > > > 969 > > > 16790744 > > > Infect. Immun. 74 (7), 3715-3726 (2006) > > > Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is > extensive in vitro and in vivo and suggests that variation is generated via > recombination with repetitive chromosomal sequences > > > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and > Totten,P.A. > > > 4B1232999F6E8130 > > > 929 > > > 8688087 > > > Science 273 (5278), 1058-1073 (1996) > > > Complete genome sequence of the methanogenic archaeon, Methanococcus > jannaschii > > > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., > Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., > Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., > Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., > Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., > Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., > Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., > Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and > Venter,J.C. 
> > > 3E79B40DD2AAA2B7 > > > 932 > > > 8688087 > > > Science 273 (5278), 1058-1073 (1996) > > > Complete genome sequence of the methanogenic archaeon, Methanococcus > jannaschii > > > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., > Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., > Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., > Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., > Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., > Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., > Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., > Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. > > > 094EB3384F8D6DE8 > > > 1426 > > > 10684935 > > > Nucleic Acids Res. 28 (6), 1397-1406 (2000) > > > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae > AR39 > > > Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., > Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., > Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., > Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and > Fraser,C.M. > > > 357648D8FD8C6C8A > > > 1481 > > > 10684935 > > > Nucleic Acids Res. 28 (6), 1397-1406 (2000) > > > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae > AR39 > > > Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., > Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., > Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., > DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. > > > 115411EB2DEE5654 > > > 1497 > > > 14689165 > > > Arch. Microbiol. 
181 (2), 144-154 (2004) > > > The effect of FITA mutations on the symbiotic properties of > Sinorhizobium fredii varies in a chromosomal-background-dependent manner > > > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., > del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. > and Ruiz-Sainz,J.E. > > > 4D5D376EECCD186B > > > 1501 > > > 14689165 > > > Arch. Microbiol. 181 (2), 144-154 (2004) > > > The effect of FITA mutations on the symbiotic properties of > Sinorhizobium fredii varies in a chromosomal-background-dependent manner > > > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., > Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. > and Ruiz-Sainz,J.E. > > > 4D57954EECDED66B > > > 1556 > > > 18060065 > > > PLoS ONE 2 (12), E1271 (2007) > > > Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 > and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids > > > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., > Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. > > > 698688FB6DB95247 > > > 1559 > > > 18060065 > > > PLoS ONE 2 (12), E1271 (2007) > > > Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 > and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids > > > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., > Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. > > > E25E1BA99DB18F3D > > > > > > ? The second kind of error which I got was : > org.hibernate.PropertyValueException: not-null property references a null or > transient value: Location.feature > > > ? Which means in richsequence object some feature have > location object which have its feature set to null. > > > ? My Observation: > > > ? 
This usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually "join" and "complement" locations in the CDS region of a GenBank record.
> > > - After catching the Hibernate exception I went through all the features, and either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
> > > - Below is the screenshot of one of my tests:
> > > - Settings before trying to persist the RichSequence object to the database
> > > - After trying to persist the RichSequence object to the database, inside the Hibernate exception catch block
> > >
> > > - So my question is: why is this happening, and how can I stop it or get these records into the database? I have no clue why this is happening.
> > > - Some extra information to make things clearer:
> > > - Below are the LOCUS lines of some GenBank records for which I know the location of the error (the CDS region causing it), together with the array index in the richSequence.feature ArrayList object.
> > > - LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
> > > - richSequence.feature index: 2540, line number in the GenBank record: 22115
> > > - LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
> > > - richSequence.feature index: 127, line number in the GenBank record: 2137
> > > - LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
> > > - richSequence.feature index: 389, line number in the GenBank record: 3632
> > > - LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
> > > - richSequence.feature index: 47, line number in the GenBank record: 4841
> > > - LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
> > > - richSequence.feature index: 45, line number in the GenBank record: 442
> > > -
The complete exception msg : > > > org.hibernate.PropertyValueException: not-null property references a > null or transient value: Location.feature > > > at > org.hibernate.engine.Nullability.checkNullability(Nullability.java:72) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > > at > org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > > at > org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > > > at > org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > > > at > org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > > > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at > org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > > > at > org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > > > at > 
org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > > at > org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > > at > org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > > > at > org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > > > at > org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > > > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at > org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > > > at > org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > > > at > org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > > > at > 
org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > > at > org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > > at > org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > > at > org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > > at > org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535) > > > at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523) > > > at > trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78) > > > > > > > > > > -- > > Richard Holland, BSc MBCS > > Operations and Delivery Director, Eagle Genomics Ltd > > T: +44 (0)1223 654481 ext 3 | E: > > holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > > > > > > > > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > From andreas at sdsc.edu Thu Mar 25 12:56:21 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 25 Mar 2010 09:56:21 -0700 Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9 In-Reply-To: <4BAB70D6.5060309@uni-tuebingen.de> References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com> 
	<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
	<59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>
	<3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
	<59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
	<4BAB70D6.5060309@uni-tuebingen.de>
Message-ID: <59a41c431003250956h14abdbe2t1367bec10069d1f3@mail.gmail.com>

Hi Andreas,

that sounds great! I'll take a look at this soon...

Thanks,
Andreas

On Thu, Mar 25, 2010 at 7:19 AM, Andreas Dräger <andreas.draeger at uni-tuebingen.de> wrote:

> Hi Andreas and Shakuntala,
>
> The alignment classes have just been revised and can now be updated from
> the repository. As a major improvement, the alignment result has become much
> easier to use. So, if you're interested in computing something based on the
> score, you can now simply apply the dedicated get method and don't have to
> care about parsing anymore. I hope that helps.
>
> Cheers
> Andreas
>
> --
> Dipl.-Bioinform. Andreas Dräger
> Eberhard Karls University Tübingen
> Center for Bioinformatics (ZBIT)
> Sand 1
> 72076 Tübingen
> Germany
>
> Phone: +49-7071-29-70436
> Fax: +49-7071-29-5091
>

From zhangyiwei79 at gmail.com  Thu Mar 25 16:14:50 2010
From: zhangyiwei79 at gmail.com (Yiwei Zhang)
Date: Thu, 25 Mar 2010 16:14:50 -0400
Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment project of Google Summer of Code
Message-ID: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>

Hi,

I am a graduate student of computer science and my field of study is related
to bioinformatics algorithms. I am proficient in Java programming. I am
very interested in this project because I am currently working on sequence
alignment and phylogenetic tree reconstruction.

My question is: if the project requires implementing the existing
alignment algorithms of current tools, what is the original implementation
language of those tools? C++ or C or something else?

Thanks!
From biopython at maubp.freeserve.co.uk  Thu Mar 25 18:16:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Mar 2010 22:16:55 +0000
Subject: [Biojava-l] [Biojava-dev] Bug fix for Biojava in regard to email with subject : ( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4BABAFA1.6090806@orionbiosciences.com>
References: <4BAABA21.4000301@gmail.com>
	<4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>
	<4BABAFA1.6090806@orionbiosciences.com>
Message-ID: <320fb6e01003251516w2977ab2h9869342f94576287@mail.gmail.com>

On Thu, Mar 25, 2010 at 6:46 PM, Deepak Sheoran wrote:
>
> That is the reason why I was getting an error when I was creating a
> RichSequence object without any active session to BioSQL. I didn't have a
> clue that I had created one more bug by fixing one; thanks for noticing
> that and fixing it.
>
> I am thinking we should propose BioPerl, BioJava and BioSQL compatibility as
> one of the Google Summer of Code projects. I have a vision for this, but
> don't know the right way to begin. This can help people who want to use
> BioJava but can't because they are afraid to lose their Perl code, which is
> heavily dependent on the Perl way of loading the schema. Or we could come up
> with a hybrid way which takes the good from both languages.
>
> Deepak Sheoran

That is an interesting idea for GSoC, I wonder if we at Biopython
should do the same. I know of a few things where we differ from
BioPerl's BioSQL support (e.g. SwissProt comment lines).

[I take it we agree that bioperl-db is the de facto reference
implementation for mapping GenBank etc into BioSQL?]
Peter From chapman at cs.wisc.edu Fri Mar 26 03:14:24 2010 From: chapman at cs.wisc.edu (Mark Chapman) Date: Fri, 26 Mar 2010 02:14:24 -0500 Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment project of Google Summer of Code In-Reply-To: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com> References: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com> Message-ID: <4BAC5ED0.1050009@cs.wisc.edu> Hi Yiwei (and list members), I am also a graduate student in Bioinformatics interested in the Google Summer of Code project. The authors' current implementations of ClustalW and ClustalX are written in C++. Binaries, code, and references are located at http://www.clustal.org/ . Download the boldfaced references (Larkin et al 2007 and Thompson et al 1994) for the most relevant information. Take care, Mark On 3/25/2010 3:14 PM, Yiwei Zhang wrote: > Hi, > > I am a graduate student of computer science and my field of study is related > to Bioinformatic algorithms. I am proficient at JAVA programming. I feel > very interested in this project because currently I am working on sequence > alignment and phylogeny tree reconstruction. > > My question is that, if the project requires implementing the existing > alignment algorithms of current tools, what is the > original implementation language of the tools? C++ or C or something else? > > Thanks! 
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From bernd.jagla at pasteur.fr  Fri Mar 26 05:33:05 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Fri, 26 Mar 2010 10:33:05 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
References: <4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
	<59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>
Message-ID: <776506315DB04C3EBF2A7FDA610390AB@zillumina>

Hi,

I am trying to check out BioJava for the first time, and I am not sure if
the server is still down... Could you please let me know whether it is up or
down?

Thanks,

Bernd

> -----Original Message-----
> From: biojava-l-bounces at lists.open-bio.org
> [mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Andreas Prlic
> Sent: Wednesday, March 17, 2010 6:40 PM
> To: Richard Finkers
> Cc: biojava-l at lists.open-bio.org
> Subject: Re: [Biojava-l] SVN repository
>
> I have just heard back from the OBF helpdesk. The VM hosting the anonymous
> SVN is currently down. Depending on how big the problem turns out to be, it
> will be back at some point later today, or tomorrow at the latest.
>
> Sorry for this inconvenience.
> Andreas
>
> On Wed, Mar 17, 2010 at 3:16 AM, Peter wrote:
>
> > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers wrote:
> > >
> > > Hi,
> > >
> > > I would like to have a look at the BioJava 3 code (and perhaps
> > > contribute to it in the future). However, I cannot access the SVN
> > > repository
> > > (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
> > >
> > > Is the repository down?
> > >
> > > Thanks,
> > > Richard
> >
> > Probably :(
> >
> > There have been problems discussed on the BioPerl mailing list
> > (they use the same servers), and the OBF team are aware of it:
> > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html
> >
> > The code.open-bio.org repositories are a read-only public mirror,
> > while dev.open-bio.org, the master repository, I think is fine
> > (but not available for anonymous download).
> >
> > In the meantime BioPerl have also set up a read-only mirror
> > on github - perhaps BioJava could do the same? Meanwhile
> > BioRuby and Biopython are just using github (not SVN or CVS).
> >
> > Peter

From mitlox at op.pl  Fri Mar 26 05:57:41 2010
From: mitlox at op.pl (xyz)
Date: Fri, 26 Mar 2010 19:57:41 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To:
References: <20100320201718.4420a9b9@wp01>
	<20100325232337.3021200a@wp01>
Message-ID: <20100326195741.4799c398@wp01>

@Andy:
Thank you for the explanation. There is no newline character after the last
sequence in the input file.

@James:
I changed the code in order to get the longest sequence first, but the last
sequence is missing.
import java.io.*;
import java.util.*;
import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;
import java.util.Comparator;

public class SortFasta2 {

    // Orders sequences from longest to shortest.
    static private class RichSequenceComparator implements Comparator<RichSequence> {
        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq2.length() - seq1.length();
        }
    }

    // Usage: SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException, BioException {
        String fastaFile = "sortFasta.fasta";
        BufferedReader br = new BufferedReader(new FileReader(fastaFile));
        SimpleNamespace ns = new SimpleNamespace("biojava");
        Alphabet dna = AlphabetManager.alphabetForName("DNA");
        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, dna.getTokenization("token"), ns);
        SortedSet<RichSequence> sorted = new TreeSet<RichSequence>(new SortFasta2.RichSequenceComparator());
        while (rsi.hasNext()) {
            sorted.add(rsi.nextRichSequence());
        }
        Iterator<RichSequence> sortedIt = sorted.iterator();
        /* Do whatever you want here with the RichSequences, ordered by
           descending length; I'll just print them. */
        while (sortedIt.hasNext()) {
            //System.out.println(((RichSequence) sortedIt.next()).length());
            //System.out.println(sortedIt.next().getComments());
            System.out.println(sortedIt.next().seqString());
        }
    }
}

Input file:

>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttcccccccccccccccccccccca

Output on the screen:

tttttttccccccccccccccccccccccc
atccccccccccccccccctttt
atccccctttttt
atccccc

How can I get the last sequence as well, and print the output in FASTA format on the screen?

Thank you in advance.

On Thu, 25 Mar 2010 10:17:31 -0400
James Swetnam wrote:

> Just replace the System.out.println with whatever you want to do with
> the sequences; write them to a file, etc.
>
> James
>

On Fri, 26 Mar 2010 09:40:28 +0000
"Andy Law (RI)" wrote:

> Does your input file have a line feed at the end or not?
(Just a > thought) > > Comparable is for comparing two objects using their "natural" > ordering and is therefore a "fundamental" property of the class. A > Comparator lets you compare/sort two objects on any characteristics > and you can have many different comparators. Since this is a somewhat > arbitrary way of comparing sequences (you could sort them on > alphabetical sequence for example, or GC content), I guess that's why > James used a comparator. > From richard.finkers at wur.nl Fri Mar 26 06:10:39 2010 From: richard.finkers at wur.nl (Finkers, Richard) Date: Fri, 26 Mar 2010 11:10:39 +0100 Subject: [Biojava-l] SVN repository References: <4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> <776506315DB04C3EBF2A7FDA610390AB@zillumina> Message-ID: <33AFFE3255BCA043AF09514A6F6BFBAED04C0D@scomp0039.wurnet.nl> Hi Bernd, It has been working for two days but it seems to be down again. Richard -----Original Message----- From: Bernd Jagla [mailto:bernd.jagla at pasteur.fr] Sent: Fri 2010-03-26 10:33 To: 'Andreas Prlic'; Finkers, Richard Cc: biojava-l at lists.open-bio.org Subject: RE: [Biojava-l] SVN repository Hi, I am trying to check out biojava for the first time, and I am not sure if the server is still down... Could you please let me if it is up or down? Thanks, Bernd > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- > bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Wednesday, March 17, 2010 6:40 PM > To: Richard Finkers > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SVN repository > > I have just heard back from the OBF-helpdesk. The VM hosting the anonymous > SVN is currently down. Depending on how big the problem turns out to be, > it > will be back at some point later today / should be back latest tomorrow. > > Sorry for this inconvenience. 
> Andreas > > > > > On Wed, Mar 17, 2010 at 3:16 AM, Peter > wrote: > > > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers > > > wrote: > > > > > > Hi, > > > > > > I would like to have a look at the BioJava 3 code (and perhaps in the > > future > > > contribute to). However, I cannot access the SVN repository > > > ( > > http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava- > live/trunk > > ). > > > > > > Is the repository down? > > > > > > Thanks, > > > Richard > > > > Probably :( > > > > There have been problems discussed on the BioPerl mailing list > > (they use the same servers), and the OBF team are aware of it: > > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html > > > > The code.open-bio.org repositories are a read only public mirror, > > while dev.open-bio.org is the master repository I think is fine > > (but not available for anonymous download). > > > > In the mean time BioPerl have also setup a read only mirror > > on github - perhaps BioJava could do the same? Meanwhile > > BioRuby and Biopython are just using github (not SVN or CVS). > > > > Peter > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andy.law at roslin.ed.ac.uk Fri Mar 26 06:12:11 2010 From: andy.law at roslin.ed.ac.uk (Andy Law (RI)) Date: Fri, 26 Mar 2010 10:12:11 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <20100326195741.4799c398@wp01> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> Message-ID: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> On 26 Mar 2010, at 09:57, xyz wrote: > @Andy: Thank you for the explanation. After the last sequence in the > input file in no newline character. 
> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? Later, Andy -------- Yada, yada, yada... The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. From andy.law at roslin.ed.ac.uk Fri Mar 26 06:36:25 2010 From: andy.law at roslin.ed.ac.uk (Andy Law (RI)) Date: Fri, 26 Mar 2010 10:36:25 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> Message-ID: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> On 26 Mar 2010, at 10:28, Richard Holland wrote: > That there be a bug. Albeit one with a simple workaround while the SVN server is broken :o} > > On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: > >> >> On 26 Mar 2010, at 09:57, xyz wrote: >> >>> @Andy: Thank you for the explanation. After the last sequence in the >>> input file in no newline character. >>> >> >> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are >> not seeing the last sequence when the file is not terminated with a >> newline character. Is this a bug or a feature, folks? >> >> Later, >> >> Andy >> -------- >> Yada, yada, yada... >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336 >> Disclaimer: This e-mail and any attachments are confidential and >> intended solely for the use of the recipient(s) to whom they are >> addressed. 
If you have received it in error, please destroy all >> copies and inform the sender. >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > Later, Andy -------- Yada, yada, yada... The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. From holland at eaglegenomics.com Fri Mar 26 06:28:19 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 26 Mar 2010 10:28:19 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> Message-ID: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> That there be a bug. On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: > > On 26 Mar 2010, at 09:57, xyz wrote: > >> @Andy: Thank you for the explanation. After the last sequence in the >> input file in no newline character. >> > > Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? > > Later, > > Andy > -------- > Yada, yada, yada... 
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Mar 26 06:41:21 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 26 Mar 2010 10:41:21 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> Message-ID: Do you have a fix? I can't remember if you've got SVN access or not - if you do, please do commit it, otherwise email me a patch and I'll commit it for you. On 26 Mar 2010, at 10:36, Andy Law (RI) wrote: > > On 26 Mar 2010, at 10:28, Richard Holland wrote: > >> That there be a bug. > > Albeit one with a simple workaround while the SVN server is broken :o} > >> >> On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: >> >>> >>> On 26 Mar 2010, at 09:57, xyz wrote: >>> >>>> @Andy: Thank you for the explanation. After the last sequence in the >>>> input file in no newline character. >>>> >>> >>> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? 
>>> >>> Later, >>> >>> Andy >>> -------- >>> Yada, yada, yada... >>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 >>> Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > Later, > > Andy > -------- > Yada, yada, yada... > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Mar 26 07:04:22 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 26 Mar 2010 11:04:22 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> Message-ID: I can't see anything in the code that would cause that behaviour. 
:( Could you provide sample code and a supporting FASTA file that replicates the problem? On 26 Mar 2010, at 10:36, Andy Law (RI) wrote: > > On 26 Mar 2010, at 10:28, Richard Holland wrote: > >> That there be a bug. > > Albeit one with a simple workaround while the SVN server is broken :o} > >> >> On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: >> >>> >>> On 26 Mar 2010, at 09:57, xyz wrote: >>> >>>> @Andy: Thank you for the explanation. After the last sequence in the >>>> input file in no newline character. >>>> >>> >>> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? >>> >>> Later, >>> >>> Andy >>> -------- >>> Yada, yada, yada... >>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 >>> Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > Later, > > Andy > -------- > Yada, yada, yada... > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. 
> > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From Richard.Finkers at wur.nl Fri Mar 26 12:27:59 2010 From: Richard.Finkers at wur.nl (Richard Finkers) Date: Fri, 26 Mar 2010 17:27:59 +0100 Subject: [Biojava-l] SVN repository In-Reply-To: <776506315DB04C3EBF2A7FDA610390AB@zillumina> References: <4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> <776506315DB04C3EBF2A7FDA610390AB@zillumina> Message-ID: <4BACE08F.8020604@wur.nl> The repository has been back for two days. But it appears to be down again. Richard Bernd Jagla wrote: > Hi, > > I am trying to check out biojava for the first time, and I am not sure if > the server is still down... Could you please let me if it is up or down? > > Thanks, > > Bernd > > >> -----Original Message----- >> From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- >> bounces at lists.open-bio.org] On Behalf Of Andreas Prlic >> Sent: Wednesday, March 17, 2010 6:40 PM >> To: Richard Finkers >> Cc: biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] SVN repository >> >> I have just heard back from the OBF-helpdesk. The VM hosting the anonymous >> SVN is currently down. Depending on how big the problem turns out to be, >> it >> will be back at some point later today / should be back latest tomorrow. >> >> Sorry for this inconvenience. >> Andreas >> >> >> >> >> On Wed, Mar 17, 2010 at 3:16 AM, Peter >> wrote: >> >> >>> On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers >>> >> >> >>> wrote: >>> >>>> Hi, >>>> >>>> I would like to have a look at the BioJava 3 code (and perhaps in the >>>> >>> future >>> >>>> contribute to). However, I cannot access the SVN repository >>>> ( >>>> >>> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava- >>> >> live/trunk >> >>> ). 
>>> >>>> Is the repository down? >>>> >>>> Thanks, >>>> Richard >>>> >>> Probably :( >>> >>> There have been problems discussed on the BioPerl mailing list >>> (they use the same servers), and the OBF team are aware of it: >>> http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html >>> >>> The code.open-bio.org repositories are a read only public mirror, >>> while dev.open-bio.org is the master repository I think is fine >>> (but not available for anonymous download). >>> >>> In the mean time BioPerl have also setup a read only mirror >>> on github - perhaps BioJava could do the same? Meanwhile >>> BioRuby and Biopython are just using github (not SVN or CVS). >>> >>> Peter >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- Dr. Richard Finkers Researcher Plant Breeding Wageningen UR Plant Breeding P.O. Box 16, 6700 AA, Wageningen, The Netherlands Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands Tel. +31-317-484165 Fax +31-317-418094 http://www.plantbreeding.wur.nl/ https://www.eu-sol.wur.nl/ https://cbsgdbase.wur.nl/ http://www.disclaimer-uk.wur.nl/ From mitlox at op.pl Fri Mar 26 21:49:46 2010 From: mitlox at op.pl (xyz) Date: Sat, 27 Mar 2010 11:49:46 +1000 Subject: [Biojava-l] Reading and writting Fastq files Message-ID: <20100327114946.276925da@wp01> Hello, I could not find any examples how to read or write fastq files. 
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import org.biojava.bio.program.fastq.FastqReader;

public class Fastq2Fasta {
    public static void main(String[] args) throws FileNotFoundException {
        BufferedReader br = new BufferedReader(new FileReader("fastq2fasta.fasta"));
    }
}

Are there any examples of how to work with FASTQ files?

Thank you in advance.

Best regards,

From holland at eaglegenomics.com Sat Mar 27 04:18:04 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Sat, 27 Mar 2010 08:18:04 +0000
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100327100348.1f253bfb@wp01>
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01>
 <20100326195741.4799c398@wp01>
 <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
 <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
 <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
 <20100327100348.1f253bfb@wp01>
Message-ID: <2AC8333D-EE71-495E-9C12-98764D81FE2D@eaglegenomics.com>

Andy and I came to the conclusion yesterday that this is probably a bug in Java itself, somewhere in the readLine() method of BufferedReader. There's nothing in BioJava that could cause this kind of behaviour, unless it was being fed bad information by BufferedReader.

On 27 Mar 2010, at 00:03, xyz wrote:

> Please find the input FASTA file attached. I created this file under
> Linux, and I also run BioJava under Linux. Nothing changes if I
> add a newline after the last sequence.
>
> On Fri, 26 Mar 2010 11:04:22 +0000
> Richard Holland wrote:
>
>> I can't see anything in the code that would cause that
>> behaviour. :( Could you provide sample code and a supporting FASTA
>> file that replicates the problem?
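One explanation that the thread does not consider, offered here as an assumption based only on the posted SortFasta2 code and not as a confirmed diagnosis: the comparator orders sequences by length alone, so two sequences of equal length compare as 0, and a TreeSet treats elements that compare as 0 as duplicates and keeps only the first one. Sequences 4 and 5 in the posted input appear to have the same length, which would account for the missing record whether or not the file ends with a newline. The sketch below uses plain strings in place of RichSequence objects; LengthSortSketch, BY_LENGTH_DESC, sortWithTreeSet and sortWithList are illustrative names, not BioJava API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class LengthSortSketch {

    /** Orders strings by descending length only; equal lengths compare as 0. */
    static final Comparator<String> BY_LENGTH_DESC = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return Integer.compare(b.length(), a.length());
        }
    };

    /** Mirrors the posted approach: a TreeSet drops any element that compares as 0 to an existing one. */
    static List<String> sortWithTreeSet(List<String> seqs) {
        SortedSet<String> sorted = new TreeSet<String>(BY_LENGTH_DESC);
        sorted.addAll(seqs);
        return new ArrayList<String>(sorted);
    }

    /** Sorting a list keeps every element; the sort is stable, so ties keep their input order. */
    static List<String> sortWithList(List<String> seqs) {
        List<String> sorted = new ArrayList<String>(seqs);
        Collections.sort(sorted, BY_LENGTH_DESC);
        return sorted;
    }

    public static void main(String[] args) {
        // The last two strings have the same length, like records 4 and 5 appear to.
        List<String> seqs = Arrays.asList("atccccc", "ttttcccc", "ttttccca");
        System.out.println(sortWithTreeSet(seqs).size()); // one equal-length string is lost
        System.out.println(sortWithList(seqs).size());    // all strings are kept
    }
}
```

If the TreeSet is kept, adding a tie-breaker to the comparator (for example, comparing names or sequence strings when the lengths are equal) would be another way to avoid losing records.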
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From mitlox at op.pl Sat Mar 27 05:48:14 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 19:48:14 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To:
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01>
 <20100326195741.4799c398@wp01>
 <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
 <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
 <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
Message-ID: <20100327194814.1acc8655@wp01>

You can find the input FASTA file here: http://mitlox.republika.pl/sortFasta.fasta
I created this file under Linux, and I also run BioJava under Linux. Nothing changes if I add a newline after the last sequence.

On Fri, 26 Mar 2010 11:04:22 +0000
Richard Holland wrote:

> I can't see anything in the code that would cause that
> behaviour. :( Could you provide sample code and a supporting FASTA
> file that replicates the problem?
>

From voisingreg at yahoo.fr Sat Mar 27 07:24:01 2010
From: voisingreg at yahoo.fr (gregory voisin)
Date: Sat, 27 Mar 2010 11:24:01 +0000 (GMT)
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <20100327194814.1acc8655@wp01>
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01>
 <20100326195741.4799c398@wp01>
 <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk>
 <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com>
 <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
 <20100327194814.1acc8655@wp01>
Message-ID: <832231.74869.qm@web23207.mail.ird.yahoo.com>

Hi,
How do I unsubscribe from this list?
Thanks,
greg

________________________________
From: xyz
To: Richard Holland
Cc: Andy Law (RI); "biojava-l at lists.open-bio.org"
Sent: Sat 27 March 2010, 6:48:14
Subject: Re: [Biojava-l] sort fasta file

You can find the input FASTA file here: http://mitlox.republika.pl/sortFasta.fasta
I created this file under Linux, and I also run BioJava under Linux. Nothing changes if I add a newline after the last sequence.

On Fri, 26 Mar 2010 11:04:22 +0000
Richard Holland wrote:

> I can't see anything in the code that would cause that
> behaviour. :( Could you provide sample code and a supporting FASTA
> file that replicates the problem?
>
_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From mitlox at op.pl Sat Mar 27 09:54:40 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 23:54:40 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
References: <20100327114946.276925da@wp01>
 <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
Message-ID: <20100327235440.23cffb47@wp01>

Hello,
I would like to use org.biojava.bio.program.fastq to read and write Illumina FASTQ files.

Are there any BioJava examples of how to work with FASTQ files?

On Sat, 27 Mar 2010 17:40:21 +0530
jitesh dundas wrote:

> Hello,
>
> Fasta files are normal text files. Try parsing using normal text
> parsing methods.
>
> If you could be more specific & tell me the format details, then I
> could help better.
>
> btw, try using biojava, the easy & better option if you want.
>
> Regards,
> Jitesh Dundas
>
> On 3/27/10, xyz wrote:
> > Hello,
> > I could not find any examples how to read or write fastq files.
> >
> > import java.io.BufferedReader;
> > import java.io.FileNotFoundException;
> > import java.io.FileReader;
> > import org.biojava.bio.program.fastq.FastqReader;
> >
> > public class Fastq2Fasta {
> >     public static void main(String[] args) throws FileNotFoundException {
> >         BufferedReader br = new BufferedReader(new FileReader("fastq2fasta.fasta"));
> >     }
> > }
> >
> > Are there any examples how to work with fastq files?
> >
> > Thank you in advance.
> >
> > Best regards,
> > _______________________________________________
> > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
> >

From heuermh at acm.org Sun Mar 28 00:27:16 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100327235440.23cffb47@wp01>
Message-ID:

Sorry, I haven't written up an example for the BioJava Cookbook yet.

The FASTQ package javadoc API is at

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html

If you want to read Illumina format FASTQ files, use

FastqReader reader = new IlluminaFastqReader();
for (Fastq fastq : reader.read(new File("in.fastq")))
{
    // ...
}

   michael

On Sat, 27 Mar 2010, xyz wrote:

> Hello,
> I would like to use org.biojava.bio.program.fastq in order to read and
> write Illumina fastq files.
>
> Are there any BioJava examples how to work with fastq files?
>
> On Sat, 27 Mar 2010 17:40:21 +0530
> jitesh dundas wrote:
>
> > Hello,
> >
> > Fasta files are normal text files. Try parsing using normal text
> > parsing methods.
> >
> > If you could be more specific & tell me the format details, then I
> > could help better.
> >
> > btw, try using biojava, the easy & better option if you want.
> >
> > Regards,
> > Jitesh Dundas
> >
> > On 3/27/10, xyz wrote:
> > > Hello,
> > > I could not find any examples how to read or write fastq files.
> > >
> > > import java.io.BufferedReader;
> > > import java.io.FileNotFoundException;
> > > import java.io.FileReader;
> > > import org.biojava.bio.program.fastq.FastqReader;
> > >
> > > public class Fastq2Fasta {
> > >     public static void main(String[] args) throws FileNotFoundException {
> > >         BufferedReader br = new BufferedReader(new FileReader("fastq2fasta.fasta"));
> > >     }
> > > }
> > >
> > > Are there any examples how to work with fastq files?
> > >
> > > Thank you in advance.
> > >
> > > Best regards,
> > > _______________________________________________
> > > Biojava-l mailing list - Biojava-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/biojava-l
> > >
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From mitlox at op.pl Sun Mar 28 01:44:57 2010
From: mitlox at op.pl (xyz)
Date: Sun, 28 Mar 2010 15:44:57 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
References: <20100327114946.276925da@wp01>
 <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
 <20100327235440.23cffb47@wp01>
 <326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
Message-ID: <20100328154457.46e088a6@wp01>

Hello,
I could write my own methods to read and write FASTQ files. However, I downloaded the BioJava source code, and the folder src/org/biojava/bio/program/fastq contains the following files:

* AbstractFastqReader.java
* AbstractFastqWriter.java
* Fastq.java
* FastqBuilder.java
* FastqReader.java
* FastqVariant.java
* FastqWriter.java
* IlluminaFastqReader.java
* IlluminaFastqWriter.java
* SangerFastqReader.java
* SangerFastqWriter.java
* SolexaFastqReader.java
* SolexaFastqWriter.java

These look like exactly what I need, but unfortunately I do not know how to use them.
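For readers who want to try the plain-text route also suggested in this thread, here is a minimal sketch that converts FASTQ text to FASTA without BioJava. It assumes every record occupies exactly four lines (a header starting with '@', the sequence, a separator starting with '+', and a quality string of the same length as the sequence); wrapped multi-line records are not handled. FastqTextSketch, toFasta and nextLineOrFail are hypothetical names and are not part of the org.biojava.bio.program.fastq package.

```java
import java.util.Scanner;

class FastqTextSketch {

    /** Reads four-line FASTQ records from the scanner and returns them as FASTA text. */
    static String toFasta(Scanner in) {
        StringBuilder fasta = new StringBuilder();
        while (in.hasNextLine()) {
            String header = in.nextLine();
            if (header.trim().isEmpty()) {
                continue; // tolerate blank lines between records
            }
            String sequence = nextLineOrFail(in, header);
            String separator = nextLineOrFail(in, header);
            String quality = nextLineOrFail(in, header);
            if (!header.startsWith("@") || !separator.startsWith("+")
                    || quality.length() != sequence.length()) {
                throw new IllegalArgumentException("malformed FASTQ record: " + header);
            }
            // FASTA keeps the description and the sequence; the quality line is dropped.
            fasta.append('>').append(header.substring(1)).append('\n');
            fasta.append(sequence).append('\n');
        }
        return fasta.toString();
    }

    /** Returns the next line, or fails if the record ends early. */
    private static String nextLineOrFail(Scanner in, String header) {
        if (!in.hasNextLine()) {
            throw new IllegalArgumentException("truncated FASTQ record: " + header);
        }
        return in.nextLine();
    }

    public static void main(String[] args) {
        // In-memory input keeps the sketch self-contained; a file could be
        // read the same way with new Scanner(new java.io.File(...)).
        String fastq = "@read1\nACGT\n+\nIIII\n";
        System.out.print(toFasta(new Scanner(fastq)));
    }
}
```

This only illustrates the record layout; the reader and writer classes listed above also check which FASTQ variant the quality scores use, which this sketch does not attempt.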
On Sat, 27 Mar 2010 20:13:02 +0530 jitesh dundas wrote: > Hello, > > I could not find much info on that Q.Try the Biojava API for methods. > > However, I would think of this problem as a simple text file parsing > using BufferedReader and ByteInputStream based I/p ..You have to read > the text file content byte by byte using a while loop. The loop will > detect each column using the patterns (i haven't worked on fastq or > biojava that much) in the text file, e.g. space tabs.. > Why don't you try reading this fastq file as a simple text file in > java. > > This is assuming that fastq are text files..Correct me if I am wrong.. > Java tutorial & forums have bulk of egs on that. > > Try writing the code and send the fastq file with the java code if you > face issues.. > > Hope this helps.. > > Regards, > jd From mitlox at op.pl Sun Mar 28 03:20:40 2010 From: mitlox at op.pl (xyz) Date: Sun, 28 Mar 2010 17:20:40 +1000 Subject: [Biojava-l] Reading and writting Fastq files In-Reply-To: References: <20100327235440.23cffb47@wp01> Message-ID: <20100328172040.478de1a1@wp01> Do not worry. 
I wrote following code: import java.io.FileInputStream; import java.io.FileNotFoundException; import java.io.FileOutputStream; import java.io.IOException; import org.biojava.bio.program.fastq.Fastq; import org.biojava.bio.program.fastq.FastqBuilder; import org.biojava.bio.program.fastq.FastqReader; import org.biojava.bio.program.fastq.FastqWriter; import org.biojava.bio.program.fastq.IlluminaFastqReader; import org.biojava.bio.program.fastq.IlluminaFastqWriter; public class Fastq2Fasta { public static void main(String[] args) throws FileNotFoundException, IOException { FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq"); FastqReader qReader = new IlluminaFastqReader(); FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq"); FastqWriter qWriter = new IlluminaFastqWriter(); for (Fastq fastq : qReader.read(inputFastq)) { System.out.println(fastq.getDescription()); System.out.println(fastq.getSequence()); String trimSeq = fastq.getSequence().substring(0, fastq.getSequence().length() - 6); System.out.println(trimSeq); System.out.println(fastq.getQuality()); String trimQual = fastq.getQuality().substring(0, fastq.getQuality().length() - 6); System.out.println(trimQual); FastqBuilder trimFastq = new FastqBuilder(); trimFastq.withDescription(fastq.getDescription()); trimFastq.appendSequence(trimSeq); trimFastq.appendQuality(trimQual); qWriter.write(outputFastq, trimFastq.build()); } } } and the input fastq file is: @HWI-EAS406:5:1:0:1390#0/1 GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC +HWI-EAS406:5:1:0:1390#0/1 OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA @HWI-EAS406:5:1:0:1390#0/1 GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC +HWI-EAS406:5:1:0:1390#0/1 PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPBBBBBB @HWI-EAS406:5:1:0:1390#0/1 GGGTGATGGCCGCTGCCGATGGCGTCAAACCCCACC +HWI-EAS406:5:1:0:1390#0/1 QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQCCCCCC Unfortunately, I get the following error: HWI-EAS406:5:1:0:1390#0/1 GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC GGGTGATGGCCGCTGCCGATGGCGTCAAAA 
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Exception in thread "main" java.io.IOException: sequence HWI-EAS406:5:1:0:1390#0/1 not fastq-illumina format, was fastq-sanger
        at org.biojava.bio.program.fastq.IlluminaFastqWriter.validate(IlluminaFastqWriter.java:41)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.append(AbstractFastqWriter.java:67)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:143)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:125)
        at Fastq2Fasta.main(Fastq2Fasta.java:37)
Java Result: 1

What did I do wrong?

On Sun, 28 Mar 2010 00:27:16 -0400 (EDT) Michael Heuer wrote:

>
> Sorry, I haven't written up an example for the Biojava Cookbook yet.
>
> The FASTQ package javadoc API is at
>
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html
>
> If you want to read Illumina format FASTQ files, use
>
> FastqReader reader = new IlluminaFastqReader();
> for (Fastq fastq : reader.read(new File("in.fastq")))
> {
>   // ...
> }
>
> michael

From andreas at sdsc.edu Sun Mar 28 13:44:32 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 28 Mar 2010 10:44:32 -0700
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <832231.74869.qm@web23207.mail.ird.yahoo.com>
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> <20100327194814.1acc8655@wp01> <832231.74869.qm@web23207.mail.ird.yahoo.com>
Message-ID: <59a41c431003281044y36137b05nd993e8e51ef7484e@mail.gmail.com>

We are using mailman for our mailing lists:
http://www.biojava.org/mailman/listinfo/biojava-l

Andreas

On Sat, Mar 27, 2010 at 4:24 AM, gregory voisin wrote:

> Hi,
> How do I unsubscribe from this list?
> thanks
> greg
>
> ________________________________
> From: xyz
> To: Richard Holland
> Cc: Andy Law (RI); biojava-l at lists.open-bio.org
> Sent: Sat, 27 March 2010, 6:48:14
> Subject: Re: [Biojava-l] sort fasta file
>
> You can find the input fasta file here
> http://mitlox.republika.pl/sortFasta.fasta . I created this file under
> Linux and I also work with BioJava under Linux. Nothing changes if I
> add a new line after the last sequence.
>
> On Fri, 26 Mar 2010 11:04:22 +0000
> Richard Holland wrote:
>
> > I can't see anything in the code that would cause that
> > behaviour. :( Could you provide sample code and a supporting FASTA
> > file that replicates the problem?
> >
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From heuermh at acm.org Mon Mar 29 22:01:23 2010
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100328172040.478de1a1@wp01>
Message-ID:

FastqBuilder defaults to the Sanger variant, see

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT

In your code, you just need to specify the Illumina variant

FastqBuilder trimFastq = new FastqBuilder()
  .withVariant(FastqVariant.FASTQ_ILLUMINA)
  .withDescription(fastq.getDescription())
  .appendSequence(trimSeq)
  .appendQuality(trimQual);

Please let me know if you have any API or doc suggestions, as this stuff has not been used much by anyone other than myself.

michael

On Sun, 28 Mar 2010, xyz wrote:

> Do not worry.
I wrote following code: > > import java.io.FileInputStream; > import java.io.FileNotFoundException; > import java.io.FileOutputStream; > import java.io.IOException; > import org.biojava.bio.program.fastq.Fastq; > import org.biojava.bio.program.fastq.FastqBuilder; > import org.biojava.bio.program.fastq.FastqReader; > import org.biojava.bio.program.fastq.FastqWriter; > import org.biojava.bio.program.fastq.IlluminaFastqReader; > import org.biojava.bio.program.fastq.IlluminaFastqWriter; > > public class Fastq2Fasta { > > public static void main(String[] args) throws FileNotFoundException, > IOException { > FileInputStream inputFastq = new > FileInputStream("fastq2fasta.fastq"); FastqReader qReader = new > IlluminaFastqReader(); > > FileOutputStream outputFastq = new > FileOutputStream("fastq2fastaTrim.fastq"); FastqWriter qWriter = > new IlluminaFastqWriter(); > > > for (Fastq fastq : qReader.read(inputFastq)) { > System.out.println(fastq.getDescription()); > System.out.println(fastq.getSequence()); > String trimSeq = fastq.getSequence().substring(0, > fastq.getSequence().length() - 6); System.out.println(trimSeq); > System.out.println(fastq.getQuality()); > String trimQual = fastq.getQuality().substring(0, > fastq.getQuality().length() - 6); System.out.println(trimQual); > > FastqBuilder trimFastq = new FastqBuilder(); > trimFastq.withDescription(fastq.getDescription()); > trimFastq.appendSequence(trimSeq); > trimFastq.appendQuality(trimQual); > > qWriter.write(outputFastq, trimFastq.build()); > } > } > } > > and the input fastq file is: > @HWI-EAS406:5:1:0:1390#0/1 > GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC > +HWI-EAS406:5:1:0:1390#0/1 > OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA > @HWI-EAS406:5:1:0:1390#0/1 > GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC > +HWI-EAS406:5:1:0:1390#0/1 > PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPBBBBBB > @HWI-EAS406:5:1:0:1390#0/1 > GGGTGATGGCCGCTGCCGATGGCGTCAAACCCCACC > +HWI-EAS406:5:1:0:1390#0/1 > QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQCCCCCC > > Unfortunately, I get the 
following error: > HWI-EAS406:5:1:0:1390#0/1 > GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC > GGGTGATGGCCGCTGCCGATGGCGTCAAAA > OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA > OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO > Exception in thread "main" java.io.IOException: sequence > HWI-EAS406:5:1:0:1390#0/1 not fastq-illumina format, was fastq-sanger > at > org.biojava.bio.program.fastq.IlluminaFastqWriter.validate(IlluminaFastqWriter.java:41) > at > org.biojava.bio.program.fastq.AbstractFastqWriter.append(AbstractFastqWriter.java:67) > at > org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:143) > at > org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:125) > at Fastq2Fasta.main(Fastq2Fasta.java:37) Java Result: 1 > > What did I wrong? > > On Sun, 28 Mar 2010 00:27:16 -0400 (EDT) > Michael Heuer wrote: > > > > > Sorry, I haven't written up an example for the Biojava Cookbook yet. > > > > The FASTQ package javadoc API is at > > > > http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html > > > > If you want to read Illumina format FASTQ files, use > > > > FastqReader reader = new IlluminaFastqReader(); > > for (Fastq fastq : reader.read(new File("in.fastq"))) > > { > > // ... 
> > }
> >
> > michael
>

From mitlox at op.pl Tue Mar 30 07:50:47 2010
From: mitlox at op.pl (xyz)
Date: Tue, 30 Mar 2010 21:50:47 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To:
References: <20100328172040.478de1a1@wp01>
Message-ID: <20100330215047.084f6b00@wp01>

Thank you, it works. But after I extended the code with

RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());

in order to also get a trimmed fasta file, I got the following error:

Fastq2Fasta.java:51: cannot find symbol
symbol  : method writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
location: class org.biojavax.bio.seq.RichSequence.IOTools
RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
1 error

Complete Code:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqVariant;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;

public class Fastq2Fasta {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
        FastqReader qReader = new IlluminaFastqReader();

        FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq");
        FastqWriter qWriter = new IlluminaFastqWriter();

        SimpleNamespace ns = new SimpleNamespace("biojava");
        FileOutputStream outputFasta = new FileOutputStream("fastq2fastaTrim.fasta");

        for (Fastq fastq : qReader.read(inputFastq)) {
            System.out.println(fastq.getDescription());
            System.out.println(fastq.getSequence());
            String trimSeq = fastq.getSequence().substring(0, fastq.getSequence().length() - 6);
            System.out.println(trimSeq);
            System.out.println(fastq.getQuality());
            String trimQual = fastq.getQuality().substring(0, fastq.getQuality().length() - 6);
            System.out.println(trimQual);

            FastqBuilder trimFastq = new FastqBuilder();
            trimFastq.withVariant(FastqVariant.FASTQ_ILLUMINA);
            trimFastq.withDescription(fastq.getDescription());
            trimFastq.appendSequence(trimSeq);
            trimFastq.appendQuality(trimQual);

            qWriter.write(outputFastq, trimFastq.build());
            RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
        }
    }
}

What did I do wrong?

Suggestions:

1)
After I trimmed the fastq files, the header information on the quality line is empty

@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
+
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

This reduces the size of the files, but is it compatible with SOAP and TopHat?

2)
I was using fastq files of up to 6 GBytes and I have not run any benchmarks with different buffer/stream combinations on big text files, so I am not sure whether it is enough to use just FileInputStream or FileOutputStream. BioJavaX uses BufferedReader br = new BufferedReader(new FileReader()); is there any speed difference?

Overall I think the API looks good, and for the docs you could use this code and put it on BioJava.

On Mon, 29 Mar 2010 22:01:23 -0400 (EDT) Michael Heuer wrote:

>
> FastqBuilder defaults to the Sanger variant, see
>
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT
>
>
> In your code, you just need to specify the Illumina variant
>
> FastqBuilder trimFastq = new FastqBuilder()
>   .withVariant(FastqVariant.FASTQ_ILLUMINA)
>   .withDescription(fastq.getDescription())
>   .appendSequence(trimSeq)
>   .appendQuality(trimQual);
>
>
> Please let me know if you have any API or doc suggestions, as this
> stuff has not been used much by anyone other than myself.
>
> michael
>

From heuermh at acm.org Wed Mar 31 23:56:42 2010
From: heuermh at acm.org (Michael Heuer)
Date: Wed, 31 Mar 2010 23:56:42 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100330215047.084f6b00@wp01>
Message-ID:

xyz wrote:

> Thank you it works, but after I extended the code with
> RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
> fastq.getDescription());
> in order to get also a trimmed fasta file I got the following error:
>
> Fastq2Fasta.java:51: cannot
> find symbol symbol : method
> writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
> location: class org.biojavax.bio.seq.RichSequence.IOTools
> RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns,
> fastq.getDescription()); 1 error

The fastq package has not yet been integrated with biojava core or the biojavax packages. If you would like to use RichSequence.IOTools, you would need to create a RichSequence from each Fastq object before writing. Something like

import static ...RichSequence.Tools.*;
import static ...RichSequence.IOTools.*;

Fastq fastq = ...;
Namespace namespace = ...;
RichSequence richSequence = createRichSequence(
    namespace,
    fastq.getDescription(),
    fastq.getSequence(),
    DNATools.getDNA());

writeFasta(outputStream, richSequence, namespace);

may work.
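
For comparison, here is a minimal dependency-free sketch of the same trim-and-convert step. It assumes plain four-line FASTQ records and deliberately does not use the BioJava fastq or biojavax packages; the class and method names (FastqTrimToFasta, trimToFasta) are hypothetical and exist only for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper, not part of BioJava: plain-Java FASTQ -> trimmed FASTA.
public class FastqTrimToFasta {

    /**
     * Reads four-line FASTQ records from in, drops trimLength bases from the
     * 3' end of each sequence and writes the result to out in FASTA format.
     * Returns the number of records written.
     */
    public static int trimToFasta(BufferedReader in, Writer out, int trimLength) throws IOException {
        int count = 0;
        String header;
        while ((header = in.readLine()) != null) {
            if (header.isEmpty()) {
                continue; // tolerate blank lines between records
            }
            String sequence = in.readLine();
            String separator = in.readLine();
            String quality = in.readLine();
            // Minimal sanity checks on the record layout.
            if (!header.startsWith("@") || sequence == null || separator == null
                    || !separator.startsWith("+") || quality == null
                    || quality.length() != sequence.length()) {
                throw new IOException("malformed FASTQ record near: " + header);
            }
            // Never keep a negative number of bases for very short reads.
            int keep = Math.max(0, sequence.length() - trimLength);
            out.write(">" + header.substring(1) + "\n");
            out.write(sequence.substring(0, keep) + "\n");
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // First record from the thread, trimmed by 6 bases as in the original code.
        String fastq = "@HWI-EAS406:5:1:0:1390#0/1\n"
                + "GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC\n"
                + "+HWI-EAS406:5:1:0:1390#0/1\n"
                + "OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA\n";
        StringWriter fasta = new StringWriter();
        trimToFasta(new BufferedReader(new StringReader(fastq)), fasta, 6);
        System.out.print(fasta);
    }
}
```

Reading through a BufferedReader keeps only one record in memory at a time, which matters for multi-gigabyte inputs; the sketch does no alphabet validation, which the RichSequence route above would provide.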
> Suggestions: > 1) > After I trimmed the fastq files the header information for quality > is empty > > @HWI-EAS406:5:1:0:1390#0/1 > GGGTGATGGCCGCTGCCGATGGCGTCAAAA > + > OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO > > this reduced the size of the files but is it compatible with > SOAP and TopHat? Sorry, not sure what you are asking here. > 2) > I was using fastq files up to 6 GBytes and I have not run any benchmarks > with different Buffer/stream combination on big text files and therefore > I am not sure that is enough to use just FileInputStream or > FileOutputStream. BioJavaX is using BufferedReader br = new > BufferedReader(new FileReader()) are there any speed difference? AbstractFastqReader.read(InputStream) uses a BufferedReader, and all the other read methods pass through that one. michael From rmb32 at cornell.edu Fri Mar 26 03:44:09 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 00:44:09 -0700 Subject: [Biojava-l] GSoC mentors mailing list Message-ID: <4BAC65C9.307@cornell.edu> Hi all, If you have volunteered to be a possible GSoC mentor, and have not already been subscribed to the (mentors-only) gsoc-mentors mailing list, send me an email and I'll subscribe you. Rob Buels OBF GSoC 2010 Admin From rmb32 at cornell.edu Fri Mar 26 12:30:30 2010 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 26 Mar 2010 09:30:30 -0700 Subject: [Biojava-l] Announcing OBF Summer of Code - please forward! Message-ID: <4BACE126.1030500@cornell.edu> Hi all, Here's an advertising-ready announcement for OBF's Summer of Code, thanks to Christian Zmasek and Hilmar Lapp for their excellent writing. Student applications are due April 9! Please spread it widely, we need to reach lots of students with it! Rob Buels OBF GSoC 2010 Admin ============================================================ *** Please disseminate widely at your local institutions *** *** including posting to message and job boards, so that *** *** we reach as many students as possible. 
*** ============================================================ OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010 Applications due 19:00 UTC, April 9, 2010. http://www.open-bio.org/wiki/Google_Summer_of_Code The Open Bioinformatics Foundation Summer of Code program provides a unique opportunity for undergraduate, masters, and PhD students to obtain hands-on experience writing and extending open-source software for bioinformatics under the mentorship of experienced developers from around the world. The program is the participation of the Open Bioinformatics Foundation (OBF) as a mentoring organization in the Google Summer of Code(tm) (http://code.google.com/soc/). Students successfully completing the 3 month program receive a $5,000 USD stipend, and may work entirely from their home or home institution. Participation is open to students from any country in the world except countries subject to US trade restrictions. Each student will have at least one dedicated mentor to show them the ropes and help them complete their project. The Open Bioinformatics Foundation is particularly seeking students interested in both bioinformatics (computational biology) and software development. Some initial project ideas are listed on the website. These range from Galaxy phylogenetics pipeline development in Biopython to lightweight sequence objects and lazy parsing in BioPerl, a DAS Server for large files on local filesystems, and mapping Java libraries to Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible and many can be adjusted in scope to match the skills of the student. We also welcome and encourage students proposing their own project ideas; historically some of the most successful Summer of Code projects are ones proposed by the students themselves. TO APPLY: Apply online at the Google Summer of Code website (http://socghop.appspot.com/), where you will also find GSoC program rules and eligibility requirements. 
The 12-day application period for students runs from Monday, March 29 through Friday, April 9th, 2010.

INQUIRIES: We strongly encourage all interested students to get in touch with us with their ideas as early on as possible. See the OBF GSoC page for contact details.

2010 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs

From sheoran143 at gmail.com Wed Mar 24 21:19:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Wed, 24 Mar 2010 20:19:29 -0500
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema)
Message-ID: <4BAABA21.4000301@gmail.com>

I am writing this email again because I didn't get any response on whether these bugs have been patched or whether the patches got lost somewhere on the mailing list. I am not sure, which is why I am writing again. I don't know how to apply these patches, so I am counting on you guys to apply them and reply to me so I know it's fixed.

Thanks
Deepak Sheoran

Hi

In response to the bug fix suggested by Richard I have created some patches. We need to apply these to stop biojava from processing references from a genbank record in a wrong manner, which causes more hibernate exceptions. After applying the patch, the reference resolution code will test the pubmed or medline id, then if there is no match test author/title/location, then if there is still no match create a new reference. I even tested it with GenbankRelease 175 and I gained almost 3159 more records in my database.

Can somebody please have a look at the second issue and fix it:

"
2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code?
(A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from). " Also I am planning on making a bridge between biosql database loaded using bioperl and biojava, here is my some of the investigation can you guys suggest some direction on it. Have a look on attached files 1) Biojava_BioPerl_Diff.xls ==> it have view of tables where genbank record is stored in biosql instance by bioperl and biojava 2) GenbankRecord.doc ==> its word document having a genbank showing where its information goes in biosql using bioperl and biojava 3) BioSqlRichobjectBuilder.patch ==> patch needed for BioSqlRichObjectBuild.java class 4) GenBankFormat.patch ==> patch needed for GenBankFormat.java class Thanks Deepak Sheoran -------- Original Message -------- Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema Date: Tue, 9 Feb 2010 20:34:32 +1300 From: Richard Holland To: Deepak Sheoran CC: biojava-l at biojava.org Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text. However, in answer to your two questions: 1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March). 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? 
(A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
> Hi Richard
>
> Below is the email which I sent to the Biojava-l mailing list, but it never got posted on the mailing list server, nor did I get any response, so please have a look at this email and tell me what the solution to the problem described in the message could be.
>
>
> Thanks
> Deepak Sheoran
> -------- Original Message --------
> Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> Date: Wed, 03 Feb 2010 08:07:35 -0600
> From: Deepak Sheoran
> To: biojava-l at lists.open-bio.org
>
> Hi guys,
>
> A couple of days back I was having a problem with a hibernate exception, but that exception got resolved; the reference to that email is: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another hibernate error and decided to investigate the matter. Below are some facts and information which I found, and I guess it is going to affect all of us.
> - The "Reference" table in the bioSql schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> This works well, except in cases where there are small variations in the values of the columns "location", "title" and "authors" and all these variations refer to the same PUBMED_ID. Then we can't persist or create a richsequence object.
> Now when you tie RichObjectFactory to a active hibernate session then the class "BioSqlRichObjectBuilder" have method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up details of object in the database and if it find one then it will return that object, else it will try to persist the new object into the database.
> But problem is with below part of that method:
> ...LineNumber: 114
> else if (SimpleDocRef.class.isAssignableFrom(clazz))
> {   queryType = "DocRef";
>     // convert List constructor to String representation for query
>     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>     if (ourParamsList.size()<3) {
>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>     } else {
>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>     }
> }
> ...LineNumber: 123
> Now when hibernate search the database, it won't find any other record in "reference" table because those two record are different in string comparison, so it will return a new object back to "GenbankFormat" to following piece of code
> ...LineNumber: 447
> else {
>     try {
>         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>     } catch (ChangeVetoException e) {
>         throw new ParseException(e+", accession:"+accession);
>     }
> }
> ...LineNumber: 455
> Then we will add that object to rlistener. And move to next part of genbank record and then biojava search for a new crossref in database and it will try to persist the old one it get a hibernate exception regarding violation of "unique constraint on dbxref_id" column.
>
> The only way to get these record in database is:
> -
The easiest solution, and the way I did it to test my theory, is to change the bioSql schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change allows us to store that information too. But this is something the BioSql people have to decide, and I don't know how to approach them.
> - The second solution is slightly more difficult to implement: change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database or not. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the process of creating a richsequence object when RichObjectFactory is linked with an active hibernate session.
>
> Example: Below is a sample of what I have in my local biosql schema, which has the modification suggested by me. (The dbxref_id column shows the PubMed ID: I replaced the local dbxref_id which was present in this table in my database with the pubmed_id stored in the "dbxref" table, for easy reference with the outside world in this email.)
>
> Reference_id: 216
> Dbxref_id: 18554304
> Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> crc: 9E940E01F4BE3CD0
>
> Reference_id: 230
> Dbxref_id: 18554304
> Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> crc: D3BC0C17F3F786C9
>
> Reference_id: 415
> Dbxref_id: 16790744
> Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> Title: Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> crc: 60AEDFA0CEEACC38
>
> Reference_id: 969
> Dbxref_id: 16790744
> Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> Title: Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> crc: 4B1232999F6E8130
>
> Reference_id: 929
> Dbxref_id: 8688087
> Location: Science 273 (5278), 1058-1073 (1996)
> Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> crc: 3E79B40DD2AAA2B7
>
> Reference_id: 932
> Dbxref_id: 8688087
> Location: Science 273 (5278), 1058-1073 (1996)
> Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> crc: 094EB3384F8D6DE8
>
> Reference_id: 1426
> Dbxref_id: 10684935
> Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> Authors: Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> crc: 357648D8FD8C6C8A
>
> Reference_id: 1481
> Dbxref_id: 10684935
> Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> Authors: Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> crc: 115411EB2DEE5654
>
> Reference_id: 1497
> Dbxref_id: 14689165
> Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> crc: 4D5D376EECCD186B
>
> Reference_id: 1501
> Dbxref_id: 14689165
> Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> crc: 4D57954EECDED66B
>
> Reference_id: 1556
> Dbxref_id: 18060065
> Location: PLoS ONE 2 (12), E1271 (2007)
> Title: Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> crc: 698688FB6DB95247
>
> Reference_id: 1559
> Dbxref_id: 18060065
> Location: PLoS ONE 2 (12), E1271 (2007)
> Title: Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> crc: E25E1BA99DB18F3D
>
> - The second kind of error which I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> - This means that in the richsequence object some feature has a location object whose feature is set to null.
> - My observations:
>   - It usually occurs when you try to persist a richsequence object to the database, and it affects those features which have a CompoundRichLocation, usually "joins" and "complement" in the CDS region of a genbank record.
>   - After catching the hibernate exception I went through all the features, and either biojava or hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
>   - Below is the screen shot of one of my tests
>   - Settings before trying to persist the richsequence object to the database
>
> -
After trying to persist the richsequence object to the database and ending up in the hibernate exception catch
>
> - So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
> - Some extra information to make things clearer to you guys:
> - Below are some LOCUS lines from genbank records for which I know the location of the error, meaning the CDS region causing the error and the array index in the richsequence.feature ArrayList object.
>   - LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
>     richSequence.feature index: 2540, line number in the genbank record: 22115
>   - LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
>     richSequence.feature index: 127, line number in the genbank record: 2137
>   - LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
>     richSequence.feature index: 389, line number in the genbank record: 3632
>   - LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
>     richSequence.feature index: 47, line number in the genbank record: 4841
>   - LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
>     richSequence.feature index: 45, line number in the genbank record: 442
> -
The complete exception msg : > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature > at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72) > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290) > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > at 
org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > at 
org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535) > at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523) > at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78) > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E:holland at eaglegenomics.com http://www.eaglegenomics.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: Biojava_BioPerl_diff.xls Type: application/vnd.ms-excel Size: 346624 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: BioSqlRichObjectBuilder.patch URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: GenbankFormat.patch URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: GenbankRecord.doc Type: application/msword Size: 59392 bytes Desc: not available URL:
From sheoran143 at gmail.com Mon Mar 8 21:11:05 2010 From: sheoran143 at gmail.com (Deepak Sheoran) Date: Mon, 08 Mar 2010 15:11:05 -0600 Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project Message-ID: <4B9567E9.7080909@gmail.com> Hi I was making a local version of current maven project on my machine so that i can fix some reference related bugs in biojava. But when I build the local version and tried to use it. I got an error on method RichObjectFactory.connectToBioSql(Object session) of current version of bio-java live. when I had a look on it I saw a comment on it "// commenting out for the moment, since it prevents core from compiling. // TODO: move to BioSql module" then I uncommitted the code and add these import statements to RichObjectFactory.java and the problem is fixed : import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver; import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder; import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler; After this I tried compiling bioSql module it went successfully and also when I compiled Core module it went successfully too.I don't if this is the only reason then please uncomment these line in main svn version since i don't how to do it. Thanks Deepak Sheoran
From andreas at sdsc.edu Tue Mar 9 17:28:25 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Tue, 9 Mar 2010 09:28:25 -0800 Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project In-Reply-To: <4B9567E9.7080909@gmail.com> References: <4B9567E9.7080909@gmail.com> Message-ID: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com> Hi Deepak, thanks for spotting this. This factory method should clearly be moved to the biosql module and not be part of the core. Anybody who has a deeper knowledge of the biosql code: Where is the best place in the biosql module to move this to?
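To make the idea of moving the BioSQL-specific method out of core a bit more concrete, here is a minimal sketch of the dependency direction involved. Every name in it (CoreFactorySketch, BioSqlConnectorSketch, Builder) is a hypothetical stand-in invented for illustration and is not the real BioJava API; the assumption is only that core exposes a setter and the biosql side does the wiring.

```java
// All names below are hypothetical stand-ins, not the real BioJava classes.

/** Stands in for a generic factory in "core" that knows nothing about BioSQL. */
final class CoreFactorySketch {

    /** Hook the core uses to build objects; another module may replace it. */
    interface Builder {
        Object build(String name);
    }

    // Default behaviour: plain in-memory objects, no database involved.
    private static Builder builder = new Builder() {
        public Object build(String name) {
            return "in-memory:" + name;
        }
    };

    /** Setter that lets a more specific module plug in its own builder. */
    static void setBuilder(Builder b) {
        builder = b;
    }

    static Object getObject(String name) {
        return builder.build(name);
    }
}

/** Stands in for a class in the "biosql" module, which depends on core. */
final class BioSqlConnectorSketch {

    /** The BioSQL-specific wiring lives here, so core has no biosql imports. */
    static void connect(final Object session) {
        CoreFactorySketch.setBuilder(new CoreFactorySketch.Builder() {
            public Object build(String name) {
                return "biosql(" + session + "):" + name;
            }
        });
    }
}
```

With this layout the core class compiles on its own, and only code that wants database-backed objects calls BioSqlConnectorSketch.connect(session) first.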
A work around the compile problem would be to use reflection to mask the calls to the methods in the other module, but it feels like a hack... Andreas On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran wrote: > Hi > I was making a local version of current maven project on my machine so that > i can fix some reference related bugs in biojava. But when I build the local > version and tried to use it. I got an error on method > RichObjectFactory.connectToBioSql(Object session) of current version of > bio-java live. when I had a look on it I saw a comment on it > > "// commenting out for the moment, since it prevents core from > compiling. > // TODO: move to BioSql module" > > then I uncommitted the code and add these import statements to > RichObjectFactory.java and the problem is fixed : > > import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver; > import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder; > import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler; > > After this I tried compiling bioSql module it went successfully and also > when I compiled Core module it went successfully too.I don't if this is the > only reason then please uncomment these line in main svn version since i > don't how to do it. 
> > Thanks > Deepak Sheoran > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From sheoran143 at gmail.com Tue Mar 9 20:10:00 2010 From: sheoran143 at gmail.com (Deepak Sheoran) Date: Tue, 09 Mar 2010 14:10:00 -0600 Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project In-Reply-To: <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com> References: <4B9567E9.7080909@gmail.com> <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com> Message-ID: <4B96AB18.908@gmail.com> Hi Andreas I guess it should go in "org.biojavax.bio.db.biosql" package, it make sense to put this class their. Deepak Sheoran On 3/9/2010 11:28 AM, Andreas Prlic wrote: > Hi Deepak, > > thanks for spotting this. This factory method should clearly be moved > to the biosql module and not be part of the core. Anybody who has a > deeper knowledge of the biosql code: Where is the best place in the > biosql module to move this to? > > A work around the compile problem would be to use reflection to mask > the calls to the methods in the other module, but it feels like a hack... > > Andreas > > On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran > wrote: > > Hi > I was making a local version of current maven project on my > machine so that i can fix some reference related bugs in biojava. > But when I build the local version and tried to use it. I got an > error on method > RichObjectFactory.connectToBioSql(Object session) of current > version of bio-java live. when I had a look on it I saw a comment > on it > > "// commenting out for the moment, since it prevents core from > compiling. 
> // TODO: move to BioSql module" > > then I uncommitted the code and add these import statements to > RichObjectFactory.java and the problem is fixed : > > import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver; > import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder; > import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler; > > After this I tried compiling bioSql module it went successfully > and also when I compiled Core module it went successfully too.I > don't if this is the only reason then please uncomment these line > in main svn version since i don't how to do it. > > Thanks > Deepak Sheoran > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > From holland at eaglegenomics.com Wed Mar 10 13:31:43 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Wed, 10 Mar 2010 21:31:43 +0800 Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project In-Reply-To: <4B96AB18.908@gmail.com> References: <4B9567E9.7080909@gmail.com> <59a41c431003090928w5f28c6e1ladd571aa283e93c1@mail.gmail.com> <4B96AB18.908@gmail.com> Message-ID: The problem is that the RichObjectFactory is generic, but the connectToBioSQL method is BioSQL specific. What really needs to happen is abstract out the connectToBioSQL method _only_ to a more specific class in the biosql module, and use (if necessary create) setters on RichObjectFactory for it to use. On 10 Mar 2010, at 04:10, Deepak Sheoran wrote: > Hi Andreas > I guess it should go in "org.biojavax.bio.db.biosql" package, it make sense to put this class their. > > Deepak Sheoran > > On 3/9/2010 11:28 AM, Andreas Prlic wrote: >> Hi Deepak, >> >> thanks for spotting this. This factory method should clearly be moved to the biosql module and not be part of the core. 
Anybody who has a deeper knowledge of the biosql code: Where is the best place in the biosql module to move this to? >> >> A work around the compile problem would be to use reflection to mask the calls to the methods in the other module, but it feels like a hack... >> >> Andreas >> >> On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran > wrote: >> >> Hi >> I was making a local version of current maven project on my >> machine so that i can fix some reference related bugs in biojava. >> But when I build the local version and tried to use it. I got an >> error on method >> RichObjectFactory.connectToBioSql(Object session) of current >> version of bio-java live. when I had a look on it I saw a comment >> on it >> >> "// commenting out for the moment, since it prevents core from >> compiling. >> // TODO: move to BioSql module" >> >> then I uncommitted the code and add these import statements to >> RichObjectFactory.java and the problem is fixed : >> >> import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver; >> import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder; >> import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler; >> >> After this I tried compiling bioSql module it went successfully >> and also when I compiled Core module it went successfully too.I >> don't if this is the only reason then please uncomment these line >> in main svn version since i don't how to do it. 
>> >> Thanks >> Deepak Sheoran >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From mark.schreiber at novartis.com Thu Mar 11 03:14:54 2010 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Thu, 11 Mar 2010 11:14:54 +0800 Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project In-Reply-To: Message-ID: Could a subclass of the RichObjectFactory exist in the BioSQL module. If you want your RichObjects backed by BioSQL you use the [BioSQL]RichObjectFactory from the BioSQL package??? - Mark biojava-l-bounces at lists.open-bio.org wrote on 03/10/2010 09:31:43 PM: > The problem is that the RichObjectFactory is generic, but the > connectToBioSQL method is BioSQL specific. What really needs to > happen is abstract out the connectToBioSQL method _only_ to a more > specific class in the biosql module, and use (if necessary create) > setters on RichObjectFactory for it to use. > > > On 10 Mar 2010, at 04:10, Deepak Sheoran wrote: > > > Hi Andreas > > I guess it should go in "org.biojavax.bio.db.biosql" package, it > make sense to put this class their. > > > > Deepak Sheoran > > > > On 3/9/2010 11:28 AM, Andreas Prlic wrote: > >> Hi Deepak, > >> > >> thanks for spotting this. This factory method should clearly be > moved to the biosql module and not be part of the core. Anybody who > has a deeper knowledge of the biosql code: Where is the best place > in the biosql module to move this to? 
> >> > >> A work around the compile problem would be to use reflection to > mask the calls to the methods in the other module, but it feels likea hack... > >> > >> Andreas > >> > >> On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran mailto:sheoran143 at gmail.com>> wrote: > >> > >> Hi > >> I was making a local version of current maven project on my > >> machine so that i can fix some reference related bugs in biojava. > >> But when I build the local version and tried to use it. I got an > >> error on method > >> RichObjectFactory.connectToBioSql(Object session) of current > >> version of bio-java live. when I had a look on it I saw a comment > >> on it > >> > >> "// commenting out for the moment, since it prevents core from > >> compiling. > >> // TODO: move to BioSql module" > >> > >> then I uncommitted the code and add these import statements to > >> RichObjectFactory.java and the problem is fixed : > >> > >> import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver; > >> import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder; > >> import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler; > >> > >> After this I tried compiling bioSql module it went successfully > >> and also when I compiled Core module it went successfully too.I > >> don't if this is the only reason then please uncomment these line > >> in main svn version since i don't how to do it. 
> >> > >> Thanks > >> Deepak Sheoran > >> > >> > >> _______________________________________________ > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > >> > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > >> > >> > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From holland at eaglegenomics.com Thu Mar 11 16:10:15 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 12 Mar 2010 00:10:15 +0800 Subject: [Biojava-l] Bug fix for biojava class RichObjectFactory.java in current maven based project In-Reply-To: References: Message-ID: <4E92965B-F9EA-43B1-9235-4FA7BAC09308@eaglegenomics.com> Could do. On 11 Mar 2010, at 11:14, mark.schreiber at novartis.com wrote: > > Could a subclass of the RichObjectFactory exist in the BioSQL module. 
If you want your RichObjects backed by BioSQL you use the [BioSQL]RichObjectFactory from the BioSQL package??? > > - Mark > > > biojava-l-bounces at lists.open-bio.org wrote on 03/10/2010 09:31:43 PM: > > > The problem is that the RichObjectFactory is generic, but the > > connectToBioSQL method is BioSQL specific. What really needs to > > happen is abstract out the connectToBioSQL method _only_ to a more > > specific class in the biosql module, and use (if necessary create) > > setters on RichObjectFactory for it to use. > > > > > > On 10 Mar 2010, at 04:10, Deepak Sheoran wrote: > > > > > Hi Andreas > > > I guess it should go in "org.biojavax.bio.db.biosql" package, it > > make sense to put this class their. > > > > > > Deepak Sheoran > > > > > > On 3/9/2010 11:28 AM, Andreas Prlic wrote: > > >> Hi Deepak, > > >> > > >> thanks for spotting this. This factory method should clearly be > > moved to the biosql module and not be part of the core. Anybody who > > has a deeper knowledge of the biosql code: Where is the best place > > in the biosql module to move this to? > > >> > > >> A work around the compile problem would be to use reflection to > > mask the calls to the methods in the other module, but it feels likea hack... > > >> > > >> Andreas > > >> > > >> On Mon, Mar 8, 2010 at 1:11 PM, Deepak Sheoran > mailto:sheoran143 at gmail.com>> wrote: > > >> > > >> Hi > > >> I was making a local version of current maven project on my > > >> machine so that i can fix some reference related bugs in biojava. > > >> But when I build the local version and tried to use it. I got an > > >> error on method > > >> RichObjectFactory.connectToBioSql(Object session) of current > > >> version of bio-java live. when I had a look on it I saw a comment > > >> on it > > >> > > >> "// commenting out for the moment, since it prevents core from > > >> compiling. 
> > >> // TODO: move to BioSql module" > > >> > > >> then I uncommitted the code and add these import statements to > > >> RichObjectFactory.java and the problem is fixed : > > >> > > >> import org.biojavax.bio.db.biosql.BioSQLCrossReferenceResolver; > > >> import org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder; > > >> import org.biojavax.bio.db.biosql.BioSQLRichSequenceHandler; > > >> > > >> After this I tried compiling bioSql module it went successfully > > >> and also when I compiled Core module it went successfully too.I > > >> don't if this is the only reason then please uncomment these line > > >> in main svn version since i don't how to do it. > > >> > > >> Thanks > > >> Deepak Sheoran > > >> > > >> > > >> _______________________________________________ > > >> Biojava-l mailing list - Biojava-l at lists.open-bio.org > > >> > > >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > >> > > >> > > > > > > _______________________________________________ > > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > > -- > > Richard Holland, BSc MBCS > > Operations and Delivery Director, Eagle Genomics Ltd > > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > > > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > _________________________ > > CONFIDENTIALITY NOTICE > > The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. 
If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Mon Mar 15 10:34:14 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 15 Mar 2010 10:34:14 +0000 Subject: [Biojava-l] Hackathon in Boston, July 2010 Message-ID: <5FC2D8EC-5408-4126-9A7D-CB6B3500B61C@eaglegenomics.com> Hi all, Following the successful hackathon in Cambridge earlier this year, it was originally planned to hold a second one in Boston in conjunction with BOSC in order to give those who couldn't make it to the UK a chance to get involved. However, OBF have beaten us to it by organising a cross-project CodeFest! http://www.open-bio.org/wiki/Codefest_2010 It would be great for BioJava people to get involved with this cross-project hackathon effort, and it saves organising one of our own! :) All relevant info is on the web page linked to above, and if you have any questions, ask Brad as detailed on the page. cheers, Richard -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From xuejiachen at gmail.com Mon Mar 15 23:09:50 2010 From: xuejiachen at gmail.com (Jiachen Xue) Date: Mon, 15 Mar 2010 19:09:50 -0400 Subject: [Biojava-l] question about BLAST output parsing Message-ID: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com> Hi, Thanks advance for help. 
Given the following piece of text from a BLAST output, how can I get the "Identities", "Positives" and "Gaps" fields, as well as the alignment information, such as "DK V L+D + G +S + +++ +E GA+K+ L + AAPE" and the subject string?

>sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase; AltName: Full=UMP pyrophosphorylase; AltName: Full=UPRTase
          Length = 209

 Score = 32.0 bits (71), Expect = 9.7, Method: Compositional matrix adjust.
 Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)

Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
           DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165

From anjolou at hotmail.com Tue Mar 16 09:20:35 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:20:35 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID:

Hello,

I tried to use the BioJava BLAST parser myself, but I didn't find a way to get this information back. If your BLAST result can be in XML, you should try to use JAXB to parse it (this is what I used). There is already some code for marshalling/unmarshalling in the biojava3 project. Here are the links, but they seem to be dead right now:

http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/branches/biojava3
http://www.biojava.org/wiki/BioJava3_project

Have a nice day,
Louise

> Date: Mon, 15 Mar 2010 19:09:50 -0400
> From: xuejiachen at gmail.com
> To: biojava-l at lists.open-bio.org
> Subject: [Biojava-l] question about BLAST output parsing
>
> Hi,
>
> Thanks advance for help.
>
> For the following piece of text appearing in a blast output. How can I get
> the fields of "Identities", "Positives", "Gaps" as well as the alignment
> information, such as "DK V L+D + G +S + +++ +E GA+K+ L + AAPE" and
> subject string?
>
> >sp|B9KAQ6.1|UPP_THENN RecName: Full=Uracil phosphoribosyltransferase;
> AltName: Full=UMP
> pyrophosphorylase; AltName: Full=UPRTase
> Length = 209
>
> Score = 32.0 bits (71), Expect = 9.7, Method: Compositional matrix
> adjust.
> Identities = 16/42 (38%), Positives = 27/42 (64%), Gaps = 2/42 (4%)
>
> Query: 360 DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE 399
>            DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE
> Sbjct: 124 DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE 165
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

From anjolou at hotmail.com Tue Mar 16 09:23:37 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 16 Mar 2010 10:23:37 +0100
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID:

Sorry, I forgot: there is an example of using the BLAST parser here:

http://biojava.org/wiki/BioJava:CookBook:Blast:Parser

It should be enough for what you want to do.

> Date: Mon, 15 Mar 2010 19:09:50 -0400
> From: xuejiachen at gmail.com
> To: biojava-l at lists.open-bio.org
> Subject: [Biojava-l] question about BLAST output parsing
>
> Hi,
>
> Thanks advance for help.
>
> For the following piece of text appearing in a blast output. How can I get
> the fields of "Identities", "Positives", "Gaps" as well as the alignment
> information, such as "DK V L+D + G +S + +++ +E GA+K+ L + AAPE" and
> subject string?
From andreas at sdsc.edu Tue Mar 16 15:19:45 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 16 Mar 2010 08:19:45 -0700
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: 
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com>
Message-ID: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>

Yes, the BioJava BLAST parser has not been maintained in quite a while. Parsing the XML output of BLAST is probably the thing to do nowadays.

About BioJava3: the wiki documentation is a bit behind. The code is now in the main biojava-trunk, and development has been quite active over the last months.
Andreas
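
To make the XML suggestion concrete, here is a minimal sketch that uses only the JDK's DOM parser, not JAXB and not any BioJava class. It assumes the search was rerun with XML output (-m 7 for the legacy blastall, -outfmt 5 for BLAST+) and that the report uses NCBI's usual Hsp_* element names. The BlastXmlHsp class, its firstHsp method, and the embedded sample (a single HSP with values copied from the text report in the question) are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BlastXmlHsp {

    // Hand-written stand-in for one <Hsp> element. A real report nests this
    // inside BlastOutput/Iteration/Hit elements and carries more fields.
    public static final String SAMPLE =
          "<Hsp>"
        + "<Hsp_identity>16</Hsp_identity>"
        + "<Hsp_positive>27</Hsp_positive>"
        + "<Hsp_gaps>2</Hsp_gaps>"
        + "<Hsp_align-len>42</Hsp_align-len>"
        + "<Hsp_qseq>DKNVLLVDDSIVRGTTSEQIIEMAREAGAKKVYLAS--AAPE</Hsp_qseq>"
        + "<Hsp_hseq>DKEVFLLDPMLATGVSSVKALDILKENGARKITLVALIAAPE</Hsp_hseq>"
        + "<Hsp_midline>DK V L+D  +  G +S + +++ +E GA+K+ L +  AAPE</Hsp_midline>"
        + "</Hsp>";

    // The fields asked about in the question, by their XML element names.
    private static final String[] FIELDS = {
        "Hsp_identity", "Hsp_positive", "Hsp_gaps", "Hsp_align-len",
        "Hsp_qseq", "Hsp_hseq", "Hsp_midline"
    };

    /** Returns the selected fields of the first Hsp element found in the XML. */
    public static Map<String, String> firstHsp(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Real reports start with a DOCTYPE that points at an NCBI DTD.
            // This Xerces feature, understood by the JDK's built-in parser,
            // stops the parser from trying to download it.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList hsps = doc.getElementsByTagName("Hsp");
            if (hsps.getLength() == 0) {
                throw new IllegalArgumentException("no Hsp element found");
            }
            Element hsp = (Element) hsps.item(0);

            Map<String, String> fields = new LinkedHashMap<String, String>();
            for (String name : FIELDS) {
                NodeList nodes = hsp.getElementsByTagName(name);
                if (nodes.getLength() > 0) {
                    fields.put(name, nodes.item(0).getTextContent());
                }
            }
            return fields;
        } catch (IllegalArgumentException e) {
            throw e;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse BLAST XML", e);
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> field : firstHsp(SAMPLE).entrySet()) {
            System.out.println(field.getKey() + " = " + field.getValue());
        }
    }
}
```

For a real report, feed the file to the parser instead of the sample string and loop over every Hsp element rather than taking only the first one.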
From hlapp at drycafe.net Tue Mar 16 20:03:50 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Tue, 16 Mar 2010 16:03:50 -0400
Subject: [Biojava-l] [OT] Job opportunity: Training coordinator and Bioinformatics Project Manager
Message-ID: <0CDDCED9-266E-4CCE-8240-D7E2C8522784@drycafe.net>

Hi all - first off, sorry for the cross-posting, we're trying to advertise this as widely as possible. Second, apologies if this is committing an offense and considered spam. I thought though that there might be some people around here who may be interested and suitable.

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- informatics.nescent.org :
===========================================================

A unique position is available for a training coordinator and bioinformatics project manager at the U.S. National Evolutionary Synthesis Center in Durham, North Carolina (NESCent, http://nescent.org). NESCent is a National Science Foundation funded research center managed by Duke University, the University of North Carolina at Chapel Hill and North Carolina State University on behalf of the international evolutionary biology community. NESCent facilitates synthetic research by bringing together diverse expertise, data, tools and concepts (Sidlauskas et al. 2009). In addition to a resident population of 20-30 scientists, the Center hosts over 800 visitors a year. An informatics staff is on-site to support resident and visiting scientists' needs in high-performance computing, electronic collaboration, scientific software and databases; this includes custom software development for a limited number of high-impact projects.
NESCent's informatics training program includes a rotating series of open-application summer courses, ad-hoc short courses for resident scientists, and remote internships (including past participation in the Google Summer of Code).

The training coordinator and bioinformatics project manager will provide oversight to the Center's training activities. The incumbent will also serve as the interface between scientists and software developers at NESCent. The position provides extensive opportunities for collaboration and intellectual engagement with both NESCent-sponsored scientists and informatics staff; however, this is not an independent research position. The incumbent will report to the Director, while overseeing the work of a small informatics team and coordinating activities among the Center's science, education and informatics programs.

Responsibilities:

* 50% - Consult with sponsored scientists (including scientists in residence and working group participants) about informatics resources and needs. Manage software product development by gathering requirements from scientists, participating in conceptual design, monitoring implementation progress and product quality, facilitating communication between software developers and scientists, and researching software solutions.

* 25% - Oversee NESCent's course curriculum by identifying opportunities for onsite or online informatics courses that satisfy demand for advanced training of resident and visiting scientists, recruiting instructors, providing guidance to instructors in developing course syllabi, coordinating logistical and technical support requirements, conducting assessments, and serving as a liaison to course organizers at other institutions.
* 25% - Assist in the management of NESCent's summer informatics intern program by coordinating the recruitment, application & review process for students, communicating expectations to students and mentors, monitoring student progress, documenting student outcomes, and performing assessments.

Education:

Required: M.S. in Biology, Bioinformatics, or a related field.
Preferred: Ph.D. and two years postdoctoral experience in evolutionary biology, or an equivalent combination of relevant education and/or experience.

Experience:

Required: Excellent communication, interpersonal, and organizational skills. Experience with computationally oriented scientific research.
Preferred: At least two years in development of databases and open source software. Organization, coordination, development and delivery of courses and workshops appropriate for graduate-level participants.

Terms of Employment:

Salary will be competitive and commensurate with experience. As a full-time employee, the incumbent will receive Duke University's benefits package (http://hr.duke.edu/benefits/main.html). The position is available immediately and will remain open until filled. The position is currently funded through November 2014, contingent on annual renewal of the Center by the NSF.

How to Apply:

Please send a C.V., including contact information for three references, and a brief statement of interest to Allen Rodrigo, Director, NESCent, at a.rodrigo at nescent.org. Inquiries about suitability for the position are welcome. Duke University is an Equal Opportunity/Affirmative Action employer.

Additional information about NESCent: http://www.nescent.org

References:

Sidlauskas B, Ganapathy G, Hazkani-Covo E, Jenkins KP, Lapp H, McCall LW, Price S, Scherle R, Spaeth PA, Kidd DM (2009) Linking Big: The Continuing Promise of Evolutionary Synthesis. Evolution.
http://dx.doi.org/10.1111/j.1558-5646.2009.00892.x

From markjschreiber at gmail.com Wed Mar 17 01:14:51 2010
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Wed, 17 Mar 2010 09:14:51 +0800
Subject: [Biojava-l] question about BLAST output parsing
In-Reply-To: <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>
References: <1227e7b91003151609l1e4dcaaaga9733ca4c2704c70@mail.gmail.com> <59a41c431003160819p4a7031c9jfc7c16454df5210@mail.gmail.com>
Message-ID: <93b45ca51003161814y7196e3e8i8e329b79e612cf50@mail.gmail.com>

I generally don't recommend parsing the standard BLAST text output, as its layout keeps changing subtly between releases. It is best to parse one of the tabular formats or the XML output.

- Mark
From Richard.Finkers at wur.nl Wed Mar 17 07:21:16 2010
From: Richard.Finkers at wur.nl (Richard Finkers)
Date: Wed, 17 Mar 2010 08:21:16 +0100
Subject: [Biojava-l] SVN repository
In-Reply-To: 
References: 
Message-ID: <4BA082EC.8010908@wur.nl>

Hi,

I would like to have a look at the BioJava 3 code (and perhaps contribute to it in the future). However, I cannot access the SVN repository (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
Is the repository down?

Thanks,
Richard

From biopython at maubp.freeserve.co.uk Wed Mar 17 10:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Mar 2010 10:16:45 +0000
Subject: [Biojava-l] SVN repository
In-Reply-To: <4BA082EC.8010908@wur.nl>
References: <4BA082EC.8010908@wur.nl>
Message-ID: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>

On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers wrote:
>
> Hi,
>
> I would like to have a look at the BioJava 3 code (and perhaps in the future
> contribute to). However, I cannot access the SVN repository
> (http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk).
>
> Is the repository down?
>
> Thanks,
> Richard

Probably :(

There have been problems discussed on the BioPerl mailing list (they use the same servers), and the OBF team are aware of it:
http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html

The code.open-bio.org repositories are a read-only public mirror, while dev.open-bio.org is the master repository, which I think is fine (but not available for anonymous download).

In the meantime BioPerl have also set up a read-only mirror on GitHub - perhaps BioJava could do the same? Meanwhile BioRuby and Biopython are just using GitHub (not SVN or CVS).

Peter

From andreas at sdsc.edu Wed Mar 17 17:39:41 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 17 Mar 2010 10:39:41 -0700
Subject: [Biojava-l] SVN repository
In-Reply-To: <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
References: <4BA082EC.8010908@wur.nl> <320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com>
Message-ID: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com>

I have just heard back from the OBF helpdesk. The VM hosting the anonymous SVN is currently down. Depending on how big the problem turns out to be, it will be back at some point later today, or tomorrow at the latest.

Sorry for the inconvenience.
Andreas

From andreas at sdsc.edu Thu Mar 18 20:36:38 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Thu, 18 Mar 2010 13:36:38 -0700
Subject: [Biojava-l] Google summer of code
Message-ID: <59a41c431003181336i33d388aak4b5a26e11ee4161b@mail.gmail.com>

Hi,

It seems our (the Open Bioinformatics Foundation's) Google Summer of Code application has been accepted.

http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010

As such, we are now looking for an interested and skilled student to work on the BioJava multiple sequence alignment project. Take a look at the project description, and if you think you are up for the challenge, send me an email with your application.
http://biojava.org/wiki/Google_Summer_of_Code

Andreas

From shakunb at uom.ac.mu Fri Mar 19 10:50:40 2010
From: shakunb at uom.ac.mu (Shakuntala baichoo)
Date: Fri, 19 Mar 2010 14:50:40 +0400
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: 
References: 
Message-ID: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>

Hi!

I would like to know how to interpret the scores produced by the Needleman-Wunsch algorithm when it is run with the NUCC44.txt substitution matrix. I have taken the named genes from a bacteria EMBL file and I am trying to compare each gene to the other genes in the lot, using the Needleman-Wunsch algorithm with the NUCC44.txt substitution matrix. I would like to determine the % match for each pair, but since I get mostly -ve and some positive values, I would like to know how to calculate the % match for a pair of genes.

I would be grateful if anybody could help me.

Thanks.
Shakuntala
-- 
Best Regards

Dr. (Mrs.)
S.Baichoo
Senior Lecturer
CSE Dept, FoE
University of Mauritius

From andreas at sdsc.edu Fri Mar 19 17:42:44 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Fri, 19 Mar 2010 10:42:44 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com>
Message-ID: <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com>

Sorry, can you clarify: what do you mean by "get mostly -ve"?

Andreas
From mitlox at op.pl Sat Mar 20 10:17:17 2010
From: mitlox at op.pl (xyz)
Date: Sat, 20 Mar 2010 20:17:17 +1000
Subject: [Biojava-l] sort fasta file
Message-ID: <20100320201718.4420a9b9@wp01>

Hello,

I would like to sort a multi-FASTA file by sequence length, i.e. from the read with the longest sequence to the read with the shortest sequence.

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import org.biojava.bio.BioException;

import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

public class SortFasta {

    public static void main(String[] args) throws FileNotFoundException, BioException {

        BufferedReader br = new BufferedReader(new FileReader("sortfasta.fasta"));
        SimpleNamespace ns = new SimpleNamespace("biojava");

        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, null, ns);

        while (rsi.hasNext()) {
            RichSequence rs = rsi.nextRichSequence();
            System.out.println(rs.getName());
            System.out.println(rs.seqString());
        }
    }
}

I have tried to do it, but I do not know how to continue.

Thank you in advance.
Best regards,

From jswetnam at gmail.com Sun Mar 21 20:56:35 2010
From: jswetnam at gmail.com (James Swetnam)
Date: Sun, 21 Mar 2010 16:56:35 -0400
Subject: [Biojava-l] sort fasta file
In-Reply-To: <20100320201718.4420a9b9@wp01>
References: <20100320201718.4420a9b9@wp01>
Message-ID: 

I just hacked this together. Warning: I am new to both Java and BioJava.

import java.io.*;
import java.util.*;

import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;

public class SortFasta {

    // Orders sequences from longest to shortest.
    private static class RichSequenceComparator implements Comparator<RichSequence> {
        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq2.length() - seq1.length();
        }
    }

    // Usage: SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException, BioException {
        String fastaFile = args[0];
        BufferedReader br = new BufferedReader(new FileReader(fastaFile));
        SimpleNamespace ns = new SimpleNamespace("biojava");

        // This reads protein sequences; use the "DNA" alphabet for nucleotide reads.
        Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
        RichSequenceIterator rsi =
            RichSequence.IOTools.readFasta(br, protein.getTokenization("token"), ns);

        // A List keeps sequences of equal length; a TreeSet ordered by length
        // alone would treat them as duplicates and keep only one of them.
        List<RichSequence> sorted = new ArrayList<RichSequence>();
        while (rsi.hasNext()) {
            sorted.add(rsi.nextRichSequence());
        }
        Collections.sort(sorted, new RichSequenceComparator());

        // Do whatever you want here with the list of RichSequences, longest
        // first. I'll just print the lengths.
        for (RichSequence rs : sorted) {
            System.out.println(rs.length());
        }
    }
}
From andreas at sdsc.edu Mon Mar 22 23:46:26 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Mon, 22 Mar 2010 16:46:26 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com> <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com>
Message-ID: <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com>

Hi Shakuntala,

At present the NeedlemanWunsch implementation does not make it totally straightforward to access the %id. You could try parsing the result of the getAlignmentString() call and extracting the information from there.

Making the underlying data more accessible is on the TODO list for this module: http://biojava.org/wiki/BioJava:Modules

Andreas

2010/3/21 Shakuntala baichoo wrote:

> Hi Andreas!
> The problem is as follows. We have a bacteria file. There are about 565
> named genes/features there. We wish to compare each gene with the other 564
> genes. I am using Needleman-Wunsch from BioJava to do so. For one specific
> run, I am attaching the result.
> The score after comparing Feature no. 0 with each of Feature no. 1 to
> Feature no. 564 is displayed (along with the product name etc.). If I wish
> to interpret these scores as a percentage homology, how do I do it?
>
> P.S. Most of the scores are -ve. Only one or a few are +ve. The comparison
> is done using NUCC44.txt.
>
> Thanks
> Kind Regards
> Shakuntala
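
A note on the scores in question: the raw Needleman-Wunsch score is a sum of substitution-matrix values minus gap penalties, so it is not bounded and will usually be negative for unrelated genes; it cannot be read as a percentage directly. One common way to get a % match is to count identical columns in the two gapped rows of the alignment. A rough sketch follows, assuming the two rows have already been extracted as equal-length strings with '-' for gaps. How to pull them out of getAlignmentString() is not shown, and PercentIdentity is a hypothetical helper, not a BioJava class.

```java
import java.util.Locale;

public class PercentIdentity {

    /**
     * Percent identity between two rows of a pairwise alignment.
     * Both rows must have the same length; '-' marks a gap.
     */
    public static double percentIdentity(String row1, String row2) {
        if (row1.length() != row2.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int columns = 0;
        int identical = 0;
        for (int i = 0; i < row1.length(); i++) {
            char a = Character.toUpperCase(row1.charAt(i));
            char b = Character.toUpperCase(row2.charAt(i));
            if (a == '-' && b == '-') {
                continue; // a column of two gaps carries no information
            }
            columns++;
            if (a == b) {
                identical++; // same residue in both rows
            }
        }
        return columns == 0 ? 0.0 : 100.0 * identical / columns;
    }

    public static void main(String[] args) {
        // Made-up aligned rows, for illustration only.
        String gene1 = "ACGT-ACGT";
        String gene2 = "ACGTTACCT";
        System.out.println(
            String.format(Locale.ROOT, "%.1f%% identity", percentIdentity(gene1, gene2)));
    }
}
```

Here the denominator is the alignment length, gap columns included, which is the convention behind BLAST's "Identities = 16/42" style of reporting. Dividing by the length of the shorter gene instead is another common choice and gives different numbers, so it is worth stating which one is used.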
However, I cannot access the SVN repository >>> > > > ( >>> > > >>> > >>> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava-live/trunk >>> > > ). >>> > > > >>> > > > Is the repository down? >>> > > > >>> > > > Thanks, >>> > > > Richard >>> > > >>> > > Probably :( >>> > > >>> > > There have been problems discussed on the BioPerl mailing list >>> > > (they use the same servers), and the OBF team are aware of it: >>> > > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html >>> > > >>> > > The code.open-bio.org repositories are a read only public mirror, >>> > > while dev.open-bio.org is the master repository I think is fine >>> > > (but not available for anonymous download). >>> > > >>> > > In the mean time BioPerl have also setup a read only mirror >>> > > on github - perhaps BioJava could do the same? Meanwhile >>> > > BioRuby and Biopython are just using github (not SVN or CVS). >>> > > >>> > > Peter >>> > > _______________________________________________ >>> > > Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> > > http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > > >>> > >>> > >>> > ------------------------------ >>> > >>> > _______________________________________________ >>> > Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> > http://lists.open-bio.org/mailman/listinfo/biojava-l >>> > >>> > >>> > End of Biojava-l Digest, Vol 86, Issue 9 >>> > **************************************** >>> > >>> >>> >>> >>> -- >>> Best Regards >>> >>> Dr. (Mrs.) S.Baichoo >>> Senior Lecturer >>> CSE Dept, FoE >>> University of Mauritius >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >> >> > > > -- > Best Regards > > Dr. (Mrs.) 
S.Baichoo > Senior Lecturer > CSE Dept, FoE > University of Mauritius >

From zm19fitz at siena.edu Mon Mar 22 20:36:14 2010
From: zm19fitz at siena.edu (Fitzsimmons, Zachary)
Date: Mon, 22 Mar 2010 16:36:14 -0400
Subject: [Biojava-l] (no subject)
Message-ID: <3898DEB8D4D8E34EB622AC53CEFFA2680173D9476385@mb-1.siena.edu>

Hi,

I am currently a sophomore at Siena College with a dual major in Computer Science and Mathematics, and I am writing to you today to voice my interest in developing for BioJava this summer through Google's Summer of Code program. I did research at my own college last summer on the Netflix Prize Project with one of my computer science professors and I am very interested in diversifying my work this summer. Currently I am taking an upper-level computer science course in bioinformatics and I have always thought of this as a possible field of study when I attend graduate school. I have learned about different pairwise alignment algorithms such as Needleman-Wunsch and Smith-Waterman in class to match proteins and DNA sequences, and later we are going to study the HP folding problem in depth. I am well versed in the Java programming language, having taken all of the Java courses at my college, and confident in my abilities to contribute to the BioJava project.

I consider the All-Java Multiple Sequence Alignment project described in your wiki article [http://biojava.org/wiki/Google_Summer_of_Code] something within my abilities as an experienced Java programmer with past research experience and an interest in the field of bioinformatics. Updating the BioJava code to be newly compliant and eventually implementing a Clustal algorithm for multiple sequence alignment is well within my grasp, especially on completion of my college's bioinformatics course and studying BioJava's documentation.

I would just like your feedback on my proposal for working on your project. I hope to hear from you soon and to apply for the position through Google.
Sincerely,
Zack Fitzsimmons

From andreas at sdsc.edu Wed Mar 24 00:33:09 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Tue, 23 Mar 2010 17:33:09 -0700
Subject: [Biojava-l] GSoC update
Message-ID: <59a41c431003231733t1e259753k55fbe0a8bfb801a3@mail.gmail.com>

Hi,

A quick update regarding the current status of our Google Summer of Code project: Several students have already expressed their interest. In fact the response was so good that I believe BioJava should try to run more than just one project.

In the meantime we added another "mentor proposed" project to our GSoC page: http://biojava.org/wiki/Google_Summer_of_Code .

Identification and Classification of Posttranslational Modification of Proteins: Develop a Posttranslational Modification package for the BioJava project.

In general Google strongly encourages student-proposed projects, since historically those are often the most successful GSoC projects. It is recommended that students contact us / possible mentors prior to their application so we can match up students with suitable mentors and projects and help in solidifying their project ideas. In principle any BioJava contributor is suitable as a mentor.

Students can apply between March 22nd and April 9th via the google web site.
http://socghop.appspot.com/

Andreas

From andreas at sdsc.edu Wed Mar 24 15:37:43 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Wed, 24 Mar 2010 08:37:43 -0700
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com> <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com> <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com> <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com>
Message-ID: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>

Hi Shakuntala,

Whether the score is positive or negative depends only on the implementation and representation... I think most people expect the score to be positive, so the toAlignmentString method displays it as a positive value, while internally it is a bit different...

Andreas

On Wed, Mar 24, 2010 at 3:32 AM, Shakuntala baichoo wrote:

> Hello Andreas!
> Thanks for the quick reply.
> I tried the getAlignmentString. It provides a lot of information. However,
> I think there is a slight problem here. From the getAlignmentString call I
> see that the score after aligning a pair of dna strings is 2706.
> But when I view the return value from the method pairwiseAlignment (for the
> same set) then the score is -2706. Why?
>
> Thanks
> Shakuntala
>
> On Tue, Mar 23, 2010 at 3:46 AM, Andreas Prlic wrote:
>
>> Hi Shakuntala,
>>
>> at present the NeedlemanWunsch implementation does not make it totally
>> straightforward to access the %id. You could try parsing the result of the
>> getAlignmentString() call and accessing the information from there ...
>> Making the underlying data more accessible is on the TODO list for this
>> module: http://biojava.org/wiki/BioJava:Modules
>>
>> Andreas
>>
>> 2010/3/21 Shakuntala baichoo
>>
>> Hi Andreas!
>>> The problem is as follows.
We have a bacteria file. There are about 565 >>> named genes/features there. We wish to compare each gene with the other 564 >>> genes. I am using needleman-wunsch from biojava to do so. For one specific >>> run, I am attaching the result. >>> The score after comparing Feature no. 0 with Feature no. 1 to Feature no. >>> 564 is displayed (along with the product name etc...). If I wish to >>> interpret these scores as a percentage homology, how do I do it? >>> >>> P.S. Most of the scores are -ve. Only one or a few is +ve. The >>> comparison is done using NUCC44.txt. >>> >>> Thanks >>> Kind Regards >>> Shakuntala >>> >>> >>> On Fri, Mar 19, 2010 at 9:42 PM, Andreas Prlic wrote: >>> >>>> sorry, can you clarify: what do you mean with you "get mostly -ve" ? >>>> >>>> Andreas >>>> >>>> >>>> On Fri, Mar 19, 2010 at 3:50 AM, Shakuntala baichoo wrote: >>>> >>>>> Hi! >>>>> I would like to know the interpretation of the scores after running the >>>>> needleman-wunsch algorithm using the NUCC44.txt substitution matrix. >>>>> Actually I have taken the named genes from a bacteria EMBL file and I >>>>> am >>>>> trying to compare each gene to the other genes in the lot, using the >>>>> needleman-wunsch algorithm based on the NUCC44.txt substitution matrix. >>>>> I >>>>> would like to determine the % match for each pair but since I get >>>>> mostly -ve >>>>> and some positive values, I would like to know how to calculate the % >>>>> match >>>>> for a pair of genes. >>>>> I would be grateful if anybody could help me. >>>>> >>>>> Thanks. 
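
On the "% match" question asked in this thread: a raw Needleman-Wunsch score is a sum of substitution-matrix values and gap penalties, so it is not a percentage by itself, whatever its sign. One common convention is to count identical columns in the finished alignment and divide by the alignment length. The sketch below does only that. It assumes you already have the two gapped alignment rows as plain strings (for example, extracted from the alignment string output mentioned above) and that gaps are written as '-'. The class and method names are hypothetical and are not part of the BioJava API; other denominators (such as the length of the shorter sequence) are equally defensible.

```java
/** Hypothetical helper, not BioJava API: percent identity of two gapped alignment rows. */
final class PercentIdentity {

    private PercentIdentity() {
    }

    /**
     * Returns 100 * (identical, non-gap columns) / (alignment length).
     * Both rows must come from the same alignment and therefore have equal length.
     * Assumption: gaps are encoded as '-'.
     */
    static double percentIdentity(String rowA, String rowB) {
        if (rowA.length() != rowB.length()) {
            throw new IllegalArgumentException("Aligned rows must have the same length");
        }
        if (rowA.isEmpty()) {
            return 0.0;
        }
        int identical = 0;
        for (int i = 0; i < rowA.length(); i++) {
            // Compare case-insensitively so "a" and "A" count as the same base.
            char a = Character.toUpperCase(rowA.charAt(i));
            char b = Character.toUpperCase(rowB.charAt(i));
            if (a != '-' && a == b) {
                identical++;
            }
        }
        return 100.0 * identical / rowA.length();
    }

    public static void main(String[] args) {
        // Two made-up alignment rows, 9 columns each, 7 of them identical.
        double pid = percentIdentity("ACGT-ACGT", "ACGTTAC-T");
        System.out.printf("%.1f%% identical%n", pid);
    }
}
```

Because the value is computed from the alignment columns rather than from the score, it is unaffected by whether the implementation reports the score as positive or negative.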
From jeedward at yahoo.com Thu Mar 25 00:27:28 2010 From: jeedward at yahoo.com (John Edward) Date: Wed, 24 Mar 2010 17:27:28 -0700 (PDT) Subject: [Biojava-l] Call for papers (Deadline Extended): BCBGC-10, USA, July 2010 Message-ID: <852924.28793.qm@web45911.mail.sp1.yahoo.com> It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas. Call for papers (Deadline Extended): BCBGC-10, USA, July 2010 The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in Orlando, FL, USA.
BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference. The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields.

The following conferences are planned to be organized as part of MULTICONF-10.

- International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
- International Conference on Automation, Robotics and Control Systems (ARCS-10)
- International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
- International Conference on Computer Communications and Networks (CCN-10)
- International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
- International Conference on High Performance Computing Systems (HPCS-10)
- International Conference on Information Security and Privacy (ISP-10)
- International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
- International Conference on Software Engineering Theory and Practice (SETP-10)
- International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun!
Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World Resort, Universal Studios and Sea World Orlando. Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website http://www.PromoteResearch.org for more details.

Sincerely
John Edward

From andreas.draeger at uni-tuebingen.de Thu Mar 25 14:19:02 2010
From: andreas.draeger at uni-tuebingen.de (Andreas Dräger)
Date: Thu, 25 Mar 2010 15:19:02 +0100
Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9
In-Reply-To: <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com> <3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com> <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com> <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com> <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com>
Message-ID: <4BAB70D6.5060309@uni-tuebingen.de>

Hi Andreas and Shakuntala,

The alignment classes have just been revised and can now be updated from the repository. As a major improvement, the alignment result has become much easier to use. So, if you're interested in computing something based on the score, you can now simply apply the dedicated get method and don't have to care about parsing anymore.

I hope that helps.

Cheers
Andreas

--
Dipl.-Bioinform.
Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen
Germany
Phone: +49-7071-29-70436
Fax: +49-7071-29-5091

From mitlox at op.pl Thu Mar 25 13:23:37 2010
From: mitlox at op.pl (xyz)
Date: Thu, 25 Mar 2010 23:23:37 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To:
References: <20100320201718.4420a9b9@wp01>
Message-ID: <20100325232337.3021200a@wp01>

Hi James,

Thank you for the solution, but I get this

7
13
23
30

as output for this input file:

>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttccccccccccccccccccccccc

How is it possible to fix it, and why did you choose Comparator and not Comparable?

Thank you in advance.

Best regards,

On Sun, 21 Mar 2010 16:56:35 -0400 James Swetnam wrote:

> Just hacked this together, warning: I am new to both java and biojava.
>
> import java.io.*;
> import java.util.*;
>
> import org.biojava.bio.BioException;
> import org.biojava.bio.symbol.*;
> import org.biojavax.SimpleNamespace;
> import org.biojavax.bio.seq.*;
>
> import java.util.Comparator;
>
> public class SortFasta {
>
>     static private class RichSequenceComparator implements Comparator<RichSequence> {
>
>         public int compare(RichSequence seq1, RichSequence seq2)
>         {
>             return seq1.length() - seq2.length();
>         }
>
>     }
>
>     // Usage: SortFasta unsortedFile.fasta
>     public static void main(String[] args) throws FileNotFoundException, BioException {
>
>         String fastaFile = args[0];
>
>         BufferedReader br = new BufferedReader(new FileReader(fastaFile));
>         SimpleNamespace ns = new SimpleNamespace("biojava");
>
>         Alphabet protein = AlphabetManager.alphabetForName("PROTEIN");
>
>         RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br,
>                 protein.getTokenization("token"),
>                 ns);
>
>         SortedSet sorted = new TreeSet( new SortFasta.RichSequenceComparator());
>
>         while (rsi.hasNext()) {
>             sorted.add(rsi.nextRichSequence());
>         }
>
>         Iterator sortedIt = sorted.iterator();
>
>         // Do whatever you want here with the ascending list of
>         // RichSequences by length, I'll just print them.
>         while(sortedIt.hasNext())
>         {
>             System.out.println(((RichSequence) sortedIt.next()).length());
>         }
>     }
> }
>

From holland at eaglegenomics.com Thu Mar 25 16:27:17 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 25 Mar 2010 16:27:17 +0000
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4BAABA21.4000301@gmail.com>
References: <4BAABA21.4000301@gmail.com>
Message-ID: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com>

Patched and in subversion on the head in the new Biojava 3 code. I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.

On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:

> I am writing this email again. I didn't get any response on whether these bugs are patched or whether they got lost somewhere on the mailing list. I am not sure, which is why I am writing again. I don't know how to apply this patch, so I am counting on you guys to apply these patches and reply to me so I know it's fixed.
>
> Thanks
> Deepak Sheoran
>
> Hi
> In response to the bug fix suggested by Richard I have created some patches. We need to apply these to stop biojava from processing references from a genbank record in a wrong manner, which causes more hibernate exceptions. After applying the patch, the reference resolution code will test the pubmed or medline id, then if there is no match test author/title/location, then if there is still no match create a new reference. I even tested it with GenbankRelease 175 and I gained almost 3159 more records in my database.
>
> Can somebody please have a look at the second issue and fix it:
> "
> 2. I think that's a bug (compound locations with null features) but not sure why.
Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from). > " > > Also I am planning on making a bridge between biosql database loaded using bioperl and biojava, here is my some of the investigation can you guys suggest some direction on it. > Have a look on attached files > 1) Biojava_BioPerl_Diff.xls ==> it have view of tables where genbank record is stored in biosql instance by bioperl and biojava > 2) GenbankRecord.doc ==> its word document having a genbank showing where its information goes in biosql using bioperl and biojava > 3) BioSqlRichobjectBuilder.patch ==> patch needed for BioSqlRichObjectBuild.java class > 4) GenBankFormat.patch ==> patch needed for GenBankFormat.java class > > > Thanks > Deepak Sheoran > > > > -------- Original Message -------- > Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema > Date: Tue, 9 Feb 2010 20:34:32 +1300 > From: Richard Holland > To: Deepak Sheoran > CC: biojava-l at biojava.org > > Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text. > > However, in answer to your two questions: > > 1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March). > > 2. I think that's a bug (compound locations with null features) but not sure why. 
Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
>
> cheers,
> Richard
>
> On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
>
> > Hi Richard
> >
> > Below is the email which I sent to the Biojava-l mailing list, but it never got posted on the mailing list server, nor did I get any response. Please have a look at this email and tell me what the solution to the problem described in the message could be.
> >
> > Thanks
> > Deepak Sheoran
> >
> > -------- Original Message --------
> > Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Wed, 03 Feb 2010 08:07:35 -0600
> > From: Deepak Sheoran
> > To: biojava-l at lists.open-bio.org
> >
> > Hi guys,
> >
> > A couple of days back I was having a problem with a hibernate exception. That exception got resolved, and the reference to that email is:
> > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> >
> > Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another hibernate error and decided to investigate the matter. Below are some facts and information which I found, and I guess they are going to affect all of us.
> >
> > - The "Reference" table in the bioSql schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > This works well, except in cases where there is a small variation in the value of the columns "location", "title" or "authors" and all these variations refer to the same PUBMED_ID. Then we can't persist or create a richsequence object.
> > Now, when you tie RichObjectFactory to an active hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the details of the object in the database. If it finds one it returns that object; otherwise it tries to persist the new object into the database.
> > The problem is with the part of that method below:
> > ...LineNumber: 114
> > else if (SimpleDocRef.class.isAssignableFrom(clazz))
> > {   queryType = "DocRef";
> >     // convert List constructor to String representation for query
> >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> >     if (ourParamsList.size()<3) {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> >     } else {
> >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> >     }
> > }
> > ...LineNumber: 123
> > Now when hibernate searches the database, it won't find any other record in the "reference" table because the two records differ in string comparison, so it returns a new object back to "GenbankFormat", to the following piece of code:
> > ...LineNumber: 447
> > else {
> >     try {
> >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> >     } catch (ChangeVetoException e) {
> >         throw new ParseException(e+", accession:"+accession);
> >     }
> > }
> > ...LineNumber: 455
> > Then we add that object to rlistener and move on to the next part of the genbank record. When biojava then searches for a new crossref in the database, hibernate tries to persist the earlier object and throws an exception regarding violation of the "unique constraint on dbxref_id" column.
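
The failure mode described above, together with the three-step resolution suggested earlier in this thread (test the ID, then author/title/location, then create a new reference), can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `ReferenceResolver` and `Ref` are hypothetical names, the map and list stand in for the BioSQL "reference" table, and none of this is the actual BioJava or Hibernate code that was patched. The two location strings in the demo are taken from the example rows in this message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical in-memory model of the three-step reference lookup; not BioJava code. */
final class ReferenceResolver {

    /** Minimal stand-in for one row of the "reference" table. */
    static final class Ref {
        final String pubmedId; // null when the record carries no PubMed/Medline ID
        final String authors;
        final String location;
        final String title;    // may be null

        Ref(String pubmedId, String authors, String location, String title) {
            this.pubmedId = pubmedId;
            this.authors = authors;
            this.location = location;
            this.title = title;
        }
    }

    private final Map<String, Ref> byPubmedId = new HashMap<>();
    private final List<Ref> stored = new ArrayList<>();

    Ref resolve(String pubmedId, String authors, String location, String title) {
        // Step 1: an existing reference with the same ID wins, whatever its strings look like.
        if (pubmedId != null) {
            Ref byId = byPubmedId.get(pubmedId);
            if (byId != null) {
                return byId;
            }
        }
        // Step 2: fall back to an exact authors/location/title match.
        // Candidates that carry a different, non-null ID are skipped.
        for (Ref candidate : stored) {
            boolean idsCompatible = pubmedId == null || candidate.pubmedId == null;
            if (idsCompatible
                    && Objects.equals(candidate.authors, authors)
                    && Objects.equals(candidate.location, location)
                    && Objects.equals(candidate.title, title)) {
                return candidate;
            }
        }
        // Step 3: nothing matched, so create and register a new reference.
        Ref created = new Ref(pubmedId, authors, location, title);
        stored.add(created);
        if (pubmedId != null) {
            byPubmedId.put(pubmedId, created);
        }
        return created;
    }

    int size() {
        return stored.size();
    }

    public static void main(String[] args) {
        ReferenceResolver resolver = new ReferenceResolver();
        String authors = "Sato,T., Matsumoto,K."; // shortened for the demo
        String title = "Isolation of lactate-utilizing butyrate-producing bacteria";
        Ref first = resolver.resolve("18554304", authors,
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)", title);
        Ref second = resolver.resolve("18554304", authors,
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)", title);
        System.out.println("same reference reused: " + (first == second));
        System.out.println("stored references: " + resolver.size());
    }
}
```

With a string-only lookup the second call would create a second row for the same PubMed ID, which is exactly what the unique constraint on dbxref_id rejects. Checking the ID first avoids that without changing the schema.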
> >
> > The only ways to get these records into the database are:
> > - The very easy solution, and the way I did it for testing my theory, is to change the bioSql schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change allows us to store that info too. But this is something the BioSql people have to decide and I don't know how to approach them.
> > - The second solution, which is slightly more difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database or not. I mean testing all possible string variations of the authors, location and title of the docRef we are searching for. This has many complications and may slow down the process of creating a richsequence object when RichObjectFactory is linked with an active hibernate session.
> >
> > Example: Below is a sample of what I have in my local biosql schema, which has the modification suggested by me. (The dbxref_id column shows the Pubmed_id: I replaced the local dbxref_id which was present in this table in my database with the pubmed_id stored in the "dbxref" table, for easy reference with the outside world in this email.)
> >
> > Reference_id | Dbxref_id | Location | Title | Authors | crc
> > 216 | 18554304 | FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | 9E940E01F4BE3CD0
> > 230 | 18554304 | FEMS Microbiol. Ecol. 66 (3), 528-536 (2008) | Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model | Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. | D3BC0C17F3F786C9
> > 415 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 60AEDFA0CEEACC38
> > 969 | 16790744 | Infect. Immun. 74 (7), 3715-3726 (2006) | Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences | Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. | 4B1232999F6E8130
> > 929 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 3E79B40DD2AAA2B7
> > 932 | 8688087 | Science 273 (5278), 1058-1073 (1996) | Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii | Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. | 094EB3384F8D6DE8
> > 1426 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. | 357648D8FD8C6C8A
> > 1481 | 10684935 | Nucleic Acids Res. 28 (6), 1397-1406 (2000) | Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 | Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. | 115411EB2DEE5654
> > 1497 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D5D376EECCD186B
> > 1501 | 14689165 | Arch. Microbiol. 181 (2), 144-154 (2004) | The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner | Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. | 4D57954EECDED66B
> > 1556 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | 698688FB6DB95247
> > 1559 | 18060065 | PLoS ONE 2 (12), E1271 (2007) | Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids | Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. | E25E1BA99DB18F3D
> >
> > - The second kind of error which I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > - This means that in the richsequence object some feature has a location object whose feature is set to null.
> > - My observations:
> >   - It usually occurs when you try to persist a richsequence object to the database, and it affects those features which have a CompoundRichLocation, usually "joins" and "complement" in the cds region of a genbank record.
> >   - After catching the hibernate exception I went through all the features: either biojava or hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
> >   - Below are screenshots from one of my tests:
> >     - Settings before trying to persist the richsequence object to the database: (screenshot not preserved in the archive)
> >     - After trying to persist the richsequence object to the database, inside the hibernate exception catch block: (screenshot not preserved in the archive)
> > - So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
> > - Some extra information to make things clearer to you guys.
> > - Below are some LOCUS lines from genbank records for which I know the location error, meaning the cds region causing the error, and the array index in the richsequence.feature arrayList object.
> >   - LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
> >     richSequence.feature Index : 2540 and line number in the genbank record : 22115
> >   - LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
> >     richSequence.feature Index : 127 and line number in the genbank record : 2137
> >   - LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
> >     richSequence.feature Index : 389 and line number in the genbank record : 3632
> >   - LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
> >     richSequence.feature Index : 47 and line number in the genbank record : 4841
> >   - LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
> >     richSequence.feature Index : 45 and line number in the genbank record : 442
> > -
The complete exception msg : > > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature > > at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72) > > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290) > > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > > at 
org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > at 
org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27) > > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535) > > at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523) > > at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78) > > > > > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: > holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From andreas at sdsc.edu Thu Mar 25 16:47:45 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 25 Mar 2010 09:47:45 -0700 Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema) In-Reply-To: <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com> References: <4BAABA21.4000301@gmail.com> <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com> Message-ID: <59a41c431003250947g6ecd11cbw21c5be5858b9aa09@mail.gmail.com> Excellent, thanks Richard and Deepak! Andreas On Thu, Mar 25, 2010 at 9:27 AM, Richard Holland wrote: > Patched and in subversion on the head in the new Biojava 3 code. 
> I modified the code slightly to simplify it. There were also parallel changes required over in SimpleDocRef itself to enable it to continue working without being connected to BioSQL.
>
> On 25 Mar 2010, at 01:19, Deepak Sheoran wrote:
>
> > I am writing this email again because I didn't get any response on whether these bugs were patched or got lost somewhere on the mailing list. I am not sure, which is why I am writing again. I don't know how to apply these patches, so I am counting on you guys to apply them and reply back so I know it's fixed.
> >
> > Thanks
> > Deepak Sheoran
> >
> > Hi
> > In response to the bug fix suggested by Richard I have created some patches. We need to apply these to stop BioJava from processing references from a GenBank record in a wrong manner, which causes more Hibernate exceptions. After applying the patch, the reference resolution code will test the PubMed or MEDLINE ID, then if there is no match test author/title/location, then if there is still no match create a new reference. I even tested it with GenBank release 175 and I gained almost 3159 more records in my database.
> >
> > Can somebody please have a look at the second issue and fix it:
> > "
> > 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
> > "
> >
> > Also I am planning on making a bridge between BioSQL databases loaded using BioPerl and BioJava. Here is some of my investigation; can you guys suggest some direction on it?
> > Have a look at the attached files:
> > 1) Biojava_BioPerl_Diff.xls ==> a view of the tables where a GenBank record is stored in a BioSQL instance by BioPerl and by BioJava
> > 2) GenbankRecord.doc ==> a Word document with a GenBank record showing where its information goes in BioSQL using BioPerl and BioJava
> > 3) BioSqlRichobjectBuilder.patch ==> patch needed for the BioSqlRichObjectBuilder.java class
> > 4) GenBankFormat.patch ==> patch needed for the GenBankFormat.java class
> >
> > Thanks
> > Deepak Sheoran
> >
> > -------- Original Message --------
> > Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema
> > Date: Tue, 9 Feb 2010 20:34:32 +1300
> > From: Richard Holland
> > To: Deepak Sheoran
> > CC: biojava-l at biojava.org
> >
> > Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text.
> >
> > However, in answer to your two questions:
> >
> > 1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March).
> >
> > 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).
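
The three-step process suggested in point 1 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch that was applied: `RefRecord` and `ThreeStepResolver` are hypothetical names, and two in-memory maps stand in for the database queries that the real code would run through Hibernate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for a stored reference; NOT BioJava's DocRef or SimpleDocRef. */
final class RefRecord {
    final String pubmedId;  // may be null when the record carries no PUBMED/MEDLINE id
    final String authors;
    final String title;     // may be null
    final String location;

    RefRecord(String pubmedId, String authors, String title, String location) {
        this.pubmedId = pubmedId;
        this.authors = authors;
        this.title = title;
        this.location = location;
    }
}

/** Hypothetical resolver; a real one would query the BioSQL tables instead of these maps. */
final class ThreeStepResolver {
    private final Map<String, RefRecord> byId = new HashMap<>();
    private final Map<List<String>, RefRecord> byText = new HashMap<>();
    private int created = 0;

    RefRecord resolve(String pubmedId, String authors, String title, String location) {
        // Step 1: test the ID.
        if (pubmedId != null) {
            RefRecord byIdHit = byId.get(pubmedId);
            if (byIdHit != null) {
                return byIdHit;
            }
        }
        // Step 2: test author/title/location (exact match; Arrays.asList tolerates a null title).
        List<String> key = Arrays.asList(authors, title, location);
        RefRecord byTextHit = byText.get(key);
        if (byTextHit != null) {
            return byTextHit;
        }
        // Step 3: still no match, so create a new reference and index it both ways.
        RefRecord fresh = new RefRecord(pubmedId, authors, title, location);
        if (pubmedId != null) {
            byId.put(pubmedId, fresh);
        }
        byText.put(key, fresh);
        created++;
        return fresh;
    }

    int createdCount() {
        return created;
    }

    public static void main(String[] args) {
        ThreeStepResolver resolver = new ThreeStepResolver();
        String authors = "Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., "
                + "Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.";
        String title = "Isolation of lactate-utilizing butyrate-producing bacteria from human feces "
                + "and in vivo administration of Anaerostipes caccae strain L2 and "
                + "galacto-oligosaccharides in a rat model";
        // Two spellings of the same paper's location, both with PubMed ID 18554304.
        RefRecord first = resolver.resolve("18554304", authors, title,
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)");
        RefRecord second = resolver.resolve("18554304", authors, title,
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)");
        System.out.println(first == second);          // true: the ID lookup wins
        System.out.println(resolver.createdCount());  // 1
    }
}
```

Checking the ID first means that spelling variants of the same paper collapse onto one reference, while records without any ID still fall back to the text comparison.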
> >
> > cheers,
> > Richard
> >
> > On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:
> >
> > > Hi Richard
> > >
> > > Below is the email which I sent to the Biojava-l mailing list, but it never got posted on the mailing list server, nor did I get any response, so please have a look at this email and tell me what the solution to the problem described in the message could be.
> > >
> > > Thanks
> > > Deepak Sheoran
> > > -------- Original Message --------
> > > Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> > > Date: Wed, 03 Feb 2010 08:07:35 -0600
> > > From: Deepak Sheoran
> > > To: biojava-l at lists.open-bio.org
> > >
> > > Hi guys,
> > >
> > > A couple of days back I was having a problem with a Hibernate exception, but that exception got resolved; the reference to that email is:
> > > http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> > >
> > > Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another Hibernate error and decided to investigate the matter. Below are some facts and information which I found, and I guess it is going to affect all of us.
> > > - The "Reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id.
> > > This works well, except in cases where there is a little variation in the values of the columns "location", "title" and "authors" and all these variations refer to the same PUBMED_ID. Then we can't persist or create a richsequence object.
> > > Now when you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up details of the object in the database. If it finds one it returns that object, else it tries to persist the new object into the database.
> > > But the problem is with the part of that method below:
> > > ... LineNumber: 114
> > > else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
> > >     queryType = "DocRef";
> > >     // convert List constructor to String representation for query
> > >     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> > >     if (ourParamsList.size()<3) {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> > >     } else {
> > >         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> > >     }
> > > }
> > > ... LineNumber: 123
> > > Now when Hibernate searches the database, it won't find the existing record in the "reference" table, because the two records differ in a string comparison, so it returns a new object back to "GenbankFormat", to the following piece of code:
> > > ... LineNumber: 447
> > > else {
> > >     try {
> > >         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> > >         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> > >         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> > >     } catch (ChangeVetoException e) {
> > >         throw new ParseException(e+", accession:"+accession);
> > >     }
> > > }
> > > ... LineNumber: 455
> > > Then we add that object to rlistener and move to the next part of the GenBank record. BioJava then searches for a new crossref in the database, and when it tries to persist the old one it gets a Hibernate exception regarding violation of the unique constraint on the "dbxref_id" column.
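
The failure described above can be reproduced in miniature. The sketch below is a toy model, not BioJava or Hibernate code: `TextOnlyLookupDemo` and its maps are hypothetical, and a `false` return value stands in for the unique-constraint violation that the database would raise.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a "reference" table with UNIQUE (dbxref_id); all names are illustrative. */
final class TextOnlyLookupDemo {
    // dbxref_id -> text key of the row that owns it (models the unique constraint).
    private final Map<String, List<String>> rowByDbxref = new HashMap<>();
    // (authors, location, title) -> dbxref_id, mimicking a lookup keyed on text only.
    private final Map<List<String>, String> dbxrefByText = new HashMap<>();

    /**
     * Looks a reference up by authors/location/title only and inserts it on a miss.
     * Returns false where the insert would violate the unique constraint on dbxref_id.
     */
    boolean persist(String dbxrefId, String authors, String location, String title) {
        List<String> key = Arrays.asList(authors, location, title);
        if (dbxrefByText.containsKey(key)) {
            return true;  // identical text: the existing row is reused
        }
        if (rowByDbxref.containsKey(dbxrefId)) {
            return false; // text differs but the dbxref_id is already taken
        }
        rowByDbxref.put(dbxrefId, key);
        dbxrefByText.put(key, dbxrefId);
        return true;
    }

    public static void main(String[] args) {
        TextOnlyLookupDemo table = new TextOnlyLookupDemo();
        String authors = "Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., "
                + "Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.";
        String title = "Isolation of lactate-utilizing butyrate-producing bacteria from human feces "
                + "and in vivo administration of Anaerostipes caccae strain L2 and "
                + "galacto-oligosaccharides in a rat model";
        // The first spelling of the location inserts cleanly.
        System.out.println(table.persist("18554304", authors,
                "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)", title)); // true
        // The second spelling misses the text lookup, so an insert is attempted and collides.
        System.out.println(table.persist("18554304", authors,
                "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)", title)); // false
    }
}
```

The collision only appears when the same paper arrives with differently formatted text, which is consistent with most records loading while a minority fail.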
> > >
> > > The only way to get these records into the database is:
> > > - The very easy solution, and the way I did it for testing my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different naming variations, and this change allows us to store that information too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
> > > - The second solution, slightly more difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database or not. I mean testing all possible string variations of the authors, location and title of the DocRef which we are searching for. This has many complications and may slow down the process of creating a richsequence object when RichObjectFactory is linked with an active Hibernate session.
> > >
> > > Example: Below is a sample of what I have in my local BioSQL schema, which has the modification suggested by me. (The dbxref_id column shows the PubMed ID: I replaced the local dbxref_id which was present in this table in my database with the pubmed_id stored in the "dbxref" table, for easy reference with the outside world in this email.)
> > >
> > > Reference_id: 216
> > > Dbxref_id: 18554304
> > > Location: FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> > > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc: 9E940E01F4BE3CD0
> > >
> > > Reference_id: 230
> > > Dbxref_id: 18554304
> > > Location: FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> > > Title: Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> > > Authors: Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> > > crc: D3BC0C17F3F786C9
> > >
> > > Reference_id: 415
> > > Dbxref_id: 16790744
> > > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title: Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> > > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc: 60AEDFA0CEEACC38
> > >
> > > Reference_id: 969
> > > Dbxref_id: 16790744
> > > Location: Infect. Immun. 74 (7), 3715-3726 (2006)
> > > Title: Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> > > Authors: Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> > > crc: 4B1232999F6E8130
> > >
> > > Reference_id: 929
> > > Dbxref_id: 8688087
> > > Location: Science 273 (5278), 1058-1073 (1996)
> > > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc: 3E79B40DD2AAA2B7
> > >
> > > Reference_id: 932
> > > Dbxref_id: 8688087
> > > Location: Science 273 (5278), 1058-1073 (1996)
> > > Title: Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> > > Authors: Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> > > crc: 094EB3384F8D6DE8
> > >
> > > Reference_id: 1426
> > > Dbxref_id: 10684935
> > > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors: Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> > > crc: 357648D8FD8C6C8A
> > >
> > > Reference_id: 1481
> > > Dbxref_id: 10684935
> > > Location: Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> > > Title: Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> > > Authors: Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> > > crc: 115411EB2DEE5654
> > >
> > > Reference_id: 1497
> > > Dbxref_id: 14689165
> > > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc: 4D5D376EECCD186B
> > >
> > > Reference_id: 1501
> > > Dbxref_id: 14689165
> > > Location: Arch. Microbiol. 181 (2), 144-154 (2004)
> > > Title: The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> > > Authors: Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> > > crc: 4D57954EECDED66B
> > >
> > > Reference_id: 1556
> > > Dbxref_id: 18060065
> > > Location: PLoS ONE 2 (12), E1271 (2007)
> > > Title: Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> > > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc: 698688FB6DB95247
> > >
> > > Reference_id: 1559
> > > Dbxref_id: 18060065
> > > Location: PLoS ONE 2 (12), E1271 (2007)
> > > Title: Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> > > Authors: Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> > > crc: E25E1BA99DB18F3D
> > >
> > > - The second kind of error which I got was: org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
> > >   - This means that in the richsequence object some feature has a location object whose feature is set to null.
> > >   - My observation:
> > >     - It usually occurs when you try to persist a richsequence object to the database, and it occurs for those features which have a CompoundRichLocation, usually "joins" and "complement" in the CDS region of a GenBank record.
> > >     - After catching the Hibernate exception I went through all the features, and either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
> > >     - Below is the screenshot of one of my tests:
> > >       - Settings before trying to persist the richsequence object to the database [screenshot not preserved]
> > >       - After trying to persist the richsequence object to the database, inside the Hibernate exception catch [screenshot not preserved]
> > >   - So my question is why this is happening, and how to stop it or how to get these records into the database. I have no clue why this is happening.
> > >   - Some extra information to make things clearer to you guys:
> > >     - Below are some LOCUS lines from GenBank records for which I know the location of the error (I mean the CDS region causing the error) and the array index in the richsequence.feature ArrayList object.
> > >       - LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
> > >         richSequence.feature index: 2540, line number in the GenBank record: 22115
> > >       - LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
> > >         richSequence.feature index: 127, line number in the GenBank record: 2137
> > >       - LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
> > >         richSequence.feature index: 389, line number in the GenBank record: 3632
> > >       - LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
> > >         richSequence.feature index: 47, line number in the GenBank record: 4841
> > >       - LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
> > >         richSequence.feature index: 45, line number in the GenBank record: 442
The complete exception msg : > > > org.hibernate.PropertyValueException: not-null property references a > null or transient value: Location.feature > > > at > org.hibernate.engine.Nullability.checkNullability(Nullability.java:72) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > > at > org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > > at > org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > > > at > org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > > > at > org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > > > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at > org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > > > at > org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > > > at > 
org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > > at > org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > > at > org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > > > at > org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > > > at > org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > > > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at > org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > > > at > org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > > > at > org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > > > at > org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > > > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > > > at > org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > > > at > 
org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > > > at > org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > > > at > org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > > > at > org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > > > at > org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27) > > > at > org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > > > at > org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535) > > > at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523) > > > at > trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78) > > > > > > > > > > -- > > Richard Holland, BSc MBCS > > Operations and Delivery Director, Eagle Genomics Ltd > > T: +44 (0)1223 654481 ext 3 | E: > > holland at eaglegenomics.com > > http://www.eaglegenomics.com/ > > > > > > > > > > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > > From andreas at sdsc.edu Thu Mar 25 16:56:21 2010 From: andreas at sdsc.edu (Andreas Prlic) Date: Thu, 25 Mar 2010 09:56:21 -0700 Subject: [Biojava-l] Biojava-l Digest, Vol 86, Issue 9 In-Reply-To: <4BAB70D6.5060309@uni-tuebingen.de> References: <3baec70c1003190350h7e73c1e7q726b823b5e3eae82@mail.gmail.com> <59a41c431003191042l406bdba1g501290b96c018ed0@mail.gmail.com> 
<3baec70c1003210007t39f22261j9838b3eb5b9ff861@mail.gmail.com> <59a41c431003221646u2b1da091uc81d601dbf412599@mail.gmail.com> <3baec70c1003240332n25b11e10i23c2c00c96a71a89@mail.gmail.com> <59a41c431003240837w3db713a3v3161b4e078faa483@mail.gmail.com> <4BAB70D6.5060309@uni-tuebingen.de>
Message-ID: <59a41c431003250956h14abdbe2t1367bec10069d1f3@mail.gmail.com>

Hi Andreas,

that sounds great! I'll take a look at this soon...

Thanks,
Andreas

On Thu, Mar 25, 2010 at 7:19 AM, Andreas Dräger <andreas.draeger at uni-tuebingen.de> wrote:

> Hi Andreas and Shakuntala,
>
> The alignment classes have just been revised and can now be updated from
> the repository. As a major improvement, the alignment result has become much
> easier to use. So, if you're interested in computing something based on the
> score, you can now simply call the dedicated get method and don't have to
> care about parsing anymore. I hope that helps.
>
> Cheers
> Andreas
>
> --
> Dipl.-Bioinform. Andreas Dräger
> Eberhard Karls University Tübingen
> Center for Bioinformatics (ZBIT)
> Sand 1
> 72076 Tübingen
> Germany
>
> Phone: +49-7071-29-70436
> Fax: +49-7071-29-5091
>

From zhangyiwei79 at gmail.com Thu Mar 25 20:14:50 2010
From: zhangyiwei79 at gmail.com (Yiwei Zhang)
Date: Thu, 25 Mar 2010 16:14:50 -0400
Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment project of Google Summer of Code
Message-ID: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com>

Hi,

I am a graduate student of computer science and my field of study is related
to bioinformatics algorithms. I am proficient in Java programming. I am
very interested in this project because I am currently working on sequence
alignment and phylogenetic tree reconstruction.

My question is: if the project requires implementing the existing
alignment algorithms of current tools, what is the original implementation
language of those tools? C++, C, or something else?

Thanks!
From biopython at maubp.freeserve.co.uk Thu Mar 25 22:16:55 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Mar 2010 22:16:55 +0000
Subject: [Biojava-l] [Biojava-dev] Bug fix for Biojava in regard to email with subject : ( Hibernate Exception and suggestion for change in BioSqlSchema)
In-Reply-To: <4BABAFA1.6090806@orionbiosciences.com>
References: <4BAABA21.4000301@gmail.com> <4FAB0AC5-3D97-4FD8-8A7E-81D1D6381D39@eaglegenomics.com> <4BABAFA1.6090806@orionbiosciences.com>
Message-ID: <320fb6e01003251516w2977ab2h9869342f94576287@mail.gmail.com>

On Thu, Mar 25, 2010 at 6:46 PM, Deepak Sheoran wrote:
>
> That is the reason why I was getting an error when I was creating a RichSequence
> object without any active session to BioSQL. I didn't have a clue that I had
> created one more bug by fixing one; thanks for noticing that and fixing
> it.
>
> I am wondering whether we should propose BioPerl, BioJava and BioSQL compatibility as
> one of the Google Summer of Code projects. I have a vision for this, but don't
> know the right way to begin. This could help people who want to use BioJava
> but can't because they are afraid to lose their Perl code, which is heavily
> dependent on the Perl way of loading the schema. Or we could come up with a hybrid way
> that takes the good from both languages.
>
> Deepak Sheoran

That is an interesting idea for GSoC; I wonder if we at Biopython should
do the same. I know of a few things where we differ from BioPerl's
BioSQL support (e.g. SwissProt comment lines).

[I take it we agree that bioperl-db is the de facto reference implementation
for mapping GenBank etc. into BioSQL?]
Peter From chapman at cs.wisc.edu Fri Mar 26 07:14:24 2010 From: chapman at cs.wisc.edu (Mark Chapman) Date: Fri, 26 Mar 2010 02:14:24 -0500 Subject: [Biojava-l] Question about All-Java Multiple Sequence Alignment project of Google Summer of Code In-Reply-To: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com> References: <55c06db21003251314u142e44d3v25dac787216234e0@mail.gmail.com> Message-ID: <4BAC5ED0.1050009@cs.wisc.edu> Hi Yiwei (and list members), I am also a graduate student in Bioinformatics interested in the Google Summer of Code project. The authors' current implementations of ClustalW and ClustalX are written in C++. Binaries, code, and references are located at http://www.clustal.org/ . Download the boldfaced references (Larkin et al 2007 and Thompson et al 1994) for the most relevant information. Take care, Mark On 3/25/2010 3:14 PM, Yiwei Zhang wrote: > Hi, > > I am a graduate student of computer science and my field of study is related > to Bioinformatic algorithms. I am proficient at JAVA programming. I feel > very interested in this project because currently I am working on sequence > alignment and phylogeny tree reconstruction. > > My question is that, if the project requires implementing the existing > alignment algorithms of current tools, what is the > original implementation language of the tools? C++ or C or something else? > > Thanks! 
> _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From bernd.jagla at pasteur.fr Fri Mar 26 09:33:05 2010 From: bernd.jagla at pasteur.fr (Bernd Jagla) Date: Fri, 26 Mar 2010 10:33:05 +0100 Subject: [Biojava-l] SVN repository In-Reply-To: <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> References: <4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> Message-ID: <776506315DB04C3EBF2A7FDA610390AB@zillumina> Hi, I am trying to check out biojava for the first time, and I am not sure if the server is still down... Could you please let me if it is up or down? Thanks, Bernd > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- > bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Wednesday, March 17, 2010 6:40 PM > To: Richard Finkers > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SVN repository > > I have just heard back from the OBF-helpdesk. The VM hosting the anonymous > SVN is currently down. Depending on how big the problem turns out to be, > it > will be back at some point later today / should be back latest tomorrow. > > Sorry for this inconvenience. > Andreas > > > > > On Wed, Mar 17, 2010 at 3:16 AM, Peter > wrote: > > > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers > > > wrote: > > > > > > Hi, > > > > > > I would like to have a look at the BioJava 3 code (and perhaps in the > > future > > > contribute to). However, I cannot access the SVN repository > > > ( > > http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava- > live/trunk > > ). > > > > > > Is the repository down? 
> > > > > > Thanks, > > > Richard > > > > Probably :( > > > > There have been problems discussed on the BioPerl mailing list > > (they use the same servers), and the OBF team are aware of it: > > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html > > > > The code.open-bio.org repositories are a read only public mirror, > > while dev.open-bio.org is the master repository I think is fine > > (but not available for anonymous download). > > > > In the mean time BioPerl have also setup a read only mirror > > on github - perhaps BioJava could do the same? Meanwhile > > BioRuby and Biopython are just using github (not SVN or CVS). > > > > Peter > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From mitlox at op.pl Fri Mar 26 09:57:41 2010 From: mitlox at op.pl (xyz) Date: Fri, 26 Mar 2010 19:57:41 +1000 Subject: [Biojava-l] sort fasta file In-Reply-To: References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> Message-ID: <20100326195741.4799c398@wp01> @Andy: Thank you for the explanation. After the last sequence in the input file in no newline character. @James: I change the code in order to get the biggest sequence first, but the last sequence is missing. 

import java.io.*;
import java.util.*;
import org.biojava.bio.BioException;
import org.biojava.bio.symbol.*;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.*;
import java.util.Comparator;

public class SortFasta2 {

    static private class RichSequenceComparator implements Comparator<RichSequence> {

        // Longest sequence first; returns 0 for two sequences of equal length.
        public int compare(RichSequence seq1, RichSequence seq2) {
            return seq2.length() - seq1.length();
        }
    }

    // Usage: SortFasta unsortedFile.fasta
    public static void main(String[] args) throws FileNotFoundException, BioException {
        String fastaFile = "sortFasta.fasta";
        BufferedReader br = new BufferedReader(new FileReader(fastaFile));
        SimpleNamespace ns = new SimpleNamespace("biojava");
        Alphabet protein = AlphabetManager.alphabetForName("DNA");
        RichSequenceIterator rsi = RichSequence.IOTools.readFasta(br, protein.getTokenization("token"), ns);

        SortedSet<RichSequence> sorted = new TreeSet<RichSequence>(new SortFasta2.RichSequenceComparator());
        while (rsi.hasNext()) {
            sorted.add(rsi.nextRichSequence());
        }

        Iterator<RichSequence> sortedIt = sorted.iterator();
        /* Do whatever you want here with the list of RichSequences sorted
           by length (longest first), I'll just print them. */
        while (sortedIt.hasNext()) {
            //System.out.println(((RichSequence) sortedIt.next()).length());
            //System.out.println(sortedIt.next().getComments());
            System.out.println(sortedIt.next().seqString());
        }
    }
}

Input file:

>1
atccccc
>2
atccccctttttt
>3
atccccccccccccccccctttt
>4
tttttttccccccccccccccccccccccc
>5
tttttttcccccccccccccccccccccca

Output on the screen:

tttttttccccccccccccccccccccccc
atccccccccccccccccctttt
atccccctttttt
atccccc

How can I get the last sequence as well, and print the output in FASTA format on the screen?

Thank you in advance.

On Thu, 25 Mar 2010 10:17:31 -0400 James Swetnam wrote:

> Just replace the system.out.println with whatever you want to do with
> the sequences; write them to a file, etc.
>
> James
>

On Fri, 26 Mar 2010 09:40:28 +0000 "Andy Law (RI)" wrote:

> Does your input file have a line feed at the end or not?
(Just a > thought) > > Comparable is for comparing two objects using their "natural" > ordering and is therefore a "fundamental" property of the class. A > Comparator lets you compare/sort two objects on any characteristics > and you can have many different comparators. Since this is a somewhat > arbitrary way of comparing sequences (you could sort them on > alphabetical sequence for example, or GC content), I guess that's why > James used a comparator. > From richard.finkers at wur.nl Fri Mar 26 10:10:39 2010 From: richard.finkers at wur.nl (Finkers, Richard) Date: Fri, 26 Mar 2010 11:10:39 +0100 Subject: [Biojava-l] SVN repository References: <4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> <776506315DB04C3EBF2A7FDA610390AB@zillumina> Message-ID: <33AFFE3255BCA043AF09514A6F6BFBAED04C0D@scomp0039.wurnet.nl> Hi Bernd, It has been working for two days but it seems to be down again. Richard -----Original Message----- From: Bernd Jagla [mailto:bernd.jagla at pasteur.fr] Sent: Fri 2010-03-26 10:33 To: 'Andreas Prlic'; Finkers, Richard Cc: biojava-l at lists.open-bio.org Subject: RE: [Biojava-l] SVN repository Hi, I am trying to check out biojava for the first time, and I am not sure if the server is still down... Could you please let me if it is up or down? Thanks, Bernd > -----Original Message----- > From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- > bounces at lists.open-bio.org] On Behalf Of Andreas Prlic > Sent: Wednesday, March 17, 2010 6:40 PM > To: Richard Finkers > Cc: biojava-l at lists.open-bio.org > Subject: Re: [Biojava-l] SVN repository > > I have just heard back from the OBF-helpdesk. The VM hosting the anonymous > SVN is currently down. Depending on how big the problem turns out to be, > it > will be back at some point later today / should be back latest tomorrow. > > Sorry for this inconvenience. 
> Andreas > > > > > On Wed, Mar 17, 2010 at 3:16 AM, Peter > wrote: > > > On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers > > > wrote: > > > > > > Hi, > > > > > > I would like to have a look at the BioJava 3 code (and perhaps in the > > future > > > contribute to). However, I cannot access the SVN repository > > > ( > > http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava- > live/trunk > > ). > > > > > > Is the repository down? > > > > > > Thanks, > > > Richard > > > > Probably :( > > > > There have been problems discussed on the BioPerl mailing list > > (they use the same servers), and the OBF team are aware of it: > > http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html > > > > The code.open-bio.org repositories are a read only public mirror, > > while dev.open-bio.org is the master repository I think is fine > > (but not available for anonymous download). > > > > In the mean time BioPerl have also setup a read only mirror > > on github - perhaps BioJava could do the same? Meanwhile > > BioRuby and Biopython are just using github (not SVN or CVS). > > > > Peter > > _______________________________________________ > > Biojava-l mailing list - Biojava-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/biojava-l > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l From andy.law at roslin.ed.ac.uk Fri Mar 26 10:12:11 2010 From: andy.law at roslin.ed.ac.uk (Andy Law (RI)) Date: Fri, 26 Mar 2010 10:12:11 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <20100326195741.4799c398@wp01> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> Message-ID: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> On 26 Mar 2010, at 09:57, xyz wrote: > @Andy: Thank you for the explanation. After the last sequence in the > input file in no newline character. 
> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? Later, Andy -------- Yada, yada, yada... The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. From andy.law at roslin.ed.ac.uk Fri Mar 26 10:36:25 2010 From: andy.law at roslin.ed.ac.uk (Andy Law (RI)) Date: Fri, 26 Mar 2010 10:36:25 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> Message-ID: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> On 26 Mar 2010, at 10:28, Richard Holland wrote: > That there be a bug. Albeit one with a simple workaround while the SVN server is broken :o} > > On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: > >> >> On 26 Mar 2010, at 09:57, xyz wrote: >> >>> @Andy: Thank you for the explanation. After the last sequence in the >>> input file in no newline character. >>> >> >> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are >> not seeing the last sequence when the file is not terminated with a >> newline character. Is this a bug or a feature, folks? >> >> Later, >> >> Andy >> -------- >> Yada, yada, yada... >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336 >> Disclaimer: This e-mail and any attachments are confidential and >> intended solely for the use of the recipient(s) to whom they are >> addressed. 
If you have received it in error, please destroy all >> copies and inform the sender. >> >> >> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > -- > Richard Holland, BSc MBCS > Operations and Delivery Director, Eagle Genomics Ltd > T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > Later, Andy -------- Yada, yada, yada... The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. From holland at eaglegenomics.com Fri Mar 26 10:28:19 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 26 Mar 2010 10:28:19 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> Message-ID: <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> That there be a bug. On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: > > On 26 Mar 2010, at 09:57, xyz wrote: > >> @Andy: Thank you for the explanation. After the last sequence in the >> input file in no newline character. >> > > Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? > > Later, > > Andy > -------- > Yada, yada, yada... 
> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Mar 26 10:41:21 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 26 Mar 2010 10:41:21 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> Message-ID: Do you have a fix? I can't remember if you've got SVN access or not - if you do, please do commit it, otherwise email me a patch and I'll commit it for you. On 26 Mar 2010, at 10:36, Andy Law (RI) wrote: > > On 26 Mar 2010, at 10:28, Richard Holland wrote: > >> That there be a bug. > > Albeit one with a simple workaround while the SVN server is broken :o} > >> >> On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: >> >>> >>> On 26 Mar 2010, at 09:57, xyz wrote: >>> >>>> @Andy: Thank you for the explanation. After the last sequence in the >>>> input file in no newline character. >>>> >>> >>> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? 
>>> >>> Later, >>> >>> Andy >>> -------- >>> Yada, yada, yada... >>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 >>> Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > Later, > > Andy > -------- > Yada, yada, yada... > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. > > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Mar 26 11:04:22 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 26 Mar 2010 11:04:22 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> Message-ID: I can't see anything in the code that would cause that behaviour. 
:( Could you provide sample code and a supporting FASTA file that replicates the problem? On 26 Mar 2010, at 10:36, Andy Law (RI) wrote: > > On 26 Mar 2010, at 10:28, Richard Holland wrote: > >> That there be a bug. > > Albeit one with a simple workaround while the SVN server is broken :o} > >> >> On 26 Mar 2010, at 10:12, Andy Law (RI) wrote: >> >>> >>> On 26 Mar 2010, at 09:57, xyz wrote: >>> >>>> @Andy: Thank you for the explanation. After the last sequence in the >>>> input file in no newline character. >>>> >>> >>> Then RichSequenceIterator / RichSequence.IOTools.readFasta() are not seeing the last sequence when the file is not terminated with a newline character. Is this a bug or a feature, folks? >>> >>> Later, >>> >>> Andy >>> -------- >>> Yada, yada, yada... >>> The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 >>> Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. >>> >>> >>> >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >> >> -- >> Richard Holland, BSc MBCS >> Operations and Delivery Director, Eagle Genomics Ltd >> T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> > > Later, > > Andy > -------- > Yada, yada, yada... > The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336 > Disclaimer: This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. 
> > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From Richard.Finkers at wur.nl Fri Mar 26 16:27:59 2010 From: Richard.Finkers at wur.nl (Richard Finkers) Date: Fri, 26 Mar 2010 17:27:59 +0100 Subject: [Biojava-l] SVN repository In-Reply-To: <776506315DB04C3EBF2A7FDA610390AB@zillumina> References: <4BA082EC.8010908@wur.nl><320fb6e01003170316x4fbf924do95dcc84703eaa28e@mail.gmail.com> <59a41c431003171039h4ca1267bibc45b0d7d270b2a9@mail.gmail.com> <776506315DB04C3EBF2A7FDA610390AB@zillumina> Message-ID: <4BACE08F.8020604@wur.nl> The repository has been back for two days. But it appears to be down again. Richard Bernd Jagla wrote: > Hi, > > I am trying to check out biojava for the first time, and I am not sure if > the server is still down... Could you please let me if it is up or down? > > Thanks, > > Bernd > > >> -----Original Message----- >> From: biojava-l-bounces at lists.open-bio.org [mailto:biojava-l- >> bounces at lists.open-bio.org] On Behalf Of Andreas Prlic >> Sent: Wednesday, March 17, 2010 6:40 PM >> To: Richard Finkers >> Cc: biojava-l at lists.open-bio.org >> Subject: Re: [Biojava-l] SVN repository >> >> I have just heard back from the OBF-helpdesk. The VM hosting the anonymous >> SVN is currently down. Depending on how big the problem turns out to be, >> it >> will be back at some point later today / should be back latest tomorrow. >> >> Sorry for this inconvenience. >> Andreas >> >> >> >> >> On Wed, Mar 17, 2010 at 3:16 AM, Peter >> wrote: >> >> >>> On Wed, Mar 17, 2010 at 7:21 AM, Richard Finkers >>> >> >> >>> wrote: >>> >>>> Hi, >>>> >>>> I would like to have a look at the BioJava 3 code (and perhaps in the >>>> >>> future >>> >>>> contribute to). However, I cannot access the SVN repository >>>> ( >>>> >>> http://code.open-bio.org/svnweb/index.cgi/biojava/browse/biojava- >>> >> live/trunk >> >>> ). 
>>> >>>> Is the repository down? >>>> >>>> Thanks, >>>> Richard >>>> >>> Probably :( >>> >>> There have been problems discussed on the BioPerl mailing list >>> (they use the same servers), and the OBF team are aware of it: >>> http://lists.open-bio.org/pipermail/bioperl-l/2010-March/032534.html >>> >>> The code.open-bio.org repositories are a read only public mirror, >>> while dev.open-bio.org is the master repository I think is fine >>> (but not available for anonymous download). >>> >>> In the mean time BioPerl have also setup a read only mirror >>> on github - perhaps BioJava could do the same? Meanwhile >>> BioRuby and Biopython are just using github (not SVN or CVS). >>> >>> Peter >>> _______________________________________________ >>> Biojava-l mailing list - Biojava-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/biojava-l >>> >>> >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l >> > > > > -- Dr. Richard Finkers Researcher Plant Breeding Wageningen UR Plant Breeding P.O. Box 16, 6700 AA, Wageningen, The Netherlands Wageningen Campus, Building 107, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands Tel. +31-317-484165 Fax +31-317-418094 http://www.plantbreeding.wur.nl/ https://www.eu-sol.wur.nl/ https://cbsgdbase.wur.nl/ http://www.disclaimer-uk.wur.nl/ From mitlox at op.pl Sat Mar 27 01:49:46 2010 From: mitlox at op.pl (xyz) Date: Sat, 27 Mar 2010 11:49:46 +1000 Subject: [Biojava-l] Reading and writting Fastq files Message-ID: <20100327114946.276925da@wp01> Hello, I could not find any examples how to read or write fastq files. 
import java.io.BufferedReader; import java.io.FileNotFoundException; import java.io.FileReader; import org.biojava.bio.program.fastq.FastqReader; public class Fastq2Fasta { public static void main(String[] args) throws FileNotFoundException { BufferedReader br = new BufferedReader(new FileReader("fastq2fasta.fasta")); } } Are there any examples how to work with fastq files? Thank you in advance. Best regards, From holland at eaglegenomics.com Sat Mar 27 08:18:04 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Sat, 27 Mar 2010 08:18:04 +0000 Subject: [Biojava-l] sort fasta file In-Reply-To: <20100327100348.1f253bfb@wp01> References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> <20100327100348.1f253bfb@wp01> Message-ID: <2AC8333D-EE71-495E-9C12-98764D81FE2D@eaglegenomics.com> Andy and I came to the conclusion yesterday that this is probably a bug with Java itself - somewhere in the readLine() method in BufferedReader. There's nothing in BioJava that could cause this kind of behaviour other than if it was being fed duff information by BufferedReader. On 27 Mar 2010, at 00:03, xyz wrote: > Please find the input fasta file attached. This file I created under > Linux and I also work with BioJava under Linux. Nothing change if I > created after the last sequence a new line. > > On Fri, 26 Mar 2010 11:04:22 +0000 > Richard Holland wrote: > >> I can't see anything in the code that would cause that >> behaviour. :( Could you provide sample code and a supporting FASTA >> file that replicates the problem? 
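
A note on the missing sequence in the SortFasta2 code from this thread: it may not be a parsing problem at all. A TreeSet treats two elements as duplicates whenever its Comparator returns 0, and the comparator in that code compares sequence length only. Sequences 4 and 5 in the posted input appear to be the same length, so the second one would be silently dropped by add(), with or without a trailing newline in the file. The plain-Java sketch below reproduces the effect with ordinary strings and shows one possible way around it, sorting a List instead. The class name TreeSetLengthDemo and the sample sequences are made up for illustration, and no BioJava classes are used.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class TreeSetLengthDemo {

    // Longest first, the same idea as the length-only comparator in the thread.
    static final Comparator<String> BY_LENGTH_DESC = new Comparator<String>() {
        public int compare(String a, String b) {
            return b.length() - a.length();
        }
    };

    // A TreeSet keeps only one element per group that compares as equal,
    // so equal-length sequences collapse into a single entry.
    static SortedSet<String> sortWithTreeSet(List<String> seqs) {
        SortedSet<String> sorted = new TreeSet<String>(BY_LENGTH_DESC);
        sorted.addAll(seqs);
        return sorted;
    }

    // Sorting a List keeps every element; ties stay in input order
    // because Collections.sort is a stable sort.
    static List<String> sortWithList(List<String> seqs) {
        List<String> sorted = new ArrayList<String>(seqs);
        Collections.sort(sorted, BY_LENGTH_DESC);
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical sequences; the last two have the same length.
        List<String> seqs = Arrays.asList("at", "atcc", "atccgg", "ttttcccc", "ttttccca");
        System.out.println("TreeSet: " + sortWithTreeSet(seqs));
        System.out.println("List:    " + sortWithList(seqs));
    }
}
```

If a sorted set is still wanted, another option is to break ties inside the comparator (for example on the sequence name), so that it returns 0 only for genuinely identical entries.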
>>
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From mitlox at op.pl Sat Mar 27 09:48:14 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 19:48:14 +1000
Subject: [Biojava-l] sort fasta file
In-Reply-To:
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk>
Message-ID: <20100327194814.1acc8655@wp01>

You can find the input FASTA file here:
http://mitlox.republika.pl/sortFasta.fasta
I created this file under Linux, and I also run BioJava under Linux.
Nothing changes if I add a newline after the last sequence.

On Fri, 26 Mar 2010 11:04:22 +0000 Richard Holland wrote:

> I can't see anything in the code that would cause that
> behaviour. :( Could you provide sample code and a supporting FASTA
> file that replicates the problem?
>

From voisingreg at yahoo.fr Sat Mar 27 11:24:01 2010
From: voisingreg at yahoo.fr (gregory voisin)
Date: Sat, 27 Mar 2010 11:24:01 +0000 (GMT)
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <20100327194814.1acc8655@wp01>
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> <20100327194814.1acc8655@wp01>
Message-ID: <832231.74869.qm@web23207.mail.ird.yahoo.com>

Hi,

How do I unsubscribe from this list?

Thanks,
greg

________________________________
From: xyz
To: Richard Holland
Cc: Andy Law (RI); "biojava-l at lists.open-bio.org"
Sent: Sat, 27 March 2010, 6:48:14
Subject: Re: [Biojava-l] sort fasta file

You can find the input FASTA file here:
http://mitlox.republika.pl/sortFasta.fasta
I created this file under Linux, and I also run BioJava under Linux.
Nothing changes if I add a newline after the last sequence.

On Fri, 26 Mar 2010 11:04:22 +0000 Richard Holland wrote:

> I can't see anything in the code that would cause that
> behaviour. :( Could you provide sample code and a supporting FASTA
> file that replicates the problem?
>

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From mitlox at op.pl Sat Mar 27 13:54:40 2010
From: mitlox at op.pl (xyz)
Date: Sat, 27 Mar 2010 23:54:40 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
References: <20100327114946.276925da@wp01> <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com>
Message-ID: <20100327235440.23cffb47@wp01>

Hello,
I would like to use org.biojava.bio.program.fastq in order to read and
write Illumina FASTQ files.

Are there any BioJava examples of how to work with FASTQ files?

On Sat, 27 Mar 2010 17:40:21 +0530 jitesh dundas wrote:

> Hello,
>
> Fasta files are normal text files. Try parsing using normal text
> parsing methods.
>
> If you could be more specific & tell me the format details, then I
> could help better.
>
> btw, try using biojava, the easy & better option if you want.
>
> Regards,
> Jitesh Dundas
>
> On 3/27/10, xyz wrote:
> > Hello,
> > I could not find any examples how to read or write fastq files.
> >
> > import java.io.BufferedReader;
> > import java.io.FileNotFoundException;
> > import java.io.FileReader;
> > import org.biojava.bio.program.fastq.FastqReader;
> >
> > public class Fastq2Fasta {
> >   public static void main(String[] args) throws
> > FileNotFoundException { BufferedReader br = new BufferedReader(new
> > FileReader("fastq2fasta.fasta"));
> >   }
> > }
> >
> > Are there any examples how to work with fastq files?
> >
> > Thank you in advance.
> >
> > Best regards,
> >

From heuermh at acm.org Sun Mar 28 04:27:16 2010
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100327235440.23cffb47@wp01>
Message-ID:

Sorry, I haven't written up an example for the Biojava Cookbook yet.

The FASTQ package javadoc API is at

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html

If you want to read Illumina format FASTQ files, use

FastqReader reader = new IlluminaFastqReader();
for (Fastq fastq : reader.read(new File("in.fastq")))
{
  // ...
}

   michael

On Sat, 27 Mar 2010, xyz wrote:

> Hello,
> I would like to use org.biojava.bio.program.fastq in order to read and
> write Illumina fastq files.
>
> Are there any BioJava examples how to work with fastq files?
>
> On Sat, 27 Mar 2010 17:40:21 +0530
> jitesh dundas wrote:
>
> > Hello,
> >
> > Fasta files are normal text files. Try parsing using normal text
> > parsing methods.
> >
> > If you could be more specific & tell me the format details,then I
> > could help better.
> >
> > btw,try using biojava ,the easy & better option if you want.
> >
> > Regards,
> > Jitesh Dundas
> >
> > On 3/27/10, xyz wrote:
> > > Hello,
> > > I could not find any examples how to read or write fastq files.
> > >
> > > import java.io.BufferedReader;
> > > import java.io.FileNotFoundException;
> > > import java.io.FileReader;
> > > import org.biojava.bio.program.fastq.FastqReader;
> > >
> > > public class Fastq2Fasta {
> > >   public static void main(String[] args) throws
> > > FileNotFoundException { BufferedReader br = new BufferedReader(new
> > > FileReader("fastq2fasta.fasta"));
> > >   }
> > > }
> > >
> > > Are there any examples how to work with fastq files?
> > >
> > > Thank you in advance.
> > >
> > > Best regards,
> > >
>

From mitlox at op.pl Sun Mar 28 05:44:57 2010
From: mitlox at op.pl (xyz)
Date: Sun, 28 Mar 2010 15:44:57 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
References: <20100327114946.276925da@wp01> <326ea8621003270510v49c00250xc5ad80bece2ae8cb@mail.gmail.com> <20100327235440.23cffb47@wp01> <326ea8621003270743j2b4f9d24ib3899d415edf3fc3@mail.gmail.com>
Message-ID: <20100328154457.46e088a6@wp01>

Hello,
I could write my own methods to read and write FASTQ files. However, I
downloaded the BioJava source code, and the folder
src/org/biojava/bio/program/fastq contains the following files:

* AbstractFastqReader.java
* AbstractFastqWriter.java
* Fastq.java
* FastqBuilder.java
* FastqReader.java
* FastqVariant.java
* FastqWriter.java
* IlluminaFastqReader.java
* IlluminaFastqWriter.java
* SangerFastqReader.java
* SangerFastqWriter.java
* SolexaFastqReader.java
* SolexaFastqWriter.java

These look like exactly what I need, but unfortunately I do not know
how to use them.
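
A hand-written reader of the kind mentioned at the top of this message could look roughly like the sketch below. It uses only the Java standard library, not the BioJava classes listed above, and it assumes strict four-line records (no wrapped sequence or quality lines). The class name SimpleFastqSketch, its Record type, and the sample read are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled FASTQ reading; all names here are illustrative, not BioJava API. */
public class SimpleFastqSketch {

    /** One four-line FASTQ record. */
    public static final class Record {
        public final String description;
        public final String sequence;
        public final String quality;

        Record(String description, String sequence, String quality) {
            this.description = description;
            this.sequence = sequence;
            this.quality = quality;
        }
    }

    /** Reads strict four-line records: @description, sequence, +..., quality. */
    public static List<Record> readAll(Reader in) throws IOException {
        List<Record> records = new ArrayList<>();
        BufferedReader br = new BufferedReader(in);
        String header;
        while ((header = br.readLine()) != null) {
            if (header.isEmpty()) {
                continue; // tolerate blank lines between records
            }
            String sequence = br.readLine();
            String plus = br.readLine();
            String quality = br.readLine();
            if (!header.startsWith("@") || sequence == null || plus == null
                    || !plus.startsWith("+") || quality == null) {
                throw new IOException("malformed FASTQ record near: " + header);
            }
            if (sequence.length() != quality.length()) {
                throw new IOException("sequence/quality length mismatch in: " + header);
            }
            // Store the description without the leading '@'.
            records.add(new Record(header.substring(1), sequence, quality));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical two-record input; a real run would pass a FileReader instead.
        String sample = "@read1\nACGTAC\n+\nIIIIII\n@read2\nGGCC\n+read2\nHHHH\n";
        for (Record r : readAll(new StringReader(sample))) {
            System.out.println(r.description + " length=" + r.sequence.length());
        }
    }
}
```

This only splits records into their three fields; it does not interpret the quality characters, which is where the Sanger, Solexa, and Illumina variants differ.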

On Sat, 27 Mar 2010 20:13:02 +0530
jitesh dundas wrote:

> Hello,
>
> I could not find much info on that Q.Try the Biojava API for methods.
>
> However, I would think of this problem as a simple text file parsing
> using BufferedReader and ByteInputStream based I/p ..You have to read
> the text file content byte by byte using a while loop. The loop will
> detect each column using the patterns (i haven't worked on fastq or
> biojava that much) in the text file, e.g. space tabs..
> Why don't you try reading this fastq file as a simple text file in
> java.
>
> This is assuming that fastq are text files..Correct me if I am wrong..
> Java tutorial & forums have bulk of egs on that.
>
> Try writing the code and send the fastq file with the java code if you
> face issues..
>
> Hope this helps..
>
> Regards,
> jd

From mitlox at op.pl Sun Mar 28 07:20:40 2010
From: mitlox at op.pl (xyz)
Date: Sun, 28 Mar 2010 17:20:40 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To:
References: <20100327235440.23cffb47@wp01>
Message-ID: <20100328172040.478de1a1@wp01>

Do not worry. I wrote the following code:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;

public class Fastq2Fasta {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
        FastqReader qReader = new IlluminaFastqReader();

        FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq");
        FastqWriter qWriter = new IlluminaFastqWriter();

        for (Fastq fastq : qReader.read(inputFastq)) {
            System.out.println(fastq.getDescription());
            System.out.println(fastq.getSequence());
            String trimSeq = fastq.getSequence().substring(0, fastq.getSequence().length() - 6);
            System.out.println(trimSeq);
            System.out.println(fastq.getQuality());
            String trimQual = fastq.getQuality().substring(0, fastq.getQuality().length() - 6);
            System.out.println(trimQual);

            FastqBuilder trimFastq = new FastqBuilder();
            trimFastq.withDescription(fastq.getDescription());
            trimFastq.appendSequence(trimSeq);
            trimFastq.appendQuality(trimQual);

            qWriter.write(outputFastq, trimFastq.build());
        }
    }
}

and the input FASTQ file is:

@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
+HWI-EAS406:5:1:0:1390#0/1
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
+HWI-EAS406:5:1:0:1390#0/1
PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPBBBBBB
@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAACCCCACC
+HWI-EAS406:5:1:0:1390#0/1
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQCCCCCC

Unfortunately, I get the following error:

HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
Exception in thread "main" java.io.IOException: sequence HWI-EAS406:5:1:0:1390#0/1 not fastq-illumina format, was fastq-sanger
        at org.biojava.bio.program.fastq.IlluminaFastqWriter.validate(IlluminaFastqWriter.java:41)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.append(AbstractFastqWriter.java:67)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:143)
        at org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:125)
        at Fastq2Fasta.main(Fastq2Fasta.java:37)
Java Result: 1

What did I do wrong?

On Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
Michael Heuer wrote:

>
> Sorry, I haven't written up an example for the Biojava Cookbook yet.
>
> The FASTQ package javadoc API is at
>
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html
>
> If you want to read Illumina format FASTQ files, use
>
> FastqReader reader = new IlluminaFastqReader();
> for (Fastq fastq : reader.read(new File("in.fastq")))
> {
>   // ...
> }
>
>    michael

From andreas at sdsc.edu Sun Mar 28 17:44:32 2010
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sun, 28 Mar 2010 10:44:32 -0700
Subject: [Biojava-l] Unsubcribe?
In-Reply-To: <832231.74869.qm@web23207.mail.ird.yahoo.com>
References: <20100320201718.4420a9b9@wp01> <20100325232337.3021200a@wp01> <20100326195741.4799c398@wp01> <7C86676A-ADC3-450E-8AB0-C6D91C3FA50E@roslin.ed.ac.uk> <84422194-0966-487C-BB32-537C6BA256CF@eaglegenomics.com> <1A0823B4-55A9-4C32-9A9C-8499EE26DEF5@roslin.ed.ac.uk> <20100327194814.1acc8655@wp01> <832231.74869.qm@web23207.mail.ird.yahoo.com>
Message-ID: <59a41c431003281044y36137b05nd993e8e51ef7484e@mail.gmail.com>

We are using mailman for our mailing lists:

http://www.biojava.org/mailman/listinfo/biojava-l

Andreas

On Sat, Mar 27, 2010 at 4:24 AM, gregory voisin wrote:

> Hi,
> How to unsubscribe of this list ?
> thanks
> greg
>
> ________________________________
> From: xyz
> To: Richard Holland
> Cc: Andy Law (RI); "biojava-l at lists.open-bio.org"
> Sent: Sat, 27 March 2010, 6:48:14
> Subject: Re: [Biojava-l] sort fasta file
>
> You can find the input fasta file here
> http://mitlox.republika.pl/sortFasta.fasta . This file I created under
> Linux and I also work with BioJava under Linux. Nothing change if I
> created after the last sequence a new line.
>
> On Fri, 26 Mar 2010 11:04:22 +0000
> Richard Holland wrote:
>
> > I can't see anything in the code that would cause that
> > behaviour. :( Could you provide sample code and a supporting FASTA
> > file that replicates the problem?
> >
>

From heuermh at acm.org Tue Mar 30 02:01:23 2010
From: heuermh at acm.org (Michael Heuer)
Date: Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To: <20100328172040.478de1a1@wp01>
Message-ID:

FastqBuilder defaults to the Sanger variant, see

http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT

In your code, you just need to specify the Illumina variant

FastqBuilder trimFastq = new FastqBuilder()
  .withVariant(FastqVariant.FASTQ_ILLUMINA)
  .withDescription(fastq.getDescription())
  .appendSequence(trimSeq)
  .appendQuality(trimQual);

Please let me know if you have any API or doc suggestions, as this stuff
has not been used much by anyone other than myself.

   michael

On Sun, 28 Mar 2010, xyz wrote:

> Do not worry.
I wrote following code:
>
> import java.io.FileInputStream;
> import java.io.FileNotFoundException;
> import java.io.FileOutputStream;
> import java.io.IOException;
> import org.biojava.bio.program.fastq.Fastq;
> import org.biojava.bio.program.fastq.FastqBuilder;
> import org.biojava.bio.program.fastq.FastqReader;
> import org.biojava.bio.program.fastq.FastqWriter;
> import org.biojava.bio.program.fastq.IlluminaFastqReader;
> import org.biojava.bio.program.fastq.IlluminaFastqWriter;
>
> public class Fastq2Fasta {
>
>   public static void main(String[] args) throws FileNotFoundException,
> IOException {
>     FileInputStream inputFastq = new
> FileInputStream("fastq2fasta.fastq"); FastqReader qReader = new
> IlluminaFastqReader();
>
>     FileOutputStream outputFastq = new
> FileOutputStream("fastq2fastaTrim.fastq"); FastqWriter qWriter =
> new IlluminaFastqWriter();
>
>
>     for (Fastq fastq : qReader.read(inputFastq)) {
>       System.out.println(fastq.getDescription());
>       System.out.println(fastq.getSequence());
>       String trimSeq = fastq.getSequence().substring(0,
> fastq.getSequence().length() - 6); System.out.println(trimSeq);
>       System.out.println(fastq.getQuality());
>       String trimQual = fastq.getQuality().substring(0,
> fastq.getQuality().length() - 6); System.out.println(trimQual);
>
>       FastqBuilder trimFastq = new FastqBuilder();
>       trimFastq.withDescription(fastq.getDescription());
>       trimFastq.appendSequence(trimSeq);
>       trimFastq.appendQuality(trimQual);
>
>       qWriter.write(outputFastq, trimFastq.build());
>     }
>   }
> }
>
> and the input fastq file is:
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
> +HWI-EAS406:5:1:0:1390#0/1
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAATCCCACC
> +HWI-EAS406:5:1:0:1390#0/1
> PPPPPPPPPPPPPPPPPPPPPPPPPPPPPPBBBBBB
> @HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAACCCCACC
> +HWI-EAS406:5:1:0:1390#0/1
> QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQCCCCCC
>
> Unfortunately, I get the following error:
> HWI-EAS406:5:1:0:1390#0/1
> GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC
> GGGTGATGGCCGCTGCCGATGGCGTCAAAA
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA
> OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
> Exception in thread "main" java.io.IOException: sequence
> HWI-EAS406:5:1:0:1390#0/1 not fastq-illumina format, was fastq-sanger
> at
> org.biojava.bio.program.fastq.IlluminaFastqWriter.validate(IlluminaFastqWriter.java:41)
> at
> org.biojava.bio.program.fastq.AbstractFastqWriter.append(AbstractFastqWriter.java:67)
> at
> org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:143)
> at
> org.biojava.bio.program.fastq.AbstractFastqWriter.write(AbstractFastqWriter.java:125)
> at Fastq2Fasta.main(Fastq2Fasta.java:37) Java Result: 1
>
> What did I wrong?
>
> On Sun, 28 Mar 2010 00:27:16 -0400 (EDT)
> Michael Heuer wrote:
>
> >
> > Sorry, I haven't written up an example for the Biojava Cookbook yet.
> >
> > The FASTQ package javadoc API is at
> >
> > http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/package-summary.html
> >
> > If you want to read Illumina format FASTQ files, use
> >
> > FastqReader reader = new IlluminaFastqReader();
> > for (Fastq fastq : reader.read(new File("in.fastq")))
> > {
> >   // ...
> > }
> >
> >    michael
>

From mitlox at op.pl Tue Mar 30 11:50:47 2010
From: mitlox at op.pl (xyz)
Date: Tue, 30 Mar 2010 21:50:47 +1000
Subject: [Biojava-l] Reading and writting Fastq files
In-Reply-To:
References: <20100328172040.478de1a1@wp01>
Message-ID: <20100330215047.084f6b00@wp01>

Thank you, it works. However, after I extended the code with

RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());

in order to also get a trimmed FASTA file, I got the following error:

Fastq2Fasta.java:51: cannot find symbol
symbol  : method writeFasta(java.io.FileOutputStream,java.lang.String,org.biojavax.SimpleNamespace,java.lang.String)
location: class org.biojavax.bio.seq.RichSequence.IOTools
        RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
1 error

Complete code:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import org.biojava.bio.program.fastq.Fastq;
import org.biojava.bio.program.fastq.FastqBuilder;
import org.biojava.bio.program.fastq.FastqReader;
import org.biojava.bio.program.fastq.FastqVariant;
import org.biojava.bio.program.fastq.FastqWriter;
import org.biojava.bio.program.fastq.IlluminaFastqReader;
import org.biojava.bio.program.fastq.IlluminaFastqWriter;
import org.biojavax.SimpleNamespace;
import org.biojavax.bio.seq.RichSequence;

public class Fastq2Fasta {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        FileInputStream inputFastq = new FileInputStream("fastq2fasta.fastq");
        FastqReader qReader = new IlluminaFastqReader();

        FileOutputStream outputFastq = new FileOutputStream("fastq2fastaTrim.fastq");
        FastqWriter qWriter = new IlluminaFastqWriter();

        SimpleNamespace ns = new SimpleNamespace("biojava");
        FileOutputStream outputFasta = new FileOutputStream("fastq2fastaTrim.fasta");

        for (Fastq fastq : qReader.read(inputFastq)) {
            System.out.println(fastq.getDescription());
            System.out.println(fastq.getSequence());
            String trimSeq = fastq.getSequence().substring(0, fastq.getSequence().length() - 6);
            System.out.println(trimSeq);
            System.out.println(fastq.getQuality());
            String trimQual = fastq.getQuality().substring(0, fastq.getQuality().length() - 6);
            System.out.println(trimQual);

            FastqBuilder trimFastq = new FastqBuilder();
            trimFastq.withVariant(FastqVariant.FASTQ_ILLUMINA);
            trimFastq.withDescription(fastq.getDescription());
            trimFastq.appendSequence(trimSeq);
            trimFastq.appendQuality(trimQual);

            qWriter.write(outputFastq, trimFastq.build());
            RichSequence.IOTools.writeFasta(outputFasta, trimSeq, ns, fastq.getDescription());
        }
    }
}

What did I do wrong?

Suggestions:

1) After I trimmed the FASTQ file, the description on the quality
header line is empty:

@HWI-EAS406:5:1:0:1390#0/1
GGGTGATGGCCGCTGCCGATGGCGTCAAAA
+
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOO

This reduces the size of the files, but is it compatible with SOAP and
TopHat?

2) I was using FASTQ files of up to 6 GB, and I have not run any
benchmarks with different buffer/stream combinations on big text files,
so I am not sure whether it is enough to use just FileInputStream or
FileOutputStream. BioJavaX uses

BufferedReader br = new BufferedReader(new FileReader())

Is there any speed difference?

Overall I think the API looks good, and for the documentation you could
use this code and put it on BioJava.
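
On point 2, and on the writeFasta error above (the compiler message says no overload takes a String sequence plus a description), one possible workaround for the FASTA side is to skip the sequence objects entirely. The sketch below is plain Java under several assumptions: strict four-line FASTQ records, buffered character streams, and a fixed number of bases trimmed from the 3' end. The class name TrimToFastaSketch and its methods are hypothetical and are not part of BioJava; whether this is faster than the BioJava reader on multi-gigabyte files has not been measured here.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

/** Plain-Java trimming sketch; names are illustrative and not part of BioJava. */
public class TrimToFastaSketch {

    /** Drops the last n characters, or returns "" if the read is not longer than n. */
    public static String trimRight(String s, int n) {
        return s.length() > n ? s.substring(0, s.length() - n) : "";
    }

    /**
     * Streams four-line FASTQ records from in, trims each read by n bases,
     * and writes FASTA to out. Returns the number of records written.
     */
    public static int convert(Reader in, Writer out, int n) throws IOException {
        // Buffering both ends avoids one system call per small read or write.
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        int count = 0;
        String header;
        while ((header = br.readLine()) != null) {
            if (header.isEmpty()) {
                continue; // tolerate blank lines between records
            }
            String sequence = br.readLine();
            String plus = br.readLine();
            String quality = br.readLine();
            if (!header.startsWith("@") || sequence == null || plus == null || quality == null) {
                throw new IOException("malformed FASTQ record near: " + header);
            }
            bw.write(">" + header.substring(1));
            bw.write("\n");
            bw.write(trimRight(sequence, n));
            bw.write("\n");
            count++;
        }
        bw.flush(); // the caller owns and closes the underlying streams
        return count;
    }

    public static void main(String[] args) throws IOException {
        // First record of the input file posted earlier in this thread.
        String sample = "@HWI-EAS406:5:1:0:1390#0/1\n"
                + "GGGTGATGGCCGCTGCCGATGGCGTCAAAACCCACC\n"
                + "+HWI-EAS406:5:1:0:1390#0/1\n"
                + "OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOAAAAAA\n";
        StringWriter out = new StringWriter();
        convert(new StringReader(sample), out, 6);
        System.out.print(out);
    }
}
```

For real files, the StringReader and StringWriter would be replaced by a FileReader and FileWriter; timing that variant against the unbuffered FileInputStream version on the same 6 GB input would answer the speed question directly.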

On Mon, 29 Mar 2010 22:01:23 -0400 (EDT)
Michael Heuer wrote:

>
> FastqBuilder defaults to the Sanger variant, see
>
> http://www.biojava.org/docs/api/org/biojava/bio/program/fastq/FastqBuilder.html#DEFAULT_VARIANT
>
>
> In your code, you just need to specify the Illumina variant
>
> FastqBuilder trimFastq = new FastqBuilder()
>   .withVariant(FastqVariant.FASTQ_ILLUMINA)
>   .withDescription(fastq.getDescription())
>   .appendSequence(trimSeq)
>   .appendQuality(trimQual);
>
>
> Please let me know if you have any API or doc suggestions, as this
> stuff has not been used much by anyone other than myself.
>
>    michael
>

From rmb32 at cornell.edu Fri Mar 26 07:44:09 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 00:44:09 -0700
Subject: [Biojava-l] GSoC mentors mailing list
Message-ID: <4BAC65C9.307@cornell.edu>

Hi all,

If you have volunteered to be a possible GSoC mentor, and have not
already been subscribed to the (mentors-only) gsoc-mentors mailing list,
send me an email and I'll subscribe you.

Rob Buels
OBF GSoC 2010 Admin

From rmb32 at cornell.edu Fri Mar 26 16:30:30 2010
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 26 Mar 2010 09:30:30 -0700
Subject: [Biojava-l] Announcing OBF Summer of Code - please forward!
Message-ID: <4BACE126.1030500@cornell.edu>

Hi all,

Here's an advertising-ready announcement for OBF's Summer of Code,
thanks to Christian Zmasek and Hilmar Lapp for their excellent writing.
Student applications are due April 9! Please spread it widely, we need
to reach lots of students with it!

Rob Buels
OBF GSoC 2010 Admin

============================================================
*** Please disseminate widely at your local institutions ***
*** including posting to message and job boards, so that ***
*** we reach as many students as possible.               ***
============================================================

OPEN BIOINFORMATICS FOUNDATION SUMMER OF CODE 2010

Applications due 19:00 UTC, April 9, 2010.

http://www.open-bio.org/wiki/Google_Summer_of_Code

The Open Bioinformatics Foundation Summer of Code program provides a
unique opportunity for undergraduate, masters, and PhD students to
obtain hands-on experience writing and extending open-source software
for bioinformatics under the mentorship of experienced developers from
around the world. The program is the participation of the Open
Bioinformatics Foundation (OBF) as a mentoring organization in the
Google Summer of Code(tm) (http://code.google.com/soc/).

Students successfully completing the 3 month program receive a $5,000
USD stipend, and may work entirely from their home or home institution.
Participation is open to students from any country in the world except
countries subject to US trade restrictions. Each student will have at
least one dedicated mentor to show them the ropes and help them complete
their project.

The Open Bioinformatics Foundation is particularly seeking students
interested in both bioinformatics (computational biology) and software
development. Some initial project ideas are listed on the website. These
range from Galaxy phylogenetics pipeline development in Biopython to
lightweight sequence objects and lazy parsing in BioPerl, a DAS Server
for large files on local filesystems, and mapping Java libraries to
Perl/Ruby/Python using Biolib+SWIG+JNI. All project ideas are flexible
and many can be adjusted in scope to match the skills of the student. We
also welcome and encourage students proposing their own project ideas;
historically some of the most successful Summer of Code projects are
ones proposed by the students themselves.

TO APPLY:

Apply online at the Google Summer of Code website
(http://socghop.appspot.com/), where you will also find GSoC program
rules and eligibility requirements. The 12-day application period for
students runs from Monday, March 29 through Friday, April 9th, 2010.

INQUIRIES:

We strongly encourage all interested students to get in touch with us
with their ideas as early on as possible. See the OBF GSoC page for
contact details.

2010 OBF Summer of Code:
http://www.open-bio.org/wiki/Google_Summer_of_Code

Google Summer of Code FAQ:
http://socghop.appspot.com/document/show/program/google/gsoc2010/faqs

From sheoran143 at gmail.com Thu Mar 25 01:19:29 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Wed, 24 Mar 2010 20:19:29 -0500
Subject: [Biojava-l] Bug fix for Biojava in regard to email with subject :( Hibernate Exception and suggestion for change in BioSqlSchema)
Message-ID: <4BAABA21.4000301@gmail.com>

I am sending this email again because I did not get any response about
whether these bugs have been patched or whether the message was lost
somewhere on the mailing list. I am not sure, which is why I am writing
again. I don't know how to apply these patches myself, so I am counting
on you to apply them and reply so that I know the issue is fixed.

Thanks
Deepak Sheoran

Hi

In response to the bug fix suggested by Richard, I have created some
patches. We need to apply these to stop BioJava from processing the
references of a GenBank record in the wrong manner, which causes more
Hibernate exceptions. After applying the patches, the reference
resolution code will test the PubMed or Medline ID; if there is no match
it will test author/title/location; and if there is still no match it
will create a new reference. I tested it with GenBank release 175 and
gained almost 3159 more records in my database.

Can somebody please have a look at the second issue and fix it:
"
2. I think that's a bug (compound locations with null features) but not
sure why. Could be that the process of constructing a
CompoundRichLocation is somehow losing the feature reference from the
original SimpleRichLocation. Again I can't investigate until March - can
someone else take a look at the code? (A good starting point would be to
look at how a CompoundRichLocation decides to select the feature from
the SimpleRichLocations it is made up from).
"

I am also planning to build a bridge between BioSQL databases loaded
using BioPerl and BioJava. Here is some of my investigation; can you
suggest some direction on it? Have a look at the attached files:

1) Biojava_BioPerl_Diff.xls ==> a view of the tables where a GenBank
   record is stored in a BioSQL instance by BioPerl and by BioJava
2) GenbankRecord.doc ==> a Word document containing a GenBank record,
   showing where its information goes in BioSQL using BioPerl and BioJava
3) BioSqlRichobjectBuilder.patch ==> patch needed for the
   BioSqlRichObjectBuilder.java class
4) GenBankFormat.patch ==> patch needed for the GenbankFormat.java class

Thanks
Deepak Sheoran

-------- Original Message --------
Subject: Re: Hibernate Exception and suggestion for change in BioSqlSchema
Date: Tue, 9 Feb 2010 20:34:32 +1300
From: Richard Holland
To: Deepak Sheoran
CC: biojava-l at biojava.org

Hi.

It's possible that your original email didn't make it to the list
because it is HTML format, and the list only accepts plain text.

However, in answer to your two questions:

1. The code that does the resolution of references might be better if it
looks up existing IDs rather than using author, title, location to
identify existing records. I would suggest modifying it to a three-step
process - test ID, then if no match then test author/title/location,
then if still no match create a new reference. Could someone do that?
(I'm unable to do anything until late March).

2. I think that's a bug (compound locations with null features) but not
sure why. Could be that the process of constructing a
CompoundRichLocation is somehow losing the feature reference from the
original SimpleRichLocation. Again I can't investigate until March - can
someone else take a look at the code? (A good starting point would be to
look at how a CompoundRichLocation decides to select the feature from
the SimpleRichLocations it is made up from).

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
> Hi Richard
>
> Below is the email which I sent to the Biojava-l mailing list, but it
> was never posted on the mailing list server, nor did I get any
> response. Please have a look at this email and tell me what the
> solution to the problem described in the message could be.
>
>
> Thanks
> Deepak Sheoran
> -------- Original Message --------
> Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> Date: Wed, 03 Feb 2010 08:07:35 -0600
> From: Deepak Sheoran
> To: biojava-l at lists.open-bio.org
>
> Hi guys,
>
> A couple of days back I was having a problem with a Hibernate
> exception, but that exception was resolved; the reference to that
> email is:
> http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> Following Richard's suggestion in the link above I was able to resolve
> some of the issues, but then I got stuck on another Hibernate error and
> decided to investigate the matter. Below are some facts and information
> I found, and I guess they are going to affect all of us.
> * The "Reference" table in the BioSQL schema has a unique constraint on
>   the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE
>   (dbxref_id)), which means only one entry in the reference table can
>   use a given dbxref_id.
> This works well, except in cases where there is a small variation in
> the value of the columns "location", "title", or "authors" and all
> these variations refer to the same PUBMED_ID. Then we can't persist or
> create a RichSequence object.
> Now, when you tie RichObjectFactory to an active Hibernate session, the
> class "BioSqlRichObjectBuilder" has a method called
> "buildObject(Class clazz, List paramsList)" which is responsible for
> looking up the details of an object in the database. If it finds one it
> returns that object; otherwise it tries to persist the new object into
> the database.
> But the problem is with the part of that method below:
> ... LineNumber: 114
> else if (SimpleDocRef.class.isAssignableFrom(clazz))
> { queryType = "DocRef";
>     // convert List constructor to String representation for query
>     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>     if (ourParamsList.size()<3) {
>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>     } else {
>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>     }
> }
> ... LineNumber: 123
> Now when Hibernate searches the database, it won't find the other
> record in the "reference" table, because the two records differ in a
> string comparison, so it returns a new object to "GenbankFormat", to
> the following piece of code:
> ... LineNumber: 447
> else {
>     try {
>         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>     } catch (ChangeVetoException e) {
>         throw new ParseException(e+", accession:"+accession);
>     }
> }
> ... LineNumber: 455
> Then we add that object to rlistener and move on to the next part of
> the GenBank record. When BioJava then searches for a new crossref in
> the database, it tries to persist the old one and gets a Hibernate
> exception regarding violation of the unique constraint on the
> "dbxref_id" column.
>
> The only ways to get these records into the database are:
> * The very easy solution, and the way I did it to test my theory, is to
>   change the BioSQL schema so that it allows a many-to-one relation
>   between the "reference" and "dbxref" tables. This even makes sense,
>   because one paper can have many different naming variations, and this
>   change allows us to store that information too. But this is something
>   the BioSQL people have to decide, and I don't know how to approach
>   them.
> * The second solution, which is slightly more difficult to implement,
>   is to change the way "BioSqlRichObjectBuilder.buildObject(Class
>   clazz, List paramsList)" decides whether a particular DocRef already
>   exists in the database. I mean testing all possible string variations
>   of the authors, location, and title of the DocRef we are searching
>   for. This has many complications and may slow down the creation of a
>   RichSequence object when RichObjectFactory is linked to an active
>   Hibernate session.
>
> Example: below is a sample of what I have in my local BioSQL schema,
> which has the modification I suggested. (The Dbxref_id column shows the
> PubMed ID: I replaced the local dbxref_id in this table with the
> pubmed_id stored in the "dbxref" table, for easy reference with the
> outside world in this email.)
>
> Reference_id: 216
> Dbxref_id:    18554304
> Location:     FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> crc:          9E940E01F4BE3CD0
>
> Reference_id: 230
> Dbxref_id:    18554304
> Location:     FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
> Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> crc:          D3BC0C17F3F786C9
>
> Reference_id: 415
> Dbxref_id:    16790744
> Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> Title:        Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
> Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> crc:          60AEDFA0CEEACC38
>
> Reference_id: 969
> Dbxref_id:    16790744
> Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
> Title:        Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
> Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
> crc:          4B1232999F6E8130
>
> Reference_id: 929
> Dbxref_id:    8688087
> Location:     Science 273 (5278), 1058-1073 (1996)
> Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> crc:          3E79B40DD2AAA2B7
>
> Reference_id: 932
> Dbxref_id:    8688087
> Location:     Science 273 (5278), 1058-1073 (1996)
> Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
> Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
> crc:          094EB3384F8D6DE8
>
> Reference_id: 1426
> Dbxref_id:    10684935
> Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> Authors:      Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
> crc:          357648D8FD8C6C8A
>
> Reference_id: 1481
> Dbxref_id:    10684935
> Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
> Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
> Authors:      Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
> crc:          115411EB2DEE5654
>
> Reference_id: 1497
> Dbxref_id:    14689165
> Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> crc:          4D5D376EECCD186B
>
> Reference_id: 1501
> Dbxref_id:    14689165
> Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
> Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
> Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
> crc:          4D57954EECDED66B
>
> Reference_id: 1556
> Dbxref_id:    18060065
> Location:     PLoS ONE 2 (12), E1271 (2007)
> Title:        Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
> Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> crc:          698688FB6DB95247
>
> Reference_id: 1559
> Dbxref_id:    18060065
> Location:     PLoS ONE 2 (12), E1271 (2007)
> Title:        Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
> Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
> crc:          E25E1BA99DB18F3D
>
> * The second kind of error I got was:
>   org.hibernate.PropertyValueException: not-null property references a
>   null or transient value: Location.feature
> * This means that in the RichSequence object some feature has a
>   location object whose feature is set to null.
> * My observations:
> * It usually occurs when you try to persist a RichSequence object to
>   the database, and it affects features that have a
>   CompoundRichLocation, usually "join" and "complement" locations in
>   the CDS region of a GenBank record.
> * After catching the Hibernate exception I went through all the
>   features, and either BioJava or Hibernate had changed the object type
>   of a CompoundRichLocation to SimpleRichLocation and set the feature
>   variable to null.
> * Below is a screenshot of one of my tests.
> * Settings before trying to persist the RichSequence object to the
>   database:
>
>
> *
> *
After trying to persits the richsequence object to database and got in hibernate exception catch > > ? > > ? So my question is why is this happening and how to stop or how to get these record into database, I have no clue why is this happening. > ? Some extra information to make things more clear to you guys. > ? Below are some Locus line from genbank record for which I know the error of location, I mean the cds region causing error, and array index in richsequence.feature arrayList object. > ? LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006 > ? richSequence.feature Index : 2540 and line number in the genbank record : 22115 > ? LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008 > ? richSequence.feature Index : 127 and line number in the genbank record : 2137 > ? LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008 > ? richSequence.feature Index : 389 and line number in the genbank record : 3632 > ? LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008 > ? richSequence.feature Index : 47 and line number in the genbank record : 4841 > ? LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008 > ? richSequence.feature Index : 45 and line number in the genbank record : 442 > ? 
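The manual check described above (going through every feature and looking for a location whose feature is null) could be sketched roughly as follows. This is only an illustration under stated assumptions: Loc and Feat are hypothetical stand-in classes written for this example, not the BioJava RichLocation and RichFeature types, and the "blocks" list merely models the members of a compound ("join") location.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a location; NOT the BioJava class.
class Loc {
    Feat feature;                                // back-reference that must not be null
    final List<Loc> blocks = new ArrayList<>();  // members of a compound location
}

// Hypothetical stand-in for a feature; NOT the BioJava class.
class Feat {
    final String name;
    Loc location;
    Feat(String name) { this.name = name; }
}

class LocationCheck {
    // Returns the names of features whose location, or any member block of it,
    // has no owning feature set (the condition Hibernate complains about).
    static List<String> findBrokenFeatures(List<Feat> features) {
        List<String> broken = new ArrayList<>();
        for (Feat f : features) {
            if (f.location == null || hasNullOwner(f.location)) {
                broken.add(f.name);
            }
        }
        return broken;
    }

    // Recursively checks a location and its member blocks.
    private static boolean hasNullOwner(Loc loc) {
        if (loc.feature == null) {
            return true;
        }
        for (Loc block : loc.blocks) {
            if (hasNullOwner(block)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Feat gene = new Feat("gene");
        gene.location = new Loc();
        gene.location.feature = gene;

        Feat cds = new Feat("CDS");
        cds.location = new Loc();
        cds.location.feature = cds;
        cds.location.blocks.add(new Loc());      // member block without an owner

        System.out.println(findBrokenFeatures(Arrays.asList(gene, cds))); // prints [CDS]
    }
}
```

Running a check like this before and after the failed save would show whether the back-reference is already missing after parsing or only disappears during the persist attempt.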
> The complete exception message:
> org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>     at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>     at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>     at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>     at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>     at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>     at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Biojava_BioPerl_diff.xls
Type: application/vnd.ms-excel
Size: 346624 bytes
Desc: not available
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: BioSqlRichObjectBuilder.patch
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: GenbankFormat.patch
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GenbankRecord.doc
Type: application/msword
Size: 59392 bytes
Desc: not available
URL: