From mauricio at open-bio.org Fri Feb 5 10:48:30 2010
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Fri, 05 Feb 2010 09:48:30 -0600
Subject: [Biojava-l] Fwd: Changes to NCBI BLAST and E-utilities.
Message-ID: <4B6C3DCE.2070808@open-bio.org>

Forwarding to the proper lists...

-------- Original Message --------
Subject: [O|B|F Helpdesk #889] Changes to NCBI BLAST and E-utilities.
Date: Fri, 5 Feb 2010 10:08:51 -0500
From: mcginnis via RT
Reply-To: support at helpdesk.open-bio.org
To: chris at bioteam.net, heikki at sanbi.ac.za, hlapp at gmx.net, jason at bioperl.org, mauricio at open-bio.org

Fri Feb 05 10:08:51 2010: Request 889 was acted upon.
Transaction: Ticket created by mcginnis at ncbi.nlm.nih.gov
Queue: support at open-bio.org
Subject: Changes to NCBI BLAST and E-utilities.
Owner: Nobody
Requestors: mcginnis at ncbi.nlm.nih.gov
Status: new
Ticket

Dear Colleague:

There are two changes I'd like to make you aware of.

As you may or may not have noticed, we have been working on a new C++ version of the BLAST binaries. In the coming months we will be moving the C++ binaries into prominence and (slowly) phasing out the C toolkit binaries. There are many changes, not least of which is a move to individual binaries for each program (blastn, blastp, etc.). We are not sure how many of your users use BioPerl with the BLAST binaries; my understanding is that many use BioPerl to do remote BLAST. However, there is a change to the BLAST results in Text and presumably HTML. This could have an effect on any parsers which scrape these formats and do not use XML. For obvious reasons, we want to support only the XML format for parsing, but we thought we should give you a heads up on this.
A single line of gaps lacks the Query numbering in the blast+ output. The C version of blast has numbering in this case. Sample alignment shown below.

blast 2.2.22

Query: 3307 ------------------------------------------------------------ 3307
Sbjct: 390  GSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGP 449

blast 2.2.22+

Query       ------------------------------------------------------------
Sbjct  390  GSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGP  449

According to users, the blast+ output without the numbering breaks BioPerl parsers. We have heard from a few users, but I think these may be older parsers?

The second issue is a policy concerning E-utilities. This was announced on the utilities-announce at ncbi.nlm.nih.gov mailing list but you may not have seen it.

As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.

The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request. The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request. NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements.
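
As a rough sketch of what this policy asks of client code, the Java snippet below builds an E-utility request URL and refuses to do so unless both the tool and email values are non-empty. The class name, the buildUrl helper, the esearch.fcgi endpoint path and the example values are illustrative assumptions and are not taken from the announcement; only the &tool and &email parameter names come from the text above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EutilsUrlSketch {

    // Assumed endpoint, used here only for illustration.
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    // Builds a request URL and rejects missing tool/email values up front,
    // so a script fails locally instead of receiving an error from the server.
    static String buildUrl(String db, String term, String tool, String email) {
        if (tool == null || tool.trim().isEmpty()
                || email == null || email.trim().isEmpty()) {
            throw new IllegalArgumentException("tool and email must both be non-empty");
        }
        return BASE + "?db=" + enc(db)
                + "&term=" + enc(term)
                + "&tool=" + enc(tool)
                + "&email=" + enc(email);
    }

    // URL-encodes one parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        // Hypothetical tool name and contact address.
        System.out.println(buildUrl("pubmed", "biojava", "my_script", "maintainer@example.org"));
    }
}
```

Putting the check in one helper means every request path in a script goes through the same validation, which seems simpler than auditing each call site before the deadline.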
These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities.

NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov. Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service.

I think you already adhere to this policy, but should a user's script not meet these requirements, then the script will fail and requests will be turned away with an error message.

Scott D. McGinnis M.A.
NCBI/NLM/NIH
45 Center Drive, MSC 6511
Bldg 45, Room 4AN.44C
Bethesda, MD 20892
mcginnis at ncbi.nlm.nih.gov

From charles at imbusch.net Mon Feb 8 11:25:23 2010
From: charles at imbusch.net (Charles Imbusch)
Date: Mon, 08 Feb 2010 17:25:23 +0100
Subject: [Biojava-l] .sff support
Message-ID: <4B703AF3.4000300@imbusch.net>

Hello,

I have been wondering whether Biojava is able to
handle sff files coming from 454 sequencing runs.

I found something here:
http://lists.open-bio.org/pipermail/biojava-dev/2009-July/003907.html

Does somebody know about the current status on Biojava and sff files?
Thanks in advance,
Charles

From paolo.pavan at gmail.com Mon Feb 8 16:24:49 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Mon, 8 Feb 2010 22:24:49 +0100
Subject: [Biojava-l] .sff support
In-Reply-To: <4B703AF3.4000300@imbusch.net>
References: <4B703AF3.4000300@imbusch.net>
Message-ID: <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com>

Unfortunately, after spending some time on it, I didn't achieve anything, sorry.
There is just one more post, which I sent to Andreas Prlic without including the
list by mistake, in which I report a little more information from my reading on
BioPerl's way of managing contigs and assembly information.
Nothing more.

Paolo

2010/2/8 Charles Imbusch

> Hello,
>
> I have been wondering whether Biojava is able to
> handle sff files coming from 454 sequencing runs.
>
> I found something here:
> http://lists.open-bio.org/pipermail/biojava-dev/2009-July/003907.html
>
> Does somebody know about the current status on Biojava and sff files?
>
>
> Thanks in advance,
> Charles
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>

From biopython at maubp.freeserve.co.uk Mon Feb 8 19:59:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Feb 2010 00:59:37 +0000
Subject: [Biojava-l] .sff support
In-Reply-To: <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com>
References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com>
Message-ID: <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>

> 2010/2/8 Charles Imbusch
>> Hello,
>>
>> I have been wondering whether Biojava is able to
>> handle sff files coming from 454 sequencing runs.
>>
>> I found something here:
>> http://lists.open-bio.org/pipermail/biojava-dev/2009-July/003907.html
>>
>> Does somebody know about the current status on Biojava and sff files?
>>
>>
>> Thanks in advance,
>> Charles

On Mon, Feb 8, 2010 at 9:24 PM, Paolo Pavan wrote:
>
> Unfortunately, after spending some time on it, I didn't achieve anything, sorry.
> There is just one more post, which I sent to Andreas Prlic without including the
> list by mistake, in which I report a little more information from my reading on
> BioPerl's way of managing contigs and assembly information.
> Nothing more.
>
> Paolo

Hi,

I've CC'd the common OpenBio mailing list as this is probably of
interest beyond just BioJava.

Based on code from Jose Blanca (author of sff_extract), I implemented
support for the SFF (Roche 454) sequencing reads for Biopython last
year on a branch that I hope to merge into our next release, currently
here: http://github.com/peterjc/biopython/tree/sff-seqio

In addition to the Roche Manuals (which may not be that easy to get
a copy of), the SFF format is described on this NCBI webpage:
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=formats&m=doc&s=formats#sff

I'm happy to answer questions on how the file format works (including
the undocumented index block which I had to reverse engineer).

Peter

P.S. Just to clarify (from the old BioJava thread), the SFF file just holds
the raw reads - it is an input file for doing an assembly or mapping.

From holland at eaglegenomics.com Tue Feb 9 02:34:32 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 9 Feb 2010 20:34:32 +1300
Subject: [Biojava-l] Hibernate Exception and suggestion for change in BioSqlSchema
In-Reply-To: <4B710CED.5060404@gmail.com>
References: <4B710CED.5060404@gmail.com>
Message-ID: <9BB14CC0-4A89-4DC8-8928-C8475108B54A@eaglegenomics.com>

Hi.

It's possible that your original email didn't make it to the list because it was in HTML format, and the list only accepts plain text.

However, in answer to your two questions:

1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records.
I would suggest modifying it to a three-step process: test the ID; if there is no match, test author/title/location; if there is still no match, create a new reference. Could someone do that? (I'm unable to do anything until late March.)

2. I think that's a bug (compound locations with null features) but I'm not sure why. It could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. Again, I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from.)

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
> Hi Richard
>
> Below is the email which I sent to the Biojava-l mailing list, but it never got posted on the mailing list server, nor did I get any response. Please have a look at this email and tell me what the solution to the problem described in the message could be.
>
>
> Thanks
> Deepak Sheoran
> -------- Original Message --------
> Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> Date: Wed, 03 Feb 2010 08:07:35 -0600
> From: Deepak Sheoran
> To: biojava-l at lists.open-bio.org
>
> Hi guys,
>
> A couple of days back I was having a problem with a Hibernate exception, which got resolved; the reference to that email is: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another Hibernate error. I decided to investigate the matter, and below are some facts and information which I found; I guess they are going to affect all of us.
> - The "Reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)).
> This means only one entry in the reference table can use a given dbxref_id.
> This works well, except in cases where there is a little variation in the values of the "location", "title" and "authors" columns and all these variations refer to the same PUBMED_ID. Then we can't persist or create a richsequence object.
> Now when you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the details of an object in the database. If it finds one it returns that object; otherwise it tries to persist the new object into the database.
> But the problem is with the part of that method below:
> ... LineNumber: 114
> else if (SimpleDocRef.class.isAssignableFrom(clazz))
> { queryType = "DocRef";
>     // convert List constructor to String representation for query
>     ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
>     if (ourParamsList.size()<3) {
>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
>     } else {
>         queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
>     }
> }
> ... LineNumber: 123
> Now when Hibernate searches the database, it won't find a matching record in the "reference" table, because the two records differ under string comparison, so it will return a new object back to "GenbankFormat", to the following piece of code:
> ... LineNumber: 447
> else {
>     try {
>         CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
>         RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
>         rlistener.getCurrentFeature().addRankedCrossRef(rcr);
>     } catch (ChangeVetoException e) {
>         throw new ParseException(e+", accession:"+accession);
>     }
> }
> ... LineNumber: 455
> Then we will add that object to rlistener.
And move to next part of genbank record and then biojava search for a new crossref in database and it will try to persist the old one it get a hibernate exception regarding violation of "unique constraint on dbxref_id" column. > > The only way to get these record in database is: > ? The very easy solution and the way I did it for testing my theory is Change the bioSql schema so that it can allow many to one on relation between "reference" and "dbxref" table. Which even make sense because one paper can have many different variation of naming, and this change allow us to store that info too. But this is something BioSql people have decide and I don't know how to approach them. > ? Second solution is slightly difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" make decision about weather a particular DocRef already exist in database or not. I am mean testing all possible string variations of authors, location, title of the docRef which we are searching. Which does have many complications and may slow down process of creating a richsequence object when link RichObjectFactory with a active hibernate session. > > Example:Below is a sample of what i have in my local biosql schema which has modification suggested by me. (dbxref_id column have Pubmed_id , I replaced the local dbxref_id which was present on this table in my database with pubmed_id stored in "dbxref" table, for easy reference with outside world in this email) > Reference_id > Dbxref_id > Location > Title > Authors > crc > 216 > 18554304 > FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008) > Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model > Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. > 9E940E01F4BE3CD0 > 230 > 18554304 > FEMS Microbiol. Ecol. 
66 (3), 528-536 (2008) > Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model > Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. > D3BC0C17F3F786C9 > 415 > 16790744 > Infect. Immun. 74 (7), 3715-3726 (2006) > Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. > 60AEDFA0CEEACC38 > 969 > 16790744 > Infect. Immun. 74 (7), 3715-3726 (2006) > Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. > 4B1232999F6E8130 > 929 > 8688087 > Science 273 (5278), 1058-1073 (1996) > Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. 
> 3E79B40DD2AAA2B7 > 932 > 8688087 > Science 273 (5278), 1058-1073 (1996) > Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. > 094EB3384F8D6DE8 > 1426 > 10684935 > Nucleic Acids Res. 28 (6), 1397-1406 (2000) > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 > Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. > 357648D8FD8C6C8A > 1481 > 10684935 > Nucleic Acids Res. 28 (6), 1397-1406 (2000) > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 > Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. > 115411EB2DEE5654 > 1497 > 14689165 > Arch. Microbiol. 181 (2), 144-154 (2004) > The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. 
> 4D5D376EECCD186B > 1501 > 14689165 > Arch. Microbiol. 181 (2), 144-154 (2004) > The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. > 4D57954EECDED66B > 1556 > 18060065 > PLoS ONE 2 (12), E1271 (2007) > Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. > 698688FB6DB95247 > 1559 > 18060065 > PLoS ONE 2 (12), E1271 (2007) > Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. > E25E1BA99DB18F3D > > ? The second kind of error which I got was : org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature > ? Which means in richsequence object some feature have location object which have its feature set to null. > ? My Observation: > ? Usually occur when you try to persist a richsequence object to database, and occur to those features which have CompoundRichLocation usually "joins" and "complement" in cds region of a genbank record > ? After catching the hibernate exception I went through all the features and either biojava or hibernate changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null. > ? Below is the screen shot of one of my tests > ? Settings before trying to persits the richsequence object to database > > > ? > ? 
After trying to persist the richsequence object to the database, inside the Hibernate exception catch block
>
>
> - So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
> - Some extra information to make things clearer:
> - Below are some LOCUS lines from GenBank records for which I know where the error is, meaning the CDS region causing the error and its array index in the richSequence feature list.
>   - LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
>     richSequence.feature index: 2540; line number in the GenBank record: 22115
>   - LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
>     richSequence.feature index: 127; line number in the GenBank record: 2137
>   - LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
>     richSequence.feature index: 389; line number in the GenBank record: 3632
>   - LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
>     richSequence.feature index: 47; line number in the GenBank record: 4841
>   - LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
>     richSequence.feature index: 45; line number in the GenBank record: 442
>
The complete exception msg : > org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature > at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72) > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290) > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > at 
org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507) > at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499) > at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218) > at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296) > at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242) > at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219) > at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169) > at org.hibernate.engine.Cascade.cascade(Cascade.java:130) > at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456) > at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334) > at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181) > at 
org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187) > at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172) > at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27) > at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70) > at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535) > at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523) > at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78) > > -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From anjolou at hotmail.com Tue Feb 9 10:17:57 2010 From: anjolou at hotmail.com (Louise Ott) Date: Tue, 9 Feb 2010 16:17:57 +0100 Subject: [Biojava-l] Displaying a simple alignment blastoutput-like Message-ID: Hi all ! 
I read and followed all the advice I found on the net and in this mailing list, and I came to this:

public class OtherAlignmentPanel extends TranslatedSequencePanel {
    public OtherAlignmentPanel(Alignment ali) {
        this.setSequence((SymbolList) ali);
        this.setBackground(Color.white);
        this.setPreferredSize(new Dimension(MainFrame.screenSize.width - 40, MainFrame.screenSize.height - 15));
        MultiLineRenderer multi = new MultiLineRenderer();
        AlignmentRenderer render1 = new AlignmentRenderer();
        AlignmentRenderer render2 = new AlignmentRenderer();
        AlignmentRenderer render3 = new AlignmentRenderer();
        SymbolSequenceRenderer symbol = new SymbolSequenceRenderer();
        render1.setLabel(ali.getLabels().get(0));
        render1.setRenderer(symbol);
        multi.addRenderer(render1);
        render2.setLabel(ali.getLabels().get(1));
        render2.setRenderer(symbol);
        multi.addRenderer(render2);
        render3.setLabel(ali.getLabels().get(2));
        render3.setRenderer(symbol);
        multi.addRenderer(render3);
        this.setRenderer(multi);
    }
}

I created the panel in a frame like this:

Sequence seq1;
seq1 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspQseq(), "query");
Sequence seq2;
seq2 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspHseq(), "hit");
Sequence seq3;
seq3 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspMidline(), "midline");
Map list = new HashMap();
list.put("query", seq1);
list.put("middle", seq2);
list.put("hit", seq3);
SimpleAlignment ali = new SimpleAlignment((Map) list);
OtherAlignmentPanel pane = new OtherAlignmentPanel(ali);
this.add(pane);

My problems:
- First, it just shows nothing! I am totally lost; I have been trying for 3 days just to display this simple alignment, but it doesn't work!
- I would really like the lines to be wrapped like in the BLAST output, so I tried the SequencePanelWrapper, but it doesn't work either...

Is there any simple solution? What did I do wrong in my code?

Have a nice day and thanks!
Louise

From jolyon.holdstock at ogt.co.uk Tue Feb 9 10:40:54 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Tue, 9 Feb 2010 15:40:54 -0000
Subject: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]
References:
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D2D448@EUCLID.internal.ogtip.com>

Does nothing at all become visible? If so, how are you calling the panels/frame to display?

Can you post all your code?

-----Original Message-----
From: Louise Ott [mailto:anjolou at hotmail.com]
Sent: 09 February 2010 15:18
To: biojava-l at lists.open-bio.org
Subject: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]

Hi all !
I read and followed all the advice I found on the net and in this mailing list, and I came to this:

public class OtherAlignmentPanel extends TranslatedSequencePanel {
    public OtherAlignmentPanel(Alignment ali) {
        this.setSequence((SymbolList) ali);
        this.setBackground(Color.white);
        this.setPreferredSize(new Dimension(MainFrame.screenSize.width - 40, MainFrame.screenSize.height - 15));
        MultiLineRenderer multi = new MultiLineRenderer();
        AlignmentRenderer render1 = new AlignmentRenderer();
        AlignmentRenderer render2 = new AlignmentRenderer();
        AlignmentRenderer render3 = new AlignmentRenderer();
        SymbolSequenceRenderer symbol = new SymbolSequenceRenderer();
        render1.setLabel(ali.getLabels().get(0));
        render1.setRenderer(symbol);
        multi.addRenderer(render1);
        render2.setLabel(ali.getLabels().get(1));
        render2.setRenderer(symbol);
        multi.addRenderer(render2);
        render3.setLabel(ali.getLabels().get(2));
        render3.setRenderer(symbol);
        multi.addRenderer(render3);
        this.setRenderer(multi);
    }
}

I created the panel in a frame like this:

Sequence seq1;
seq1 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspQseq(), "query");
Sequence seq2;
seq2 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspHseq(), "hit");
Sequence seq3;
seq3 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspMidline(), "midline");
Map list = new HashMap();
list.put("query", seq1);
list.put("middle", seq2);
list.put("hit", seq3);
SimpleAlignment ali = new SimpleAlignment((Map) list);
OtherAlignmentPanel pane = new OtherAlignmentPanel(ali);
this.add(pane);

My problems:
- First, it just shows nothing! I am totally lost; I have been trying for 3 days just to display this simple alignment, but it doesn't work!
- I would really like the lines to be wrapped like in the BLAST output, so I tried the SequencePanelWrapper, but it doesn't work either...

Is there any simple solution? What did I do wrong in my code?

Have a nice day and thanks!
Louise

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

This email has been scanned by Oxford Gene Technology Security Systems.

From anjolou at hotmail.com Tue Feb 9 11:47:47 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 9 Feb 2010 17:47:47 +0100
Subject: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F02D2D448@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02D2D448@EUCLID.internal.ogtip.com>
Message-ID:

Posting all the code won't help because everything is visible except the OtherAlignmentPanel object. I already tested the same code with a SequencePanel and a SequencePanelWrapper added to my frame (with one sequence to display) and it worked fine. So if there is a problem, I think it can only be in the code I already posted.
(And the code is really big, so it would be hard to read.)

Sorry for my poor English. I'm French, and I don't always use the right words to explain my problem :)

--------------------------------------------------
From: "Jolyon Holdstock"
Sent: Tuesday, February 09, 2010 4:40 PM
To: "Louise Ott" ;
Subject: RE: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]

> Does nothing become visible, if so how are you calling the panels/frame to
> display?
>
> Can you post all your code?
>
> -----Original Message-----
> From: Louise Ott [mailto:anjolou at hotmail.com]
> Sent: 09 February 2010 15:18
> To: biojava-l at lists.open-bio.org
> Subject: [Biojava-l] Displaying a simple alignment
> blastoutput-like[Scanned]
>
>
> Hi all !
> I read and followed all the advices i found on the net and in this mailing
> list and i came to this :
>
> public class OtherAlignmentPanel extends TranslatedSequencePanel {
> public otherAlignmentPanel(Alignment ali) {
> this.setSequence((SymbolList) ali);
> this.setBackground(Color.white); this.setPreferredSize(new
> Dimension(MainFrame.screenSize.width -40, MainFrame.screenSize.height -
> 15));
> MultiLineRenderer multi = new MultiLineRenderer();
> AlignmentRenderer render1 = new AlignmentRenderer();
> AlignmentRenderer render2 = new AlignmentRenderer();
> AlignmentRenderer render3 = new AlignmentRenderer();
> SymbolSequenceRenderer symbol = new SymbolSequenceRenderer();
> render1.setLabel(ali.getLabels().get(0));
> render1.setRenderer(symbol);
> multi.addRenderer(render1);
> render2.setLabel(ali.getLabels().get(1));
> render2.setRenderer(symbol);
> multi.addRenderer(render2);
> render3.setLabel(ali.getLabels().get(2));
> render3.setRenderer(symbol);
> multi.addRenderer(render3);
> this.setRenderer(multi); }
> }
>
> I created the panel in a frame like this :
> Sequence seq1; seq1 =
> ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspQseq(),
> "query");
> Sequence seq2; seq2 =
> ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspHseq(),
> "hit");
> Sequence seq3; seq3 =
> ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspMidline(),
> "midline");
> Map list = new HashMap();
> list.put("query", seq1); list.put("middle", seq2);
> list.put("hit", seq3);
> SimpleAlignment ali = new SimpleAlignment((Map) list);
> OtherAlignmentPanel pane = new otherAlignmentPanel(ali); this.add(pane);
>
>
>
> My problems :- First, it just shows nothing ! I am totally lost, it has
> been 3 days i am just trying to display this simple alignment but it
> doesn't work !!- I really would like the lines to be wrapped like in the
> blast output, so i tryied with the SequencePanelWrapper, but it doesn't
> work either...
> Is there any simple solution ??What did i do wrong in my code ?
>
>
> Have a nice day and thanks !
> Louise
From sheoran143 at gmail.com Thu Feb 11 04:00:15 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Thu, 11 Feb 2010 03:00:15 -0600
Subject: [Biojava-l] Issues with BioSqlRichSequenceDB.java class
Message-ID: <4B73C71F.1080706@gmail.com>

Hi,

The BioSQLRichSequenceDB class has methods to retrieve records from a local instance of the BioSQL schema. When you pass in the accession number of a record it usually returns the record, but in some cases (for example the record with accession M97762) it gives the following error:

Hibernate: select sequence0_.bioentry_id as bioentry1_9_, sequence0_1_.name as name9_, sequence0_1_.identifier as identifier9_, sequence0_1_.accession as accession9_, sequence0_1_.description as descript5_9_, sequence0_1_.version as version9_, sequence0_1_.division as division9_, sequence0_1_.taxon_id as taxon8_9_, sequence0_1_.biodatabase_id as biodatab9_9_, sequence0_.version as version13_, sequence0_.length as length13_, sequence0_.alphabet as alphabet13_, sequence0_.seq as seq13_ from biosequence sequence0_ inner join bioentry sequence0_1_ on sequence0_.bioentry_id=sequence0_1_.bioentry_id where sequence0_1_.name=?
Exception in thread "main" java.lang.RuntimeException: Error while trying to load by id: M97762
    at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:212)
    at com.orionbiosciences.orionGenBankLib.genBankDb.GenBankDb.GenBankDbToFileDownLoader(GenBankDb.java:355)
    at trashtesting.Main.main(Main.java:39)
Caused by: org.biojava.bio.seq.db.IllegalIDException: Id not found: M97762
    at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:206)
    ... 2 more
Java Result: 1

The only way to find this record in my database is to search by its LOCUS name, "BTVNS1TUBA", instead of its accession number. The Javadoc for the BioSQLRichSequenceDB class says the id should be the GenBank ID, and I can't understand what that means. When I investigated the matter, I found that the error is in the following method:

public RichSequenceDB getRichSequences(Set ids, RichSequenceDB db) throws BioException, IllegalIDException {
    if (db==null) db = new HashRichSequenceDB();
    try {
        for (Iterator i = ids.iterator(); i.hasNext(); ) {
            String id = (String)i.next();
            // Build the query object
            // *** error: the current code queries by name ***
            //   String queryText = "from Sequence where name = ?";
            // *** solution: query by accession instead ***
            String queryText = "from Sequence where accession = ?";
            // "name" holds the LOCUS of the GenBank record, which has no unique
            // constraint, so it is not a good idea to use it to look up unique records.
            // Also, people usually refer to a GenBank record by its accession number
            // rather than by its LOCUS.
            Object query = this.createQuery.invoke(this.session, new Object[]{queryText});
            // Set the parameters
            query = this.setParameter.invoke(query, new Object[]{new Integer(0), id});
            // Get the results
            List result = (List)this.list.invoke(query,(Object[]) null);
            // If there are no results, throw an exception
            if (result.size()==0) throw new IllegalIDException("Id not found: "+id);
            // Add the results to the results db.
            for (Iterator j = result.iterator(); j.hasNext(); ) db.addRichSequence((RichSequence)j.next());
        }
    } catch (Exception e) {
        // Throw the exception with our nice message
        throw new RuntimeException("Error while trying to load by ids: "+ids,e);
    }
    return db;
}

Even NCBI says: "It is better to search for the actual accession number rather than the locus name, because the accessions are stable and locus names can change."
REF: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusNameB

So my suggestion is to change the query in this method so that it looks up the accession instead of the name.

Also, if you try to download a record from NCBI through the Java interface using the accession M97762 (as genbank_id), you get it. But when you try to fetch it using the LOCUS, you get a "bad section" exception around the reference, and I don't know why.

Deepak Sheoran

From holland at eaglegenomics.com Thu Feb 11 17:18:18 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 12 Feb 2010 11:18:18 +1300
Subject: [Biojava-l] Issues with BioSqlRichSequenceDB.java class
In-Reply-To: <4B73C71F.1080706@gmail.com>
References: <4B73C71F.1080706@gmail.com>
Message-ID: <4ACB7402-D8DB-4CEF-AC70-BE228312EB37@eaglegenomics.com>

My preference would be to leave the existing method as-is, and modify the javadocs so that it is explicit that it is searching by name and not accession. This is to prevent breaking any code that may rely on this behaviour (as Genbank is not the only kind of sequence that can be stored in BioSQL, we can't guarantee that other sequence types are not using name as the unique identifier instead).

Instead, I would propose adding a second method called getRichSequencesByAccession, with the modification you suggest.

With regard to your last point about bad section errors, could you post the stack trace and the code that causes it?
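The proposal above (keep the name-based lookup untouched and add a separate accession-based one) can be illustrated with a minimal plain-Java sketch. It deliberately does not use BioJava or Hibernate: `BioEntryRow`, `InMemorySequenceStore`, `findByName` and `findByAccession` are hypothetical stand-ins for the BioSQL bioentry rows and the two query methods, and only the M97762 / BTVNS1TUBA pair is taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a BioSQL bioentry row (not a BioJava class). */
final class BioEntryRow {
    final String name;       // LOCUS name of a GenBank record
    final String accession;  // accession number of the same record

    BioEntryRow(String name, String accession) {
        this.name = name;
        this.accession = accession;
    }
}

/** Hypothetical in-memory store showing the two lookups side by side. */
final class InMemorySequenceStore {
    private final List<BioEntryRow> rows = new ArrayList<>();

    void add(BioEntryRow row) {
        rows.add(row);
    }

    /** Mirrors the existing behaviour: match on the name column only. */
    List<BioEntryRow> findByName(String id) {
        List<BioEntryRow> hits = new ArrayList<>();
        for (BioEntryRow row : rows) {
            if (row.name.equals(id)) hits.add(row);
        }
        return hits;
    }

    /** Mirrors the proposed addition: match on accession, findByName stays unchanged. */
    List<BioEntryRow> findByAccession(String id) {
        List<BioEntryRow> hits = new ArrayList<>();
        for (BioEntryRow row : rows) {
            if (row.accession.equals(id)) hits.add(row);
        }
        return hits;
    }
}
```

With a single row `new BioEntryRow("BTVNS1TUBA", "M97762")` in the store, `findByName("M97762")` returns an empty list (the situation reported above), while `findByAccession("M97762")` returns the row. Callers that already rely on name lookups are not affected.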
cheers, Richard On 11 Feb 2010, at 22:00, Deepak Sheoran wrote: > Hi > This class(BiosqlRichSequence) have methods to retrieve record from a local instance of biosql schema but when you type in accession number for record it mostly show the info but in some case (Record with accession:M97762) it give following error : > > Hibernate: select sequence0_.bioentry_id as bioentry1_9_, sequence0_1_.name as name9_, sequence0_1_.identifier as identifier9_, sequence0_1_.accession as accession9_, sequence0_1_.description as descript5_9_, sequence0_1_.version as version9_, sequence0_1_.division as division9_, sequence0_1_.taxon_id as taxon8_9_, sequence0_1_.biodatabase_id as biodatab9_9_, sequence0_.version as version13_, sequence0_.length as length13_, sequence0_.alphabet as alphabet13_, sequence0_.seq as seq13_ from biosequence sequence0_ inner join bioentry sequence0_1_ on sequence0_.bioentry_id=sequence0_1_.bioentry_id where sequence0_1_.name=? > Exception in thread "main" java.lang.RuntimeException: Error while trying to load by id: M97762 > at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:212) > at com.orionbiosciences.orionGenBankLib.genBankDb.GenBankDb.GenBankDbToFileDownLoader(GenBankDb.java:355) > at trashtesting.Main.main(Main.java:39) > Caused by: org.biojava.bio.seq.db.IllegalIDException: Id not found: M97762 > at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:206) > ... 
2 more > Java Result: 1 > > The only way to find this record in my database is to search for LOCUS instead of Accession number which is "BTVNS1TUBA", java doc for BioSqlRichSequenceDb class say the id should be Genbank Id i can't understand what does that means, but when investigated the matter the error is in following method > > public RichSequenceDB getRichSequences(Set ids, RichSequenceDB db) throws BioException, IllegalIDException { > if (db==null) db = new HashRichSequenceDB(); > try { > for (Iterator i = ids.iterator(); i.hasNext(); ) { > String id = (String)i.next(); > // Build the query object > ***************************error******************* > String queryText = "from Sequence where name = ?"; > ***************************error*********************** > *****************************solution************************** > String queryText = "from Sequence where accession = ?"; > // because name stand for Locus from gen-bank record which don't have any unique constraint name so its should not be good idea to use it for searching unique records > // also people usually refer to a gen-bank record using accession number instead of LOCUS > *****************************solution****************************** > Object query = this.createQuery.invoke(this.session, new Object[]{queryText}); > // Set the parameters > query = this.setParameter.invoke(query, new Object[]{new Integer(0), id}); > // Get the results > List result = (List)this.list.invoke(query,(Object[]) null); > // If the result doesn't just have a single entry, throw an exception > if (result.size()==0) throw new IllegalIDException("Id not found: "+id); > // Add the results to the results db. 
> for (Iterator j = result.iterator(); j.hasNext(); ) db.addRichSequence((RichSequence)j.next()); > } > } catch (Exception e) { > // Throw the exception with our nice message > throw new RuntimeException("Error while trying to load by ids: "+ids,e); > } > return db; > } > > even ncbi says " It is better to search for the actual accession number rather than the locus name, because the accessions are stable and locus names can change." > REF: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusNameB > > So my suggestion is to change the query so it will look for accession instead of name in this method. > Also if you will try to download record from ncbi using java interface first with accession:M97762( as genbank_id) you can get it, but when you try to get using LOCUS you will get bad section exception around reference I don't know why ? > > > Deepak Sheoran > > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l -- Richard Holland, BSc MBCS Operations and Delivery Director, Eagle Genomics Ltd T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From jeedward at yahoo.com Fri Feb 12 12:01:32 2010 From: jeedward at yahoo.com (John Edward) Date: Fri, 12 Feb 2010 09:01:32 -0800 (PST) Subject: [Biojava-l] Draft paper submission deadline is extended: BCBGC-10, Orlando, USA Message-ID: <767436.37495.qm@web45906.mail.sp1.yahoo.com> It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas. 
Draft paper submission deadline is extended: BCBGC-10, Orlando, USA

The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org) will be held during 12-14 of July 2010 in Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference.

The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields. The following conferences are planned to be organized as part of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun! Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World® Resort, Universal Studios and Sea World Orlando. Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website http://www.PromoteResearch.org for more details.

Sincerely
John Edward

From koen.bruynseels at cropdesign.com Sat Feb 13 12:13:54 2010
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Sat, 13 Feb 2010 18:13:54 +0100
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID:

I will be out of the office starting 02/13/2010 and will not return until 02/21/2010. I will respond to your message when I return.
From charles at imbusch.net Mon Feb 15 17:32:28 2010
From: charles at imbusch.net (Charles Imbusch)
Date: Mon, 15 Feb 2010 23:32:28 +0100
Subject: [Biojava-l] .sff support
In-Reply-To: <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
Message-ID: <4B79CB7C.3040008@imbusch.net>

Hi all,

I've been playing around with the sff file, based on the file format definition at NCBI. I uploaded the output, which includes the common header, the read header and the read data section for the first read of that file.

http://home.arcor.de/cimbusch/output.txt

> I'm happy to answer questions on how the file format works
> (including the undocumented index block which I had to reverse
> engineer).
>

Yes, I would like to know how that works.

index_magic_number:778921588 .mft
version:1.00

I couldn't find anything about ".mft" version 1.

At the moment I have two classes: sffParser and sffFile. My idea was that sffParser can hold one or multiple sff files. Each instance of sffFile has a hashtable with the identifiers as keys and the file pointers stored as the values.

Now I would like to find a good representation of one single "read" object, which shall be accessible with an identifier like EV5RTWS02JXUUH.

At the moment I'm making use of the BigInteger class to store many variables, but that's probably a waste of memory.

The variables for the read object I'm thinking of:

Read Header Section:
read_header_length -> int
name_length -> int
number_of_bases -> int
clip_qual_left -> int
clip_qual_right -> int
clip_adapter_left -> int
clip_adapter_right -> int
name -> string

Read Data Section:
flowgram_values -> float[]
flow_index_per_base -> int[]
bases -> string
quality_scores -> int[]

But I'm not very familiar with the existing data structures of BioJava. Is there maybe already something similar?
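One possible shape for the per-read object described above, as a minimal sketch that uses plain Java primitives and arrays instead of BigInteger. `SffRead` is a hypothetical class name, not an existing BioJava type; the fields and their types follow the list above. The length checks in the constructor assume one flow index and one quality score per base, which is how the list reads but is not spelled out in the message.

```java
/** Hypothetical value object for one SFF read; not an existing BioJava class. */
final class SffRead {
    // Read header section
    final int readHeaderLength;
    final int nameLength;
    final int numberOfBases;
    final int clipQualLeft;
    final int clipQualRight;
    final int clipAdapterLeft;
    final int clipAdapterRight;
    final String name;

    // Read data section (arrays are copied so the object stays immutable)
    private final float[] flowgramValues;
    private final int[] flowIndexPerBase;
    final String bases;
    private final int[] qualityScores;

    SffRead(int readHeaderLength, int nameLength, int numberOfBases,
            int clipQualLeft, int clipQualRight,
            int clipAdapterLeft, int clipAdapterRight, String name,
            float[] flowgramValues, int[] flowIndexPerBase,
            String bases, int[] qualityScores) {
        if (name.length() != nameLength) {
            throw new IllegalArgumentException("name_length does not match name");
        }
        // Assumption: per-base arrays hold exactly one entry per base.
        if (bases.length() != numberOfBases
                || flowIndexPerBase.length != numberOfBases
                || qualityScores.length != numberOfBases) {
            throw new IllegalArgumentException("per-base data does not match number_of_bases");
        }
        this.readHeaderLength = readHeaderLength;
        this.nameLength = nameLength;
        this.numberOfBases = numberOfBases;
        this.clipQualLeft = clipQualLeft;
        this.clipQualRight = clipQualRight;
        this.clipAdapterLeft = clipAdapterLeft;
        this.clipAdapterRight = clipAdapterRight;
        this.name = name;
        this.flowgramValues = flowgramValues.clone();
        this.flowIndexPerBase = flowIndexPerBase.clone();
        this.bases = bases;
        this.qualityScores = qualityScores.clone();
    }

    float[] flowgramValues() { return flowgramValues.clone(); }

    int[] flowIndexPerBase() { return flowIndexPerBase.clone(); }

    int[] qualityScores() { return qualityScores.clone(); }
}
```

Instances of such a class could be the values handed back by the sffFile lookup, keyed by identifiers such as EV5RTWS02JXUUH. Whether an int per quality score is compact enough, or a narrower array type would be better, is a design choice this sketch leaves open.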
Cheers, Charles From biopython at maubp.freeserve.co.uk Mon Feb 22 06:35:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Feb 2010 11:35:11 +0000 Subject: [Biojava-l] .sff support In-Reply-To: <4B79CB7C.3040008@imbusch.net> References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com> <4B79CB7C.3040008@imbusch.net> Message-ID: <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> On Mon, Feb 15, 2010 at 10:32 PM, Charles Imbusch wrote: > > Hi all, > > I've been playing around with the sff file based on the file > format definition at NCBI. > I uploaded the output which includes the common header, > the read header and read data section for the first read > of that file. > > http://home.arcor.de/cimbusch/output.txt Looks like you've been making excellent progress :) Sorry for the delay in my reply, I was on leave last week (and without internet access for most of it). >> I'm happy to answer questions on how the file format works >> (including the undocumented index block which I had to reverse >> engineer). >> > > Yes, I would like to know how that works. > index_magic_number:778921588 .mft > version:1.00 > Couldn't find anything about ".mft" version 1. I believe ".mft" stands for "Manifest format", and Roche 454 use this block to hold both a read index and an XML string (the manifest). Immediately after the ".mft1.00" string are two longs which give the lengths of the XML string and the actual index data. Then comes the XML manifest string, followed by the actual index data (same format as Roche's older ".srt" index only block, uses base 256). Note the Biopython SFF code has now been merged into our trunk: http://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py > At the moment I have two classes: sffParser and sffFile > My idea was that sffParser can hold one or multiple sff files. 
> Each instance of sffFile has a hashtable with the identifiers > as keys and the filepointers are stored as the values. Not all SFF files will have an index, but the Roche .srt and .mft index blocks will let you map from the ID to the offset. I take advantage of this in Biopython for our Bio.SeqIO.index(...) functionality with a slower fall back on scanning the file to build the index if the index information is missing (or in an unsupported format). The Biopython index code then uses a Python dictionary (hash) to hold the mapping from read name to file offset. See also: http://github.com/biopython/biopython/blob/master/Bio/SeqIO/_index.py > Now I would like to find a good representation of one single "read" object, > which shall be accessible with an identifier like EV5RTWS02JXUUH I think this is a Java question, so not my area of expertise. Peter From jeedward at yahoo.com Mon Feb 22 15:37:23 2010 From: jeedward at yahoo.com (John Edward) Date: Mon, 22 Feb 2010 12:37:23 -0800 (PST) Subject: [Biojava-l] Call for papers: BCBGC-10, Orlando, USA, July 2010 Message-ID: <472328.10215.qm@web45916.mail.sp1.yahoo.com> It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas. Call for papers: BCBGC-10, Orlando, USA, July 2010 The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference. The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). 
MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields. The following conferences are planned to be organized as part of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun! Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World® Resort, Universal Studios and Sea World Orlando. Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions.
Please see the website http://www.PromoteResearch.org for more details. Sincerely John Edward From charles at imbusch.net Thu Feb 25 05:08:38 2010 From: charles at imbusch.net (Charles Imbusch) Date: Thu, 25 Feb 2010 11:08:38 +0100 Subject: [Biojava-l] .sff support In-Reply-To: <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com> <4B79CB7C.3040008@imbusch.net> <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> Message-ID: <4B864C26.3050709@imbusch.net> Dear Peter, thanks for your mail. I will try to make use of that index to speed things up when I have time available. Cheers, Charles Peter schrieb: > I believe ".mft" stands for "Manifest format", and Roche 454 use this > block to hold both a read index and an XML string (the manifest). > Immediately after the ".mft1.00" string are two longs which give the > lengths of the XML string and the actual index data. Then comes > the XML manifest string, followed by the actual index data (same > format as Roche's older ".srt" index only block, uses base 256). 
>
> Note the Biopython SFF code has now been merged into our trunk:
> http://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py
>

From biopython at maubp.freeserve.co.uk Fri Feb 26 08:33:19 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Feb 2010 13:33:19 +0000
Subject: [Biojava-l] .sff support
In-Reply-To: <4B864C26.3050709@imbusch.net>
References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com> <4B79CB7C.3040008@imbusch.net> <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> <4B864C26.3050709@imbusch.net>
Message-ID: <320fb6e01002260533y148936fdg36a5c8c814deb141@mail.gmail.com>

On Thu, Feb 25, 2010 at 10:08 AM, Charles Imbusch wrote:
>
> Dear Peter,
>
> thanks for your mail. I will try to make use of that index
> to speed things up when I have time available.
>
> Cheers,
> Charles

Hi Charles,

I found that when you want random access to the reads, loading the provided .mft or .srt index is MUCH faster than scanning the whole file to build the index manually. So this really is worth the effort.

I hope the comments in my code are reasonably clear, but to recap, the key idea of the index block is that you get chunks of data of varying length (although typically all the same length since by default all the Roche reads have the same read length) like this:

name, null char, four character offset, terminator char of 0xFF

You divide the index block into entries for each read by finding the 0xFF terminators. Because 0xFF (decimal 255) is used in this way, it cannot be used to encode the offsets, which must only use 0x00 to 0xFE (decimal 0 to 254). The offset therefore uses base 255 instead of base 256.

Note that this means that the largest offset the current Roche index blocks can hold is 255^4, or a little under 4GB.
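The entry layout and base-255 arithmetic described above can be sketched in Java as follows. This is an illustration under stated assumptions, not a port of the Biopython parser: `RocheIndexSketch`, `decodeOffset` and `parseEntries` are hypothetical names, the four offset digits are assumed to be stored most significant first (the message does not state the digit order), and any trailing bytes without a terminator are simply ignored.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the index entries described above. */
final class RocheIndexSketch {

    /**
     * Combine four base-255 digits (each 0..254) into a file offset.
     * Assumption: d3 is the most significant digit.
     */
    static long decodeOffset(int d3, int d2, int d1, int d0) {
        int[] digits = {d3, d2, d1, d0};
        long offset = 0;
        for (int digit : digits) {
            if (digit < 0 || digit > 254) {
                throw new IllegalArgumentException("Offset digit out of range: " + digit);
            }
            offset = offset * 255L + digit;
        }
        return offset;
    }

    /**
     * Split raw index bytes into entries at each 0xFF terminator, where an
     * entry is: name, null char, four offset digits, 0xFF.
     * Returns read name -> file offset in the order the entries appear.
     */
    static Map<String, Long> parseEntries(byte[] index) {
        Map<String, Long> offsets = new LinkedHashMap<>();
        int start = 0;
        for (int i = 0; i < index.length; i++) {
            if ((index[i] & 0xFF) != 0xFF) {
                continue; // not a terminator yet
            }
            int length = i - start; // entry bytes, excluding the terminator
            if (length < 5 || index[i - 5] != 0) {
                throw new IllegalArgumentException("Malformed index entry ending at byte " + i);
            }
            String name = new String(index, start, length - 5, StandardCharsets.US_ASCII);
            long offset = decodeOffset(index[i - 4] & 0xFF, index[i - 3] & 0xFF,
                                       index[i - 2] & 0xFF, index[i - 1] & 0xFF);
            offsets.put(name, offset);
            start = i + 1;
        }
        // Assumption: bytes after the last terminator (e.g. padding) are ignored.
        return offsets;
    }
}
```

Since every digit is at most 254, the largest value `decodeOffset` can return is 255^4 - 1 = 4228250624, which matches the "a little under 4GB" limit mentioned above. The null char is located by counting back from the terminator rather than by searching, because 0x00 is also a valid offset digit.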
If you use the Roche tools to try and merge SFF files to make an example SFF file over 4GB you get a warning that there will be no index (and no manifest). The index holds the reads sorted alphabetically by name. We don't take advantage of this in Biopython since I use a Python dictionary (like a Perl hash) to store the offsets. In case you missed them, I'd like to draw your attention to the SFF files I am using in the Biopython unit tests: http://github.com/biopython/biopython/tree/master/Tests/Roche/ Regards, Peter From sheoran143 at gmail.com Wed Feb 3 09:07:44 2010 From: sheoran143 at gmail.com (Deepak Sheoran) Date: Wed, 03 Feb 2010 14:07:44 -0000 Subject: [Biojava-l] Hibernate Exception and suggestion for change in BioSqlSchema Message-ID: <4B698327.3080203@gmail.com> Hi guys, A couple of days back I was having some problem with hibernate exception but that exception got resolved and the reference to that email is: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html On Richard suggestion in above link I am able to resolve some of issues but then, I got stuck in to some other error with hibernate and then decided to investigate the matter and below are some facts and information which I found and I guess it is going to affect all of us. 1. The "Reference" table in bioSql schema have unique constraint on "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)). Which mean only one entry in reference table can use on dbxref_id. This Works wells but in cases when you have little variation in value of following column "location", "title", "authors" and all these variation refers to same PUBMED_ID. Then we can't persist or create a richsequence object . 
Now, when you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the object in the database. If it finds one it returns that object; otherwise it tries to persist the new object into the database. The problem is with the part of that method below:

.....LineNumber: 114
    else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
        queryType = "DocRef";
        // convert List constructor to String representation for query
        ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
        if (ourParamsList.size()<3) {
            queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
        } else {
            queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
        }
    }
..LineNumber: 123

Now when Hibernate searches the database, it won't find the existing record in the "reference" table, because the two records differ in string comparison. So it returns a new object back to "GenbankFormat", to the following piece of code:

....LineNumber: 447
    else {
        try {
            CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
            RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
            rlistener.getCurrentFeature().addRankedCrossRef(rcr);
        } catch (ChangeVetoException e) {
            throw new ParseException(e+", accession:"+accession);
        }
    }
.....LineNumber: 455

Then we add that object to rlistener and move on to the next part of the GenBank record. When BioJava then searches the database for a new crossref, it tries to persist the old one and gets a Hibernate exception about violation of the unique constraint on the "dbxref_id" column.
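
The failure mode described above can be reproduced in miniature without Hibernate or BioJava. The sketch below is only a toy model under stated assumptions: `ReferenceTableSketch` and `DocKey` are invented names, the maps stand in for the "reference" table, and the PubMed ID is used in place of the local dbxref_id. It shows an exact string lookup missing a near-identical citation, after which the insert trips the uniqueness rule.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Toy in-memory stand-in for the "reference" table; hypothetical, not BioJava code. */
final class ReferenceTableSketch {

    /** Lookup key compared as exact strings, like the authors/location/title query. */
    static final class DocKey {
        final String authors;
        final String location;
        final String title;

        DocKey(String authors, String location, String title) {
            this.authors = authors;
            this.location = location;
            this.title = title;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof DocKey)) {
                return false;
            }
            DocKey that = (DocKey) other;
            return Objects.equals(authors, that.authors)
                    && Objects.equals(location, that.location)
                    && Objects.equals(title, that.title);
        }

        @Override
        public int hashCode() {
            return Objects.hash(authors, location, title);
        }
    }

    private final Map<DocKey, Integer> byText = new HashMap<>();     // what the text query can see
    private final Map<Integer, DocKey> byDbxrefId = new HashMap<>(); // models UNIQUE (dbxref_id)

    /** True only if a row with exactly the same authors, location and title exists. */
    boolean exists(DocKey key) {
        return byText.containsKey(key);
    }

    /** Adds a row, failing the way the unique constraint would if dbxref_id is taken. */
    void insert(DocKey key, int dbxrefId) {
        if (byDbxrefId.containsKey(dbxrefId)) {
            throw new IllegalStateException("unique constraint on dbxref_id violated: " + dbxrefId);
        }
        byDbxrefId.put(dbxrefId, key);
        byText.put(key, dbxrefId);
    }
}
```

With two keys that differ only in the capitalisation of the title, `exists` returns false for the second one, so the caller goes on to `insert` it and hits the exception, which is the sequence of events reported in this message.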
The only ways I see to get these records into the database are:

* The very easy solution, and the way I did it for testing my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can be cited with many different naming variations, and this change allows us to store that info too. But this is something the BioSQL people have to decide, and I don't know how to approach them.

* The second solution is slightly more difficult to implement: change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database or not. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the process of creating a richsequence object when RichObjectFactory is linked with an active Hibernate session.

Example: Below is a sample of what I have in my local BioSQL schema, which has the modification suggested by me. (The Dbxref_id column shows the PubMed ID: I replaced the local dbxref_id present in this table in my database with the pubmed_id stored in the "dbxref" table, for easy reference with the outside world.)

Reference_id: 216
Dbxref_id:    18554304
Location:     FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
crc:          9E940E01F4BE3CD0

Reference_id: 230
Dbxref_id:    18554304
Location:     FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
crc:          D3BC0C17F3F786C9

Reference_id: 415
Dbxref_id:    16790744
Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
Title:        Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
crc:          60AEDFA0CEEACC38

Reference_id: 969
Dbxref_id:    16790744
Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
Title:        Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
crc:          4B1232999F6E8130

Reference_id: 929
Dbxref_id:    8688087
Location:     Science 273 (5278), 1058-1073 (1996)
Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
crc:          3E79B40DD2AAA2B7

Reference_id: 932
Dbxref_id:    8688087
Location:     Science 273 (5278), 1058-1073 (1996)
Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
crc:          094EB3384F8D6DE8

Reference_id: 1426
Dbxref_id:    10684935
Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
Authors:      Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
crc:          357648D8FD8C6C8A

Reference_id: 1481
Dbxref_id:    10684935
Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
Authors:      Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
crc:          115411EB2DEE5654

Reference_id: 1497
Dbxref_id:    14689165
Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
crc:          4D5D376EECCD186B

Reference_id: 1501
Dbxref_id:    14689165
Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
crc:          4D57954EECDED66B

Reference_id: 1556
Dbxref_id:    18060065
Location:     PLoS ONE 2 (12), E1271 (2007)
Title:        Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
crc:          698688FB6DB95247

Reference_id: 1559
Dbxref_id:    18060065
Location:     PLoS ONE 2 (12), E1271 (2007)
Title:        Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
crc:          E25E1BA99DB18F3D

2. The second kind of error I got was:

   org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature

   * This means that in the richsequence object some feature has a location object whose feature is set to null.
   * My observations:
     o It usually occurs when you try to persist a richsequence object to the database, and it affects features which have a CompoundRichLocation, usually "join" and "complement" locations in the CDS region of a GenBank record.
     o After catching the Hibernate exception I went through all the features, and either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
     o Below are the screenshots from one of my tests:
       + Settings before trying to persist the richsequence object to the database
       + After trying to persist the richsequence object to the database and landing in the Hibernate exception catch block
   * So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
   * Some extra information to make things clearer:
     o Below are some LOCUS lines from GenBank records for which I know the location causing the error (I mean the CDS region causing the error), with the array index in the richsequence.feature ArrayList and the line number in the GenBank record.
       1. LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
          + richSequence.feature index: 2540, line number in the GenBank record: 22115
       2. LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
          + richSequence.feature index: 127, line number in the GenBank record: 2137
       3. LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
          + richSequence.feature index: 389, line number in the GenBank record: 3632
       4. LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
          + richSequence.feature index: 47, line number in the GenBank record: 4841
       5. LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
          + richSequence.feature index: 45, line number in the GenBank record: 442
   * The complete exception message:

org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
    at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
    at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
    at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
    at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
    at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
    at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
    at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
    at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
    at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
    at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
    at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
    at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
    at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
    at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
    at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
    at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
    at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moz-screenshot.png
Type: image/png
Size: 35664 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moz-screenshot-1.png
Type: image/png
Size: 102659 bytes
Desc: not available
URL:

From charles at imbusch.net Fri Feb 12 09:08:57 2010
From: charles at imbusch.net (Charles Imbusch)
Date: Fri, 12 Feb 2010 14:08:57 -0000
Subject: [Biojava-l] .sff support
In-Reply-To: <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
Message-ID: <4B7560F3.5080208@imbusch.net>

Hi all,

I've been playing around with the sff file based on the file format definition at NCBI. I attached the output which includes the common header, the read header and read data section for the first read of that file.
> I'm happy to answer questions on how the file format works
> (including the undocumented index block which I had to reverse
> engineer).
>

Yes, I would like to know how that works.

index_magic_number:778921588 .mft
version:1.00

Couldn't find anything about ".mft" version 1.

At the moment I have two classes: sffParser and sffFile

My idea was that sffParser can hold one or multiple sff files. Each instance of sffFile has a hashtable with the identifiers as keys and the filepointers are stored as the values.

Now I would like to find a good representation of one single "read" object, which shall be accessible with an identifier like EV5RTWS02JXUUH

At the moment I'm making use of the BigInteger class to store many variables, but that's probably a waste of memory.

The variables for the read object I'm thinking of:

Read Header Section:
read_header_length -> int
name_length -> int
number_of_bases -> int
clip_qual_left -> int
clip_qual_right -> int
clip_adapter_left -> int
clip_adapter_right -> int
name -> string

Read Data Section:
flowgram_values -> float[]
flow_index_per_base -> int[]
bases: -> string
quality_scores -> int[]

But I'm not very familiar with the existing data structures of BioJava, is there maybe already something similar existing?

Please comment.

Cheers,
Charles

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: output.txt
URL:

From mauricio at open-bio.org Fri Feb 5 15:48:30 2010
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Fri, 05 Feb 2010 09:48:30 -0600
Subject: [Biojava-l] Fwd: Changes to NCBI BLAST and E-utilities.
Message-ID: <4B6C3DCE.2070808@open-bio.org>

Forwarding to the proper lists...

-------- Original Message --------
Subject: [O|B|F Helpdesk #889] Changes to NCBI BLAST and E-utilities.
Date: Fri, 5 Feb 2010 10:08:51 -0500
From: mcginnis via RT
Reply-To: support at helpdesk.open-bio.org
To: chris at bioteam.net, heikki at sanbi.ac.za, hlapp at gmx.net, jason at bioperl.org, mauricio at open-bio.org

Fri Feb 05 10:08:51 2010: Request 889 was acted upon.
Transaction: Ticket created by mcginnis at ncbi.nlm.nih.gov
Queue: support at open-bio.org
Subject: Changes to NCBI BLAST and E-utilities.
Owner: Nobody
Requestors: mcginnis at ncbi.nlm.nih.gov
Status: new
Ticket

Dear Colleague:

There are two changes I'd like to make you aware of.

As you may or may not have noticed, we have been working on a new C++ version of the BLAST binaries. In the coming months we will be moving the C++ binaries into prominence and (slowly) phasing out the C toolkit binaries. There are many changes, not least of which is a move to individual binaries for each program (blastn, blastp, etc).

We are not sure how many of your users use BioPerl with the BLAST binaries; my understanding is that many use BioPerl to do remote BLAST. However, there is a change to the BLAST results in Text and presumably HTML. This could have an effect on any parsers which scrape these formats and do not use XML. For obvious reasons, we want to support only the XML format for parsing, but we thought we should give you a heads-up on this.

blast 2.2.22
Query: 3307 ------------------------------------------------------------ 3307
Sbjct: 390  GSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGP 449

blast 2.2.22+
Query       ------------------------------------------------------------
Sbjct  390  GSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGPEAFRGSGP 449

A single line of gaps lacks the Query numbering in the blast+ output. The C version of blast has numbering in this case. Sample alignment shown below. According to users the blast+ output without the numbering breaks bioperl parsers. We have heard from a few, but I think they may be older parsers?

The second issue is a policy concerning E-utilities.
This was announced on the utilities-announce at ncbi.nlm.nih.gov mail-list but you may not have seen it.

As part of an ongoing effort to ensure efficient access to the Entrez Utilities (E-utilities) by all users, NCBI has decided to change the usage policy for the E-utilities effective June 1, 2010. Effective on June 1, 2010, all E-utility requests, either using standard URLs or SOAP, must contain non-null values for both the &tool and &email parameters. Any E-utility request made after June 1, 2010 that does not contain values for both parameters will return an error explaining that these parameters must be included in E-utility requests.

The value of the &tool parameter should be a URI-safe string that is the name of the software package, script or web page producing the E-utility request.

The value of the &email parameter should be a valid e-mail address for the appropriate contact person or group responsible for maintaining the tool producing the E-utility request.

NCBI uses these parameters to contact users whose use of the E-utilities violates the standard usage policies described at http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html#UserSystemRequirements. These usage policies are designed to prevent excessive requests from a small group of users from reducing or eliminating the wider community's access to the E-utilities. NCBI will attempt to contact a user at the e-mail address provided in the &email parameter prior to blocking access to the E-utilities.

NCBI realizes that this policy change will require many of our users to change their code. Based on past experience, we anticipate that most of our users should be able to make the necessary changes before the June 1, 2010 deadline. If you have any concerns about making these changes by that date, or if you have any questions about these policies, please contact eutilities at ncbi.nlm.nih.gov.
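
For client code that builds E-utility URLs by hand, the requirement above comes down to always appending both parameters. The Java sketch below is one possible way to do that and is not taken from any Bio* library: the class name, the choice of the ESearch endpoint, and the db/term parameters are assumptions made for the example, and the tool name and e-mail address in the usage note are placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that refuses to build an E-utility URL without tool and email. */
final class EutilsUrlSketch {

    // ASSUMPTION: ESearch is used as the example endpoint; other E-utilities take the same two parameters.
    static final String ESEARCH = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi";

    /** Builds an ESearch URL carrying non-empty, URL-encoded tool and email values. */
    static String esearchUrl(String db, String term, String tool, String email) {
        if (tool == null || tool.isEmpty() || email == null || email.isEmpty()) {
            throw new IllegalArgumentException("tool and email must both be non-empty");
        }
        return ESEARCH
                + "?db=" + encode(db)
                + "&term=" + encode(term)
                + "&tool=" + encode(tool)
                + "&email=" + encode(email);
    }

    /** URL-encodes a value as UTF-8, e.g. '@' becomes %40. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For example, `EutilsUrlSketch.esearchUrl("pubmed", "sff", "my_tool", "me@example.org")` yields a URL ending in `&tool=my_tool&email=me%40example.org`, while a missing tool or email fails locally before any request is sent.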
Thank you for your understanding and cooperation in helping us continue to deliver a reliable and efficient web service.

I think you already adhere to this policy, but should a user's script not meet these requirements, then the script will fail and requests will be turned away with an error message.

Scott D. McGinnis M.A.
NCBI/NLM/NIH
45 Center Drive, MSC 6511
Bldg 45, Room 4AN.44C
Bethesda, MD 20892
mcginnis at ncbi.nlm.nih.gov

From charles at imbusch.net Mon Feb 8 16:25:23 2010
From: charles at imbusch.net (Charles Imbusch)
Date: Mon, 08 Feb 2010 17:25:23 +0100
Subject: [Biojava-l] .sff support
Message-ID: <4B703AF3.4000300@imbusch.net>

Hello,

I have been wondering whether Biojava is able to handle sff files coming from 454 sequencing runs.

I found something here:
http://lists.open-bio.org/pipermail/biojava-dev/2009-July/003907.html

Does somebody know about the current status on Biojava and sff files?

Thanks in advance,
Charles

From paolo.pavan at gmail.com Mon Feb 8 21:24:49 2010
From: paolo.pavan at gmail.com (Paolo Pavan)
Date: Mon, 8 Feb 2010 22:24:49 +0100
Subject: [Biojava-l] .sff support
In-Reply-To: <4B703AF3.4000300@imbusch.net>
References: <4B703AF3.4000300@imbusch.net>
Message-ID: <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com>

Unfortunately, after spending some time on it, I didn't anything, sorry.
There is just a post more I sent to Andreas Prlic without enclose the list by mistake, in which I report a few info more, coming from my reading on BioPerl's way to manage contigs and assembly informations.
Nothing more.

Paolo

2010/2/8 Charles Imbusch

> Hello,
>
> I have been wondering whether Biojava is able to
> handle sff files coming from 454 sequencing runs.
>
> I found something here:
> http://lists.open-bio.org/pipermail/biojava-dev/2009-July/003907.html
>
> Does somebody know about the current status on Biojava and sff files?
> > > Thanks in advance, > Charles > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > From biopython at maubp.freeserve.co.uk Tue Feb 9 00:59:37 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 9 Feb 2010 00:59:37 +0000 Subject: [Biojava-l] .sff support In-Reply-To: <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> Message-ID: <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com> > 2010/2/8 Charles Imbusch >> Hello, >> >> I have been wondering whether Biojava is able to >> handle sff files coming from 454 sequencing runs. >> >> I found something here: >> http://lists.open-bio.org/pipermail/biojava-dev/2009-July/003907.html >> >> Does somebody know about the current status on Biojava and sff files? >> >> >> Thanks in advance, >> Charles On Mon, Feb 8, 2010 at 9:24 PM, Paolo Pavan wrote: > > Unfortunately, after spending some time on it, I didn't anything, sorry. > There is just a post more I sent to Andreas Prlic without enclose the list > by mistake, in which I report a few info more, coming from my reading on > BioPerl's way to manage contigs and assembly informations. > Nothing more. > > Paolo Hi, I've CC'd the common OpenBio mailing list as this is probably of interest beyond just BioJava. 
Based on code from Jose Blanca (author of sff_extract), I implemented support for the SFF (Roche 454) sequencing reads for Biopython last year on a branch that I hope to merge into our next release, currently here: http://github.com/peterjc/biopython/tree/sff-seqio In addition to the Roche Manuals (which may not be that easy to get a copy of), the SFF format is described on this NCBI webpage: http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?cmd=show&f=formats&m=doc&s=formats#sff I'm happy to answer questions on how the file format works (including the undocumented index block which I had to reverse engineer). Peter P.S. Just to clarify (from the old BioJava thread), the SFF file just holds the raw reads - it is an input file for doing an assembly or mapping. From holland at eaglegenomics.com Tue Feb 9 07:34:32 2010 From: holland at eaglegenomics.com (Richard Holland) Date: Tue, 9 Feb 2010 20:34:32 +1300 Subject: [Biojava-l] Hibernate Exception and suggestion for change in BioSqlSchema In-Reply-To: <4B710CED.5060404@gmail.com> References: <4B710CED.5060404@gmail.com> Message-ID: <9BB14CC0-4A89-4DC8-8928-C8475108B54A@eaglegenomics.com> Hi. It's possible that your original email didn't make it to the list because it is HTML format, and the list only accepts plain text. However, in answer to your two questions: 1. The code that does the resolution of references might be better if it looks up existing IDs rather than using author, title, location to identify existing records. I would suggest modifying it to a three-step process - test ID, then if no match then test author/title/location, then if still no match create a new reference. Could someone do that? (I'm unable to do anything until late March). 2. I think that's a bug (compound locations with null features) but not sure why. Could be that the process of constructing a CompoundRichLocation is somehow losing the feature reference from the original SimpleRichLocation. 
Again I can't investigate until March - can someone else take a look at the code? (A good starting point would be to look at how a CompoundRichLocation decides to select the feature from the SimpleRichLocations it is made up from).

cheers,
Richard

On 9 Feb 2010, at 20:21, Deepak Sheoran wrote:

>
> Hi Richard
>
> Below is the email which I sent to Biojava-1 mailing list but it never get posted on the mailing list server neither do i got any response, so please have a look on this email and tell what can be the solution of the problem described in the message.
>
>
> Thanks
> Deepak Sheoran
> -------- Original Message --------
> Subject: Hibernate Exception and suggestion for change in BioSqlSchema
> Date: Wed, 03 Feb 2010 08:07:35 -0600
> From: Deepak Sheoran
> To: biojava-l at lists.open-bio.org
>
> Hi guys,
>
> A couple of days back I was having some problem with hibernate exception but that exception got resolved and the reference to that email is: http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html
> On Richard suggestion in above link I am able to resolve some of issues but then, I got stuck in to some other error with hibernate and then decided to investigate the matter and below are some facts and information which I found and I guess it is going to affect all of us.
> 1. The "Reference" table in bioSql schema have unique constraint on "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)). Which mean only one entry in reference table can use on dbxref_id.
> This Works wells but in cases when you have little variation in value of following column "location", "title", "authors" and all these variation refers to same PUBMED_ID. Then we can't persist or create a richsequence object .
> Now when you tie RichObjectFactory to a active hibernate session then the class "BioSqlRichObjectBuilder" have method called "buildObject(Class clazz, List paramsList) " which is responsible for looking up details of object in the database and if it find one then it will return that object, else it will try to persist the new object into the database.
> But problem is with below part of that method:
> .....LineNumber: 114
> else if (SimpleDocRef.class.isAssignableFrom(clazz))
> { queryType = "DocRef";
> // convert List constructor to String representation for query
> ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
> if (ourParamsList.size()<3) {
> queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
> } else {
> queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
> }
> }
> ..LineNubmer: 123
> Now when hibernate search the database, it won't find any other record in "reference" table because those two record are different in string comparison, so it will return a new object back to "GenbankFormat" to following piece of code
> ....LineNumber: 447
> else {
> try {
> CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
> RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
> rlistener.getCurrentFeature().addRankedCrossRef(rcr);
> } catch (ChangeVetoException e) {
> throw new ParseException(e+", accession:"+accession);
> }
> }
> .....LineNumber:455
> Then we will add that object to rlistener. And move to next part of genbank record and then biojava search for a new crossref in database and it will try to persist the old one it get a hibernate exception regarding violation of "unique constraint on dbxref_id" column.
>
> The only way to get these record in database is:
> *
The very easy solution and the way I did it for testing my theory is Change the bioSql schema so that it can allow many to one on relation between "reference" and "dbxref" table. Which even make sense because one paper can have many different variation of naming, and this change allow us to store that info too. But this is something BioSql people have decide and I don't know how to approach them.
> * Second solution is slightly difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" make decision about weather a particular DocRef already exist in database or not. I am mean testing all possible string variations of authors, location, title of the docRef which we are searching. Which does have many complications and may slow down process of creating a richsequence object when link RichObjectFactory with a active hibernate session.
>
> Example:Below is a sample of what i have in my local biosql schema which has modification suggested by me. (dbxref_id column have Pubmed_id , I replaced the local dbxref_id which was present on this table in my database with pubmed_id stored in "dbxref" table, for easy reference with outside world in this email)
> Reference_id
> Dbxref_id
> Location
> Title
> Authors
> crc
> 216
> 18554304
> FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
> Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
> Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
> 9E940E01F4BE3CD0
> 230
> 18554304
> FEMS Microbiol. Ecol.
66 (3), 528-536 (2008) > Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model > Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H. > D3BC0C17F3F786C9 > 415 > 16790744 > Infect. Immun. 74 (7), 3715-3726 (2006) > Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. > 60AEDFA0CEEACC38 > 969 > 16790744 > Infect. Immun. 74 (7), 3715-3726 (2006) > Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences > Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A. > 4B1232999F6E8130 > 929 > 8688087 > Science 273 (5278), 1058-1073 (1996) > Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. 
> 3E79B40DD2AAA2B7 > 932 > 8688087 > Science 273 (5278), 1058-1073 (1996) > Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii > Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C. > 094EB3384F8D6DE8 > 1426 > 10684935 > Nucleic Acids Res. 28 (6), 1397-1406 (2000) > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 > Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M. > 357648D8FD8C6C8A > 1481 > 10684935 > Nucleic Acids Res. 28 (6), 1397-1406 (2000) > Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39 > Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C. > 115411EB2DEE5654 > 1497 > 14689165 > Arch. Microbiol. 181 (2), 144-154 (2004) > The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. 
> 4D5D376EECCD186B > 1501 > 14689165 > Arch. Microbiol. 181 (2), 144-154 (2004) > The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner > Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E. > 4D57954EECDED66B > 1556 > 18060065 > PLoS ONE 2 (12), E1271 (2007) > Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. > 698688FB6DB95247 > 1559 > 18060065 > PLoS ONE 2 (12), E1271 (2007) > Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids > Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S. > E25E1BA99DB18F3D > > ? The second kind of error which I got was : org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature > ? Which means in richsequence object some feature have location object which have its feature set to null. > ? My Observation: > ? Usually occur when you try to persist a richsequence object to database, and occur to those features which have CompoundRichLocation usually "joins" and "complement" in cds region of a genbank record > ? After catching the hibernate exception I went through all the features and either biojava or hibernate changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null. > ? Below is the screen shot of one of my tests > ? Settings before trying to persits the richsequence object to database > > > ? > ? 
> - Settings after trying to persist the RichSequence object to the database, taken in the Hibernate exception catch block:
>
> - So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why it happens.
> - Some extra information to make things clearer.
> - Below are the LOCUS lines of some GenBank records for which I know where the location error is (that is, which CDS region causes the error), together with the array index in the richSequence.feature ArrayList.
> - LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
>   richSequence.feature index: 2540; line number in the GenBank record: 22115
> - LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
>   richSequence.feature index: 127; line number in the GenBank record: 2137
> - LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
>   richSequence.feature index: 389; line number in the GenBank record: 3632
> - LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
>   richSequence.feature index: 47; line number in the GenBank record: 4841
> - LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
>   richSequence.feature index: 45; line number in the GenBank record: 442
> The complete exception message:
> org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
>     at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>     at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
>     at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
>     at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
>     at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
>     at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
>     at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
>     at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
>     at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
>     at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
>     at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
>     at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
>     at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
>     at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
>     at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
>     at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
>     at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
>     at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From anjolou at hotmail.com Tue Feb 9 15:17:57 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 9 Feb 2010 16:17:57 +0100
Subject: [Biojava-l] Displaying a simple alignment blastoutput-like
Message-ID:

Hi all!
I read and followed all the advice I found on the net and in this mailing list, and I came up with this:

public class OtherAlignmentPanel extends TranslatedSequencePanel {
    public otherAlignmentPanel(Alignment ali) {
        this.setSequence((SymbolList) ali);
        this.setBackground(Color.white);
        this.setPreferredSize(new Dimension(MainFrame.screenSize.width -40, MainFrame.screenSize.height - 15));
        MultiLineRenderer multi = new MultiLineRenderer();
        AlignmentRenderer render1 = new AlignmentRenderer();
        AlignmentRenderer render2 = new AlignmentRenderer();
        AlignmentRenderer render3 = new AlignmentRenderer();
        SymbolSequenceRenderer symbol = new SymbolSequenceRenderer();
        render1.setLabel(ali.getLabels().get(0));
        render1.setRenderer(symbol);
        multi.addRenderer(render1);
        render2.setLabel(ali.getLabels().get(1));
        render2.setRenderer(symbol);
        multi.addRenderer(render2);
        render3.setLabel(ali.getLabels().get(2));
        render3.setRenderer(symbol);
        multi.addRenderer(render3);
        this.setRenderer(multi);
    }
}

I created the panel in a frame like this:

Sequence seq1;
seq1 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspQseq(), "query");
Sequence seq2;
seq2 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspHseq(), "hit");
Sequence seq3;
seq3 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspMidline(), "midline");
Map list = new HashMap();
list.put("query", seq1);
list.put("middle", seq2);
list.put("hit", seq3);
SimpleAlignment ali = new SimpleAlignment((Map) list);
OtherAlignmentPanel pane = new otherAlignmentPanel(ali);
this.add(pane);

My problems:
- First, it just shows nothing! I am totally lost; I have spent 3 days just trying to display this simple alignment, but it doesn't work!
- I would really like the lines to be wrapped as in the BLAST output, so I tried the SequencePanelWrapper, but it doesn't work either...

Is there any simple solution? What did I do wrong in my code?

Have a nice day and thanks!
Louise

From jolyon.holdstock at ogt.co.uk Tue Feb 9 15:40:54 2010
From: jolyon.holdstock at ogt.co.uk (Jolyon Holdstock)
Date: Tue, 9 Feb 2010 15:40:54 -0000
Subject: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]
References:
Message-ID: <588D0DD225D05746B5D8CAE1BE971F3F02D2D448@EUCLID.internal.ogtip.com>

Does nothing become visible? If so, how are you calling the panels/frame to display?

Can you post all your code?

-----Original Message-----
From: Louise Ott [mailto:anjolou at hotmail.com]
Sent: 09 February 2010 15:18
To: biojava-l at lists.open-bio.org
Subject: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]

Hi all !
I read and followed all the advices i found on the net and in this mailing list and i came to this :

public class OtherAlignmentPanel extends TranslatedSequencePanel {
    public otherAlignmentPanel(Alignment ali) {
        this.setSequence((SymbolList) ali);
        this.setBackground(Color.white);
        this.setPreferredSize(new Dimension(MainFrame.screenSize.width -40, MainFrame.screenSize.height - 15));
        MultiLineRenderer multi = new MultiLineRenderer();
        AlignmentRenderer render1 = new AlignmentRenderer();
        AlignmentRenderer render2 = new AlignmentRenderer();
        AlignmentRenderer render3 = new AlignmentRenderer();
        SymbolSequenceRenderer symbol = new SymbolSequenceRenderer();
        render1.setLabel(ali.getLabels().get(0));
        render1.setRenderer(symbol);
        multi.addRenderer(render1);
        render2.setLabel(ali.getLabels().get(1));
        render2.setRenderer(symbol);
        multi.addRenderer(render2);
        render3.setLabel(ali.getLabels().get(2));
        render3.setRenderer(symbol);
        multi.addRenderer(render3);
        this.setRenderer(multi);
    }
}

I created the panel in a frame like this :

Sequence seq1;
seq1 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspQseq(), "query");
Sequence seq2;
seq2 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspHseq(), "hit");
Sequence seq3;
seq3 = ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspMidline(), "midline");
Map list = new HashMap();
list.put("query", seq1);
list.put("middle", seq2);
list.put("hit", seq3);
SimpleAlignment ali = new SimpleAlignment((Map) list);
OtherAlignmentPanel pane = new otherAlignmentPanel(ali);
this.add(pane);

My problems :
- First, it just shows nothing ! I am totally lost, it has been 3 days i am just trying to display this simple alignment but it doesn't work !!
- I really would like the lines to be wrapped like in the blast output, so i tryied with the SequencePanelWrapper, but it doesn't work either...

Is there any simple solution ?? What did i do wrong in my code ?

Have a nice day and thanks !
Louise

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

This email has been scanned by Oxford Gene Technology Security Systems.

From anjolou at hotmail.com Tue Feb 9 16:47:47 2010
From: anjolou at hotmail.com (Louise Ott)
Date: Tue, 9 Feb 2010 17:47:47 +0100
Subject: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]
In-Reply-To: <588D0DD225D05746B5D8CAE1BE971F3F02D2D448@EUCLID.internal.ogtip.com>
References: <588D0DD225D05746B5D8CAE1BE971F3F02D2D448@EUCLID.internal.ogtip.com>
Message-ID:

Posting all the code won't help, because everything is visible except the OtherAlignmentPanel object. I already tested the same code with a SequencePanel and a SequencePanelWrapper added to my frame (with one sequence to display) and it worked fine. So if there is a problem, I think it can only be in the code I already posted.
(And the code is really, really big; it would be hard to read.)

Sorry for my poor English. I'm French, and I don't always find the right words to explain my problem :)

--------------------------------------------------
From: "Jolyon Holdstock"
Sent: Tuesday, February 09, 2010 4:40 PM
To: "Louise Ott" ;
Subject: RE: [Biojava-l] Displaying a simple alignment blastoutput-like[Scanned]

> Does nothing become visible, if so how are you calling the panels/frame to
> display?
>
> Can you post all your code?
>
> -----Original Message-----
> From: Louise Ott [mailto:anjolou at hotmail.com]
> Sent: 09 February 2010 15:18
> To: biojava-l at lists.open-bio.org
> Subject: [Biojava-l] Displaying a simple alignment
> blastoutput-like[Scanned]
>
>
> Hi all !
> I read and followed all the advices i found on the net and in this mailing
> list and i came to this :
>
> public class OtherAlignmentPanel extends TranslatedSequencePanel {
> public otherAlignmentPanel(Alignment ali) {
> this.setSequence((SymbolList) ali);
> this.setBackground(Color.white); this.setPreferredSize(new
> Dimension(MainFrame.screenSize.width -40, MainFrame.screenSize.height -
> 15));
> MultiLineRenderer multi = new MultiLineRenderer();
> AlignmentRenderer render1 = new AlignmentRenderer();
> AlignmentRenderer render2 = new AlignmentRenderer();
> AlignmentRenderer render3 = new AlignmentRenderer();
> SymbolSequenceRenderer symbol = new SymbolSequenceRenderer();
> render1.setLabel(ali.getLabels().get(0));
> render1.setRenderer(symbol);
> multi.addRenderer(render1);
> render2.setLabel(ali.getLabels().get(1));
> render2.setRenderer(symbol);
> multi.addRenderer(render2);
> render3.setLabel(ali.getLabels().get(2));
> render3.setRenderer(symbol);
> multi.addRenderer(render3);
> this.setRenderer(multi); }
> }
>
> I created the panel in a frame like this :
> Sequence seq1; seq1 =
> ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspQseq(),
> "query");
> Sequence seq2; seq2 =
> ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspHseq(),
> "hit");
> Sequence seq3; seq3 =
> ProteinTools.createGappedProteinSequence(hspHits.getHsp().get(0).getHspMidline(),
> "midline");
> Map list = new HashMap();
> list.put("query", seq1); list.put("middle", seq2);
> list.put("hit", seq3);
> SimpleAlignment ali = new SimpleAlignment((Map) list);
> OtherAlignmentPanel pane = new otherAlignmentPanel(ali); this.add(pane);
>
>
>
> My problems :- First, it just shows nothing ! I am totally lost, it has
> been 3 days i am just trying to display this simple alignment but it
> doesn't work !!- I really would like the lines to be wrapped like in the
> blast output, so i tryied with the SequencePanelWrapper, but it doesn't
> work either...
> Is there any simple solution ??What did i do wrong in my code ?
>
>
> Have a nice day and thanks !
> Louise

From sheoran143 at gmail.com Thu Feb 11 09:00:15 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Thu, 11 Feb 2010 03:00:15 -0600
Subject: [Biojava-l] Issues with BioSqlRichSequenceDB.java class
Message-ID: <4B73C71F.1080706@gmail.com>

Hi

This class (BioSQLRichSequenceDB) has methods to retrieve records from a local instance of the BioSQL schema. When you pass in the accession number of a record it usually returns the information, but in some cases (for example the record with accession M97762) it gives the following error:

Hibernate: select sequence0_.bioentry_id as bioentry1_9_, sequence0_1_.name as name9_, sequence0_1_.identifier as identifier9_, sequence0_1_.accession as accession9_, sequence0_1_.description as descript5_9_, sequence0_1_.version as version9_, sequence0_1_.division as division9_, sequence0_1_.taxon_id as taxon8_9_, sequence0_1_.biodatabase_id as biodatab9_9_, sequence0_.version as version13_, sequence0_.length as length13_, sequence0_.alphabet as alphabet13_, sequence0_.seq as seq13_ from biosequence sequence0_ inner join bioentry sequence0_1_ on sequence0_.bioentry_id=sequence0_1_.bioentry_id where sequence0_1_.name=?
Exception in thread "main" java.lang.RuntimeException: Error while trying to load by id: M97762
    at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:212)
    at com.orionbiosciences.orionGenBankLib.genBankDb.GenBankDb.GenBankDbToFileDownLoader(GenBankDb.java:355)
    at trashtesting.Main.main(Main.java:39)
Caused by: org.biojava.bio.seq.db.IllegalIDException: Id not found: M97762
    at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:206)
    ... 2 more
Java Result: 1

The only way to find this record in my database is to search by LOCUS, which is "BTVNS1TUBA", instead of by accession number. The Javadoc for the BioSQLRichSequenceDB class says the id should be the "Genbank Id"; I can't work out what that means. When I investigated, I found the error is in the following method:

public RichSequenceDB getRichSequences(Set ids, RichSequenceDB db) throws BioException, IllegalIDException {
    if (db==null) db = new HashRichSequenceDB();
    try {
        for (Iterator i = ids.iterator(); i.hasNext(); ) {
            String id = (String)i.next();
            // Build the query object
            ***************************error*******************
            String queryText = "from Sequence where name = ?";
            ***************************error***********************
            *****************************solution**************************
            String queryText = "from Sequence where accession = ?";
            // because name stands for the LOCUS of a GenBank record, which has no unique constraint, it is not a good idea to use it to search for unique records
            // also, people usually refer to a GenBank record by accession number rather than by LOCUS
            *****************************solution******************************
            Object query = this.createQuery.invoke(this.session, new Object[]{queryText});
            // Set the parameters
            query = this.setParameter.invoke(query, new Object[]{new Integer(0), id});
            // Get the results
            List result = (List)this.list.invoke(query,(Object[]) null);
            // If the result doesn't just have a single entry, throw an exception
            if (result.size()==0) throw new IllegalIDException("Id not found: "+id);
            // Add the results to the results db.
            for (Iterator j = result.iterator(); j.hasNext(); ) db.addRichSequence((RichSequence)j.next());
        }
    } catch (Exception e) {
        // Throw the exception with our nice message
        throw new RuntimeException("Error while trying to load by ids: "+ids,e);
    }
    return db;
}

Even NCBI says: "It is better to search for the actual accession number rather than the locus name, because the accessions are stable and locus names can change."
REF: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusNameB

So my suggestion is to change the query in this method so that it looks up the accession instead of the name.
Also, if you try to download the record from NCBI through the Java interface using the accession M97762 (as genbank_id), you get it, but when you try to fetch it using the LOCUS you get a "bad section" exception around the reference. I don't know why.

Deepak Sheoran

From holland at eaglegenomics.com Thu Feb 11 22:18:18 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 12 Feb 2010 11:18:18 +1300
Subject: [Biojava-l] Issues with BioSqlRichSequenceDB.java class
In-Reply-To: <4B73C71F.1080706@gmail.com>
References: <4B73C71F.1080706@gmail.com>
Message-ID: <4ACB7402-D8DB-4CEF-AC70-BE228312EB37@eaglegenomics.com>

My preference would be to leave the existing method as-is, and modify the javadocs so that it is explicit that it is searching by name and not accession. This is to avoid breaking any code that may rely on this behaviour (as GenBank is not the only kind of sequence that can be stored in BioSQL, we can't guarantee that other sequence types are not using name as the unique identifier instead).

Instead, I would propose adding a second method called getRichSequencesByAccession, with the modification you suggest.

With regard to your last point about bad section errors, could you post the stack trace and the code that causes it?
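
The split proposed above can be illustrated with a small self-contained sketch. This is only an illustration under stated assumptions: the class `AccessionLookupSketch`, its `Entry` type and the methods `getByName` and `getByAccession` are made up here and are not BioJava's API (the real class goes through Hibernate). The idea is that the name lookup keeps its current behaviour while an accession lookup is added alongside it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical in-memory stand-in for a BioSQL-backed store; not the BioJava API. */
public class AccessionLookupSketch {

    /** Minimal entry: "name" holds the LOCUS, "accession" the accession number. */
    public static final class Entry {
        public final String name;
        public final String accession;

        public Entry(String name, String accession) {
            this.name = name;
            this.accession = accession;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    public void add(String name, String accession) {
        entries.add(new Entry(name, accession));
    }

    /** Existing behaviour, left unchanged: match on the name (LOCUS) column. */
    public List<Entry> getByName(String name) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.name.equals(name)) hits.add(e);
        }
        if (hits.isEmpty()) throw new NoSuchElementException("Id not found: " + name);
        return hits;
    }

    /** Proposed addition: a second method that matches on the accession column. */
    public List<Entry> getByAccession(String accession) {
        List<Entry> hits = new ArrayList<>();
        for (Entry e : entries) {
            if (e.accession.equals(accession)) hits.add(e);
        }
        if (hits.isEmpty()) throw new NoSuchElementException("Accession not found: " + accession);
        return hits;
    }

    public static void main(String[] args) {
        AccessionLookupSketch db = new AccessionLookupSketch();
        // The record from this thread: LOCUS BTVNS1TUBA, accession M97762.
        db.add("BTVNS1TUBA", "M97762");

        // Lookup by accession finds the record.
        System.out.println(db.getByAccession("M97762").get(0).name);

        // Lookup by name with the accession fails, as in the report.
        try {
            db.getByName("M97762");
        } catch (NoSuchElementException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Callers that already pass a name are unaffected, and callers holding an accession get a method whose contract says so explicitly.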

cheers,
Richard

On 11 Feb 2010, at 22:00, Deepak Sheoran wrote:

> Hi
> This class(BiosqlRichSequence) have methods to retrieve record from a local instance of biosql schema but when you type in accession number for record it mostly show the info but in some case (Record with accession:M97762) it give following error :
>
> Hibernate: select sequence0_.bioentry_id as bioentry1_9_, sequence0_1_.name as name9_, sequence0_1_.identifier as identifier9_, sequence0_1_.accession as accession9_, sequence0_1_.description as descript5_9_, sequence0_1_.version as version9_, sequence0_1_.division as division9_, sequence0_1_.taxon_id as taxon8_9_, sequence0_1_.biodatabase_id as biodatab9_9_, sequence0_.version as version13_, sequence0_.length as length13_, sequence0_.alphabet as alphabet13_, sequence0_.seq as seq13_ from biosequence sequence0_ inner join bioentry sequence0_1_ on sequence0_.bioentry_id=sequence0_1_.bioentry_id where sequence0_1_.name=?
> Exception in thread "main" java.lang.RuntimeException: Error while trying to load by id: M97762
>     at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:212)
>     at com.orionbiosciences.orionGenBankLib.genBankDb.GenBankDb.GenBankDbToFileDownLoader(GenBankDb.java:355)
>     at trashtesting.Main.main(Main.java:39)
> Caused by: org.biojava.bio.seq.db.IllegalIDException: Id not found: M97762
>     at org.biojavax.bio.db.biosql.BioSQLRichSequenceDB.getRichSequence(BioSQLRichSequenceDB.java:206)
>     ... 2 more
> Java Result: 1
>
> The only way to find this record in my database is to search for LOCUS instead of Accession number which is "BTVNS1TUBA", java doc for BioSqlRichSequenceDb class say the id should be Genbank Id i can't understand what does that means, but when investigated the matter the error is in following method
>
> public RichSequenceDB getRichSequences(Set ids, RichSequenceDB db) throws BioException, IllegalIDException {
>     if (db==null) db = new HashRichSequenceDB();
>     try {
>         for (Iterator i = ids.iterator(); i.hasNext(); ) {
>             String id = (String)i.next();
>             // Build the query object
>             ***************************error*******************
>             String queryText = "from Sequence where name = ?";
>             ***************************error***********************
>             *****************************solution**************************
>             String queryText = "from Sequence where accession = ?";
>             // because name stand for Locus from gen-bank record which don't have any unique constraint name so its should not be good idea to use it for searching unique records
>             // also people usually refer to a gen-bank record using accession number instead of LOCUS
>             *****************************solution******************************
>             Object query = this.createQuery.invoke(this.session, new Object[]{queryText});
>             // Set the parameters
>             query = this.setParameter.invoke(query, new Object[]{new Integer(0), id});
>             // Get the results
>             List result = (List)this.list.invoke(query,(Object[]) null);
>             // If the result doesn't just have a single entry, throw an exception
>             if (result.size()==0) throw new IllegalIDException("Id not found: "+id);
>             // Add the results to the results db.
>             for (Iterator j = result.iterator(); j.hasNext(); ) db.addRichSequence((RichSequence)j.next());
>         }
>     } catch (Exception e) {
>         // Throw the exception with our nice message
>         throw new RuntimeException("Error while trying to load by ids: "+ids,e);
>     }
>     return db;
> }
>
> even ncbi says " It is better to search for the actual accession number rather than the locus name, because the accessions are stable and locus names can change."
> REF: http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html#LocusNameB
>
> So my suggestion is to change the query so it will look for accession instead of name in this method.
> Also if you will try to download record from ncbi using java interface first with accession:M97762( as genbank_id) you can get it, but when you try to get using LOCUS you will get bad section exception around reference I don't know why ?
>
>
> Deepak Sheoran
>
> _______________________________________________
> Biojava-l mailing list - Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jeedward at yahoo.com Fri Feb 12 17:01:32 2010
From: jeedward at yahoo.com (John Edward)
Date: Fri, 12 Feb 2010 09:01:32 -0800 (PST)
Subject: [Biojava-l] Draft paper submission deadline is extended: BCBGC-10, Orlando, USA
Message-ID: <767436.37495.qm@web45906.mail.sp1.yahoo.com>

It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas.
Draft paper submission deadline is extended: BCBGC-10, Orlando, USA

The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org) will be held during 12-14 of July 2010 in Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference.

The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). MULTICONF-10 will be held during July 12-14, 2010 in Orlando, Florida, USA. The primary goal of MULTICONF is to promote research and developmental activities in computer science, information technology, control engineering, and related fields. Another goal is to promote the dissemination of research to a multidisciplinary audience and to facilitate communication among researchers, developers, practitioners in different fields. The following conferences are planned to be organized as part of MULTICONF-10.

* International Conference on Artificial Intelligence and Pattern Recognition (AIPR-10)
* International Conference on Automation, Robotics and Control Systems (ARCS-10)
* International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10)
* International Conference on Computer Communications and Networks (CCN-10)
* International Conference on Enterprise Information Systems and Web Technologies (EISWT-10)
* International Conference on High Performance Computing Systems (HPCS-10)
* International Conference on Information Security and Privacy (ISP-10)
* International Conference on Image and Video Processing and Computer Vision (IVPCV-10)
* International Conference on Software Engineering Theory and Practice (SETP-10)
* International Conference on Theoretical and Mathematical Foundations of Computer Science (TMFCS-10)

MULTICONF-10 will be held at Imperial Swan Hotel and Suites. It is a full-service resort that puts you in the middle of the fun! Located 1/2 block south of the famed International Drive, the hotel is just minutes from great entertainment like Walt Disney World Resort, Universal Studios and Sea World Orlando. Guests can enjoy free scheduled transportation to these theme parks, as well as spacious accommodations, outdoor pools and on-site dining, all situated on 10 tropically landscaped acres. Here, guests can experience a full-service resort with discount hotel pricing in Orlando.

We invite draft paper submissions. Please see the website http://www.PromoteResearch.org for more details.

Sincerely
John Edward

From koen.bruynseels at cropdesign.com Sat Feb 13 17:13:54 2010
From: koen.bruynseels at cropdesign.com (koen.bruynseels at cropdesign.com)
Date: Sat, 13 Feb 2010 18:13:54 +0100
Subject: [Biojava-l] Koen Bruynseels is out of the office.
Message-ID:

I will be out of the office starting 02/13/2010 and will not return until 02/21/2010. I will respond to your message when I return.
From charles at imbusch.net Mon Feb 15 22:32:28 2010
From: charles at imbusch.net (Charles Imbusch)
Date: Mon, 15 Feb 2010 23:32:28 +0100
Subject: [Biojava-l] .sff support
In-Reply-To: <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
Message-ID: <4B79CB7C.3040008@imbusch.net>

Hi all,

I've been playing around with the sff file based on the file format definition at NCBI. I uploaded the output which includes the common header, the read header and read data section for the first read of that file.

http://home.arcor.de/cimbusch/output.txt

> I'm happy to answer questions on how the file format works
> (including the undocumented index block which I had to reverse
> engineer).
>

Yes, I would like to know how that works.

index_magic_number:778921588 .mft
version:1.00

Couldn't find anything about ".mft" version 1.

At the moment I have two classes: sffParser and sffFile. My idea was that sffParser can hold one or multiple sff files. Each instance of sffFile has a hashtable with the identifiers as keys and the file pointers stored as the values.

Now I would like to find a good representation of one single "read" object, which shall be accessible with an identifier like EV5RTWS02JXUUH. At the moment I'm making use of the BigInteger class to store many variables, but that's probably a waste of memory.

The variables for the read object I'm thinking of:

Read Header Section:
  read_header_length -> int
  name_length -> int
  number_of_bases -> int
  clip_qual_left -> int
  clip_qual_right -> int
  clip_adapter_left -> int
  clip_adapter_right -> int
  name -> string

Read Data Section:
  flowgram_values -> float[]
  flow_index_per_base -> int[]
  bases -> string
  quality_scores -> int[]

But I'm not very familiar with the existing data structures of BioJava; is there maybe already something similar?
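For illustration only, the field list above could be held in a small immutable Java class using primitives instead of BigInteger. This is a minimal sketch, not an existing BioJava type: the class names SffRead and SffReadDemo are hypothetical, the sample values are made up, and the length check assumes (from the NCBI format definition) that bases, quality_scores and flow_index_per_base each hold number_of_bases entries.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical value class for one SFF read; fields mirror the list above. */
final class SffRead {
    // Read header section
    final int readHeaderLength;
    final int nameLength;        // derived from the name
    final int numberOfBases;     // derived from the bases
    final int clipQualLeft;
    final int clipQualRight;
    final int clipAdapterLeft;
    final int clipAdapterRight;
    final String name;
    // Read data section
    final float[] flowgramValues;
    final int[] flowIndexPerBase;
    final String bases;
    final int[] qualityScores;

    SffRead(String name, int readHeaderLength,
            int clipQualLeft, int clipQualRight,
            int clipAdapterLeft, int clipAdapterRight,
            float[] flowgramValues, int[] flowIndexPerBase,
            String bases, int[] qualityScores) {
        // Assumption: one flow index and one quality score per base.
        if (flowIndexPerBase.length != bases.length()
                || qualityScores.length != bases.length()) {
            throw new IllegalArgumentException("per-base arrays must match the number of bases");
        }
        this.name = name;
        this.nameLength = name.length();
        this.readHeaderLength = readHeaderLength;
        this.numberOfBases = bases.length();
        this.clipQualLeft = clipQualLeft;
        this.clipQualRight = clipQualRight;
        this.clipAdapterLeft = clipAdapterLeft;
        this.clipAdapterRight = clipAdapterRight;
        this.flowgramValues = flowgramValues.clone();   // defensive copies
        this.flowIndexPerBase = flowIndexPerBase.clone();
        this.bases = bases;
        this.qualityScores = qualityScores.clone();
    }
}

class SffReadDemo {
    public static void main(String[] args) {
        // Reads keyed by identifier, as with the hashtable described above.
        Map<String, SffRead> reads = new HashMap<>();
        SffRead read = new SffRead("EV5RTWS02JXUUH", 32, 5, 0, 0, 0,
                new float[] {1.02f, 0.04f, 0.98f, 0.01f},   // made-up flow values
                new int[] {1, 2, 1, 1},                     // made-up flow indexes
                "TCAG", new int[] {40, 38, 37, 36});        // made-up bases and scores
        reads.put(read.name, read);
        System.out.println(reads.get("EV5RTWS02JXUUH").numberOfBases);
    }
}
```

All header values fit comfortably in an int here; very large read counts or offsets would be a separate question for the file-level index.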
Cheers, Charles From biopython at maubp.freeserve.co.uk Mon Feb 22 11:35:11 2010 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Feb 2010 11:35:11 +0000 Subject: [Biojava-l] .sff support In-Reply-To: <4B79CB7C.3040008@imbusch.net> References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com> <4B79CB7C.3040008@imbusch.net> Message-ID: <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> On Mon, Feb 15, 2010 at 10:32 PM, Charles Imbusch wrote: > > Hi all, > > I've been playing around with the sff file based on the file > format definition at NCBI. > I uploaded the output which includes the common header, > the read header and read data section for the first read > of that file. > > http://home.arcor.de/cimbusch/output.txt Looks like you've been making excellent progress :) Sorry for the delay in my reply, I was on leave last week (and without internet access for most of it). >> I'm happy to answer questions on how the file format works >> (including the undocumented index block which I had to reverse >> engineer). >> > > Yes, I would like to know how that works. > index_magic_number:778921588 .mft > version:1.00 > Couldn't find anything about ".mft" version 1. I believe ".mft" stands for "Manifest format", and Roche 454 use this block to hold both a read index and an XML string (the manifest). Immediately after the ".mft1.00" string are two longs which give the lengths of the XML string and the actual index data. Then comes the XML manifest string, followed by the actual index data (same format as Roche's older ".srt" index only block, uses base 256). Note the Biopython SFF code has now been merged into our trunk: http://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py > At the moment I have two classes: sffParser and sffFile > My idea was that sffParser can hold one or multiple sff files. 
> Each instance of sffFile has a hashtable with the identifiers > as keys and the filepointers are stored as the values. Not all SFF files will have an index, but the Roche .srt and .mft index blocks will let you map from the ID to the offset. I take advantage of this in Biopython for our Bio.SeqIO.index(...) functionality with a slower fall back on scanning the file to build the index if the index information is missing (or in an unsupported format). The Biopython index code then uses a Python dictionary (hash) to hold the mapping from read name to file offset. See also: http://github.com/biopython/biopython/blob/master/Bio/SeqIO/_index.py > Now I would like to find a good representation of one single "read" object, > which shall be accessible with an identifier like EV5RTWS02JXUUH I think this is a Java question, so not my area of expertise. Peter From jeedward at yahoo.com Mon Feb 22 20:37:23 2010 From: jeedward at yahoo.com (John Edward) Date: Mon, 22 Feb 2010 12:37:23 -0800 (PST) Subject: [Biojava-l] Call for papers: BCBGC-10, Orlando, USA, July 2010 Message-ID: <472328.10215.qm@web45916.mail.sp1.yahoo.com> It would be highly appreciated if you could share this announcement with your colleagues, students and individuals whose research is in bioinformatics, computational biology, genomics, data-mining, and related areas. Call for papers: BCBGC-10, Orlando, USA, July 2010 The 2010 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics (BCBGC-10) (website: http://www.PromoteResearch.org ) will be held during 12-14 of July 2010 in Orlando, FL, USA. BCBGC is an important event in the areas of bioinformatics, computational biology, genomics and chemoinformatics and focuses on all areas related to the conference. The conference will be held at the same time and location where several other major international conferences will be taking place. The conference will be held as part of 2010 multi-conference (MULTICONF-10). 
Please see the website http://www.PromoteResearch.org for more details. Sincerely John Edward From charles at imbusch.net Thu Feb 25 10:08:38 2010 From: charles at imbusch.net (Charles Imbusch) Date: Thu, 25 Feb 2010 11:08:38 +0100 Subject: [Biojava-l] .sff support In-Reply-To: <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com> <4B79CB7C.3040008@imbusch.net> <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> Message-ID: <4B864C26.3050709@imbusch.net> Dear Peter, thanks for your mail. I will try to make use of that index to speed things up when I have time available. Cheers, Charles Peter schrieb: > I believe ".mft" stands for "Manifest format", and Roche 454 use this > block to hold both a read index and an XML string (the manifest). > Immediately after the ".mft1.00" string are two longs which give the > lengths of the XML string and the actual index data. Then comes > the XML manifest string, followed by the actual index data (same > format as Roche's older ".srt" index only block, uses base 256). 
>
> Note the Biopython SFF code has now been merged into our trunk:
> http://github.com/biopython/biopython/blob/master/Bio/SeqIO/SffIO.py
>

From biopython at maubp.freeserve.co.uk Fri Feb 26 13:33:19 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Feb 2010 13:33:19 +0000
Subject: [Biojava-l] .sff support
In-Reply-To: <4B864C26.3050709@imbusch.net>
References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com> <4B79CB7C.3040008@imbusch.net> <320fb6e01002220335x6899a44bl68789cd4d7d772e3@mail.gmail.com> <4B864C26.3050709@imbusch.net>
Message-ID: <320fb6e01002260533y148936fdg36a5c8c814deb141@mail.gmail.com>

On Thu, Feb 25, 2010 at 10:08 AM, Charles Imbusch wrote:
>
> Dear Peter,
>
> thanks for your mail. I will try to make use of that index
> to speed things up when I have time available.
>
> Cheers,
>  Charles

Hi Charles,

I found that when you want random access to the reads, loading the provided .mft or .srt index is MUCH faster than scanning the whole file to build the index manually. So this really is worth the effort.

I hope the comments in my code are reasonably clear, but to recap, the key idea of the index block is that you get chunks of data of varying length (although typically all the same length since by default all the Roche reads have the same read length) like this:

name, null char, four character offset, terminator char of 0xFF

You divide the index block into entries for each read by finding the 0xFF terminators. Because 0xFF (decimal 255) is used in this way, it cannot be used to encode the offsets, which must only use 0x00 to 0xFE (decimal 0 to 254). The offset therefore uses base 255 instead of base 256.

Note that this means that the largest offset the current Roche index blocks can hold is 255^4, or a little under 4GB.
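As a rough Java sketch of the entry layout just described (name, null char, four offset characters, 0xFF terminator): the class and method names are hypothetical, and the digit order (most significant base-255 digit first) is an assumption that should be checked against the Biopython SffIO.py code linked above.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the index data of a Roche .mft/.srt block. */
final class RocheIndexSketch {
    /** Offsets are four base-255 digits, so they stay below 255^4 = 4228250625. */
    static final long OFFSET_LIMIT = 255L * 255L * 255L * 255L;

    /** Decode four base-255 digits (0x00..0xFE), ASSUMED most significant first. */
    static long decodeOffset(byte[] buf, int start) {
        long offset = 0;
        for (int k = 0; k < 4; k++) {
            int digit = buf[start + k] & 0xFF;   // Java bytes are signed
            if (digit == 0xFF) {
                throw new IllegalArgumentException("0xFF is the terminator, not a digit");
            }
            offset = offset * 255 + digit;
        }
        return offset;
    }

    /** Split the index data into read name -> file offset, in file order. */
    static Map<String, Long> parseEntries(byte[] index) {
        Map<String, Long> offsets = new LinkedHashMap<>();
        int entryStart = 0;
        for (int i = 0; i < index.length; i++) {
            if ((index[i] & 0xFF) != 0xFF) {
                continue;                         // keep scanning for a terminator
            }
            // Entry layout: name, 0x00, four offset digits, 0xFF (at position i)
            int nullPos = i - 5;
            if (nullPos < entryStart || index[nullPos] != 0) {
                throw new IllegalArgumentException("malformed index entry ending at byte " + i);
            }
            String name = new String(index, entryStart, nullPos - entryStart,
                    StandardCharsets.US_ASCII);
            offsets.put(name, decodeOffset(index, nullPos + 1));
            entryStart = i + 1;
        }
        return offsets;
    }
}
```

The resulting map plays the same role as the hashtable of identifiers to file pointers discussed earlier in the thread; a file without an index block would still need the slower scan.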
If you use the Roche tools to try and merge SFF files to make an example SFF file over 4GB you get a warning that there will be no index (and no manifest).

The index holds the reads sorted alphabetically by name. We don't take advantage of this in Biopython since I use a Python dictionary (like a Perl hash) to store the offsets.

In case you missed them, I'd like to draw your attention to the SFF files I am using in the Biopython unit tests:
http://github.com/biopython/biopython/tree/master/Tests/Roche/

Regards,

Peter

From sheoran143 at gmail.com Wed Feb 3 14:07:44 2010
From: sheoran143 at gmail.com (Deepak Sheoran)
Date: Wed, 03 Feb 2010 14:07:44 -0000
Subject: [Biojava-l] Hibernate Exception and suggestion for change in BioSqlSchema
Message-ID: <4B698327.3080203@gmail.com>

Hi guys,

A couple of days back I was having a problem with a Hibernate exception. That exception got resolved, and the reference to that email is:
http://old.nabble.com/Hibernate-Exception-when-persisting-some-richsequence-object-to-biosql-schema-to27299245.html

Following Richard's suggestion in the above link I was able to resolve some of the issues, but then I got stuck on another Hibernate error and decided to investigate the matter. Below are some facts I found, and I guess they are going to affect all of us.

1. The "reference" table in the BioSQL schema has a unique constraint on the "dbxref_id" column (CONSTRAINT reference_dbxref_id_key UNIQUE (dbxref_id)), which means only one entry in the reference table can use a given dbxref_id. This works well, except when there is a little variation in the values of the "location", "title" or "authors" columns and all these variations refer to the same PUBMED_ID. Then we can't persist or create a RichSequence object.

When you tie RichObjectFactory to an active Hibernate session, the class "BioSqlRichObjectBuilder" has a method called "buildObject(Class clazz, List paramsList)" which is responsible for looking up the details of an object in the database. If it finds one it returns that object, otherwise it tries to persist the new object into the database. The problem is with the part of that method below:

.....LineNumber: 114
else if (SimpleDocRef.class.isAssignableFrom(clazz)) {
    queryType = "DocRef";
    // convert List constructor to String representation for query
    ourParamsList.set(0, DocRefAuthor.Tools.generateAuthorString((List)ourParamsList.get(0), true));
    if (ourParamsList.size()<3) {
        queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title is null";
    } else {
        queryText = "from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?";
    }
}
.....LineNumber: 123

Now when Hibernate searches the database, it won't find the other record in the "reference" table, because the two records differ under string comparison, so it returns a new object back to "GenbankFormat", to the following piece of code:

.....LineNumber: 447
else {
    try {
        CrossRef cr = (CrossRef)RichObjectFactory.getObject(SimpleCrossRef.class,new Object[]{dbname, raccession, new Integer(0)});
        RankedCrossRef rcr = new SimpleRankedCrossRef(cr, ++rcrossrefCount);
        rlistener.getCurrentFeature().addRankedCrossRef(rcr);
    } catch (ChangeVetoException e) {
        throw new ParseException(e+", accession:"+accession);
    }
}
.....LineNumber: 455

That object is then added to rlistener and parsing moves on to the next part of the GenBank record. When BioJava then searches the database for a new CrossRef, it tries to persist the earlier object and gets a Hibernate exception about violating the unique constraint on the "dbxref_id" column.
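To illustrate why the exact-match lookup above misses near-duplicate references, here is a small standalone Java sketch using plain strings only (no BioJava or Hibernate calls). The class name and the normalise() rule are hypothetical; the sample strings are two title variants and two location variants from the example data later in this message.

```java
import java.util.Locale;

/** Hypothetical comparison helpers; not part of BioSqlRichObjectBuilder. */
final class DocRefMatchSketch {
    /** Assumed normalisation: trim, collapse whitespace runs, lower-case. */
    static String normalise(String s) {
        return s.trim().replaceAll("\\s+", " ").toLowerCase(Locale.ROOT);
    }

    /** What a query such as "cr.title = ?" effectively requires. */
    static boolean exactMatch(String a, String b) {
        return a.equals(b);
    }

    /** A looser comparison that tolerates case and spacing differences. */
    static boolean looseMatch(String a, String b) {
        return normalise(a).equals(normalise(b));
    }

    public static void main(String[] args) {
        String titleA = "Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium";
        String titleB = "Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium";
        String locA = "FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)";
        String locB = "FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)";

        System.out.println(exactMatch(titleA, titleB));  // titles differ only in case
        System.out.println(looseMatch(titleA, titleB));
        System.out.println(looseMatch(locA, locB));      // locations differ in content
    }
}
```

Case folding catches the title variants but not the location variants, which is one reason a string-variation approach gets complicated quickly.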
The only ways to get these records into the database are:

* The very easy solution, and the way I did it to test my theory, is to change the BioSQL schema so that it allows a many-to-one relation between the "reference" and "dbxref" tables. This even makes sense, because one paper can have many different variations of naming, and this change allows us to store that info too. But this is something the BioSQL people have to decide, and I don't know how to approach them.
* The second solution, slightly more difficult to implement, is to change the way "BioSqlRichObjectBuilder.buildObject(Class clazz,List paramsList)" decides whether a particular DocRef already exists in the database. I mean testing all possible string variations of the authors, location and title of the DocRef we are searching for. This has many complications and may slow down the creation of a RichSequence object when RichObjectFactory is linked to an active Hibernate session.

Example: Below is a sample of what I have in my local BioSQL schema, which has the modification I suggested. (The Dbxref_id column shows the PubMed ID: for easy reference with the outside world, I replaced the local dbxref_id present in this table in my database with the pubmed_id stored in the "dbxref" table.)

Reference_id: 216
Dbxref_id:    18554304
Location:     FEMS Microbiol. Ecol. 66 (3THEMATIC ISSUE: GUT MICROBIOLOGY), 528-536 (2008)
Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
crc:          9E940E01F4BE3CD0

Reference_id: 230
Dbxref_id:    18554304
Location:     FEMS Microbiol. Ecol. 66 (3), 528-536 (2008)
Title:        Isolation of lactate-utilizing butyrate-producing bacteria from human feces and in vivo administration of Anaerostipes caccae strain L2 and galacto-oligosaccharides in a rat model
Authors:      Sato,T., Matsumoto,K., Okumura,T., Yokoi,W., Naito,E., Yoshida,Y., Nomoto,K., Ito,M. and Sawada,H.
crc:          D3BC0C17F3F786C9

Reference_id: 415
Dbxref_id:    16790744
Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
Title:        Intrastrain Heterogeneity of the mgpB Gene in Mycoplasma genitalium Is Extensive In Vitro and In Vivo and Suggests that Variation Is Generated via Recombination with Repetitive Chromosomal Sequences
Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
crc:          60AEDFA0CEEACC38

Reference_id: 969
Dbxref_id:    16790744
Location:     Infect. Immun. 74 (7), 3715-3726 (2006)
Title:        Intrastrain heterogeneity of the mgpB gene in mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences
Authors:      Iverson-Cabral,S.L., Astete,S.G., Cohen,C.R., Rocha,E.P. and Totten,P.A.
crc:          4B1232999F6E8130

Reference_id: 929
Dbxref_id:    8688087
Location:     Science 273 (5278), 1058-1073 (1996)
Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J.-F., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.L., Geoghagen,N.S.M., Weidman,J.F., Fuhrmann,J.L., Presley,E.A., Nguyen,D., Utterback,T.R., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.P., Borodovsky,M., Klenk,H.-P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
crc:          3E79B40DD2AAA2B7

Reference_id: 932
Dbxref_id:    8688087
Location:     Science 273 (5278), 1058-1073 (1996)
Title:        Complete genome sequence of the methanogenic archaeon, Methanococcus jannaschii
Authors:      Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D., Kerlavage,A.R., Dougherty,B.A., Tomb,J., Adams,M.D., Reich,C.I., Overbeek,R., Kirkness,E.F., Weinstock,K.G., Merrick,J.M., Glodek,A., Scott,J.D., Geoghagen,N.S., Weidman,J.F., Fuhrmann,J.L., Nguyen,D.T., Utterback,T., Kelley,J.M., Peterson,J.D., Sadow,P.W., Hanna,M.C., Cotton,M.D., Hurst,M.A., Roberts,K.M., Kaine,B.B., Borodovsky,M., Klenk,H.P., Fraser,C.M., Smith,H.O., Woese,C.R. and Venter,J.C.
crc:          094EB3384F8D6DE8

Reference_id: 1426
Dbxref_id:    10684935
Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
Authors:      Read,T.D., Brunham,R.C., Shen,C., Gill,S.R., Heidelberg,J.F., White,O., Hickey,E.K., Peterson,J., Umayam,L.A., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S.L., Eisen,J. and Fraser,C.M.
crc:          357648D8FD8C6C8A

Reference_id: 1481
Dbxref_id:    10684935
Location:     Nucleic Acids Res. 28 (6), 1397-1406 (2000)
Title:        Genome sequences of Chlamydia trachomatis MoPn and Chlamydia pneumoniae AR39
Authors:      Read,T., Brunham,R., Shen,C., Gill,S., Heidelberg,J., White,O., Hickey,E., Peterson,J., Utterback,T., Berry,K., Bass,S., Linher,K., Weidman,J., Khouri,H., Craven,B., Bowman,C., Dodson,R., Gwinn,M., Nelson,W., DeBoy,R., Kolonay,J., McClarty,G., Salzberg,S., Eisen,J. and Fraser,C.
crc:          115411EB2DEE5654

Reference_id: 1497
Dbxref_id:    14689165
Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
crc:          4D5D376EECCD186B

Reference_id: 1501
Dbxref_id:    14689165
Location:     Arch. Microbiol. 181 (2), 144-154 (2004)
Title:        The effect of FITA mutations on the symbiotic properties of Sinorhizobium fredii varies in a chromosomal-background-dependent manner
Authors:      Vinardell,J.M., Lopez-Baena,F.J., Hidalgo,A., Ollero,F.J., Bellogin,R., Del Rosario Espuny,M., Temprano,F., Romero,F., Krishnan,H.B., Pueppke,S.G. and Ruiz-Sainz,J.E.
crc:          4D57954EECDED66B

Reference_id: 1556
Dbxref_id:    18060065
Location:     PLoS ONE 2 (12), E1271 (2007)
Title:        Analysis of the Neurotoxin Complex Genes in Clostridium botulinum A1-A4 and B1 Strains: BoNT/A3, /Ba4 and /B1 Clusters Are Located within Plasmids
Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,A.C., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
crc:          698688FB6DB95247

Reference_id: 1559
Dbxref_id:    18060065
Location:     PLoS ONE 2 (12), E1271 (2007)
Title:        Analysis of the neurotoxin complex genes in Clostridium botulinum A1-A4 and B1 strains: BoNT/A3, /Ba4 and /B1 clusters are located within plasmids
Authors:      Smith,T.J., Hill,K.K., Foley,B.T., Detter,J.C., Munk,C.A., Bruce,D.C., Doggett,N.A., Smith,L.A., Marks,J.D., Xie,G. and Brettin,T.S.
crc:          E25E1BA99DB18F3D

2. The second kind of error I got was:

   org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature

   * This means that in the RichSequence object some feature has a location object whose feature is set to null.
   * My observations:
     o It usually occurs when you try to persist a RichSequence object to the database, and it affects features that have a CompoundRichLocation, usually the "joins" and "complement" in the CDS region of a GenBank record.
     o After catching the Hibernate exception I went through all the features: either BioJava or Hibernate had changed the object type of a CompoundRichLocation to SimpleRichLocation and set the feature variable to null.
     o Below are the screenshots of one of my tests:
       + Settings before trying to persist the RichSequence object to the database
       + After trying to persist the RichSequence object to the database, in the Hibernate exception catch block
   * So my question is: why is this happening, and how can I stop it or otherwise get these records into the database? I have no clue why this is happening.
   * Some extra information to make things clearer to you guys:
     o Below are the LOCUS lines of some GenBank records for which I know where the location error is, i.e. the CDS region causing the error, together with its array index in the richSequence.feature ArrayList object.
       1. LOCUS AE001439 1643831 bp DNA circular BCT 19-JAN-2006
          + richSequence.feature index: 2540, line number in the GenBank record: 22115
       2. LOCUS CP001189 3887492 bp DNA circular BCT 16-OCT-2008
          + richSequence.feature index: 127, line number in the GenBank record: 2137
       3. LOCUS CP001292 328635 bp DNA circular BCT 17-DEC-2008
          + richSequence.feature index: 389, line number in the GenBank record: 3632
       4. LOCUS AM279694 238517 bp DNA linear BCT 23-OCT-2008
          + richSequence.feature index: 47, line number in the GenBank record: 4841
       5. LOCUS CR931663 18517 bp DNA linear BCT 18-SEP-2008
          + richSequence.feature index: 45, line number in the GenBank record: 442
   * The complete exception message:

org.hibernate.PropertyValueException: not-null property references a null or transient value: Location.feature
    at org.hibernate.engine.Nullability.checkNullability(Nullability.java:72)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:290)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
    at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
    at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
    at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
    at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
    at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
    at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.performSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:94)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
    at org.hibernate.impl.SessionImpl.fireSaveOrUpdate(SessionImpl.java:507)
    at org.hibernate.impl.SessionImpl.saveOrUpdate(SessionImpl.java:499)
    at org.hibernate.engine.CascadingAction$5.cascade(CascadingAction.java:218)
    at org.hibernate.engine.Cascade.cascadeToOne(Cascade.java:268)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:216)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascadeCollectionElements(Cascade.java:296)
    at org.hibernate.engine.Cascade.cascadeCollection(Cascade.java:242)
    at org.hibernate.engine.Cascade.cascadeAssociation(Cascade.java:219)
    at org.hibernate.engine.Cascade.cascadeProperty(Cascade.java:169)
    at org.hibernate.engine.Cascade.cascade(Cascade.java:130)
    at org.hibernate.event.def.AbstractSaveEventListener.cascadeAfterSave(AbstractSaveEventListener.java:456)
    at org.hibernate.event.def.AbstractSaveEventListener.performSaveOrReplicate(AbstractSaveEventListener.java:334)
    at org.hibernate.event.def.AbstractSaveEventListener.performSave(AbstractSaveEventListener.java:181)
    at org.hibernate.event.def.AbstractSaveEventListener.saveWithGeneratedId(AbstractSaveEventListener.java:121)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.saveWithGeneratedOrRequestedId(DefaultSaveOrUpdateEventListener.java:187)
    at org.hibernate.event.def.DefaultSaveEventListener.saveWithGeneratedOrRequestedId(DefaultSaveEventListener.java:33)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.entityIsTransient(DefaultSaveOrUpdateEventListener.java:172)
    at org.hibernate.event.def.DefaultSaveEventListener.performSaveOrUpdate(DefaultSaveEventListener.java:27)
    at org.hibernate.event.def.DefaultSaveOrUpdateEventListener.onSaveOrUpdate(DefaultSaveOrUpdateEventListener.java:70)
    at org.hibernate.impl.SessionImpl.fireSave(SessionImpl.java:535)
    at org.hibernate.impl.SessionImpl.save(SessionImpl.java:523)
    at trashtesting.GenBankLoaderTesting.main(GenBankLoaderTesting.java:78)

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moz-screenshot.png
Type: image/png
Size: 35664 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: moz-screenshot-1.png
Type: image/png
Size: 102659 bytes
Desc: not available
URL:

From charles at imbusch.net Fri Feb 12 14:08:57 2010
From: charles at imbusch.net (Charles Imbusch)
Date: Fri, 12 Feb 2010 14:08:57 -0000
Subject: [Biojava-l] .sff support
In-Reply-To: <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
References: <4B703AF3.4000300@imbusch.net> <56be91b61002081324t3423359dm917b283c6a1f2474@mail.gmail.com> <320fb6e01002081659u793228d1g17abb4f8e0100837@mail.gmail.com>
Message-ID: <4B7560F3.5080208@imbusch.net>

Hi all,

I've been playing around with the sff file based on the file format definition at NCBI. I attached the output which includes the common header, the read header and read data section for the first read of that file.

> I'm happy to answer questions on how the file format works
> (including the undocumented index block which I had to reverse
> engineer).
>

Yes, I would like to know how that works.

index_magic_number:778921588 .mft
version:1.00

Couldn't find anything about ".mft" version 1.

At the moment I have two classes: sffParser and sffFile. My idea was that sffParser can hold one or multiple sff files. Each instance of sffFile has a hashtable with the identifiers as keys and the file pointers stored as the values.

Now I would like to find a good representation of one single "read" object, which shall be accessible with an identifier like EV5RTWS02JXUUH. At the moment I'm making use of the BigInteger class to store many variables, but that's probably a waste of memory.

The variables for the read object I'm thinking of:

Read Header Section:
  read_header_length -> int
  name_length -> int
  number_of_bases -> int
  clip_qual_left -> int
  clip_qual_right -> int
  clip_adapter_left -> int
  clip_adapter_right -> int
  name -> string

Read Data Section:
  flowgram_values -> float[]
  flow_index_per_base -> int[]
  bases -> string
  quality_scores -> int[]

But I'm not very familiar with the existing data structures of BioJava; is there maybe already something similar existing? Please comment.

Cheers,
Charles

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: output.txt
URL: