From me at hongyu.org Wed Dec 3 14:54:15 2008
From: me at hongyu.org (Hongyu Zhang)
Date: Wed, 3 Dec 2008 11:54:15 -0800 (PST)
Subject: [Biojava-l] read three letter amino acid sequence
Message-ID: <228538.63472.qm@web51412.mail.re2.yahoo.com>

Dear all,

Is there any way to read a three-letter amino acid sequence containing
words like "Ala", "Tyr" or an unknown amino acid "Xaa", and then convert
it to a biojava SymbolList or sequence?

Thanks,
Hongyu

From aumanga at biggjapan.com Wed Dec 3 20:23:26 2008
From: aumanga at biggjapan.com (Ashika Umanga Umagiliya)
Date: Thu, 04 Dec 2008 10:23:26 +0900
Subject: [Biojava-l] Find Epitope and CDR3?
Message-ID: <4937310E.4020005@biggjapan.com>

Greetings all,

This is somewhat off topic, but since I come from a computer science
background and have very little knowledge of bioinformatics, I thought I
could get some help from here.

In the project I am working on, I have to find the 'epitope' sequence in
a PDB file. At the moment they have implemented some logic for it in
Perl (not very good, as they say), but the requirements ask me to do it
in Java. There is also a requirement to highlight the CDR3 region in
JMol.

Any tips on where I could start?

Thanks in advance.
umanga

From markjschreiber at gmail.com Thu Dec 4 01:05:33 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 4 Dec 2008 14:05:33 +0800
Subject: [Biojava-l] read three letter amino acid sequence
In-Reply-To: <228538.63472.qm@web51412.mail.re2.yahoo.com>
References: <228538.63472.qm@web51412.mail.re2.yahoo.com>
Message-ID: <93b45ca50812032205h82fb623kd78a500e8d59da58@mail.gmail.com>

If there is not one already, all you would need to do is implement
SymbolTokenizer.

- Mark

From me at hongyu.org Fri Dec 5 03:23:24 2008
From: me at hongyu.org (Hongyu Zhang)
Date: Fri, 5 Dec 2008 00:23:24 -0800 (PST)
Subject: [Biojava-l] read three letter amino acid sequence
Message-ID: <128708.22162.qm@web51403.mail.re2.yahoo.com>

Thanks for the hint, Mark. I found that the NameTokenization class can
be used for this purpose, although it can't deal with the ambiguous
amino acid "Xaa". The following code works for a regular amino acid such
as "Cys":

    Alphabet ap = AlphabetManager.alphabetForName("PROTEIN");
    NameTokenization nt = new NameTokenization((FiniteAlphabet) ap);
    Symbol s = nt.parseToken("Cys");

From jimp at compbio.dundee.ac.uk Sun Dec 7 02:56:30 2008
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Sun, 07 Dec 2008 07:56:30 +0000
Subject: [Biojava-l] SimpleRichAnnotation
In-Reply-To: <381a3e850810210445sc801d40ja36655349b5920b9@mail.gmail.com>
References: <381a3e850810210421u54058163ncf347b57394af1b2@mail.gmail.com>
	<381a3e850810210445sc801d40ja36655349b5920b9@mail.gmail.com>
Message-ID: <493B81AE.7090002@compbio.dundee.ac.uk>

Hello.

I'm going to reiterate Augusto's question here, and plead (if not
grovel) for some kind of answer:

Augusto Fernandes Vellozo wrote:
> I am having problems with the class SimpleRichAnnotation.
> I have one term t of ontology o and I put one note n (with the term t)
> in an SimpleRichAnnotation object a, but in the moment i call the
> method
> a.getProperties(t) it didn't return the note n.

I have this same problem, but in this case it's when I'm trying to get
at anything from a BioSQL-backed database.

> I saw in the code of Biojava that the method getProperties imports the
> term t into of the ontology default before to do the search. Because
> this it doesn't return the correct note.
>
> Please, someone knows why is this method changing the ontology?

.. and if it 'should be' changing the ontology, why doesn't the
getOrImport method work properly?

Here are my observations:

After initially failing to get my feature annotation properties by the
obvious means (via the Feature's getAnnotation("note name")), I thought
that it was due to the default ontology namespace not matching the
BioSQL ontology namespace (that is, the ontology name prefix didn't
match). I therefore created a new ontology with a matching namespace,
thinking that would be sufficient. However, looking at the
implementation of the biojavax SimpleRichAnnotation 'contains' method, I
think I'm on a hiding to nothing, since I made the same observation as
Augusto: it always calls the default ontology's import-or-create-term
function to make a local instance of the Term passed as argument.

Logically, this sounds fine, although messy and not thread-safe since
the RichObjectFactory holds a single static instance of the default
Ontology - except that it completely breaks in more ways than I thought
possible. The reason this doesn't work is that, as far as I can tell,
the Term equivalence relationship is implemented by object reference
equivalence, rather than by equals method equivalence. That is, Terms
can only be equal if they come from the same ComparableOntology object
instance, and have the same string constant reference. That means terms
from different instances of the same ontology can never compare as equal
(such as would be returned from two independent sources, i.e. a
persistence framework and an external ontology resource).

I note that in the deprecated org.biojava.bio.seq.db.biosql
implementation, there are methods to handle the BioSQL ontology index.
These are, however, deprecated, and so there is currently no way to
retrieve a singleton instance of the BioSQL database's ontology, and
therefore no way that the above equivalence method is ever going to
retrieve the corresponding Note for a particular named property... even
if I could set the RichSequenceObjectFactory's default ontology to be
the one retrieved from the BioSQL database back end!

Sorry to be long-winded... but it would be nice to know if I've missed
something completely here!

thanks
Jim.

--
-------------------------------------------------------------------
J. B. Procter (ENFIN/VAMSAS) Barton Bioinformatics Research Group
Phone/Fax:+44(0)1382 388734/345764 http://www.compbio.dundee.ac.uk
The University of Dundee is a Scottish Registered Charity, No. SC015096.

From holland at eaglegenomics.com Sun Dec 7 14:19:36 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Sun, 07 Dec 2008 19:19:36 +0000
Subject: [Biojava-l] SimpleRichAnnotation
In-Reply-To: <493B81AE.7090002@compbio.dundee.ac.uk>
References: <381a3e850810210421u54058163ncf347b57394af1b2@mail.gmail.com>
	<381a3e850810210445sc801d40ja36655349b5920b9@mail.gmail.com>
	<493B81AE.7090002@compbio.dundee.ac.uk>
Message-ID: <493C21C8.4030800@eaglegenomics.com>

Your email is basically four questions, as far as I can tell:

(a) why doesn't getProperties on a RichAnnotation work the way you'd
    expect it to
(b) how can you get a singleton ontology instance loaded from BioSQL
(c) how can you change the default ontology returned by
    RichObjectFactory
(d) why are Terms using the == equality test instead of equals().

I'll answer them last first!

(d) For speed, and on the assumption that nobody would want to have two
copies of the exact same ontology in memory at any one time. But, you
are right, two terms of the same name from two ontologies of the same
name from two different sources will not match.
It can be argued both ways: it is a good thing because you don't want a
collection to report that it contains a specific object when it in fact
contains only a copy that is like that object but not the exact same
object, and it is a bad thing because you can't load two copies of the
same ontology for comparison. If you want to test for equality without
making an == comparison because of the above assumption, you can compare
them using the equals() methods on the name string and ontology
reference provided by the term's getter methods, rather than using
equals() on the term itself.

(c) You can change the default ontology. Call
RichObjectFactory.setDefaultOntologyName() with the name of the ontology
you want to use. It will create a new ontology of this name, or if you
have already created one of that name recently (you MUST have done so
using the getObject() method of the RichObjectFactory for this to work),
then it will use that one instead. However, if you have already called
the connectToBioSQL() method on the factory, then this will load (or
create if it doesn't exist) the ontology from BioSQL and use that
instance instead of creating a new one.

(b) Use the getObject() method on the RichObjectFactory after already
having called the connectToBioSQL() method.

(a) getProperty()/getProperties() is deprecated but hasn't been marked
so yet. Instead, you should use getNote()/getNoteSet(). This will work
the way you expect it to. It accepts an actual term, rather than a term
name, so you can pass in any ontology term you wish.

I hope this helps.

cheers,
Richard

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From jimp at compbio.dundee.ac.uk Mon Dec 8 04:41:28 2008
From: jimp at compbio.dundee.ac.uk (James Procter)
Date: Mon, 08 Dec 2008 09:41:28 +0000
Subject: [Biojava-l] SimpleRichAnnotation
In-Reply-To: <493C21C8.4030800@eaglegenomics.com>
References: <381a3e850810210421u54058163ncf347b57394af1b2@mail.gmail.com>
	<381a3e850810210445sc801d40ja36655349b5920b9@mail.gmail.com>
	<493B81AE.7090002@compbio.dundee.ac.uk>
	<493C21C8.4030800@eaglegenomics.com>
Message-ID: <493CEBC8.20805@compbio.dundee.ac.uk>

Thanks for the reply, Richard - you've cleared up the situation
immensely.

Richard Holland wrote:
> (d) why are Terms using the == equality test instead of equals().
> (d) for speed, and on the assumption that nobody would want to have two
> copies of the exact same ontology in memory at any one time.

This is what I thought, and I understand both sides. However, there is
always a potential bone of contention here.
Is there some kind of mechanism to normalise (or register) an
ontology/term instance against the set of known ontology instances? This
might prove useful in situations where ontologies have been imported
ad hoc from alternative sources (such as a web service, where the server
receives a clone of the object structure, which will have its own object
references).

> (b) how can you get a singleton ontology instance loaded from BioSQL
> (b) Use the getObject() method on the RichObjectFactory after already
> having called the connectToBioSQL() method.

Ah - this is what I missed, in the documentation or otherwise :)

However, this isn't going to work perfectly in my situation. I am
expecting to need thread-safe access (as far as is possible) to several
BioSQL-backed rich sequence databases, and potentially each instance may
work in a different namespace. I'm sure you can see how this requirement
conflicts with the use of a default ontology singleton for term import
in equivalence methods.

> (a) getProperty()/getProperties() is deprecated but hasn't been marked

It's good to know this. As it happens, I ended up using getNoteSet()
and implementing an equals()-based term equivalence mechanism for my
specific situation.

> I hope this helps.

It does! Thanks again,
Jim.

From helpmedicine.savelife at gmail.com Mon Dec 8 14:19:54 2008
From: helpmedicine.savelife at gmail.com (helpmedicine savelife)
Date: Mon, 8 Dec 2008 14:19:54 -0500
Subject: [Biojava-l] Genbank Parser help
Message-ID: <3d0f70fd0812081119s6a5beba9gafb4079ce54b6433@mail.gmail.com>

Hello Mark,

I am using BioJava 1.6 and trying to parse a GenBank file for the
organism and source tag information.

I am able to get the taxon ID; however, I am unable to use
getNameHierarchy() or get the common name of the organism. I looked at
the documentation and understood that they are ignored by the parser. Is
there a way to get this?

Thanks,
Medi

From mark.schreiber at novartis.com Mon Dec 8 21:26:43 2008
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Tue, 9 Dec 2008 10:26:43 +0800
Subject: [Biojava-l] SimpleRichAnnotation
In-Reply-To: <493CEBC8.20805@compbio.dundee.ac.uk>
Message-ID:

Can someone mark these methods as deprecated? I can imagine this causes
a lot of confusion.

> > (a) getProperty()/getProperties() is deprecated but hasn't been marked
> Its good to know this - as it happens, I ended up using the getNoteSet
> and implementing an equals() based term equivalence mechanism for my
> specific sitation.

_________________________
CONFIDENTIALITY NOTICE

The information contained in this e-mail message is intended only for
the exclusive use of the individual or entity named above and may
contain information that is privileged, confidential or exempt from
disclosure under applicable law. If the reader of this message is not
the intended recipient, or the employee or agent responsible for
delivery of the message to the intended recipient, you are hereby
notified that any dissemination, distribution or copying of this
communication is strictly prohibited. If you have received this
communication in error, please notify the sender immediately by e-mail
and delete the material from any computer. Thank you.
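
The equals()-based term comparison discussed in this thread (matching on the term name and the ontology name instead of relying on ==) could look roughly like the sketch below. It is plain Java using a hypothetical stand-in class, SimpleTerm, and hypothetical helper names (sameTerm, findNote); it does not use BioJava's own Term or ontology types, and assumes only that a term exposes its name and the name of its ontology.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TermEquivalenceSketch {

    // Hypothetical stand-in for an ontology term; NOT a BioJava class.
    // It deliberately does not override equals()/hashCode(), so the map
    // below behaves like a lookup keyed on object identity.
    static final class SimpleTerm {
        final String ontologyName;
        final String name;

        SimpleTerm(String ontologyName, String name) {
            this.ontologyName = ontologyName;
            this.name = name;
        }
    }

    // Value-based comparison: same ontology name and same term name,
    // regardless of which object instances are involved.
    static boolean sameTerm(SimpleTerm a, SimpleTerm b) {
        if (a == b) return true;
        if (a == null || b == null) return false;
        return Objects.equals(a.ontologyName, b.ontologyName)
                && Objects.equals(a.name, b.name);
    }

    // Finds the note stored under a term by scanning with sameTerm(),
    // so a copy of the term from another source still matches.
    static String findNote(Map<SimpleTerm, String> notes, SimpleTerm wanted) {
        for (Map.Entry<SimpleTerm, String> entry : notes.entrySet()) {
            if (sameTerm(entry.getKey(), wanted)) return entry.getValue();
        }
        return null;
    }

    public static void main(String[] args) {
        // Two independently created copies of "the same" term, e.g. one
        // loaded from a database and one from an external resource.
        // The ontology and term names here are made up for illustration.
        SimpleTerm fromDb = new SimpleTerm("example_ontology", "note name");
        SimpleTerm fromFile = new SimpleTerm(
                new String("example_ontology"), new String("note name"));

        Map<SimpleTerm, String> notes = new HashMap<>();
        notes.put(fromDb, "some value");

        System.out.println("== match:       " + (fromDb == fromFile));
        System.out.println("sameTerm match: " + sameTerm(fromDb, fromFile));
        System.out.println("identity get:   " + notes.get(fromFile));
        System.out.println("findNote:       " + findNote(notes, fromFile));
    }
}
```

The linear scan trades speed for tolerance of duplicate ontology instances, which is the same trade-off Richard describes in his answer (d).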
From holland at eaglegenomics.com Tue Dec 9 04:56:58 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Tue, 09 Dec 2008 09:56:58 +0000
Subject: [Biojava-l] Genbank Parser help
In-Reply-To: <3d0f70fd0812081119s6a5beba9gafb4079ce54b6433@mail.gmail.com>
References: <3d0f70fd0812081119s6a5beba9gafb4079ce54b6433@mail.gmail.com>
Message-ID: <493E40EA.9020101@eaglegenomics.com>

To get this, you'll need to have loaded the NCBI taxonomy first, either
into memory (a bit big) or via a BioSQL database (the better option).

cheers,
Richard

From augustovmail-java at yahoo.com.br Wed Dec 10 13:00:00 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Wed, 10 Dec 2008 19:00:00 +0100
Subject: [Biojava-l] Getting features with the same Location
Message-ID: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com>

Hi everyone.

I need to get all features of a sequence that are at the same location
L. I was thinking of doing this by getting all features of the sequence
and then calling equals(L) on each one. But I have a lot of features in
a sequence. I am trying to reduce the set of features on which to test
equals, by getting only the features that start at position L.getMin().

Does anyone know how I can do this?
Thanks,
--
Augusto F. Vellozo

From holland at eaglegenomics.com Thu Dec 11 05:55:47 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 11 Dec 2008 10:55:47 +0000
Subject: [Biojava-l] Getting features with the same Location
In-Reply-To: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com>
References: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com>
Message-ID: <4940F1B3.4070508@eaglegenomics.com>

Are you using BioSQL?

If you are, take a look at the BioSQLFeatureFilter classes. These will
construct a query which will only return the features you are
specifically interested in.

This bit of code should work once you've connected your BioJava instance
to BioSQL (from an earlier post by Gabrielle Doan):

    public FeatureHolder filterFeature(String name, int startpos, int endpos) {
        RichLocation rl = new SimpleRichLocation(
                new SimplePosition(startpos), new SimplePosition(endpos), 0);
        BioSQLFeatureFilter filter = new BioSQLFeatureFilter.And(
                new BioSQLFeatureFilter.BySequenceName(name),
                new BioSQLFeatureFilter.OverlapsRichLocation(rl));
        return BioSQLRichSequenceDB.filter(filter);
    }

However, please note that Gabrielle reported a bug on the mailing list
back in October with the above code. It's unclear whether or not it has
been fixed yet, as I can't find the entry in BugZilla.

cheers,
Richard

From markjschreiber at gmail.com Thu Dec 11 09:19:02 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 11 Dec 2008 22:19:02 +0800
Subject: [Biojava-l] Getting features with the same Location
In-Reply-To: <4940F1B3.4070508@eaglegenomics.com>
References: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com>
	<4940F1B3.4070508@eaglegenomics.com>
Message-ID: <93b45ca50812110619nda505f8hbb24ffbcbdb1e16d@mail.gmail.com>

You might also want to use the LocationTools class to determine whether
the locations of the two features intersect or whether one wholly
contains the other.

- Mark

From augustovmail-java at yahoo.com.br Thu Dec 11 09:59:31 2008
From: augustovmail-java at yahoo.com.br (Augusto Fernandes Vellozo)
Date: Thu, 11 Dec 2008 15:59:31 +0100
Subject: [Biojava-l] Getting features with the same Location
In-Reply-To: <93b45ca50812110619nda505f8hbb24ffbcbdb1e16d@mail.gmail.com>
References: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com>
	<4940F1B3.4070508@eaglegenomics.com>
	<93b45ca50812110619nda505f8hbb24ffbcbdb1e16d@mail.gmail.com>
Message-ID: <381a3e850812110659p6f6bee69x23c67dc0172276e0@mail.gmail.com>

Thanks Mark and Richard.

Yes Richard, I am using BioSQL, and Gabrielle said that she doesn't know
of a solution to the problem.

Mark, if I use your suggestion, do I need to retrieve all features of
the sequence and analyze each one?

I am thinking of using a direct SQL query to retrieve the IDs of the
features with the same location start x on the sequence s, and then
loading each feature of this result set to test equals(location). With
this filter, the number of equals(Location) calls will be reduced a
lot. The problem is that I don't have the sequence's bioentry_id to use
in the direct SQL query. I have a sequence object but I don't have
access to its bioentry_id.
Thanks,

Augusto

From holland at eaglegenomics.com Thu Dec 11 10:02:18 2008
From: holland at eaglegenomics.com (Richard Holland)
Date: Thu, 11 Dec 2008 15:02:18 +0000
Subject: [Biojava-l] Getting features with the same Location
In-Reply-To: <381a3e850812110659p6f6bee69x23c67dc0172276e0@mail.gmail.com>
References: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com>
	<4940F1B3.4070508@eaglegenomics.com>
	<93b45ca50812110619nda505f8hbb24ffbcbdb1e16d@mail.gmail.com>
	<381a3e850812110659p6f6bee69x23c67dc0172276e0@mail.gmail.com>
Message-ID: <49412B7A.4060601@eaglegenomics.com>

Yes, the bioentry_id is private. You could temporarily modify the
BioEntry object in your local copy to make the getId() method public,
which would then give you access to bioentry_id.

cheers,
Richard

From hlapp at gmx.net Thu Dec 11 10:13:14 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 11 Dec 2008 10:13:14 -0500
Subject: [Biojava-l] Getting features with the same Location
In-Reply-To: <49412B7A.4060601@eaglegenomics.com>
References: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com>
	<4940F1B3.4070508@eaglegenomics.com>
	<93b45ca50812110619nda505f8hbb24ffbcbdb1e16d@mail.gmail.com>
	<381a3e850812110659p6f6bee69x23c67dc0172276e0@mail.gmail.com>
	<49412B7A.4060601@eaglegenomics.com>
Message-ID:

On Dec 11, 2008, at 10:02 AM, Richard Holland wrote:

> Yes, the bioentry_id is private.

What's the rationale for that? It's the primary key; why should it be
forbidden to view it, even for derived classes?

(For a Bioperl-db persistent object, the primary key is public and
always accessible as $pobj->primary_key(). It can even be changed by the
programmer, though of course you should know what you're doing when you
decide to do that.)
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Thu Dec 11 11:28:11 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 11 Dec 2008 16:28:11 +0000 Subject: [Biojava-l] Getting features with the same Location In-Reply-To: References: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com> <4940F1B3.4070508@eaglegenomics.com> <93b45ca50812110619nda505f8hbb24ffbcbdb1e16d@mail.gmail.com> <381a3e850812110659p6f6bee69x23c67dc0172276e0@mail.gmail.com> <49412B7A.4060601@eaglegenomics.com> Message-ID: <49413F9B.9030404@eaglegenomics.com> The rationale was to prevent it being changed by the programmer, because if that happens then Hibernate gets seriously confused (it relies on the PKs remaining constant whilst an object is in memory). Hilmar Lapp wrote: > > On Dec 11, 2008, at 10:02 AM, Richard Holland wrote: > >> Yes, the bioentry_id is private. > > > What's the rationale for that? It's the primary key; why should it be > forbidden to view it, even for derived classes? > > (For a Bioperl-db persistent object, the primary key is public and > always accessible as $pobj->primary_key(). It can even be changed by the > programmer, though of course you should know what you're doing when you > decide to do that.) 
> > -hilmar -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Thu Dec 11 13:35:11 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 11 Dec 2008 13:35:11 -0500 Subject: [Biojava-l] Getting features with the same Location In-Reply-To: <49413F9B.9030404@eaglegenomics.com> References: <381a3e850812101000j66364e5dh5df8006a11326288@mail.gmail.com> <4940F1B3.4070508@eaglegenomics.com> <93b45ca50812110619nda505f8hbb24ffbcbdb1e16d@mail.gmail.com> <381a3e850812110659p6f6bee69x23c67dc0172276e0@mail.gmail.com> <49412B7A.4060601@eaglegenomics.com> <49413F9B.9030404@eaglegenomics.com> Message-ID: <4188AF5E-FFBF-417B-AD75-9E2F6FE907F8@gmx.net> Understood. You could still make it a read-only property though, no? Would there be dangers with that too? -hilmar On Dec 11, 2008, at 11:28 AM, Richard Holland wrote: > The rationale was to prevent it being changed by the programmer, > because > if that happens then Hibernate gets seriously confused (it relies on > the > PKs remaining constant whilst an object is in memory). > > Hilmar Lapp wrote: >> >> On Dec 11, 2008, at 10:02 AM, Richard Holland wrote: >> >>> Yes, the bioentry_id is private. >> >> >> What's the rationale for that? It's the primary key; why should it be >> forbidden to view it, even for derived classes? >> >> (For a Bioperl-db persistent object, the primary key is public and >> always accessible as $pobj->primary_key(). It can even be changed >> by the >> programmer, though of course you should know what you're doing when >> you >> decide to do that.) 
>> >> -hilmar > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From mark.schreiber at novartis.com Thu Dec 11 20:41:53 2008 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 12 Dec 2008 09:41:53 +0800 Subject: [Biojava-l] Getting features with the same Location In-Reply-To: <381a3e850812110659p6f6bee69x23c67dc0172276e0@mail.gmail.com> Message-ID: biojava-l-bounces at lists.open-bio.org wrote on 12/11/2008 10:59:31 PM: > Thanks Mark and Richard. > > Mark, if I use your suggestion, do I need to retrieve all features of > the sequence and analyze each one? It really depends what you are trying to do. If you want to find if a location overlaps any other location you can form a union of all the other locations and then test if your location overlaps (or is contained by etc) the CompoundLocation. - Mark _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. 
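
[Editorial sketch] The union-then-test idea described in the message above can be illustrated without any BioJava classes. This is a minimal sketch under stated assumptions: locations are plain inclusive [min, max] integer pairs, and the class and method names (LocationUnion, union, overlaps, contains) are hypothetical, not the BioJava API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, NOT part of BioJava: a location is an inclusive {min, max} pair.
class LocationUnion {

    // Merge overlapping or directly adjacent ranges into a sorted, non-overlapping union.
    static List<int[]> union(List<int[]> ranges) {
        List<int[]> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingInt((int[] r) -> r[0]));
        List<int[]> merged = new ArrayList<>();
        for (int[] r : sorted) {
            if (!merged.isEmpty() && r[0] <= merged.get(merged.size() - 1)[1] + 1) {
                // Extends the previous block: keep the larger end coordinate.
                int[] last = merged.get(merged.size() - 1);
                last[1] = Math.max(last[1], r[1]);
            } else {
                // Starts a new block; copy so the caller's arrays are not modified.
                merged.add(new int[] {r[0], r[1]});
            }
        }
        return merged;
    }

    // True if [min, max] shares at least one position with any block of the union.
    static boolean overlaps(List<int[]> union, int min, int max) {
        for (int[] block : union) {
            if (block[0] <= max && min <= block[1]) {
                return true;
            }
        }
        return false;
    }

    // True if [min, max] lies entirely inside a single block of the union.
    static boolean contains(List<int[]> union, int min, int max) {
        for (int[] block : union) {
            if (block[0] <= min && max <= block[1]) {
                return true;
            }
        }
        return false;
    }
}
```

Building the union once and testing many query locations against it avoids comparing each query with every individual feature location.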
From mark.schreiber at novartis.com Thu Dec 11 20:49:05 2008 From: mark.schreiber at novartis.com (mark.schreiber at novartis.com) Date: Fri, 12 Dec 2008 09:49:05 +0800 Subject: [Biojava-l] Getting features with the same Location In-Reply-To: <49413F9B.9030404@eaglegenomics.com> Message-ID: Are you sure about that? Under the JPA spec (which Hibernate can comply with) the primary key is often public and settable. I don't think Hibernate gets confused if you change it but I suspect the programmer might. There are cases where you would want to set it. For example you retrieve record 1 change a few fields and then change the PK to 2 and then persist it back. You now have two separate records. Effectively a template copy. If you have auto PK generation you can simply make the PK = null and let the DB find a new PK for you. Yes it can be a little bit dangerous if you don't think about it but it probably doesn't need to be private. Even if it is can't there be a public getter? - Mark biojava-l-bounces at lists.open-bio.org wrote on 12/12/2008 12:28:11 AM: > The rationale was to prevent it being changed by the programmer, because > if that happens then Hibernate gets seriously confused (it relies on the > PKs remaining constant whilst an object is in memory). > > Hilmar Lapp wrote: > > > > On Dec 11, 2008, at 10:02 AM, Richard Holland wrote: > > > >> Yes, the bioentry_id is private. > > > > > > What's the rationale for that? It's the primary key; why should it be > > forbidden to view it, even for derived classes? > > > > (For a Bioperl-db persistent object, the primary key is public and > > always accessible as $pobj->primary_key(). It can even be changed by the > > programmer, though of course you should know what you're doing when you > > decide to do that.) 
> > > > -hilmar > > -- > Richard Holland, BSc MBCS > Finance Director, Eagle Genomics Ltd > M: +44 7500 438846 | E: holland at eaglegenomics.com > http://www.eaglegenomics.com/ > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l _________________________ CONFIDENTIALITY NOTICE The information contained in this e-mail message is intended only for the exclusive use of the individual or entity named above and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivery of the message to the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately by e-mail and delete the material from any computer. Thank you. From holland at eaglegenomics.com Fri Dec 12 03:48:49 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 12 Dec 2008 08:48:49 +0000 Subject: [Biojava-l] Getting features with the same Location In-Reply-To: References: Message-ID: <49422571.80806@eaglegenomics.com> OK, I didn't know that! In which case there's no reason why at least the getters can't be public. I'll get round to changing them at some point... mark.schreiber at novartis.com wrote: > > Are you sure about that? Under the JPA spec (which Hibernate can comply > with) the primary key is often public and settable. I don't think > Hibernate gets confused if you change it but I suspect the programmer > might. > > There are cases where you would want to set it. For example you > retrieve record 1 change a few fields and then change the PK to 2 and > then persist it back. You now have two separate records. Effectively a > template copy. 
If you have auto PK generation you can simply make the PK > = null and let the DB find a new PK for you. > > Yes it can be a little bit dangerous if you don't think about it but it > probably doesn't need to be private. > > Even if it is can't there be a public getter? > > - Mark > > biojava-l-bounces at lists.open-bio.org wrote on 12/12/2008 12:28:11 AM: > >> The rationale was to prevent it being changed by the programmer, because >> if that happens then Hibernate gets seriously confused (it relies on the >> PKs remaining constant whilst an object is in memory). >> >> Hilmar Lapp wrote: >> > >> > On Dec 11, 2008, at 10:02 AM, Richard Holland wrote: >> > >> >> Yes, the bioentry_id is private. >> > >> > >> > What's the rationale for that? It's the primary key; why should it be >> > forbidden to view it, even for derived classes? >> > >> > (For a Bioperl-db persistent object, the primary key is public and >> > always accessible as $pobj->primary_key(). It can even be changed by the >> > programmer, though of course you should know what you're doing when you >> > decide to do that.) >> > >> > -hilmar >> >> -- >> Richard Holland, BSc MBCS >> Finance Director, Eagle Genomics Ltd >> M: +44 7500 438846 | E: holland at eaglegenomics.com >> http://www.eaglegenomics.com/ >> _______________________________________________ >> Biojava-l mailing list - Biojava-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/biojava-l > > _________________________ > > CONFIDENTIALITY NOTICE > > The information contained in this e-mail message is intended only for > the exclusive use of the individual or entity named above and may > contain information that is privileged, confidential or exempt from > disclosure under applicable law. 
If the reader of this message is not > the intended recipient, or the employee or agent responsible for > delivery of the message to the intended recipient, you are hereby > notified that any dissemination, distribution or copying of this > communication is strictly prohibited. If you have received this > communication in error, please notify the sender immediately by e-mail > and delete the material from any computer. Thank you. -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From pwrose at ucsd.edu Wed Dec 17 13:00:11 2008 From: pwrose at ucsd.edu (Peter Rose) Date: Wed, 17 Dec 2008 10:00:11 -0800 Subject: [Biojava-l] Job - Scientific Software Developer at RCSB PDB Message-ID: <001a01c96071$4e3e0c70$eaba2550$@edu> The RCSB Protein Data Bank has an opening for a Scientific Software Developer: http://www.pdb.org/pdb/static.do?p=general_information/about_pdb/contact/job _listings.html The Protein Data Bank (http://www.pdb.org) is the single worldwide archive of structural data of biological macromolecules used by more than 150,000 scientists from more than 150 countries per month. Incumbent will identify functional requirements and develop or enhance the current analysis and visualization tools offered by the PDB to the scientific community. Develop innovative approaches that will satisfy users' current needs and anticipate their future needs based on progress in the science of structural biology and structural bioinformatics. Act as a domain expert in structural and computational biology to solve scientific problems and answer user requests. 
Qualifications: Advanced degree in Computational Biology, Bioinformatics, Structural Biology or comparable combination of education and experience with focus in scientific software development; demonstrated experience with structural analysis of macromolecular structures, protein-protein or protein-ligand interactions, sequence analysis, and functional annotation, scientific programming using the Java programming language; experience developing a dynamic, database-driven web application using HTML, CSS, JavaScript, AJAX, and JSP in a Java EE environment; experience with database design, SQL, and RDBMSs such as MySQL; experience with object-relational mapping is a plus. Please send resume to Dr. Peter Rose at pwrose at ucsd.edu. From bandarus at niaid.nih.gov Thu Dec 18 11:49:42 2008 From: bandarus at niaid.nih.gov (Bandaru, Sandya (NIH/NIAID) [C]) Date: Thu, 18 Dec 2008 11:49:42 -0500 Subject: [Biojava-l] Retrieving Partial gene/CDS information Message-ID: <3AF2B2106E4A7045BFB5AB1795130B4B0116DFCC1A@NIHMLBXBB01.nih.gov> Hi, I used BioJava 1.6 to store Genbank sequences into MySQL database, and tried writing out the Sequence. I noticed the feature location is not having '<' or '>' (if is partial gene/CDS). I looked into the documentation of biojavax and didn't find way to get this information. I found FuzzyLocation in Biojava. I appreciate if you can provide similar feature to get the partial CDS/gene locations. Thanks to the BioJava team for the good work. Regards, Sandya Bandaru Bioinformatics Software Developer Bioinformatics and Computational Biosciences Branch (BCBB) OCICB/OSMO/OD/NIAID/NIH 10401 Fernwood Road Bethesda, MD 20892 Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient. If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. 
National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives From holland at eaglegenomics.com Thu Dec 18 12:14:43 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 18 Dec 2008 17:14:43 +0000 Subject: [Biojava-l] Retrieving Partial gene/CDS information In-Reply-To: <3AF2B2106E4A7045BFB5AB1795130B4B0116DFCC1A@NIHMLBXBB01.nih.gov> References: <3AF2B2106E4A7045BFB5AB1795130B4B0116DFCC1A@NIHMLBXBB01.nih.gov> Message-ID: <494A8503.9080306@eaglegenomics.com> Hello. Whilst BioJava supports the < and > annotation, and will correctly load these from the file and store them as FuzzyLocations internally, when it is asked to write them to BioSQL they are lost. This is because BioSQL has no way of storing this information. Therefore when you save the sequence into BioSQL and load it back in later, the < and > are lost. cheers, Richard Bandaru, Sandya (NIH/NIAID) [C] wrote: > Hi, > > I used BioJava 1.6 to store Genbank sequences into MySQL database, and tried writing out the Sequence. I noticed the feature location is not having '<' or '>' (if is partial gene/CDS). I looked into the documentation of biojavax and didn't find way to get this information. > > I found FuzzyLocation in Biojava. I appreciate if you can provide similar feature to get the partial CDS/gene locations. > > Thanks to the BioJava team for the good work. > > Regards, > Sandya Bandaru > Bioinformatics Software Developer > Bioinformatics and Computational Biosciences Branch (BCBB) > OCICB/OSMO/OD/NIAID/NIH > 10401 Fernwood Road > Bethesda, MD 20892 > > Disclaimer: The information in this e-mail and any of its attachments is confidential and may contain sensitive information. It should not be used by anyone who is not the original intended recipient.
If you have received this e-mail in error please inform the sender and delete it from your mailbox or any other storage devices. National Institute of Allergy and Infectious Diseases shall not accept liability for any statements made that are sender's own and not expressly made on behalf of the NIAID by one of its representatives > > > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From hlapp at gmx.net Thu Dec 18 15:22:32 2008 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 18 Dec 2008 15:22:32 -0500 Subject: [Biojava-l] Retrieving Partial gene/CDS information In-Reply-To: <494A8503.9080306@eaglegenomics.com> References: <3AF2B2106E4A7045BFB5AB1795130B4B0116DFCC1A@NIHMLBXBB01.nih.gov> <494A8503.9080306@eaglegenomics.com> Message-ID: On Dec 18, 2008, at 12:14 PM, Richard Holland wrote: > Whilst BioJava supports the < and > annotation, and will correctly > load > these from the file and store them as FuzzyLocations internally, > when it > is asked to write them to BioSQL they are lost. This is BioSQL has no > way of storing this information. That's actually not true. BioSQL has the location_qualifier_value table for storing this, and the foreign key to term to store the location type. Bioperl-db does indeed lose this information when persisting feature locations to BioSQL, but that's Bioperl-db's fault, not BioSQL's. Or am I missing something? 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From holland at eaglegenomics.com Thu Dec 18 16:13:22 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Thu, 18 Dec 2008 21:13:22 +0000 Subject: [Biojava-l] Retrieving Partial gene/CDS information In-Reply-To: References: <3AF2B2106E4A7045BFB5AB1795130B4B0116DFCC1A@NIHMLBXBB01.nih.gov> <494A8503.9080306@eaglegenomics.com> Message-ID: <494ABCF2.9090604@eaglegenomics.com> Fair enough. Well, in that case it sounds like a potential improvement case for BioJavaX (at the time I wrote the BioSQL mappers I'm sure there was no way of mapping them else I would have implemented it... but I might have been wrong). First thing to do is to raise the appropriate bug in BugZilla. Bandaru - could you do that? (That way you will 'own' the bug and will be notified when people are working on it / have fixed it.) Then, as to who fixes it... volunteers, please! I do make infrequent trawls through BugZilla, but I rarely have the time to fix any of them. :( cheers, Richard Hilmar Lapp wrote: > > On Dec 18, 2008, at 12:14 PM, Richard Holland wrote: > >> Whilst BioJava supports the < and > annotation, and will correctly load >> these from the file and store them as FuzzyLocations internally, when it >> is asked to write them to BioSQL they are lost. This is BioSQL has no >> way of storing this information. > > > That's actually not true. BioSQL has the location_qualifier_value table > for storing this, and the foreign key to term to store the location type. > > Bioperl-db does indeed lose this information when persisting feature > locations to BioSQL, but that's Bioperl-db's fault, not BioSQL's. > > Or am I missing something? 
> > -hilmar -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Dec 19 06:28:17 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 19 Dec 2008 11:28:17 +0000 Subject: [Biojava-l] Biojava3 updates Message-ID: <494B8551.7050406@eaglegenomics.com> It seems I forgot to commit my FASTA parser code last time round. I've just committed it now, along with a new class called ThingParserFactory to make file reading/writing much easier. See the updated docs here for a how-to: http://www.biojava.org/wiki/BioJava3:HowTo#FASTA cheers, Richard -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Dec 19 05:25:55 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 19 Dec 2008 10:25:55 +0000 Subject: [Biojava-l] Annotations and Hibernate IDs Message-ID: <494B76B3.7040408@eaglegenomics.com> Hi all, I've just made a commit to the trunk of BioJavaX which resolves the following points: 1. Deprecated get/setProperty in RichAnnotation (hopefully no more confusion - people should use get/setNote[Set] instead). 2. Updated Rich* classes to explicitly specify RichAnnotation instead of Annotation (means getAnnotation returns RichAnnotation now, not plain old Annotation. This helps with point 1 above.). 3. Made all IDs on BioSQL-Rich* classes publicly get/settable. Use with caution! This allows you to identify individual database records from within your code. 
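
[Editorial sketch] Point 3 and the earlier primary-key discussion in this thread can be illustrated with a small stand-alone class. This is only a sketch under stated assumptions: PersistentRecord and copyAsNew are hypothetical names, not BioJavaX classes, and it assumes (as described earlier in the thread) a persistence layer that generates a fresh key when the id is null.

```java
// Hypothetical entity, NOT a BioJavaX class: a database id that is publicly
// readable and (with caution) settable.
class PersistentRecord {
    private Integer id;         // null means "not yet stored"
    private final String name;

    PersistentRecord(Integer id, String name) {
        this.id = id;
        this.name = name;
    }

    // Public getter: lets calling code identify the underlying database record.
    public Integer getId() {
        return id;
    }

    // Public setter: changing the id of an object that is already managed by a
    // persistence layer is risky, so use it deliberately.
    public void setId(Integer id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    // "Template copy": same data, id cleared. Under the stated assumption, storing
    // the copy would create a new row instead of updating the original one.
    PersistentRecord copyAsNew() {
        return new PersistentRecord(null, name);
    }
}
```

Clearing the id on a copy, rather than mutating the id of the loaded object, keeps the original record untouched.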
cheers, Richard -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Fri Dec 19 07:01:14 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Fri, 19 Dec 2008 12:01:14 +0000 Subject: [Biojava-l] Please help - BugZilla Message-ID: <494B8D0A.4090104@eaglegenomics.com> Hi all. I'd like to make a plea for help! There's about 16 reported bugs still open in BugZilla which have been there for quite some time. http://bugzilla.open-bio.org/buglist.cgi?product=BioJava&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED I would really appreciate it if a few people could "Adopt A Bug" and take a serious look at seeing if they can fix it. Just one bug per person would make all the difference. It would hugely help the project, and I would be eternally grateful. You can let me know if you've adopted a bug by assigning it to yourself in BugZilla. Currently only one of the 16 is actually assigned to anyone (thanks Andreas!), but I'm hoping that maybe someone out there will have a few moments to spare over the forthcoming holiday season and might fancy a challenge. Remember, a bug is for life, not just for Christmas! cheers, Richard -- Richard Holland, BSc MBCS Finance Director, Eagle Genomics Ltd M: +44 7500 438846 | E: holland at eaglegenomics.com http://www.eaglegenomics.com/ From holland at eaglegenomics.com Mon Dec 22 07:11:50 2008 From: holland at eaglegenomics.com (Richard Holland) Date: Mon, 22 Dec 2008 12:11:50 +0000 Subject: [Biojava-l] BlastXML parsers Message-ID: <494F8406.2070901@eaglegenomics.com> Mark Schreiber has kindly written some BlastXML parsers which can be found in the biojava-blastxml module in the BioJava3 repository. If you run your blasts with XML output, it will be able to fully parse every kind of blast output supported by NCBI blast. 
cheers,
Richard

--
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/

From armita_sh at yahoo.com Wed Dec 24 12:02:11 2008
From: armita_sh at yahoo.com (Armita Sheari)
Date: Wed, 24 Dec 2008 09:02:11 -0800 (PST)
Subject: [Biojava-l] How can I calculate the RMSD of Global Alignment?
Message-ID: <537200.67656.qm@web51412.mail.re2.yahoo.com>

Hi everyone,

I want to calculate the RMSD relevant to the Structural Global Alignment of two proteins. I have used the align method of the StructurePairAligner class with default parameters, and then I have calculated the RMSD using the getRmsd method of the AlternativeAlignment class. But it seems something is wrong.
I think I should change some parameters of the align method which are defined in the StrucAligParameters class. Unfortunately, I couldn't find any documentation that describes the parameters (neither in the API nor in the source code).

I would be thankful if you could take a look at my code and let me know your opinion about which parameter(s) I should change.

StructurePairAligner structurePairAligner = new StructurePairAligner();
structurePairAligner.align(structure1, structure2);
AlternativeAlignment[] alternativeAlignment = structurePairAligner.getAlignments();
ClusterAltAligs.cluster(alternativeAlignment);

double minRmsd = 1000;
double rmsd = 0;
for(int i = 0; i < alternativeAlignment.length; i++)
{
    rmsd = alternativeAlignment[i].getRmsd();
    if(rmsd < minRmsd) minRmsd = rmsd;
    rmsd = 0;
}
return minRmsd;

Thanks,
Armitash

From andreas.prlic at gmail.com Thu Dec 25 07:20:33 2008
From: andreas.prlic at gmail.com (Andreas Prlic)
Date: Thu, 25 Dec 2008 13:20:33 +0100
Subject: [Biojava-l] [Biojava-dev] How can I calculate the RMSD of Global Alignment?
In-Reply-To: <537200.67656.qm@web51412.mail.re2.yahoo.com> References: <537200.67656.qm@web51412.mail.re2.yahoo.com> Message-ID: <59a41c430812250420j35947f87i6589735ff4c4273c@mail.gmail.com> Hi Armita, I agree the missing documentation for all the alignment parameters is a problem... It is not exactly clear from your mail what is the problem you encountered. The code that you sent in principle looks fine. I suspect you want to select the "best" of the alternative alignments? In that case you can simply take the first, since the alignments come out sorted. An example can be run from here: http://www.biojava.org/download/performance/biojava-structure-example1.jnlp The alternative alignments are sorted according to their number of structurally equivalent residues. The rmsd in the different alternative solutions is always kept under a certain threshold (one of the parameters), since one of the strategies is to try to keep the rmsd constant, while maximizing the number of structurally equivalent residues. Similar results are clustered together in the same cluster. In the example shown above this allows to find the multiple matches between the four chains of hemoglobin and myoglobin. Andreas On Wed, Dec 24, 2008 at 6:02 PM, Armita Sheari wrote: > Hi everyone, > > I want to calculate the RMSD relevent to the Structural Global Alignment of two proteins. I have used the align method of the StructurePairAligner class with default parameters, and then I have calculated the RMSD using the getRmsd method of the AlternativeAlignment class. But it seems something is wrong. > I think I should change some parameters of align method which are defined in StrucAligParameters class. Unfortunately, I couldn't find any documentation that describes the parameters (not in the api nor in the source code). > > I would be thankful if you take a look at my code and let me know your opinion about which parameter(s) I should change. 
>
> StructurePairAligner structurePairAligner = new StructurePairAligner();
> structurePairAligner.align(structure1, structure2);
> AlternativeAlignment[] alternativeAlignment = structurePairAligner.getAlignments();
> ClusterAltAligs.cluster(alternativeAlignment);
>
> double minRmsd = 1000;
> double rmsd = 0;
> for(int i = 0; i < alternativeAlignment.length; i++)
> {
> rmsd = alternativeAlignment[i].getRmsd();
> if(rmsd < minRmsd) minRmsd = rmsd;
> rmsd = 0;
> }
> return minRmsd;
>
> Thanks,
> Armitash
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>

From andreas at sdsc.edu Sat Dec 27 01:26:27 2008
From: andreas at sdsc.edu (Andreas Prlic)
Date: Sat, 27 Dec 2008 07:26:27 +0100
Subject: [Biojava-l] BioJava
In-Reply-To: <4954D4A4.7040303@wp.pl>
References: <4954D4A4.7040303@wp.pl>
Message-ID: <59a41c430812262226p40e4aa4eq3c6edddf1e0d1fb0@mail.gmail.com>

Hi Michal,

If you look at the summary of the structure alignments, the alignment went well

> #1 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00

and it superimposed your 17-residue test fragments perfectly. The problem is that you have unusual residue names in your input files, as indicated by the output

> unknown amino acid CCC

so the BioJava PDB output method converts those groups to HETATM records:

HETATM 1 CA CCC A 1 14.566 55.361 10.273 1 0
HETATM 2 CA CCC A 2 26.928 51.739 16.185 1 0
HETATM 3 CA CCC A 3 13.379 55.253 7.685 1 0
HETATM 4 CA CCC A 4 18.626 45.916 1.163 1 0
...

I assume your viewer does not display this correctly? Set it to spacefill mode and you will see the atoms.

To conclude, if you replace your input file with some real protein data it will work fine.
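
[Editorial sketch] The behaviour described above, where an unrecognised residue name such as CCC ends up in HETATM records, can be illustrated with a simple lookup. This is a sketch under stated assumptions: ResidueRecordType and recordType are hypothetical names, the rule shown (only the 20 standard three-letter codes count as amino acids) is an assumption, and BioJava's actual logic may recognise more residue names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, NOT BioJava's implementation.
class ResidueRecordType {

    // The 20 standard amino acids by three-letter code.
    private static final Set<String> STANDARD = new HashSet<>(Arrays.asList(
            "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
            "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"));

    // Returns the PDB record name a writer following this rule would use.
    static String recordType(String residueName) {
        String key = residueName.trim().toUpperCase();
        return STANDARD.contains(key) ? "ATOM" : "HETATM";
    }
}
```

Under this rule, renaming the CCC residues in the input files to a standard code would make them come out as ATOM records.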
Andreas

2008/12/26 Michal Lorenc :
> Hello Andreas,
> I tried to superpose two pdb files with BioJava (
> http://www.biojava.org/wiki/BioJava:CookBook:PDB:align ), but unfortunately
> it does not work with attached two pdb files.
>
> aligning centroid_template vs. centroid_target
> unknown amino acid CCC
> unknown amino acid CCC
> ....
> unknown amino acid CCC
> #1 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
> #2 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
> #3 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
> #4 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
> #5 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
> #6 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
> #7 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
> #8 cluster:1 eqr:17 rmsd:0.00 %id:17 gaps:0 score:102.00
>
> 0.194 -0.854 0.483
> 0.949 0.289 0.129
> -0.250 0.433 0.866
> writing alignment to centroid_template_centroid_target.pdb
>
> Do you know how could I fix the problem?
>
> Thank you in advance.
>
> Best regards,
>
> Michal
>
> P.S. Happy New Year!
>

From armita_sh at yahoo.com Mon Dec 29 07:44:16 2008
From: armita_sh at yahoo.com (Armita Sheari)
Date: Mon, 29 Dec 2008 04:44:16 -0800 (PST)
Subject: [Biojava-l] [Biojava-dev] How can I calculate the RMSD of Global Alignment?
In-Reply-To: <59a41c430812250420j35947f87i6589735ff4c4273c@mail.gmail.com>
Message-ID: <577293.91936.qm@web51403.mail.re2.yahoo.com>

Dear Andreas,

Thanks for your answer. I saw the run of the example you had linked. As far as I could tell, all the RMSD values calculated in the example were related to local alignments, whereas I need to find the RMSD of the global alignment (without any gaps). We should also consider all alpha carbons of the proteins when calculating the RMSD.

Which parameters should I change in the align method to reach a global alignment with no gaps?
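
[Editorial sketch] For reference, the quantity asked about here, an RMSD over all alpha carbons with a gapless one-to-one pairing, can be written down directly. This is a minimal sketch under stated assumptions: CaRmsd and rmsd are hypothetical names, not BioJava API; atom i of the first protein is paired with atom i of the second; and the two coordinate sets are assumed to be already superimposed (the sketch does not compute an optimal rotation or translation).

```java
// Hypothetical helper, NOT part of BioJava.
class CaRmsd {

    // a and b hold one {x, y, z} row per C-alpha atom; row i of a pairs with row i of b.
    // The coordinates are used as given: no superposition is performed here.
    static double rmsd(double[][] a, double[][] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need two non-empty coordinate sets of equal size");
        }
        double sumSquared = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {
                double d = a[i][k] - b[i][k];
                sumSquared += d * d;
            }
        }
        // Root of the mean squared distance between paired atoms.
        return Math.sqrt(sumSquared / a.length);
    }
}
```

Because every atom is paired by index, this only makes sense when both proteins have the same number of alpha carbons in corresponding order.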
Thanks again, ArmitaSh --- On Thu, 12/25/08, Andreas Prlic wrote: From: Andreas Prlic Subject: Re: [Biojava-dev] How can I calculate the RMSD of Global Alignment? To: armita_sh at yahoo.com Cc: "Biojava" , "biojava-dev" Date: Thursday, December 25, 2008, 7:20 PM Hi Armita, I agree the missing documentation for all the alignment parameters is a problem... It is not exactly clear from your mail what is the problem you encountered. The code that you sent in principle looks fine. I suspect you want to select the "best" of the alternative alignments? In that case you can simply take the first, since the alignments come out sorted. An example can be run from here: http://www.biojava.org/download/performance/biojava-structure-example1.jnlp The alternative alignments are sorted according to their number of structurally equivalent residues. The rmsd in the different alternative solutions is always kept under a certain threshold (one of the parameters), since one of the strategies is to try to keep the rmsd constant, while maximizing the number of structurally equivalent residues. Similar results are clustered together in the same cluster. In the example shown above this allows to find the multiple matches between the four chains of hemoglobin and myoglobin. Andreas On Wed, Dec 24, 2008 at 6:02 PM, Armita Sheari wrote: > Hi everyone, > > I want to calculate the RMSD relevent to the Structural Global Alignment of two proteins. I have used the align method of the StructurePairAligner class with default parameters, and then I have calculated the RMSD using the getRmsd method of the AlternativeAlignment class. But it seems something is wrong. > I think I should change some parameters of align method which are defined in StrucAligParameters class. Unfortunately, I couldn't find any documentation that describes the parameters (not in the api nor in the source code). 
> > I would be thankful if you take a look at my code and let me know your opinion about which parameter(s) I should change. > > StructurePairAligner structurePairAligner = new StructurePairAligner(); > structurePairAligner.align(structure1, structure2); > AlternativeAlignment[] alternativeAlignment = structurePairAligner.getAlignments(); > ClusterAltAligs.cluster(alternativeAlignment); > > double minRmsd = 1000; > double rmsd = 0; > for(int i = 0; i < alternativeAlignment.length; i++) > { > rmsd = alternativeAlignment[i].getRmsd(); > if(rmsd < minRmsd) minRmsd = rmsd; > rmsd = 0; > } > return minRmsd; > > Thanks, > Armitash > > > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-dev > From andreas.prlic at gmail.com Mon Dec 29 11:47:16 2008 From: andreas.prlic at gmail.com (Andreas Prlic) Date: Mon, 29 Dec 2008 17:47:16 +0100 Subject: [Biojava-l] [Biojava-dev] How can I calculate the RMSD of Global Alignment? In-Reply-To: <577293.91936.qm@web51403.mail.re2.yahoo.com> References: <59a41c430812250420j35947f87i6589735ff4c4273c@mail.gmail.com> <577293.91936.qm@web51403.mail.re2.yahoo.com> Message-ID: <59a41c430812290847j2e80b475lcc8f85bff1fe7146@mail.gmail.com> Hi Armita, About calculating the global alignment with no gaps: This makes sense if you work with highly similar proteins or e.g. if you run time simulations for one protein and want to compare different points in time. I assume in your situation that every CA atom has a well defined match in the protein to compare with. In this case going through the full pairwise structure alignment procedure is not needed, since you already know what the equivalent residues are. Equivalent sets of Atoms can be compared using the SVDSuperimposer class. Andreas On Mon, Dec 29, 2008 at 1:44 PM, Armita Sheari wrote: > Dear Andreas, > > Thanks for your answer. I saw the run of the example you had linked. 
> As I found, all rmsd numbers which were calculated in the example were related to Local Alignments, whereas I need to find the RMSD of the Global Alignment (without any Gap). And we should consider all Alpha Carbons of the proteins to calculate the RMSD.
>
> Which parameters should I change in align method to reach the Global Alignment with NO Gap?
>
> Thanks again,
> ArmitaSh
>
>
> --- On Thu, 12/25/08, Andreas Prlic wrote:
>
> From: Andreas Prlic
> Subject: Re: [Biojava-dev] How can I calculate the RMSD of Global Alignment?
> To: armita_sh at yahoo.com
> Cc: "Biojava" , "biojava-dev"
> Date: Thursday, December 25, 2008, 7:20 PM
>
> Hi Armita,
>
> I agree the missing documentation for all the alignment parameters is
> a problem... It is not exactly clear from your mail what
> problem you encountered. The code that you sent in principle looks
> fine. I suspect you want to select the "best" of the alternative
> alignments? In that case you can simply take the first, since the
> alignments come out sorted. An example can be run from here:
>
> http://www.biojava.org/download/performance/biojava-structure-example1.jnlp
>
> The alternative alignments are sorted according to their number of
> structurally equivalent residues. The rmsd in the different
> alternative solutions is always kept under a certain threshold (one of
> the parameters), since one of the strategies is to try to keep the
> rmsd constant, while maximizing the number of structurally equivalent
> residues. Similar results are clustered together in the same cluster.
> In the example shown above this makes it possible to find the multiple
> matches between the four chains of hemoglobin and myoglobin.
>
> Andreas
>
> On Wed, Dec 24, 2008 at 6:02 PM, Armita Sheari wrote:
>> Hi everyone,
>>
>> I want to calculate the RMSD relevant to the Structural Global Alignment of two proteins.
>> I have used the align method of the StructurePairAligner class with default parameters, and then I have calculated the RMSD using the getRmsd method of the AlternativeAlignment class. But it seems something is wrong.
>> I think I should change some parameters of align method which are defined in StrucAligParameters class. Unfortunately, I couldn't find any documentation that describes the parameters (not in the api nor in the source code).
>>
>> I would be thankful if you take a look at my code and let me know your opinion about which parameter(s) I should change.
>>
>> StructurePairAligner structurePairAligner = new StructurePairAligner();
>> structurePairAligner.align(structure1, structure2);
>> AlternativeAlignment[] alternativeAlignment = structurePairAligner.getAlignments();
>> ClusterAltAligs.cluster(alternativeAlignment);
>>
>> double minRmsd = 1000;
>> double rmsd = 0;
>> for(int i = 0; i < alternativeAlignment.length; i++)
>> {
>>     rmsd = alternativeAlignment[i].getRmsd();
>>     if(rmsd < minRmsd) minRmsd = rmsd;
>>     rmsd = 0;
>> }
>> return minRmsd;
>>
>> Thanks,
>> Armitash
>>

From markjschreiber at gmail.com Tue Dec 30 02:12:42 2008
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 30 Dec 2008 15:12:42 +0800
Subject: [Biojava-l] Join the LinkedIn BioJava group
Message-ID: <93b45ca50812292312o72fa24e5lfbf4dfcbba659e9a@mail.gmail.com>

Hi -

FYI there is a BioJava group for people who are on LinkedIn. You can join it by following this link http://www.linkedin.com/e/gis/58404

- Mark
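
Returning to the RMSD thread above: Andreas's point is that when every CA atom already has a known partner, no alignment search is needed and the RMSD follows directly from the paired coordinates. The following is only a minimal sketch of that last step in plain Java, not BioJava code. It assumes the two coordinate sets are paired by index and have already been superimposed (in BioJava that step would be handled by the SVDSuperimposer class Andreas mentions, which is not reproduced here). The class name CaRmsd, the method rmsd and the sample coordinates are made up for illustration.

```java
// Hypothetical helper, not part of BioJava: RMSD over pre-matched CA coordinates.
class CaRmsd {

    /**
     * Root-mean-square deviation between two coordinate sets.
     * Row i of a is assumed to be the equivalent atom of row i of b,
     * and both sets are assumed to be superimposed already.
     */
    static double rmsd(double[][] a, double[][] b) {
        if (a.length == 0 || a.length != b.length) {
            throw new IllegalArgumentException(
                "coordinate sets must be non-empty and of equal length");
        }
        double sumSq = 0.0;
        for (int i = 0; i < a.length; i++) {
            for (int k = 0; k < 3; k++) {      // x, y, z
                double d = a[i][k] - b[i][k];
                sumSq += d * d;
            }
        }
        return Math.sqrt(sumSq / a.length);     // average over atoms, not coordinates
    }

    public static void main(String[] args) {
        // Made-up coordinates: the second set is the first shifted by 1 along z.
        double[][] ca1 = { {0, 0, 0}, {3.8, 0, 0}, {7.6, 0, 0} };
        double[][] ca2 = { {0, 0, 1}, {3.8, 0, 1}, {7.6, 0, 1} };
        System.out.println(rmsd(ca1, ca2));     // prints 1.0
    }
}
```

Without a prior superposition this number also includes any rigid-body offset between the two structures, which is why the superposition step matters when the proteins come from different coordinate frames.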