From zagato.gekko at gmail.com Thu Mar 1 21:27:12 2007
From: zagato.gekko at gmail.com (Zagato)
Date: Thu, 1 Mar 2007 21:27:12 -0500
Subject: [Biojava-l] (pre-Singapore) BioSQL ???
Message-ID: <98028b00703011827j67e6772fm66903d5175ff1845@mail.gmail.com>

Hello everyone,

I'm experimenting with connecting to a BioSQL database on Postgres. I'm
following this guide:
http://www.biojava.org/wiki/BioJava:CookBook:BioSQL:Manage, and changed the
parameters to connect to Postgres, but I get an error and I don't know why
(I downloaded the scripts for Postgres from
http://www.biojava.org/download/biosql/). My error is:

org.biojava.bio.BioException: This database appears to be an old
(pre-Singapore) BioSQL. If you need to access it, try an older BioJava
snapshot (1.3pre1 or earlier)
        at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.initDb(BioSQLSequenceDB.java:227)
        at org.biojava.bio.seq.db.biosql.BioSQLSequenceDB.<init>(BioSQLSequenceDB.java:195)
        at Connect.main(Connect.java:40)

I cannot use the CVS version of BioJava, and I can't use an older version
because of a bug. Can someone give me some guidance? Thanks!

Alan Acosta

--
Farewell.
ruby << __EOF__
puts [ 111, 116, 97, 103, 97, 90 ].collect{|v| v.chr}.join.reverse
__EOF__

From mark.schreiber at novartis.com Thu Mar 1 21:48:31 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Fri, 2 Mar 2007 10:48:31 +0800
Subject: [Biojava-l] (pre-Singapore) BioSQL ???
Message-ID:

Hi -

I think those scripts are particularly old. You could try getting the
appropriate ones from
http://code.open-bio.org/cgi/viewcvs.cgi/biosql-schema/sql/?cvsroot=biosql

You should also know that we are no longer supporting the biojava 1.4
bindings to biosql. As of biojava 1.5 we are using Hibernate to do a much
better job of the ORM. You also get much better control over transactions.
I strongly recommend you download biojava1.5-pre2 from
http://biojava.org/wiki/BioJava:Download.
For instructions on using Hibernate with Biojava take a look at
http://biojava.org/wiki/BioJava:BioJavaXDocs#BioSQL_and_Hibernate.

Best regards,

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
www.dengueinfo.org

phone +65 6722 2973
fax +65 6722 2910

_______________________________________________
Biojava-l mailing list - Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l

From zagato.gekko at gmail.com Fri Mar 2 09:09:54 2007
From: zagato.gekko at gmail.com (Zagato)
Date: Fri, 2 Mar 2007 09:09:54 -0500
Subject: [Biojava-l] (pre-Singapore) BioSQL ???
In-Reply-To:
References:
Message-ID: <98028b00703020609vb00c5cdgcb38462b7cc1a4a5@mail.gmail.com>

Thanks for the answer, I will check the Hibernate alternative :D

Alan

--
Farewell.
ruby << __EOF__
puts [ 111, 116, 97, 103, 97, 90 ].collect{|v| v.chr}.join.reverse
__EOF__

From alexamies at gmail.com Fri Mar 2 22:03:44 2007
From: alexamies at gmail.com (Alex Amies)
Date: Fri, 2 Mar 2007 19:03:44 -0800
Subject: [Biojava-l] [BioJava] New Article Approaches to Web Development for Bioinformatics
Message-ID: <1ad8057e0703021903y535f3a14xcf5df2eb39ef2f87@mail.gmail.com>

I have written an article titled Approaches to Web Development for
Bioinformatics at

http://medicalcomputing.net/tools_dna1.php

There is a section on BioJava at

http://medicalcomputing.net/tools_dna14.php

Hopefully, someone will get something out of it. Please let me know
if you have any comments or find mistakes.

Alex

From markjschreiber at gmail.com Sat Mar 3 00:39:05 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 3 Mar 2007 13:39:05 +0800
Subject: [Biojava-l] [BioJava] New Article Approaches to Web Development for Bioinformatics
In-Reply-To: <1ad8057e0703021903y535f3a14xcf5df2eb39ef2f87@mail.gmail.com>
References: <1ad8057e0703021903y535f3a14xcf5df2eb39ef2f87@mail.gmail.com>
Message-ID: <93b45ca50703022139i7d281ff0jd9b95a3e57ccfb96@mail.gmail.com>

Hi Alex -

This is a nice introduction.
Please feel free to put a link in the biojava wiki (www.biojava.org);
somewhere in the cookbook section would probably be good.

- Mark

From arareko at campus.iztacala.unam.mx Sat Mar 3 17:32:46 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Sat, 03 Mar 2007 16:32:46 -0600
Subject: [Biojava-l] [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
In-Reply-To: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com>
Message-ID: <45E9F78E.8040406@campus.iztacala.unam.mx>

Hi Alex,

I think you've put together a very nice & concise introductory article.
I'd like to comment a little on some sections I've read:

* Introduction

> "Given that you have an idea for analyzing or presenting data in a
> particular was, a complete bioinformatics web application depends of
> these basic pieces, which is what this article is all about:
>
> 1. A source of data...
> 2. An application programming language...
> 3. A web application platform...
> 4. Optionally, a data store...
> 5. Optionally, you would reuse software tools..."

Even though you make a brief mention of Web Services at the very end of
the article (under Application Integration -> Programmatic Integration),
I believe that Web Services can be another optional (or even basic)
piece of a web application.
In fact, many web applications consist only
of Web Services without HTML user interfaces.

* Application Development Languages

> "There are many different programming platforms and tools available to
> solve bioinformatics problems. It can be bewildering at first, but it
> makes more sense to build on top of some of these tools rather than
> build from scratch. Some the problems with using these tools for a
> bioinformatics portal are
>
> 1. Many tools are written...
> 2. Some tools have particular prerequisites...
> 3. Many may not be in a form...
> 4. The context that gives meaning...
>
> Standardization on a particular platform can help manageability but
> for most organizations a compromise between standardization and
> adoption of several different platforms will allow many people to
> develop software in platforms that they are already comfortable with
> and allow the reuse of a large amount of freely available software..."

I would add to the problems list the fact that building web (or other
kinds of) applications on top of a platform whose codebase is constantly
evolving can make them very difficult to maintain. The case of
EnsEMBL comes to my mind here: they opted to stick with BioPerl 1.2.3 as
a core library and haven't moved to a higher version of it because the
EnsEMBL code is so vast that a simple upgrade of BioPerl would break a
lot of their code. AFAIK, it's because of this and the slowness of some
parts of BioPerl that EnsEMBL is gradually saying goodbye to BioPerl.

Also, I think that depending on the amount of available code you plan to
import into your application, sometimes having a whole platform at the
very bottom can add unnecessary extra weight to your application. More
weight can mean less speed, which is critical in web development.

* Application Integration -> Navigation

> "The basic way that users will navigate into and around your
> application should be using HTTP GET and POST requests with specific
> URL's.
> Users bookmark these URL's and other applications will link to
> them. Most applications developers did not realize it at first, but
> these URL's are, in fact, an interface into your application that you
> must maintain in a consistent way as you change and evolve your
> software. Otherwise, they will find dead links..."

Just as I clicked the bookmark button for your article :) The same
principle could apply to its filenames. A URL of the form:
http://medicalcomputing.net/tools_dna17.php is less indicative of the
real content of the article and can mislead potential readers.
Optimising the URL's will make them easier for search engines to
index; something like:
http://medicalcomputing.net/web-development-bioinformatics17.php would
do the trick.

To conclude my comments, I was surprised to see a section about BioPHP
and not about other better-known toolkits like BioPython or BioRuby. What
about their role in web development? Python is also a common language
for web programming and with all the recent *hot* stuff like Ruby On
Rails, it's very likely that both Bio* toolkits are more than ready for
deploying web applications. I'm Cc'ing this to their respective mailing
lists to see if someone wants to give you some feedback about them in
order to complement your article. Other than that, I really liked your
work :)

Cheers,
Mauricio.

Alex Amies wrote:
> I have written an article on Approaches to Web Development for
> Bioinformatics at
>
> http://medicalcomputing.net/tools_dna1.php
>
> There is a fairly large section on BioPerl at
>
> http://medicalcomputing.net/tools_dna13.php
>
> I hope that someone gets something useful out of it. I also looking for
> feedback on it and, in particular, please let me know about any mistakes in
> it.
>
> The intent of the article is to give an overview of various approaches to
> developing web based tools for bioinformatics.
> It describes the alternatives
> at each layer of the system, including the data layer and sources of data,
> the application programming layer, the web layer, and bioinformatics tools
> and software libraries.
>
> Alex
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From alexamies at gmail.com Sat Mar 3 22:09:51 2007
From: alexamies at gmail.com (Alex Amies)
Date: Sat, 3 Mar 2007 19:09:51 -0800
Subject: [Biojava-l] [Bioperl-l] New Article on Approaches to Web Development for Bioinformatics
In-Reply-To: <45E9F78E.8040406@campus.iztacala.unam.mx>
References: <1ad8057e0703021842y683853f5k1c97dbf362f20dda@mail.gmail.com> <45E9F78E.8040406@campus.iztacala.unam.mx>
Message-ID: <1ad8057e0703031909v4880f5f1t3c4159b75c36bcca@mail.gmail.com>

Mauricio,

Thanks for your comments. You are right that I could have said a lot more
about web services. I plan on doing that but I haven't got there yet.
Actually, with all the hype about web services I have been surprised to
find the programming model so complicated.

As you mention, I certainly could have thought out my own URL's better.

I have been surprised not to find more PHP activity in bioinformatics. To
me, besides being a lightweight and pleasant language to program in, it is
incredibly economical for hosting Internet applications and there is a huge
open source community around PHP in general. The same can be said of Perl.

It is because of my own ignorance and lack of time that I have not
investigated Python and Ruby. I may do so in the future and write about
them.

Alex

From christoph.gille at charite.de Mon Mar 5 16:51:34 2007
From: christoph.gille at charite.de (Dr.
 Christoph Gille)
Date: Mon, 5 Mar 2007 22:51:34 +0100 (CET)
Subject: [Biojava-l] Biojava plugin
Message-ID: <55580.141.42.56.114.1173131494.squirrel@webmail.charite.de>

If you have developed an application based on Biojava which operates on
nucleotide or amino acid sequences, or even on protein 3D-structures,
you can make a plugin for STRAP with relatively little effort.

Plugins can be downloaded and run in STRAP with only a few mouse clicks.
Therefore your plugin may be well accepted by the growing number of
STRAP users.

STRAP still lacks tools for restriction site analysis, proteolytic site
prediction, promoter and transcription factor binding site analyses and
gene prediction. It would be nice if you could extend the functionality
of STRAP with your application.

Christoph

From dankoc at gmail.com Sun Mar 11 16:22:48 2007
From: dankoc at gmail.com (Charles Danko)
Date: Sun, 11 Mar 2007 16:22:48 -0400
Subject: [Biojava-l] Ambiguity consensus string search
Message-ID: <8adccabf0703111322ya303927o194eabcd9c5978c@mail.gmail.com>

Hi,

I'm trying to search a Sequence or SymbolList object for a consensus
sequence that contains IUPAC ambiguity codes.

Without ambiguity codes, I could write a function that breaks the sequence
into "windows" the size of the consensus, and checks each window for a
match. Am I missing a simple function that does this for me?

Next, adding ambiguity codes ... do I have to define my own alphabet for the
DNA IUPAC codes, or are these already included in the distribution
somewhere?

I have found the weight matrix class, and realize that I could create one of
these objects and calculate a threshold that will work in the same manner as
a consensus, but this seems like a bit of a hack for some functionality I am
most likely overlooking.

Thanks very much!
Charles

From heuermh at acm.org Sun Mar 11 17:58:04 2007
From: heuermh at acm.org (Michael Heuer)
Date: Sun, 11 Mar 2007 16:58:04 -0500 (EST)
Subject: [Biojava-l] Ambiguity consensus string search
In-Reply-To: <8adccabf0703111322ya303927o194eabcd9c5978c@mail.gmail.com>
Message-ID:

Hello Charles,

You may find the org.biojava.utils.regex package useful in this regard:

> http://www.biojava.org/docs/api15b/index.html

   michael

From dankoc at gmail.com Wed Mar 14 12:15:11 2007
From: dankoc at gmail.com (Charles Danko)
Date: Wed, 14 Mar 2007 12:15:11 -0400
Subject: [Biojava-l] Having problems using biojava regex.
Message-ID: <8adccabf0703140915s7e955c4et6fd86a12d2f21d8b@mail.gmail.com>

Hi,

I'm having problems using the biojava regex classes.

According to my understanding, the code posted below is the simplest
possible example of this class.

However, my output is:
TAG
false
0
TAG

The TAG, TAG part of the output is for pattern.patternAsString() and
occurences.pattern().patternAsString(). As I understand it, both of these
are correct, leading me to believe that both the Pattern and Matcher objects
are being created correctly.
However, occurences.find() = false and
occurences.groupCount() = 0 ... meaning it's not finding any matches!?

Where am I going wrong?

Many thanks!
Charles

import org.biojava.bio.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.symbol.*;
import org.biojava.utils.regex.*;
import java.util.*;
import java.io.*;

public class Ambiguity2 {
    public static void main(String[] args) {
        try {
            FiniteAlphabet IUPAC = DNATools.getDNA();

            // Create pattern using pattern factory.
            Pattern pattern;
            PatternFactory FACTORY = PatternFactory.makeFactory(IUPAC);
            try {
                pattern = FACTORY.compile("TAG");
            } catch (Exception e) { e.printStackTrace(); return; }
            System.out.println(pattern.patternAsString());

            // Variables needed...
            Matcher occurences;

            // Promoter & Element
            Element WorkingElement = new Element("ElementName");
            SymbolList WorkingPromoter = DNATools.createDNA("TAGAGATAGACGATAGC");

            // Obtain iterator of patterns.
            try {
                occurences = pattern.matcher(WorkingPromoter);
            } catch (Exception e) { e.printStackTrace(); return; }
            System.out.println(occurences.find());
            System.out.println(occurences.groupCount());
            System.out.println(occurences.pattern().patternAsString());
            // Foreach match
            while (occurences.find()) {
                // Create Occurence object using information from patterns.
                System.out.println("Match: " + "\t" + WorkingPromoter + "\n" +
                        occurences.start() + "\t" + occurences.group().seqString());
            }
        }
        catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}

From markjschreiber at gmail.com Wed Mar 14 21:37:06 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Thu, 15 Mar 2007 09:37:06 +0800
Subject: [Biojava-l] Having problems using biojava regex.
In-Reply-To: <8adccabf0703140915s7e955c4et6fd86a12d2f21d8b@mail.gmail.com>
References: <8adccabf0703140915s7e955c4et6fd86a12d2f21d8b@mail.gmail.com>
Message-ID: <93b45ca50703141837g15a8de9fr5bc01341d8cd57c8@mail.gmail.com>

Hi -

From memory everything should be lower case.
BioJava always represents DNA
as lowercase and protein as upper case as per convention.

Try that.

- Mark

From dankoc at gmail.com Wed Mar 14 22:20:54 2007
From: dankoc at gmail.com (Charles Danko)
Date: Wed, 14 Mar 2007 22:20:54 -0400
Subject: [Biojava-l] Having problems using biojava regex.
In-Reply-To: <93b45ca50703141837g15a8de9fr5bc01341d8cd57c8@mail.gmail.com>
References: <8adccabf0703140915s7e955c4et6fd86a12d2f21d8b@mail.gmail.com> <93b45ca50703141837g15a8de9fr5bc01341d8cd57c8@mail.gmail.com>
Message-ID: <8adccabf0703141920w1246c3b0u39e288941c20cb17@mail.gmail.com>

That worked! I had no idea it would matter!

Thanks very much for the response!
Charles

From mark.schreiber at novartis.com Wed Mar 14 22:27:14 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Thu, 15 Mar 2007 10:27:14 +0800
Subject: [Biojava-l] Having problems using biojava regex.
Message-ID:

It's not so much that biojava is case sensitive as it is that regexes are.
You could probably set your regex to be case insensitive to avoid this
kind of problem.

As a cool trick though, it is possible to soft-mask sequences in BioJava
such that lower case and upper case have different meanings. Typically
lower case marks regions that are masked due to repeats or similar. You
could then make a regex that finds matches only in the unmasked regions
or, alternatively, only in the masked regions.

- Mark

Mark Schreiber
Research Investigator (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
www.dengueinfo.org

phone +65 6722 2973
fax +65 6722 2910

From Sofia.Burvall at hgen.slu.se Fri Mar 16 12:31:23 2007
From: Sofia.Burvall at hgen.slu.se (Sofia Burvall)
Date: Fri, 16 Mar 2007 17:31:23 +0100
Subject: [Biojava-l] Uniprot files
Message-ID:

Hi!

I have just started to get to know biojava. I have written a small
program that reads a file with the help of the biojavax method

RichSequence.IOTools.readFile(filen,ns );

and then tries to write the file as UniProt using

RichSequence.IOTools.writeUniProt(System.out, seqit, ns);

This works nicely when I read a fasta file. But when I try to read a
Uniprot file I get this error message:

org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at org.biojavax.bio.seq.io.RichStreamReader.nextSequence(RichStreamReader.java:92)
    at org.biojavax.bio.seq.io.RichStreamWriter.writeStream(RichStreamWriter.java:66)
    at org.biojavax.bio.seq.RichSequence$IOTools.writeUniProt(RichSequence.java:1426)
    at bc_biojava.GeneralReader.main(GeneralReader.java:81)
Caused by: org.biojava.bio.seq.io.ParseException: Bad date line found: 01-JAN-1990 (Rel. 13, Created)
    at org.biojavax.bio.seq.io.UniProtFormat.readRichSequence(UniProtFormat.java:349)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 4 more

When I try other uniprot files I get the same error. It complains about
"Bad date line..". What can be the reason for this?
Is it the wrong file format?

Cheers
/Sofia

*** Here is the UniProt flat file: ***

ID   FOSB_MOUSE     STANDARD;      PRT;   338 AA.
AC   P13346;
DT   01-JAN-1990 (Rel. 13, Created)
DT   01-JAN-1990 (Rel. 13, Last sequence update)
DT   15-JUN-2002 (Rel. 41, Last annotation update)
DE   Protein fosB.
GN   FOSB.
OS   Mus musculus (Mouse).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX   NCBI_Taxid=10090;
RN   [1]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=89251612; PubMed=2498083;
RA   Zerial M., Toschi L., Ryseck R.-P., Schuermann M., Mueller R.,
RA   Bravo R.;
RT   "The product of a novel growth factor activated gene, fos B, interacts
RT   with JUN proteins enhancing their DNA binding activity.";
RL   EMBO J. 8:805-813(1989).
RN   [2]
RP   SEQUENCE FROM N.A.
RX   MEDLINE=92158623; PubMed=1741260;
RA   Lazo P.S., Dorfman K., Noguchi T., Mattei M.-G., Bravo R.;
RT   "Structure and mapping of the fosB gene. FosB downregulates the
RT   activity of the fosB promoter.";
RL   Nucleic Acids Res. 20:343-350(1992).
CC   -!- FUNCTION: FOSB INTERACTS WITH JUN PROTEINS ENHANCING THEIR DNA
CC       BINDING ACTIVITY.
CC   -!- SUBUNIT: HETERODIMER (BY SIMILARITY).
CC   -!- SUBCELLULAR LOCATION: NUCLEAR.
CC   -!- INDUCTION: BY GROWTH FACTORS.
CC   -!- SIMILARITY: BELONGS TO THE BZIP FAMILY. FOS SUBFAMILY.
CC   --------------------------------------------------------------------------
CC   This Swiss-Prot entry is copyright. It is produced through a collaboration
CC   between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC   the European Bioinformatics Institute. There are no restrictions on its
CC   use by non-profit institutions as long as its content is in no way
CC   modified and this statement is not removed. Usage by and for commercial
CC   entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC   or send an email to license at isb-sib.ch).
CC   --------------------------------------------------------------------------
DR   EMBL; X14897; CAA33026.1; -.
DR   EMBL; AF093624; AAD13196.1; -.
DR   PIR; S04108; TVMSFB.
DR   PIR; S35477; S35477.
DR   HSSP; P01100; 1FOS.
DR   TRANSFAC; T00291; -.
DR   MGD; MGI:95575; Fosb.
DR   InterPro; IPR000837; Leuzip_Fos.
DR   InterPro; IPR004827; TF_bZIP.
DR   Pfam; PF00170; bZIP; 1.
DR   PRINTS; PR00042; LEUZIPPRFOS.
DR   SMART; SM00338; BRLZ; 1.
DR   PROSITE; PS00036; BZIP_BASIC; 1.
KW   Nuclear protein; DNA-binding.
FT   DNA_BIND    161    179       BASIC MOTIF.
FT   DOMAIN      183    211       LEUCINE-ZIPPER.
SQ   SEQUENCE   338 AA;  35976 MW;  E9D031A4BEAE48EC CRC64;
     MFQAFPGDYD SGSRCSSSPS AESQYLSSVD SFGSPPTAAA SQECAGLGEM PGSFVPTVTA
     ITTSQDLQWL VQPTLISSMA QSQGQPLASQ PPAVDPYDMP GTSYSTPGLS AYSTGGASGS
     GGPSTSTTTS GPVSARPARA RPRRPREETL TPEEEEKRRV RRERNKLAAA KCRNRRRELT
     DRLQAETDQL EEEKAELESE IAELQKEKER LEFVLVAHKP GCKIPYEEGP GPGPLAEVRD
     LPGSTSAKED GFGWLLPPPP PPPLPFQSSR DAPPNLTASL FTHSEVQVLG DPFPVVSPSY
     TSSFVLTCPE VSAFAGAQRT SGSEQPSDPL NSPSLLAL
//

From ericgibert at yahoo.fr Fri Mar 16 21:42:38 2007
From: ericgibert at yahoo.fr (Eric Gibert)
Date: Sat, 17 Mar 2007 09:42:38 +0800
Subject: [Biojava-l] BioJava 1.5 versus BioJava 2
Message-ID: <015201c76835$8e85b070$3e00a8c0@Gecko>

Dear all,

I am using BioJava 1.4 and am thinking of upgrading to 1.5. But I would
like to benefit from the new Java language improvements (generics and
"foreach a la Java"). My understanding is that Biojava 1.5 uses the List
type (with the Iterator). On the other hand, I read about another project
called Biojava 2, which takes the approach of implementing generics and
other features of Java 5.0.

But it looks like Biojava 2 is still new and does not offer the rich
environment of Biojava 1.x.

Any comments?
Eric Gibert

http://www.asia-dragonfly.net &
http://www.africa-dragonfly.net

From markjschreiber at gmail.com Sat Mar 17 01:48:10 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Sat, 17 Mar 2007 13:48:10 +0800
Subject: [Biojava-l] BioJava 1.5 versus BioJava 2
In-Reply-To: <015201c76835$8e85b070$3e00a8c0@Gecko>
References: <015201c76835$8e85b070$3e00a8c0@Gecko>
Message-ID: <93b45ca50703162248m30a0319bi1ee8e70b9eaa7521@mail.gmail.com>

Hi -

BioJava2 is more of a concept than an actual usable API. I'm not sure
if the development of this project is even active right now.

BioJava 1.5 will be the next generation of biojava. It doesn't use
features of JDK1.5 like generics, foreach, and unboxing but later
versions will. However, this doesn't mean you can't use them with the
API. For example there is no reason why you cannot make a List and
iterate over it with a foreach statement. It's just that BioJava 1.5
doesn't do that internally yet.

- Mark

From hlapp at gmx.net Sat Mar 17 16:15:01 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 17 Mar 2007 16:15:01 -0400
Subject: [Biojava-l] Phyloinformatics Summer of Code
Message-ID:

(Apologies if you receive multiple copies of this. The message is
being posted to multiple channels.)

(Note that leading developers from both Biojava and BioPerl are
amongst the mentors.)

Phyloinformatics Summer of Code 2007

A collaborative Phyloinformatics Group, sponsored by the National
Evolutionary Synthesis Center (NESCent: http://www.nescent.org/), is
working to develop user-interfaces, improve software interoperability
and support data exchange standards in evolutionary bioinformatics.
The specific projects are diverse in nature and range from the
development of AJAX components for web-based bioinformatics
applications, managing workflows using approaches from functional and
logic programming, and developing data exchange standards for
phylogenetic substitution models.

The Phyloinformatics group will be sponsoring student collaborators
through the Google Summer of Code program (http://code.google.com/soc),
which provides undergraduate, masters and PhD students with a unique
opportunity (over three summer months) to obtain hands-on experience
writing and extending open-source software under the mentorship of
experienced developers from around the world.

We are particularly targeting students interested in both evolutionary
biology and software development. Students will have one or more
dedicated mentors with expertise in phylogenetic methods and
open-source software development. Our project proposals are flexible
and can be adjusted in scope to match the skills of students with less
programming proficiency.
If the program sounds interesting to you but you are unsure whether you
have the necessary skills, please email the mentors at
phylosoc {at} nescent {dot} org. We will work with those who are
genuinely interested to find a project that fits your interest and
skills.

Students will receive a stipend from Google and will be invited to
participate in future collaborative events such as the NESCent
Phyloinformatics Hackathons (http://www.nescent.org/wg/phyloinformatics).

TO APPLY: Students must apply on-line at the Google Summer of Code
website (http://code.google.com/soc). The application period for
students is now open and ends on Saturday, March 24, 2007 (one week
from now).

The Phyloinformatics Summer of Code project and ideas page is at the
following URL:

http://phyloinformatics.net/Phyloinformatics_Summer_of_Code_2007

The above page also contains links to the GSoC program rules,
eligibility requirements, and stipend payment mechanism.

We encourage all interested students to email any questions, or
self-proposed project ideas, to phylosoc {at} nescent {dot} org. This
will reach all prospective mentors.

Eligibility requirements for students:
http://code.google.com/support/bin/answer.py?answer=60279&topic=10730

Stipend for students:
http://code.google.com/support/bin/answer.py?answer=60322&topic=10731

Please disseminate this announcement to appropriate students at your
institution.

Hilmar Lapp
Assistant Director for Informatics
NESCent

From ericgibert at yahoo.fr Mon Mar 19 09:46:49 2007
From: ericgibert at yahoo.fr (Eric Gibert)
Date: Mon, 19 Mar 2007 21:46:49 +0800
Subject: [Biojava-l] BioJava 1.5 versus BioJava 2
In-Reply-To: <45FE5436.7000807@ebi.ac.uk>
References: <015201c76835$8e85b070$3e00a8c0@Gecko>
  <45FE5436.7000807@ebi.ac.uk>
Message-ID: <002901c76a2d$0ec96260$3e00a8c0@Gecko>

Thank you Richard and Mark for your answers.

So my next move is to install Hibernate and move to BJ1.5.

Eric

-----Original Message-----
From: Richard Holland [mailto:holland at ebi.ac.uk]
Sent: Monday, March 19, 2007 5:13 PM
To: Eric Gibert
Cc: biojava-l at lists.open-bio.org
Subject: Re: [Biojava-l] BioJava 1.5 versus BioJava 2

BJ2 is developed by a different group of developers than BJ1.5. As far
as I know, BJ2 never got further than a framework of ideas with one or
two proof-of-concept interface implementations. It is not currently
under active development that I am aware of.

BJ1.5 still uses Java 1.4 language features, as you correctly identify.
We have discussed moving to Java 1.5 for the BJ1.6 release and will
make a decision/announcement when the time comes. One of the reasons
for the delayed conversion is that up until recently many users'
systems did not have Java 1.5 environments available (older Macs,
Compaq/DEC Alphas, etc.).

cheers,
Richard

From Joseph.Bedell at sial.com Mon Mar 19 11:00:58 2007
From: Joseph.Bedell at sial.com (Joseph Bedell)
Date: Mon, 19 Mar 2007 10:00:58 -0500
Subject: [Biojava-l] Joseph Bedell is out of the office.
Message-ID:

I will be out of the office starting 03/19/2007 and will not return
until 03/26/2007.

I will respond to your message when I return.

From mark.schreiber at novartis.com Tue Mar 20 22:55:48 2007
From: mark.schreiber at novartis.com (mark.schreiber at novartis.com)
Date: Wed, 21 Mar 2007 10:55:48 +0800
Subject: [Biojava-l] State of the nation
Message-ID:

Hi -

Recently several of the current and former core developers of the
biojava project happened to be in Cambridge at the same time and had
some discussions about the directions and immediate future of the
project. A summary of the discussion has been posted on the biojava
website at http://biojava.org/wiki/BioJava:CambridgeDiscussion

Please take a look and feel free to add your comments to the page.

- Mark

From gwaldon at geneinfinity.org Wed Mar 21 14:38:03 2007
From: gwaldon at geneinfinity.org (george waldon)
Date: Wed, 21 Mar 2007 11:38:03 -0700
Subject: [Biojava-l] Uniprot files
Message-ID: <200703211838.l2LIc4rM063269@mmm1924.dulles19-verio.com>

Sorry for the late response.
I have also noticed the problem with uniprot files recently. I am going
to file a bug report. In the future, you can submit bugs yourself on
bugzilla at http://bugzilla.open-bio.org/.

Best,
George

From phidias51 at gmail.com Thu Mar 22 22:53:50 2007
From: phidias51 at gmail.com (Mark Fortner)
Date: Thu, 22 Mar 2007 18:53:50 -0800
Subject: [Biojava-l] State of the nation
In-Reply-To:
References:
Message-ID: <6e1d61f50703221953y1ea5a8e0j99e8b806782351e7@mail.gmail.com>

Mark,

As you point out, the build server changes (and potentially the bug
tracking system) would require the participation of the open bio folks
to make these changes to their server. It might be a good idea to begin
engaging them now. They may have some other ideas as well.

There are complete software development management systems like
SourceForge that could be set up. When I worked at a previous company we
had our own internal SourceForge instance that gave us forums, a wiki,
subversion, bug tracking, etc. Atlassian offers a similar product suite.
Such an all-encompassing solution might suit OpenBio more than a series
of piecemeal solutions.
Mark

On 3/22/07, mark.schreiber at novartis.com wrote:
>
> "Mark Fortner"
> 03/21/2007 01:22 PM
>
> To: "mark.schreiber at novartis.com" <mark.schreiber at novartis.com>
> cc:
> Subject: Re: [Biojava-l] State of the nation
>
> Hi Mark,
> I started to edit the wiki, but then thought it might be better to discuss
> a few things first.
>
> (1) There are some new javadoc tags that can make some of this easier to
> do. {@inheritDoc} for example can be used to inherit the documentation from
> an interface or superclass, thus saving some typing.
>
> Yes, we use these a lot in biojavax but it could be added as a
> recommendation in the wiki.
>
> (2) Unit tests will definitely make it easier to handle regression tests,
> but it might be better to simply add additional test methods to existing
> unit tests (currently located in the tests directory). This directory
> mirrors the source packages and is fairly easy to maintain. We can add a
> comment in the test method to indicate that it's in response to a particular
> bug fix.
>
> You could name the method testBug12345. It's a matter of style I suppose.
> Having it all together reduces redundancy. Having regression tests
> separately makes it easier to find the appropriate regression test
> especially if it involves more than one object.
>
> (3) Regardless of the method used to build BioJava it would be useful to
> continuously build and test the project. There are tools to do this for
> both Maven and Ant. In addition to simply telling us that the build is
> broken, there are code coverage plugins that can tell us if the unit tests
> accurately cover the code they purport to test. You can also generate
> documentation, HTML-browsable code (java2html), and UML diagrams (through
> UMLJGraph or doxygen) in an automated fashion. Here's a partial plugin
> list to give you some ideas: http://maven.apache.org/plugins/index.html
>
> I agree. I'm not much of a build expert and only an ant hacker. What we
> really need is some volunteers to help set this sort of thing up. If you
> know how to do it and have the time it would be excellent.
>
> (4) If you've reported bugs for any of the Apache projects, you may have
> noticed that they switched to JIRA. This is a wonderful tool created by a
> company in Sydney called Atlassian. The tool can provide users with a
> roadmap of what changes are due in which releases, and you can create RSS
> feeds for different reports. The reports are completely customizable. If
> you use the Eclipse Mylar plugin, then tasks that are assigned to you in
> JIRA appear in Mylar. You can also add JIRA issue numbers to code checkins
> in subversion or CVS. When JIRA indexes the code, it then provides you
> with a list of all classes that were modified in response to a particular
> bug. There are numerous other features described here:
> http://www.atlassian.com/software/jira/ and the software is available
> free of charge for open source projects.
> http://www.atlassian.com/software/jira/pricing.jsp
>
> Again, I don't have much experience. I use bugzilla because open-bio set
> it up for us. We could probably request other things if you know what you
> want.
>
> Go ahead and add some suggestions on either the wiki or the mailing list.
> Hopefully some prodding will provoke the slumbering biojava community into
> action : )
>
> - Mark
>
> Hope this helps,
>
> Mark Fortner

From e.willighagen at science.ru.nl Fri Mar 23 13:50:54 2007
From: e.willighagen at science.ru.nl (Egon Willighagen)
Date: Fri, 23 Mar 2007 18:50:54 +0100
Subject: [Biojava-l] State of the nation
In-Reply-To: <6e1d61f50703221953y1ea5a8e0j99e8b806782351e7@mail.gmail.com>
References: <6e1d61f50703221953y1ea5a8e0j99e8b806782351e7@mail.gmail.com>
Message-ID: <200703231850.54872.e.willighagen@science.ru.nl>

On Friday 23 March 2007, Mark Fortner wrote:
> > We can add a
> > comment in the test method to indicate that it's in response to a
> > particular bug fix.
> >
> > You could name the method testBug12345. It's a matter of style I suppose.
> > Having it all together reduces redundancy. Having regression tests
> > separately makes it easier to find the appropriate regression test
> > especially if it involves more than one object.

I can very much recommend this, as we have good experiences with this in
the CDK [1]. We actually tag JUnit test methods, as well as the class in
which the bug was found, until it is closed. By using a simple JavaDoc
Taglet, we link to our bug tracking system (e.g. [2]), so that users can
always see in the JavaDoc if there are relevant open bugs for a certain
piece of code. Additionally, we have a nightly build service which checks
for open bugs for which no tests are available, and a few administrative
things [3].

BTW, we use the tag @cdk.bug, but that is just what we have chosen it to be.

Egon

1.http://cdk.sf.net/
2.http://66.102.9.104/search?q=cache:UBsZ1moOpLwJ:cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/api/org/openscience/cdk/smiles/SmilesParser.html+cdk+SMILESParser&hl=nl&client=firefox-a&strip=1
3.http://cheminfo.informatics.indiana.edu/~rguha/code/java/nightly/bugs.html

(ad.
2: Yeah, it seems 'Nightly' is down, so you get the Google cache :) -- e.willighagen at science.ru.nl Cologne University Bioinformatics Center (CUBIC) Blog: http://chem-bla-ics.blogspot.com/ GPG: 1024D/D6336BA6 From Sofia.Burvall at hgen.slu.se Mon Mar 26 08:11:13 2007 From: Sofia.Burvall at hgen.slu.se (Sofia Burvall) Date: Mon, 26 Mar 2007 14:11:13 +0200 Subject: [Biojava-l] biojava/biojavax In-Reply-To: <6e1d61f50703221953y1ea5a8e0j99e8b806782351e7@mail.gmail.com> References: <6e1d61f50703221953y1ea5a8e0j99e8b806782351e7@mail.gmail.com> Message-ID: Hi, I'm currently updating some code that require biojava. I am interested in having as many sequence formats as possible supported and I want it to be stable. I was wondering if there are any differences between biojava and biojavax in those aspects? Thank you /Sofia From kjaanson at gmail.com Mon Mar 26 14:41:50 2007 From: kjaanson at gmail.com (Kaur Jaanson) Date: Mon, 26 Mar 2007 21:41:50 +0300 Subject: [Biojava-l] RichSequence.IOTools Message-ID: <4f6282d40703261141y249381efl7d5da6317e3c49b9@mail.gmail.com> Hi. I tried to save Sequence with RichSequence.IOTools.writeGenbank. It saved the sequence data but not the features I just added. I was not so sure how to use the Namespace so I used null as its value. The SeqIOTools.writeGenbank saved features nicely. -- Kaur Jaanson From zagato.gekko at gmail.com Mon Mar 26 17:03:08 2007 From: zagato.gekko at gmail.com (Zagato) Date: Mon, 26 Mar 2007 16:03:08 -0500 Subject: [Biojava-l] RichSequence.IOTools In-Reply-To: <4f6282d40703261141y249381efl7d5da6317e3c49b9@mail.gmail.com> References: <4f6282d40703261141y249381efl7d5da6317e3c49b9@mail.gmail.com> Message-ID: <98028b00703261403j682cca9bpcf08e70710f538d3@mail.gmail.com> Hello, ?are you using the CVS version of BioJava?, in old days use the lastest version from CVS solved me a problem with features. 
For Namespace can you start using some like: SimpleNamespace bacterianamespace = new SimpleNamespace("Bacterias"); or Namespace dflNS = RichObjectFactory.getDefaultNamespace(); I'm really not an expert but i think that Namespace represents a real Namespace in a BioSQL database, it's like a group of genes, or genomes, etc. I Hope this helps you a few.. Bye.. Alan Jairo Acosta Cali - Colombia On 3/26/07, Kaur Jaanson wrote: > > Hi. > > I tried to save Sequence with RichSequence.IOTools.writeGenbank. It saved > the sequence data but not the features I just added. I was not so sure how > to use the Namespace so I used null as its value. The > SeqIOTools.writeGenbank saved features nicely. > > -- > Kaur Jaanson > _______________________________________________ > Biojava-l mailing list - Biojava-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biojava-l > -- Farewell. http://www.youtube.com/zagatogekko ruby << __EOF__ puts [ 111, 116, 97, 103, 97, 90 ].collect{|v| v.chr}.join.reverse __EOF__ From markjschreiber at gmail.com Mon Mar 26 21:08:44 2007 From: markjschreiber at gmail.com (Mark Schreiber) Date: Tue, 27 Mar 2007 09:08:44 +0800 Subject: [Biojava-l] biojava/biojavax In-Reply-To: References: <6e1d61f50703221953y1ea5a8e0j99e8b806782351e7@mail.gmail.com> Message-ID: <93b45ca50703261808s643c185ctc18cf6d9802b9bf8@mail.gmail.com> Hi - On the whole biojavax supports more formats and has a more detailed object model. The parsers are generally more reliable but sometimes a little strict about modifications to a format. We are currently trying to do some stress testing of format reading. In the most current CVS release I have added more informative exception statements that make it easier to debug parsing issues. - Mark On 3/26/07, Sofia Burvall wrote: > Hi, > > I'm currently updating some code that require biojava. I am > interested in having as many sequence formats as possible supported > and I want it to be stable. 
> I was wondering if there are any
> differences between biojava and biojavax in those aspects?
>
> Thank you
> /Sofia

From markjschreiber at gmail.com Mon Mar 26 21:09:56 2007
From: markjschreiber at gmail.com (Mark Schreiber)
Date: Tue, 27 Mar 2007 09:09:56 +0800
Subject: [Biojava-l] RichSequence.IOTools
In-Reply-To: <98028b00703261403j682cca9bpcf08e70710f538d3@mail.gmail.com>
References: <4f6282d40703261141y249381efl7d5da6317e3c49b9@mail.gmail.com> <98028b00703261403j682cca9bpcf08e70710f538d3@mail.gmail.com>
Message-ID: <93b45ca50703261809p55f2bfdeu46d1f66993b9634b@mail.gmail.com>

You can also use null for the namespace and you will get the default value.

- Mark

On 3/27/07, Zagato wrote:
> Hello, are you using the CVS version of BioJava? In the past, using the
> latest version from CVS solved a problem with features for me.
>
> For the Namespace you can start with something like:
>
> SimpleNamespace bacterianamespace = new SimpleNamespace("Bacterias");
> or
> Namespace dflNS = RichObjectFactory.getDefaultNamespace();
>
> I'm really not an expert, but I think that a Namespace represents a real
> namespace in a BioSQL database; it's like a group of genes, or genomes, etc.
>
> I hope this helps you a bit. Bye.
>
> Alan Jairo Acosta
> Cali - Colombia
>
> On 3/26/07, Kaur Jaanson wrote:
> >
> > Hi.
> >
> > I tried to save Sequence with RichSequence.IOTools.writeGenbank. It saved
> > the sequence data but not the features I just added. I was not so sure how
> > to use the Namespace so I used null as its value. The
> > SeqIOTools.writeGenbank saved features nicely.
> >
> > --
> > Kaur Jaanson
>
> --
> Farewell.
> http://www.youtube.com/zagatogekko
> ruby << __EOF__
> puts [ 111, 116, 97, 103, 97, 90 ].collect{|v| v.chr}.join.reverse
> __EOF__

From kjaanson at gmail.com Wed Mar 28 18:50:28 2007
From: kjaanson at gmail.com (Kaur Jaanson)
Date: Thu, 29 Mar 2007 01:50:28 +0300
Subject: [Biojava-l] RichSequence.IOTools
In-Reply-To: <4608DE49.9060203@ebi.ac.uk>
References: <4f6282d40703261141y249381efl7d5da6317e3c49b9@mail.gmail.com> <4608DE49.9060203@ebi.ac.uk>
Message-ID: <4f6282d40703281550l289b77a4k9163bc8fc896c070@mail.gmail.com>

I used the example from the cookbook to add the feature and created the
Sequence with DNATools.createDNASequence(..). I wanted to get the reverse
complement strand, find regions with a regexp, and mark them as features.

Code used for creating the reverse strand Sequence object:

    SymbolList rat_rev_list = DNATools.reverseComplement(rat_gene);
    Sequence rat_rev = DNATools.createDNASequence(rat_rev_list.seqString(), "rat_gene");

Code used for creating the feature:

    StrandedFeature.Template feat_temp = new StrandedFeature.Template();

    // fill in the template
    feat_temp.annotation = Annotation.EMPTY_ANNOTATION;
    feat_temp.location = new RangeLocation(matcher.start(), matcher.end() - 1);
    feat_temp.source = "myself_mapped";
    feat_temp.strand = StrandedFeature.POSITIVE;
    feat_temp.type = key;

    try {
        Feature feat = rat_rev.createFeature(feat_temp);
    } catch (Exception ex) {
        ex.printStackTrace();
        return;
    }

Thanks for clearing up the namespace for me :) I've got to look into it more.
Hopefully you can give me some leads for working with RichSequence &
RichFeature :)

Kaur Jaanson
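
The coordinate step described in the message above (regex match on the reverse strand, then a feature location) can be sketched in plain Java without BioJava on the classpath. This is only an illustration under stated assumptions: the class and method names (MotifCoordinates, reverseComplement, findMotif) are hypothetical and not part of BioJava, and the sketch assumes that BioJava locations are 1-based and inclusive. Under that assumption a java.util.regex match covering the 0-based half-open range [start, end) would map to start + 1 .. end, not start .. end - 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a BioJava class: plain-string stand-ins for the
// reverse-complement and regex-to-location steps.
public class MotifCoordinates {

    // Reverse complement of an unambiguous DNA string (a, c, g, t; any case).
    public static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            char c = Character.toLowerCase(dna.charAt(i));
            switch (c) {
                case 'a': sb.append('t'); break;
                case 't': sb.append('a'); break;
                case 'c': sb.append('g'); break;
                case 'g': sb.append('c'); break;
                default:
                    throw new IllegalArgumentException("Not a DNA base: " + c);
            }
        }
        return sb.toString();
    }

    // Returns {start, end} pairs for each non-empty regex match, converted
    // from Matcher's 0-based half-open range to 1-based inclusive coordinates
    // (the convention this sketch ASSUMES for a RangeLocation).
    public static List<int[]> findMotif(String seq, String regex) {
        List<int[]> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(seq);
        while (m.find()) {
            if (m.end() > m.start()) {          // skip zero-length matches
                hits.add(new int[] { m.start() + 1, m.end() });
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String rev = reverseComplement("aagcttcc");
        for (int[] hit : findMotif(rev, "aagctt")) {
            System.out.println(rev + ": " + hit[0] + ".." + hit[1]);
        }
    }
}
```

With coordinates in this form, each pair could be passed as the min and max of a location; whether that resolves the missing-feature output from RichSequence.IOTools.writeGenbank is a separate question that this sketch does not address.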