From Martin.Szugat at GMX.net Fri Sep 2 19:09:29 2005
From: Martin.Szugat at GMX.net (Martin Szugat)
Date: Fri Sep 9 14:34:22 2005
Subject: [Biojava-dev] SymbolPropertyTableIterator for AAindex files
Message-ID: <200509022258.j82MwkAG001525@portal.open-bio.org>

Hi!

I've implemented a stream reader for AAindex files (amino acid indices and
similarity matrices, http://www.genome.ad.jp/dbget/aaindex.html) called
AAindexStreamReader. It implements an interface called
SymbolPropertyTableIterator, which iterates over SymbolPropertyTable objects.
The iterator is BioJava-style and fully documented. The AAindexStreamReader
in fact returns AAindex objects, which are derived from
SimpleSymbolPropertyTable and provide additional methods to set and retrieve
information that is stored within an AAindex file (in the AAindex1 format),
such as a hashtable of similar amino acid indices and their correlation
coefficients.

I hope you find these classes useful and integrate them into BioJava. If you
have further questions or if some changes are needed, don't hesitate to
contact me! I'd really like to see these classes in BioJava ;)

In addition there are a few more classes that might be useful, too. First
there is an interface called SymbolPropertyTableDB (in analogy to the
SequenceDB interface) and a simple implementation called
SimpleSymbolPropertyTableDB (what a long name!).

Finally there is a class called ClassificationFastaDescriptionLineParser,
which extends SequenceBuilderFilter and extracts a classification value
(e.g. SCOP or CATH) from the description line of FASTA entries. The
classification must be the second item in the description line, after the
name. The ClassificationFastaDescriptionLineParser should be used in
conjunction with the FastaDescriptionLineParser.

I've implemented all these classes for an open source project called BioWeka
(http://www.bioweka.org)---it's an extension to the Weka data mining
framework for bioinformaticians and biologists. And of course, it relies on
BioJava.
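The description-line rule described above (the classification is the second
item, after the name) can be sketched in plain Java. This is only a minimal
illustration and not the attached ClassificationFastaDescriptionLineParser:
the class name FastaClassification and the method classificationOf are
hypothetical, no BioJava API is used, and the example header is made up.

```java
import java.util.Optional;

/** Hypothetical helper; not the attached ClassificationFastaDescriptionLineParser. */
final class FastaClassification {
    private FastaClassification() {}

    /**
     * Returns the second whitespace-separated item of a FASTA description
     * line, or an empty Optional if the line has fewer than two items.
     * A leading '>' is ignored if present.
     */
    static Optional<String> classificationOf(String descriptionLine) {
        String line = descriptionLine.trim();
        if (line.startsWith(">")) {
            line = line.substring(1).trim();
        }
        if (line.isEmpty()) {
            return Optional.empty();
        }
        String[] items = line.split("\\s+");
        return items.length >= 2 ? Optional.of(items[1]) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative header: name, then a SCOP-style classification, then free text.
        System.out.println(classificationOf(">seq1 a.1.1.1 example protein"));
    }
}
```

Returning an Optional is a design choice of this sketch; how the attached
class reports a missing classification is not stated in the message.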
In this sense, thanks for your fine work!

Martin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AAindexStreamReader.java
Type: text/java
Size: 8019 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050903/2fb3aabf/AAindexStreamReader-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ClassificationFastaDescriptionLineParser.java
Type: text/java
Size: 3844 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050903/2fb3aabf/ClassificationFastaDescriptionLineParser-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: AAindex.java
Type: text/java
Size: 6686 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050903/2fb3aabf/AAindex-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SimpleSymbolPropertyTableDB.java
Type: text/java
Size: 5018 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050903/2fb3aabf/SimpleSymbolPropertyTableDB-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SymbolPropertyTableDB.java
Type: text/java
Size: 2039 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050903/2fb3aabf/SymbolPropertyTableDB-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SymbolPropertyTableIterator.java
Type: text/java
Size: 1757 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050903/2fb3aabf/SymbolPropertyTableIterator-0001.bin

From Martin.Szugat at GMX.net Tue Sep 6 07:55:12 2005
From: Martin.Szugat at GMX.net (Martin Szugat)
Date: Fri Sep 9 14:34:24 2005
Subject: [Biojava-dev] ExternalProcess class
Message-ID: <200509061144.j86BiFAG016259@portal.open-bio.org>

Hi!

It's me again. I've implemented an ExternalProcess class. It encapsulates
the need to run multi-threaded input and output handlers when calling an
external process using Runtime.exec().

The STDERR and STDOUT outputs as well as the STDIN input must be
read/written in separate threads; otherwise the calling application may
hang. A problem occurs when running an external program many times, e.g.
running BLAST a thousand times. In this case three threads are created for
each iteration. Under Linux threads are implemented as processes, thus the
process "java" is started three times for each iteration. However, the
thread objects are not terminated by the garbage collector and thus the
threads/processes are not terminated. Even explicitly freeing the objects
does not work (I've tested this several times). This results in an
OutOfMemoryError after a few hundred iterations, because the number of
processes is limited under Linux.

I've solved this problem by using BioJava's SimpleThreadPool. Output reading
and input writing are handled by Runnable input/output handlers, e.g. using
a StreamPipe object it is possible to redirect the STDOUT of an external
process to the STDOUT of the calling process.

The usage of the ExternalProcess class is very simple, e.g. there are some
simple static methods that encapsulate the internal complexity. The class
also supports setting the environment variables or inheriting them from the
parent process. In addition the working directory can be set or inherited.
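The approach described above (drain STDOUT and STDERR on worker threads, and
reuse those threads across invocations) can be illustrated with the standard
library alone. The sketch below is not the attached ExternalProcess class and
does not use BioJava's SimpleThreadPool; it swaps in a fixed thread pool from
java.util.concurrent. The names PipeTask and PooledRunner are hypothetical,
and the pool size of two assumes processes are run one after another.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Copies one stream to another until end of input (a stand-in for the StreamPipe idea). */
final class PipeTask implements Runnable {
    private final InputStream in;
    private final OutputStream out;

    PipeTask(InputStream in, OutputStream out) {
        this.in = in;
        this.out = out;
    }

    @Override
    public void run() {
        byte[] buffer = new byte[4096];
        try {
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Runs external processes while reusing the same two worker threads. */
final class PooledRunner {
    // Shared pool: repeated calls do not create new threads per process.
    // Daemon threads so the pool never keeps the JVM alive on its own.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "pipe-worker");
        t.setDaemon(true);
        return t;
    });

    private PooledRunner() {}

    /** Starts the process, drains both output streams, and returns the exit code. */
    static int run(ProcessBuilder builder, OutputStream stdout, OutputStream stderr)
            throws IOException, InterruptedException, ExecutionException {
        Process process = builder.start();
        process.getOutputStream().close(); // this sketch sends nothing to STDIN
        Future<?> out = POOL.submit(new PipeTask(process.getInputStream(), stdout));
        Future<?> err = POOL.submit(new PipeTask(process.getErrorStream(), stderr));
        int exitCode = process.waitFor();
        out.get(); // wait until all output has been copied
        err.get();
        return exitCode;
    }
}
```

With the standard ProcessBuilder, environment variables are inherited by
default and can be changed through builder.environment(), and the working
directory is set with builder.directory(File); a call might look like
PooledRunner.run(new ProcessBuilder("blastall", "-p", "blastp"), System.out,
System.err), where the command line is only an example.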
Finally there is a special feature: one can define variables for the command
line arguments, e.g. "program -c %PARAMETER%"---%PARAMETER% is replaced by
the value from a Properties object with the key "PARAMETER".

The classes and interfaces are fully documented and there is a (repeated)
unit test for the ExternalProcess class. I've attached this test, the
ExternalProcess class as well as the various handler classes.

I hope you'll find these classes useful and integrate them into BioJava.

Best regards

Martin

            ____________________
           /   Martin Szugat    \
          / Author and Developer \
+--------------------------------+---------------+
|Phone: +49 (0)821 4206442       |Address:       |
|Fax:   +49 (0)821 4206443       |               |
|Mobil: +49 (0)179 7789714       |Zwerchgasse 6  |
|Email: Martin.Szugat@GMX.net    |86150 Augsburg |
|Web:   http://szugat.gmxhome.de |Germany        |
+--------------------------------+---------------+
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ReaderWriterPipe.java
Type: text/java
Size: 4012 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/ReaderWriterPipe-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: StreamPipe.java
Type: text/java
Size: 4158 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/StreamPipe-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ExternalProcess.java
Type: text/java
Size: 19109 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/ExternalProcess-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ExternalProcessTest.java
Type: text/java
Size: 4426 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/ExternalProcessTest-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: OutputHandler.java
Type: text/java
Size: 1674 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/OutputHandler-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: InputHandler.java
Type: text/java
Size: 1582 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/InputHandler-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SimpleOutputHandler.java
Type: text/java
Size: 1551 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/SimpleOutputHandler-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ReaderInputHandler.java
Type: text/java
Size: 2846 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/ReaderInputHandler-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SimpleInputHandler.java
Type: text/java
Size: 2250 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/SimpleInputHandler-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WriterOutputHandler.java
Type: text/java
Size: 2085 bytes
Desc: not available
Url : http://portal.open-bio.org/pipermail/biojava-dev/attachments/20050906/654b4746/WriterOutputHandler-0001.bin

From gregsnips at yahoo.com Mon Sep 12 22:32:57 2005
From: gregsnips at yahoo.com (GREG OMENKEUKWU)
Date: Mon Sep 12 22:28:38 2005
Subject: [Biojava-dev] Trouble compiling biojava demos
Message-ID: <20050913023257.10833.qmail@web30704.mail.mud.yahoo.com>

I am new to biojava and I am having a little problem with the sample
applications. I tried compiling the samples in the biojava demos but I keep
getting the following errors. I will appreciate any help I get on this
issue. Thanks.

C:\biojava-1.00\demos>javac seq\TestEmbl.java
seq\TestEmbl.java:18: cannot find symbol
symbol  : class EmblFormat
location: class seq.TestEmbl
    SequenceFormat eFormat = new EmblFormat();
                                 ^
seq\TestEmbl.java:21: cannot find symbol
symbol  : class SimpleSequenceFactory
location: class seq.TestEmbl
    SequenceFactory sFact = new SimpleSequenceFactory();
                                ^
seq\TestEmbl.java:23: cannot find symbol
symbol  : class SymbolParser
location: class seq.TestEmbl
    SymbolParser rParser = alpha.getParser("token");
    ^
seq\TestEmbl.java:23: cannot find symbol
symbol  : method getParser(java.lang.String)
location: interface org.biojava.bio.symbol.Alphabet
    SymbolParser rParser = alpha.getParser("token");
                                ^
seq\TestEmbl.java:25: internal error; cannot instantiate org.biojava.bio.seq.io.
StreamReader. at org.biojava.bio.seq.io.StreamReader to ()
        new StreamReader(eReader, eFormat, rParser, sFact);
        ^
5 errors

From td2 at sanger.ac.uk Tue Sep 13 03:36:44 2005
From: td2 at sanger.ac.uk (Thomas Down)
Date: Tue Sep 13 03:54:30 2005
Subject: [Biojava-dev] Trouble compiling biojava demos
In-Reply-To: <20050913023257.10833.qmail@web30704.mail.mud.yahoo.com>
References: <20050913023257.10833.qmail@web30704.mail.mud.yahoo.com>
Message-ID: <67F794B1-ADC4-4BDB-A683-959C23508937@sanger.ac.uk>

Hi.

Firstly, what version of BioJava are you testing here? From your command
line, it looks like you're using 1.00, which is now very old indeed. I'd
suggest upgrading to 1.4.

How are you trying to compile the demos? From the errors you're quoting, it
looks like the compiler couldn't find any of the main BioJava classes --
have you added biojava.jar to your CLASSPATH?

Thomas.

On 13 Sep 2005, at 03:32, GREG OMENKEUKWU wrote:

> I am new to biojava and I am having a little problem
> with the sample applications.
> I tried compiling the samples in the biojava demos but I
> keep getting the following errors. I will appreciate
> any help I get on this issue. Thanks.
> C:\biojava-1.00\demos>javac seq\TestEmbl.java
> seq\TestEmbl.java:18: cannot find symbol
> symbol : class EmblFormat
> location: class seq.TestEmbl
> SequenceFormat eFormat = new EmblFormat();
> ^
> seq\TestEmbl.java:21: cannot find symbol
> symbol : class SimpleSequenceFactory
> location: class seq.TestEmbl
> SequenceFactory sFact = new
> SimpleSequenceFactory();
> ^
> seq\TestEmbl.java:23: cannot find symbol
> symbol : class SymbolParser
> location: class seq.TestEmbl
> SymbolParser rParser = alpha.getParser("token");
> ^
> seq\TestEmbl.java:23: cannot find symbol
> symbol : method getParser(java.lang.String)
> location: interface org.biojava.bio.symbol.Alphabet
> SymbolParser rParser = alpha.getParser("token");
> ^
> seq\TestEmbl.java:25: internal error; cannot
> instantiate org.biojava.bio.seq.io.
> StreamReader.
> at org.biojava.bio.seq.io.StreamReader to ()
> new StreamReader(eReader, eFormat, rParser,
> sFact);
> ^
> 5 errors
>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev@biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
>

From mark.schreiber at novartis.com Sun Sep 18 22:38:04 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Sep 18 22:45:11 2005
Subject: [Biojava-dev] SymbolPropertyTableIterator for AAindex files
Message-ID:

Hi Martin,

This looks like interesting code. Unfortunately the
SimpleSymbolPropertyTable class seems to be missing. I can't quite follow
what you would use this for. Your code suggests you are reading AAIndex1
format. As far as I can tell this appears to give a frequency?? of amino
acid usage. Does your code represent this as a biojava Distribution (or
have a way to convert it to one)? If it does, this would be a fantastic way
of reading in background amino acid distributions.

Do you have any plans to read AAIndex2 format?

I think this would be good in biojava but more details would be good.

I'm also quite interested in your link with BioWeka. Would you be
interested in providing an example for the biojava in anger pages?

Best regards,

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From mark.schreiber at novartis.com Sun Sep 18 22:47:53 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Sep 18 22:55:18 2005
Subject: [Biojava-dev] ExternalProcess class
Message-ID:

Hi Martin -

More interesting stuff! BioJava already has two classes to launch external
processes, ExecRunner and ProcessTools. From memory one is better than the
other but I can't for the life of me remember which (can anyone help me
out?). We should deprecate the lesser of the two.

Potentially yours is better than these two. Can you check out these two
classes? If yours improves upon them then I'd be happy to check it in.

- Mark

From Martin.Szugat at GMX.net Mon Sep 19 05:59:12 2005
From: Martin.Szugat at GMX.net (Martin Szugat)
Date: Mon Sep 19 06:05:57 2005
Subject: [Biojava-dev] ExternalProcess class
In-Reply-To:
Message-ID: <200509191005.j8JA5mnq013118@portal.open-bio.org>

Hi Mark,

> More interesting stuff! BioJava already has two classes to launch external
> processes, ExecRunner and ProcessTools. From memory one is better than the
> other but I can't for the life of me remember which (can anyone help me
> out?). We should deprecate the lesser of the two.

That's interesting! I didn't find these classes because they are not
documented in the JavaDoc?!

>
> Potentially yours is better than these two. Can you check out these two
> classes? If yours improves upon them then I'd be happy to check it in.
I don't know if it's better, but it is different: you can provide your own
custom (threaded) input and output handlers. I use it for parsing BLAST
output and writing BLAST input on the fly.

I'll put together a new package and send it to you with the next mail.

Best regards

Martin
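The %PARAMETER% command-line variables from the ExternalProcess announcement
can be sketched with java.util.Properties and a regular expression. This is
an assumption-laden illustration, not the attached implementation: the class
name CommandTemplate and the method expand are hypothetical, variable names
are assumed to consist of letters, digits and underscores, and variables
without a matching key are assumed to be left unchanged.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that expands %NAME% variables in a command line. */
final class CommandTemplate {
    // Assumed variable syntax: %NAME%, with NAME made of letters, digits, underscores.
    private static final Pattern VARIABLE = Pattern.compile("%([A-Za-z0-9_]+)%");

    private CommandTemplate() {}

    /** Replaces each %KEY% with values.getProperty("KEY"), if that key exists. */
    static String expand(String template, Properties values) {
        Matcher matcher = VARIABLE.matcher(template);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String value = values.getProperty(matcher.group(1));
            // Assumption of this sketch: unknown variables stay as written.
            String replacement = (value != null) ? value : matcher.group();
            // quoteReplacement keeps '$' and '\' in values from being interpreted.
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        Properties values = new Properties();
        values.setProperty("PARAMETER", "blast.cfg");
        System.out.println(expand("program -c %PARAMETER%", values));
    }
}
```

Expanding the template before splitting it into arguments is the simplest
variant; a value containing spaces would then need quoting, which this
sketch does not handle.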
From Martin.Szugat at GMX.net Mon Sep 19 06:26:58 2005
From: Martin.Szugat at GMX.net (Martin Szugat)
Date: Mon Sep 19 06:27:02 2005
Subject: [Biojava-dev] SymbolPropertyTableIterator for AAindex files
In-Reply-To:
Message-ID: <200509191026.j8JAQrnq013434@portal.open-bio.org>

Hi Mark,

> This looks like interesting code. Unfortunately the
> SimpleSymbolPropertyTable class seems to be missing. I can't quite follow
> what you would use this for. Your code suggests you are reading AAIndex1
> format. As far as I can tell this appears to give a frequency?? of amino
> acid usage. Does your code represent this as a biojava Distribution (or
> have a way to convert it to one)? If it does this would be a fantastic way
> of reading in background amino acid distributions.

The AAindex database is a database of matrices which define different
properties for the twenty amino acids. There is, e.g., a matrix (also called
an index) for hydrophobicity, another index for polarity, etc. The
AAindexStreamReader class, which reads an AAindex1 file, is an iterator over
SymbolPropertyTable objects, i.e. each index is represented as a
SymbolPropertyTable object. Using such a SymbolPropertyTable you can analyze
an amino acid sequence and determine e.g. whether a protein is more
hydrophobic or more hydrophilic. This information can be used e.g. to
classify proteins as transmembrane or non-transmembrane proteins.

I hope it is now clearer what AAindex is about. I'll send you the code in a
few minutes.

> Do you have any plans to read AAIndex2 format?

No, not yet, because both files, AAindex1 and AAindex2, contain, as far as I
know, the same data.

> I think this would be good in biojava but more details would be good.
>
> I'm also quite interested in your link with BioWeka. Would you be
> interested in providing an example for the biojava in anger pages?
Yes, of course; however, BioWeka is not an extension to BioJava. It just
uses BioJava internally. But it might be interesting to show how to
implement e.g. converter classes for Weka on the basis of BioWeka and
BioJava, which can load sequence formats into Weka and write Weka ARFF files
back into a sequence file format. At the moment I'm working on my Bachelor
thesis about BioWeka, so there is not much time for it. But I'll put it on
my (huge) list!

Btw: A few months ago I wrote an article about BioJava for the German Java
Magazin:

http://www.java-magazin.de/itr/ausgaben/psecom,id,244,nodeid,20.html

If you (or the BioJava team) are interested in this (German) article, let me
know. I'll ask the editor if you can get a PDF for the BioJava web site.

Best regards

Martin
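The use of a property table discussed in this thread (one numeric value per
amino acid, applied to a whole sequence) can be sketched without BioJava.
The example below is an illustration under stated assumptions, not the
attached AAindex or SymbolPropertyTable code: the class HydropathySketch and
its method are hypothetical, and the values are the Kyte-Doolittle
hydropathy scale hard-coded as a map rather than read from an AAindex1 file.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: averages one amino acid property over a sequence. */
final class HydropathySketch {
    // Kyte-Doolittle hydropathy values, keyed by one-letter amino acid code.
    private static final Map<Character, Double> KYTE_DOOLITTLE = new HashMap<>();

    static {
        String residues = "ARNDCQEGHILKMFPSTWYV";
        double[] values = {
            1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
            3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2
        };
        for (int i = 0; i < residues.length(); i++) {
            KYTE_DOOLITTLE.put(residues.charAt(i), values[i]);
        }
    }

    private HydropathySketch() {}

    /** Mean hydropathy of a protein sequence; positive means more hydrophobic. */
    static double meanHydropathy(String sequence) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("empty sequence");
        }
        double sum = 0.0;
        for (char c : sequence.toUpperCase().toCharArray()) {
            Double value = KYTE_DOOLITTLE.get(c);
            if (value == null) {
                throw new IllegalArgumentException("unknown residue: " + c);
            }
            sum += value;
        }
        return sum / sequence.length();
    }

    public static void main(String[] args) {
        // Arbitrary short peptides, chosen only to show the sign of the result.
        System.out.println(meanHydropathy("ILVAF") > 0);
        System.out.println(meanHydropathy("KDERN") < 0);
    }
}
```

A real classifier for transmembrane proteins would typically look at
averages over sliding windows rather than over the whole sequence; the
whole-sequence mean is used here only to keep the sketch short.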
From mark.schreiber at novartis.com Mon Sep 19 20:40:58 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Sep 19 20:41:07 2005
Subject: [Biojava-dev] ExternalProcess class
Message-ID:

Hi Martin -

They are documented in the 1.4 API
(http://www.biojava.org/docs/api14/index.html), but not the 1.3 API.

Looking at the code further, the ExecRunner appears to be the better of
those two. Your I/O pattern may be more useful though.

- Mark

"Martin Szugat"
09/19/2005 05:59 PM

To: Mark Schreiber/GP/Novartis@PH
cc:
Subject: RE: [Biojava-dev] ExternalProcess class

Hi Mark,

> More interesting stuff! BioJava already has two classes to launch external
> processes, ExecRunner and ProcessTools. From memory one is better than the
> other but I can't for the life of me remember which (can anyone help me
> out?). We should deprecate the lesser of the two.

That's interesting! I didn't find these classes because they are not
documented in the JavaDoc?!

>
> Potentially yours is better than these two. Can you check out these two
> classes, if yours improves upon them then I'd be happy to check it in.

I don't know if it's better but different: You can provide your custom
(threaded) input and output handlers. I use it for parsing and writing BLAST
output/input on the fly.

I'll make a new package and send it to you with the next mail.

Best regards

Martin

>
> - Mark
>
>
> "Martin Szugat"
> Sent by: biojava-dev-bounces@portal.open-bio.org
> 09/06/2005 07:55 PM
>
> To:
> cc: (bcc: Mark Schreiber/GP/Novartis)
> Subject: [Biojava-dev] ExternalProcess class
>
>
> Hi!
>
> It's me again. I've implemented an ExternalProcess class. It encapsulates
> the need to run multi-threaded input and output handlers when calling an
> external process using Runtime.exec().
>
> The STDERR and STDOUT outputs as well as the STDIN input must be
> read/written in separate threads, otherwise the calling application may
> hang. A problem occurs when running an external program multiple times,
> e.g. running BLAST a thousand times. In this case three threads are
> created for each iteration. Under Linux threads are implemented as
> processes, thus the process "java" is started three times for each
> iteration. However, the thread objects are not reclaimed by the garbage
> collector and thus the threads/processes are not terminated. Even
> explicitly freeing the objects does not work (I've tested this several
> times). This results in an OutOfMemoryError after a few hundred
> iterations, because the number of processes is limited under Linux.
>
> I've solved this problem by using BioJava's SimpleThreadPool. Output
> reading and input writing are handled by Runnable input/output handlers,
> e.g. using a StreamPipe object it is possible to redirect the STDOUT of an
> external process to the STDOUT of the calling process.
>
> The usage of the ExternalProcess class is very simple, e.g. there are some
> simple static methods that encapsulate the internal complexity. The class
> also supports setting the environment variables or inheriting them from
> the parent process. In addition the working directory can be set or
> inherited. Finally there is a special feature: one can define variables
> for the command line arguments, e.g.
"program -c %PARAMETER%"---%PARAMETER% is > replaced by the value from a Properties object with the key "PARAMETER". > > The classes and interfaces are fully documentated and there is a > (repeated) > Unit-Test for the ExternalProcess class. I've attached this test, the > ExternalProcess class as well as the various handler classes. > > I hope you'll find these classes useful and integrate them into BioJava. > > Best regards > > Martin > > ____________________ > / Martin Szugat \ > / Author and Developer \ > +--------------------------------+---------------+ > |Phone: +49 (0)821 4206442 |Address: | > |Fax: +49 (0)821 4206443 | | > |Mobil: +49 (0)179 7789714 |Zwerchgasse 6 | > |Email: Martin.Szugat@GMX.net |86150 Augsburg | > |Web: http://szugat.gmxhome.de |Germany | > +--------------------------------+---------------+ > > > > > > > > > > > > > _______________________________________________ > biojava-dev mailing list > biojava-dev@biojava.org > http://biojava.org/mailman/listinfo/biojava-dev > > [ Attachment ''READERWRITERPIPE.JAVA'' removed by Mark Schreiber ] > [ Attachment ''STREAMPIPE.JAVA'' removed by Mark Schreiber ] > [ Attachment ''EXTERNALPROCESS.JAVA'' removed by Mark Schreiber ] > [ Attachment ''EXTERNALPROCESSTEST.JAVA'' removed by Mark Schreiber ] > [ Attachment ''OUTPUTHANDLER.JAVA'' removed by Mark Schreiber ] > [ Attachment ''INPUTHANDLER.JAVA'' removed by Mark Schreiber ] > [ Attachment ''SIMPLEOUTPUTHANDLER.JAVA'' removed by Mark Schreiber ] > [ Attachment ''READERINPUTHANDLER.JAVA'' removed by Mark Schreiber ] > [ Attachment ''SIMPLEINPUTHANDLER.JAVA'' removed by Mark Schreiber ] > [ Attachment ''WRITEROUTPUTHANDLER.JAVA'' removed by Mark Schreiber ] > > > _______________________________________________ > biojava-dev mailing list > biojava-dev@biojava.org > http://biojava.org/mailman/listinfo/biojava-dev From mark.schreiber at novartis.com Mon Sep 19 20:47:10 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Sep 19 
20:47:17 2005
Subject: [Biojava-dev] SymbolPropertyTableIterator for AAindex files
Message-ID:

"Martin Szugat"
09/19/2005 06:26 PM

To: Mark Schreiber/GP/Novartis@PH
cc:
Subject: RE: [Biojava-dev] SymbolPropertyTableIterator for AAindex files

>> Do you have any plans to read AAindex2 format?

>No, not yet, because both files, AAindex1 and AAindex2, contain, as far as
>I know, the same data.

I think the AAindex2 files are substitution matrices (e.g. PAM250 etc.).
These would also be good.

>Btw: I've written (a few months ago) an article about BioJava for the
>German Java Magazin:
>
>http://www.java-magazin.de/itr/ausgaben/psecom,id,244,nodeid,20.html
>
>If you (or the BioJava team) are interested in this (German) article let
>me know. I'll ask the editor if you can get a PDF for the BioJava web site.

It would be good to put this on the site (or link to it). An English
translation would also be cool but not essential.

- Mark