[Biojava-dev] SymbolPropertyTableIterator for AAindex files

Sun Sep 18 22:38:04 EDT 2005

Hi Martin,

This looks like interesting code. Unfortunately, the
SimpleSymbolPropertyTable class seems to be missing. I can't quite follow
what you would use this for. Your code suggests you are reading the
AAindex1 format. As far as I can tell, this appears to give a frequency(?)
of amino acid usage. Does your code represent this as a BioJava
Distribution (or have a way to convert it to one)? If it does, this would
be a fantastic way of reading in background amino acid distributions.
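
A minimal sketch of the kind of conversion meant here, in plain Java
rather than the BioJava Distribution API: it scales a table of per-residue
values so that they sum to 1. The class and method names are hypothetical,
the input values are made up, and the approach only makes sense for
entries whose values are non-negative (many AAindex1 entries are
physicochemical scales and can be negative).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of BioJava. */
final class BackgroundWeights {

    /** Scales non-negative per-residue values so that they sum to 1. */
    static Map<Character, Double> normalize(Map<Character, Double> values) {
        double total = 0.0;
        for (double v : values.values()) {
            if (v < 0.0) {
                throw new IllegalArgumentException("negative value cannot be a weight: " + v);
            }
            total += v;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("values must sum to a positive number");
        }
        Map<Character, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<Character, Double> e : values.entrySet()) {
            weights.put(e.getKey(), e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Made-up values for three residues, for illustration only.
        Map<Character, Double> raw = new LinkedHashMap<>();
        raw.put('A', 8.0);
        raw.put('G', 7.0);
        raw.put('W', 1.0);
        System.out.println(normalize(raw));
    }
}
```

The resulting weights could then be copied into a Distribution one symbol
at a time; that step is left out because it depends on the BioJava version
in use.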

Do you have any plans to read the AAindex2 format?

I think this would be a good addition to BioJava, but more details would
help.

I'm also quite interested in your link with BioWeka. Would you be
interested in providing an example for the "BioJava in Anger" pages?

Best regards,

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax  +65 6722 2910

"Martin Szugat" <Martin.Szugat at gmx.net>
Sent by: biojava-dev-bounces at portal.open-bio.org
09/03/2005 07:09 AM

        To:     <biojava-dev at biojava.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-dev] SymbolPropertyTableIterator for AAindex files

Hi!

I've implemented a stream reader for AAindex files (amino acid indices and
similarity matrices, http://www.genome.ad.jp/dbget/aaindex.html) called
AAindexStreamReader. It implements an interface called
SymbolPropertyTableIterator, which iterates over SymbolPropertyTable
objects. The iterator is BioJava-style and fully documented. The
AAindexStreamReader in fact returns AAindex objects. The AAindex class is
derived from SimpleSymbolPropertyTable and provides additional methods to
set and retrieve the information stored in an AAindex file (in the AAindex1
format), such as a hashtable of similar amino acid indices and their
correlation coefficients.
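
As a rough illustration of what reading the numeric part of an AAindex1
entry involves, here is a minimal sketch in plain Java. It is not the
attached AAindexStreamReader. It assumes the two value rows of an "I"
block list the residues in the order A R N D C Q E G H I and then
L K M F P S T W Y V, and that "NA" marks a missing value. The class and
method names are hypothetical and the numbers are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not the attached reader. */
final class AAindexValuesSketch {

    // Residue order assumed for the first and second value row of an "I" block.
    private static final String ROW1_RESIDUES = "ARNDCQEGHI";
    private static final String ROW2_RESIDUES = "LKMFPSTWYV";

    /** Maps the 2 x 10 values of an "I" block to one-letter residue codes. */
    static Map<Character, Double> parseValues(String row1, String row2) {
        Map<Character, Double> table = new LinkedHashMap<>();
        fill(table, ROW1_RESIDUES, row1);
        fill(table, ROW2_RESIDUES, row2);
        return table;
    }

    private static void fill(Map<Character, Double> table, String residues, String row) {
        String[] fields = row.trim().split("\\s+");
        if (fields.length != residues.length()) {
            throw new IllegalArgumentException(
                "expected " + residues.length() + " values but found " + fields.length);
        }
        for (int i = 0; i < fields.length; i++) {
            // A missing value ("NA") is stored as NaN in this sketch.
            double value = "NA".equals(fields[i]) ? Double.NaN : Double.parseDouble(fields[i]);
            table.put(residues.charAt(i), value);
        }
    }

    public static void main(String[] args) {
        // Made-up numbers, for illustration only.
        Map<Character, Double> table = parseValues(
            "1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0",
            "11.0 12.0 13.0 14.0 15.0 16.0 17.0 18.0 19.0 NA");
        System.out.println(table);
    }
}
```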

I hope you find these classes useful and integrate them into BioJava. If
you have further questions or if some changes are needed, don't hesitate to
contact me! I'd really like to see these classes in BioJava ;)

In addition there are a few more classes that might be useful, too. First
there is an interface called SymbolPropertyTableDB (in analogy to the
SequenceDB interface) and a simple implementation called
SimpleSymbolPropertyTableDB (what a long name!).

Finally, there is a class called ClassificationFastaDescriptionLineParser,
which extends SequenceBuilderFilter and extracts a classification value
(e.g. SCOP or CATH) from the description line of FASTA entries. The
classification value must be the second item in the description line,
after the name. The ClassificationFastaDescriptionLineParser should be used
in conjunction with the FastaDescriptionLineParser.
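
The "second item after the name" rule can be sketched in plain Java as
follows. This is not the attached ClassificationFastaDescriptionLineParser
and does not use the SequenceBuilderFilter API. It assumes items are
separated by whitespace; the class name, method name, and sample line are
for illustration only.

```java
/** Hypothetical sketch of the "second item" rule, not the attached parser. */
final class ClassificationTokenSketch {

    /**
     * Returns the second whitespace-separated item of a FASTA description
     * line, or null if the line has fewer than two items.
     */
    static String classificationOf(String descriptionLine) {
        String line = descriptionLine.startsWith(">")
            ? descriptionLine.substring(1)
            : descriptionLine;
        String[] items = line.trim().split("\\s+");
        return items.length >= 2 ? items[1] : null;
    }

    public static void main(String[] args) {
        // Illustrative SCOP-style description line.
        String line = ">d1dlwa_ a.1.1.1 (A:) Protozoan/bacterial hemoglobin";
        System.out.println(classificationOf(line));
    }
}
```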

I've implemented all these classes for an open source project called
BioWeka (http://www.bioweka.org). It's an extension to the Weka data mining
framework for bioinformaticians and biologists and, of course, it relies on
BioJava. In this sense, thanks for your fine work!

Martin


[ Attachment ''AAINDEXSTREAMREADER.JAVA'' removed by Mark Schreiber ]
[ Attachment ''CLASSIFICATIONFASTADESCRIPTIONLINEPARSER.JAVA'' removed by Mark Schreiber ]
[ Attachment ''AAINDEX.JAVA'' removed by Mark Schreiber ]
[ Attachment ''SIMPLESYMBOLPROPERTYTABLEDB.JAVA'' removed by Mark Schreiber ]
[ Attachment ''SYMBOLPROPERTYTABLEDB.JAVA'' removed by Mark Schreiber ]
[ Attachment ''SYMBOLPROPERTYTABLEITERATOR.JAVA'' removed by Mark Schreiber ]