[Biojava-dev] SymbolPropertyTableIterator for AAindex files

Mon Sep 19 06:26:58 EDT 2005

Hi Mark,

> This looks like interesting code. Unfortunately the
> SimpleSymbolPropertyTable class seems to be missing. I can't quite follow
> what you would use this for. Your code suggests you are reading AAIndex1
> format. As far as I can tell this appears to give a frequency?? of amino
> acid usage. Does your code represent this as a biojava Distribution (or
> have a way to convert it to one)? If it does this would be a fantastic way
> of reading in background amino acid distributions.

The AAindex database is a collection of indices, each of which assigns a
numerical value for one property to each of the twenty amino acids. There is,
for example, one index for hydrophobicity, another for polarity, etc. The
AAindexStreamReader class, which reads an AAindex1 file, is an iterator over
SymbolPropertyTable objects, i.e. each index is represented as one
SymbolPropertyTable object. Using such a SymbolPropertyTable you can analyze
an amino acid sequence and determine, for example, whether a protein is more
hydrophobic or more hydrophilic. This information can be used, for example,
to classify proteins as transmembrane or non-transmembrane proteins.
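
To make the idea concrete, here is a minimal sketch of that kind of analysis.
It deliberately does not use the BioJava API: a plain Map stands in for a
SymbolPropertyTable, the class and method names are made up for this example,
and the values are the Kyte-Doolittle hydropathy scale. The two sequences in
main() are invented fragments, not real proteins.

```java
import java.util.HashMap;
import java.util.Map;

class HydropathySketch {

    // Stand-in for a SymbolPropertyTable: one-letter residue code -> value.
    // Values: Kyte-Doolittle hydropathy scale (positive = more hydrophobic).
    static final Map<Character, Double> HYDROPATHY = new HashMap<>();
    static {
        String residues = "ARNDCQEGHILKMFPSTWYV";
        double[] values = {
            1.8, -4.5, -3.5, -3.5, 2.5, -3.5, -3.5, -0.4, -3.2, 4.5,
            3.8, -3.9, 1.9, 2.8, -1.6, -0.8, -0.7, -0.9, -1.3, 4.2
        };
        for (int i = 0; i < values.length; i++) {
            HYDROPATHY.put(residues.charAt(i), values[i]);
        }
    }

    /** Average property value over all residues of the sequence. */
    static double meanProperty(String sequence, Map<Character, Double> table) {
        if (sequence.isEmpty()) {
            throw new IllegalArgumentException("Empty sequence");
        }
        double sum = 0.0;
        for (char c : sequence.toUpperCase().toCharArray()) {
            Double value = table.get(c);
            if (value == null) {
                throw new IllegalArgumentException("No value for residue: " + c);
            }
            sum += value;
        }
        return sum / sequence.length();
    }

    public static void main(String[] args) {
        // Invented example fragments, for illustration only.
        String mostlyApolar = "LLVIAAGLLVIFAL";
        String mostlyCharged = "KDERSNQEKDRS";
        System.out.println(meanProperty(mostlyApolar, HYDROPATHY));
        System.out.println(meanProperty(mostlyCharged, HYDROPATHY));
    }
}
```

On this scale a positive average points to a predominantly hydrophobic
sequence and a negative one to a predominantly hydrophilic sequence. A real
classifier would use more than a single average, e.g. a sliding window.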

I hope it is now clearer what AAindex is about. I'll send you the code in
a few minutes.
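
As a rough illustration of the reader side, the sketch below parses records
in the AAindex1 flat-file layout: an "H" line with the accession, a "D" line
with the description, an "I" header followed by two rows of ten values, and
"//" as the record terminator. This is not the AAindexStreamReader code; the
class, the Entry type and all method names are hypothetical, the sample record
is trimmed to the H, D and I fields, and the remaining fields (references,
correlations) are simply ignored.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AAindex1Sketch {

    /** Hypothetical value holder for one index. */
    static final class Entry {
        String accession = "";
        String description = "";
        final Map<Character, Double> values = new LinkedHashMap<>();
    }

    // Trimmed sample record (Kyte-Doolittle hydropathy index).
    static final String SAMPLE = String.join("\n",
        "H KYTJ820101",
        "D Hydropathy index (Kyte-Doolittle, 1982)",
        "I    A/L     R/K     N/M     D/F     C/P     Q/S     E/T     G/W     H/Y     I/V",
        "     1.8    -4.5    -3.5    -3.5     2.5    -3.5    -3.5    -0.4    -3.2     4.5",
        "     3.8    -3.9     1.9     2.8    -1.6    -0.8    -0.7    -0.9    -1.3     4.2",
        "//");

    static List<Entry> parse(BufferedReader in) throws IOException {
        List<Entry> entries = new ArrayList<>();
        Entry current = new Entry();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("//")) {
                // End of record; a record without "//" is dropped in this sketch.
                entries.add(current);
                current = new Entry();
            } else if (line.startsWith("H ")) {
                current.accession = line.substring(2).trim();
            } else if (line.startsWith("D ")) {
                current.description = line.substring(2).trim();
            } else if (line.startsWith("I ")) {
                // Header cells look like "A/L": first row is A, second row is L.
                String[] header = line.substring(2).trim().split("\\s+");
                for (int row = 0; row < 2; row++) {
                    String dataLine = in.readLine();
                    if (dataLine == null) {
                        throw new IOException("Truncated I block");
                    }
                    String[] cells = dataLine.trim().split("\\s+");
                    if (cells.length != header.length) {
                        throw new IOException("Unexpected column count: " + dataLine);
                    }
                    for (int col = 0; col < header.length; col++) {
                        char residue = header[col].charAt(row * 2);
                        // Missing values are written as "NA".
                        double value = cells[col].equals("NA")
                            ? Double.NaN : Double.parseDouble(cells[col]);
                        current.values.put(residue, value);
                    }
                }
            }
            // All other field types and continuation lines are skipped.
        }
        return entries;
    }

    /** Convenience wrapper for in-memory text. */
    static List<Entry> parseString(String text) {
        try {
            return parse(new BufferedReader(new StringReader(text)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (Entry e : parseString(SAMPLE)) {
            System.out.println(e.accession + ": " + e.description);
            System.out.println("L -> " + e.values.get('L'));
        }
    }
}
```

The sketch collects all records into a list for brevity. An iterator that
reads one record per call, as described above, avoids keeping the whole file
in memory.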

> Do you have any plans to read AAIndex2 format?

No, not yet, because as far as I know both files, AAindex1 and AAindex2,
contain the same data.

> I think this would be good in biojava but more details would be good.
> 
> I'm also quite interested in your link with BioWeka. Would you be
> interested in providing an example for the biojava in anger pages?

Yes, of course. However, BioWeka is not an extension to BioJava; it just uses
BioJava internally. But it might be interesting to show how to implement,
for example, converter classes for Weka on the basis of BioWeka and BioJava
that load sequence formats into Weka and write Weka ARFF files back into a
sequence file format. At the moment I'm working on my Bachelor thesis about
BioWeka, so there is not much time for it. But I have put it on my (huge) list!

Btw: a few months ago I wrote an article about BioJava for the German
Java Magazin:

http://www.java-magazin.de/itr/ausgaben/psecom,id,244,nodeid,20.html

If you (or the BioJava team) are interested in this (German) article, let me
know. I'll ask the editor if you can get a PDF for the BioJava web site.

Best regards

Martin

> 
> Best regards,
> 
> - Mark
> 
> Mark Schreiber
> Principal Scientist (Bioinformatics)
> 
> Novartis Institute for Tropical Diseases (NITD)
> 10 Biopolis Road
> #05-01 Chromos
> Singapore 138670
> www.nitd.novartis.com
> 
> phone +65 6722 2973
> fax  +65 6722 2910
> 
> 
> "Martin Szugat" <Martin.Szugat at gmx.net>
> Sent by: biojava-dev-bounces at portal.open-bio.org
> 09/03/2005 07:09 AM
> 
> 
>         To:     <biojava-dev at biojava.org>
>         cc:     (bcc: Mark Schreiber/GP/Novartis)
>         Subject:        [Biojava-dev] SymbolPropertyTableIterator for
> AAindex files
> 
> 
> Hi!
> 
> I've implemented a stream reader for AAindex files (Amino acid indices and
> similarity matrices, http://www.genome.ad.jp/dbget/aaindex.html) called
> AAindexStreamReader. It implements an interface called
> SymbolPropertyTableIterator which iterates over SymbolPropertyTable
> objects.
> The iterator is BioJava-style and fully documented. The
> AAindexStreamReader in fact returns AAindex objects; AAindex is derived from
> SimpleSymbolPropertyTable and provides additional methods to set and
> retrieve information that is stored within an AAindex file (in the AAindex1
> format), such as a hashtable of similar amino acid indices and their
> correlation coefficients.
> 
> I hope you find these classes useful and integrate them into BioJava. If
> you have further questions or if some changes are needed, don't hesitate to
> contact me! I'd really like to see these classes in BioJava ;)
> 
> In addition there are a few more classes that might be useful, too. First
> there is an interface called SymbolPropertyTableDB (in analogy to the
> SequenceDB interface) and a simple implementation called
> SimpleSymbolPropertyTableDB (what a long name!).
> 
> Finally there is a class called ClassificationFastaDescriptionLineParser
> which extends SequenceBuilderFilter and extracts a classification value
> (e.g. SCOP or CATH) from the description line of FASTA entries. The
> classification value must be the second item in the description line, after
> the name. The ClassificationFastaDescriptionLineParser should be used in
> conjunction with the FastaDescriptionLineParser.
> 
> I've implemented all these classes for an open source project called
> BioWeka (http://www.bioweka.org). It's an extension to the Weka data mining
> framework for bioinformaticians and biologists. And of course, it relies on
> BioJava. In this sense, thanks for your fine work!
> 
> Martin
> 
> 
> [ Attachment ''AAINDEXSTREAMREADER.JAVA'' removed by Mark Schreiber ]
> [ Attachment ''CLASSIFICATIONFASTADESCRIPTIONLINEPARSER.JAVA'' removed by
> Mark Schreiber ]
> [ Attachment ''AAINDEX.JAVA'' removed by Mark Schreiber ]
> [ Attachment ''SIMPLESYMBOLPROPERTYTABLEDB.JAVA'' removed by Mark
> Schreiber ]
> [ Attachment ''SYMBOLPROPERTYTABLEDB.JAVA'' removed by Mark Schreiber ]
> [ Attachment ''SYMBOLPROPERTYTABLEITERATOR.JAVA'' removed by Mark
> Schreiber ]