[BioRuby] Alignment plugin
Rutger Vos
rutgeraldo at gmail.com
Mon Apr 26 13:03:35 UTC 2010
Do you feel that these objects could/should also double as
character-state matrices?
On Mon, Apr 26, 2010 at 1:53 PM, Pjotr Prins <pjotr.public14 at thebird.nl> wrote:
> I am thinking of creating some new infrastructure for alignments.
>
> The BioRuby alignment architecture is not great. It contains a lot of useful
> functionality, but it is organized purely around sequences. I did a writeup
> on the BioRuby blog - on ALN support and colorized HTML - if you remember.
>
> For completeness I checked the BioJAVA and BioPython implementations.
>
> The BioJAVA alignment classes are in a deep tree:
>
> biojava/alignment/src/main/java/org/biojava/bio/alignment
>
> The implementation troubles me. Partly it is JAVA itself - which makes code
> feel dispersed. Partly it is the implementation, which appears to be minimal. I
> guess it is a work in progress.
>
> The BioPython version looks like it is the best of the three. Some
> separation of responsibilities. Good documentation, and good
> validation and testing. I like that. Otherwise, functionally it is
> mostly comparable to BioRuby.
>
> The trick of designing good alignment classes is to make them small and fork
> out responsibilities. The BioJAVA version does not contain much. The BioRuby
> version has everything in one place, including the kitchen sink. BioPython goes
> some way towards what it should be, but it does not look more
> extensible than what we have (and I don't want to use Python).
>
> It sucks. I don't feel like replicating all other code. At the same time I want
> something cleaner.
>
> The PAML output adds information for each column of an alignment.
> Besides, we deal with the translated alignment too. So PAML requires a
> dual alignment standard (NU+AA) with column-wise information (homology,
> evidence of positive selection). Add to that the phylogenetic tree. For
> my current work I am going to add column-wise and row-wise 'meta'
> information, which is used for output (both HTML and graphics).
>
> I guess the best option is to write two BioRuby plugins. One for the
> new alignment storage and one for PAML alignments, which will include
> meta-info and output functionality. Questions:
>
> * What is the way to store alignments - should gaps be represented as dashes?
> * Should we use a String format?
> * How do we handle multi-value fields (e.g. degenerates)?
> * How do we handle quality scores (sequencers)?
>
> I think the underlying storage format should not be String - as it allows
> toying with the data - say, by embedding HTML. Properties, like
> colors, should be added on top of the alignment structure, not within.
> We should also allow for (future) stronger type checking of
> nucleotides and amino acids.
>
> If we can convert easily to the standard BioRuby alignment, old
> functionality can be retained, though the conversion may not always
> feel natural.
>
> With Ruby a string type may be the most obvious choice (a list of
> lists of special nucleotide objects is probably overkill, though it
> should not be).
>
> Anyone interested in participating?
>
> With regard to plugins: for now I will merely create a separate
>
> pluginname/lib/bio/pluginname.rb
>
> and add that to the include path. That should be OK for now. It will
> allow adding it as a gem too.
>
> Pj.
> _______________________________________________
> BioRuby Project - http://www.bioruby.org/
> BioRuby mailing list
> BioRuby at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioruby
>
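
To make the storage questions in the quoted message a bit more concrete,
here is a minimal sketch of one possible layout. It assumes rows are held as
arrays of symbols rather than Strings, that dashes are only parsed and
emitted at the boundary, and that column-wise and row-wise 'meta'
information lives in separate hashes on top of the alignment. The class name
AlignmentSketch, the GAP constant, and all method names are hypothetical;
none of them are existing BioRuby API. Because cells are plain symbols, the
same container could in principle hold other character states as well.

```ruby
# Hypothetical sketch only: these names do not exist in BioRuby.
class AlignmentSketch
  GAP = :gap

  attr_reader :ids, :rows, :column_meta, :row_meta

  def initialize
    @ids = []
    @rows = []          # each row is an Array of Symbols, never a String
    @column_meta = {}   # column index => Hash of properties
    @row_meta = {}      # sequence id  => Hash of properties
  end

  # Parse a dash-gapped string once, at the boundary, into symbols.
  def add(id, text)
    row = text.each_char.map { |c| c == '-' ? GAP : c.upcase.to_sym }
    unless @rows.empty? || row.size == @rows.first.size
      raise ArgumentError,
            "row #{id} has length #{row.size}, expected #{@rows.first.size}"
    end
    @ids << id
    @rows << row
    self
  end

  # All states found in one alignment column, in row order.
  def column(index)
    @rows.map { |row| row[index] }
  end

  # Properties are layered on top of the alignment, not embedded in it.
  def annotate_column(index, properties)
    (@column_meta[index] ||= {}).merge!(properties)
  end

  def annotate_row(id, properties)
    (@row_meta[id] ||= {}).merge!(properties)
  end

  # Convert back to plain dash-gapped strings, e.g. to hand to older code.
  def to_strings
    @rows.map { |row| row.map { |s| s == GAP ? '-' : s.to_s }.join }
  end
end

aln = AlignmentSketch.new
aln.add('seq1', 'ATG-CA')
aln.add('seq2', 'ATGACA')
aln.annotate_column(3, positive_selection: true)
aln.annotate_row('seq1', color: 'red')
```

Keeping colors and selection flags in the side hashes means to_strings always
returns clean sequence text. Degenerate symbols and quality scores are not
handled here; they would need a richer cell type than a single symbol.
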
--
Dr. Rutger A. Vos
School of Biological Sciences
Philip Lyle Building, Level 4
University of Reading
Reading
RG6 6BX
United Kingdom
Tel: +44 (0) 118 378 7535
http://www.nexml.org
http://rutgervos.blogspot.com