[Biojava-l] Re: Biojava-l digest, Vol 1 #44 - 2 msgs
Matthew Pocock
mrp@sanger.ac.uk
Mon, 20 Mar 2000 11:38:34 +0000
Aaron,
For those who are interested, GappedResidueList stores a list of ungapped aligned blocks, giving the start-end coordinates in the underlying sequence, and the start-end in the gapped view (this should probably change to two starts and a single length). If a residueAt request is within an aligned block I just flip the coordinates from view to source and get the underlying residue. If it is between blocks then it is a gap, so I return the gap residue. The appropriate block (or gap) can be found efficiently using a binary search. Inserting and removing gaps usually just causes the view indices to be updated. Occasionally a delete joins two aligned blocks together, in which case they are merged. Sometimes a gap insertion breaks a block into two, so I create a new block and insert it into the blocks list.
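As a rough illustration of the block scheme described above, here is a minimal sketch. It is not the actual GappedResidueList code: the class name GappedView, the method names, the 0-based coordinates, the use of a String as the underlying sequence and '-' as the gap residue are all assumptions made for the example. Each block uses the "two starts and a single length" layout.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a block-based gapped view; not the BioJava implementation. */
final class GappedView {
    /** One ungapped aligned run: two starts and a single length (0-based). */
    private static final class Block {
        int sourceStart, viewStart, length;
        Block(int sourceStart, int viewStart, int length) {
            this.sourceStart = sourceStart;
            this.viewStart = viewStart;
            this.length = length;
        }
        int viewEnd() { return viewStart + length; } // exclusive
    }

    static final char GAP = '-';            // assumed gap residue
    private final String source;            // underlying ungapped sequence
    private final List<Block> blocks = new ArrayList<>();
    private int viewLength;

    GappedView(String source) {
        this.source = source;
        this.viewLength = source.length();
        if (!source.isEmpty()) blocks.add(new Block(0, 0, source.length()));
    }

    int length() { return viewLength; }
    int blockCount() { return blocks.size(); }

    /** Binary search: index of the last block with viewStart <= viewPos, or -1. */
    private int findBlock(int viewPos) {
        int lo = 0, hi = blocks.size() - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (blocks.get(mid).viewStart <= viewPos) { found = mid; lo = mid + 1; }
            else hi = mid - 1;
        }
        return found;
    }

    /** Inside a block: flip view to source coordinates. Between blocks: a gap. */
    char residueAt(int viewPos) {
        if (viewPos < 0 || viewPos >= viewLength) throw new IndexOutOfBoundsException();
        int i = findBlock(viewPos);
        if (i >= 0) {
            Block b = blocks.get(i);
            if (viewPos < b.viewEnd()) return source.charAt(b.sourceStart + viewPos - b.viewStart);
        }
        return GAP;
    }

    /** Inserts count gap columns before viewPos, splitting a block if needed. */
    void insertGaps(int viewPos, int count) {
        if (viewPos < 0 || viewPos > viewLength || count < 0) throw new IllegalArgumentException();
        if (count == 0) return;
        int i = findBlock(viewPos);
        int firstShifted = i + 1;
        if (i >= 0) {
            Block b = blocks.get(i);
            int offset = viewPos - b.viewStart;
            if (offset == 0) {
                firstShifted = i;           // gap lands in front of the whole block
            } else if (offset < b.length) { // gap lands inside the block: split it
                blocks.add(i + 1, new Block(b.sourceStart + offset, viewPos, b.length - offset));
                b.length = offset;
            }
        }
        for (int k = firstShifted; k < blocks.size(); k++) blocks.get(k).viewStart += count;
        viewLength += count;
    }

    /** Removes count gap columns starting at viewPos, merging blocks that now touch. */
    void removeGaps(int viewPos, int count) {
        if (viewPos < 0 || count < 0 || viewPos + count > viewLength) throw new IllegalArgumentException();
        if (count == 0) return;
        int i = findBlock(viewPos), next = i + 1;
        boolean hitsPrev = i >= 0 && viewPos < blocks.get(i).viewEnd();
        boolean hitsNext = next < blocks.size() && blocks.get(next).viewStart < viewPos + count;
        if (hitsPrev || hitsNext) throw new IllegalArgumentException("range contains residues");
        for (int k = next; k < blocks.size(); k++) blocks.get(k).viewStart -= count;
        viewLength -= count;
        if (i >= 0 && next < blocks.size()) {
            Block a = blocks.get(i), b = blocks.get(next);
            if (a.viewEnd() == b.viewStart && a.sourceStart + a.length == b.sourceStart) {
                a.length += b.length;       // the two blocks are now contiguous: merge
                blocks.remove(next);
            }
        }
    }

    String toGappedString() {
        StringBuilder sb = new StringBuilder(viewLength);
        for (int p = 0; p < viewLength; p++) sb.append(residueAt(p));
        return sb.toString();
    }
}
```

In this sketch a lookup costs one binary search over the blocks, while a gap edit walks the blocks to its right to shift their view starts, so its cost grows with the number of blocks rather than with the sequence length.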
The implementation is not exposed in the API, so if we can agree on the gap-edit operations, then there is no reason not to make GappedResidueList an interface. Java is pants at string manipulation (it can be extremely slow on some systems). Maybe we should have a format object that converts between the SeqStore string system and a Java object, in a class called something like org.biojava.bio.programs.GCG.formats?
Aaron, do you have read/write access? We can set you up if you don't.
Matthew
Aaron Kitzmiller wrote:
> I haven't seen this (org.biojava.bio.seq.GappedResidueList) on the JavaDocs yet, so this may be a moot point, but I'm curious about how you implemented this and if you've created a GappedResidueList interface. The reason I ask is that I've been investigating the use of GCG's SeqStore for the storage of a number of things, including alignments. Their implementation uses a gap vector that stores offsets and gap sizes in a single text line. If you've built this with an interface, I should be able to create a SeqStore-specific implementation that will work with the rest of the code.
>
> Aaron K.
>
> Aaron Kitzmiller
> Genetics Institute
> 35 Cambridge Park Dr.
> Cambridge, MA 02140
> Phone: (617) 665-6831
> Fax: (617) 665-8870
> akitzmiller@genetics.com
>