[Biojava-l] Getting a Slice of an Alignment
Richard Holland
richard.holland at ebi.ac.uk
Tue Jul 4 10:15:50 UTC 2006
Right... I didn't hear any objections, but I won't make the change after
all. Here's why:
I read the code again this morning, and realised that the vertical slice
effect can be achieved using existing methods in the alignment classes.
Given an alignment:
Alignment algn = .....;
And a sub-alignment:
Alignment sub = algn.subAlignment(null, new RangeLocation(5, 10)); // Location is an interface; RangeLocation is concrete
The sequences in the sub-alignment can be found like this:
Collection labels = sub.getLabels();
For any given label in this collection, the symbol list for the full
sequence can be obtained like this:
SymbolList symbols = sub.symbolListForLabel(label);
That symbol list will include gaps in appropriate places. To find out
the offset of that symbol list within the alignment, you can do this:
Location offset = sub.locInAlignment(label);
This returns a location whose min and max give the start and end of
that sequence relative to the beginning of the alignment.
To illustrate, suppose our alignment 'algn' above contains sequence X:
SymbolList symbols = algn.symbolListForLabel("X");
// sub.symbolListForLabel("X") would return the same symbol list.
Location algnOffset = algn.locInAlignment("X");
Location subOffset = sub.locInAlignment("X");
If the sub alignment is over positions 5..10 in the main alignment, and
sequence X is 20 bases long and begins 5 bases before the start of the
main alignment, then min and max for 'algnOffset' above will equal
-5..15, and for 'subOffset' will equal -10..10.
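The relationship between the two offsets can be checked with a few lines of plain Java. This is a sketch, not BioJava code: Offset and relativeTo() are hypothetical names, and treating the shift as a plain subtraction of the sub alignment's start column is an assumption, chosen because it reproduces the figures above.

```java
public class OffsetDemo {
    /** Hypothetical stand-in for the min/max pair that locInAlignment() reports. */
    static final class Offset {
        final int min;
        final int max;

        Offset(int min, int max) {
            this.min = min;
            this.max = max;
        }

        /** Re-expresses this offset relative to a sub alignment that starts at
         *  column subStart of the parent (assumed: a plain subtraction). */
        Offset relativeTo(int subStart) {
            return new Offset(min - subStart, max - subStart);
        }

        @Override
        public String toString() {
            return min + ".." + max;
        }
    }

    public static void main(String[] args) {
        Offset algnOffset = new Offset(-5, 15);       // sequence X within 'algn'
        Offset subOffset = algnOffset.relativeTo(5);  // sub alignment over 5..10
        System.out.println(algnOffset + " -> " + subOffset); // prints -5..15 -> -10..10
    }
}
```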
You can then use the min and max of 'subOffset' to obtain the chunk of
the sequence that actually occurs within the sub alignment:
int symStart = -subOffset.getMin();
int symEnd = symStart + subOffset.getMax();
If symStart is <=0, you need to change it to 1 and pad the result with a
number of leading gaps equivalent to the negative value.
If symStart is beyond the end of the symbol list, or symEnd is <=0, the
sequence does not appear in the sub alignment at all.
If symEnd is beyond the end of the symbol list, you need to pad the
result with trailing gaps equivalent to the difference.
You can then pass these values to the symbol list's subList() method to
get the actual symbols that occur in the sub alignment:
SymbolList subSymbols = symbols.subList(symStart, symEnd);
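The clamping and padding rules above can be modelled in plain Java, with a gapped sequence held as a String and '-' as the gap character instead of a BioJava SymbolList. This is only a sketch: WindowSlice and slice() are hypothetical names, the window is assumed to be given in 1-based, inclusive sequence coordinates running from the sub alignment's first column to its last, and every window position below 1 counts as one leading gap (so a start of 0 yields one gap).

```java
public final class WindowSlice {
    private WindowSlice() {}

    /**
     * Returns the part of seq covered by the 1-based inclusive window
     * [start, end], padding with '-' where the window runs off either end.
     * Returns "" if the window does not overlap the sequence at all.
     */
    public static String slice(String seq, int start, int end) {
        int len = seq.length();
        if (start > len || end <= 0) {
            return ""; // the sequence does not appear in this window
        }
        StringBuilder out = new StringBuilder();
        // Leading gaps: one for each window position before sequence position 1.
        for (int i = start; i < 1; i++) {
            out.append('-');
        }
        int from = Math.max(start, 1);
        int to = Math.min(end, len);
        // append(CharSequence, int, int) is 0-based and end-exclusive,
        // so 1-based inclusive [from, to] becomes [from - 1, to).
        out.append(seq, from - 1, to);
        // Trailing gaps: one for each window position past the last symbol.
        for (int i = len; i < end; i++) {
            out.append('-');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(slice("ACGTACGT", -1, 3)); // prints --ACG
    }
}
```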
Doing this for each label returned by getLabels() in the sub alignment
will give you the vertical slice you're looking for.
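Repeated for every label, the slicing gives the whole vertical slice. The sketch below models that loop in plain Java rather than BioJava: Row and slice() are hypothetical names, each row is assumed to be a gapped String plus the 1-based alignment column of its first symbol, and it walks the requested columns one at a time instead of computing symStart and symEnd, emitting '-' wherever a row has no symbol. Rows that do not overlap the columns are left out, mirroring the "does not appear" case above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class VerticalSlice {
    /** Hypothetical row: a gapped sequence whose first symbol sits at alignment column 'start'. */
    static final class Row {
        final int start;       // 1-based alignment column of the first symbol
        final String symbols;  // gapped sequence, '-' for gaps

        Row(int start, String symbols) {
            this.start = start;
            this.symbols = symbols;
        }
    }

    /** Collects columns colStart..colEnd (1-based, inclusive) of every row that overlaps them. */
    static Map<String, String> slice(Map<String, Row> rows, int colStart, int colEnd) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, Row> e : rows.entrySet()) {
            Row row = e.getValue();
            int rowEnd = row.start + row.symbols.length() - 1;
            if (rowEnd < colStart || row.start > colEnd) {
                continue; // this sequence does not appear in the slice
            }
            StringBuilder sb = new StringBuilder();
            for (int col = colStart; col <= colEnd; col++) {
                int pos = col - row.start; // 0-based index into the row
                boolean inside = pos >= 0 && pos < row.symbols.length();
                sb.append(inside ? row.symbols.charAt(pos) : '-');
            }
            result.put(e.getKey(), sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Row> rows = new LinkedHashMap<>();
        rows.put("X", new Row(-4, "ACGTACGTACGTACGTACGT")); // 20 symbols, starts before column 1
        rows.put("Y", new Row(3, "GG-A"));
        rows.put("Z", new Row(20, "TTTT"));                 // lies entirely outside columns 5..10
        System.out.println(slice(rows, 5, 10)); // prints {X=CGTACG, Y=-A----}
    }
}
```

A LinkedHashMap keeps the rows in their original label order, which is convenient when the slice is written back out as an alignment.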
cheers,
Richard
On Wed, 2006-06-28 at 09:57 +0100, Richard Holland wrote:
> Dear list... if I haven't heard any arguments to the contrary by 9am
> Monday 3rd July (UK time), I'll make the changes described below.
>
> cheers,
> Richard
>
> On Tue, 2006-06-27 at 12:57 -0700, Dexter Riley wrote:
> >
> > Richard Holland-2 wrote:
> > >
> > > Ah...
> > >
> > > I just read the source code for the symbolListForLabel() method on sub
> > > alignments, and found what may well be a bug.
> > >
> > > BioJava list people, your help please! In my understanding,
> > > symbolListForLabel() should return the symbols from the given label that
> > > fall within the alignment. This is the case in all except sub
> > > alignments. Sub alignments, for whatever reason, are returning the
> > > symbols from the given label that fall within the parent alignment upon
> > > which the sub alignment is based, NOT just those that fall within the
> > > sub alignment itself.
> > >
> > > Is this a bug? I think it is.
> > >
> > > The solution would be for me to alter
> > > AbstractULAlignment.SubULAlignment.symbolListForLabel() to restrict the
> > > returned symbols to only include those in the area covered by the sub
> > > alignment. It would return EMPTY_SEQUENCE if the label didn't cover the
> > > area of the sub alignment, and it would return a truncated symbol list
> > > if it only partially covered it.
> > >
> > > Would this be acceptable?
> > >
> > > If so, once this change was made, it would fix Ed's problems below as
> > > subAlignment() would start returning vertical slices, as I think it
> > > probably should have done from the start, rather than the horizontal
> > > slices it returns at present.
> > >
> > > cheers,
> > > Richard
> > >
> >
> > I think that would provide just the functionality I was looking for! Thanks
> > very much for all your help.
> > All the best,
> > Ed
--
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416