[Bioperl-l] sanity check my understanding of bioperl's location terminology?
heikki at sanbi.ac.za
Sat Oct 18 06:47:52 UTC 2008
You've got it all right expect the last bit.
Bio::LocatableSeq::column_from_residue_number() is a special case because it's
input is in original sequence coordinates, of which the LocatableSeq in
question is part of.
Look at the tests. I just added a few on column_from_residue_number() that
show that if you take a revcom() on a LocatebleSeq the outcome of this method
remains the same! The reason is that within a same alignment, a revcomed
sequence is not a the same one any more. You can not put it back into the same
The following demonstrates it by taking two sequences that happen to have
almost identical in the wrong strand (I hope I did not mess this up by doing
it by hand):
seq1 --atg---gtaa- -1
seq2 --atg---ctat- 1
Maybe Ewan can be dragged from his bioperl retirement to point us to an old
document somewhere that explains all the logic behind the way strand is used
On Saturday 18 October 2008 00:03:42 George Hartzell wrote:
> Hi All,
> I'm not sure that I'm understanding Bioperl's location terminology and
> how it's carried through into some basic technology like the
> LocatableSeqs. Hopefully this is more about communicating in the
> shared bioperl language and I'm not just demonstrating how much
> biology I've forgotten. This is being driven by my gmap SearchIO
> parser (which hopefully will get committed at some point), which
> currently returns GenericHSPs in GenericHits and from which I can
> retrieve SimpleAlign's. I'm just not sure that I'm translating from
> gmap-speak to bioperl-speak correctly (assumptions, it's always
> There's the basic truism from e.g. Bio::Range:
> length = end - start + 1
> end >= start
> strand = (-1 | 0 | +1)
> So if I have seq_id => foo
> 5' AACTGTTTGG 3'
> 1 5 1
> -start => 3, -end => 6, -strand => +1 would be: CTGT
> -start => 4, -end => 4, -strand => +1 would be: T
> Things get goofier when strand is -1, but I'm pretty confident that
> one would say
> -start => 4, -end => 4, -strand => -1 would be: A
> (but I'm worried that I should say it's T, or the reverse compliment
> of T or something complicated)
You are taking the reverse compliment of T , that is A.
> and slightly less confident that one would say
> -start => 3, -end => 6, -strand => -1 would be: ACAG
> (in other words, always spelling the sequence out 5' to 3' from the
> reverse strand in that range).
> Where I really get shaky about how I understand how things are
> supposed to be said is when LocatableSeqs get involved.
> If I have a simple align that contains a row w/
> -seq_id => foo, -start => 3, -end => 6, -strand => 1
> then I think that the row might look like this in an alignment:
> moose CTCT
> foo CTGT
> bar C-GT
> and that if
> moose AGAG
> foo ACAG
> bar AC-G
> were the rows in the alignment then it's info would be
> -seq_id => foo, -start => 3, -end => 6, -strand => -1.
> Now here's where I really loose faith in my understanding. There's
> code in Bio::LocatableSeq::column_from_residue_number (around line
> 301) that tests the strand and
> if strand == -1 then it counts back from the -end coordinate towards
> the -start
> but if it's +1 then it counts up from 0 towards the -end.
and it adds one to the final result.
> That only makes sense if the sequence string that's contained in the
> locatable seq that's part of the simple alignment is actually the
> forward strand, in which case the alignment would *look* really goofy
> visually. If on the other hand the string in the alignment were the
> 5' to 3' writing of reverse strand (which would look like it aligns
> with the other strings in the alignment because the bases would match)
> then the counting seems messed up.
> If someone wants to round out my understanding and squash any bugs in
> it, I can try to use it to seed a Location HowTo or something.
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
______ _/ _/_____________________________________________________
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Senior Scientist skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
More information about the Bioperl-l