[Biopython-dev] Start/end co-ordinates in SeqRecord objects?

Tue May 22 10:44:37 UTC 2012

On Tue, May 22, 2012 at 10:44 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, May 22, 2012 at 2:25 AM, Brad Chapman <chapmanb at 50mail.com> wrote:
>>> Thoughts or feedback please? Would a worked example
>>> help with my explanation?
>>
>> A worked example might help: not totally sure I grasp all the
>> subtleties,
>> Brad
>
> OK. This will work best in a mono-spaced font. This was
> picked out of one of our unit tests, bt009.txt - I just looked
> for a BLAST pairwise alignment with some gaps:
>
> [BLASTP example]

On reflection, a translated BLAST search would have been more
interesting - then you've got at least another layer of co-ordinate
transformations to worry about. e.g. for TBLASTX,

query nucleotide <-> query protein <-> gapped protein <->
matched protein <-> matched nucleotide.
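
To illustrate just one layer of that chain (gapped protein <-> ungapped
protein), here is a minimal sketch in plain Python. The function name and
the gapped string are made up for illustration; this is not existing
Biopython API, and it assumes "-" is the gap character.

```python
def column_to_residue(gapped_seq, column, gap_char="-"):
    """Map a 0-based alignment column to a 0-based ungapped residue index.

    Returns None when the column itself is a gap.
    """
    if gapped_seq[column] == gap_char:
        return None
    # Every gap to the left of this column shifts the residue index down by one.
    return column - gapped_seq[:column].count(gap_char)


# Hypothetical gapped fragment, not taken from any real BLAST output.
example = "FCI--FSR"
print(column_to_residue(example, 5))  # 3, i.e. the second "F"
print(column_to_residue(example, 3))  # None, column 3 is a gap
```

Each of the other arrows in the chain would need a similar mapping in both
directions.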

Looking at a short snippet from example bt096.txt, an easy case
in that there are no gaps, we have:

 Score =  100 bits (214),  Expect(2) = 4e-49
 Identities = 37/44 (84%), Positives = 38/44 (86%), Gaps = 0/44 (0%)
 Frame = -2/-2

Query  148  FCIFSRDGVLPCWSGWSRTPDLR*SACLGLPKCWDYRCEPPRPA  17
            FCIFSRDGV  CW GWSRTPDL+*S  LGLPKCWDYR EPPRPA
Sbjct  630  FCIFSRDGVSSCWPGWSRTPDLK*STHLGLPKCWDYRREPPRPA  499

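The translated query sequence is 44 amino acids (including a stop
codon), thus 44*3 = 132 base pairs, which is why it runs from position
148 down to 17 (one-based, inclusive: 148 - 17 + 1 = 132) in the
nucleotide query sequence. The co-ordinates decrease because the frame
is negative, i.e. the match is on the reverse strand.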
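
As a rough sketch of that arithmetic, the following plain Python maps a
residue in the translated HSP back to its codon in the nucleotide
sequence. The helper name is hypothetical (not Biopython API), and it
assumes a gap-free HSP like this one, with one-based inclusive
co-ordinates as printed by BLAST.

```python
def residue_to_nucleotides(residue_index, nuc_start, nuc_end):
    """Return the one-based, inclusive (low, high) nucleotide span of a codon.

    residue_index is 0-based within the ungapped translated HSP.
    nuc_start and nuc_end are the co-ordinates as printed by BLAST;
    nuc_start > nuc_end is taken to mean the reverse strand.
    """
    if nuc_start <= nuc_end:
        # Forward strand: codons count upwards from nuc_start.
        low = nuc_start + 3 * residue_index
        return low, low + 2
    # Reverse strand: codons count downwards from nuc_start.
    high = nuc_start - 3 * residue_index
    return high - 2, high


query_start, query_end, n_residues = 148, 17, 44
# The nucleotide span must be exactly three times the residue count.
assert abs(query_start - query_end) + 1 == 3 * n_residues

print(residue_to_nucleotides(0, query_start, query_end))               # (146, 148)
print(residue_to_nucleotides(n_residues - 1, query_start, query_end))  # (17, 19)
```

The same check holds for the subject line: 630 - 499 + 1 = 132. Once gaps
are present, the residue index would first have to be derived from the
alignment column.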

Currently Bio.SeqFeature.FeatureLocation doesn't have anything
really intended for mixing nucleotide and protein coordinates, so
that may not be the best fit for how to hold and manipulate these
co-ordinates.

Hmm.

Peter