[Biopython] Start positions for local pairwise alignments?

Jan T Kim jttkim at googlemail.com
Mon Sep 3 05:31:23 EDT 2012


Dear All,

after reading a pairwise alignment computed with the EMBOSS water
program, is it possible to find out where the locally aligned segments
start and end within the input sequences?

As an illustration, the sequences "tttagagccc" and "ccagagc" align to

    s1                 4 agagc      8
                         |||||
    s2                 3 agagc      7

This local alignment omits the prefix "ttt" of s1 and the prefix "cc"
of s2. In the water output above, that is reflected by the start
indices 4 and 3 (counting from 1), respectively. However, after reading
that result with

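    import Bio.AlignIO

    # parse() returns an iterator over the alignments in the file
    aStream = Bio.AlignIO.parse('s1s2_align.txt', 'emboss')
    a = next(aStream)     # first alignment in the file
    print(a)
    print(a.__dict__)     # attributes of the alignment object
    print(a[0])
    print(a[0].__dict__)  # attributes of the first SeqRecord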

I can't find that information anywhere, either in the resulting
Bio.Align.MultipleSeqAlignment object or in the SeqRecord objects
that it contains.
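
As a stopgap, the positions could presumably be read straight from the
water output instead of going through Bio.AlignIO. The following is only
a minimal sketch, assuming the pair layout shown above (name, start,
residues, end on each sequence row). The regular expression and the
helper name aligned_ranges are my own and not part of Biopython, and I
have not tried this on long alignments that wrap over several blocks.

```python
import re

# One sequence row of the pair layout: name, start, residues, end.
# Hypothetical pattern, written for the example output shown above.
ROW = re.compile(r'^\s*(\S+)\s+(\d+)\s+([A-Za-z*.\-]+)\s+(\d+)\s*$')


def aligned_ranges(lines):
    """Return {name: (start, end)} with 1-based, inclusive positions."""
    ranges = {}
    for line in lines:
        if line.startswith('#'):
            continue  # skip comment/header lines
        m = ROW.match(line)
        if m is None:
            continue  # match lines ("|||||") and blank lines
        name, start, end = m.group(1), int(m.group(2)), int(m.group(4))
        if name in ranges:
            # a later block of the same sequence extends the range
            ranges[name] = (ranges[name][0], end)
        else:
            ranges[name] = (start, end)
    return ranges


example = """\
s1                 4 agagc      8
                     |||||
s2                 3 agagc      7
"""

ranges = aligned_ranges(example.splitlines())
s1 = "tttagagccc"
s2 = "ccagagc"
for name, seq in (("s1", s1), ("s2", s2)):
    start, end = ranges[name]
    # convert the 1-based inclusive range to a Python slice
    print(name, start, end, seq[start - 1:end])
```

For the example above, both slices come out as "agagc", i.e. the
aligned segment of each input sequence.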

So, am I looking in the wrong place?

Best regards, Jan

P.S.: For a while I was convinced that I had seen these indices, but it
has now occurred to me that this was actually in the pysam.AlignedRead
class, which holds the indices of the read in the reference sequence in
its positions instance variable...

-- 
 +- Jan T. Kim -------------------------------------------------------+
 |             email: jttkim at gmail.com                                |
 |             WWW:   http://www.jtkim.dreamhosters.com/              |
 *-----=<  hierarchical systems are for files, not for humans  >=-----*

