[Biopython-dev] Sequences and simple plots

Peter biopython at maubp.freeserve.co.uk
Thu Sep 25 12:58:37 EDT 2008


> If anyone has any suggestions for similar examples let me know (with code
> would be great - but even a nice idea is worthwhile).

How about this example, which draws a simple nucleotide dot plot for
the first two sequences in the input FASTA file?

#Step One, load the first two sequences as input
from Bio import SeqIO
handle = open("ls_orchid.fasta")
record_iterator = SeqIO.parse(handle, "fasta")
rec_one = record_iterator.next()
rec_two = record_iterator.next()
handle.close()

print "Comparing %s to %s" % (rec_one.id, rec_two.id)

#Step Two, compile a similarity matrix
# For simplicity, this is constructed as a list of lists
# of booleans (using a mismatch threshold would be more
# complicated).  Also I'm recording mismatches rather than
# matches because that gives a nice image with the pylab
# gray colour scheme used later.
window = 7
seq_one = rec_one.seq.tostring()
seq_two = rec_two.seq.tostring()
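# Rows follow seq_two (y axis), columns follow seq_one (x axis);
# the +1 makes sure the last full window is included.
data = [[(seq_one[j:j+window] <> seq_two[i:i+window]) \
         for j in range(len(seq_one)-window+1)] \
        for i in range(len(seq_two)-window+1)]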

#Step Three, plot using pylab
import pylab
pylab.gray()
pylab.imshow(data)
pylab.xlabel("%s (length %i bp)" % (rec_one.id, len(rec_one)))
pylab.ylabel("%s (length %i bp)" % (rec_two.id, len(rec_two)))
pylab.title("Dot plot using window size %i\n(allowing no mismatches)" \
            % window)
#pylab.show()
pylab.savefig("dot_plot.png", dpi=75)
pylab.savefig("dot_plot.pdf")
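
For anyone on Python 3, here is a minimal sketch of Step Two as a
standalone function. Treat it as an illustration rather than part of the
script above: the name mismatch_matrix is made up, it takes plain strings
(str(rec.seq) should give you one from a SeqRecord), and the two short
sequences are toy data. In Python 3 the <> operator is spelled !=.

```python
def mismatch_matrix(seq_one, seq_two, window):
    """Return a list of rows of booleans, True where two windows differ.

    Rows follow seq_two (y axis) and columns follow seq_one (x axis),
    which is the layout an image plot of the matrix would expect.
    """
    return [[seq_one[j:j + window] != seq_two[i:i + window]
             for j in range(len(seq_one) - window + 1)]
            for i in range(len(seq_two) - window + 1)]


# Toy sequences standing in for the two FASTA records (assumption).
data = mismatch_matrix("ACGTACGT", "ACGTTCGT", 4)
print("%i rows by %i columns" % (len(data), len(data[0])))
```

The ranges run to len - window + 1 so that the final full-length window
of each sequence is compared as well.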

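On the mismatch threshold idea mentioned in the Step Two comments, one
possible approach is to count differing positions within each pair of
windows and flag the pair only when the count exceeds a limit. This is
just a sketch in Python 3 under my own assumptions: the function name,
the max_mismatches parameter and the toy sequences are all hypothetical.

```python
def thresholded_mismatch_matrix(seq_one, seq_two, window, max_mismatches):
    """Return rows of booleans, True where two windows differ in more
    than max_mismatches positions.

    Rows follow seq_two (y axis) and columns follow seq_one (x axis).
    """
    def too_different(a, b):
        # Count the positions where the two windows disagree.
        return sum(x != y for x, y in zip(a, b)) > max_mismatches

    return [[too_different(seq_one[j:j + window], seq_two[i:i + window])
             for j in range(len(seq_one) - window + 1)]
            for i in range(len(seq_two) - window + 1)]


# Toy sequences differing at a single position (assumption).
strict = thresholded_mismatch_matrix("ACGTACGT", "ACGTTCGT", 4, 0)
relaxed = thresholded_mismatch_matrix("ACGTACGT", "ACGTTCGT", 4, 1)
```

With max_mismatches set to 0 this reduces to the exact-match comparison.
It is noticeably slower than comparing slices directly, since every pair
of windows is walked character by character.
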
Peter

