[Biopython-dev] Sequences and simple plots
Peter
biopython at maubp.freeserve.co.uk
Fri Sep 26 10:15:52 UTC 2008
On Thu, Sep 25, 2008 at 8:39 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Thu, Sep 25, 2008 at 7:34 PM, Jared Flatow <jflatow at northwestern.edu> wrote:
>>
>> Hi Peter,
>>
>> Good ideas for some useful examples! (though I can't actually find them in
>> the cookbook...)
>
> They are in CVS only at the moment - I can send you the PDF of the
> current tutorial if you like off list. We don't normally update the
> tutorial on the website except as part of making a new release - this
> avoids the tutorial talking about unreleased code.
Cut and pasted here for people to comment on directly:
The first shows a histogram of sequence lengths in a FASTA file (based
on having recently done this for some real assembly data). Sample output:
http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
from Bio import SeqIO
handle = open("ls_orchid.fasta")
sizes = [len(seq_record) for seq_record in SeqIO.parse(handle, "fasta")]
handle.close()
import pylab
pylab.hist(sizes, bins=20)
pylab.title("%i orchid sequences\nLengths %i to %i" \
            % (len(sizes),min(sizes),max(sizes)))
pylab.xlabel("Sequence length (bp)")
pylab.ylabel("Count")
pylab.show()
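If you don't have the orchid file to hand, the list of lengths can be
sanity-checked on a tiny in-memory FASTA string first. This is only a
rough sketch: the two records and the fasta_lengths helper are made up
for illustration, and it uses a hand-rolled parser as a stand-in for
SeqIO.parse so that it runs without Biopython or pylab installed.

```python
from io import StringIO

# Hypothetical two-record FASTA text, NOT taken from ls_orchid.fasta.
EXAMPLE_FASTA = """>seq1 made-up record
ACGTACGTAC
GTAC
>seq2 made-up record
GGGCCC
"""

def fasta_lengths(handle):
    """Return the length of each record in a FASTA handle.

    Simple stand-in for SeqIO.parse(handle, "fasta"); it only
    understands '>' header lines followed by sequence lines.
    """
    lengths = []
    current = None  # stays None until the first '>' header is seen
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new record starts: store the previous one, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    # Store the final record.
    if current is not None:
        lengths.append(current)
    return lengths

sizes = fasta_lengths(StringIO(EXAMPLE_FASTA))
print("%i sequences, lengths %i to %i" % (len(sizes), min(sizes), max(sizes)))
```

The resulting sizes list is the same kind of thing that gets passed to
pylab.hist in the example above.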
The second is based on the GC% example we used for the BOSC 2008
presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
from Bio import SeqIO
from Bio.SeqUtils import GC
handle = open("ls_orchid.fasta")
gc_values = [GC(seq_record.seq) for seq_record in SeqIO.parse(handle, "fasta")]
gc_values.sort()
handle.close()
import pylab
pylab.plot(gc_values)
pylab.title("%i orchid sequences\nGC%% %0.1f to %0.1f" \
            % (len(gc_values),min(gc_values),max(gc_values)))
pylab.xlabel("Genes")
pylab.ylabel("GC%")
pylab.show()
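For anyone wondering what numbers end up on the y-axis, here is a
minimal sketch of a GC percentage calculation. The gc_percent function
and the example sequences are made up for illustration; it counts only
the letters G and C, so it is an approximation and the real
Bio.SeqUtils.GC function may treat ambiguity codes differently.

```python
def gc_percent(seq):
    """Percentage of G and C letters in seq (case-insensitive).

    Illustrative helper only, not the Bio.SeqUtils implementation.
    Returns 0.0 for an empty sequence to avoid dividing by zero.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

# Made-up sequences, purely for illustration.
examples = ["ATGC", "GGCC", "atat", "ACGTN"]

# Sorted, as in the plot example, so the values rise from left to right.
gc_values = sorted(gc_percent(s) for s in examples)
print(gc_values)
```

Sorting first is what turns the line plot into a smooth rising curve
rather than a jagged trace in file order.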
Peter