[Biopython-dev] Sequences and simple plots
Jared Flatow
jflatow at northwestern.edu
Fri Sep 26 13:40:46 UTC 2008
On Sep 26, 2008, at 5:15 AM, Peter wrote:
> Cut and paste for people to comment on directly,
Ok, cool.
> The first shows a histogram of sequence lengths in a FASTA file (based
> having recently done this for some real assembly data). Sample
> output:
> http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
>
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> sizes = [len(seq_record) for seq_record in SeqIO.parse(handle,
> "fasta")]
> handle.close()
>
> import pylab
> pylab.hist(sizes, bins=20)
> pylab.title("%i orchid sequences\nLengths %i to %i" \
> % (len(sizes),min(sizes),max(sizes)))
> pylab.xlabel("Sequence length (bp)")
> pylab.ylabel("Count")
> pylab.show()
Its a perfectly fine example, my only comment would be to do something
like this:
seqs = list(SeqIO.parse(handle, 'fasta'))
hist([len(seq) for seq in seqs], bins=20)
I like to keep the whole sequences in memory, especially if I am just
digging around the data. Also I use the alpha parameter a lot for
histograms, especially when doing overlapping ones. So then you can
also do something like this:
hist([len(seq) for seq in seqs if GC(seq.seq) < .5], bins=20, alpha=.
5, fc='r')
hist([len(seq) for seq in seqs if GC(seq.seq) >= .5], bins=20, alpha=.
5, fc='b')
> The second is based on the GC% example we used for the BOSC 2008
> presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
>
> from Bio import SeqIO
> from Bio.SeqUtils import GC
> handle = open("ls_orchid.fasta")
> gc_values = [GC(seq_record.seq) for seq_record in
> SeqIO.parse(handle, "fasta")]
> gc_values.sort()
> handle.close()
>
> import pylab
> pylab.plot(gc_values) pylab.title("%i orchid sequences\nGC%% %0.1f
> to %0.1f" \
> % (len(gc_values),min(gc_values),max(gc_values)))
> pylab.xlabel("Genes")
> pylab.ylabel("GC%")
> pylab.show()
Again, if you had all the sequences in a list:
plot(sorted(GC(seq.seq) for seq in seqs))
jared
More information about the Biopython-dev
mailing list