[BioPython] Help for a presentation.

Wed Apr 15 13:59:04 UTC 2009

On Wed, Apr 15, 2009 at 2:35 PM, Sebastian Bassi <sbassi at gmail.com> wrote:
> I am working in a laptop session for a local workshop where I plan to
> show off some biopython features. The file would be available (CC-BY
> 3.0) after presentation in Crunchy compatible HTML (if you don't know
> Crunchy, take a look, it is impressive!).

Sharing the presentation sounds good, we can link to it from here when
its done if you like:
http://biopython.org/wiki/Documentation#Presentations

> In the following drill, the task is to read a DNA sequence from a
> genbank file, translate it to an aminoacid sequence and save it as a
> Fasta file.

Funnily enough, I just recently added a couple of cookbook examples
doing something a bit similar to this to the Tutorial in CVS.  Also,
have you looked at this page with some related stuff?
http://www.warwick.ac.uk/go/peter_cock/python/genbank2fasta/

> The code here does this, but I think it looks a little complex when I
> want to show that biopython way to do it is easy. So I wonder if
> someone knows how to modify this code to get the same result with less
> steps.
>
> from Bio import SeqIO
> from Bio.Seq import translate
> from Bio.SeqRecord import SeqRecord
>
> handle = open('ampRdna.gb')
> seq_record = SeqIO.read(handle, "genbank")

You never closed the input handle in the first place, so it should be
just as safe to just do this:
seq_record = SeqIO.read(open('ampRdna.gb'), "genbank")

I do this often myself - its output handles you must be careful about closing.

> print "DNA Sequence:",seq_record.seq
> # make translation (numbers here is where the CDS starts)
> # I don't want to grab the translated sequence from genbank file
> # , I want to show how to translate it.
> protseq = translate(seq_record.seq[89:694])

You can do that as a method call instead, which saves you an import line:
protseq = seq_record.seq[89:694].translate()

> # show translation
> print "Protein Sequence:",protseq
> # Make a SeqRecord
> seqid = seq_record.id
> seqdesc = seq_record.description
> protrec = SeqRecord(protseq,id=seqid,description=seqdesc)

I would merge those three lines as just:
protrec = SeqRecord(protseq, id=seq_record.id,
description=seq_record.description)

But is it meaningful to use the whole nucleotide's ID and description
for the protein?

> # save it to a fasta file.
> outfile_h = open('ampRprot.fasta','w')
> SeqIO.write([protrec],outfile_h,'fasta')
> outfile_h.close()

I'm not sure if you'll find it clearer or not, but you could change
the last three lines to this:

outfile_h = open('ampRprot.fasta','w')
outfile_h.write(protrec.format('fasta'))
outfile_h.close()

Peter