[BioPython] biopython tutorial

Nick Matzke matzke at berkeley.edu
Tue Aug 5 21:45:56 UTC 2008


Hi again,

I just ran through the Biopython tutorial, sections 1 through 9.5.  It 
is really great, and thanks to the people who wrote it.

While copying and pasting code to try it on my own system, I noticed a 
few typos and other minor issues, which I figured I should note for 
Peter or whoever maintains it.

Thanks again for the tutorial!
Nick


1.
my_blast_file = "m_cold.fasta"
should be:
my_blast_db = "m_cold.fasta"


2.
record[0]["GBSeq_definition"]
'Opuntia subulata rpl16 gene, intron; chloroplast'

...should be (AFAICT):
record['Bioseq-set_seq-set'][0]['Seq-entry_set']['Bioseq-set']['Bioseq-set_seq-set'][0]['Seq-entry_seq']['Bioseq']['Bioseq_descr']['Seq-descr'][2]['Seqdesc_title']
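
A path this deep is easier to read and check when written as a list of keys and indices. The sketch below is only an illustration: `dig` is a hypothetical helper, and the `record` here is a small made-up stand-in, not the real downloaded data.

```python
from functools import reduce


def dig(data, path):
    """Follow a sequence of dict keys / list indices into nested data."""
    return reduce(lambda node, key: node[key], path, data)


# Toy stand-in for a parsed record (assumption: NOT real Entrez output).
record = {
    "Bioseq-set_seq-set": [
        {
            "Seq-entry_seq": {
                "Bioseq": {
                    "Bioseq_descr": {
                        "Seq-descr": [{"Seqdesc_title": "example title"}]
                    }
                }
            }
        }
    ]
}

# The same kind of lookup as above, spelled out one step per element.
title_path = [
    "Bioseq-set_seq-set", 0,
    "Seq-entry_seq", "Bioseq", "Bioseq_descr",
    "Seq-descr", 0, "Seqdesc_title",
]

print(dig(record, title_path))
```

A wrong key or index still raises the usual KeyError or IndexError, so a mistyped step shows up immediately.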



3.
 >>> record[0]["GBSeq_source"]
'chloroplast Austrocylindropuntia subulata'
...the exact string 'chloroplast Austrocylindropuntia subulata' doesn't 
seem to exist in the downloaded data, so not sure what is meant...


4.
the 814 hits are now 816 throughout


5.
add links for the PROSITE & Swiss-Prot database downloads


6.
Nanoarchaeum equitans Kin4-M (RefSeq NC_005213, GenBank AE017199) which 
can be downloaded from the NCBI here (only 1.15 MB):

link location is weird (only paren is linked)


7.
============
As the name suggests, this is a really simple consensus calculator, and 
will just add up all of the residues at each point in the consensus, and 
if the most common value is higher than some threshold value (the 
default is .3) will add the common residue to the consensus. If it 
doesn’t reach the threshold, it adds an ambiguity character to the 
consensus. The returned consensus object is Seq object whose alphabet is 
inferred from the alphabets of the sequences making up the consensus. So 
doing a print consensus would give:

consensus Seq('TATACATNAAAGNAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT ...', IUPACAmbiguousDNA())

You can adjust how dumb_consensus works by passing optional parameters:

the threshold
     This is the threshold specifying how common a particular residue 
has to be at a position before it is added. The default is .7.
============

The two passages disagree: is the default 0.3 or 0.7? I assume 0.7 for DNA.
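
To see how much the threshold matters, here is a minimal sketch of the calculation as the quoted text describes it. It is not Biopython's implementation: `simple_consensus` is a hypothetical function, the "strictly higher than the threshold" test follows the wording of the quote, and the input sequences are made up.

```python
from collections import Counter


def simple_consensus(seqs, threshold=0.7, ambiguous="N"):
    """Column-by-column consensus of equal-length sequences.

    The most common residue in a column is kept only if its fraction
    is strictly higher than `threshold`; otherwise `ambiguous` is used.
    """
    consensus = []
    for column in zip(*seqs):
        residue, count = Counter(column).most_common(1)[0]
        if count / len(column) > threshold:
            consensus.append(residue)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


# Made-up alignment: the third column is only 50% 'T'.
seqs = ["TATA", "TATA", "TACA", "TGGA"]

print(simple_consensus(seqs, threshold=0.7))  # TANA
print(simple_consensus(seqs, threshold=0.3))  # TATA
```

With 0.7 the mixed column becomes an ambiguity character, while with 0.3 it does not, so the two defaults quoted above would give visibly different output.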


8.
info_content = summary_align.information_content(5, 30, log_base = 10
                                                  chars_to_ignore = ['N'])
...missing comma after "log_base = 10"
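
A quick way to confirm this kind of typo without running the tutorial is to ask Python whether the snippet parses at all. The sketch below uses only the standard library; `parses` is a hypothetical helper name, and nothing here calls Biopython.

```python
import ast


def parses(source):
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


# The statement as it appears in the tutorial.
broken = (
    "info_content = summary_align.information_content(5, 30, log_base = 10\n"
    "                                                  chars_to_ignore = ['N'])\n"
)

# The same statement with the comma added after log_base = 10.
fixed = broken.replace("log_base = 10\n", "log_base = 10,\n")

print(parses(broken))  # False
print(parses(fixed))   # True
```

The same check would flag the missing colon in the PDB loops in item 10.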



9.
9.4.1  Using common substitution matrices

...this section is blank


10.
in PDB section:

for model in structure.get_list()
	for chain in model.get_list():
		for residue in chain.get_list():

...first line needs colon (:)

happens again lower down:
for model in structure.get_list()
	for chain in model.get_list():
		for residue in chain.get_list():
	



11.
from PDBParser import PDBParser

should be:

from Bio.PDB.PDBParser import PDBParser
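
The short form only works if a module named PDBParser happens to sit directly on the import path. A small standard-library sketch for checking which module names resolve on a given system (`module_available` is a hypothetical helper; the result depends on what is installed):

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. 'Bio') is not installed.
        return False


for name in ("PDBParser", "Bio.PDB.PDBParser"):
    print(name, module_available(name))
```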



