[BioPython] biopython tutorial

Tue Aug 5 22:28:29 UTC 2008

On Tue, Aug 5, 2008 at 10:45 PM, Nick Matzke <matzke at berkeley.edu> wrote:
> Hi again,
>
> I just ran through the biopython tutorial, sections 1 through 9.5.  It is
> really great, & thanks to the people who wrote it.

On behalf of all the other authors, thank you :)

> While copying-pasting code etc. to try it on my own system I noticed a few
> typos & other minor issues which I figured I should make note of for Peter
> or whomever maintains it.

Although I have made plenty of changes and updates to the tutorial,
it's still a joint effort.  I probably tend to make more little fixes
than other people, which shows up more in the CVS history!

Little things like this are always worth pointing out - and comments
from newcomers and beginners can be extra helpful if they reveal
assumptions or other things that could be clearer.

> 1.
> my_blast_file = "m_cold.fasta"
> should be:
> my_blast_db = "m_cold.fasta"

I may have misunderstood you, but I think it's correct.  There are two
important things for a BLAST search: the input file (here the FASTA
file m_cold.fasta) and the database to search against (in the example,
B. subtilis sequences).
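
To make the query/database distinction concrete, here is a minimal
sketch that only builds a command line and does not run BLAST.  It
assumes the legacy NCBI "blastall" tool and its -p, -i and -d options;
the helper name build_blast_command and the database name "b_subtilis"
are made up for illustration and are not taken from the tutorial.

```python
def build_blast_command(query_file, database, program="blastn"):
    """Return an argument list for the legacy NCBI blastall tool.

    Hypothetical helper for illustration only; nothing is executed.
    """
    return [
        "blastall",
        "-p", program,     # which BLAST program to run
        "-i", query_file,  # input: FASTA file holding the query sequence(s)
        "-d", database,    # database: what the query is searched against
    ]


my_blast_file = "m_cold.fasta"  # the query file, as named in the tutorial
my_blast_db = "b_subtilis"      # assumed database name, for illustration
command = build_blast_command(my_blast_file, my_blast_db)
print(" ".join(command))
```

Keeping the two names separate (my_blast_file for the query,
my_blast_db for the database) is what makes it hard to mix them up.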

> 2.
> record[0]["GBSeq_definition"]
> 'Opuntia subulata rpl16 gene, intron; chloroplast'
>
> ...should be (AFAICT):

Something strange is going on - the NCBI didn't give me XML by default
as I expected:

from Bio import Entrez
handle = Entrez.efetch(db="nucleotide", id="57240072",
                       email="A.N.Other@example.com")
data = handle.read()
print data[:100]

It looks like the NCBI may have changed something - Michiel?
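
One way to avoid depending on whatever the default happens to be is to
ask for XML explicitly and then check what came back.  The following
is only a sketch: it assumes efetch accepts the E-utilities "retmode"
argument, and the helper names looks_like_xml and fetch_record_xml are
made up for illustration.  The fetch function needs network access and
is defined here but not called.

```python
def looks_like_xml(text):
    """Rough check only: does the text start with an XML declaration or tag?"""
    return text.lstrip().startswith("<")


def fetch_record_xml(identifier, email):
    """Fetch one nucleotide record, requesting XML explicitly (needs network)."""
    # Imported here so looks_like_xml can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="nucleotide", id=identifier, retmode="xml")
    try:
        data = handle.read()
    finally:
        handle.close()
    # Depending on the Biopython version the handle may yield bytes.
    if isinstance(data, bytes):
        data = data.decode("utf-8", "replace")
    return data
```

Calling looks_like_xml(data) on the result before parsing gives a
clearer failure than a parser error further down the line.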

> 4.
> the 814 hits are now 816 throughout

That number is always going to increase - maybe we can reword things
slightly to make it clear that this may not be exactly what the user
will see.

> 5.
> add links for prosite & swissprot db downloads

Where would you add these, and which URLs did you have in mind?

> 6.
> Nanoarchaeum equitans Kin4-M (RefSeq NC_005213, GenBank AE017199) which can
> be downloaded from the NCBI here (only 1.15 MB):
>
> link location is weird (only paren is linked)

Whoops - both the PDF and HTML are like that... looks like a mix-up in
the LaTeX syntax.  Fixed in CVS.

>
> 7.
> ============
> As the name suggests, this is a really simple consensus calculator, and will
> just add up all of the residues at each point in the consensus, and if the
> most common value is higher than some threshold value (the default is .3)
> will add the common residue to the consensus. If it doesn't reach the
> threshold, it adds an ambiguity character to the consensus. The returned
> consensus object is Seq object whose alphabet is inferred from the alphabets
> of the sequences making up the consensus. So doing a print consensus would
> give:
>
> consensus Seq('TATACATNAAAGNAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAAAAAAATGAAT
> ...', IUPACAmbiguousDNA())
>
> You can adjust how dumb_consensus works by passing optional parameters:
>
> the threshold
>    This is the threshold specifying how common a particular residue has to
> be at a position before it is added. The default is .7.
> ============
>
> Is the default 0.3 or 0.7 -- I assume 0.7 for DNA.

The default is 0.7 for any sequence type (DNA, protein, etc), so the
.3 in the first paragraph is the typo.  Or do you mean which way round
the percentage is counted (the letter has to be above 70% I think)?
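
To illustrate which way round the threshold works, here is a small
pure-Python sketch of the idea.  It is not Biopython's dumb_consensus
code: the function name simple_consensus is made up, it assumes
"reaching the threshold" means a fraction greater than or equal to it,
it treats ties as ambiguous, and it makes no special case for gaps.

```python
from collections import Counter


def simple_consensus(sequences, threshold=0.7, ambiguous="N"):
    """Column-by-column consensus of equal-length sequences (illustrative)."""
    if not sequences:
        return ""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")

    consensus = []
    for column in zip(*sequences):
        ranked = Counter(column).most_common(2)
        residue, count = ranked[0]
        # A tie for most common residue is treated as ambiguous (assumption).
        tied = len(ranked) > 1 and ranked[1][1] == count
        if not tied and count / len(column) >= threshold:
            consensus.append(residue)
        else:
            consensus.append(ambiguous)
    return "".join(consensus)


alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
print(simple_consensus(alignment))                  # ACNA
print(simple_consensus(alignment, threshold=0.8))   # ANNN
```

In the second column C appears in 3 of 4 sequences (75%), so it is
kept at the 0.7 default but replaced by N once the threshold is 0.8.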

> 8.
> info_content = summary_align.information_content(5, 30, log_base = 10
>                                                 chars_to_ignore = ['N'])
> missing comma

Fixed in CVS.

> 9.
> 9.4.1  Using common substitution matrices
>
> blank

So it is - would anyone like to write something for this?

> 10.
> in PDB section:
>
> for model in structure.get_list()
>        for chain in model.get_list():
>                for residue in chain.get_list():
>
> ...first line needs colon (:)
>
> happens again lower down:
> for model in structure.get_list()
>        for chain in model.get_list():
>                for residue in chain.get_list():
>

Fixed two of these in CVS.

> 11.
> from PDBParser import PDBParser
>
> should be:
>
> from Bio.PDB.PDBParser import PDBParser

Fixed in CVS.
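
For reference, here is a sketch of how the two PDB fixes (the missing
colon and the full import path) look together.  The helper names
iter_residues and load_structure are made up for illustration;
load_structure needs Biopython and a PDB file, and is defined here but
not called.

```python
def iter_residues(structure):
    """Yield (model, chain, residue) for every residue in a structure."""
    for model in structure.get_list():  # note the colon on this line
        for chain in model.get_list():
            for residue in chain.get_list():
                yield model, chain, residue


def load_structure(structure_id, filename):
    """Parse a PDB file into a structure object (requires Biopython)."""
    # Full module path, rather than "from PDBParser import PDBParser".
    from Bio.PDB.PDBParser import PDBParser

    parser = PDBParser()
    return parser.get_structure(structure_id, filename)
```

iter_residues only relies on get_list(), so it works on anything
that follows the same structure/model/chain/residue layout.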

Note that we don't normally update the online copies of the HTML and
PDF tutorial between releases (so as to avoid talking about unreleased
features).  However, there have been a few updates to the Tutorial
since Biopython 1.47 so maybe we should consider it?

Thanks again Nick!

Peter