[Biopython-dev] Re: [BioPython] first steps into python

Thomas Sicheritz-Ponten thomas at cbs.dtu.dk
Sat Feb 24 11:11:01 EST 2001


Ewan Birney <birney at ebi.ac.uk> writes:

> Ok. To help Ensembl-->python (probably via CORBA) integration. I have
> downloaded biopython and biopython-corba.
> 
> I am, therefore, belatedly learning python. Don't expect any road-to-Damascus
> type conversions (yet) however...

Nice - I am currently trying to understand your Perl scripts in order
to make a lightweight Python interface to Ensembl. Maybe we could join
efforts?

> 
> Some questions
> (a) what is the difference between def __name__ and def name functions?

* _single_leading_underscore: weak "internal use" indicator
* single_trailing_underscore_: used by convention to avoid conflicts with
  Python keywords
* __double_leading_underscore: class-private names as of Python 1.4
* __double_leading_and_trailing_underscore__: "magic" objects or attributes
  that live in user-controlled namespaces, e.g. __init__, __import__ or
  __file__. User code should generally refrain from using this convention
  for its own names.

* Attributes starting with two underscores, e.g. "__n", are renamed
when byte-compiled to "_CLASS__VARNAME".  Since the class's name is
used as part of the variable name, the variable "__n" in a subclass
is not the same variable as "__n" in the superclass.  This is probably
the closest to 'private' you will get.
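
The renaming described above can be seen directly by looking at an
instance's attributes. This is only a minimal sketch for a current Python
interpreter; the class names Base and Derived are made up for illustration.

```python
class Base:
    def __init__(self):
        # Written as __n, but stored on the instance as _Base__n
        self.__n = 1


class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        # Stored as _Derived__n, so it does not overwrite Base's __n
        self.__n = 2


d = Derived()
# Both mangled names live side by side on the same instance
names = sorted(vars(d))
print(names)
print(d._Base__n, d._Derived__n)
```

Note that the mangled name is still reachable from outside (d._Base__n), so
this is a convention to avoid accidental clashes, not real access control.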


> (b) how is inheritance done in python
> 
Put the base class in parentheses after the name of the new class:

class ensembl:
    pass  # ... code something

class EnsemblSQL(ensembl):
    pass  # ... code something
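
A slightly fuller sketch of the same pattern, showing how a subclass
calls the parent's constructor and overrides a method. The attributes
(name, host) and the describe() method are hypothetical and only there
to have something to inherit; they are not taken from any real Ensembl
code.

```python
class ensembl:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return "ensembl interface: " + self.name


class EnsemblSQL(ensembl):
    def __init__(self, name, host):
        # Call the parent constructor explicitly, then add our own state
        ensembl.__init__(self, name)
        self.host = host

    def describe(self):
        # Reuse the parent's method and extend its result
        return ensembl.describe(self) + " (SQL backend at " + self.host + ")"


db = EnsemblSQL("human", "localhost")
print(db.describe())
print(isinstance(db, ensembl))
```

Methods that the subclass does not define are looked up in the base
class, so EnsemblSQL only needs to contain what differs.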


> (c) is there any concept/files for interfaces either explicit (java) or
> "just documentation" like bioperl's "I" files
> 
I am not sure I understand your question.

> (d) could some one sketch out an easy biopython script like:
>    read embl file.
> 	test whether there is a cacaca repeat in there
> 	if yes, dump a genbank file.


I don't know whether there is already a working EMBL or GenBank parser in the
Biopython core, so here is an example for a Fasta file instead.

==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
import sys
from Bio.Fasta import Fasta

# the filename is the first command-line argument
file = sys.argv[1]
# open a Fasta parser
iter = Fasta.Iterator(handle = open(file), parser = Fasta.RecordParser())


repeat = 'CACACA'

# loop over all sequence entries in the fasta file
while 1:
    rec = iter.next()
    if not rec: break

    sequence = rec.sequence

    # test with a simple string count
    n = sequence.count(repeat)
    if n:
        print repeat, 'occurred', n, 'times in', rec.title

    else:
        print 'nope'

==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
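
The same idea can be sketched without Biopython at all, which may help when
trying it out. This is not the Bio.Fasta API: read_fasta and count_repeat
are hypothetical helpers written for this illustration, the demo records
are invented, and the code assumes a current Python interpreter.

```python
import io


def read_fasta(handle):
    """Yield (title, sequence) pairs from a FASTA-formatted file handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    # Emit the last record in the file
    if title is not None:
        yield title, "".join(chunks)


def count_repeat(sequence, repeat="CACACA"):
    """Count non-overlapping occurrences of repeat, ignoring case."""
    return sequence.upper().count(repeat)


# Two invented records; the first sequence is split over two lines
demo = io.StringIO(">seq1 with repeat\nGGCACA\nCATT\n>seq2 without\nGATTACA\n")
results = [(title, count_repeat(seq)) for title, seq in read_fasta(demo)]
for title, n in results:
    print(title, n)
```

One thing to keep in mind: the string count method counts non-overlapping
matches, so a longer run such as CACACACA still counts as a single
occurrence of CACACA.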

If you want to look at actual code, I can put last week's quick and
dirty Python script for the Ensembl MySQL db on www.cbs.dtu.dk/thomas/ensembl.py

Don't hesitate to ask/mail me if you run into problems with Python or
have other general questions.

Gotta run to a party :-)

cheers,
-thomas


-- 
Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
thomas at biopython.org           The Technical University of Denmark
CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas

	De Chelonian Mobile ... The Turtle Moves ...


