[Biopython-dev] Re: [BioPython] first steps into python
Thomas Sicheritz-Ponten
thomas at cbs.dtu.dk
Sat Feb 24 11:11:01 EST 2001
Ewan Birney <birney at ebi.ac.uk> writes:
> Ok. To help Ensembl-->python (probably via CORBA) integration. I have
> downloaded biopython and biopython-corba.
>
> I am, therefore, belatedly learning Python. Don't expect any road-to-Damascus
> type conversions (yet) however...
Nice - I am currently working on understanding your Perl scripts in order
to make a lightweight Python interface to Ensembl. Maybe we could join
efforts?
>
> Some questions
> (a) what is the difference between def __name__ and def name functions?
* _single_leading_underscore: weak "internal use" indicator
* single_trailing_underscore_: used by convention to avoid conflicts with
  Python keywords
* __double_leading_underscore: class-private names as of Python 1.4.
* __double_leading_and_trailing_underscore__: "magic" objects or attributes
  that live in user-controlled namespaces, e.g. __init__, __import__ or
  __file__. User code should generally refrain from using this convention
  for its own use. So "def __init__" defines a hook that Python itself
  calls, while "def name" defines an ordinary method.
* Attributes starting with two underscores, e.g. "__n", are renamed
  when byte-compiled to "_ClassName__n". Since the class's name is
  used as part of the variable name, the variable "__n" in a subclass
  is not the same as the one in the superclass. This is probably the
  closest to 'private' as you will get.
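To make the renaming in the last bullet concrete, here is a minimal sketch.
The class and method names (Base, Derived, base_n, derived_n) are made up
for illustration, and the print calls are written so that it also runs
under Python 3.

```python
class Base:
    def __init__(self):
        self.__n = 1            # actually stored as _Base__n

    def base_n(self):
        return self.__n         # compiled as self._Base__n


class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        self.__n = 2            # stored as _Derived__n, Base's value survives

    def derived_n(self):
        return self.__n         # compiled as self._Derived__n


d = Derived()
print(d.base_n())       # value set in Base
print(d.derived_n())    # value set in Derived
print(sorted(vars(d)))  # the two mangled attribute names
```

Both values coexist on the same instance because each class sees its own
mangled name. From outside the classes, d._Base__n is still reachable, so
this is a naming convention rather than real access control.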
> (b) how is inheritance done in python
>
class ensembl:
    pass  # ... code something

class EnsemblSQL(ensembl):  # EnsemblSQL inherits from ensembl
    pass  # ... code something
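A slightly fuller sketch of the same two classes, showing how a subclass
calls and overrides methods of its base class. The method describe and the
attributes name and host are hypothetical, chosen only for this example.

```python
class ensembl:
    def __init__(self, name):
        self.name = name

    def describe(self):
        return 'ensembl source ' + self.name


class EnsemblSQL(ensembl):
    def __init__(self, name, host):
        # the base-class constructor is not called automatically
        ensembl.__init__(self, name)
        self.host = host

    def describe(self):
        # override, but reuse the base-class implementation
        return ensembl.describe(self) + ' via SQL on ' + self.host


db = EnsemblSQL('human', 'localhost')
print(db.describe())
print(isinstance(db, ensembl))
```

The base classes are simply listed in parentheses after the class name;
anything not redefined in EnsemblSQL is looked up in ensembl.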
> (c) is there any concept/files for interfaces, either explicit (java) or
> "just documentation" like bioperl's "I" files
>
I am not sure if I understood your question ..
> (d) could some one sketch out an easy biopython script like:
> read embl file.
> test whether there is a cacaca repeat in there
> if yes, dump a genbank file.
I don't know if there is an already working EMBL or GenBank parser in the
Biopython core, so I'll give you an example for a Fasta file.
==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
import sys
from Bio.Fasta import Fasta

# the filename as first command-line argument
file = sys.argv[1]

# open a Fasta parser
iter = Fasta.Iterator(handle = open(file), parser = Fasta.RecordParser())
repeat = 'CACACA'

# loop over all sequence entries in the fasta file
while 1:
    rec = iter.next()
    if not rec: break
    sequence = rec.sequence
    # test with a simple string count
    n = sequence.count(repeat)
    if n:
        print repeat, 'occurred', n, 'times in', rec.title
    else:
        print 'nope'
==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
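One caveat with the simple string count: count() only counts
non-overlapping matches, so 'CACACACA' counts as one CACACA, not two. The
sketch below shows the same repeat test using only the standard library,
with no Biopython involved. read_fasta and count_overlapping are helper
names invented for this example, not part of any library, and it assumes
Python 3.

```python
import io


def read_fasta(handle):
    """Yield (title, sequence) pairs from a FASTA-formatted handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith('>'):
            if title is not None:
                yield title, ''.join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, ''.join(chunks)


def count_overlapping(sequence, repeat):
    """Count occurrences of repeat, including overlapping ones."""
    count, start = 0, sequence.find(repeat)
    while start != -1:
        count += 1
        start = sequence.find(repeat, start + 1)
    return count


# a toy two-record file held in memory instead of on disk
example = io.StringIO('>seq1 toy record\nGGCACACACATT\n>seq2\nGGGTTT\n')
repeat = 'CACACA'
for title, sequence in read_fasta(example):
    print(title, sequence.count(repeat), count_overlapping(sequence, repeat))
```

For seq1 the plain count() reports 1 while the overlapping count reports 2;
which of the two is wanted depends on what "a cacaca repeat" should mean.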
If you want to look at actual code, I can put last week's quick and
dirty Python script for the Ensembl MySQL db on www.cbs.dtu.dk/thomas/ensembl.py
Don't hesitate to ask/mail me if you run into problems with Python or
have other general questions.
Gotta run to a party :-)
cheers,
-thomas
--
Sicheritz-Ponten Thomas, Ph.D CBS, Department of Biotechnology
thomas at biopython.org The Technical University of Denmark
CBS: +45 45 252489 Building 208, DK-2800 Lyngby
Fax +45 45 931585 http://www.cbs.dtu.dk/thomas
De Chelonian Mobile ... The Turtle Moves ...