[Biopython-dev] Re: [BioPython] first steps into python

Jason Stajich jason at chg.mc.duke.edu
Sat Feb 24 14:05:03 EST 2001


[cross-posted]

Ewan & Thomas,

I am happy to work on the Perl CORBA end, and I am sure Brad Chapman
would be happy to do the Python CORBA end, if we can get an IDL that can
describe the Ensembl data.  The question is whether we can squeeze Ensembl
objects into the current bioperl and biopython objects.  I think the
bioperl gene objects are robust enough; I'm not sure about biopython.

I think it would be a real advantage to build the CORBA bridge because it
would give us the ability to write programs that access the data without
having to install the MySQL server locally.

Otherwise Thomas is stuck learning the Ensembl table structure and writing
SQL, which means that when/if Ensembl decides to change the table structure
his code stops working.  IMHO that is bad.  But Thomas may not want to wait
for us to get this going...

I'm willing to do the coding for this, but I'd need some help with the IDL
design as I am not sure how deep we want to go into the Ensembl data
model.  

-Jason

On 24 Feb 2001, Thomas Sicheritz-Ponten wrote:

> Ewan Birney <birney at ebi.ac.uk> writes:
> 
> > Ok. To help Ensembl-->python (probably via CORBA) integration, I have
> > downloaded biopython and biopython-corba.
> > 
> > I am, therefore, belatedly learning python. Don't expect any
> > road-to-Damascus type conversions (yet) however...
> 
> Nice - I am currently working on understanding your perl scripts in order
> to make a lightweight python interface to ensembl. Maybe we could join
> efforts?
> 
> > 
> > Some questions
> > (a) what is the difference between def __name__ and def name functions?
> 
> * _single_leading_underscore: weak "internal use" indicator
> * single_trailing_underscore_: used by convention to avoid conflicts
>        with Python keywords
> * __double_leading_underscore: class-private names as of Python 1.4
> * __double_leading_and_trailing_underscore__: "magic" objects or
>        attributes that live in user-controlled namespaces, e.g.
>        __init__, __import__ or __file__.  User code should generally
>        refrain from using this convention for its own use.
> 
> * Attributes starting with two underscores (and not ending with two),
> e.g. "__n", are renamed when byte-compiled to "_CLASS__VARNAME".  Since
> the class's name is used as part of the variable name, the variable
> "__n" in a subclass would not be the same as in the superclass.  This is
> probably the closest to 'private' as you will get.
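
[A minimal sketch of the renaming described above, assuming a current Python 3 interpreter. The class names Base and Derived are made up for illustration.]

```python
class Base:
    def __init__(self):
        self.__n = 1          # stored on the instance as _Base__n

    def base_n(self):
        return self.__n       # compiled as self._Base__n


class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        self.__n = 2          # stored as _Derived__n, a separate attribute

    def derived_n(self):
        return self.__n       # compiled as self._Derived__n


d = Derived()
print(d.base_n(), d.derived_n())   # the two "__n" attributes do not collide
print(sorted(vars(d)))             # shows the mangled names on the instance
```

The mangled names are still reachable from outside (d._Base__n works), so this is a collision guard rather than real access control.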
> 
> 
> > (b) how is inheritance done in python
> > 
> class ensembl:
>     pass  # ... code something
> 
> # the base class goes in parentheses after the subclass name
> class EnsemblSQL(ensembl):
>     pass  # ... code something
> 
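
[A slightly fuller sketch of the same pattern, assuming Python 3. The methods, attributes and argument values here are hypothetical and are not taken from Ensembl or biopython.]

```python
class Ensembl:
    """Hypothetical base class, for illustration only."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return "generic Ensembl object " + self.name


class EnsemblSQL(Ensembl):
    """Subclass: the base class is listed in parentheses."""

    def __init__(self, name, host):
        Ensembl.__init__(self, name)   # call the base-class constructor explicitly
        self.host = host

    def describe(self):
        # override the method, then reuse the inherited implementation
        return Ensembl.describe(self) + " via SQL on " + self.host


obj = EnsemblSQL("demo", "localhost")
print(obj.describe())
print(isinstance(obj, Ensembl))   # a subclass instance is also an instance of the base
```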
> 
> > (c) is there any concept/files for interfaces, either explicit (as in
> > java) or "just documentation" like bioperl's "I" files
> > 
> I am not sure if I understood your question ..
> 
> > (d) could someone sketch out an easy biopython script like:
> >    read embl file.
> >    test whether there is a cacaca repeat in there
> >    if yes, dump a genbank file.
> 
> 
> I don't know if there is an already working embl or genbank parser in the
> biopython core, so I'll give you an example for a Fasta file.
> 
> ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
> import sys
> from Bio.Fasta import Fasta
> 
> # the filename as first command-line argument
> file = sys.argv[1]
> # open a Fasta parser
> iter = Fasta.Iterator(handle = open(file), parser = Fasta.RecordParser())
> 
> 
> repeat = 'CACACA'
> 
> # loop over all sequence entries in the fasta file
> while 1:
>     rec = iter.next()
>     if not rec: break
> 
>     sequence = rec.sequence
> 
>     # test with a simple string count
>     n = sequence.count(repeat)
>     if n:
>         print repeat, 'occurred', n, 'times in', rec.title
> 
>     else:
>         print 'nope'
> 
> ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP ==== SNIP ==== SNAP
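
[For readers without the Bio.Fasta module used above, here is a minimal sketch of the same repeat test in plain Python 3. It does not use biopython at all; read_fasta and count_repeat are hypothetical helper names, and the two-record input is made up. Like the script above, it relies on str.count, which counts non-overlapping occurrences.]

```python
import io


def read_fasta(handle):
    """Yield (title, sequence) pairs from a FASTA-formatted file handle."""
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # a new header closes the previous record, if there was one
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def count_repeat(handle, repeat="CACACA"):
    """Return {title: non-overlapping count of repeat} for every record."""
    return {title: seq.upper().count(repeat) for title, seq in read_fasta(handle)}


# Made-up input, for illustration only.
demo = io.StringIO(">seq1 demo\nGGCACACATT\nCACACAGG\n>seq2 demo\nGGGTTT\n")
for title, n in count_repeat(demo).items():
    if n:
        print("CACACA occurred", n, "times in", title)
    else:
        print("nope")
```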
> 
> If you want to look at actual code, I can put last week's quick and
> dirty python script for the ensembl mysql db on www.cbs.dtu.dk/thomas/ensembl.py
> 
> Don't hesitate to ask/mail me if you run into problems with python or
> have other general questions.
> 
> Gotta run to a party :-)
> 
> cheers,
> -thomas
> 
> 
> -- 
> Sicheritz-Ponten Thomas, Ph.D  CBS, Department of Biotechnology
> thomas at biopython.org           The Technical University of Denmark
> CBS:  +45 45 252489            Building 208, DK-2800 Lyngby
> Fax   +45 45 931585            http://www.cbs.dtu.dk/thomas
> 
> 	De Chelonian Mobile ... The Turtle Moves ...
> 

Jason Stajich
jason at chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center 
http://www.chg.duke.edu/ 




