[Biopython-dev] large updates in CVS

Jeffrey Chang jchang at smi.stanford.edu
Tue Sep 10 02:39:16 EDT 2002


Hello everybody,

I have just committed into CVS a reworking of the registry framework.
That's the stuff that gets loaded into the Bio namespace.  As before,
you can do:

krusty:~] jchang% python
Python 2.2.1 (#1, 09/06/02, 17:02:21) 
[GCC Apple cpp-precomp 6.14] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Bio
>>> Bio.formats
Bio.dformats, exporting 'blast', 'blastn', 'blastp', 'blastx', 'embl', 'embl/65', 'empty', 'fasta', 'genbank', 'genbank-records', 'genbank-release', 'ncbi-blastn', 'ncbi-blastp', 'ncbi-blastx', 'ncbi-tblastn', 'ncbi-tblastx', 'search', 'sequence', 'swissprot', 'swissprot/38', 'swissprot/40', 'tblastn', 'tblastx', 'wu-blastn', 'wu-blastp', 'wu-blastx'
>>> Bio.db
db, exporting 'embl', 'embl-dbfetch-cgi', 'embl-ebi-cgi', 'embl-fast', 'embl-xembl-cgi', 'interpro-ebi-cgi', 'nucleotide-dbfetch-cgi', 'nucleotide-genbank-cgi', 'pdb', 'pdb-ebi-cgi', 'pdb-rcsb-cgi', 'prodoc-expasy-cgi', 'prosite-expasy-cgi', 'protein-genbank-cgi', 'swissprot', 'swissprot-expasy-cgi'
>>> print Bio.db['swissprot']["P50105"].read()[:200]
ID   MPT1_YEAST     STANDARD;      PRT;   388 AA.
AC   P50105;
DT   01-OCT-1996 (Rel. 34, Created)
DT   01-OCT-1996 (Rel. 34, Last sequence update)
DT   01-OCT-1996 (Rel. 34, Last annotation update)
D
>>> 



The differences are mostly internal.  The Registry code has now been
pulled out to make it easier to add new registries and new entries to
registries.

The DBRegistry has been slightly reworked, though.  As before, it
implements a dictionary-like interface to different kinds of
databases.  Now, __getitem__ returns the type native to the kind of
database being queried.  For example, CGI databases return handles to
the data, and BioSQL returns a SeqRecord object.
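
As a rough illustration of the idea, here is a toy sketch in modern
Python.  It is not the actual Bio.db code: ToyRegistry, CGIBackend,
SQLBackend, ToyRecord and the 'toy-cgi'/'toy-sql' names are all made
up for the example, and the backends serve canned data instead of
talking to a web server or a SQL database.

```python
import io
from collections import namedtuple

# Hypothetical stand-in for a parsed record; not Biopython's SeqRecord.
ToyRecord = namedtuple("ToyRecord", ["id", "seq"])


class CGIBackend:
    """Pretends to fetch raw text over the web; returns a file-like handle."""

    def __init__(self, pages):
        self._pages = pages

    def __getitem__(self, key):
        return io.StringIO(self._pages[key])


class SQLBackend:
    """Pretends to query a relational store; returns a parsed record."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, key):
        return ToyRecord(key, self._rows[key])


class ToyRegistry:
    """Dictionary-like lookup of backends by name."""

    def __init__(self):
        self._backends = {}

    def register(self, name, backend):
        self._backends[name] = backend

    def __getitem__(self, name):
        return self._backends[name]


db = ToyRegistry()
db.register("toy-cgi", CGIBackend({"P50105": "ID   MPT1_YEAST\nAC   P50105;\n"}))
db.register("toy-sql", SQLBackend({"P50105": "MANSPKKPSD"}))

handle = db["toy-cgi"]["P50105"]  # native type here: a file-like handle
record = db["toy-sql"]["P50105"]  # native type here: a parsed record
first_line = handle.readline()
```

The same two-level lookup yields a different kind of object depending
on the backend, which is why a caller has to know what each kind of
database returns.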

However, the DBRegistry objects now export a method:
  get_as(key, to_io)
that automatically converts the data to the type specified by to_io.

>>> from Bio import SeqRecord
>>> seq = Bio.db['swissprot'].get_as("P50105", SeqRecord.io)
>>> print seq.seq
Seq('MANSPKKPSDGTGVSASDTPKYQHTVPETKPAFNLSPGKASELSHSLPSPSQIKSTAHVS ...', SingleLetterAlphabet())
>>> 

This standardizes the data access across different kinds of databases.
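
One way to picture a get_as-style method is the toy sketch below, in
modern Python.  This message does not describe how the real
conversion is done, so everything here is an assumption made for
illustration: ToyRecord, ToyRecordIO, its convert() method, the two
backend classes and the two-line "id, then sequence" text format are
all hypothetical.

```python
import io


class ToyRecord:
    """Hypothetical parsed record; not Biopython's SeqRecord."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


class ToyRecordIO:
    """Hypothetical converter playing the role of the to_io argument."""

    @staticmethod
    def convert(obj):
        # Records pass through unchanged; anything else is treated as a
        # handle holding an id line followed by a sequence line.
        if isinstance(obj, ToyRecord):
            return obj
        rec_id = obj.readline().strip()
        seq = obj.readline().strip()
        return ToyRecord(rec_id, seq)


class Backend:
    """Shared get_as: fetch the native object, then let to_io convert it."""

    def get_as(self, key, to_io):
        return to_io.convert(self[key])


class HandleBackend(Backend):
    """Backend whose native return type is a file-like handle."""

    def __init__(self, pages):
        self._pages = pages

    def __getitem__(self, key):
        return io.StringIO(self._pages[key])


class RecordBackend(Backend):
    """Backend whose native return type is already a parsed record."""

    def __init__(self, records):
        self._records = records

    def __getitem__(self, key):
        return self._records[key]


backends = {
    "toy-cgi": HandleBackend({"P50105": "P50105\nMANSPKKPSD\n"}),
    "toy-sql": RecordBackend({"P50105": ToyRecord("P50105", "MANSPKKPSD")}),
}

# The caller gets a ToyRecord either way, whatever the native type was.
seqs = [backends[name].get_as("P50105", ToyRecordIO).seq
        for name in sorted(backends)]
```

Keeping the conversion in the to_io object, rather than in each
backend, means a new target type can be added without touching the
backends.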

Please update your repositories with:
  cvs update -P -d

Please play around with this and let me know if anything has broken.
One known issue is that some of the dbdefs need to be updated, due to
changes in external web servers.

Jeff


