[BioPython] Help getting GenBank data into Database

Brad Chapman chapmanb@arches.uga.edu
Tue, 28 May 2002 09:14:54 -0400


Hi Andreas;
Thanks for your interest in BioSQL!

> i'm trying to get the Data from a GenBank flatfile into the mySql 
> database. I've got biopython from cvs. I created all the tables
> whith the bioSQL schema and created an new biodatabase.
> 
> Then I try the following:
> ----
> from Bio import GenBank
> from BioSQL import BioSeqDatabase
> 
> database = BioSeqDatabase.open_database(db="test")
> f=open("/home/kuntzagk/gb_1.dat","r")
> parser = GenBank.RecordParser()
> it = GenBank.Iterator(f,parser)

Looks good.

> biodata = BioSeqDatabase.BioSeqDatabase(database.adaptor, "test")

Just a note, you can do things this way, but if you already have a
database you just need to do:

your_db = database["test"]

or if it's not created yet:

your_db = database.new_database("test")

Sorry, I know the "database name inside of a database name" thing is a
bit confusing.
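
The two cases above can be folded into one small helper. This is only a
sketch: get_or_create is a hypothetical name, not part of BioSQL, and it
assumes that indexing the server object with a name it doesn't know
raises KeyError, which is worth checking against the BioSQL version you
have installed.

```python
def get_or_create(server, name):
    """Return the sub-database called `name`, creating it if needed.

    Assumption: server[name] raises KeyError for an unknown name.
    """
    try:
        return server[name]
    except KeyError:
        # Not there yet, so create it.
        return server.new_database(name)
```

With that in place, your_db = get_or_create(database, "test") covers both
situations.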

But, on to your problem:

> biodata.load(it)
> ---
> 
> the last Command gives me an Error:
> 
> Traceback (most recent call last):
>   File "load_database.py", line 9, in ?
>     biodata.load(it)
>   File 
> "/home/kuntzagk/lib/python2.2/site-packages/BioSQL/BioSeqDatabase.py", 
> line 276, in load
>     db_loader.load_seqrecord(cur_record)
>   File "/home/kuntzagk/lib/python2.2/site-packages/BioSQL/Loader.py", 
> line 28, in load_seqrecord
>     bioentry_id = self._load_bioentry_table(record)
>   File "/home/kuntzagk/lib/python2.2/site-packages/BioSQL/Loader.py", 
> line 68, in _load_bioentry_table
>     accession, version = record.id.split('.')
> 
> What is wrong ?

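This is just a dumb assumption on my part: that all of the ids in the
GenBank file would look like:

AF1234.1

where the first part is the accession and the second is the version.
This isn't a good assumption since there are a ton of funky things in
GenBank, including ids with no '.' at all. Splitting one of those gives
a single piece, which can't be unpacked into an accession and a version.
I've fixed this in CVS so that an id without a '.' is kept whole as the
accession and gets a version of 0, and attached the small patch so you
can just apply it yourself if you want. Note that the attached diff runs
from revision 1.8 back to 1.7, so it needs to be applied in reverse
(patch -R).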
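
As a standalone illustration of the idea, here is a minimal sketch of the
split. It is not the code committed to Loader.py: split_accession_version
is a hypothetical name, it assumes a current Python 3, and it goes a
little further than the patch by splitting on the last '.' only and by
treating a non-numeric suffix as part of the accession.

```python
def split_accession_version(record_id):
    """Split an id such as 'AF1234.1' into (accession, version).

    Ids without a numeric version suffix are returned whole with a
    version of 0.
    """
    # rpartition splits on the last '.', so extra dots stay in the accession.
    accession, dot, version = record_id.rpartition(".")
    if dot and accession and version.isdecimal():
        return accession, int(version)
    return record_id, 0


for example_id in ("AF1234.1", "AF1234", "odd.id.here"):
    print(example_id, "->", split_accession_version(example_id))
```

Whether 0 is the right stand-in for a missing version depends on how the
bioentry table is queried later; it simply mirrors the default used in
the patch.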

Let me know if there are other problems -- I'm happy to help.

Brad
-- 
PGP public key available from http://pgp.mit.edu/

Attachment: Loader.diff

Index: Loader.py
===================================================================
RCS file: /home/repository/biopython/biopython/BioSQL/Loader.py,v
retrieving revision 1.8
retrieving revision 1.7
diff -c -r1.8 -r1.7
*** Loader.py	28 May 2002 13:05:42 -0000	1.8
--- Loader.py	1 Mar 2002 08:22:12 -0000	1.7
***************
*** 64,76 ****
      def _load_bioentry_table(self, record):
          """Fill the bioentry table with sequence information.
          """
!         # get the pertinent info and insert it
!         
!         if record.id.find('.') >= 0: # try to get a version from the id
!             accession, version = record.id.split('.')
!         else: # otherwise just use a null version
!             accession = record.id
!             version = 0
          try:
              division = record.annotations["data_file_divison"]
          except KeyError:
--- 64,71 ----
      def _load_bioentry_table(self, record):
          """Fill the bioentry table with sequence information.
          """
!         # get the pertinet info and insert it
!         accession, version = record.id.split('.')
          try:
              division = record.annotations["data_file_divison"]
          except KeyError:
