[Bioperl-l] Bio::DB::BioDB - insert failed. Duplicate entry '' for key 2?

Sun Mar 5 15:36:40 UTC 2006

Chris Fields wrote:
> Sorry if I'm a bit off (pub you know) but have you tried the bioperl-db
> script load_seqdatabase.pl (scripts dir)?

I poked around in the scripts directory, but I'm trying to learn the guts well enough to roll my own, since I have some point-and-click CGI interfacing in mind. (I'll be posting about the project to this list once we get our thoughts together.)

> Have you loaded taxonomy?

No, I'm not familiar with that. I'll read up on it.

Marc Logghe wrote:
> Yes, I agree with Chris. I also think you'd be better off with
> load_seqdatabase.pl.

I'm sure I would be for general loading, and I'm sure the scripts are far more robust than my little piecemeal stab at it, but I'm not sure I'll learn the guts if I just use scripts. Reading the code, there are many nuances I don't understand, so I'm trying to learn from the ground up, and I'm not sure what I'm doing wrong in my first baby steps. :)

>>>mysql> select * from biodatabase;
>>>+----------------+------+-----------+-------------+
>>>| biodatabase_id | name | authority | description |
>>>+----------------+------+-----------+-------------+
>>>|             23 |      | NULL      | NULL        |
>>>+----------------+------+-----------+-------------+
>
> BTW, here you actually did not delete your sequence but the namespace.
> If you want to check 'sequences' you should look into the bioentry
> table.

The data also disappeared out of the biosequence table. That indicates I deleted the sequence, right? (I didn't check bioentry at the time.) I have a question out to the BioSQL-l mailing list about the purpose of the biodatabase table. (I assume this mailing list isn't the right forum for that question.) I've been poking around in the BioSQL ERD, trying to understand the purpose of each of the tables. 
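
To make the biodatabase / bioentry / biosequence relationship concrete, here is a minimal sketch in Python with an in-memory SQLite database standing in for MySQL. The three tables are heavily simplified and are not the real BioSQL DDL; in particular, the ON DELETE CASCADE from biosequence to bioentry is an assumption made for illustration, and the 'DEMO0001' row is invented. Under that assumption, removing an entry from bioentry empties biosequence while the namespace row in biodatabase stays behind, which would match the output quoted above.

```python
import sqlite3

# Simplified stand-in tables; NOT the real BioSQL DDL.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE biodatabase (
        biodatabase_id INTEGER PRIMARY KEY,
        name           TEXT NOT NULL UNIQUE      -- the namespace
    );
    CREATE TABLE bioentry (
        bioentry_id    INTEGER PRIMARY KEY,
        biodatabase_id INTEGER NOT NULL
            REFERENCES biodatabase(biodatabase_id),
        accession      TEXT NOT NULL
    );
    CREATE TABLE biosequence (
        bioentry_id    INTEGER PRIMARY KEY
            REFERENCES bioentry(bioentry_id) ON DELETE CASCADE,  -- assumed
        seq            TEXT
    );
""")

# One namespace (with an empty name, as in the quoted output), one entry, one sequence.
conn.execute("INSERT INTO biodatabase (biodatabase_id, name) VALUES (23, '')")
conn.execute(
    "INSERT INTO bioentry (bioentry_id, biodatabase_id, accession) "
    "VALUES (1, 23, 'DEMO0001')"
)
conn.execute("INSERT INTO biosequence (bioentry_id, seq) VALUES (1, 'ACGT')")

# Deleting the entry removes its sequence via the cascade; the namespace is untouched.
conn.execute("DELETE FROM bioentry WHERE bioentry_id = 1")

def count_rows(table):
    """Return the number of rows in one of the three fixed demo tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

counts = {t: count_rows(t) for t in ("biodatabase", "bioentry", "biosequence")}
print(counts)
```

So an empty biosequence table alone does not say which delete happened; checking bioentry as well, as suggested above, is what distinguishes the cases.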

> Using load_seqdatabase.pl the namespace is set automatically to the
> default ('bioperl') but you can set it as well with the --namespace
> option.

Am I foolhardy to think that I can roll my own simplistic load via the code I posted? 

If I do get it working, should I write up a HOWTO? I can put a big "For robust file loading, please see load_seqdatabase.pl" warning at the top. But in our case, we're using Bio::SeqIO to walk through tens of thousands of flat file sequences to find the hundred or so we're interested in, and are trying to store only the ones we want into MySQL. (And we're trying to automate this process for rapid subsequent runs: Load my database with only those sequences that X.)
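
The filter-then-store workflow described above can be sketched roughly as follows. This is an illustration in Python, not BioPerl: the tiny FASTA reader stands in for Bio::SeqIO, SQLite stands in for MySQL, and the `entry` table, the `load_selected` helper, the `demo_ns` namespace, and the length-based filter are all hypothetical names made up for the example. It also refuses an empty namespace, since an empty name is what shows up in the biodatabase output earlier in this thread.

```python
import io
import sqlite3

def read_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text handle."""
    ident, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if ident is not None:
                yield ident, "".join(chunks)
            parts = line[1:].split()
            ident, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if ident is not None:
        yield ident, "".join(chunks)

def load_selected(handle, conn, namespace, wanted):
    """Store only the records for which wanted(ident, seq) is true.

    Returns the number of rows actually inserted.
    """
    if not namespace:
        raise ValueError("namespace must be a non-empty string")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry ("
        " namespace TEXT NOT NULL,"
        " accession TEXT NOT NULL,"
        " seq TEXT NOT NULL,"
        " UNIQUE (namespace, accession))"
    )
    stored = 0
    for ident, seq in read_fasta(handle):
        if wanted(ident, seq):
            # OR IGNORE makes re-runs skip rows that are already present.
            cur = conn.execute(
                "INSERT OR IGNORE INTO entry VALUES (?, ?, ?)",
                (namespace, ident, seq),
            )
            stored += cur.rowcount
    conn.commit()
    return stored

# Invented demo data: keep only sequences of at least 8 residues.
demo = io.StringIO(
    ">seqA first\nACGT\nACGT\n>seqB second\nTTTT\n>seqC third\nGGGGCCCC\n"
)
conn = sqlite3.connect(":memory:")
n_stored = load_selected(demo, conn, "demo_ns", lambda ident, seq: len(seq) >= 8)
accessions = [r[0] for r in conn.execute("SELECT accession FROM entry ORDER BY accession")]
print(n_stored, accessions)
```

Passing the filter in as a function is what makes "load only those sequences that X" a one-line change between runs.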

Thanks for the quick help!

j
Omaha Perl Mongers
http://omaha.pm.org