[Bioperl-l] Indexing large databases / BioSQL

Chris Fields cjfields at uiuc.edu
Mon Apr 28 13:20:39 UTC 2008


On Apr 28, 2008, at 7:18 AM, Bánk Beszteri wrote:

> Dear BioSQL / bioperl-db-ists,
>
> I would like to share my experiences with trying to load
> uniprot_trembl into a BioSQL db, and also to ask a couple of
> questions; perhaps some of you have seen the problems I encountered. I
> used bioperl-live and bioperl-db-live as of 2008-04-03 and
> uniprot_trembl.dat as of 2008-04-04. The command was roughly:
>
> load_seqdatabase.pl --safe --logchunk 1000 --host dbserv --dbname abc \
>     --dbuser efg --dbpass xyz --driver mysql --namespace uniprot_trembl \
>     --format embl uniprot_trembl.dat
>
> ....
>
> First of all, the error below seems to cause a crash, despite
> --safe:
>
> >>>
> ------------- EXCEPTION -------------
> MSG: A1XDT7 seems to have an invalid species classification.
> STACK Bio::SeqIO::embl::_read_EMBL_Species /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:1087
> STACK Bio::SeqIO::embl::next_seq /home/biocl/bbeszter/lib/bioperl-live/bioperl-live/Bio/SeqIO/embl.pm:320
> STACK toplevel /home/biocl/bbeszter/lib/bioperl-live/bioperl-db/scripts/biosql/load_seqdatabase.pl:634
> -------------------------------------
>
> Command exited with non-zero status 255
> <<<
>
> The entry involves NCBI Tax_ID:435 (Acetobacter aceti; it has
> some 30 synonyms in my DB, too), which looks to me like a
> completely normal taxon: I could follow its taxonomy up to the root
> in the NCBI taxonomy in the BioSQL DB I used. I don't know whether
> anyone else has seen or can reproduce the problem, or whether I should
> suspect a problem with my taxonomy DB. Also, is it the expected
> behaviour for load_seqdatabase.pl to die on this error?

...

You should use the 'swiss' format instead of 'embl' when loading
UniProt/SwissProt sequences. Although the two formats look similar on
the surface, the feature table (among other things) is completely
different. I'm not sure that's causing all of the issues here, but it
certainly could contribute to them.
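
As a sketch, the invocation from the original message would then look like the following. The host, database name, and credentials are the placeholder values from that message, and the only change is --format; whether this also clears the species classification exception is not confirmed here.

```shell
# Same options as the command quoted above; only --format changes (embl -> swiss).
# dbserv, abc, efg and xyz are the placeholder values from the original post.
load_seqdatabase.pl --safe --logchunk 1000 \
    --host dbserv --dbname abc --dbuser efg --dbpass xyz \
    --driver mysql --namespace uniprot_trembl \
    --format swiss uniprot_trembl.dat
```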

In the meantime, it's much easier for us to track these problems if
you file a bug (under BioPerl, filed for bioperl-db):

http://bugzilla.open-bio.org/

chris


