[BioSQL-l] loading fasta records with load_seqdatabase.pl - correct fasta headers

Amit Indap indapa at gmail.com
Mon Aug 22 10:57:23 EDT 2005


Hi,

I am new to BioSQL. I am trying to load FASTA-formatted
RefSeq records into the BioSQL schema. When I run the
load_seqdatabase.pl script, I get the following error:

load_seqdatabase.pl --host 127.0.0.1 --port 2022 --dbname testbiosql
--namespace refseq --format fasta refseq.fa

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::SeqAdaptor (driver) failed, values
were ("gi|51459331|ref|XM_498785.1|","gi|51459331|ref|XM_498785.1|","unknown","PREDICTED:
Homo sapiens LOC440641 (LOC440641), mRNA","0","") FKs (1,<NULL>)
Duplicate entry 'unknown-1-0' for key 2
---------------------------------------------------
Could not store unknown:
------------- EXCEPTION  -------------
MSG: You're trying to lie about the length: is 1316 but you say 6474
STACK Bio::PrimarySeq::length
/usr/lib/perl5/site_perl/5.8.5/Bio/PrimarySeq.pm:418
STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:553
STACK Bio::Seq::length /usr/lib/perl5/site_perl/5.8.5/Bio/Seq.pm:612
STACK Bio::DB::Persistent::PersistentObject::AUTOLOAD
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:553
STACK Bio::DB::BioSQL::BiosequenceAdaptor::populate_from_row
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BiosequenceAdaptor.pm:236
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1310
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:976
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK Bio::DB::BioSQL::PrimarySeqAdaptor::attach_children
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/PrimarySeqAdaptor.pm:284
STACK Bio::DB::BioSQL::SeqAdaptor::attach_children
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/SeqAdaptor.pm:279
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_build_object
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1341
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:976
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::create
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:205
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::store
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:254
STACK Bio::DB::Persistent::PersistentObject::store
/usr/lib/perl5/site_perl/5.8.5/Bio/DB/Persistent/PersistentObject.pm:272
STACK (eval) ./load_seqdatabase.pl:542
STACK toplevel ./load_seqdatabase.pl:525

--------------------------------------
 at ./load_seqdatabase.pl line 555

I suspect my FASTA headers are the problem, since the script reports
that it could not store "unknown". The first FASTA record in my
refseq.fa is this:

>gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, immunoglobulin
domain (Ig), short basic domain, secreted, (semaphorin) 3E (SEMA3E),
mRNA

Do I need to reformat that header? I downloaded the NM series of
RefSeq records in FASTA format from NCBI's FTP site and want to load
them into the BioSQL schema.
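
If reformatting does turn out to be necessary, the fields in such a header
could be separated along these lines. This is only a minimal sketch in
Python, assuming the pipe-delimited layout seen in the record above
(gi|<number>|ref|<accession>.<version>| <description>). The function name
parse_refseq_header and the returned keys are hypothetical and not part of
BioSQL or load_seqdatabase.pl; whether the loader actually needs the
accession split out this way is the open question.

```python
import re

# Assumed header layout, taken from the example record in this message:
#   >gi|<gi number>|<db tag>|<accession>.<version>| <description>
HEADER_RE = re.compile(
    r"^>?gi\|(\d+)\|(\w+)\|([A-Z]{2}_\d+)\.(\d+)\|\s*(.*)$"
)


def parse_refseq_header(header):
    """Split one FASTA header line into its fields.

    Hypothetical helper for illustration; raises ValueError when the
    line does not follow the assumed layout.
    """
    match = HEADER_RE.match(header.strip())
    if match is None:
        raise ValueError("unrecognised header: %r" % header)
    gi, db, accession, version, description = match.groups()
    return {
        "gi": gi,
        "db": db,
        "accession": accession,
        "version": int(version),
        "description": description,
    }


if __name__ == "__main__":
    fields = parse_refseq_header(
        ">gi|6912649|ref|NM_012431.1| Homo sapiens sema domain, "
        "immunoglobulin domain (Ig), short basic domain, secreted, "
        "(semaphorin) 3E (SEMA3E), mRNA"
    )
    print(fields["accession"], fields["version"])
```

The header is expected on a single line, as it is in the FASTA file itself;
it only appears wrapped in this message.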

Thanks,

Amit Indap
Dept. of Biological Statistics and Computational Biology
Cornell University


(error message)
Loading refseq.fa ...


