[Bioperl-l] load_seqdatabase error with a specific locus from genbank
Hilmar Lapp
hlapp at gmx.net
Thu Apr 9 03:35:12 UTC 2009
On Apr 8, 2009, at 11:29 AM, Johann PELLET wrote:
> [...]
> and finally EU608407 and EU608559 made a crash:
>
> [...]
> --------------------- WARNING ---------------------
> MSG: Unexpected error in feature table for Skipping feature,
> attempting to recover
> ---------------------------------------------------
> #######...14 times ...############
I assume you figured out that this warning was triggered by, or
affected, EU608407? Would you mind sharing how?
> --------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
> values were ("Bonhoeffer,S., Chappey,C., Parkin,N.T.,
> Whitcomb,LOCUS EU608407
> 1212 bp DNA linear VRL 20-APR-2008","","","CRC-
> D35248959C54B9F2","1","1212","") FKs (<NULL>)
> ERROR: null value in column "location" violates not-null constraint
Is this really the verbatim copy of the error message you saw on the
screen? What's really puzzling about this is how the genbank SeqIO
parser could mess up parsing the reference entry so badly. Here's the
reference from the version online at NCBI:
REFERENCE 1 (bases 1 to 1212)
AUTHORS Bonhoeffer,S., Chappey,C., Parkin,N.T., Whitcomb,J.M. and
Petropoulos,C.J.
TITLE Evidence for positive epistasis in HIV-1
JOURNAL Science 306 (5701), 1547-1550 (2004)
PUBMED 15567861
How the first author line would be chopped off at the end and the
LOCUS line would have gotten inserted there is a mystery to me.
The location is "Science 306 (5701), 1547-1550 (2004)", and according
to the error message the parser failed to extract that and the TITLE.
Could you confirm that the file you are parsing is not corrupted in
any way, specifically for this record?
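
One way to check for this kind of corruption is to look for a LOCUS
keyword that shows up anywhere other than the start of a line, which is
what the quoted error message suggests happened. The following is only a
minimal sketch in Python; the function name is made up for illustration,
and it assumes the whole flat file fits in memory as a string:

```python
import re

# In a well-formed GenBank flat file, LOCUS only appears at column 0.
_LOCUS = re.compile(r"LOCUS\s+\S+")


def find_stray_locus_lines(text):
    """Return (line_number, stripped_line) pairs where LOCUS occurs mid-line.

    Hypothetical helper: it does not validate the record in any other way.
    """
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in _LOCUS.finditer(line):
            if match.start() > 0:
                hits.append((lineno, line.strip()))
                break  # one report per line is enough
    return hits
```

An empty result does not prove the file is intact; it only rules out
this one symptom.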
> ---------------------------------------------------
> Could not store EU608559:
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
>
> If I check in the biosql database if some part of this records are
> inserted:
So are there other sequences associated with that PubMed ID? Can you
do a grep on the PubMed ID and see whether it occurs already before
the one that trips up the load?
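
If a plain grep does not make the record boundaries obvious, the same
check can be sketched in Python so that it reports which records cite
the PubMed ID, in file order. The function name is hypothetical, and the
sketch assumes each record starts with a LOCUS line and ends with a "//"
line:

```python
import re

_PUBMED = re.compile(r"\s*PUBMED\s+(\d+)\s*$")


def records_citing_pubmed(text, pubmed_id):
    """Return the LOCUS names of records that contain a matching PUBMED line.

    Names are returned in the order the records appear in the file.
    """
    wanted = str(pubmed_id)
    hits = []
    current = None  # LOCUS name of the record being scanned
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            current = parts[1] if len(parts) > 1 else None
        elif line.startswith("//"):
            current = None
        else:
            match = _PUBMED.match(line)
            if match and match.group(1) == wanted and current is not None:
                if current not in hits:
                    hits.append(current)
    return hits
```

If the record that trips up the load is not the first name in the
returned list, then the reference was already stored by an earlier
record.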
> [...]
> select * from dbxref where dbxref_id=4179;
> dbxref_id | dbname | accession | version
> -----------+--------+-----------+---------
> 4179 | PUBMED | 15567861 | 0
>
> select * from bioentry where accession=15567861;
Note that 15567861 is the accession (PubMed ID) for the referenced
article, not the sequence, so it will not match anything in the
accession column of bioentry. Which bioentries are associated with a
reference is recorded in the bioentry_reference table.
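
As a rough sketch of that lookup, the following Python function goes
from the PubMed dbxref through reference and bioentry_reference to the
bioentries. It assumes the standard BioSQL column names
(reference.dbxref_id, bioentry_reference.reference_id and
bioentry_reference.bioentry_id), so please check them against your
installed schema. The function name and the placeholder argument are
made up for illustration; the parameter marker depends on the database
driver (for example "?" for sqlite3, "%s" for psycopg2):

```python
def bioentries_for_pubmed(conn, pubmed_id, placeholder="?"):
    """Return (accession, name) rows for bioentries citing a PubMed ID.

    conn is any DB-API connection to a BioSQL database; placeholder is
    the driver's parameter marker.
    """
    sql = f"""
        SELECT be.accession, be.name
        FROM dbxref dx
        JOIN reference r ON r.dbxref_id = dx.dbxref_id
        JOIN bioentry_reference ber ON ber.reference_id = r.reference_id
        JOIN bioentry be ON be.bioentry_id = ber.bioentry_id
        WHERE dx.dbname = 'PUBMED' AND dx.accession = {placeholder}
        ORDER BY be.accession
    """
    cur = conn.cursor()
    # dbxref.accession is a text column, so pass the ID as a string.
    cur.execute(sql, (str(pubmed_id),))
    return cur.fetchall()
```

An empty result would mean that the reference row exists but no
bioentry has been linked to it yet.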
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================