[BioSQL-l] error loading uniprot release 49.6 into mysql
Hilmar Lapp
hlapp at gmx.net
Mon May 15 12:59:06 EDT 2006
You found the right instance. Unfortunately, the way the bioperl
swissprot parser works, the group (RG) isn't promoted to author when
there is no author (RA) line as well (you may in fact debate whether
that would even be the best way of doing things), so the reference
isn't found by unique key on its second occurrence.
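To make the collision concrete, here is a minimal Python sketch. It
assumes the reference key is a checksum over the authors, title, and
location fields; reference_key is a hypothetical stand-in that uses
zlib.crc32, not the CRC routine Bioperl-db actually uses. Because the RG
line never reaches the author slot, both references reduce to the same
fields and therefore the same key.

```python
import zlib


def reference_key(authors, title, location):
    """Hypothetical stand-in for the reference key; the real key uses a different CRC."""
    # Missing fields contribute an empty string, mirroring the empty values
    # shown in the failed insert.
    text = "|".join(field or "" for field in (authors, title, location))
    return "CRC-%08X" % zlib.crc32(text.encode("utf-8"))


location = "Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases."

# 1433T_PONPY and 1433G_PONPY: RG is present, but authors and title are empty.
key_first = reference_key(None, None, location)
key_second = reference_key(None, None, location)

print(key_first == key_second)  # True: the second insert hits the unique key
```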
If you can live without this entry, or any other entry that causes a
hiccup, just supply the flag --safe and it will gracefully move on to
the next entry.
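The effect of --safe can be pictured with a small Python sketch. It
illustrates the skip-on-error behaviour only and is not the loader's
actual code; load_all, make_store, and the safe parameter are
hypothetical names, and the duplicate check is a toy replacement for the
database's unique constraint.

```python
def load_all(entries, store, safe=False):
    """Store each entry; with safe=True, record a failure and move on."""
    loaded, skipped = [], []
    for entry_id, entry in entries:
        try:
            store(entry)
            loaded.append(entry_id)
        except Exception as err:
            if not safe:
                raise  # without the flag, the first failure aborts the load
            skipped.append((entry_id, str(err)))
    return loaded, skipped


def make_store():
    """Toy store that rejects a reference key it has already seen."""
    seen_keys = set()

    def store(entry):
        if entry["ref_key"] in seen_keys:
            raise ValueError("Duplicate entry '%s'" % entry["ref_key"])
        seen_keys.add(entry["ref_key"])

    return store


entries = [
    ("1433T_PONPY", {"ref_key": "CRC-E7973FEA4B5611DC"}),
    ("1433G_PONPY", {"ref_key": "CRC-E7973FEA4B5611DC"}),
]
loaded, skipped = load_all(entries, make_store(), safe=True)
print(loaded)   # ['1433T_PONPY']
print(skipped)  # [('1433G_PONPY', "Duplicate entry 'CRC-E7973FEA4B5611DC'")]
```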
Fixing the issue would require either fixing the bioperl swissprot
parser (or Bio::Annotation::Reference) to stick the RG group into the
author slot if there is no author, or extending Bioperl's
Bio::Annotation::Reference to also feature a group and changing biosql
to use it in place of a missing author.
Actually there is $reference->rg. Maybe Bioperl-db (and hence Biosql)
should just use that in place of a missing author?
The downside is that upon round-tripping an entry, the RG annotation
line will become an RA annotation line. How bad would that be?
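The round-trip effect can be sketched in Python. This is a toy model
under stated assumptions, not Bioperl or Bioperl-db code: the Reference
class, to_stored, from_stored, and format_lines are hypothetical, and
the stored row is assumed to have an author column but no group column.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    """Toy stand-in for a literature reference; not the Bioperl class."""
    authors: Optional[str] = None   # RA line
    group: Optional[str] = None     # RG line
    location: Optional[str] = None  # RL line


def to_stored(ref):
    """Assumed fallback: use the group as author when no author is present."""
    return {"authors": ref.authors or ref.group, "location": ref.location}


def from_stored(row):
    """The row keeps no group in this sketch, so the value returns as an author."""
    return Reference(authors=row["authors"], location=row["location"])


def format_lines(ref):
    """Emit swissprot-style reference lines for the fields that are set."""
    lines = []
    if ref.group:
        lines.append("RG   %s;" % ref.group)
    if ref.authors:
        lines.append("RA   %s;" % ref.authors)
    if ref.location:
        lines.append("RL   %s" % ref.location)
    return lines


original = Reference(
    group="The German cDNA consortium",
    location="Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.",
)
round_tripped = from_stored(to_stored(original))

print(format_lines(original)[0])       # RG   The German cDNA consortium;
print(format_lines(round_tripped)[0])  # RA   The German cDNA consortium;
```

The content of the line survives; only the line code changes from RG to
RA, which is the trade-off in question.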
Any thoughts from anyone?
-hilmar
On May 15, 2006, at 8:34 AM, s.rayner at att.net wrote:
> I found where the script is hiccuping....
>
> The Uniprot release contains lines with identical annotation for
> the RL keyword for two different sequences.
>
> ___________________
>
> First occurrence...
> ___________________
>
> ID 1433T_PONPY STANDARD; PRT; 245 AA.
> AC Q5RFJ2; Q5RDK2;
> DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT 05-JUL-2005, sequence version 2.
> DT 18-APR-2006, entry version 13.
> DE 14-3-3 protein theta.
> GN Name=YWHAQ;
> OS Pongo pygmaeus (Orangutan).
> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC Catarrhini; Hominidae; Pongo.
> OX NCBI_TaxID=9600;
> RN [1]
> RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC TISSUE=Brain cortex, and Kidney;
> RG The German cDNA consortium;
> RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.
> <====== Not Unique
>
>
> ___________________
>
> Second occurrence...
> ___________________
>
>
> ID 1433G_PONPY STANDARD; PRT; 246 AA.
> AC Q5RC20;
> DT 05-JUL-2005, integrated into UniProtKB/Swiss-Prot.
> DT 05-JUL-2005, sequence version 2.
> DT 18-APR-2006, entry version 13.
> DE 14-3-3 protein gamma.
> GN Name=YWHAG;
> OS Pongo pygmaeus (Orangutan).
> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
> OC Catarrhini; Hominidae; Pongo.
> OX NCBI_TaxID=9600;
> RN [1]
> RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA].
> RC TISSUE=Heart;
> RG The German cDNA consortium;
> RL Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases.
> <====== Not Unique
>
>
>
> in these two cases the generated CRC key is identical and so MySQL
> throws a wobbly.
>
> if i look at the MySQL entry in the REFERENCE table for the first
> sequence
> ------+-------+---------+----------------------+
> | 139 | NULL | Submitted (NOV-2004) to the EMBL/GenBank/DDBJ databases. | NULL | NULL | CRC-E7973FEA4B5611DC |
> +--------------+-----------
> +----------------------------------------------------
>
> and the error when the script choked was
>
> MSG: insert in Bio::DB::BioSQL::ReferenceAdaptor (driver) failed,
> values were
> ("","","Submitted (NOV-2004) to the EMBL/GenBank/DDBJ
> databases.","CRC-E7973FEA4B5611DC","","","") FKs (<NULL)
> Duplicate entry 'CRC-E7973FEA4B5611DC' for key 3
>
> hence the problem.
>
> I'm guessing I'm not the first person to encounter this, but don't
> see any hints for an easy way around this.
>
> any suggestions....?
>
> ta
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================