[BioSQL-l] Special cases of protein data

mark.schreiber at novartis.com mark.schreiber at novartis.com
Wed Aug 24 05:02:59 EDT 2005


NCBI has a taxid for "Artificial organism"; records assigned to it are
often hybrid sequences built from two different species. In such cases
you can expect multiple taxids per record.
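Finding those taxids comes down to reading the taxon db_xref qualifier of every source feature. Below is a minimal Python sketch. It assumes the standard GenBank/GenPept flat-file layout, with feature keys starting in column 6 and qualifiers in column 22, and it ignores qualifier values that wrap onto several lines. The helper name `source_taxon_ids` and the sample record are made up for illustration. For real loading, a full flat-file parser is the safer choice.

```python
import re

# A feature key line: exactly five spaces, the key, then the location.
FEATURE_LINE = re.compile(r"^ {5}(\S+)\s+(\S.*)$")
# A taxon cross-reference qualifier, e.g. /db_xref="taxon:562"
TAXON_XREF = re.compile(r'/db_xref="taxon:(\d+)"')


def source_taxon_ids(feature_table: str) -> list[int]:
    """Return the taxon IDs of all "source" features, in order of appearance.

    Hypothetical helper; qualifier lines (21 leading spaces) never match
    FEATURE_LINE, so they are attributed to the most recent feature key.
    """
    taxids = []
    in_source = False
    for line in feature_table.splitlines():
        key_match = FEATURE_LINE.match(line)
        if key_match:
            # A new feature starts; remember whether it is a source feature.
            in_source = key_match.group(1) == "source"
            continue
        if in_source:
            xref = TAXON_XREF.search(line)
            if xref:
                taxids.append(int(xref.group(1)))
    return taxids


# Made-up fragment of a feature table with two source features.
SAMPLE = """\
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="Escherichia coli"
                     /db_xref="taxon:562"
     source          101..200
                     /organism="Homo sapiens"
                     /db_xref="taxon:9606"
     Protein         1..200
                     /product="example protein"
"""

print(source_taxon_ids(SAMPLE))  # -> [562, 9606]
```

A record with a single source feature yields a one-element list, so the length of the result tells you directly whether the record will fit a one-taxon-per-entry model.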

- Mark

"Marc Logghe" <Marc.Logghe at devgen.com>
Sent by: biosql-l-bounces at portal.open-bio.org
08/24/2005 04:51 PM

 
        To:     "Andreas Dräger" <duze at gmx.de>, <biosql-l at open-bio.org>
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        RE: [BioSQL-l] Special cases of protein data


> I am currently working with BioSQL using MySQL. I tried to 
> insert a lot of protein data downloaded from the NCBI web 
> page in GenPept format.
> During the insertion process (performed by BioJava) I got 
> some error messages. Looking at the sequences in detail 
> showed that more than 1000 protein sequences had at least 
> two "source" entries in their "FEATURE" table. One of these 
> bad examples is given at NCBI under the accession number 
> P76519. This one even has four "source" tags. In my opinion 
> this means that each of the four given species contains 
> exactly this protein. That would mean that at least these 
> one thousand proteins I found at NCBI belong to more than 
> one species. This case cannot be handled by the current 
> BioSQL schema because there is a one-to-many relationship 
> between the tables

Gosh, I was not aware of that.
Indeed, if you look at http://www.ncbi.nlm.nih.gov/collab/FT/ it says for the source key:
"identifies the biological source of the specified span of
the sequence; this key is mandatory; more than one source
key per sequence is allowed; every entry/record will have,
as a minimum, either a single source key spanning the 
entire sequence or multiple source keys which together 
span the entire sequence."
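The rule quoted above (either one source key spanning the whole sequence, or several that together span it) can be checked with a simple interval sweep. This is a minimal sketch that assumes each source location is a plain 1-based, inclusive start..end range. Joins, complements and fuzzy locations are out of scope, and the function name is hypothetical.

```python
def sources_span_sequence(length: int, spans: list[tuple[int, int]]) -> bool:
    """Check that source spans (1-based, inclusive) jointly cover 1..length.

    Simplified sketch: each span is assumed to be a plain start..end range.
    """
    covered_to = 0  # highest position covered so far, scanning left to right
    for start, end in sorted(spans):
        if start > covered_to + 1:
            return False  # a gap exists before this span
        covered_to = max(covered_to, end)
    return covered_to >= length


print(sources_span_sequence(200, [(1, 100), (101, 200)]))  # -> True
print(sources_span_sequence(200, [(1, 100), (102, 200)]))  # -> False
```

Sorting by start and tracking the furthest covered position is enough, because any gap shows up as a span that starts more than one position past that point. Overlapping source spans are accepted, since the quoted rule only requires that they cover the sequence together.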

Marc

_______________________________________________
BioSQL-l mailing list
BioSQL-l at open-bio.org
http://open-bio.org/mailman/listinfo/biosql-l
