[Biopython-dev] [Bug 2681] BioSQL: record annotations enhancements

bugzilla-daemon at portal.open-bio.org
Mon Nov 24 23:05:27 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2681





------- Comment #6 from cymon.cox at gmail.com  2008-11-24 18:05 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > (In reply to comment #0)
> > > Some swiss prot SeqRecords have ncbi_taxid and they are retrieved
> > > correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing
> > > from the retrieved DBSeqRecord: sp012, sp014, 
> > 
> > Note some swiss prot records may be multi-species, which the BioSQL schema
> > can't cope with.  Not sure if that applies here.
> 
> Yep, that's exactly what was causing the problem. Currently the code refuses to
> load an ncbi_taxid for such records, which I think is correct; after all, which
> one should be loaded? Anyway, I'll look into this a bit more...

So, how best to handle records with multiple taxa?

SwissProt/sp014 has 10 organisms which are currently loaded directly into the
taxon_name table:

biosql_test=# select name, name_class from taxon_name where taxon_id = 94;
 name | name_class
------+------------
 Oryza sativa (Rice), Nicotiana tabacum (Common tobacco) Hordeum vulgare (Barley), Triticum aestivum (Wheat) Secale cereale (Rye), Zea mays (Maize), Pisum sativum (Garden pea) Spinacia oleracea (Spinach), Capsicum annuum (Bell pepper) Mesembryanthemum crys | scientific name
(1 row)

That's clearly not a scientific name...

The record has the ncbi_taxon_ids:
OX   NCBI_TaxID=4530, 4097, 4513, 4565, 4550, 4577, 3888, 3562, 4072, 3544,
OX   3555, 3696;

Which are currently not stored because there is more than one:

Loader.py:
        ncbi_taxon_id = None
        if "ncbi_taxid" in record.annotations :
            #Could be a list of IDs.
            if isinstance(record.annotations["ncbi_taxid"],list) :
                if len(record.annotations["ncbi_taxid"])==1 :
                    ncbi_taxon_id = record.annotations["ncbi_taxid"][0]
            else :
                ncbi_taxon_id = record.annotations["ncbi_taxid"]
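
To make the effect concrete, here is a rough standalone sketch of the same
selection logic. The function name pick_single_taxid is made up for
illustration and is not part of Loader.py, and plain dictionaries stand in for
record.annotations:

```python
def pick_single_taxid(annotations):
    """Mirror the selection logic quoted above (hypothetical helper name)."""
    ncbi_taxon_id = None
    if "ncbi_taxid" in annotations:
        value = annotations["ncbi_taxid"]
        if isinstance(value, list):
            # Only an unambiguous single-element list is accepted.
            if len(value) == 1:
                ncbi_taxon_id = value[0]
        else:
            # A bare (non-list) value is used as-is.
            ncbi_taxon_id = value
    return ncbi_taxon_id


# Plain dicts standing in for record.annotations.
single = pick_single_taxid({"ncbi_taxid": ["4530"]})
multi = pick_single_taxid({"ncbi_taxid": ["4530", "4097", "4513"]})
missing = pick_single_taxid({})
print(single, multi, missing)
```

With more than one ID in the list nothing is selected, so a record like sp014
ends up with no ncbi_taxon_id at all.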

BioSQL is clearly not designed to store records from multiple taxa: one
bioentry has one taxon_id. Should Biopython refuse to load such records if
the scientific name is not a binomial? What does BioPerl do?
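
If we did want to test for a binomial, one very crude heuristic might look
like the sketch below. The function name looks_like_binomial is hypothetical,
and the pattern is an assumption on my part: it would also reject legitimate
names with more than two words (subspecies, strains and so on), so it is only
meant to show the idea, not a rule to adopt as-is:

```python
import re


def looks_like_binomial(name):
    """Crude check for 'Genus species', ignoring parenthesised common names.

    Hypothetical helper for illustration only.
    """
    # Drop parenthesised common names such as "(Rice)".
    core = re.sub(r"\s*\([^)]*\)", "", name).strip()
    # Genus: one capitalised word; species: one lowercase word.
    return re.fullmatch(r"[A-Z][a-z]+ [a-z-]+", core) is not None


one_species = looks_like_binomial("Oryza sativa (Rice)")
many_species = looks_like_binomial(
    "Oryza sativa (Rice), Nicotiana tabacum (Common tobacco) "
    "Hordeum vulgare (Barley)"
)
print(one_species, many_species)
```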

C.




