[Biopython-dev] [Bug 2681] New: BioSQL: record annotations enhancements

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Nov 21 19:31:26 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2681

           Summary: BioSQL: record annotations enhancements
           Product: Biopython
           Version: Not Applicable
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: BioSQL
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: cymon.cox at gmail.com


BioSQL storage and retrieval of record annotations. See also bug 2396.


Patch fixes 3 annotations:

1) Fixed date/dates typo.
2) Comments were being stored but not retrieved - fixed, with a test.
3) A 'reference' annotation, even if an empty list, was being retrieved in a
DBSeqRecord. Fixed so that if there are no references there is no annotation in
DBSeqRecord.
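
The behaviour described in item 3 could be sketched roughly as below. This is
only an illustration with plain dictionaries, not the code from the patch: the
function name build_annotations and its argument are hypothetical, and the key
name 'reference' is simply taken from the description above.

```python
def build_annotations(references):
    """Return an annotations dict, adding 'reference' only when non-empty.

    Hypothetical helper: `references` stands in for whatever list the
    database lookup produced for one record.
    """
    annotations = {}
    if references:  # an empty list adds no key at all
        annotations["reference"] = list(references)
    return annotations
```

With this shape, a record without references ends up with no 'reference' key
rather than a key holding an empty list.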

Other annotations:

'date', 'ncbi_taxid', 'gi', and 'contig' are the only annotations we are not
handling correctly in the test suite.

'date' can be ignored if present in DBSeqRecord but absent in SeqRecord because
the current date is entered into the table if a date is not present in the
record.

Annotation 'ncbi_taxid' will be present in the DBSeqRecords even when not
present in the loaded SeqRecord, as it is grabbed from the taxon table. We can
therefore ignore this specific comparison: old record absent, new record
present. Some SwissProt SeqRecords have ncbi_taxid and they are retrieved
correctly by DBSeqRecord. TODO: others have ncbi_taxid that is missing from the
retrieved DBSeqRecord: sp012, sp014, 
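
The comparison rule for 'date' and 'ncbi_taxid' could look something like the
sketch below. It is an assumption about how a test might be written, not the
test suite's actual code: the names DB_SUPPLIED_KEYS and annotation_mismatches
are hypothetical, and the annotations are modelled as plain dictionaries.

```python
# Keys the database may supply on its own, per the notes above: 'date'
# defaults to the current date, 'ncbi_taxid' comes from the taxon table.
DB_SUPPLIED_KEYS = {"date", "ncbi_taxid"}


def annotation_mismatches(old, new):
    """Return the sorted keys whose annotations differ between old and new.

    `old` is the annotations dict of the record that was loaded, `new` the
    one retrieved from the database. A key in DB_SUPPLIED_KEYS that is
    absent from `old` but present in `new` is not reported.
    """
    mismatches = []
    for key in set(old) | set(new):
        if key not in old and key in DB_SUPPLIED_KEYS:
            continue  # old record absent, new record present: tolerated
        if key not in old or key not in new or old[key] != new[key]:
            mismatches.append(key)
    return sorted(mismatches)
```

The opposite direction (present in the loaded record, missing after retrieval)
is still reported, which is the case the TODO above refers to.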

SwissProt, FASTA, and EMBL SeqRecords don't have a gi annotation, but the
retrieved DBSeqRecords do. If the gi annotation is missing, the loader uses
the 'record_id' (line 522) as the identifier in bioentry, and that identifier
is then pulled back as the gi annotation. So the SwissProt, FASTA, and EMBL
DBSeqRecords return the accession as the gi (GenBank identifier). I think this
is misleading; annotation 'gi' in the DBSeqRecord should really be given a
more generic name such as 'identifier'...  What to do here?
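
To make the round trip concrete, here is a rough sketch of the fallback as
described above. Both function names are hypothetical and the database is left
out entirely; it only shows how a missing gi ends up replaced by the record id
on retrieval.

```python
def identifier_for_bioentry(annotations, record_id):
    """Pick the value stored as the bioentry identifier.

    Uses the 'gi' annotation when present, otherwise falls back to the
    record id (assumed here to be the accession).
    """
    return annotations.get("gi", record_id)


def annotations_from_bioentry(identifier):
    """On retrieval, the stored identifier is reported under the key 'gi'."""
    return {"gi": identifier}


# A record loaded without a gi comes back with its accession as the gi.
stored = identifier_for_bioentry({}, "P12345")
retrieved = annotations_from_bioentry(stored)
```

Renaming the retrieved key to something like 'identifier' would only change
the second function in this sketch.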

'contig' is ignored by the loader because it's a SeqFeature object. Is there
any reason it couldn't be loaded and retrieved? (The record is
GenBank/NT_019265.gb.)

