[Biopython-dev] A modification to BioSQL

Mon Jun 22 20:44:49 UTC 2015

Hi Brian,

Are you familiar with the logic BioPerl uses to set this field?
See also https://github.com/biopython/biopython/pull/366

Peter

On Mon, Jun 22, 2015 at 9:11 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
> All,
>
> I’ve been using the BioSQL schema with BioPerl and would like to start doing
> the same with Biopython, but there’s a limitation I’d like to fix. Here’s
> the relevant table in the BioSQL schema, seqfeature:
>
>      Column     |         Type          |                        Modifiers                        | Storage  | Stats target | Description
> ----------------+-----------------------+---------------------------------------------------------+----------+--------------+-------------
>  seqfeature_id  | integer               | not null default nextval('seqfeature_pk_seq'::regclass) | plain    |              |
>  bioentry_id    | integer               | not null                                                | plain    |              |
>  type_term_id   | integer               | not null                                                | plain    |              |
>  source_term_id | integer               | not null                                                | plain    |              |
>  display_name   | character varying(64) |                                                         | extended |              |
>  rank           | integer               | not null default 0                                      | plain    |              |
>
> Note the required field, source_term_id. In the work I’ve been doing with
> BioPerl I’ve been setting this “source term” to different values (e.g.
> “NCBI”) depending on where the tag/value data in the feature comes from.
>
> But here’s the code that makes a persistent feature, from BioSQL/Loader.py:
>
>     def _load_seqfeature_basic(self, feature_type, feature_rank, bioentry_id):
>         """Load the first tables of a seqfeature and returns the id (PRIVATE).
>
>         This loads the "key" of the seqfeature (ie. CDS, gene) and
>         the basic seqfeature table itself.
>         """
>         ontology_id = self._get_ontology_id('SeqFeature Keys')
>         seqfeature_key_id = self._get_term_id(feature_type,
>                                               ontology_id=ontology_id)
>         # XXX source is always EMBL/GenBank/SwissProt here; it should depend on
>         # the record (how?)
>         source_cat_id = self._get_ontology_id('SeqFeature Sources')
>         source_term_id = self._get_term_id('EMBL/GenBank/SwissProt',
>                                            ontology_id=source_cat_id)
>
>         sql = r"INSERT INTO seqfeature (bioentry_id, type_term_id, " \
>               r"source_term_id, rank) VALUES (%s, %s, %s, %s)"
>         self.adaptor.execute(sql, (bioentry_id, seqfeature_key_id,
>                                    source_term_id, feature_rank + 1))
>         seqfeature_id = self.adaptor.last_id('seqfeature')
>
>         return seqfeature_id
>
> This code always sets the source term to “EMBL/GenBank/SwissProt”, and it
> cannot be set to anything else. A better idea is to have a method to set
> and get this, e.g. source(), just as you can set the “type” of the feature.
> The way to do this is to subclass SeqFeature to make DBSeqFeature, just as
> Seq is subclassed to make DBSeq and SeqRecord is subclassed to make
> DBSeqRecord in BioSQL/Seq.py.
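
To illustrate the proposal in the paragraph above, here is a minimal sketch. It
assumes a plain Python attribute rather than a BioPerl-style source() method.
The SeqFeature class below is only a stand-in so the snippet runs without
Biopython installed. DBSeqFeature, its source attribute, DEFAULT_SOURCE and
source_term_name are hypothetical names, not existing Biopython API.

```python
# The value currently hard-coded in _load_seqfeature_basic.
DEFAULT_SOURCE = "EMBL/GenBank/SwissProt"


class SeqFeature:
    """Stand-in for Bio.SeqFeature.SeqFeature (assumption: only .type matters here)."""

    def __init__(self, type=""):
        self.type = type


class DBSeqFeature(SeqFeature):
    """Hypothetical subclass that carries a settable source term."""

    def __init__(self, type="", source=DEFAULT_SOURCE):
        super().__init__(type=type)
        self.source = source


def source_term_name(feature):
    """Return the source term name a loader could look up for this feature.

    Features without a source attribute fall back to the current default.
    """
    return getattr(feature, "source", DEFAULT_SOURCE)


ncbi_cds = DBSeqFeature(type="CDS", source="NCBI")
plain_gene = SeqFeature(type="gene")
print(source_term_name(ncbi_cds))
print(source_term_name(plain_gene))
```

Under these assumptions, the loader could pass the returned name to
_get_term_id in place of the hard-coded string, so features that do not set a
source would be stored exactly as they are today.
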
>
> So I propose to fork, code, and send a pull request for this. What do you
> think?
>
> Thanks again,
>
> Brian O.