[Biopython-dev] A modification to BioSQL
Fields, Christopher J
cjfields at illinois.edu
Mon Jun 22 21:07:00 UTC 2015
I believe the design primarily had INSDC data sources in mind, but I don’t think there were any restrictions, mainly because the source could be something non-INSDC. I vaguely recall this coming up at some point in the distant past; it might be worth asking Hilmar about it.
chris
> On Jun 22, 2015, at 3:44 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> Hi Brian,
>
> Are you familiar with the logic BioPerl uses to set this field?
> See also https://github.com/biopython/biopython/pull/366
>
> Peter
>
> On Mon, Jun 22, 2015 at 9:11 PM, Brian Osborne <bosborne11 at verizon.net> wrote:
>> All,
>>
>> I’ve been using the BioSQL schema with Bioperl and would like to start doing
>> the same with Biopython, but there’s a limitation I’d like to fix. Here’s
>> the relevant table in the BioSQL schema, seqfeature:
>>
>>      Column     |         Type          |                        Modifiers                        | Storage  | Stats target | Description
>> ----------------+-----------------------+---------------------------------------------------------+----------+--------------+-------------
>>  seqfeature_id  | integer               | not null default nextval('seqfeature_pk_seq'::regclass) | plain    |              |
>>  bioentry_id    | integer               | not null                                                | plain    |              |
>>  type_term_id   | integer               | not null                                                | plain    |              |
>>  source_term_id | integer               | not null                                                | plain    |              |
>>  display_name   | character varying(64) |                                                         | extended |              |
>>  rank           | integer               | not null default 0                                      | plain    |              |
>>
>> Note that required field, source_term_id. In the work I’ve been doing with
>> Bioperl I’ve been setting this “source term” to different values (e.g.
>> “NCBI”) depending on where the tag/value data in the feature comes from.
>>
>> But here’s the code that makes a persistent feature, from BioSQL/Loader.py:
>>
>>     def _load_seqfeature_basic(self, feature_type, feature_rank, bioentry_id):
>>         """Load the first tables of a seqfeature and returns the id (PRIVATE).
>>
>>         This loads the "key" of the seqfeature (ie. CDS, gene) and
>>         the basic seqfeature table itself.
>>         """
>>         ontology_id = self._get_ontology_id('SeqFeature Keys')
>>         seqfeature_key_id = self._get_term_id(feature_type,
>>                                               ontology_id=ontology_id)
>>         # XXX source is always EMBL/GenBank/SwissProt here; it should depend on
>>         # the record (how?)
>>         source_cat_id = self._get_ontology_id('SeqFeature Sources')
>>         source_term_id = self._get_term_id('EMBL/GenBank/SwissProt',
>>                                            ontology_id=source_cat_id)
>>
>>         sql = r"INSERT INTO seqfeature (bioentry_id, type_term_id, " \
>>               r"source_term_id, rank) VALUES (%s, %s, %s, %s)"
>>         self.adaptor.execute(sql, (bioentry_id, seqfeature_key_id,
>>                                    source_term_id, feature_rank + 1))
>>         seqfeature_id = self.adaptor.last_id('seqfeature')
>>
>>         return seqfeature_id
>>
>> This code always sets the source term to “EMBL/GenBank/SwissProt”, and it
>> cannot be set to anything else. A better idea is to have a method to set
>> and get this, e.g. source(), just as you can set the “type” of the feature.
>> The way to do this is to subclass SeqFeature to make DBSeqFeature, just as
>> Seq is subclassed to make DBSeq and SeqRecord is subclassed to make
>> DBSeqRecord in BioSQL/Seq.py.
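
The idea above can be illustrated with a minimal, self-contained sketch. It is not the Biopython or BioSQL implementation: it uses sqlite3 with a cut-down imitation of the term and seqfeature tables (the real schema keeps ontologies in their own table), and the names SourcedFeature, make_demo_db, get_term_id, load_seqfeature_basic and source_of are all hypothetical. The only point it makes is that the loader can read the source from the feature and fall back to the current hard-coded value.

```python
import sqlite3

# Value the existing loader hard-codes; kept here as the fallback.
DEFAULT_SOURCE = "EMBL/GenBank/SwissProt"


class SourcedFeature:
    """Hypothetical stand-in for a SeqFeature subclass carrying a source."""

    def __init__(self, feature_type, source=DEFAULT_SOURCE):
        self.type = feature_type
        self.source = source


def make_demo_db():
    """Build a simplified, in-memory imitation of two BioSQL tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE term (
            term_id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            ontology TEXT NOT NULL,
            UNIQUE (name, ontology)
        );
        CREATE TABLE seqfeature (
            seqfeature_id INTEGER PRIMARY KEY,
            bioentry_id INTEGER NOT NULL,
            type_term_id INTEGER NOT NULL,
            source_term_id INTEGER NOT NULL,
            rank INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def get_term_id(conn, name, ontology):
    """Return the id of a term, creating the row if it does not exist."""
    row = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology = ?",
        (name, ontology),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO term (name, ontology) VALUES (?, ?)", (name, ontology)
    )
    return cur.lastrowid


def load_seqfeature_basic(conn, feature, feature_rank, bioentry_id):
    """Insert a seqfeature row, taking the source term from the feature."""
    type_term_id = get_term_id(conn, feature.type, "SeqFeature Keys")
    # Plain features without a source attribute keep the old behaviour.
    source = getattr(feature, "source", None) or DEFAULT_SOURCE
    source_term_id = get_term_id(conn, source, "SeqFeature Sources")
    cur = conn.execute(
        "INSERT INTO seqfeature "
        "(bioentry_id, type_term_id, source_term_id, rank) "
        "VALUES (?, ?, ?, ?)",
        (bioentry_id, type_term_id, source_term_id, feature_rank + 1),
    )
    return cur.lastrowid


def source_of(conn, seqfeature_id):
    """Look up the source term name stored for a seqfeature row."""
    row = conn.execute(
        "SELECT t.name FROM seqfeature AS f "
        "JOIN term AS t ON t.term_id = f.source_term_id "
        "WHERE f.seqfeature_id = ?",
        (seqfeature_id,),
    ).fetchone()
    return row[0]
```

Under these assumptions, loading SourcedFeature("CDS", source="NCBI") stores a row whose source term is "NCBI", while a feature created without an explicit source still gets "EMBL/GenBank/SwissProt", so existing data loads would be unaffected.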
>>
>> So I propose to fork, code, and send a pull request for this. What do you
>> think?
>>
>> Thanks again,
>>
>> Brian O.
>>
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev