[BioSQL-l] What should source_term_id in table seqfeature refer to?
Richard Holland
holland at eaglegenomics.com
Sat Aug 15 20:00:39 UTC 2009
Ok, cool. So we can now rephrase the original question as: How
should provenance information be stored in BioSQL?
:)
cheers,
Richard
On 15 Aug 2009, at 20:31, Hilmar Lapp wrote:
>
> On Aug 15, 2009, at 12:32 PM, Richard Holland wrote:
>
>> [...]
>> Case study:
>
> Great, now we're getting somewhere :-)
>
>> I download some seqs from Genbank. (Which then need to be annotated
>> as having come from Genbank, at the sequence level).
>
> Note, as you say, *at the sequence level*. I.e., you would record
> this either using the bioentry's namespace (biodatabase), or a
> bioentry_qualifier_value annotation. I would choose the former,
> though since a bioentry can only be in one namespace, it may not
> satisfy your needs.
>
>> They already have some features on them (which need to be annotated
>> as having come from Genbank, at the feature level, but of an
>> unknown algorithm as Genbank doesn't specify how they were
>> generated usually).
>
> Right. The source term would indicate that GenBank provided them to
> you, and that that's all you know.
>
>> I then run BLAST of those sequences against some local data, and
>> record my own features as a result. I also run BLAT, and again
>> record my own features.
>
> BLAST and BLAT would now be the source terms.
>
>> My colleague also runs BLAST of the same seqs against some data of
>> his own, and wants our combined feature results to be stored in the
>> same database. I want to be able to annotate all these new features
>> both with the algorithm used to generate them (BLAST or BLAT)
>
> You use the source term for that.
>
>> and who did it (myself or my colleague at the institute down the
>> road)
>
> Ah - that's provenance information, not the source as is normally
> referred to. BioSQL at present doesn't have an explicit provenance
> model, but you can still record provenance information through
> ontology-typed tag/value annotation in seqfeature_qualifier_value,
> with the terms coming from a provenance ontology (that you make up
> yourself or grab from somewhere else).
>
>> , in addition to retaining the original features that came from
>> Genbank (and making sure they're annotated as such).
>
> That shouldn't be a problem - certainly it's not for BioSQL.
>
>> Hence I'd need a source attribute for the sequence (Genbank in this
>> case), a source attribute for each feature (Genbank, Me, or
>> Colleague X, in this case), and an algorithm/technique/protocol
>> attribute for each feature (BLAST or BLAT or 'don't know it just
>> came from Genbank' in this example).
>
> Not quite - source really is what provided the feature to you, not
> who or when, or using which BLAST database, genome assembly, or how
> you parsed the results, etc etc. That's all provenance information.
>
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
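For illustration, the split described in the quoted reply (source_term_id
says what provided the feature; who ran it goes into an ontology-typed
tag/value row in seqfeature_qualifier_value) might be sketched roughly as
below. This is only a sketch under assumptions: the tables are a trimmed-down
SQLite stand-in for the real BioSQL schema, and the ontology names
("Feature Types", "Feature Sources", "Provenance"), the "run_by" tag and the
helper functions are hypothetical, made up for this example.

```python
import sqlite3

# Assumption: a simplified subset of the BioSQL tables named in this thread.
# The real schema has more columns, constraints and a bioentry table.
SCHEMA = """
CREATE TABLE ontology (
    ontology_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE term (
    term_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    ontology_id INTEGER NOT NULL REFERENCES ontology(ontology_id),
    UNIQUE (name, ontology_id)
);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    bioentry_id INTEGER NOT NULL,
    type_term_id INTEGER NOT NULL REFERENCES term(term_id),
    source_term_id INTEGER NOT NULL REFERENCES term(term_id)
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id INTEGER NOT NULL REFERENCES seqfeature(seqfeature_id),
    term_id INTEGER NOT NULL REFERENCES term(term_id),
    rank INTEGER NOT NULL DEFAULT 0,
    value TEXT NOT NULL,
    PRIMARY KEY (seqfeature_id, term_id, rank)
);
"""


def get_term(conn, ontology, name):
    """Return the term_id for (ontology, name), creating rows if missing."""
    conn.execute("INSERT OR IGNORE INTO ontology (name) VALUES (?)", (ontology,))
    (ont_id,) = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (ontology,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO term (name, ontology_id) VALUES (?, ?)",
        (name, ont_id),
    )
    (term_id,) = conn.execute(
        "SELECT term_id FROM term WHERE name = ? AND ontology_id = ?",
        (name, ont_id),
    ).fetchone()
    return term_id


def add_feature(conn, bioentry_id, ftype, source, provenance=None):
    """Store one feature; source goes in source_term_id, provenance in qualifiers."""
    cur = conn.execute(
        "INSERT INTO seqfeature (bioentry_id, type_term_id, source_term_id) "
        "VALUES (?, ?, ?)",
        (
            bioentry_id,
            get_term(conn, "Feature Types", ftype),      # hypothetical ontology name
            get_term(conn, "Feature Sources", source),   # hypothetical ontology name
        ),
    )
    feature_id = cur.lastrowid
    for tag, value in (provenance or {}).items():
        # Provenance terms come from an ontology you make up or borrow.
        conn.execute(
            "INSERT INTO seqfeature_qualifier_value (seqfeature_id, term_id, value) "
            "VALUES (?, ?, ?)",
            (feature_id, get_term(conn, "Provenance", tag), value),
        )
    return feature_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# The case study from the thread, all on one (hypothetical) bioentry 1.
add_feature(conn, 1, "CDS", "GenBank")  # source known, nothing else
add_feature(conn, 1, "similarity", "BLAST", {"run_by": "me"})
add_feature(conn, 1, "similarity", "BLAST", {"run_by": "colleague X"})
add_feature(conn, 1, "similarity", "BLAT", {"run_by": "me"})

# List each feature's source together with its run_by provenance, if any.
rows = conn.execute(
    """
    SELECT src.name, q.value
    FROM seqfeature sf
    JOIN term src ON src.term_id = sf.source_term_id
    LEFT JOIN seqfeature_qualifier_value q
           ON q.seqfeature_id = sf.seqfeature_id AND q.term_id = ?
    ORDER BY sf.seqfeature_id
    """,
    (get_term(conn, "Provenance", "run_by"),),
).fetchall()
print(rows)
```

The GenBank feature carries only a source term, while the BLAST and BLAT
features carry both a source term and a run_by qualifier, so the two BLAST
runs stay distinguishable without overloading source_term_id.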
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/