[BioSQL-l] What should source_term_id in table seqfeature refer to?
Richard Holland
holland at eaglegenomics.com
Sat Aug 15 16:32:35 UTC 2009
On 15 Aug 2009, at 15:29, Hilmar Lapp wrote:
>
> On Aug 15, 2009, at 6:44 AM, Richard Holland wrote:
>
>> [...]
>> What I mean is this:
>>
>> 1. The sequence itself could be downloaded from Genbank, EMBL, or
>> elsewhere, or I could have discovered it in-house.
>
> That's actually what I meant.
>
>> 2. The features on the sequence could have been generated by
>> running BLAST, miRBase, etc., or they could be manually annotated.
>> 3. The features on the sequence could have been downloaded from
>> Genbank, EMBL, etc., or they could have been made locally, or by a
>> collaborator at another institute.
>
> Right, but if a feature is the result of you running some algorithm
> against some sequences, then it's not been downloaded or given to
> you. Features on one and the same sequence can have different
> sources, obviously, so I'm a bit confused - I think we're talking
> about the same thing in different words, but I'm not sure.
Probably. :)
Case study: I download some seqs from Genbank (which then need to be
annotated, at the sequence level, as having come from Genbank). They
already have some features on them (which need to be annotated, at the
feature level, as having come from Genbank, but with an unknown
algorithm, as Genbank usually doesn't specify how they were generated).
I then run BLAST of those sequences against some local data, and
record my own features as a result. I also run BLAT, and again record
my own features. My colleague also runs BLAST of the same seqs against
some data of his own, and wants our combined feature results to be
stored in the same database. I want to be able to annotate all these
new features both with the algorithm used to generate them (BLAST or
BLAT) and who did it (myself or my colleague at the institute down the
road), in addition to retaining the original features that came from
Genbank (and making sure they're annotated as such). Hence I'd need a
source attribute for the sequence (Genbank in this case), a source
attribute for each feature (Genbank, Me, or Colleague X, in this
case), and an algorithm/technique/protocol attribute for each feature
(BLAST or BLAT or 'don't know it just came from Genbank' in this
example).
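
To make the three attributes concrete, here is a minimal sketch of the case
study in Python rather than SQL. All names in it (Sequence, Feature,
features_by, the accession "SEQ1", the feature names) are hypothetical and
are not part of the BioSQL schema; it only illustrates keeping "who supplied
it" separate from "how it was generated".

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder for features whose generating algorithm was never recorded,
# e.g. features that simply arrived with a Genbank download.
UNKNOWN_METHOD = "unknown"


@dataclass
class Feature:
    name: str
    source: str                    # who supplied the feature
    method: str = UNKNOWN_METHOD   # algorithm/technique/protocol used


@dataclass
class Sequence:
    accession: str
    source: str                    # where the sequence itself came from
    features: List[Feature] = field(default_factory=list)


def features_by(seq: Sequence,
                source: Optional[str] = None,
                method: Optional[str] = None) -> List[Feature]:
    """Return features matching the given source and/or method."""
    return [
        f for f in seq.features
        if (source is None or f.source == source)
        and (method is None or f.method == method)
    ]


# The case study: one Genbank sequence with features from three suppliers.
seq = Sequence("SEQ1", source="Genbank")
seq.features.append(Feature("gene", source="Genbank"))
seq.features.append(Feature("hit_1", source="Me", method="BLAST"))
seq.features.append(Feature("hit_2", source="Me", method="BLAT"))
seq.features.append(Feature("hit_3", source="Colleague X", method="BLAST"))
```

With source and method kept apart, features_by(seq, method="BLAST") returns
both my BLAST hits and my colleague's, while features_by(seq, source="Me")
returns only mine, regardless of algorithm.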
cheers,
Richard
> -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>
--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/