[BioSQL-l] bioentries in a sequence cluster
Hilmar Lapp
hlapp at gmx.net
Sun Mar 30 01:00:25 EDT 2008
On Mar 27, 2008, at 3:33 PM, Peter Müller wrote:
>
>
> Dear list,
>
> I have a few questions, but maybe with a working example, I can
> derive the rest.
>
> With perl-db I can fetch a Bio::Cluster object with this query:
> (I found no documentation about c::subject and p::object ...)
Yes, sorry, this needs a lot more documentation. The suffix that
follows the alias after '::' (e.g., 'subject' in 'c::subject') is the
'context'. It is needed when the same entity participates more than
once in an association. What confuses the issue further here is that
at the object level each entity (Bio::PrimarySeqI, Bio::ClusterI,
Bio::Ontology::TermI) participates only once, whereas in the schema
Bio::ClusterI and Bio::PrimarySeqI both map to the bioentry table.
>
> $query->datacollections(
> ["Bio::PrimarySeqI c::subject",
> "Bio::PrimarySeqI p::object",
I think that Bio::PrimarySeqI can be substituted with Bio::ClusterI
in the second line. This would make the mapping clearer I guess. I'm
not sure why I wrote the example that way, but I'd be surprised if
Bio::ClusterI does not work here.
> "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
>
> $query->where(["p.accession_number = 'NM_000015'"]);
Actually I think you need to use c.accession_number to query by
sequence accession. The c (child) alias is the cluster member, and
the p (parent) alias is the cluster itself.
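Putting the two suggestions above together, the query setup might look
like the following sketch. It is untested: the helper name
constrain_to_member_accession is made up for illustration, and it
assumes that Bio::ClusterI is accepted for the p alias as guessed
above (fall back to Bio::PrimarySeqI if it is not).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): configures an existing
# Bio::DB::Query::BioQuery object so that it selects the clusters that
# contain a member sequence with the given accession.
# Note: $acc is interpolated as-is, so it must not contain quotes.
sub constrain_to_member_accession {
    my ($query, $acc) = @_;
    $query->datacollections([
        "Bio::PrimarySeqI c::subject",   # c (child): the cluster member
        "Bio::ClusterI p::object",       # p (parent): the cluster itself
        "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI",
    ]);
    # Constrain on the member (c), not on the cluster (p).
    $query->where(["c.accession_number = '$acc'"]);
    return $query;
}

# Usage, assuming $db is an already opened database adaptor:
#   my $query = constrain_to_member_accession(
#       Bio::DB::Query::BioQuery->new(), 'NM_000015');
#   my $adp  = $db->get_object_adaptor('Bio::Cluster');
#   my $qres = $adp->find_by_query($query);
#   while (my $cluster = $qres->next_object()) {
#       print $cluster->display_id(), "\n";
#   }
```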
>
> my $adp = $db->get_object_adaptor('Bio::Cluster');
> my $qres = $adp->find_by_query($query);
>
>
> That's great - but here I ask for a sequence accession-number.
>
> Is it possible to ask for the clone (IMAGE:4722596) or for an STS
> accession number where the result is also a cluster object?
> "Give me the cluster(s) where the sequence line has a clone entry
> with the number 'IMAGE:4722596'."
> "Give me the cluster(s) where the STS line has an accession
> number with the value 'PMC310725P3'."
> PROTID and NID would also be interesting.
PID and NID should become the primary_id() of the sequence members.
Hence, you would say c.primary_id where you have c.accession_number
above.
Each STS line should be stored as a qualifier/value pair attached to
the cluster bioentry under the tag 'sts'. From what I can see, the
value is the whole line, not the parsed-out ACC= and UNISTS= values,
though I may be mistaken. So you would add
    "Bio::PrimarySeqI<=>Bio::Annotation::SimpleValue sv"
to the datacollections, and
    "sv.value = 'ACC=PMC310725P3 UNISTS=272646'"
    "sv.tagname = 'sts'"
to the where() array.
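As a rough, untested sketch of the STS case: the helper name
constrain_to_sts_line is made up for illustration, the datacollections
are the ones from the original query plus the SimpleValue line, and the
value is assumed to be the whole STS line with a single space between
the ACC= and UNISTS= parts.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): configures an existing
# Bio::DB::Query::BioQuery object so that it selects clusters by the
# content of one of their STS lines.
# Note: $sts_line is interpolated as-is, so it must not contain quotes.
sub constrain_to_sts_line {
    my ($query, $sts_line) = @_;
    $query->datacollections([
        "Bio::PrimarySeqI c::subject",
        "Bio::PrimarySeqI p::object",
        "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI",
        # Brings in the qualifier/value pairs under the alias sv.
        "Bio::PrimarySeqI<=>Bio::Annotation::SimpleValue sv",
    ]);
    $query->where([
        "sv.tagname = 'sts'",
        "sv.value = '$sts_line'",   # assumption: the whole line is stored
    ]);
    return $query;
}

# Usage, assuming $db is an already opened database adaptor:
#   my $query = constrain_to_sts_line(
#       Bio::DB::Query::BioQuery->new(), 'ACC=PMC310725P3 UNISTS=272646');
#   my $qres = $db->get_object_adaptor('Bio::Cluster')->find_by_query($query);
```

If the stored value turns out to be only the parsed-out accession, the
sv.value condition would need to change accordingly.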
The same goes for IMAGE clone IDs, except that the tag name is
'clone' and the qualifier/value pair is attached to the member
sequence, not the cluster. Also, in this case the line is not stored
whole but is parsed into tokens.
Does this help?
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================