[BioSQL-l] bioentries in a sequence cluster

Sun Mar 30 01:00:25 EDT 2008

On Mar 27, 2008, at 3:33 PM, Peter Müller wrote:
>
>
> Dear list,
>
> I have a few questions, but maybe with a working example, I can  
> derive the rest.
>
> With perl-db I can fetch a Bio::Cluster Object with this query:
> (I found no documentation about c::subject and p::object ...)

Yes, sorry, this needs a lot more documentation. The part of the
alias after the '::' separator is the 'context'. It is needed when
the same entity participates more than once in an association. What
confuses the issue further here is that at the object level each
object entity (Bio::PrimarySeqI, Bio::ClusterI, Bio::Ontology::TermI)
participates only once, though in reality Bio::ClusterI and
Bio::PrimarySeqI both map to the table bioentry.

>
> $query->datacollections(
>           ["Bio::PrimarySeqI c::subject",
>           "Bio::PrimarySeqI p::object",

I think that Bio::PrimarySeqI can be substituted with Bio::ClusterI  
in the second line. This would make the mapping clearer I guess. I'm  
not sure why I wrote the example that way, but I'd be surprised if  
Bio::ClusterI does not work here.

>          "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI"]);
>
> $query->where(["p.accession_number = 'NM_000015'"]);

Actually I think you need to use c.accession_number to query by  
sequence accession. The c (child) alias is the cluster member, and  
the p (parent) alias is the cluster itself.
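
Putting the two suggestions above together, the arguments for
datacollections() and where() could be assembled roughly as follows.
This is only a sketch: the helper name cluster_query_args_for_member is
hypothetical and not part of bioperl-db, and using Bio::ClusterI for the
p alias is the untested substitution mentioned above. The snippet only
builds and prints the strings; the calls that would consume them are
shown in the trailing comment.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): returns the argument
# lists for a query that finds clusters by a member's accession.
sub cluster_query_args_for_member {
    my ($accession) = @_;
    return {
        datacollections => [
            "Bio::PrimarySeqI c::subject",   # c: the cluster member
            "Bio::ClusterI p::object",       # p: the cluster (assumed to work)
            "Bio::PrimarySeqI<=>Bio::ClusterI<=>Bio::Ontology::TermI",
        ],
        # The accession is inserted verbatim, so pass trusted input only.
        where => [ sprintf("c.accession_number = '%s'", $accession) ],
    };
}

my $args = cluster_query_args_for_member('NM_000015');
print "$_\n" for @{ $args->{datacollections} }, @{ $args->{where} };

# With $query and $db set up as in the quoted message:
#   $query->datacollections($args->{datacollections});
#   $query->where($args->{where});
#   my $qres = $db->get_object_adaptor('Bio::Cluster')->find_by_query($query);
```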

>
> my $adp = $db->get_object_adaptor('Bio::Cluster');
> my $qres = $adp->find_by_query($query);
>
>
> That's great - but here I ask for a sequence accession-number.
>
> Is it possible to ask for the Clone (IMAGE:4722596) or for an STS  
> accession-number where the result is also a cluster object?
> "give me the cluster(s) where in the sequence-line is a clone-entry  
> with this number 'IMAGE:4722596'" ....
> "give me the cluster(s) where in the STS-line is an accession-number  
> with this value 'PMC310725P3'" ...
> PROTID and NID would also be interesting.

PID and NID should become the primary_id() of the sequence members.  
Hence, you would say c.primary_id where you have c.accession_number  
above.

Each STS line should be in a qualifier/value pair attached to the  
cluster bioentry, under the tag 'sts' (which from what I can see  
would consist of whole lines, not ACC= and UNISTS= values parsed out,  
though I may be mistaken). So you would add

"Bio::PrimarySeqI<=>Bio::Annotation::SimpleValue sv"

to the datacollections, and "sv.value = 'ACC=PMC310725P3  
UNISTS=272646'" and "sv.tagname = 'sts'" to the where() array.

The same goes for IMAGE clone IDs, except that the tag name is
'clone' and the qualifier/value is attached to the member sequence,
not the cluster. Also, in this case the line is not stored whole but
is parsed into tokens.
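
As a rough sketch of the tag/value lookups described above, the extra
datacollections entry and where() conditions could be generated like
this. The helper simple_value_args is hypothetical (not part of
bioperl-db). The STS value assumes the stored line uses a single space
between ACC= and UNISTS=, and the clone value assumes the stored token
is the bare 'IMAGE:4722596'; neither has been checked against a loaded
database. How the query tells a cluster-level annotation from a
member-level one is not covered here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl-db): returns the additional
# datacollections entry and where() conditions for a tag/value lookup.
sub simple_value_args {
    my ($tag, $value) = @_;
    return {
        datacollection => "Bio::PrimarySeqI<=>Bio::Annotation::SimpleValue sv",
        where          => [
            sprintf("sv.value = '%s'",   $value),
            sprintf("sv.tagname = '%s'", $tag),
        ],
    };
}

# STS lookup: the value is the whole STS line (assumed single-spaced).
my $sts = simple_value_args('sts', 'ACC=PMC310725P3 UNISTS=272646');

# Clone lookup: assumes the stored token is the bare clone ID.
my $clone = simple_value_args('clone', 'IMAGE:4722596');

print "$_\n" for $sts->{datacollection}, @{ $sts->{where} };
print "$_\n" for @{ $clone->{where} };
```

The datacollection string would be appended to the list passed to
$query->datacollections(), and the where entries to the array passed
to $query->where().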

Does this help?

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================