[BioRuby] ARBioSQL
RJP
raoul.bonnal at itb.cnr.it
Fri Apr 6 10:40:16 UTC 2007
Hi Toshiaki,
> >>
> >> I don't follow the recent changes on BioSQL schema but during the last
> >> Phyloinformatics Hackathon (2006 Dec http://hackathon.nescent.org/),
> >> which we (Goto-san and me) had attended, Hilmar was working on
> >> the integration of tree into the database.
> >>
> >> https://www.nescent.org/wg/phyloinformatics/index.php?title=Supporting_NEXUS
> >>
> >> BioSQL: The BioSQL group created a new set of tables, optional within BioSQL,
> >> for the purposes of storing phylogenetic trees, either sequence or species.
> >> A script was added to the package that allows one to read NEXUS files and
> >> write any trees found in the files to the database.
> > Right now I'm a little too busy to add the new schema and update the
> > ActiveRecord implementation.
Mhhh. I'll probably take care of it in the next few weeks.
> > I have also written a schema for BLAST and described it with Active Record;
> > could it be of interest to you?
>
> Nakao-san is preparing a repository for Rails-related projects at RubyForge
> (mainly for Rails plugins that use the BioRuby library), so your code is also
> welcome there.
OK. We will discuss the schema itself.
> I proposed 'bioruby-annex' as the project name as the 'bioruby-rails-plugins'
> is too long (rubyforge's limit is 15 chars). Nakao-san, is your project
> already granted?
What do you mean by "granted"? Funds?
I don't know whether it's possible to get funds from the EU for developing
Ruby + BioRuby + Active Record. We would probably need a project roadmap
or, for example, to develop a bioportal based on those technologies.
>
> We need to consider how to incorporate rails dependent codes in bioruby
> in the near future.
Could you explain your point in more detail? Thanks.
------------------------------THE~CODE--------------------------------------
#
# bio/io/arbsql.rb - BioSQL access module by Active Record
#
# Inspired by bio/io/sql.sql Copyright (C) 2002 KATAYAMA Toshiaki <k at bioruby.org>
# Copyright (C) 2006 Raoul Jean Pierre Bonnal <raoul.bonnal at itb.cnr.it>
#
# This library is free software; you can redistribute it and/or
# modify it under the terms of the GNU Lesser General Public
# License as published by the Free Software Foundation; either
# version 2 of the License, or (at your option) any later version.
#
# This library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
# Lesser General Public License for more details.
#
# You should have received a copy of the GNU Lesser General Public
# License along with this library; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# $Id: arbsql.rb,v 0.1 2007/04/06 13:04:28 k Exp $
#
begin
require 'active_record'
rescue LoadError
end
require 'bio/sequence'
require 'bio/feature'
module Bio
class BioSQL
def initialize(adapter, database, username)
ActiveRecord::Base.establish_connection(:adapter=>adapter, :database=>database, :username=>username)
@adapter=adapter
@database=database
@username=username
@connection=ActiveRecord::Base.connection()
end
def close
#returns info to reestablish a new connection
ActiveRecord::Base.remove_connection()
end
class Biodatabase < ActiveRecord::Base
set_table_name "biodatabase"
set_sequence_name "biodatabase_pk_seq"
set_primary_key "biodatabase_id"
has_many :bioentry
# required for creation:
# name
end
class Bioentry < ActiveRecord::Base
set_table_name "bioentry"
set_sequence_name "bioentry_pk_seq"
set_primary_key "bioentry_id"
belongs_to :biodatabase
belongs_to :taxon
has_one :biosequence
has_one :comment
has_many :seqfeature
has_many :bioentry_reference
has_many :bioentry_dbxref
# required for creation:
# name, accession, version
end
class ObjectBioentry < Bioentry
end
class SubjectBioentry < Bioentry
end
class BioentryReference < ActiveRecord::Base
set_table_name "bioentry_reference"
set_sequence_name nil
set_primary_key nil
belongs_to :bioentry
belongs_to :reference
end
class BioentryDbxref < ActiveRecord::Base
set_table_name "bioentry_dbxref"
set_sequence_name nil
set_primary_key nil #bioentry_id,dbxref_id
belongs_to :bioentry
belongs_to :dbxref
end
class BioentryPath < ActiveRecord::Base
set_table_name "bioentry_path"
set_primary_key nil
set_sequence_name nil
belongs_to :term
belongs_to :object_bioentry, :foreign_key=>"object_bioentry_id"
belongs_to :subject_bioentry, :foreign_key=>"subject_bioentry_id"
end
class BioentryQualifierValue < ActiveRecord::Base
# TODO: to be completed
end
class BioentryRelationship < ActiveRecord::Base
# TODO: to be completed
end
class Biosequence < ActiveRecord::Base
set_table_name "biosequence"
set_sequence_name "biosequence_pk_seq"
set_primary_key "bioentry_id"
belongs_to :bioentry
end
class Comment < ActiveRecord::Base
set_table_name "comment"
set_sequence_name "comment_pk_seq"
set_primary_key "comment_id"
belongs_to :bioentry
end
class Ontology < ActiveRecord::Base
set_table_name "ontology"
set_sequence_name "ontology_pk_seq"
set_primary_key "ontology_id"
has_many :term
has_many :term_path
has_many :term_relationship
end
class Term < ActiveRecord::Base
set_table_name "term"
set_sequence_name "term_pk_seq"
set_primary_key "term_id"
belongs_to :ontology
has_many :seqfeature_qualifier_value
has_many :dbxref_qualifier_value
has_many :location
has_many :seqfeature_relationship
has_many :term_dbxref
has_many :term_relationship_term
has_many :term_synonym
has_many :location_qualifier_value
has_many :seqfeature
has_many :term_path
has_many :term_relationship
end
# Created to satisfy the relations inside Seqfeature. I have not found
# another way to define this.
# The problem is how to distinguish between a type term and a source term.
class Typeterm < Term
end
# TODO: check the relations with respect to Term
class Sourceterm < Term
end
class ObjectTerm < Term
end
class SubjectTerm < Term
end
class PredicateTerm < Term
end
class Seqfeature < ActiveRecord::Base
set_table_name "seqfeature"
set_sequence_name "seqfeature_pk_seq"
set_primary_key "seqfeature_id"
belongs_to :bioentry
belongs_to :typeterm, :foreign_key=>"type_term_id"
belongs_to :sourceterm, :foreign_key=>"source_term_id"
has_many :seqfeature_dbxref
has_many :dbxref
has_many :seqfeature_qualifier_value
has_many :location
has_many :seqfeature_path
has_many :seqfeature_relationship
end
class ObjectseqFeature < Seqfeature
end
class SubjectFeature < Seqfeature
end
class SeqfeatureDbxref < ActiveRecord::Base
set_table_name "seqfeature_dbxref"
set_primary_key nil #seqfeature_id, dbxref_id
set_sequence_name nil
belongs_to :seqfeature
belongs_to :dbxref
end
class SeqfeaturePath < ActiveRecord::Base
set_table_name "seqfeature_path"
set_primary_key nil
set_sequence_name nil
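belongs_to :object_feature, :class_name=>"Seqfeature", :foreign_key=>"object_seqfeature_id"
belongs_to :subject_feature, :class_name=>"Seqfeature", :foreign_key=>"subject_seqfeature_id"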
end
class SeqfeatureQualifierValue < ActiveRecord::Base
set_table_name "seqfeature_qualifier_value"
set_primary_key nil #seqfeature_id, term_id, rank
set_sequence_name nil
belongs_to :seqfeature
belongs_to :term
end
class SeqfeatureRelationship < ActiveRecord::Base
set_table_name "seqfeature_relationship"
set_primary_key "seqfeature_relationship_id"
set_sequence_name "seqfeature_relationship_pk_seq"
belongs_to :term
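belongs_to :object_seqfeature, :class_name=>"Seqfeature", :foreign_key=>"object_seqfeature_id"
belongs_to :subject_seqfeature, :class_name=>"Seqfeature", :foreign_key=>"subject_seqfeature_id"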
end
class Dbxref < ActiveRecord::Base
set_table_name "dbxref"
set_primary_key "dbxref_id"
set_sequence_name "dbxref_pk_seq"
has_many :dbxref_qualifier_value
has_many :location
has_many :reference #probably is a "has_one" rel.
has_many :term_dbxref
has_many :bioentry_dbxref
end
class DbxrefQualifierValue < ActiveRecord::Base
set_table_name "dbxref_qualifier_value"
set_primary_key nil #dbxref_id, term_id, rank
set_sequence_name nil
belongs_to :dbxref
belongs_to :term
end
class Location < ActiveRecord::Base
set_table_name "location"
set_sequence_name "location_pk_seq"
set_primary_key "location_id"
belongs_to :seqfeature
belongs_to :dbxref
belongs_to :term
has_many :location_qualifier_value
def to_s
# Any strand other than 1 is rendered as a complement.
range = "#{start_pos}..#{end_pos}"
strand != 1 ? "complement(#{range})" : "(#{range})"
end
end
class LocationQualifierValue < ActiveRecord::Base
set_table_name "location_qualifier_value"
set_primary_key nil #location_id, term_id
set_sequence_name nil
belongs_to :location
belongs_to :term
end
class Reference < ActiveRecord::Base
set_table_name "reference"
set_primary_key "reference_id"
set_sequence_name "reference_pk_seq"
belongs_to :dbxref
has_many :bioentry_reference
end
class Taxon < ActiveRecord::Base
set_table_name "taxon"
set_primary_key "taxon_id"
set_sequence_name "taxon_pk_seq"
has_many :taxon_name #probably has_one
has_one :bioentry
end
class TaxonName < ActiveRecord::Base
set_table_name "taxon_name"
set_primary_key nil
set_sequence_name nil
belongs_to :taxon
end
class TermDbxref < ActiveRecord::Base
set_table_name "term_dbxref"
set_primary_key nil #term_id, dbxref_id
set_sequence_name nil
belongs_to :term
belongs_to :dbxref
end
class TermPath < ActiveRecord::Base
set_table_name "term_path"
set_primary_key "term_path_id"
set_sequence_name "term_path_pk_seq"
belongs_to :ontology
belongs_to :subject_term
belongs_to :object_term
belongs_to :predicate_term
end
class TermRelationship < ActiveRecord::Base
set_table_name "term_relationship"
set_primary_key "term_relationship_id"
set_sequence_name "term_relationship_pk_seq"
belongs_to :ontology
belongs_to :subject_term
belongs_to :predicate_term
belongs_to :object_term
has_one :term_relationship_term
end
class TermRelationshipTerm < ActiveRecord::Base
set_table_name "term_relationship_term"
set_primary_key "term_relationship_id"
set_sequence_name nil
belongs_to :term_relationship
belongs_to :term
end
class TermSynonym < ActiveRecord::Base
set_table_name "term_synonym"
set_primary_key nil #term_id, synonym
set_sequence_name nil
belongs_to :term
end
### The commented-out functions should be made private. Only fetch, or
### better just a few functions, should be public.
## I should set up public functions to explore the database.
# The only object the user should see is the Sequence object.
### In theory I could split this class (keeping only the BioSQL
### definitions here) and leave the definition of the BioSQL sequence separate.
def find_bioentry_by_biodatabase_id(id)
# Dynamic finder on the biodatabase_id column; returns an Array of Bioentry.
return Bioentry.find_all_by_biodatabase_id(id)
end
# def find_all_biodatabase
# return Biodatabase.find_all
# end
def fetch_accession(accession)
# Maybe I should check here that the query returned an entry, rather than
# in the creation of the object, which would receive nil as a parameter.
Sequence.new(self, Bioentry.find_by_accession(accession))
end
def fetch_id(id)
Sequence.new(self, Bioentry.find(id))
end
class Sequence
private
def get_seqfeature(sf)
#in seqfeature BioSQL class
Bio::Feature.new(sf.typeterm.name,
sf.location.to_s,sf.seqfeature_qualifier_value.collect{|sfqv|
Bio::Feature::Qualifier.new(sfqv.term.name,sfqv.value)})
end
def length=(len)
@entry.biosequence.length=len
end
public
def initialize(db,entry)
@db=db unless entry.nil?
@entry=entry unless entry.nil?
end
def bioentry_id
@entry.bioentry_id
end
def name
@entry.name
end
def name=(value)
@entry.name=value
end
def accession
@entry.accession
end
def accession=(value)
@entry.accession=value
end
def taxon_id
@entry.taxon_id
end
def database
@entry.biodatabase.name
end
def database_desc
@entry.biodatabase.description
end
def version
@entry.version
end
def version=(value)
@entry.version=value
end
def division
@entry.division
end
def division=(value)
@entry.division=value
end
def description
@entry.description
end
def description=(value)
@entry.description=value
end
def identifier
@entry.identifier
end
def identifier=(value)
@entry.identifier=value
end
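def features
# Uses the has_many :seqfeature association of Bioentry, because BioSQL
# (the @db object) defines no find_feature_by_entry method.
# get_seqfeature is private, so it is called without an explicit receiver.
Bio::Features.new(@entry.seqfeature.collect { |sf| get_seqfeature(sf) })
end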
def seq
case @entry.biosequence.alphabet
when /.na/i # 'dna' or 'rna'
Bio::Sequence::NA.new(@entry.biosequence.seq)
when /protein/i # 'protein'
Bio::Sequence::AA.new(@entry.biosequence.seq)
when nil
nil
end
end
def seq=(value)
# TODO: check which type of alphabet this is (NA/AA/nil)
# Could value be nil? I think not.
@entry.biosequence.seq=value
self.length=value.length
end
def length
@entry.biosequence.length
end
def references
# Returns an array of hashes; each hash has these keys: ["title",
# "dbxref_id", "reference_id", "authors", "crc", "location"].
# It would probably be better to create a Reference class to collect
# this information.
@entry.bioentry_reference.collect{|item|
item.reference.attributes}
end
def comment
@entry.comment.comment_text
end
def save
#I should add chks for SQL errors
@entry.biosequence.save
@entry.save
end
def to_fasta
print ">" + accession + "\n"
print seq.gsub(Regexp.new(".{1,#{60}}"), "\\0\n")
end
end
end
end
=begin
db=Bio::BioSQL.new('postgresql','discovery','febo')
print the features of a sequence:
s=db.fetch_id(7176)
s.features.each{|x| x.qualifiers.each {|y| puts x.feature+' '+x.position+"\t"+y.qualifier+'-->'+y.value}}
very compact version, for when the ids are supplied in an array:
h=[7294,7176,7094,7247,7294]
h.each{|k| puts k; db.fetch_id(k).features.each{|x| x.qualifiers.each {|y| puts x.feature+' '+x.position+"\t"+y.qualifier+'-->'+y.value}}}
to create a sequence, build in this order: qualifier, feature, features,
sequence, genbank
=end
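The position strings printed in the notes above come from Location#to_s. As a minimal sketch of the strings it produces, the following uses a hypothetical Struct (LocationSketch, not part of arbsql.rb) in place of the Active Record model, so it runs without a database:

```ruby
# Hypothetical stand-in for the Location model, so the sketch runs
# without Active Record or a database.
LocationSketch = Struct.new(:start_pos, :end_pos, :strand) do
  # Same rule as Location#to_s in the module: any strand other than 1
  # is rendered as a complement.
  def to_s
    range = "#{start_pos}..#{end_pos}"
    strand != 1 ? "complement(#{range})" : "(#{range})"
  end
end

puts LocationSketch.new(10, 250, 1).to_s
puts LocationSketch.new(10, 250, -1).to_s
```

Note that with this rule a strand of 0 or nil is also rendered as a complement; whether that is the desired behaviour is an open question.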
=begin
= Bio::BioSQL
--- Bio::BioSQL.new(adapter = 'postgresql', database = 'yourdbname',
username = 'yourdbusername')
--- Bio::BioSQL#close
Closes the connection and returns the arguments needed by
ActiveRecord::Base.establish_connection to reconnect.
--- Bio::BioSQL#fetch_accession(accession)
Returns Bio::BioSQL::Sequence object.
--- Bio::BioSQL#fetch_id(bioentry_id)
Returns Bio::BioSQL::Sequence object.
--- Bio::BioSQL#find_bioentry_by_biodatabase_id(id)
Returns Array of Bio::BioSQL::Bioentry selected by biodatabase_id.
Not very useful at present; I think it would be more useful to return an
array of sequences.
== Bio::BioSQL::Sequence
--- Bio::BioSQL::Sequence.new(db, entry)
--- Bio::BioSQL::Sequence#bioentry_id -> Integer
--- Bio::BioSQL::Sequence#name -> String
--- Bio::BioSQL::Sequence#accession -> String
--- Bio::BioSQL::Sequence#definition -> String
--- Bio::BioSQL::Sequence#comment -> String
Returns the first comment. For the complete list of comments, use the
comments method.
Note: I still have to test this. The non-AR version had a comments method,
which I have not yet implemented here.
--- Bio::BioSQL::Sequence#description -> String
--- Bio::BioSQL::Sequence#database -> String
Returns the name of the biodatabase associated with the sequence.
--- Bio::BioSQL::Sequence#database_desc -> String
Returns the description of biodatabase associated with the
sequence.
--- Bio::BioSQL::Sequence#date -> String
NOT IMPLEMENTED
--- Bio::BioSQL::Sequence#division -> String
Returns the division of the entry.
--- Bio::BioSQL::Sequence#length -> Integer
Returns the length of the sequence stored into the db.
--- Bio::BioSQL::Sequence#features
Returns Bio::Features object.
--- Bio::BioSQL::Sequence#references -> Array
Returns reference information as an Array of Hash (not
Bio::Reference); each hash has these keys: ["title", "dbxref_id",
"reference_id", "authors", "crc", "location"].
--- Bio::BioSQL::Sequence#identifier -> ?
I can't remember. Sorry
--- Bio::BioSQL::Sequence#seq
Returns Bio::Sequence::NA or AA object.
--- Bio::BioSQL::Sequence#subseq(from, to)
NOT IMPLEMENTED Returns Bio::Sequence::NA or AA object (by lazy
fetching).
--- Bio::BioSQL::Sequence#taxon_id -> Integer
Returns taxon_id of the sequence
--- Bio::BioSQL::Sequence#version -> String
--- Bio::BioSQL::Sequence#save
Save the sequence into the db
--- Bio::BioSQL::Sequence#to_fasta
Prints the sequence to standard output in FASTA format.
=end
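The line wrapping that to_fasta performs can be sketched on its own. This is only an illustration under stated assumptions: format_fasta is a hypothetical helper, not part of arbsql.rb, and it returns the text as a String instead of printing it, so that the result can be inspected or tested.

```ruby
# Hypothetical helper, not part of arbsql.rb: builds the FASTA text as a
# String instead of printing it.
def format_fasta(accession, seq, width = 60)
  # Split the sequence into chunks of at most `width` characters.
  lines = seq.to_s.scan(/.{1,#{width}}/)
  # Header line first, then the wrapped sequence, each line newline-terminated.
  ([">#{accession}"] + lines).join("\n") + "\n"
end

# An 80-base sequence wrapped at 30 columns gives lines of 30, 30 and 20.
puts format_fasta("X00001", "acgt" * 20, 30)
```

Returning a String rather than calling print leaves the choice of output (standard output, a file, a test) to the caller.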
--
Ra