[BioRuby] ARBioSQL

RJP raoul.bonnal at itb.cnr.it
Fri Apr 6 06:40:16 EDT 2007


Hi Toshiaki, 

> >>
> >> I don't follow the recent changes on BioSQL schema but during the last
> >> Phyloinformatics Hackathon (2006 Dec http://hackathon.nescent.org/),
> >> which we (Goto-san and me) had attended, Hilmar was working on
> >> the integration of tree into the database.
> >>
> >> https://www.nescent.org/wg/phyloinformatics/index.php?title=Supporting_NEXUS
> >>
> >>   BioSQL The BioSQL group created a new set of tables, optional within BioSQL,
> >>   for the purposes of storing phylogenetic trees, both sequence or species.
> >>   A script was added to the package that allows one to read NEXUS files and
> >>   write any trees found in the files to the database.
> > Right now I'm a little too busy to add the new schema and update the
> > ActiveRecord implementation.
Mhh, I will probably take care of it in the next few weeks.


> > I have now also written a schema for BLAST and described it with Active
> > Record; could it be of interest to you?
> 
> Nakao-san is preparing a repository for rails related projects at rubyforge
> (mainly for rails plugins utilizing bioruby library), so your code is also
> welcomed to be there.
OK. We will discuss the schema itself.


> I proposed 'bioruby-annex' as the project name as the 'bioruby-rails-plugins'
> is too long (rubyforge's limit is 15 chars).  Nakao-san, is your project
> already granted?
What do you mean by "granted"? Funds?
I don't know whether it is possible to get funds from the EU for developing
with Ruby + BioRuby + Active Record. We would probably need a project
roadmap or, for example, to develop a bioportal based on those technologies.
> 
> We need to consider how to incorporate rails dependent codes in bioruby
> in the near future.
Could you explain your point in more detail? Thanks.

------------------------------THE~CODE--------------------------------------
#
# bio/io/arbsql.rb - BioSQL access module by Active Record
#
#  Inspired by bio/io/sql.rb, Copyright (C) 2002 KATAYAMA Toshiaki <k at bioruby.org>
#  Copyright (C) 2006 Raoul Jean Pierre Bonnal <raoul.bonnal at itb.cnr.it>
#
#  This library is free software; you can redistribute it and/or
#  modify it under the terms of the GNU Lesser General Public
#  License as published by the Free Software Foundation; either
#  version 2 of the License, or (at your option) any later version.
#
#  This library is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
#  Lesser General Public License for more details.
#
#  You should have received a copy of the GNU Lesser General Public
#  License along with this library; if not, write to the Free Software
#  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
#  $Id: arbsql.rb,v 0.1 2007/04/06 13:04:28 k Exp $
#

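begin
  require 'active_record'
rescue LoadError
  # fall back to RubyGems when active_record is not on the load path
  require 'rubygems'
  require 'active_record'
end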
require 'bio/sequence'
require 'bio/feature'

module Bio
  class BioSQL

    def initialize(adapter, database, username)
      ActiveRecord::Base.establish_connection(:adapter => adapter,
                                              :database => database,
                                              :username => username)
      @adapter = adapter
      @database = database
      @username = username
      @connection = ActiveRecord::Base.connection()
    end

    def close
      # returns the info needed to re-establish a new connection
      ActiveRecord::Base.remove_connection()
    end

    class Biodatabase < ActiveRecord::Base
      set_table_name "biodatabase"
      set_sequence_name "biodatabase_pk_seq"
      set_primary_key  "biodatabase_id"
      has_many :bioentry

      # required for creation:
      # name
    end

    class Bioentry < ActiveRecord::Base
      set_table_name "bioentry"
      set_sequence_name "bioentry_pk_seq"
      set_primary_key "bioentry_id"
      belongs_to :biodatabase
      belongs_to :taxon
      has_one :biosequence
      has_one :comment
      has_many :seqfeature
      has_many :bioentry_reference
      has_many :bioentry_dbxref

      # required for creation:
      # name, accession, version
    end
    
    class ObjectBioentry < Bioentry
    end
    class SubjectBioentry < Bioentry
    end
    
    class BioentryReference < ActiveRecord::Base
      set_table_name "bioentry_reference"
      set_sequence_name nil
      set_primary_key nil
      belongs_to :bioentry
      belongs_to :reference
    end
    
    class BioentryDbxref < ActiveRecord::Base
      set_table_name "bioentry_dbxref"
      set_sequence_name nil
      set_primary_key nil #bioentry_id,dbxref_id
      belongs_to :bioentry
      belongs_to :dbxref
    end

    class BioentryPath < ActiveRecord::Base
      set_table_name "bioentry_path"
      set_primary_key nil
      set_sequence_name nil
      belongs_to :term
      belongs_to :object_bioentry, :foreign_key=>"object_bioentry_id"
      belongs_to :subject_bioentry, :foreign_key=>"subject_bioentry_id"
    end
    
    class BioentryQualifierValue < ActiveRecord::Base
      # TODO: complete
    end
    
    class BioentryRelationship < ActiveRecord::Base
      # TODO: complete
    end
    
    class Biosequence < ActiveRecord::Base
      set_table_name "biosequence"
      set_sequence_name "biosequence_pk_seq"
      set_primary_key "bioentry_id"
      belongs_to :bioentry
    end
    
    class Comment < ActiveRecord::Base
      set_table_name "comment"
      set_sequence_name "comment_pk_seq"
      set_primary_key "comment_id"
      belongs_to :bioentry
    end
    
    class Ontology < ActiveRecord::Base
      set_table_name "ontology"
      set_sequence_name "ontology_pk_seq"
      set_primary_key "ontology_id"
      has_many :term
      has_many :term_path
      has_many :term_relationship
    end

    class Term < ActiveRecord::Base
      set_table_name "term"
      set_sequence_name "term_pk_seq"
      set_primary_key "term_id"
      belongs_to :ontology
      has_many :seqfeature_qualifier_value
      has_many :dbxref_qualifier_value
      has_many :location
      has_many :seqfeature_relationship
      has_many :term_dbxref
      has_many :term_relationship_term
      has_many :term_synonym
      has_many :location_qualifier_value
      has_many :seqfeature
      has_many :term_path
      has_many :term_relationship
    end

    # Created to satisfy the relations inside Seqfeature; I have not found
    # another way to define this.
    # The problem is how to distinguish between a type term and a source term.
    class Typeterm < Term
    end
    # check these relations against Term
    class Sourceterm < Term
    end

    class ObjectTerm < Term
    end

    class SubjectTerm < Term
    end

    class PredicateTerm < Term
    end

    class Seqfeature < ActiveRecord::Base
      set_table_name "seqfeature"
      set_sequence_name "seqfeature_pk_seq"
      set_primary_key "seqfeature_id"
      belongs_to :bioentry
      belongs_to :typeterm, :foreign_key=>"type_term_id"
      belongs_to :sourceterm, :foreign_key=>"source_term_id"
      has_many :seqfeature_dbxref
      has_many :dbxref
      has_many :seqfeature_qualifier_value
      has_many :location
      has_many :seqfeature_path
      has_many :seqfeature_relationship
    end

    # The class names must match the association names used below
    # (:object_seqfeature and :subject_seqfeature).
    class ObjectSeqfeature < Seqfeature
    end

    class SubjectSeqfeature < Seqfeature
    end

    class SeqfeatureDbxref < ActiveRecord::Base
      set_table_name "seqfeature_dbxref"
      set_primary_key nil #seqfeature_id, dbxref_id
      set_sequence_name nil
      belongs_to :seqfeature
      belongs_to :dbxref
    end

    class SeqfeaturePath < ActiveRecord::Base
      set_table_name "seqfeature_path"
      set_primary_key nil
      set_sequence_name nil
      belongs_to :object_seqfeature
      belongs_to :subject_seqfeature
    end

    class SeqfeatureQualifierValue < ActiveRecord::Base
      set_table_name "seqfeature_qualifier_value"
      set_primary_key nil #seqfeature_id, term_id, rank
      set_sequence_name nil
      belongs_to :seqfeature
      belongs_to :term
    end

    class SeqfeatureRelationship < ActiveRecord::Base
      set_table_name "seqfeature_relationship"
      set_primary_key "seqfeature_relationship_id"
      set_sequence_name "seqfeature_relationship_pk_seq"
      belongs_to :term
      belongs_to :object_seqfeature
      belongs_to :subject_seqfeature
    end

    class Dbxref < ActiveRecord::Base
      set_table_name "dbxref"
      set_primary_key "dbxref_id"
      set_sequence_name "dbxref_pk_seq"
      has_many :dbxref_qualifier_value
      has_many :location
      has_many :reference #probably is a "has_one" rel.
      has_many :term_dbxref
      has_many :bioentry_dbxref
    end

    class DbxrefQualifierValue < ActiveRecord::Base
      set_table_name "dbxref_qualifier_value"
      set_primary_key nil #dbxref_id, term_id, rank
      set_sequence_name nil
      belongs_to :dbxref
      belongs_to :term
    end

    class Location < ActiveRecord::Base
      set_table_name "location"
      set_sequence_name "location_pk_seq"
      set_primary_key "location_id"
      belongs_to :seqfeature
      belongs_to :dbxref
      belongs_to :term
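      has_many :location_qualifier_value

      def to_s
        # only strand -1 is the complementary strand; an unknown strand
        # (0 or nil) is printed like the forward strand
        if self.strand == -1
          str = "complement(" + self.start_pos.to_s + ".." + self.end_pos.to_s + ")"
        else
          str = "(" + self.start_pos.to_s + ".." + self.end_pos.to_s + ")"
        end
        return str
      end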
       
    end

    class LocationQualifierValue <  ActiveRecord::Base
      set_table_name "location_qualifier_value"
      set_primary_key nil #location_id, term_id
      set_sequence_name nil
      belongs_to :location
      belongs_to :term
    end

    class Reference < ActiveRecord::Base
      set_table_name "reference"
      set_primary_key "reference_id"
      set_sequence_name "reference_pk_seq"
      belongs_to :dbxref
      has_many :bioentry_reference
    end

    class Taxon < ActiveRecord::Base
      set_table_name "taxon"
      set_primary_key "taxon_id"
      set_sequence_name "taxon_pk_seq"
      has_many :taxon_name #probably has_one
      has_one :bioentry
    end

    class TaxonName < ActiveRecord::Base
      set_table_name "taxon_name"
      set_primary_key nil
      set_sequence_name nil
      belongs_to :taxon
    end

    class TermDbxref < ActiveRecord::Base
      set_table_name "term_dbxref"
      set_primary_key nil #term_id, dbxref_id
      set_sequence_name nil
      belongs_to :term
      belongs_to :dbxref
    end

    class TermPath < ActiveRecord::Base
      set_table_name "term_path"
      set_primary_key "term_path_id"
      set_sequence_name "term_path_pk_seq"
      belongs_to :ontology
      belongs_to :subject_term
      belongs_to :object_term
      belongs_to :predicate_term

    end

    class TermRelationship < ActiveRecord::Base
      set_table_name "term_relationship"
      set_primary_key "term_relationship_id"
      set_sequence_name "term_relationship_pk_seq"
      belongs_to :ontology
      belongs_to :subject_term
      belongs_to :predicate_term
      belongs_to :object_term
      has_one :term_relationship_term
    end

    class TermRelationshipTerm < ActiveRecord::Base
      set_table_name "term_relationship_term"
      set_primary_key "term_relationship_id"
      set_sequence_name nil
      belongs_to :term_relationship
      belongs_to :term
    end

    class TermSynonym < ActiveRecord::Base
      set_table_name "term_synonym"
      set_primary_key nil #term_id, synonym
      set_sequence_name nil
      belongs_to :term
    end

    # The commented-out functions should be made private. Only fetch (or,
    # better, a few functions) should be public.
    # I should set up public functions to explore the database.
    # The only object the user should see is the Sequence object.
    # In theory this class could be split, keeping only the BioSQL
    # definitions here and the BioSQL sequence definition separate.

    def find_bioentry_by_biodatabase_id(id)
      # the column is biodatabase_id; find_all_by_* returns an Array
      return Bioentry.find_all_by_biodatabase_id(id)
    end

#    def find_all_biodatabase
#     return Biodatabase.find_all
#    end

    def fetch_accession(accession)
      # Maybe the check that the lookup returned an entry should be done
      # here, rather than when creating the object with a nil parameter.
      Sequence.new(self, Bioentry.find_by_accession(accession))
    end

    def fetch_id(id)
      Sequence.new(self, Bioentry.find(id))
    end

    class Sequence
      private

      def get_seqfeature(sf)
        # in seqfeature BioSQL class
        qualifiers = sf.seqfeature_qualifier_value.collect { |sfqv|
          Bio::Feature::Qualifier.new(sfqv.term.name, sfqv.value)
        }
        Bio::Feature.new(sf.typeterm.name, sf.location.to_s, qualifiers)
      end

      def length=(len)
        @entry.biosequence.length = len
      end

      public

      def initialize(db, entry)
        @db = db unless entry.nil?
        @entry = entry unless entry.nil?
      end
	
      def bioentry_id
        @entry.bioentry_id
      end

      def name
        @entry.name
      end

      def name=(value)
        @entry.name = value
      end

      def accession
        @entry.accession
      end

      def accession=(value)
        @entry.accession = value
      end

      def taxon_id
        @entry.taxon_id
      end

      def database
        @entry.biodatabase.name
      end

      def database_desc
        @entry.biodatabase.description
      end

      def version
        @entry.version
      end

      def version=(value)
        @entry.version = value
      end

      def division
        @entry.division
      end

      def division=(value)
        @entry.division = value
      end

      def description
        @entry.description
      end

      def description=(value)
        @entry.description = value
      end

      def identifier
        @entry.identifier
      end

      def identifier=(value)
        @entry.identifier = value
      end

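      def features
        # find_feature_by_entry is not defined in this file, so the entry's
        # seqfeature association is used instead; get_seqfeature is private
        # and must be called without an explicit receiver.
        Bio::Features.new(@entry.seqfeature.collect { |sf| get_seqfeature(sf) })
      end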

      def seq
        case @entry.biosequence.alphabet
        when /.na/i        # 'dna' or 'rna'
          Bio::Sequence::NA.new(@entry.biosequence.seq)
        when /protein/i    # 'protein'
          Bio::Sequence::AA.new(@entry.biosequence.seq)
        when nil
          nil
        end
      end

      def seq=(value)
        # TODO: check which type of alphabet this is (NU/NA/nil)
        # Could value be nil? I think not.
        @entry.biosequence.seq = value
        self.length = value.length
      end
      
      def length 
        @entry.biosequence.length
      end

      def references
        # Returns an array of hashes; each hash has these keys: ["title",
        # "dbxref_id", "reference_id", "authors", "crc", "location"].
        # It would probably be better to create a Reference class to collect
        # this information.
        @entry.bioentry_reference.collect { |item| item.reference.attributes }
      end

      def comment
        @entry.comment.comment_text
      end
	

      def save
        # TODO: add checks for SQL errors
        @entry.biosequence.save
        @entry.save
      end

      def to_fasta
        print ">" + accession + "\n"
        print seq.gsub(Regexp.new(".{1,#{60}}"), "\\0\n")
      end
    end

  end
end
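
A comment in Sequence#references above suggests that a dedicated class might be better than raw hashes. Here is a minimal sketch of that idea in plain Ruby. ReferenceInfo and its from_attributes helper are hypothetical names, not part of the posted code; the fields simply mirror the hash keys listed in that comment.

```ruby
# Hypothetical value object; the field names follow the hash keys
# "title", "dbxref_id", "reference_id", "authors", "crc", "location".
ReferenceInfo = Struct.new(:reference_id, :dbxref_id, :title,
                           :authors, :location, :crc) do
  # Build an instance from an attributes hash with String keys,
  # such as the one returned by an Active Record model's attributes.
  # Keys missing from the hash become nil.
  def self.from_attributes(attrs)
    new(*members.map { |m| attrs[m.to_s] })
  end
end

# Example with made-up values (no database involved).
ref = ReferenceInfo.from_attributes("title" => "A title",
                                    "authors" => "Doe J.",
                                    "reference_id" => 1)
puts ref.title
puts ref.authors
```

With such a class, references could map each attributes hash through ReferenceInfo.from_attributes, so callers get named accessors instead of string keys.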




=begin

db=Bio::BioSQL.new('postgresql','discovery','febo')



Print the features of a sequence:
 s=db.fetch_id(7176)

s.features.each{|x| x.qualifiers.each {|y| puts x.feature+' '+x.position+"\t"+y.qualifier+'-->'+y.value}}

A very compact version, for when the ids are supplied in an array:
 h=[7294,7176,7094,7247,7294]

 h.each{|k| puts k; db.fetch_id(k).features.each{|x| x.qualifiers.each {|y| puts x.feature+' '+x.position+"\t"+y.qualifier+'-->'+y.value}}}

To create a sequence, build these first, in this order: qualifier, feature,
features, sequence, genbank.
=end
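
The two fetch methods in the module behave differently when nothing matches: in Active Record, find_by_accession returns nil, while find(id) raises ActiveRecord::RecordNotFound. The sketch below shows one possible way to guard the accession lookup, so that a wrapper is never built around a nil entry. FakeEntry, FakeStore and SequenceWrapper are hypothetical stand-ins so that the example runs without a database; they are not part of the posted code.

```ruby
# Hypothetical stand-in for a bioentry row.
FakeEntry = Struct.new(:bioentry_id, :accession)

# Hypothetical stand-in for the model class; it mimics a finder that
# returns nil when no row matches.
class FakeStore
  def initialize(entries)
    @by_accession = {}
    entries.each { |e| @by_accession[e.accession] = e }
  end

  def find_by_accession(accession)
    @by_accession[accession]
  end
end

# Hypothetical stand-in for the Sequence wrapper class.
class SequenceWrapper
  attr_reader :entry

  def initialize(entry)
    @entry = entry
  end
end

# Return a wrapper, or nil when the accession is unknown, so callers
# can test the result instead of failing later on a nil entry.
def fetch_accession(store, accession)
  entry = store.find_by_accession(accession)
  entry.nil? ? nil : SequenceWrapper.new(entry)
end

store = FakeStore.new([FakeEntry.new(7176, "X00001")])
found = fetch_accession(store, "X00001")
missing = fetch_accession(store, "NOPE")
puts found.entry.bioentry_id
puts missing.inspect
```

Returning nil is only one option; raising an exception with a descriptive message would make the two fetch methods fail in a more uniform way.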

=begin

= Bio::BioSQL

--- Bio::BioSQL.new(adapter = 'postgresql', database = 'yourdbname', username = 'yourdbusername')

--- Bio::BioSQL#close -> Arguments for establish_connection in ActiveRecord::Base

--- Bio::BioSQL#fetch_accession(accession)

      Returns Bio::BioSQL::Sequence object.

--- Bio::BioSQL#fetch_id(bioentry_id)

      Returns Bio::BioSQL::Sequence object.

--- Bio::BioSQL#find_bioentry_by_biodatabase_id(id)
      
      Returns Array of Bio::BioSQL::Bioentry selected by biodatabase_id.
      Not very useful as it is; I think it would be more useful to return an
      array of sequences.

== Bio::BioSQL::Sequence

--- Bio::BioSQL::Sequence.new(db, entry)

--- Bio::BioSQL::Sequence#bioentry_id -> Integer
--- Bio::BioSQL::Sequence#name -> String
--- Bio::BioSQL::Sequence#accession -> String
--- Bio::BioSQL::Sequence#definition -> String
--- Bio::BioSQL::Sequence#comment -> String

      Returns the first comment.  For complete comments, use the comments
      method.
      Note: I still have to test this. The non-AR version had a comments
      method, which I have not implemented yet.

--- Bio::BioSQL::Sequence#description -> String

--- Bio::BioSQL::Sequence#database -> String

      Returns the name of the biodatabase associated with the sequence.

--- Bio::BioSQL::Sequence#database_desc -> String
      
      Returns the description of the biodatabase associated with the sequence.

--- Bio::BioSQL::Sequence#date -> String

      NOT IMPLEMENTED

--- Bio::BioSQL::Sequence#division -> String

      NOT IMPLEMENTED

--- Bio::BioSQL::Sequence#length -> Integer
      Returns the length of the sequence stored into the db.


--- Bio::BioSQL::Sequence#features

      Returns Bio::Features object.

--- Bio::BioSQL::Sequence#references -> Array

      Returns reference information as an Array of Hash (not Bio::Reference).
      Each hash has these keys: ["title", "dbxref_id", "reference_id",
      "authors", "crc", "location"]

--- Bio::BioSQL::Sequence#identifier -> ?
      I can't remember. Sorry

--- Bio::BioSQL::Sequence#seq

      Returns Bio::Sequence::NA or AA object.

--- Bio::BioSQL::Sequence#subseq(from, to)

      NOT IMPLEMENTED. Returns Bio::Sequence::NA or AA object (by lazy
      fetching).

--- Bio::BioSQL::Sequence#taxon_id -> Integer

      Returns taxon_id of the sequence

--- Bio::BioSQL::Sequence#version -> String

--- Bio::BioSQL::Sequence#save 

      Save the sequence into the db
	
--- Bio::BioSQL::Sequence#to_fasta
      
      Prints the sequence in FASTA format on standard output.

=end
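
As documented, to_fasta prints straight to standard output. A variant that returns the record as a String may be easier to test or to write to a file. This is a minimal sketch under that assumption; fasta_record and its width parameter are hypothetical names, and the default of 60 columns follows the posted code.

```ruby
# Build a FASTA record as a String instead of printing it.
# accession: identifier placed after ">" on the header line
# seq:       the sequence (anything that responds to to_s)
# width:     maximum number of residues per line
def fasta_record(accession, seq, width = 60)
  # split the sequence into chunks of at most +width+ characters
  lines = seq.to_s.scan(/.{1,#{width}}/)
  ">#{accession}\n" + lines.map { |line| line + "\n" }.join
end

# Example with a made-up accession and sequence.
puts fasta_record("X00001", "acgt" * 20, 30)
```

The caller then decides where the text goes, for example print fasta_record(...) or a write to an open file.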


-- 
Ra


