[BioRuby] A Rails application with BioRuby

Thu Dec 20 07:41:12 UTC 2007

Hi Yen-Ju,

On 2007/12/19, at 6:54, Yen-Ju Chen wrote:

> Hi,
>  I am working on a rails application using BioRuby to collect references
> and database entries.
>  You can find the application (not source code yet) at
> journalclub.reciprocallattice.com

Cool.

>  It is still at early stage. I use it personally and figure it would be
> interesting to have more users.
>  If you want to join, please write to me in private so that it will not
> pollute BioRuby maillist.
>  I don't know how many users the application can take. Please see the
> website for more details.
>
>  These are things related to BioRuby,
>  * The output from Reference to BibTex format lacks abstract.
>  * It would be nice to be able to output to RIS format for EndNote and
> ReferenceManager.

If you could provide a patch for them, I'll include it in BioRuby.
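
As a starting point for such a patch, here is a minimal sketch of RIS output in plain Ruby. It does not use any BioRuby class: the method name to_ris, the hash keys, and the sample values are all hypothetical, and only a small subset of RIS tags is covered, so please check the tag list against the RIS specification before relying on it.

```ruby
# Hypothetical helper: turn a hash describing one reference into RIS text.
# The hash keys (:title, :authors, ...) are assumptions for this sketch.
def to_ris(ref)
  lines = ['TY  - JOUR']                       # record type: journal article
  Array(ref[:authors]).each { |a| lines << "AU  - #{a}" }
  {
    title: 'TI', journal: 'JO', year: 'PY', volume: 'VL',
    pages: 'SP', abstract: 'AB', doi: 'DO'
  }.each do |key, tag|
    lines << "#{tag}  - #{ref[key]}" if ref[key]   # skip missing fields
  end
  lines << 'ER  - '                            # end of record
  lines.join("\n") + "\n"
end

# Placeholder data, not a real reference.
puts to_ris({ authors: ['Doe, J.'], title: 'Example title',
              year: 2007, abstract: 'Example abstract.' })
```

The abstract is written with the AB tag, so the same field could also be carried over to the BibTeX output.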

>  * Is it possible to get DOI from PubMed ?

  entry = Bio::PubMed.query(16946072)
  doi = entry[/AID - (\S+) \[doi\]/, 1]

Or you can extend the Bio::MEDLINE class to add a doi method:

  class Bio::MEDLINE
    attr_reader :pubmed

    def doi
      @pubmed['AID'][/(\S+) \[doi\]/, 1]
    end
  end

  entry = Bio::PubMed.query(16946072)
  medline = Bio::MEDLINE.new(entry)
  doi = medline.doi
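
If you want to try the pattern without a network connection, the following sketch applies the same kind of regular expression to a hand-written MEDLINE fragment. The method name extract_doi is hypothetical, and the exact layout of the sample lines is an assumption based on the regular expression above.

```ruby
# Return the first article ID tagged [doi], or nil when there is none.
# Hypothetical helper for illustration; it is not part of BioRuby.
def extract_doi(medline_text)
  medline_text[/^AID\s*- (\S+) \[doi\]/, 1]
end

# Hand-written fragment (assumed layout), not a live PubMed response.
sample = <<~TEXT
  PMID- 16946072
  AID - 313/5791/1295 [pii]
  AID - 10.1126/science.1131542 [doi]
TEXT

puts extract_doi(sample)                    # the [doi] line, not the [pii] one
p extract_doi("PMID- 16946072\n")           # nil when no DOI is present
```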

Or use the XML format of the PubMed output:

  entry_xml = Bio::PubMed.efetch(16946072, {"retmode" => "xml"})

           :
        <ArticleIdList>
            <ArticleId IdType="pii">313/5791/1295</ArticleId>
            <ArticleId IdType="doi">10.1126/science.1131542</ArticleId>
            <ArticleId IdType="pubmed">16946072</ArticleId>
        </ArticleIdList>
           :

Then extract the DOI:

  require 'rexml/document'
  pubmed = REXML::Document.new(entry_xml)
  # .text returns a String (get_text would return a REXML::Text node)
  doi = pubmed.elements['//ArticleId[@IdType="doi"]'].text
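
The XML route can be checked offline as well. This is a sketch that feeds the ArticleIdList fragment to REXML as a string; the method name doi_from_xml is hypothetical, and it returns nil for entries that have no DOI instead of raising an error.

```ruby
require 'rexml/document'

# Return the DOI as a String from PubMed XML, or nil if no DOI is listed.
# Hypothetical helper for illustration; it is not part of BioRuby.
def doi_from_xml(xml)
  doc  = REXML::Document.new(xml)
  node = doc.elements['//ArticleId[@IdType="doi"]']
  node && node.text
end

# Fragment of the efetch output, pasted in as a string.
sample_xml = <<~XML
  <ArticleIdList>
    <ArticleId IdType="pii">313/5791/1295</ArticleId>
    <ArticleId IdType="doi">10.1126/science.1131542</ArticleId>
    <ArticleId IdType="pubmed">16946072</ArticleId>
  </ArticleIdList>
XML

puts doi_from_xml(sample_xml)
```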

>  * BioRuby can get information from many databases through biofetch,
>    but not processing them, like Pfam, Prosite, etc.

You can process them with the corresponding parser classes. For example,

  cyclins = Bio::Fetch.query('prosite', 'PS00292')
  prosite = Bio::PROSITE.new(cyclins)

  prosite.entry_id
  # ==> "PS00292"

  prosite.definition
  # ==> "Cyclins signature."

  prosite.pattern
  # ==> "R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW]."

  prosite.re
  # ==> /R.{2}[LIVMSA].{2}[FYWS][LIVM].{8}[LIVMFC].{4}[LIVMFYA].{2}[STAGC][LIVMFYQ].[LIVMFYC][LIVMFY]D[RKH][LIVMFYW]/i
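
To show roughly what happens between the pattern and the regular expression, here is a simplified sketch of such a conversion in plain Ruby. It is not BioRuby's own implementation: prosite_to_regexp is a hypothetical name, and it only handles the common elements (x, repetition counts, [..] and {..} residue sets, and the < and > anchors at the ends of the pattern).

```ruby
# Hypothetical, simplified PROSITE-pattern-to-Regexp converter.
def prosite_to_regexp(pattern)
  source = pattern.strip.sub(/\.\z/, '')                    # drop the trailing period
  source = source.gsub('-', '')                             # remove element separators
  source = source.gsub(/\{([A-Z]+)\}/) { "[^#{$1}]" }       # {ABC}: residues not allowed
  source = source.gsub(/\((\d+(?:,\d+)?)\)/) { "{#{$1}}" }  # (n) or (n,m): repetition
  source = source.tr('x', '.')                              # x: any residue
  source = source.sub(/\A</, '^').sub(/>\z/, '$')           # N- and C-terminal anchors
  Regexp.new(source, Regexp::IGNORECASE)
end

cyclin = 'R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-' \
         '[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].'

p prosite_to_regexp(cyclin)
```

For the cyclin pattern this should give the same expression as prosite.re above; for anything beyond a quick test, prosite.re is the safer choice.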

>  * it is not clear what's the database from biofetch, for example: rn, rp,
> str, pr.
>    I am in structural biology. Many of these abbreviation is not obvious.

In BioRuby, the default BioFetch server is implemented as a proxy for the DBGET system through the KEGG API.
For the meaning of each database abbreviation, please refer to the abbreviation field in the DBGET manual at

  http://www.genome.jp/dbget/

Also note that the DBGET service for the GenBank (gb) database is no longer available.

Regards,
Toshiaki Katayama