[GSoC] GSoC 2014 queries and inputs

Ujjwal Thaakar ujjwalthaakar at gmail.com
Mon Mar 17 15:37:19 EDT 2014


Would we have to write a new VCF parser in Ruby?
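Whether or not an existing parser can be reused, the core of reading VCF data lines is small. Below is a minimal sketch, assuming the tab-separated VCF 4.x layout (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, then optional FORMAT and sample columns). `VcfRecord` and `parse_vcf_line` are hypothetical names, not BioRuby API. The sketch skips header lines, splits multi-allelic ALT fields and turns INFO into a hash. It does not validate fields against the header or decode per-sample data.

```ruby
# Hypothetical sketch: parse one VCF 4.x data line using only the stdlib.
# VcfRecord and parse_vcf_line are illustrative names, not BioRuby API.
VcfRecord = Struct.new(:chrom, :pos, :id, :ref, :alt, :qual,
                       :filter, :info, :format, :samples)

# INFO is a ";"-separated list of KEY=VALUE pairs or bare flags.
def parse_info(field)
  return {} if field == "."

  field.split(";").each_with_object({}) do |entry, info|
    key, value = entry.split("=", 2)
    info[key] = value.nil? ? true : value # flags have no "=value"
  end
end

def parse_vcf_line(line)
  line = line.chomp
  # "##" meta lines and the "#CHROM" column header are not records.
  return nil if line.empty? || line.start_with?("#")

  cols = line.split("\t")
  raise ArgumentError, "expected >= 8 columns, got #{cols.size}" if cols.size < 8

  VcfRecord.new(
    cols[0],                                  # CHROM
    Integer(cols[1], 10),                     # POS (1-based)
    cols[2] == "." ? nil : cols[2],           # ID
    cols[3],                                  # REF
    cols[4] == "." ? [] : cols[4].split(","), # ALT alleles
    cols[5] == "." ? nil : Float(cols[5]),    # QUAL
    cols[6],                                  # FILTER
    parse_info(cols[7]),                      # INFO as a Hash
    cols[8],                                  # FORMAT (nil if absent)
    cols.drop(9)                              # raw per-sample columns
  )
end

rec = parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB\n")
rec.pos          # => 14370
rec.alt          # => ["A"]
rec.info["DB"]   # => true
```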


On 15 March 2014 17:33, Ujjwal Thaakar <ujjwalthaakar at gmail.com> wrote:

> Hi,
> My name is Ujjwal. I'm a 21-year-old student from India and interested in
> contributing to BioRuby this year. I have some questions about the
> project idea listed.
>
>    1. Can you give me some more use cases for this tool, i.e. specific
>    functional requirements we'd like to see? What we need to mine determines
>    the data structure of our persistence layer and therefore which database
>    engine to use.
>    2. When you say a RESTful API, you mean we want to deploy this on a server
>    with a backing database, together with a Ruby gem that communicates with
>    the API, right? I presume we also want people to be able to compare our
>    hosted VCF files with their local VCF files.
>    3. Although this is a *BioRuby* project, I presume the server doesn't
>    necessarily need to be written in Ruby? As mentioned, Scala or JRuby could
>    be used. I would suggest we have a look at Go too.
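For comparing hosted VCF files with a user's local ones, one simple approach is to reduce each file to a set of variant keys and take set intersections and differences. The sketch below assumes a variant is identified by CHROM, POS, REF and a single ALT allele; `variant_keys` and `compare_variants` are hypothetical names, not an existing API.

```ruby
require "set"

# Hypothetical sketch: reduce VCF lines to a Set of variant keys.
# A key is [CHROM, POS, REF, ALT] with one entry per ALT allele, so a
# multi-allelic site contributes several keys.
def variant_keys(lines)
  lines.each_with_object(Set.new) do |line, keys|
    next if line.start_with?("#") || line.strip.empty? # headers, blanks

    chrom, pos, _id, ref, alt = line.chomp.split("\t")
    next if alt.nil? || alt == "." # no alternate allele: nothing to compare

    alt.split(",").each { |allele| keys << [chrom, Integer(pos, 10), ref, allele] }
  end
end

# Set algebra gives shared and file-specific variants.
def compare_variants(hosted_lines, local_lines)
  hosted = variant_keys(hosted_lines)
  local = variant_keys(local_lines)
  { shared: hosted & local, only_hosted: hosted - local, only_local: local - hosted }
end
```

Any Enumerable of lines works as input, e.g. `File.foreach("local.vcf")` for a hypothetical local file. Exact matching assumes both files describe variants the same way; indels written differently (for example, not left-aligned) would need normalizing before they compare as equal.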
>
> To give you some background about me: I was a GSoC intern last year for
> Ruby on Rails, where I implemented a RESTful collection routing API. I am an
> intermediate Ruby programmer. I have also been interested in synthetic
> biology for about a year now and have some lab experience too, so I
> understand the basics of biology and specifically genetic engineering. I am
> a computer science undergrad and have taken a course on data engineering
> too. I also have experience working with REST APIs and am building one
> right now for my startup.
>
> I have been thinking about the database. I think Neo4j will be a great fit.
> It's not heavy like Oracle and does not need a separate installation. It's
> portable, can be started and stopped easily on the machine, has a low memory
> footprint, and supports SPARQL too, although its native query language,
> Cypher, will do the trick for us right now. We can also run embedded
> instances using JRuby, which are super fast. I'm the maintainer of the most
> popular Neo4j Ruby bindings and am in the process of rewriting the next
> version of neo4j-core. It will allow us to make all sorts of queries and do
> data mining at incredible speed while being very portable and light. All
> logic can then reside within the gem itself, so we do not need any
> backend. It should be fast enough since we'll be dealing directly with
> Java objects made available through JRuby. I have a fair idea of how fast
> this is, and it's really fast, although working with such huge files will
> bring its own challenges. We don't need a separate database server for the
> embedded version: all we need are the jars, which fortunately are available
> as a gem, so all we have to do is include them as dependencies and our
> database is ready! I don't think any other database would make this as
> easy while giving us the same speed, power and capabilities!
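To make the graph-database argument concrete, here is one possible shape for the data: Sample nodes linked to Variant nodes by HAS_VARIANT relationships that carry the genotype. This is a hypothetical in-memory model in plain Ruby, not the neo4j-core API, and the class and method names are illustrative only. It shows which questions become simple traversals (samples carrying a variant, variants shared by two samples); in Neo4j the same questions would be Cypher MATCH patterns over these nodes and relationships.

```ruby
require "set"

# Hypothetical in-memory model of a property graph for variant calls:
#   (Sample)-[:HAS_VARIANT {genotype}]->(Variant)
# Plain Ruby for illustration only; this is not the neo4j-core API.
class VariantGraph
  Edge = Struct.new(:sample, :variant, :genotype)

  def initialize
    # Adjacency lists in both directions, like a graph store would keep.
    @edges_by_variant = Hash.new { |h, k| h[k] = [] }
    @edges_by_sample = Hash.new { |h, k| h[k] = [] }
  end

  # Records one call: sample -> variant, with the genotype on the edge.
  def add_call(sample, chrom, pos, ref, alt, genotype)
    variant = [chrom, pos, ref, alt]
    edge = Edge.new(sample, variant, genotype)
    @edges_by_variant[variant] << edge
    @edges_by_sample[sample] << edge
    self
  end

  # One hop: all samples linked to a given Variant node.
  def samples_with(chrom, pos, ref, alt)
    @edges_by_variant.fetch([chrom, pos, ref, alt], []).map(&:sample).uniq
  end

  # Two hops: Variant nodes reachable from both samples.
  def shared_variants(sample_a, sample_b)
    a = @edges_by_sample.fetch(sample_a, []).map(&:variant).to_set
    b = @edges_by_sample.fetch(sample_b, []).map(&:variant).to_set
    (a & b).to_a
  end
end

g = VariantGraph.new
g.add_call("NA00001", "20", 14370, "G", "A", "0|1")
g.add_call("NA00002", "20", 14370, "G", "A", "1|1")
g.samples_with("20", 14370, "G", "A")    # => ["NA00001", "NA00002"]
g.shared_variants("NA00001", "NA00002")  # => [["20", 14370, "G", "A"]]
```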
>
> I've started working on the proposal and will upload it in a couple of
> days for your feedback. This is going to be incredibly fun :)
>
> BTW, what is the user base of BioRuby like? What does it lack compared to
> other bio libraries like Biopython?
>
> How much biology do I need to understand for this project or will I learn
> as we go along?
>
> --
> Thanks
> Ujjwal
>



-- 
Thanks
Ujjwal

