[GSoC] GSoC 2014 queries and inputs

Fields, Christopher J cjfields at illinois.edu
Tue Mar 18 08:08:05 EDT 2014


htslib also has VCF support. One advantage there is that support for additional formats could be added at some point. I'm not sure how the community at large views it, though...

Chris


> On Mar 18, 2014, at 4:31 AM, "Francesco Strozzi" <francesco.strozzi at gmail.com> wrote:
> 
> Hi Ujjwal,
> consider that BioRuby itself is only 100% compatible with CRuby and almost
> fully compatible with JRuby (there are a few libraries which do not work).
> The idea here should be to provide a higher-level interface to manage and
> query VCF data, so my advice is to try not to spend too much time on
> parsing issues and instead reuse existing code and libraries. I think we
> can live with a JRuby-only implementation, since you also proposed to use
> Neo4j, and the possibility of packing everything in a jar may sound
> tempting in the end :).
> But if you would like to implement something that can work across multiple
> Ruby implementations, I think there are two ways:
> 1) you can write a simple parser in plain Ruby. VCF files are just TSV
> files, so it's pretty straightforward. But implementing a solid parser
> which can handle every aspect of the information stored in VCF files will
> still require some time and testing.
> 2) you can look at existing C libraries and write a binding using the Ruby
> FFI. This extension will be usable by both CRuby and JRuby. If this sounds
> interesting, I would suggest looking into VCFLIB (
> https://github.com/ekg/vcflib).
> 
> In the end these options may sound like GSoC projects on their own, so if
> you would like to follow one or the other, I suggest you try to balance
> this work with the rest of the things to do on the project, in order to
> build a solid work plan.
> 
> All the best.
> Francesco
> 
> 
> On Mon, Mar 17, 2014 at 9:37 PM, Ujjwal Thaakar <ujjwalthaakar at gmail.com> wrote:
> 
>> If it's fine to have a JRuby-only implementation then we can definitely
>> write a thin wrapper over Picard.
>> 
>> 
>>> On 18 March 2014 01:56, Ujjwal Thaakar <ujjwalthaakar at gmail.com> wrote:
>>> 
>>> When we say BioRuby I think it should work with Ruby - CRuby, JRuby,
>>> Rubinius etc. I'm not sure it's a good idea to constrain people to JRuby!
>>> 
>>> 
>>> On 18 March 2014 01:48, Francesco Strozzi <francesco.strozzi at gmail.com> wrote:
>>> 
>>>> I don't think it's necessary.  If you would like to use JRuby, there is
>>>> the Picard API ( http://picard.sourceforge.net ) which you can reuse
>>>> right away. It's fast and well tested.
>>>> 
>>>> All the best.
>>>> Francesco
>>>> On 17 Mar 2014 20:38, "Ujjwal Thaakar" <ujjwalthaakar at gmail.com>
>>>> wrote:
>>>> 
>>>>> Would we have to write a new VCF parser in Ruby?
>>>>> 
>>>>> 
>>>>> 
>>>>>> On 15 March 2014 17:33, Ujjwal Thaakar <ujjwalthaakar at gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> My name is Ujjwal. I'm a 21-year-old student from India and I'm
>>>>>> interested in contributing to BioRuby this year. I have some queries
>>>>>> regarding the project idea listed.
>>>>>> 
>>>>>>   1. Can you give me some more use cases for this tool, and some
>>>>>>   specific functional requirements we'd like to see? What we need to
>>>>>>   mine determines the data structure of our persistence layer and
>>>>>>   therefore which database engine to use.
>>>>>>   2. When you say a RESTful API, we want to deploy this on a server
>>>>>>   with a backing database, together with a Ruby gem that communicates
>>>>>>   with the API, right? And I presume we also want people to be able to
>>>>>>   compare our hosted VCF files with their local VCF files?
>>>>>>   3. Although this is a *BioRuby* project, the server doesn't
>>>>>>   necessarily need to be written in Ruby, I presume? As is mentioned,
>>>>>>   Scala or JRuby could be used. I would suggest we have a look at Go
>>>>>>   too.
>>>>>> 
>>>>>> To give you some background about me: I was a GSoC intern last year
>>>>>> for Ruby on Rails, where I implemented a RESTful collection routing
>>>>>> API. I am an intermediate Ruby programmer. I have also been interested
>>>>>> in synthetic biology for about a year now and have some lab experience
>>>>>> too, so I understand the basics of biology and specifically genetic
>>>>>> engineering. I am a computer science undergrad and have taken a course
>>>>>> on data engineering too. I also have experience working with REST APIs
>>>>>> and am building one right now for my startup.
>>>>>> 
>>>>>> I have been thinking about the database. I think Neo4j will be a great
>>>>>> fit. It's not heavy like Oracle and does not need installation. It's
>>>>>> portable and can be started and stopped easily on the machine. It has a
>>>>>> low memory footprint and support for SPARQL too, although its native
>>>>>> query language Cypher will do the trick for us right now. We can run
>>>>>> embedded instances too using JRuby, which are super fast. I'm the
>>>>>> maintainer of the most popular Neo4j Ruby bindings and am also in the
>>>>>> process of rewriting the next version of neo4j-core. It will allow us
>>>>>> to make all sorts of queries and do data mining at an incredible speed
>>>>>> while being incredibly portable and light. All logic can then reside
>>>>>> within the gem itself and we do not need any backend. It should be
>>>>>> fast enough since we'll be dealing directly with Java objects made
>>>>>> available through JRuby. I have a fair idea of how fast this is, and
>>>>>> it's really fast, although working with such huge files will have
>>>>>> different challenges. We don't need a separate database install for
>>>>>> the embedded version. All we need is jars, which fortunately are
>>>>>> available as a gem, so all we have to do is include them as
>>>>>> dependencies and our database is ready! I don't think it will be this
>>>>>> easy for any other DB while giving us the same speed, power and
>>>>>> capabilities!
>>>>>> 
>>>>>> I've started working on the proposal and will upload it in a couple of
>>>>>> days for your feedback. This is going to be incredibly fun :)
>>>>>> 
>>>>>> BTW what is the user base of BioRuby like? What does it lack compared
>>>>>> to other bio libraries like Biopython?
>>>>>> 
>>>>>> How much biology do I need to understand for this project, or will I
>>>>>> learn as we go along?
>>>>>> 
>>>>>> --
>>>>>> Thanks
>>>>>> Ujjwal
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Thanks
>>>>> Ujjwal
>>>>> _______________________________________________
>>>>> GSoC mailing list
>>>>> GSoC at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/gsoc
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Thanks
>>> Ujjwal
>>> 
>> 
>> 
>> 
>> --
>> Thanks
>> Ujjwal
>> 
> 
> 
> 
> -- 
> 
> Francesco Strozzi
> _______________________________________________
> GSoC mailing list
> GSoC at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/gsoc
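
Option 1 in Francesco's message (a simple parser in plain Ruby) could start
from something like the sketch below. It is only an illustration under stated
assumptions: the names VcfRecord, parse_info and parse_vcf_line are
hypothetical and not part of BioRuby or any existing gem, and it handles only
the eight fixed tab-separated columns of a data line (CHROM, POS, ID, REF, ALT,
QUAL, FILTER, INFO), skipping header lines and ignoring genotype columns.

```ruby
# Hypothetical sketch: VcfRecord, parse_info and parse_vcf_line are
# illustrative names, not an existing BioRuby API.
VcfRecord = Struct.new(:chrom, :pos, :id, :ref, :alt, :qual, :filter, :info)

# Turn "NS=3;DP=14;DB" into {"NS"=>"3", "DP"=>"14", "DB"=>true}.
# Values are kept as strings; flag entries (no "=") become true.
def parse_info(field)
  return {} if field.nil? || field == "."
  field.split(";").each_with_object({}) do |entry, info|
    key, value = entry.split("=", 2)
    info[key] = value.nil? ? true : value
  end
end

# Parse one VCF line. Returns nil for header lines (starting with "#").
def parse_vcf_line(line)
  return nil if line.start_with?("#")
  cols = line.chomp.split("\t")
  raise ArgumentError, "expected at least 8 columns, got #{cols.size}" if cols.size < 8
  chrom, pos, id, ref, alt, qual, filter, info = cols
  VcfRecord.new(
    chrom,
    Integer(pos, 10),                 # base 10, so leading zeros are not read as octal
    id == "." ? nil : id,             # "." means missing
    ref,
    alt.split(","),                   # multiple ALT alleles are comma-separated
    qual == "." ? nil : Float(qual),
    filter.split(";"),
    parse_info(info)
  )
end
```

A record could then be read with, for example,
parse_vcf_line("20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;DB").
As Francesco notes, a solid parser would still need to cover the meta-information
header, typed INFO values, and per-sample genotype fields, which this sketch
deliberately leaves out.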


