[GSoC] GSoC 2014 BioRuby

Razvan Florea razvan.florea91 at gmail.com
Mon Mar 17 08:12:49 EDT 2014


Hi Francesco,

I followed your suggestions and updated the basic wrapper [1].
Please take a look and tell me whether it is what you expected, and
whether I should do anything else.

[1]: https://github.com/razvanflorea/picard-jruby-wrapper

Thank you,
Razvan


2014-03-17 10:14 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com>:

> Hi Razvan,
> have a look at the org.broadinstitute.variant.vcf package and the
> org.broadinstitute.variant.variantcontext.VariantContext class within
> the Picard API. Those are used to read from a VCF file; to write a
> VCF you also need the org.broadinstitute.variant.variantcontext.writer
> package.
>
> Hope this helps a bit. The docs are not very helpful in pointing out
> what every library does, so you will need to dig a bit on Google as well :-)
>
> All the best.
> Francesco
>
>
>
> On Sat, Mar 15, 2014 at 11:16 PM, Razvan Florea <razvan.florea91 at gmail.com
> > wrote:
>
>> Hi Francesco,
>>
>> I am trying to write the wrapper for Picard that you recommended.
>> I created a repository on GitHub at [1]. Right now the repository contains
>> a simple JRuby script that uses a Picard class to convert between
>> "vcf" and "bcf" files.
>>
>> I didn't find any classes for retrieving SNPs from VCF files. Could you
>> please give me some information about that?
>>
>> [1] https://github.com/razvanflorea/picard-jruby-wrapper
>>
>> Best,
>> Razvan
>>
>>
>> 2014-03-15 10:17 GMT+01:00 Francesco Strozzi <francesco.strozzi at gmail.com
>> >:
>>
>>> Hi Razvan,
>>>
>>> 1) I think having a client would be nice, of course, but I would not
>>> consider it critical. Building a client around a REST API is pretty
>>> straightforward in any language.
>>>
>>> 2) Yes, of course. Also look at the Picard (
>>> http://picard.sourceforge.net/) library. This is the low-level API for
>>> accessing VCF and other files, and GATK relies heavily on it to fetch the
>>> data out of raw files.
>>>
>>> 3) If you have some code on GitHub or another repo that you would like to
>>> show us, that's fine. Otherwise you could spend a bit of time writing a
>>> simple JRuby wrapper for Picard that accesses a VCF file and retrieves a
>>> list of SNPs. This could be a pet project to start wrapping your head
>>> around these libraries, while also spending some time with JRuby.
>>>
>>> All the best.
>>> Francesco
>>>
>>>
>>>
>>>
>>> On Fri, Mar 14, 2014 at 6:50 PM, Razvan Florea <
>>> razvan.florea91 at gmail.com> wrote:
>>>
>>>> Hello Francesco,
>>>>
>>>> 1. The queries will be made through HTTP requests (basically GET and
>>>> POST). But does the project also include writing a client for the web
>>>> service?
>>>> 2. I think using the GATK framework is absolutely necessary, because
>>>> even if we choose to use a database engine, the VCF files have to be
>>>> migrated to the database, which I think can be done with this framework.
>>>> Am I right?
>>>> 3. Meanwhile, do you think I can contribute somehow to show my skills
>>>> and my willingness to work on this project this summer?
>>>>
>>>> Best,
>>>> Razvan
>>>>
>>>>
>>>> 2014-03-14 14:43 GMT+01:00 Francesco Strozzi <
>>>> francesco.strozzi at gmail.com>:
>>>>
>>>>> Hi Razvan,
>>>>> the general idea is to have an interface which lets you run
>>>>> queries on top of the data stored in VCF files.
>>>>> For example, in a typical scenario one could ask to retrieve all the
>>>>> variations which are exclusively present in 20 samples out of a dataset
>>>>> of 100 samples.
>>>>> An API could then expose a method which takes a list of sample names
>>>>> plus other conditions and returns, for instance, a JSON document with
>>>>> all the variations fulfilling the query.
>>>>>
>>>>> Whether a database engine is used or not may depend on how
>>>>> you would like to implement the whole thing. One can also imagine not
>>>>> storing anything in a database and just accessing the data from the VCF
>>>>> files while providing a higher-level interface. In this case I'd suggest
>>>>> that you and other students interested in the topic also explore the GATK
>>>>> framework (https://github.com/broadgsa/gatk,
>>>>> http://www.broadinstitute.org/gatk/guide/topic?name=developer-zone),
>>>>> since it exposes a number of modules called walkers that should make
>>>>> life easier when accessing and traversing VCF files.
>>>>>
>>>>> JRuby sounds about right, as you'll have the typical Ruby flexibility
>>>>> to quickly prototype new things while having the ability to include Java
>>>>> code (GATK is written in Java and Scala BTW).
>>>>>
>>>>> Cheers
>>>>> Francesco
>>>>>
>>>>>
>>>>> On Thu, Mar 13, 2014 at 9:58 AM, Razvan Florea <
>>>>> razvan.florea91 at gmail.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> My name is Razvan Florea and I am studying Computing Science at the
>>>>>> University of Groningen, Netherlands.
>>>>>> I am writing to express my interest in the BioRuby GSoC project
>>>>>> "An ultra-fast scalable RESTful API to query large numbers of genomic
>>>>>> variations". I am currently doing my bachelor thesis project, which
>>>>>> is also about developing a RESTful API.
>>>>>>
>>>>>> As Francesco recommended, I looked at the links in the proposal text
>>>>>> and at the proposal itself. So far I understand that the basic idea of
>>>>>> the project is to replace the manipulation of information from VCF
>>>>>> files with the manipulation of information from a database which will
>>>>>> reside behind a web service. Am I right?
>>>>>> If yes, what do you expect the API to be able to do? Is returning
>>>>>> JSON with the requested information enough, or is more expected?
>>>>>>
>>>>>> Also, would Rails on JRuby be a good choice of technology for
>>>>>> developing the web service?
>>>>>>
>>>>>> Please give me any information you think could be helpful.
>>>>>>
>>>>>> Thank you,
>>>>>> Razvan
>>>>>> _______________________________________________
>>>>>> GSoC mailing list
>>>>>> GSoC at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/gsoc
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Francesco Strozzi
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>>
>>> Francesco Strozzi
>>>
>>
>>
>
>
> --
>
> Francesco Strozzi
>
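
The query Francesco describes in the thread (variations carried only by a
chosen subset of samples, returned as JSON) and the pet project of listing
SNPs from a VCF file can be sketched in plain Ruby. This is a minimal
illustration under stated assumptions, not the Picard or GATK API: the names
Variant, parse_vcf, exclusive_to and variants_to_json, and the three-sample
VCF text, are hypothetical and made up for the example. "Exclusively present"
is read here as "carried by exactly the listed samples and no others", which
is only one possible interpretation.

```ruby
require "json"

# Hypothetical three-sample VCF used only for this sketch (tab-separated).
VCF_TEXT = <<~VCF
  ##fileformat=VCFv4.1
  #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
  1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1/1\t0/0
  1\t200\t.\tAT\tA\t.\tPASS\t.\tGT\t0/1\t0/0\t0/0
  2\t300\t.\tC\tT\t.\tPASS\t.\tGT\t0/0\t0/0\t1/1
VCF

# Pure-Ruby stand-in for a variant record; it does not wrap any Java class.
Variant = Struct.new(:chrom, :pos, :ref, :alts, :genotypes) do
  # Assumption: a SNP is a record whose REF and every ALT are single bases.
  def snp?
    ([ref] + alts).all? { |allele| allele.match?(/\A[ACGTN]\z/i) }
  end

  # Names of samples whose GT field contains at least one non-reference allele.
  def carriers
    genotypes.select do |_name, gt|
      gt.split(%r{[/|]}).any? { |a| a != "0" && a != "." }
    end.keys
  end
end

# Parse VCF text into Variant records, keeping only the GT value per sample.
def parse_vcf(text)
  samples = []
  variants = []
  text.each_line do |line|
    line = line.chomp
    next if line.empty? || line.start_with?("##")
    fields = line.split("\t")
    if line.start_with?("#CHROM")
      samples = fields[9..-1] || []
      next
    end
    next if fields.size < 5
    chrom, pos, _id, ref, alt = fields[0, 5]
    gt_index = (fields[8] || "").split(":").index("GT")
    genotypes = {}
    samples.each_with_index do |name, i|
      next if gt_index.nil?
      genotypes[name] = fields[9 + i].to_s.split(":")[gt_index] || "."
    end
    variants << Variant.new(chrom, pos.to_i, ref, alt.split(","), genotypes)
  end
  variants
end

# Variants carried by exactly the given samples and by no other sample.
def exclusive_to(variants, sample_names)
  wanted = sample_names.sort
  variants.select { |v| v.carriers.sort == wanted }
end

# Serialize variants the way a REST endpoint might return them.
def variants_to_json(variants)
  JSON.generate(variants.map do |v|
    { "chrom" => v.chrom, "pos" => v.pos, "ref" => v.ref,
      "alt" => v.alts, "carriers" => v.carriers }
  end)
end

snps = parse_vcf(VCF_TEXT).select(&:snp?)
puts variants_to_json(exclusive_to(snps, %w[S1 S2]))
```

A wrapper around Picard would replace parse_vcf with calls into the Java
classes mentioned in the thread; the filtering and JSON steps could stay
much the same.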

