[Biopython] Query for GSoc projects on SearchIO and Representation and manipulation of genomic variants
Fields, Christopher J
cjfields at illinois.edu
Mon Mar 26 13:24:08 EDT 2012
On Mar 26, 2012, at 4:19 AM, Peter Cock wrote:
> On Mon, Mar 26, 2012 at 5:31 AM, Ankesh Thakur <ankeshth at gmail.com> wrote:
>> Dear Sir,
>> I am a student of Biological Sciences and bioengineering at Indian
>> Institute of Technology, Kanpur (IIT Kanpur). I am willing to write
>> codes for Biopython during this summer. I am not very much clear about
>> the goals of this project. I want to know more about the suggested
>> projects, like what else I need to do apart from conversion of one file
>> format to other and showing the data on the console in human readable
>> form.
>>
>> I have no prior experience with bio modules of python. I have around
>> seven months experience with python git hub. And I have done Molecular
>> biology, Genetics and Bio-chemistry courses. I would like to learn
>> Biopython, BioPerl( if required) and other necessary tools during this
>> summer. Eagerly waiting for your reply.
>>
>> Regards,
>> Ankesh Kumar Thakur.
>
> Hello Ankesh,
>
> Both the SearchIO and genomic variant GSoC project ideas are
> more than just file format conversion and 'pretty printing' at the
> console. An essential part of this is designing a suitable object
> representation for efficient use of the data. That probably means
> creating objects (Python classes). This will require both a good
> understanding of the meaning of the data being represented
> (e.g. how are BLAST search results structured) but also how
> to design Python objects.
>
> For the SearchIO project, I went into a lot more detail on the
> Biopython development mailing list last week:
> http://lists.open-bio.org/pipermail/biopython-dev/2012-March/009468.html
>
> Peter
Might be a good opportunity to go over what works in the BioPerl SearchIO implementations, what doesn't, etc. The vast majority of the speed issues we (BioPerl) have seen with SearchIO seem to have much more to do with object generation than with parsing (I think Ruby has the same issue).
BioPerl's SearchIO is summarized in the HOWTO:
http://www.bioperl.org/wiki/HOWTO:SearchIO
Simple enough: each report is divided up into one or more Results, each of which can have multiple Hits, each of which in turn can have multiple HSPs. HSPs are also paired SeqFeatures, one for the query and one for the hit (I think this was implemented later).
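To make that hierarchy concrete, here is a rough Python sketch of the Result / Hit / HSP nesting. The class and attribute names (query_range, hit_range, evalue, best_evalue) are made up for illustration; this is neither BioPerl's API nor a proposed Biopython one, just the shape of the data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class HSP:
    """One high-scoring segment pair: a paired query/hit location."""
    query_range: Tuple[int, int]   # hypothetical: (start, end) on the query
    hit_range: Tuple[int, int]     # hypothetical: (start, end) on the hit
    evalue: float


@dataclass
class Hit:
    """One database sequence matched by the query; holds its HSPs."""
    hit_id: str
    hsps: List[HSP] = field(default_factory=list)


@dataclass
class Result:
    """Everything reported for a single query; holds its Hits."""
    query_id: str
    hits: List[Hit] = field(default_factory=list)


def best_evalue(result: Result) -> Optional[float]:
    """Return the smallest HSP e-value across all hits, or None if empty."""
    evalues = [hsp.evalue for hit in result.hits for hsp in hit.hsps]
    return min(evalues) if evalues else None


# A report with one Result, two Hits, three HSPs in total.
result = Result(
    "query1",
    [
        Hit("subjA", [HSP((1, 50), (10, 59), 1e-20),
                      HSP((60, 90), (100, 130), 1e-5)]),
        Hit("subjB", [HSP((5, 40), (200, 235), 3e-3)]),
    ],
)
```

Note that every HSP here is a full object, which is exactly the kind of per-record object generation that the speed comment above is about.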
Some basic notes about the BLAST parser design (SAX-like), written by Steve Chervitz during the time this was drawn up, are here:
https://github.com/bioperl/bioperl-live/blob/master/Bio/SearchIO/blast.pm#L2440
This doesn't apply to all SearchIO parsers, but it gives an idea of the thoughts behind it.
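For anyone unfamiliar with the SAX-like idea, here is a minimal Python sketch of the general pattern: the parser only emits events, and a separate handler decides what (if anything) to build. It is not a port of the BioPerl design. The handler method names are invented, and the input is assumed to be BLAST tabular output (-outfmt 6, query ID in column 1, subject ID in column 2) with rows for the same query and subject grouped together.

```python
class EventHandler:
    """Receives parse events; subclasses override only what they need."""
    def start_result(self, query_id): pass
    def start_hit(self, hit_id): pass
    def hsp(self, fields): pass
    def end_hit(self): pass
    def end_result(self): pass


class HspCounter(EventHandler):
    """Counts HSPs per query without creating any Hit or HSP objects."""
    def __init__(self):
        self.counts = {}
        self._query = None

    def start_result(self, query_id):
        self._query = query_id
        self.counts[query_id] = 0

    def hsp(self, fields):
        self.counts[self._query] += 1


def parse_tabular(lines, handler):
    """Walk tab-separated rows and fire events on query/subject changes."""
    query = hit = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        qid, sid = fields[0], fields[1]
        if qid != query:
            # A new query closes any open hit and result first.
            if hit is not None:
                handler.end_hit()
            if query is not None:
                handler.end_result()
            query, hit = qid, None
            handler.start_result(qid)
        if sid != hit:
            if hit is not None:
                handler.end_hit()
            hit = sid
            handler.start_hit(sid)
        handler.hsp(fields)
    # Close whatever is still open at end of input.
    if hit is not None:
        handler.end_hit()
    if query is not None:
        handler.end_result()


sample = [
    "q1\tsA\t98.0\t50\t1\t0\t1\t50\t10\t59\t1e-20\t95.0",
    "q1\tsA\t90.0\t30\t3\t0\t60\t90\t100\t130\t1e-05\t40.0",
    "q1\tsB\t85.0\t35\t5\t0\t5\t40\t200\t235\t3e-03\t30.0",
    "q2\tsA\t99.0\t45\t0\t0\t1\t45\t10\t54\t1e-18\t88.0",
]
counter = HspCounter()
parse_tabular(sample, counter)
```

The point of the split is that a cheap handler like the counter never pays for object creation, while a heavier handler could build full Result/Hit/HSP objects from the same events.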
chris