[Biojava-l] GSoC Application
Mark Chapman
chapman at cs.wisc.edu
Thu Apr 8 20:45:21 UTC 2010
Hi Andreas,
Thanks for the feedback.
Difficulties and risks:
By viewing progressive multiple sequence alignment as four separate stages, I
believe the pieces become easier to manage. However, I also expect a few of my
ideas to prove quite challenging to implement. One of these challenges will be
efficient parallelization. Instead of spending all summer searching for the
optimal approach, I plan to write routines that one simple implementation calls
in sequence and a separate implementation calls in parallel. Later work could
then extend the parallelism to a distributed computing framework such as Hadoop
or Condor.
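
A minimal sketch of that split, assuming the routine in question is pairwise
scoring. The class PairwiseScoring, its methods, and the identity-based score
are hypothetical placeholders, not existing BioJava API; the point is only that
the sequential and parallel drivers share one routine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one scoring routine, driven serially or by a thread pool. */
final class PairwiseScoring {

    /** Placeholder score (an assumption): fraction of identical positions over the shorter length. */
    static double score(String a, String b) {
        int n = Math.min(a.length(), b.length());
        if (n == 0) {
            return 0.0;
        }
        int same = 0;
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) == b.charAt(i)) {
                same++;
            }
        }
        return (double) same / n;
    }

    /** Simple implementation: score every pair one after another. */
    static double[][] scoreAllSequential(List<String> seqs) {
        int n = seqs.size();
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                m[i][j] = m[j][i] = score(seqs.get(i), seqs.get(j));
            }
        }
        return m;
    }

    /** Parallel implementation: one task per pair, calling the same routine. */
    static double[][] scoreAllParallel(List<String> seqs, int threads) {
        int n = seqs.size();
        double[][] m = new double[n][n];
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                for (int j = i; j < n; j++) {
                    final int row = i;
                    final int col = j;
                    // Each task writes only its own two cells, so tasks never collide.
                    pending.add(pool.submit(() -> {
                        m[row][col] = m[col][row] = score(seqs.get(row), seqs.get(col));
                    }));
                }
            }
            for (Future<?> f : pending) {
                f.get(); // wait for completion and surface any task failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scoring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("scoring task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return m;
    }
}
```

Both drivers should return the same matrix for the same input, which makes the
sequential version a convenient reference when testing the parallel one.
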
Another difficult aspect is designing a general interface for choosing anchors
in profile-profile alignment. The Myers-Miller algorithm chooses optimal
midpoints as anchors through an internal decision process. I hope to generalize
this so that candidate anchors can also be identified externally.
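
One way such an interface might look, as a sketch only. Anchor, AnchorSource,
and AnchoredRegions are hypothetical names, not existing BioJava types, and the
greedy filtering below is an assumption standing in for whatever selection rule
the final design uses. The sketch shows how externally supplied anchors could
divide a profile-profile alignment into smaller independent subproblems.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical anchor: column q of the query profile is pinned to column t of the target. */
final class Anchor {
    final int q;
    final int t;

    Anchor(int q, int t) {
        this.q = q;
        this.t = t;
    }
}

/** Hypothetical plug-in point: any external method can propose candidate anchors. */
interface AnchorSource {
    List<Anchor> candidates(int queryLength, int targetLength);
}

final class AnchoredRegions {

    /** Keeps anchors that increase strictly in both coordinates (greedy, in query order). */
    static List<Anchor> consistentChain(List<Anchor> candidates) {
        List<Anchor> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Anchor a) -> a.q).thenComparingInt(a -> a.t));
        List<Anchor> chain = new ArrayList<>();
        int lastQ = -1;
        int lastT = -1;
        for (Anchor a : sorted) {
            if (a.q > lastQ && a.t > lastT) {
                chain.add(a);
                lastQ = a.q;
                lastT = a.t;
            }
        }
        return chain;
    }

    /** Returns the blocks {qStart, qEnd, tStart, tEnd} (ends exclusive) left to align between anchors. */
    static List<int[]> subproblems(List<Anchor> chain, int queryLength, int targetLength) {
        List<int[]> blocks = new ArrayList<>();
        int q0 = 0;
        int t0 = 0;
        for (Anchor a : chain) {
            blocks.add(new int[] {q0, a.q, t0, a.t});
            q0 = a.q + 1;
            t0 = a.t + 1;
        }
        blocks.add(new int[] {q0, queryLength, t0, targetLength});
        return blocks;
    }
}
```

Each returned block can be aligned on its own, so an internally computed
midpoint and an externally proposed anchor would be handled the same way once
they are expressed as Anchor objects.
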
Structural alignment integration:
At least three options exist for inserting structural information into the
multiple sequence alignment task: pairwise scoring, anchoring, and profile
scoring. First, scores from pairwise structural alignments could be used to
construct the similarity matrix. This would create a guide tree that aligns
sequences with similar structures earlier in the progressive alignment. Second,
structural alignment could identify possible anchors. The profile-profile
alignments would then conserve known structures when two profiles share some
anchor candidates. Both of these options are in my plan. The third option
would follow the consistency method of profile-profile alignment, which replaces
substitution-matrix scoring with a consistency score. This technique is used in
T-Coffee and ProbCons. The consistency score reflects how often the residues in
the two profiles were aligned with each other across a set of pairwise
alignments. If these were structural pairwise alignments, then the multiple
sequence alignment would preserve structural information. Later work could
implement this method as an alternative profile-profile alignment.
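
A rough sketch of the third option, under loud assumptions: ConsistencyLibrary
and its methods are hypothetical, and the score is a plain count of supporting
pairwise alignments. It is not the weighting scheme T-Coffee or ProbCons
actually use. It only illustrates how residue pairings collected from pairwise
(possibly structural) alignments could replace a substitution matrix when two
profile columns are scored.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a consistency-style score built from pairwise alignments. */
final class ConsistencyLibrary {

    // Key "seq:pos|seq:pos" (lower sequence id first) -> number of alignments pairing those residues.
    private final Map<String, Integer> counts = new HashMap<>();

    private static String key(int seqA, int posA, int seqB, int posB) {
        return seqA < seqB
                ? seqA + ":" + posA + "|" + seqB + ":" + posB
                : seqB + ":" + posB + "|" + seqA + ":" + posA;
    }

    /** Records that one pairwise alignment (sequence or structural) matched these two residues. */
    void addAlignedPair(int seqA, int posA, int seqB, int posB) {
        counts.merge(key(seqA, posA, seqB, posB), 1, Integer::sum);
    }

    /** Number of recorded pairwise alignments that matched these two residues. */
    int support(int seqA, int posA, int seqB, int posB) {
        return counts.getOrDefault(key(seqA, posA, seqB, posB), 0);
    }

    /**
     * Scores one column of profile X against one column of profile Y.
     * seqsX[i] is the id of the i-th sequence in profile X and colX[i] is its residue
     * position in the column, or -1 for a gap (likewise for Y). The result is the mean
     * support over all residue pairs; gaps are skipped.
     */
    static double columnScore(ConsistencyLibrary lib,
                              int[] seqsX, int[] colX, int[] seqsY, int[] colY) {
        int total = 0;
        int pairs = 0;
        for (int i = 0; i < seqsX.length; i++) {
            for (int j = 0; j < seqsY.length; j++) {
                if (colX[i] < 0 || colY[j] < 0) {
                    continue;
                }
                total += lib.support(seqsX[i], colX[i], seqsY[j], colY[j]);
                pairs++;
            }
        }
        return pairs == 0 ? 0.0 : (double) total / pairs;
    }
}
```

If the library is filled from structural pairwise alignments, columns whose
residues were superimposed in those alignments score higher, which is how
structural information would carry into the multiple alignment.
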
I'll try to incorporate these ideas when I revise my application later tonight.
And thanks again for your input.
Mark
On 4/8/2010 12:26 PM, Andreas Prlic wrote:
> Hi Mark,
>
> looks pretty good,
>
> * The time schedule feels tight. Where do you see possible
> difficulties and risks? What might take longer than expected?
>
> * I would like to be able to use 3D structure alignment information to
> guide the final alignment. This should increase reliability of the
> final alignment for remote sequence similarities. Any thoughts on how
> to accomplish this?
>
> Andreas
>
>
>
>
> On Thu, Apr 8, 2010 at 5:47 AM, Mark Chapman <chapman at cs.wisc.edu> wrote:
>> I would appreciate any feedback on my proposal from mentors or other
>> developers. Check it out at:
>> http://socghop.appspot.com/gsoc/student_proposal/show/google/gsoc2010/mark_chapman/t127055148817
>>
>> Thanks in advance,
>> Mark
>> _______________________________________________
>> Biojava-l mailing list - Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>
>
>
>