[Biojava-l] Gsoc Amino acids physico-chemical properties calculation

Alexandru Paiu paiualex12 at gmail.com
Fri Mar 25 18:06:37 UTC 2011


Hi Peter

It's me again, Paiu Alexandru from Romania.

I've been studying amino acids for 2 days now, but I couldn't find
anything in Romanian about the properties I have to implement for this
project. I only found material about amino acids in general. I went to the
university library to look for books about amino acids, but I wasn't lucky.
I'm really stuck on getting information about those properties.

I found formulas and some information about every method on Wikipedia, but
I really can't understand exactly what each formula is trying to do. I'd
need some concrete examples for every method I have to implement. There are
too many abbreviations that I don't understand. I tried to get some help
from some friends who are studying pharmacy, but they couldn't help me.
I studied the tool documented at
http://expasy.org/tools/protparam-doc.html,
but I haven't figured out yet exactly how those methods work (what the
inputs are for each method and how the outputs are obtained).

I've started working on the goals in the selection criteria. I've finished
the first 2, and now I'm working on using threads.
I'll use threads to process multiple lines from the input file. Each thread
will take one line at a time, split it into the 2 strings separated by '\t',
and apply StringOverlapFinder to those 2 strings. In StringOverlapFinder I
look for the last 5 characters of String 1 in String 2. If they aren't found,
I print String 1. If an overlap is found, there are more cases to take care
of. Some examples:
Ex 1: x = "abcdefghijklm" and y = "hijklmnopqrst" (your example); the
output is abcdefg
Ex 2: x = "asdcsadvsasevenfiveseven" and y = "sevenfivesevenasdasdas"; the
output is asdcsadvsa
Ex 3: x = "ascdaeseven" and y = "bseven"; the output is x (unchanged)
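
As a rough sketch of the per-line threading described above (not the actual
submission): each input line is handed to a worker from a fixed-size pool,
split on the first '\t', and passed to an overlap function. The class name
TabLineWorkers, the method processLines and the finder parameter are
hypothetical names chosen for illustration, and the lambda syntax assumes
Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BinaryOperator;

// Hypothetical helper: runs an overlap function over tab-separated lines in parallel.
class TabLineWorkers {

    // 'finder' stands in for the StringOverlapFinder logic (assumption: it maps x, y to the output).
    static List<String> processLines(List<String> lines, BinaryOperator<String> finder, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String line : lines) {
                Callable<String> task = () -> {
                    // Split into at most 2 parts on the first tab.
                    String[] parts = line.split("\t", 2);
                    String x = parts[0];
                    String y = parts.length > 1 ? parts[1] : "";
                    return finder.apply(x, y);
                };
                pending.add(pool.submit(task));
            }
            // Collect results in submission order so output lines match input lines.
            List<String> results = new ArrayList<>();
            for (Future<String> f : pending) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the Future objects in submission order keeps the output lines in
the same order as the input lines, even when the workers finish out of order.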

So after these 3 examples I've found the "right" implementation. To explain
it I'll take Ex 2, because I think it's the most complex.
I find that the last 5 chars of x (seven) occur in y at index 0. I take the
substring of y starting at index 0 and ending at the index of this first
match, which is 0 too. Since that substring is empty, the program could
stop here, and the output would be asdcsadvsasevenfive. But the real output
should be asdcsadvsa.
So, after finding that this is one possible output, I should look for a
second match in y. It is found at index 9 (the second seven). I again take
the substring of y that starts at index 0 and ends at index 8 (9-1), and
that is sevenfive. The length of this substring is 9. Now I take another
substring, but this time from x. It starts at index (x.length()-5-9) and
ends just before index (x.length()-5). In this case the two substrings are
equal, and the program will write the correct output, which is asdcsadvsa.
But before that, it should search for another possible match in y; in this
case there isn't one.
In Ex 3 the string "seven" is found in y at index 1. The substring taken
from y is "b", and it is compared with the substring from x, which is "e".
They aren't equal, so this is not an overlap. The program will look for
another "seven" in y, but there isn't one, so the output will be x,
because no overlap was found.
I hope you understand my ideas. I'll send you the jar when it's
finished.

Is there a deadline for these selection criteria?
That's it for today.

Best Regards ,
Alex


