[Biojava-l] SCF: support for ambiguities

Fri Oct 31 16:14:30 UTC 2008

A patch would be much appreciated!

cheers,
Richard

2008/10/31 community at struck.lu <community at struck.lu>:
> True. It was a first quick and dirty hack to get the rest of my project going.
>
> I think adding support for the IUPAC ambiguity codes to DNATools would be the most
> appropriate solution. The SCF class could then easily be adapted.
>
> Are there any plans to do so?
> If not, I could give it a try and submit a patch for DNATools and SCF.
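For illustration, here is a plain-Java sketch of the kind of IUPAC lookup such a DNATools extension would need. It is a hypothetical standalone helper: the class name `IupacCodes` and its methods are invented here and are not part of BioJava. It maps each of the 15 IUPAC nucleotide codes to the bases it stands for, and back again. It uses plain strings instead of BioJava Symbols so that it runs without the library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone helper; NOT part of BioJava's DNATools API.
final class IupacCodes {
    // IUPAC nucleotide code -> the bases it stands for, sorted as A, C, G, T.
    private static final Map<Character, String> CODE_TO_BASES;
    // Reverse lookup: sorted, de-duplicated base string -> IUPAC code.
    private static final Map<String, Character> BASES_TO_CODE;

    static {
        Map<Character, String> m = new HashMap<>();
        m.put('A', "A");   m.put('C', "C");   m.put('G', "G");   m.put('T', "T");
        // Two-base ambiguities.
        m.put('R', "AG");  m.put('Y', "CT");  m.put('S', "CG");
        m.put('W', "AT");  m.put('K', "GT");  m.put('M', "AC");
        // Three-base ambiguities and "any base".
        m.put('B', "CGT"); m.put('D', "AGT"); m.put('H', "ACT"); m.put('V', "ACG");
        m.put('N', "ACGT");

        Map<String, Character> r = new HashMap<>();
        for (Map.Entry<Character, String> e : m.entrySet()) {
            r.put(e.getValue(), e.getKey());
        }
        CODE_TO_BASES = Collections.unmodifiableMap(m);
        BASES_TO_CODE = Collections.unmodifiableMap(r);
    }

    private IupacCodes() {}

    /** Returns the bases for an IUPAC code (case-insensitive); throws for unknown codes. */
    static String basesFor(char code) {
        String bases = CODE_TO_BASES.get(Character.toUpperCase(code));
        if (bases == null) {
            throw new IllegalArgumentException("Not an IUPAC nucleotide code: " + code);
        }
        return bases;
    }

    /** Returns the IUPAC code for a set of bases given in any order and case. */
    static char codeFor(String bases) {
        char[] cs = bases.toUpperCase().toCharArray();
        Arrays.sort(cs);
        StringBuilder key = new StringBuilder();
        for (char c : cs) {
            // Skip duplicates so that "AAT" is treated like "AT".
            if (key.length() == 0 || key.charAt(key.length() - 1) != c) {
                key.append(c);
            }
        }
        Character code = BASES_TO_CODE.get(key.toString());
        if (code == null) {
            throw new IllegalArgumentException("No IUPAC code for bases: " + bases);
        }
        return code;
    }
}
```

For example, `IupacCodes.basesFor('w')` returns `"AT"` and `IupacCodes.codeFor("TA")` returns `'W'`.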
>
> Greetings,
> Daniel
>
> "Richard Holland" <holland at eaglegenomics.com> wrote:
>
>> It is the correct method, yes.
>>
>> However, your code constructs a new HashSet every time it checks for
>> W, S, etc. It would be much more efficient to create class-static
>> references to the ambiguity symbols you need, instead of (re)creating
>> them every time they are encountered. A class-static gap symbol
>> reference would also be good in this situation.
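Below is a minimal sketch of the class-static approach. It rests on a few assumptions. The names `CachedCallDecoder` and `Call` are hypothetical. `Call` stands in for a BioJava `Symbol` so that the example compiles with only the JDK. For the same reason, it throws `IllegalArgumentException` where the quoted code throws `IllegalSymbolException`. Every value the decoder can return, including the gap, is created once in a static initializer and stored in a table indexed by the byte value, so `decode` allocates nothing. In the real SCF class, the static initializer would presumably hold the results of `DNATools.getDNA().getAmbiguity(...)`. It would also have to catch the checked `IllegalSymbolException`, because a static initializer cannot throw it.

```java
import java.util.Arrays;

// Hypothetical sketch; Call is a stand-in for a BioJava Symbol, not a real API.
final class CachedCallDecoder {
    /** Immutable, shared value playing the role of a cached Symbol. */
    static final class Call {
        final char code;
        final String bases; // empty for the gap

        private Call(char code, String bases) {
            this.code = code;
            this.bases = bases;
        }

        @Override
        public String toString() {
            return code + "=" + bases;
        }
    }

    // Class-static gap reference, created once when the class is loaded.
    static final Call GAP = new Call('-', "");

    // Lookup table indexed by the unsigned byte value of an SCF base call.
    private static final Call[] TABLE = new Call[256];

    static {
        String[][] defs = {
            {"A", "A"}, {"C", "C"}, {"G", "G"}, {"T", "T"}, {"N", "ACGT"},
            // The two-base IUPAC ambiguities.
            {"R", "AG"}, {"Y", "CT"}, {"S", "CG"}, {"W", "AT"}, {"K", "GT"}, {"M", "AC"}
        };
        for (String[] d : defs) {
            char code = d[0].charAt(0);
            Call call = new Call(code, d[1]);
            // Upper- and lower-case calls share one cached instance.
            TABLE[code] = call;
            TABLE[Character.toLowerCase(code)] = call;
        }
        TABLE['-'] = GAP;
    }

    private CachedCallDecoder() {}

    /** Decodes one base-call byte, returning a cached instance and never a new one. */
    static Call decode(byte call) {
        Call c = TABLE[call & 0xFF];
        if (c == null) {
            throw new IllegalArgumentException("Unsupported base call: " + (char) (call & 0xFF));
        }
        return c;
    }
}
```

Because `decode((byte) 'w')` and `decode((byte) 'W')` return the identical cached object, the lookup costs one array access and callers can even compare results with `==`.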
>>
>> cheers,
>> Richard
>>
>>
>>
>> 2008/10/31 community at struck.lu <community at struck.lu>:
>> > Hello,
>> >
>> >
>> > I am using the SCF class in the context of HIV-1 population sequencing, where
>> > we sometimes have ambiguous base calls. To support them, I extended the SCF
>> > class to allow IUPAC ambiguity codes covering up to 2 nucleotides.
>> >
>> > To do so, I simply added the following code to the "decode" method:
>> >
>> > #########################
>> >        public Symbol decode(byte call) throws IllegalSymbolException {
>> >
>> >            //get the DNA Alphabet
>> >            Alphabet dna = DNATools.getDNA();
>> >
>> >            char c = (char) call;
>> >            switch (c) {
>> >                case 'a':
>> >                case 'A':
>> >                    return DNATools.a();
>> >                case 'c':
>> >                case 'C':
>> >                    return DNATools.c();
>> >                case 'g':
>> >                case 'G':
>> >                    return DNATools.g();
>> >                case 't':
>> >                case 'T':
>> >                    return DNATools.t();
>> >                case 'n':
>> >                case 'N':
>> >                    return DNATools.n();
>> >                case '-':
>> >                    return DNATools.getDNA().getGapSymbol();
>> >                case 'w':
>> >                case 'W':
>> >                    //make the 'W' symbol
>> >                    Set symbolsThatMakeW = new HashSet();
>> >                    symbolsThatMakeW.add(DNATools.a());
>> >                    symbolsThatMakeW.add(DNATools.t());
>> >                    Symbol w = dna.getAmbiguity(symbolsThatMakeW);
>> >                    return w;
>> >                case 's':
>> >                case 'S':
>> >                    //make the 'S' symbol
>> >                    Set symbolsThatMakeS = new HashSet();
>> >                    symbolsThatMakeS.add(DNATools.c());
>> >                    symbolsThatMakeS.add(DNATools.g());
>> >                    Symbol s = dna.getAmbiguity(symbolsThatMakeS);
>> >                    return s;
>> > ... (and so on)
>> > #########################
>> >
>> > Is this the right way to do it? And if so, how can this code be submitted
>> > for inclusion in the official BioJava source code?
>> >
>> >
>> > Best regards,
>> > Daniel Struck
>> > _________________________________________________________
>> > Mail sent using root eSolutions Webmailer - www.root.lu
>> >
>> >
>> > _______________________________________________
>> > Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>> >
>>
>>
>
>

-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/