[Biojava-l] 3 questions and problems
mark.schreiber at novartis.com
Sun Sep 18 23:36:57 EDT 2005
>Hello,
>
>I would like to ask three questions, or rather mention three problems.
>
>1. Trying to write a protein sequence to a GenPept file resulted in the
>following error message: ClassCastException in GenpeptFileFormer line 361.
>What does this mean and how can I write my sequences?
The class is trying to cast an object called value to a List without
checking its type. Apparently, in your case value is not an instance of
List.
Try changing the code to this and let me know if it fixes the problem. If
it does, I'll commit it to CVS.
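// needs java.util.ArrayList, java.util.Iterator and java.util.List
ub.append("ACCESSION ");
// value may hold a single accession or a List of accessions,
// so wrap a single value in a List before iterating
List l;
if (value instanceof List) {
    l = (List) value;
} else {
    l = new ArrayList();
    l.add(value);
}
for (Iterator ai = l.iterator(); ai.hasNext();) {
    // append(Object) avoids a second ClassCastException if an
    // entry is not a String
    ub.append(ai.next());
}
acb = new StringBuffer(ub.toString());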
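To check the instanceof fix outside of GenpeptFileFormer, here is a minimal self-contained sketch of the same pattern. The class and method names (AccessionNormaliser, toList, accessionLine) and the accession P12345 are made up for illustration; this is not BioJava code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class AccessionNormaliser {

    // Hypothetical helper mirroring the patch: a List is returned as-is,
    // any other value is wrapped in a one-element List.
    static List<?> toList(Object value) {
        if (value instanceof List) {
            return (List<?>) value;
        }
        return Collections.singletonList(value);
    }

    // Builds the ACCESSION line like the patched loop does: entries are
    // appended one after another, with no separator added between them.
    static String accessionLine(Object value) {
        StringBuilder ub = new StringBuilder("ACCESSION ");
        for (Object accession : toList(value)) {
            ub.append(accession);
        }
        return ub.toString();
    }

    public static void main(String[] args) {
        // A bare String is the case that used to trigger the
        // ClassCastException; both forms are handled the same way here.
        System.out.println(accessionLine("P12345"));
        System.out.println(accessionLine(Arrays.asList("P12345")));
    }
}
```

Both calls print "ACCESSION P12345", so a single value and a one-element List end up producing the same output.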
>2. There is a problem with BioSQL. The attribute alphabet in the table
>biosequence has the type VARCHAR(10). The BioJava alphabet PROTEIN-TERM has
>12 characters. I always got an error message when I tried to get a protein
>sequence with this alphabet from the database. A simple select statement
>showed that the alphabet in the table is abbreviated to PROTEIN-TE, which is
>not equal to the BioJava name and causes trouble. I solved this problem by
>altering the table declaration to VARCHAR(12). Now it works fine. Is there
>another solution for this, or is this the only one?
This is probably the best fix for now. Ideally BioSQL would standardise
some alphabet names, but this might not happen for a while. It might be
worth suggesting to the BioSQL list that the alphabet name field be made
longer.
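To see why the lookup fails, here is a small sketch that simulates a VARCHAR column which silently truncates long values, as the database in this thread apparently did (many databases raise an error instead). Only PROTEIN-TERM comes from the report above; DNA, RNA and PROTEIN are example names, and the class and its methods are hypothetical helpers, not BioSQL or BioJava API.

```java
public class AlphabetColumnCheck {

    // Simulates a VARCHAR(width) column that silently truncates long values.
    static String storeInVarchar(String value, int width) {
        return value.length() <= width ? value : value.substring(0, width);
    }

    // True if the name comes back from the column unchanged, i.e. a later
    // lookup by alphabet name would still match.
    static boolean fits(String alphabetName, int width) {
        return storeInVarchar(alphabetName, width).equals(alphabetName);
    }

    // Smallest column width that holds every given name without truncation.
    static int requiredWidth(String... names) {
        int width = 0;
        for (String name : names) {
            width = Math.max(width, name.length());
        }
        return width;
    }

    public static void main(String[] args) {
        // Only PROTEIN-TERM is taken from the thread; the others are examples.
        String[] names = {"DNA", "RNA", "PROTEIN", "PROTEIN-TERM"};
        for (String name : names) {
            System.out.println(name + " -> \"" + storeInVarchar(name, 10)
                    + "\", fits VARCHAR(10): " + fits(name, 10));
        }
        System.out.println("required width: " + requiredWidth(names));
    }
}
```

PROTEIN-TERM comes back as "PROTEIN-TE", and requiredWidth gives 12 for this list, which matches the VARCHAR(12) change.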
>3. I also experimented with the HMM for pairwise sequence alignments that
>was proposed in the cookbook. Does anybody have an idea how one could
>combine this HMM with the SubstitutionMatrix from the alignment package? I
>don't see how we can produce a sensible distribution including a
>substitution matrix in the match state. This might be especially hard to
>realize because we can't exclude that the sequences to be aligned contain
>some ambiguous symbols which are not in the substitution matrix at all. I
>am thankful for any good ideas.
It is possible in theory to make a Distribution from a similarity matrix,
provided you know how it was made (in particular the background
frequencies q_i). Typically similarity matrices are log odds scores,
log(p_ij / (q_i q_j)), that are multiplied by a constant and then rounded
to an integer. You don't need to know the constant itself: it can be
recovered as the value lambda for which the implied pair probabilities
q_i q_j exp(lambda s_ij) sum to 1.0, and then you can convert back. This is
not perfect, as you get some rounding errors, but it should be close
enough.
By the way, it seems your alignment classes have not been checked in. Are
you going to do this soon?
- Mark
>Sincerely
>Andreas Dräger
_______________________________________________
Biojava-l mailing list - Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l