[Biojava-dev] Case-sensitive ProteinSequences

Scooter Willis HWillis at scripps.edu
Wed Nov 30 02:08:23 UTC 2011


Once the amino acid sequence is loaded, the original upper or lower case
is not retained: each position is a static reference to the corresponding
amino acid compound, which saves memory. FastaReader is fairly flexible,
though. You can write your own SequenceCreator that converts the input to
upper case, parses the original upper/lower-case pattern, and adds it as a
feature on the ProteinSequence. I'm not sure this solves your problem with
the sequence alignment code, since I think that returns a new, aligned
sequence. As an example, GeneFeatureHelper in the Biojava3-genome module
has a method, loadFastaAddGeneFeaturesFromUpperCaseExonFastaFile, that
uses upper and lower case in the FASTA file to designate exons.
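
To make the idea concrete, here is a minimal sketch in plain Java with no
BioJava dependency. It only shows the bookkeeping a custom SequenceCreator
would need: upper-case the residues for the compound set and record the
lower-case runs separately. The names CaseMask, Range, lowerCaseRuns and
restoreCase are made up for illustration and are not BioJava API. Attaching
the ranges as features on the ProteinSequence is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of BioJava): separates the residues of a
// mixed-case sequence from the information carried by their case.
class CaseMask {

    // Half-open range [start, end) of 0-based sequence positions.
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    // Upper-case string that a standard amino acid compound set can parse.
    static String toUpper(String seq) {
        return seq.toUpperCase(Locale.ROOT);
    }

    // Collect each maximal run of lower-case letters as one range.
    static List<Range> lowerCaseRuns(String seq) {
        List<Range> runs = new ArrayList<>();
        int runStart = -1; // -1 means "not currently inside a run"
        for (int i = 0; i < seq.length(); i++) {
            boolean lower = Character.isLowerCase(seq.charAt(i));
            if (lower && runStart < 0) {
                runStart = i;
            } else if (!lower && runStart >= 0) {
                runs.add(new Range(runStart, i));
                runStart = -1;
            }
        }
        if (runStart >= 0) {
            runs.add(new Range(runStart, seq.length()));
        }
        return runs;
    }

    // Rebuild the mixed-case string from the upper-case string and the runs.
    static String restoreCase(String upper, List<Range> runs) {
        StringBuilder sb = new StringBuilder(upper);
        for (Range r : runs) {
            for (int i = r.start; i < r.end; i++) {
                sb.setCharAt(i, Character.toLowerCase(sb.charAt(i)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = "mkvLAAGIVGLcta"; // made-up example sequence
        List<Range> unaligned = lowerCaseRuns(raw);
        System.out.println(toUpper(raw) + " " + unaligned);
        System.out.println(restoreCase(toUpper(raw), unaligned));
    }
}
```

The case information then lives next to the sequence rather than in the
compound set, so the scoring matrices only ever see the standard amino
acids.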

Thanks

Scooter 



On 11/29/11 8:29 PM, "Spencer Bliven" <sbliven at ucsd.edu> wrote:

>I'm currently trying to read a FASTA file which encodes some information
>in the case of each amino acid. Specifically, the FASTA contains an
>alignment where upper case letters are aligned and lower case are
>unaligned.
>
>The first problem I ran into was that lower-case letters are not valid as
>input to AminoAcidCompoundSet.getCompoundForString(String), which gets
>called indirectly from the FastaReader. This could be fixed by subclassing
>AminoAcidCompoundSet and calling toUpperCase() on the input. However, the
>second problem is that I need to extract that case information later on.
>My current solution is a subclass of AminoAcidCompoundSet which contains
>two copies of each amino acid: one upper and one lower. This seems like a
>very ugly solution and it breaks all the Alignment algorithms (due to
>missing amino acids in the scoring matrices). Does anyone have a better
>suggestion?
>
>Thanks,
>Spencer
>
>_______________________________________________
>biojava-dev mailing list
>biojava-dev at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/biojava-dev




