[Biojava-dev] EnsemblApi use case for DNASequences
Andy Yates
ayates at ebi.ac.uk
Thu May 13 13:31:15 UTC 2010
Not at the moment. The 2bit implementation has a worker and has been built with the idea that it _could_ be extended to, as you say, a 4bit implementation. If it were written I wouldn't restrict it to just DNA or RNA but would allow any CompoundSet with 16 or fewer compounds.
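For illustration, here is a minimal sketch of the 4bit scheme Peter describes below, assuming one bit per base (A=0001, C=0010, G=0100, T=1000) so that every IUPAC ambiguity code is the bitwise OR of its bases, N is 1111 and a gap is 0000. The class `FourBitDnaSketch` and all its methods are hypothetical and are not part of BioJava. It is also DNA-specific because the bits carry base meaning; the generic "any CompoundSet with 16 or fewer compounds" version would instead just store each compound's index in the set.

```java
/**
 * Hypothetical sketch, not a BioJava class: stores nucleotides as 4-bit
 * codes, two per byte. Each bit stands for one base (A=0001, C=0010,
 * G=0100, T=1000), so an IUPAC ambiguity code is the OR of its bases,
 * N is 1111 and a gap is 0000.
 */
public class FourBitDnaSketch {

    // Symbol for each 4-bit code 0..15; e.g. code 5 = A|G = R, code 15 = N.
    private static final String ALPHABET = "-ACMGRSVTWYHKDBN";

    private final byte[] packed;
    private final int length;

    public FourBitDnaSketch(String seq) {
        this.length = seq.length();
        this.packed = new byte[(length + 1) / 2];
        for (int i = 0; i < length; i++) {
            int code = encode(seq.charAt(i));
            // Even positions go in the high nibble, odd positions in the low one.
            packed[i / 2] |= (byte) (code << shift(i));
        }
    }

    private static int shift(int i) {
        return (i % 2 == 0) ? 4 : 0;
    }

    private static int encode(char c) {
        int code = ALPHABET.indexOf(Character.toUpperCase(c));
        if (code < 0) {
            throw new IllegalArgumentException("Unsupported symbol: " + c);
        }
        return code;
    }

    private int codeAt(int i) {
        if (i < 0 || i >= length) {
            throw new IndexOutOfBoundsException("Index: " + i);
        }
        // Mask with 0xF to undo sign extension of the signed byte.
        return (packed[i / 2] >> shift(i)) & 0xF;
    }

    public char charAt(int i) {
        return ALPHABET.charAt(codeAt(i));
    }

    /** True if position i is compatible with the unambiguous base (A, C, G or T). */
    public boolean couldBe(int i, char base) {
        return (codeAt(i) & encode(base)) != 0;
    }

    public int packedBytes() {
        return packed.length;
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        FourBitDnaSketch s = new FourBitDnaSketch("ACGTNNRY-A");
        System.out.println(s);                 // ACGTNNRY-A
        System.out.println(s.packedBytes());   // 5
        System.out.println(s.couldBe(4, 'G')); // true (N matches any base)
        System.out.println(s.couldBe(7, 'A')); // false (Y is C or T)
    }
}
```

Because ambiguity is a bitmask, "could this position be a G?" is a single AND, and ten symbols fit in five bytes, double the space of a pure ACGT 2bit encoding but with Ns and gaps represented directly.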
Andy
On 13 May 2010, at 14:20, Peter wrote:
> On Thu, May 13, 2010 at 1:38 PM, Andy Yates <ayates at ebi.ac.uk> wrote:
>>
>>
>> As you said at the end of your email the best way to accomplish this
>> is by creating a SequenceProxyReader which can do all this logic
>> and lets you work with the "right" objects and not have to re-implement
>> that code. Now this leaves a few alternatives to how you can represent
>> this in memory. We already have a 2bit implementation (will be called
>> TwoBitSequenceReader) for storing very large pieces of Sequence
>> but that only has support for ACGT and no support for gaps or Ns.
>> This could be extended to bring in support for these as features or
>> you could materialise that sequence and then push it into another
>> Sequence object I have been working with (not checked in at the moment)
>> which lets you join Sequences together. This combined with a
>> Sequence which returns Compounds of a particular type e.g. Ns for
>> any given length would let you represent massive amounts of
>> Sequence in a very small amount of space. All of these updates
>> will be in place soon, but I cannot say exactly when.
>
> Does BioJava have a 4bit sequence implementation for ambiguous
> DNA (or RNA)? That would let you treat N as 1111 (all four bits set)
> and a gap as 0000 (none of the bits set).
>
> Peter
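The "join Sequences together" idea from the quoted message can be sketched as a composite view over parts, one of which is a run of a single compound (e.g. a million Ns) that stores no per-position data. Everything below (`Seq`, `RepeatSeq`, `StringSeq`, `JoinedSeq`) is hypothetical and is not the unreleased BioJava class Andy mentions; it assumes lengths fit in an `int`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JoinedSequenceSketch {

    /** Hypothetical minimal read-only sequence view (not the BioJava Sequence interface). */
    interface Seq {
        int length();
        char charAt(int i);
    }

    /** A run of a single symbol, e.g. 1,000,000 Ns, held in constant memory. */
    static final class RepeatSeq implements Seq {
        private final char symbol;
        private final int length;
        RepeatSeq(char symbol, int length) { this.symbol = symbol; this.length = length; }
        public int length() { return length; }
        public char charAt(int i) {
            if (i < 0 || i >= length) throw new IndexOutOfBoundsException("Index: " + i);
            return symbol;
        }
    }

    /** Wraps an ordinary String as a Seq. */
    static final class StringSeq implements Seq {
        private final String s;
        StringSeq(String s) { this.s = s; }
        public int length() { return s.length(); }
        public char charAt(int i) { return s.charAt(i); }
    }

    /** Concatenation of several Seqs; lookups binary-search a table of start offsets. */
    static final class JoinedSeq implements Seq {
        private final List<Seq> parts = new ArrayList<>();
        private final int[] starts; // starts[k] = position of part k's first symbol
        private final int length;

        JoinedSeq(List<Seq> input) {
            // Skip empty parts so no two parts share a start offset.
            for (Seq p : input) {
                if (p.length() > 0) parts.add(p);
            }
            starts = new int[parts.size()];
            int total = 0;
            for (int k = 0; k < parts.size(); k++) {
                starts[k] = total;
                total += parts.get(k).length();
            }
            length = total;
        }

        public int length() { return length; }

        public char charAt(int i) {
            if (i < 0 || i >= length) throw new IndexOutOfBoundsException("Index: " + i);
            int k = Arrays.binarySearch(starts, i);
            // When i is not exactly a start offset, binarySearch returns
            // -(insertionPoint) - 1; the containing part is insertionPoint - 1.
            if (k < 0) k = -k - 2;
            return parts.get(k).charAt(i - starts[k]);
        }
    }

    public static void main(String[] args) {
        Seq contig = new JoinedSeq(List.of(
                new StringSeq("ACGT"),
                new RepeatSeq('N', 1_000_000),
                new StringSeq("TTGA")));
        System.out.println(contig.length());          // 1000008
        System.out.println(contig.charAt(3));         // T
        System.out.println(contig.charAt(500_000));   // N
        System.out.println(contig.charAt(1_000_004)); // T
    }
}
```

The million-N gap costs one object rather than a million characters, and a random-access lookup is a binary search over the part boundaries, so memory scales with the number of parts rather than the total length.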
--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/