[Biojava-dev] Changes to Sequence in BioJava3
Andy Yates
ayates at ebi.ac.uk
Wed Nov 3 15:42:41 UTC 2010
Hi George,
The reason the method was named getReverse() was so that it would do "the right thing" when reversing a Sequence. If the compound set supports complementing (as with DNA), then the right thing for getting the reverse strand is to return CGCA. If the method were called getReverseComplement(), what would that mean for Sequences (such as peptides) that have no complementary compounds? The method has to apply at every level of the Sequence interface for it to make sense.
Suggestions are welcome for a better name (opposingStrand() would be on the right track) to indicate this behaviour, i.e.:
DNA -> TGCG.opposingStrand() -> CGCA
PEP -> MVKV.opposingStrand() -> VKVM
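A minimal Java sketch of the behaviour described above, assuming a compound set that exposes an isComplementable() flag plus a per-compound complement lookup. The names OpposingStrandSketch, SimpleCompoundSet and opposingStrand are hypothetical and not BioJava API. Every sequence is reversed, and compounds are complemented only when the set supports it:

```java
import java.util.Map;

public class OpposingStrandSketch {

    // Hypothetical stand-in for a CompoundSet: knows whether its
    // compounds have complements and, if so, what they are.
    public static final class SimpleCompoundSet {
        private final Map<Character, Character> complements; // null when not complementable

        public SimpleCompoundSet(Map<Character, Character> complements) {
            this.complements = complements;
        }

        public boolean isComplementable() {
            return complements != null;
        }

        // Only called when isComplementable() is true.
        public char complement(char c) {
            Character comp = complements.get(Character.toUpperCase(c));
            if (comp == null) {
                throw new IllegalArgumentException("No complement for " + c);
            }
            return comp;
        }
    }

    public static final SimpleCompoundSet DNA = new SimpleCompoundSet(
            Map.of('A', 'T', 'T', 'A', 'G', 'C', 'C', 'G', 'N', 'N'));

    // Peptides have no complementing compounds.
    public static final SimpleCompoundSet PEPTIDE = new SimpleCompoundSet(null);

    // Reverse the order of compounds and complement each one only when the
    // compound set supports it: reverse complement for DNA, plain reverse otherwise.
    public static String opposingStrand(String seq, SimpleCompoundSet set) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            char c = seq.charAt(i);
            out.append(set.isComplementable() ? set.complement(c) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(opposingStrand("TGCG", DNA));     // CGCA
        System.out.println(opposingStrand("MVKV", PEPTIDE)); // VKVM
    }
}
```

Keeping the complementability check in the compound set means callers use one method regardless of sequence type; in this sketch a strict reverse (GCGT for TGCG) would need its own method.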
Regards,
Andy
On 3 Nov 2010, at 15:35, George Waldon wrote:
> Hi Andy:
>
> Note that the reverse of a sequence usually means the sequence in reverse order, from the 3' end to the 5' end. I think you should name your method getReverseComplement() if you want to return the reverse & complement of a sequence:
>
> sequence: TGCG
> reverse: GCGT
> complement: ACGC
> reverse & complement: CGCA
>
> Regards,
> George
>
> On Tue, Nov 2, 2010 at 8:16 AM, Andy Yates <ayates at ebi.ac.uk> wrote:
>
> Hi everyone,
>
> As a caution to people with implementations already built on the Sequence interface, I'm proposing a few changes to it. This will cause a binary class incompatibility & will affect the methods you need to implement, but I'll sort those out at the BioJava core end.
>
> 1). Removal of getSequenceAsString(Integer,Integer,Strand)
> ** The implementation is patchy & buggy, often exposing data from backing stores
>
> 2). Addition of SequenceView<C> getReverse()
> ** Will return the sequence on the reverse strand
> ** Also complemented if applicable
>
> 3). Addition of isComplementable() to CompoundSet
> ** Used to support the above function
>
> This means substrings of Sequences are retrieved like so:
>
> DNASequence d = new DNASequence("ATGCGC");
> d.getSubSequence(2, 5).getSequenceAsString(); //Returns TGCG
> d.getSubSequence(2, 5).getReverse().getSequenceAsString(); //Returns CGCA
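The coordinates in the example above are 1-based and inclusive: getSubSequence(2, 5) on ATGCGC covers TGCG. Below is a minimal sketch of how a reverse view might map positions lazily instead of copying data out of a backing store, assuming that 1-based inclusive indexing. ReverseViewSketch, View, of, sub and reverseComplement are hypothetical names, not BioJava classes or methods:

```java
public class ReverseViewSketch {

    // Minimal read-only view with 1-based, inclusive positions.
    public interface View {
        int length();

        char compoundAt(int position); // valid for 1 <= position <= length()

        default String asString() {
            StringBuilder sb = new StringBuilder(length());
            for (int i = 1; i <= length(); i++) {
                sb.append(compoundAt(i));
            }
            return sb.toString();
        }
    }

    // Wraps a backing string without copying it.
    public static View of(String backing) {
        return new View() {
            public int length() { return backing.length(); }
            public char compoundAt(int position) { return backing.charAt(position - 1); }
        };
    }

    // Sub-view covering [start, end] of the parent, both ends inclusive.
    public static View sub(View parent, int start, int end) {
        if (start < 1 || end > parent.length() || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new View() {
            public int length() { return end - start + 1; }
            public char compoundAt(int position) { return parent.compoundAt(start + position - 1); }
        };
    }

    // Reverse-complement view: position i reads parent position (n - i + 1), complemented.
    public static View reverseComplement(View parent) {
        return new View() {
            public int length() { return parent.length(); }
            public char compoundAt(int position) {
                return complement(parent.compoundAt(parent.length() - position + 1));
            }
        };
    }

    private static char complement(char base) {
        switch (base) {
            case 'A': return 'T';
            case 'T': return 'A';
            case 'G': return 'C';
            case 'C': return 'G';
            default: throw new IllegalArgumentException("Not a DNA base: " + base);
        }
    }

    public static void main(String[] args) {
        View d = of("ATGCGC");
        System.out.println(sub(d, 2, 5).asString());                    // TGCG
        System.out.println(reverseComplement(sub(d, 2, 5)).asString()); // CGCA
    }
}
```

Because each view only translates positions, nothing outside the requested range of the backing data is ever read.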
--
Andrew Yates Ensembl Genomes Engineer
EMBL-EBI Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK http://www.ensemblgenomes.org/