[Biojava-dev] Changes to DNASequence ReverseComplement Behaviour

Andy Yates ayates at ebi.ac.uk
Thu Oct 21 16:24:23 UTC 2010


Hi Trevor,

Yes, it was disabled because I didn't trust the implementation, and I'd rather it broke loudly than carried on without telling you. The best way to get reverse-complemented subsequences would be (off the top of my head):

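// getMyDna() stands in for however you obtain your sequence
Sequence<NucleotideCompound> dna = getMyDna();
// take the subsequence first, in +ve strand coordinates
Sequence<NucleotideCompound> subseq = dna.getSubSequence(40, 1000);
// then complement and reverse it using views; no bases are copied
Sequence<NucleotideCompound> revComp =
    new ReversedSequenceView<NucleotideCompound>(
        new ComplementSequenceView<NucleotideCompound>(subseq));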

This means that you always have to request positions in +ve strand coordinates. As you're working with Ensembl this shouldn't be a problem, because its coordinates are always given on the +ve strand.

The advantage of the above mechanism is that you still only have 1 copy of the Sequence in memory, since the views just decorate the underlying Sequence rather than copying it.
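
To make the decorating idea concrete, here is a minimal self-contained sketch. It does not use the BioJava classes at all: SeqView, StringSeq, SubView, ComplementView, ReversedView and revComp are hypothetical names made up for illustration, and it assumes 1-based inclusive positions and plain A/C/G/T bases. Only StringSeq holds any data; every other class just translates positions or bases on the way through.

```java
public class ViewSketch {

    /** Read-only sequence with 1-based positions (hypothetical interface). */
    public interface SeqView {
        int length();
        char baseAt(int pos); // pos runs from 1 to length()
    }

    /** The only class that actually stores bases. */
    static final class StringSeq implements SeqView {
        private final String bases;
        StringSeq(String bases) { this.bases = bases; }
        public int length() { return bases.length(); }
        public char baseAt(int pos) { return bases.charAt(pos - 1); }
    }

    /** Window onto positions start..end (inclusive) of another view. */
    static final class SubView implements SeqView {
        private final SeqView inner;
        private final int start;
        private final int end;
        SubView(SeqView inner, int start, int end) {
            if (start < 1 || end > inner.length() || start > end) {
                throw new IllegalArgumentException("bad range " + start + ".." + end);
            }
            this.inner = inner;
            this.start = start;
            this.end = end;
        }
        public int length() { return end - start + 1; }
        public char baseAt(int pos) { return inner.baseAt(start + pos - 1); }
    }

    /** Complements each base as it is read; anything unknown becomes N. */
    static final class ComplementView implements SeqView {
        private final SeqView inner;
        ComplementView(SeqView inner) { this.inner = inner; }
        public int length() { return inner.length(); }
        public char baseAt(int pos) {
            switch (inner.baseAt(pos)) {
                case 'A': return 'T';
                case 'T': return 'A';
                case 'C': return 'G';
                case 'G': return 'C';
                default:  return 'N';
            }
        }
    }

    /** Reads the wrapped view back to front. */
    static final class ReversedView implements SeqView {
        private final SeqView inner;
        ReversedView(SeqView inner) { this.inner = inner; }
        public int length() { return inner.length(); }
        public char baseAt(int pos) { return inner.baseAt(inner.length() - pos + 1); }
    }

    /** Materialises a view as a String (the only point where bases are copied). */
    public static String asString(SeqView view) {
        StringBuilder sb = new StringBuilder(view.length());
        for (int i = 1; i <= view.length(); i++) {
            sb.append(view.baseAt(i));
        }
        return sb.toString();
    }

    /** Reverse complement of positions start..end, given in +ve strand coordinates. */
    public static String revComp(String dna, int start, int end) {
        SeqView sub = new SubView(new StringSeq(dna), start, end);
        return asString(new ReversedView(new ComplementView(sub)));
    }

    public static void main(String[] args) {
        // positions 2..5 of AACCGGTT are ACCG, whose reverse complement is CGGT
        System.out.println(revComp("AACCGGTT", 2, 5)); // prints CGGT
    }
}
```

Note that the range is taken first and the reversal applied afterwards, which is why the positions are always quoted on the +ve strand.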

Hope this helps

Andy

p.s. If you're stitching sequences together there's also the JoiningSequenceReader, which lets you make 2 or more sequences act as if they were one long contiguous sequence. There are also 2bit & 4bit storage engines if you feel that your memory consumption is getting a bit on the large side.
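
For the stitching case, the idea of several sequences acting as one can be sketched as below. Again this is not the JoiningSequenceReader code or its API, just a minimal illustration of the technique; SeqView, JoinedView, of, join and joinToString are hypothetical names, and positions are assumed to be 1-based.

```java
import java.util.ArrayList;
import java.util.List;

public class JoinSketch {

    /** Read-only sequence with 1-based positions (hypothetical interface). */
    public interface SeqView {
        int length();
        char baseAt(int pos);
    }

    /** Wraps a String as a SeqView. */
    public static SeqView of(final String bases) {
        return new SeqView() {
            public int length() { return bases.length(); }
            public char baseAt(int pos) { return bases.charAt(pos - 1); }
        };
    }

    /** Presents several fragments as one contiguous sequence without copying them. */
    static final class JoinedView implements SeqView {
        private final List<SeqView> parts;
        JoinedView(List<SeqView> parts) { this.parts = parts; }

        public int length() {
            int total = 0;
            for (SeqView part : parts) {
                total += part.length();
            }
            return total;
        }

        public char baseAt(int pos) {
            if (pos < 1) {
                throw new IndexOutOfBoundsException("position " + pos + " is before the start");
            }
            // walk the fragments, shifting the position into each one's own coordinates
            int offset = pos;
            for (SeqView part : parts) {
                if (offset <= part.length()) {
                    return part.baseAt(offset);
                }
                offset -= part.length();
            }
            throw new IndexOutOfBoundsException("position " + pos + " is past the end");
        }
    }

    /** Joins the fragments, in order, into one view. */
    public static SeqView join(String... fragments) {
        List<SeqView> parts = new ArrayList<>();
        for (String fragment : fragments) {
            parts.add(of(fragment));
        }
        return new JoinedView(parts);
    }

    /** Materialises the joined view as a String, for checking. */
    public static String joinToString(String... fragments) {
        SeqView joined = join(fragments);
        StringBuilder sb = new StringBuilder(joined.length());
        for (int i = 1; i <= joined.length(); i++) {
            sb.append(joined.baseAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(joinToString("ACGT", "TT", "GA")); // prints ACGTTTGA
    }
}
```

The linear walk over fragments is the simplest thing that works; with many fragments you would probably want to precompute the cumulative lengths and binary-search them instead.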

On 21 Oct 2010, at 15:12, PATERSON Trevor wrote:

> 
> Hi 
> 
> I am playing with your most recent biojava3-core-3.0-alpha1 release code 
> 
> I see that getting substrings from reverse complement sequences etc is now not implemented
> 
> e.g. by overriding
> 
> getSequenceAsString(Integer start, Integer end, Strand strand)  on a  ComplementSequenceView
> 
> which now throws an UnsupportedOperationException (not nice)
> 
> I found the previous behaviour flaky / wrong - and presumably that is why you have overridden it now 
> but is there any way to get a reverse complement subsequence easily? I have looked but can't see an obvious way - my use case is stitching together assembly fragments on the fly....
> 
> For my purposes I added a getReverseComplementSequenceAsString method to my own implementation of ProxySequenceReader to reverse-iterate and return the reverse complement sequence
> 
> so on my DADNASequence you can call: 
> 
> seq.getReverseComplementSequenceAsString(4,13) 
> 
> Otherwise, using a BioJava DNASequence  
> 
> - I can see you could make an intermediate DNASequence:
> 
> (new DNASequence(seq.getSubSequence(4,13).getSequenceAsString())).getReverseComplement().getSequenceAsString()
> 
> - or you could just do it with subString():
> 
> seq.getReverseComplement().getSequenceAsString().substring(seq.getBioEnd()-13, seq.getBioEnd()-4+1)
> 
> either of which could be wrapped in DNASequence, the View or the Reader. 
> 
> Trevor Paterson PhD
> new email trevor.paterson at roslin.ed.ac.uk 
> 
> Bioinformatics 
> The Roslin Institute
> The Royal (Dick) School of Veterinary Studies
> University of Edinburgh
> Scotland EH25 9PS
> phone +44 (0)131 5274197
> 
> bioinformatics.roslin.ed.ac.uk
> 
> Please consider the environment before printing this e-mail
> 
> The University of Edinburgh is a charitable body, registered in Scotland with registration number SC005336
> Disclaimer:This e-mail and any attachments are confidential and intended solely for the use of the recipient(s) to whom they are addressed. If you have received it in error, please destroy all copies and inform the sender. 

-- 
Andrew Yates                   Ensembl Genomes Engineer
EMBL-EBI                       Tel: +44-(0)1223-492538
Wellcome Trust Genome Campus   Fax: +44-(0)1223-494468
Cambridge CB10 1SD, UK         http://www.ensemblgenomes.org/