[BioRuby] Bio::Sequence forces a DNA sequence to lowercase?
Pjotr Prins
pjotr2008 at thebird.nl
Sun Feb 3 09:43:49 UTC 2008
On Sun, Feb 03, 2008 at 02:21:21PM +0900, Naohisa GOTO wrote:
> On Sat, 2 Feb 2008 11:47:41 +0100
> pjotr2008 at thebird.nl (Pjotr Prins) wrote:
>
> > As case contains (external) information to the Sequence class I would
> > favour 'translate' and 'complement' would conserve case after their
> > job. I think that is the correct thing to do.
>
> How do 'translate' conserve case when both upcase and downcase
> characters are mixed?
What I actually meant was to retain case for both AA and NA sequences.
Converting a sequence to lower or upper case assumes the user wants
that, while he may in fact want to keep the information the case
carries.
After some thought I realise that the user's requirement for
retaining case is really a poor man's solution for storing positional
binary information alongside the sequence itself. He has
'aAttGa' rather than ('aattga','010010').
His reason is that a later modification to the sequence - e.g. a
deletion - can then be done in one step rather than two.
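To make the one-step versus two-step point concrete, here is a minimal
plain-Ruby sketch. It does not use BioRuby at all, and split_case and
the two delete_* helpers are hypothetical names chosen for illustration:

```ruby
# Split a mixed-case string into a downcased sequence plus a 0/1 mask
# marking which positions were upper case.
def split_case(str)
  mask = str.each_char.map { |c| c =~ /[A-Z]/ ? '1' : '0' }.join
  [str.downcase, mask]
end

# Deleting one position from the split representation takes two steps,
# one per data structure.
def delete_split(seq, mask, index)
  seq = seq.dup
  mask = mask.dup
  seq.slice!(index)
  mask.slice!(index)
  [seq, mask]
end

# Deleting one position from the mixed-case string takes a single step.
def delete_mixed(str, index)
  str = str.dup
  str.slice!(index)
  str
end

p split_case('aAttGa')                 # ["aattga", "010010"]
p delete_split('aattga', '010010', 2)  # ["aatga", "01010"]
p delete_mixed('aAttGa', 2)            # "aAtGa"
```

Both routes end up describing the same thing: splitting 'aAtGa' again
gives ('aatga','01010'). The mixed-case string just keeps the two in
sync for free.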
I have been in that position myself, when I wanted to store maximum
likelihood values with each nucleotide. Any post-processing meant
updating two data structures.
It would be really nice to resolve this in a way where we can create a
sequence, attach positional information (of any object type), and have
that information stay with the NAs or AAs during processing. I think
a generic solution would be used often, provided it is simple enough.
Jan's proposal to use Bio::Feature is useful in this case, but it
would not work well for information covering a full sequence - or would
be overkill. So perhaps downcasing the sequence is, indeed, the right
thing to do, as long as we allow positional information to be added.
It is a classic case for an adapter. Perhaps something like:
seq = Bio::Sequence::NA.new(:adapter => Bio::Sequence::RetainCase('aAttGA'))
and for maximum likelihoods
class ML < Bio::Sequence::PositionalInformation
end
seqml = Bio::Sequence::NA.new('aattga', :adapter => ML([0.1,0.2,0.3,0.2,0.3,0.1]))
where the adapter handles additional methods like:
seqml[1]
>> 'a'
seqml.posinfo[1]
>> 0.2
and in the RetainCase example:
seq[1]
>> 'a'
seq.posinfo[1]
>> 'A'
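As a rough illustration of how such behaviour could work, here is a
self-contained sketch in plain Ruby. It is not BioRuby code:
PositionalSequence, retain_case and delete_at are hypothetical names,
and a real adapter would hook into Bio::Sequence::NA rather than wrap
a String. Indexing is zero-based, as usual in Ruby.

```ruby
# Hypothetical sketch only: none of these names exist in BioRuby.
# A downcased sequence that carries one piece of positional
# information (any object) per residue.
class PositionalSequence
  attr_reader :seq, :posinfo

  def initialize(seq, posinfo)
    unless seq.length == posinfo.length
      raise ArgumentError, 'sequence and posinfo differ in length'
    end
    @seq = seq.downcase   # downcase returns a new, mutable String
    @posinfo = posinfo.dup
  end

  # Build from a mixed-case string, keeping the original characters
  # as the positional information.
  def self.retain_case(str)
    new(str, str.chars)
  end

  # Residue at a zero-based position.
  def [](index)
    @seq[index]
  end

  # Delete one position from the sequence and its positional
  # information together, so the two never get out of step.
  def delete_at(index)
    @seq.slice!(index)
    @posinfo.delete_at(index)
    self
  end
end

seqml = PositionalSequence.new('aattga', [0.1, 0.2, 0.3, 0.2, 0.3, 0.1])
p seqml[1]           # "a"
p seqml.posinfo[1]   # 0.2
seqml.delete_at(1)
p seqml.seq          # "attga"
p seqml.posinfo      # [0.1, 0.3, 0.2, 0.3, 0.1]

seq = PositionalSequence.retain_case('aAttGA')
p seq[1]             # "a"
p seq.posinfo[1]     # "A"
```

The point of the design is that every edit goes through one method
which updates both structures, so the caller again works in one step.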
Pj.