[BioPython] seq objects etc...
Andrew Dalke
dalke@bioreason.com
Wed, 15 Sep 1999 12:26:39 -0600
Bradley Marshall <bradbioperl@yahoo.com>:
> [Me:]
> > Thus, we could have a light-weight sequence object
> > and use it to
> > compose something which meets the LSR requirements.
>
> How is that done?
A bunch of has-a relationships and proxy methods to forward
function calls to the underlying objects.
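To make that concrete, here is a minimal sketch of the composition idea. The class names LightSeq and SeqRecord, and their attributes, are made up for illustration and are not an agreed BioPython interface. The record has-a sequence and forwards a few calls to it.

```python
class LightSeq:
    """Hypothetical light-weight sequence: only the residues, no id."""

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index]

    def __str__(self):
        return self.data


class SeqRecord:
    """Hypothetical richer record that has-a LightSeq plus an id."""

    def __init__(self, seq, id, description=""):
        self.seq = seq                  # the has-a relationship
        self.id = id
        self.description = description

    # Proxy methods: forward the call to the underlying sequence object.
    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        return self.seq[index]

    def __str__(self):
        return str(self.seq)


record = SeqRecord(LightSeq("ATGGCC"), id="example|1")
print(len(record), record[0], str(record))
```

The id lives only on the record, so code that just needs residues can use the light-weight object directly.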
> How do we identify the sequences without some form of ID? Do
> you make a sequence record which has-a sequence?
It really depends on what's needed to ID a sequence. For example,
for some cases it is sufficient to say
    if seq1 is seq2:
        pass  # object identity
    if seq1 == seq2:
        pass  # could, for example, compare the two string representations
as compared to
    if seq1.id == seq2.id:
        pass  # compare based on id
For example, suppose you want to tell if the translation of a
piece of DNA is the same as the protein you have
    if Bio.translate(dna_seq) == protein_seq:
        pass
There's no way you can do this with ids, so there must be a form
of identity without using id equivalence.
So I don't think the core data structure needs an id field.
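As a rough sketch of the three kinds of comparison, assume a toy Seq class with an optional id and a toy translate() function using a four-codon table. Both are hypothetical stand-ins, not the real Bio.translate, and the ids are invented.

```python
# Tiny subset of the standard genetic code, enough for this example only.
CODON_TABLE = {"ATG": "M", "GCC": "A", "AAA": "K", "TAA": "*"}


class Seq:
    """Hypothetical sequence whose equality is based on its residues."""

    def __init__(self, data, id=None):
        self.data = data
        self.id = id

    def __eq__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        return self.data == other.data


def translate(dna):
    """Toy translation: read whole codons from position 0, no id on the result."""
    codons = [dna.data[i:i + 3] for i in range(0, len(dna.data) - 2, 3)]
    return Seq("".join(CODON_TABLE[codon] for codon in codons))


dna_seq = Seq("ATGGCCAAA", id="dna|1")
protein_seq = Seq("MAK", id="prot|7")
translated = translate(dna_seq)

same_object = translated is protein_seq        # object identity
same_residues = translated == protein_seq      # compares the residues
same_id = translated.id == protein_seq.id      # compares the ids
print(same_object, same_residues, same_id)
```

The freshly translated sequence has no id to compare, yet the residue comparison still answers the question being asked.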
BTW, I heard at the OMG meeting that it is possible for two
different databases to think they have sequence information for
the same segment of DNA, but because of different transcription
errors, they have different sequences (even though they are
supposed to be the same).
So is it really possible for two objects with effectively the
same identity (not necessarily the same id) to have different
sequence representations?
I know it is true for sequences from the PDB since the SEQRES
records can be (and are) different from the sequence given in
the ATOM records. The PDB documentation even gives a dozen reasons
why they could be different. Thus, the record ``pdb|2plv1'' could
be two different (but similar) sequences.
What I want to know is, is this a common event? If so, it would
be another reason why ids are not the way to determine equivalence,
or at least should not be the default way.
Andrew Dalke
dalke@bioreason.com