[BioPython] seq objects etc...
Bradley Marshall
bradbioperl@yahoo.com
Fri, 17 Sep 1999 15:29:36 -0700 (PDT)
> > How do we identify the sequences without some form of ID? Do
> > you make a sequence record which has-a sequence?
>
> It really depends on what's needed to ID a sequence. For example,
> for some cases it is sufficient to say
>
>     if seq1 is seq2:
>         # object identity
OK, fine.
>     if seq1 == seq2:
>         # could, for example, compare the two string representations
This is fine for a case like 'atg' == 'atg', but what about
chromosome_A == chromosome_B?
Isn't that a pretty expensive comparison?
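One way to soften the cost of == on very long sequences is to reject cheaply before comparing every residue. The sketch below is a minimal illustration, assuming a hypothetical Seq class that stores its residues as an immutable upper-case string. It checks object identity, then length, then a cached MD5 digest, and only falls back to the full string comparison when the digests match. Computing the first digest still takes one full pass over the sequence, so the saving comes from sequences of different length and from comparing the same objects repeatedly.

```python
import hashlib


class Seq:
    """Hypothetical sequence class; residues are treated as immutable."""

    def __init__(self, data):
        self.data = data.upper()
        self._digest = None  # computed lazily on first use, then cached

    def digest(self):
        # One full pass over the data the first time; cached afterwards.
        # Caching is only safe because self.data is never modified.
        if self._digest is None:
            self._digest = hashlib.md5(self.data.encode("ascii")).hexdigest()
        return self._digest

    def __eq__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        if self is other:                       # same object: trivially equal
            return True
        if len(self.data) != len(other.data):   # O(1) rejection
            return False
        if self.digest() != other.digest():     # different digests: unequal
            return False
        return self.data == other.data          # digests match: confirm anyway

    def __hash__(self):
        return hash(self.digest())


chrom_a = Seq("ACGT" * 250_000)
chrom_b = Seq("ACGT" * 250_000)
print(chrom_a == chrom_b)  # True
```

The final full comparison guards against the (unlikely) case of two different sequences sharing a digest. If a collision is considered acceptable, that line could return True directly.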
> as compared to
>     if seq1.id == seq2.id:
>         # compare based on id.
>
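The three tests quoted above answer different questions. A minimal sketch, assuming a hypothetical Seq class with an id attribute and a data string (not an existing library class), shows how object identity, value equality, and id equality can disagree:

```python
class Seq:
    """Hypothetical sequence object with an identifier (illustration only)."""

    def __init__(self, id, data):
        self.id = id
        self.data = data

    def __eq__(self, other):
        # Value equality: compare the residues and ignore the identifier.
        if not isinstance(other, Seq):
            return NotImplemented
        return self.data == other.data

    def __hash__(self):
        return hash(self.data)


seq1 = Seq("X01", "ATGGCC")
seq2 = Seq("Y02", "ATGGCC")   # same residues, different id
seq3 = Seq("X01", "ATGGCA")   # same id, different residues

print(seq1 is seq2)          # False: two distinct objects
print(seq1 == seq2)          # True: same residues
print(seq1.id == seq3.id)    # True: same id...
print(seq1 == seq3)          # False: ...but different residues
```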
> For example, suppose you want to tell if the translation of a
> piece of DNA is the same as the protein you have:
>
>     if Bio.translate(dna_seq) == protein_seq:
>         pass
>
> There's no way you can do this with ids, so there must be a form
> of identity without using id equivalence.
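The translation check quoted above only works if equality is defined on the residues, because the translated product is a brand-new object with no id at all. A minimal sketch, assuming a hypothetical translate function as a stand-in for the Bio.translate named in the mail (not a real Biopython signature), using the standard genetic code:

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # first base T
    "LLLLPPPPHHQQRRRR"   # first base C
    "IIIMTTTTNNKKSSRR"   # first base A
    "VVVVAAAADDEEGGGG"   # first base G
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon.

    Trailing bases that do not fill a complete codon are ignored, and any
    character other than A, C, G, T raises KeyError."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


dna_seq = "ATGGCCATTGTAATGGGCCGC"
protein_seq = "MAIVMGR"
print(translate(dna_seq) == protein_seq)  # True: compared by content, not id
```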
OK, I'm starting to see where we're thinking differently. I'm not
really thinking about using the ID for object equivalence; I'm thinking
more about pulling a given sequence out of a database. I agree
that IDs shouldn't be used to test equivalence.
Having said that, I can see the utility of having a seq_record which
has-a sequence. The record could hold the ID, accession number, type,
etc., and would be useful for archiving, retrieval, etc. The sequence
object would contain the functionality. Is this what you're suggesting?
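The has-a split described above can be sketched with two small classes: the sequence carries the behaviour and defines equality on its residues, while the record wraps it with archival metadata. Both classes, their field names, and the example ID and accession values are hypothetical, chosen only to illustrate the composition:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Seq:
    """Hypothetical sequence object: residues plus behaviour.

    The generated __eq__ compares the residues only."""
    data: str

    def __len__(self):
        return len(self.data)

    def reverse_complement(self):
        # DNA only; assumes the data contains just A, C, G, T.
        return Seq(self.data.translate(str.maketrans("ACGT", "TGCA"))[::-1])


@dataclass
class SeqRecord:
    """Hypothetical record: has-a Seq plus metadata for archiving/retrieval."""
    seq: Seq
    id: str
    accession: str
    mol_type: str = "DNA"
    annotations: dict = field(default_factory=dict)


rec = SeqRecord(Seq("ATGGCC"), id="example_1", accession="ACC0001")
print(len(rec.seq))                       # 6
print(rec.seq.reverse_complement().data)  # GGCCAT
```

Keeping equality on Seq and the ID on SeqRecord means a lookup by ID and a comparison by content never get confused with each other.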
> BTW, I heard at the OMG meeting that it is possible for two
> different databases to think they have sequence information for
> the same segment of DNA, but because of different transcription
> errors, they have different sequences (even though they are
> supposed to be the same).
How do you mean? You're saying two different people (or the same
person at different dates) put in a sequence for the same region, but
the sequences differed? That's why I think context is important.
Those two sequences would have different accession numbers, right?
> So is it really possible for two objects with effectively the
> same identity (not necessarily the same id) to have different
> sequence representations?
Yeah, they can have different sequence REPRESENTATIONS. One might be
wrong... but that's why context is important and why there's an
accession number.
> What I want to know is, is this a common event? If so, it would
> be another reason why ids are not the way to determine equivalence,
> or at least should not be the default way.
Again, I agree that IDs should not be used to determine equivalence.
However, they are useful to have around to determine if two sequences
are SUPPOSED to be the same.
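The two roles can be combined: a label says which sequences are supposed to match, and a content comparison says whether they actually do. A minimal sketch, assuming hypothetical (region, accession, sequence) tuples from different submissions; the region labels, accessions, and sequences are made up for illustration:

```python
from collections import defaultdict


def find_discrepancies(records):
    """Group (region, accession, sequence) tuples by region and report the
    regions whose submitted sequences disagree.

    The region label says which sequences are SUPPOSED to be the same;
    comparing the sequence strings says whether they actually are."""
    by_region = defaultdict(list)
    for region, accession, seq in records:
        by_region[region].append((accession, seq))

    report = {}
    for region, entries in by_region.items():
        distinct = {seq for _, seq in entries}
        if len(distinct) > 1:
            report[region] = sorted(accession for accession, _ in entries)
    return report


def mismatch_positions(a, b):
    """0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences differ in length; an alignment is needed")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


records = [
    ("regionA", "ACC0001", "ATGGCCATTGTA"),
    ("regionA", "ACC0002", "ATGGCCATTCTA"),  # one base differs
    ("regionB", "ACC0003", "GGGTTTAAACCC"),
]
print(find_discrepancies(records))  # {'regionA': ['ACC0001', 'ACC0002']}
print(mismatch_positions("ATGGCCATTGTA", "ATGGCCATTCTA"))  # [9]
```

Sequences of different length would need a real alignment before positions can be compared, so the sketch refuses that case rather than guessing.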