[BioPython] seq objects etc...

Bradley Marshall bradbioperl@yahoo.com
Fri, 17 Sep 1999 15:29:36 -0700 (PDT)

> > How do we identify the sequences without some form of ID?  Do
> > you make a sequence record which has-a sequence?
> It really depends on what's needed to ID a sequence.  For example,
> for some cases it is sufficient to say
>   if seq1 is seq2:
>      # object identity

OK, fine.
>   if seq1 == seq2:
>      # could, for example, compare the two string representations

This is fine for:  if ('atg') == ('atg')

but what about:  if (chromosome_A) == (chromosome_B)
Isn't that a pretty expensive comparison?
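To make the cost concern concrete, here is a rough sketch of a comparison that tries the cheap checks first. The function name same_residues is made up for illustration, and it assumes the sequences are plain Python strings. Note that when two chromosome-sized sequences really are equal, the full scan still has to happen.

```python
def same_residues(seq1, seq2):
    """Hypothetical helper: compare two sequences held as plain strings."""
    if seq1 is seq2:
        # Same object: no need to look at the residues at all.
        return True
    if len(seq1) != len(seq2):
        # Different lengths can never be equal; this check is cheap.
        return False
    # Worst case: walk both sequences end to end.
    return seq1 == seq2


chromosome_a = "atg" * 100000
chromosome_b = "atg" * 100000 + "c"

print(same_residues(chromosome_a, chromosome_a))  # identity shortcut
print(same_residues(chromosome_a, chromosome_b))  # length shortcut
```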

> as compared to
>   if seq1.id == seq2.id:
>      # compare based on id.
> For example, suppose you want to tell if the translation of a
> piece of DNA is the same as the protein you have
>   if Bio.translate(dna_seq) == protein_seq:
>      pass
> There's no way you can do this with ids, so there must be a form
> of identity without using id equivalence.

	OK, I'm starting to see where we're thinking differently.  I'm not
really thinking about using the ID for object equivalence.  I'm more
thinking about pulling a given sequence out of a database.  I agree
that IDs shouldn't be used to determine equivalence.

	Having said that, I can see the utility of having a seq_record which
has-a sequence.  The record could have ID, accession, type, etc., and
would be useful for archiving, retrieval, etc.  The sequence object
would contain the functionality.  Is this what you're suggesting?
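A minimal sketch of that split, just to have something concrete to argue about. The class names Seq and SeqRecord, the attribute names, and the id and accession values are all placeholders invented for this example, not an existing API.

```python
class Seq:
    """Hypothetical sequence object: holds the residues and the functionality."""

    # Translation table for complementing lowercase DNA.
    _COMPLEMENT = str.maketrans("acgt", "tgca")

    def __init__(self, data):
        self.data = data.lower()

    def __eq__(self, other):
        # Equivalence is based on the residues, never on an id.
        return isinstance(other, Seq) and self.data == other.data

    def __hash__(self):
        return hash(self.data)

    def complement(self):
        return Seq(self.data.translate(self._COMPLEMENT))


class SeqRecord:
    """Hypothetical record: bookkeeping for archiving/retrieval; has-a Seq."""

    def __init__(self, seq, id, accession, seq_type):
        self.seq = seq
        self.id = id
        self.accession = accession
        self.seq_type = seq_type


# The id is only used to pull the record out of a (toy) database.
record = SeqRecord(Seq("atggcc"), id="example1",
                   accession="PLACEHOLDER", seq_type="dna")
database = {record.id: record}
fetched = database["example1"]
print(fetched.seq.complement().data)
```

The record never takes part in the comparison; anything that asks "are these the same sequence?" goes through the Seq object.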

> BTW, I heard at the OMG meeting that it is possible for two
> different databases to think they have sequence information for
> the same segment of DNA, but because of different transcription
> errors, they have different sequences (even though they are
> supposed to be the same).

How do you mean?  You're saying two different people (or the same
person at different dates) put in a sequence for the same region, but
it had differences in sequence?  That's why I think context is
important.  Those two sequences would have different accession #'s.

> So is it really possible for two objects with effectively the
> same identity (not necessarily the same id) to have different
> sequence representations?

Yeah, they can have different sequence REPRESENTATIONS.  One might be
wrong... but that's why context is important and why there's an
accession #.

> What I want to know is, is this a common event?  If so, it would
> be another reason why ids are not the way to determine equivalence,
> or at least should not be the default way.

	Again, I agree that ids should not be used to determine equivalence.
However, they are useful to have around to determine if two sequences
are SUPPOSED to be the same.
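
Something like the following is what I have in mind, as a sketch only. The Record tuple, the compare function, and the id "region1" are invented for illustration. The id says whether two entries are supposed to be the same, the residues say whether they actually are, and a mismatch between the two answers gets flagged.

```python
from collections import namedtuple

# Hypothetical record: an id plus the sequence as a plain string.
Record = namedtuple("Record", ["id", "seq"])


def compare(rec1, rec2):
    """Return 'equivalent', 'conflict', or 'different' for two records."""
    supposed_same = rec1.id == rec2.id      # what the bookkeeping claims
    actually_same = rec1.seq == rec2.seq    # what the residues say
    if actually_same:
        return "equivalent"
    if supposed_same:
        # Same id but different residues: one entry may be wrong.
        return "conflict"
    return "different"


db1_entry = Record(id="region1", seq="atggcc")
db2_entry = Record(id="region1", seq="atggca")  # differs in the last base
print(compare(db1_entry, db2_entry))
```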