[BioPython] seq objects etc...

Bradley Marshall bradbioperl@yahoo.com
Fri, 17 Sep 1999 15:29:36 -0700 (PDT)


> > How do we identify the sequences without some form of ID?  Do
> > you make a sequence record which has-a sequence?
> 
> It really depends on what's needed to ID a sequence.  For example,
> for some cases it is sufficient to say
> 
>   if seq1 is seq2:
>      # object identity

OK, fine.
 
>   if seq1 == seq2:
>      # could, for example, compare the two string representations

This is fine for:  if ('atg') == ('atg')

but what about:  if (chromosome_A) == (chromosome_B)
Isn't that a pretty expensive comparison?
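One common way to make the chromosome-sized case cheaper, sketched here under my own assumptions (the `Seq` class, its `_digest` attribute, and the choice of SHA-1 are hypothetical, not an existing BioPython API): compute a digest once when the sequence is built, then compare lengths and digests before comparing residues. Building the digest still costs one pass over the data, and this mainly helps when the sequences differ; two genuinely equal chromosomes still get one full comparison, to rule out a hash collision.

```python
import hashlib


class Seq:
    """Hypothetical sequence class that caches a digest of its residues.

    Assumes sequences are not modified after construction, so the
    digest computed in __init__ stays valid.
    """

    def __init__(self, data):
        self.data = data
        # One pass over the data, paid once per object.
        self._digest = hashlib.sha1(data.encode("ascii")).digest()

    def __eq__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        if self is other:
            return True   # same object, trivially equal
        if len(self.data) != len(other.data):
            return False  # different lengths cannot be equal
        if self._digest != other._digest:
            return False  # different digests imply different residues
        # Digests match: confirm with a full comparison in case of a collision.
        return self.data == other.data

    def __hash__(self):
        # Equal residues give equal digests, so this agrees with __eq__.
        return hash(self._digest)
```

For example, `Seq("ACGT" * 1000) == Seq("ACGT" * 999 + "ACGA")` is False after a digest comparison, without walking the residues a second time.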


> as compared to
>   if seq1.id == seq2.id:
>      # compare based on id.
> 
>
> For example, suppose you want to tell if the translation of a
> piece of DNA is the same as the protein you have
> 
>   if Bio.translate(dna_seq) == protein_seq:
>      pass
> 
> There's no way you can do this with ids, so there must be a form
> of identity without using id equivalence.
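To make the translation example concrete, here is a minimal sketch. `translate` is a hypothetical stand-in for the `Bio.translate` mentioned above, not its actual implementation, and its codon table covers only four codons (any other codon raises KeyError). The point it illustrates: the translated protein is a brand-new object with no ID, so only content-based `==` can say whether it matches `protein_seq`.

```python
# Hypothetical, deliberately tiny codon table; a real translator
# would need all 64 codons.
CODON_TABLE = {"ATG": "M", "GCC": "A", "TGG": "W", "TAA": "*"}


class Seq:
    """Minimal sequence object: equality means 'same residues'."""

    def __init__(self, data):
        self.data = data

    def __eq__(self, other):
        if not isinstance(other, Seq):
            return NotImplemented
        return self.data == other.data

    def __hash__(self):
        return hash(self.data)


def translate(dna):
    """Translate a DNA Seq codon by codon, stopping at a stop codon.

    A trailing partial codon (fewer than 3 bases) is ignored.
    """
    protein = []
    for i in range(0, len(dna.data) - 2, 3):
        amino_acid = CODON_TABLE[dna.data[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return Seq("".join(protein))
```

Here `translate(Seq("ATGGCCTGGTAA")) == Seq("MAW")` is True, while `translate(Seq("ATGGCCTGGTAA")) is Seq("MAW")` is False, because the two are distinct objects with the same residues.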

	OK, I'm starting to see where we're thinking differently.  I'm not
really thinking about using the ID for object equivalence; I'm more
thinking about pulling a given sequence out of a database.  I agree
that IDs shouldn't be used to determine equivalence.

	Having said that, I can see the utility of having a seq_record which
has-a sequence.  The record could hold the ID, accession number, type,
etc., and would be useful for archiving, retrieval, and so on.  The
sequence object would contain the functionality.  Is this what you're
suggesting?
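A rough sketch of that split, using Python dataclasses; the class and field names (`SeqRecord`, `accession`, `mol_type`, `annotations`) are my own guesses, not an agreed design, and `mol_type` stands in for "type" to avoid shadowing the builtin. The record is declared with `eq=False`, so two records compare equal only if they are the same object; comparing content is done explicitly through `.seq`, which keeps IDs out of the equivalence test.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Seq:
    """The sequence itself; the generated __eq__ compares residues only."""
    data: str


@dataclass(eq=False)
class SeqRecord:
    """Archival wrapper: metadata plus a Seq it *has* (composition).

    eq=False keeps Python's default identity-based equality, so record
    IDs never take part in deciding whether two sequences are equal.
    """
    seq: Seq
    id: str
    accession: str = ""
    mol_type: str = "DNA"
    annotations: dict = field(default_factory=dict)

    def __len__(self):
        return len(self.seq.data)
```

For example, two records with different (hypothetical) IDs and accessions but the same residues give `r1.seq == r2.seq` as True and `r1 == r2` as False.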


> BTW, I heard at the OMG meeting that it is possible for two
> different databases to think they have sequence information for
> the same segment of DNA, but because of different transcription
> errors, they have different sequences (even though they are
> supposed to be the same).

What do you mean?  Are you saying two different people (or the same
person on different dates) submitted a sequence for the same region,
but the sequences differed?  That's why I think context is important.
Those two sequences would have different accession numbers, right?

> So is it really possible for two objects with effectively the
> same identity (not necessarily the same id) to have different
> sequence representations?

Yeah, they can have different sequence REPRESENTATIONS.  One might be
wrong, but that's why context is important and why there's an
accession number.

> What I want to know is, is this a common event?  If so, it would
> be another reason why ids are not the way to determine equivalence,
> or at least should not be the default way.

	Again, I agree that ids should not be used to determine equivalence.
However, they are useful to have around to determine whether two
sequences are SUPPOSED to be the same.
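As a sketch of that use (the record layout, source names, and `shared_id` values below are hypothetical): group records by the ID that says they ought to match, then flag any ID whose records disagree on the actual sequence, which is the transcription-error case from the OMG meeting.

```python
from collections import defaultdict


def find_discrepancies(records):
    """Report IDs whose records are SUPPOSED to match but don't.

    `records` is an iterable of (source, shared_id, sequence) tuples;
    this shape is a hypothetical stand-in for whatever a database
    query returns.  If one source lists the same ID twice, the later
    entry wins.
    """
    by_id = defaultdict(dict)
    for source, shared_id, sequence in records:
        by_id[shared_id][source] = sequence

    conflicts = {}
    for shared_id, seqs_by_source in by_id.items():
        # More than one distinct sequence means the sources disagree.
        if len(set(seqs_by_source.values())) > 1:
            conflicts[shared_id] = seqs_by_source
    return conflicts
```

With `[("db_a", "region_1", "ATGGCCTGG"), ("db_b", "region_1", "ATGGCCTGA"), ("db_a", "region_2", "ATGAAA"), ("db_b", "region_2", "ATGAAA")]`, only `region_1` is reported, since both sources agree on `region_2`.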