[BioPython] proposal 3

Jeffrey Chang jchang@SMI.Stanford.EDU
Tue, 18 Apr 2000 18:51:47 -0700 (PDT)


On Thu, 30 Mar 2000, Andrew Dalke wrote:

> Proposal 3 - The ends of a sequence may correspond to physical
> ends of the real sequence.  This data is stored in the attribute
> "endings", which has two elements, "left" and "right".  (Left
> is position 0.)  The possible values for the elements are
> UNKNOWN, TERMINAL, NONTERMINAL.

The biology really complicates this, because it's unclear what a sequence
actually is.  For example, is a sequence defined by the data read from a
gel?  What about alternative splicing, post-translational modifications,
SNPs, fragments, or plasmids?  In addition, some common data structures
used in bioinformatics have no equivalent in biology, such as consensus
sequences, alignment hits, and profiles/motifs/blocks.  In many of these
cases, it is unclear what TERMINAL would mean.
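
To make the proposal concrete, here is a rough sketch of how I read it.
Everything in it is hypothetical: the Ending, Endings and Seq names, and
the rule for what slicing does to the ends, are my own assumptions for
illustration and not actual Biopython code.

```python
from enum import Enum


class Ending(Enum):
    """The three values named in the proposal."""
    UNKNOWN = "unknown"
    TERMINAL = "terminal"
    NONTERMINAL = "nonterminal"


class Endings:
    """Hypothetical holder for the two ends; 'left' is position 0."""

    def __init__(self, left=Ending.UNKNOWN, right=Ending.UNKNOWN):
        self.left = left
        self.right = right


class Seq:
    """Toy sequence class for illustration only."""

    def __init__(self, data, endings=None):
        self.data = data
        self.endings = endings if endings is not None else Endings()

    def __getitem__(self, index):
        if not isinstance(index, slice):
            return self.data[index]
        start, stop, step = index.indices(len(self.data))
        if step != 1:
            raise ValueError("this sketch only handles contiguous slices")
        # Assumed rule: an end keeps its status only if the slice reaches
        # it; any end produced by cutting inside becomes NONTERMINAL.
        left = self.endings.left if start == 0 else Ending.NONTERMINAL
        right = (self.endings.right if stop == len(self.data)
                 else Ending.NONTERMINAL)
        return Seq(self.data[start:stop], Endings(left, right))


whole = Seq("ATGCATGC", Endings(Ending.TERMINAL, Ending.TERMINAL))
head = whole[0:4]          # left end preserved, right end cut
consensus = Seq("ATGNATGC")  # no physical molecule, so both ends UNKNOWN
```

Note that every slice has to do the extra bookkeeping, and that something
like a consensus sequence can only ever say UNKNOWN.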


>   This information will only be used rarely (as proof, biojava
> and bioperl don't track this data).

Yep.  There was some talk earlier on the bioperl lists about whether BLAST
HSPs should use the sequence object.  I don't know whether they do now, but
it was a plausible use for the object, and one that would not require this
extra attribute.


>   Since I don't like the complexity and performance hits,
> I'm against the proposal.

No complaints here.

Jeff