Back (for now)

Steve A. Chervitz sac@genome.stanford.edu
Mon, 7 Jul 1997 21:31:15 -0700 (PDT)


A few stray comments on SteveB's last message:

>   I like the general outline; I think I agree with you that 2D structure
> is a module which should somehow be applicable to both 3D and 1D
> structure.   One consideration is that every (modern) 3D protein structure
> has a known 1D structure (i.e., sequence).   So, perhaps an easy way to
> implement all of this would be that a 3D-structure has a 1D-structure,
> and a 1D-structure has a 2D structure.   

This seems reasonable to me, but may become awkward in some situations. 
Let's say we have an object to represent a residue within a 3D 
structure and we want to know what 2D state it is in. It seems more 
intuitive for the residue to store this data directly rather than to 
have it always get this information from a 2D structure associated with 
the whole 3D structure. My point here is that having a separate 2D 
structure would be handy for some (but not all) situations.
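
To make that concrete, here is a minimal sketch of what I have in mind. 
The package names (My::Residue, My::Struct3D) and the one-letter state 
codes are made up for illustration and are not part of any existing 
module. Each residue stores its own 2D state, and a whole-chain 2D view 
is assembled only when someone asks for it.

```perl
use strict;
use warnings;

# Hypothetical residue class: each residue keeps its own 2D (secondary
# structure) state, e.g. 'H' helix, 'E' strand, 'C' coil.
package My::Residue;

sub new {
    my ($class, %args) = @_;
    my $self = { name => $args{name}, ss => $args{ss} };
    return bless $self, $class;
}

sub name { $_[0]->{name} }
sub ss   { $_[0]->{ss} }

# Hypothetical 3D structure: holds residues in sequence order.
package My::Struct3D;

sub new {
    my ($class, @residues) = @_;
    return bless { residues => [@residues] }, $class;
}

# Fetch a residue by 1-based position.
sub residue { $_[0]->{residues}[ $_[1] - 1 ] }

# Build a whole-chain 2D string on demand, for the situations where a
# separate 2D view is handy.
sub ss_string {
    my $self = shift;
    return join '', map { $_->ss } @{ $self->{residues} };
}

package main;

my $struct = My::Struct3D->new(
    My::Residue->new(name => 'ALA', ss => 'H'),
    My::Residue->new(name => 'GLY', ss => 'H'),
    My::Residue->new(name => 'PRO', ss => 'C'),
);

print $struct->residue(2)->ss, "\n";   # ask the residue directly
print $struct->ss_string, "\n";        # derived whole-chain view
```

The residue answers for itself, while the separate 2D view is derived 
rather than being the only place the information lives.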

>   I use 'has a,' as one sort of relationship, though I haven't figured out
> if that is best.  Comments appreciated!  One reason for this approach is

In general, I favor 'has a' relationships since they decrease 
dependencies and encourage looser coupling between modules, 
making it easier to use each module stand-alone or with other modules. 
This also promotes extensibility. Alternatively, a 2D module might 
inherit from and extend the 1D module (Bio::Seq), as Georg hinted at. 
This may be reasonable.
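
A rough sketch of the 'has a' approach, using hypothetical packages 
(My::Seq1D and My::Struct2D are stand-ins, not the real Bio::Seq 
interface). The 2D object holds a reference to a 1D object and 
delegates sequence questions to it.

```perl
use strict;
use warnings;

# Hypothetical 1D (sequence) class; usable entirely on its own.
package My::Seq1D;

sub new {
    my ($class, $str) = @_;
    return bless { str => $str }, $class;
}

sub str { $_[0]->{str} }

# Hypothetical 2D class: 'has a' 1D object instead of inheriting from it.
package My::Struct2D;

sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq}, ss => $args{ss} }, $class;
}

sub seq { $_[0]->{seq} }   # the contained 1D object
sub ss  { $_[0]->{ss} }    # per-residue 2D states as a string

# Delegation: forward the request to the contained object.
sub str { $_[0]->seq->str }

package main;

my $seq  = My::Seq1D->new('MKVLA');
my $ss2d = My::Struct2D->new(seq => $seq, ss => 'CHHHC');

print $seq->str,  "\n";   # 1D module used stand-alone
print $ss2d->str, "\n";   # same answer, reached through delegation
print $ss2d->ss,  "\n";
```

With inheritance, My::Struct2D would instead list My::Seq1D in its @ISA 
and get str() for free, at the cost of a tighter tie between the two.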

>   I like your thoughts about folds (e.g., 4-helix-bundle), as a
> description of the 3D structure; I had not previously considered this.
> However, these describe a domain as a whole rather than any particular
> details of either the secondary or tertiary structure.  Perhaps we should
> have a DomainDescription module which is sort of like the 2D-structure
> module. Where 2D-structure contains secondary structure elements,
> DomainDescriptions have folds. A tricky caveat here is that folds can be
> discontinuous in sequence.

Take a look at:
http://genome-www.stanford.edu/~sac/perlOOP/bioperl/schema/struct.html
I've tried to do something similar with my Bio::Struct::Domain.pm 
module.
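
On the point about folds being discontinuous in sequence, one way to 
handle it is to store a domain as a fold label plus a list of segments. 
This is only a sketch under that assumption; My::Domain and its methods 
are hypothetical and do not describe the module at the URL above.

```perl
use strict;
use warnings;

# Hypothetical domain description: a fold name plus one or more
# [start, end] segments (1-based, inclusive), so the domain need not
# be contiguous in sequence.
package My::Domain;

sub new {
    my ($class, %args) = @_;
    my $self = {
        fold     => $args{fold},
        # keep segments ordered by start position
        segments => [ sort { $a->[0] <=> $b->[0] } @{ $args{segments} } ],
    };
    return bless $self, $class;
}

sub fold { $_[0]->{fold} }

# True if the given sequence position falls inside any segment.
sub contains {
    my ($self, $pos) = @_;
    for my $seg (@{ $self->{segments} }) {
        return 1 if $pos >= $seg->[0] && $pos <= $seg->[1];
    }
    return 0;
}

# Total number of residues covered by all segments.
sub size {
    my $self = shift;
    my $n = 0;
    $n += $_->[1] - $_->[0] + 1 for @{ $self->{segments} };
    return $n;
}

package main;

my $dom = My::Domain->new(
    fold     => '4-helix-bundle',
    segments => [ [80, 120], [10, 45] ],   # two pieces, given out of order
);

print $dom->fold, ": ", $dom->size, " residues\n";
print "position 60 is ", ($dom->contains(60) ? "inside" : "outside"), "\n";
```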


> > One more point: my hypothetical Bio::Struct.pm module doesn't know 
> > anything about 3D structures but delegates this task to Bio::Struct::PDB.pm. 
> > Similarly, there could be another module that handles strictly 2D issues. 
> 
> Naming is more of a philosophical and political question than a technical
> one.  On these grounds, I think that it is important that the object which
> knows about coordinates be Bio::Struct.  The reason is that the thing most
> people will want to do most often is parse in a PDB file and do something
> with it -- this "jumble of coordinates" will be the "currency" for
> structures just as "Bio::Seq" will be the corresponding one for sequences.
> 
> To reduce the learning curve and to make things appear as simple as possible,
> I think that having a 'Bio::Seq' and a 'Bio::Struct' which are
> more-or-less capable of appearing to do everything necessary is important.

Good points. I'm just concerned about creating complex monolithic 
modules that are difficult to use and extend.


> > > I have no objection to this, but curious to know why you want to
> > > be able to do slices for revcom, etc.
> > 
> > I needed to process sequences for all genes on a yeast chromosome. It 
> > seemed easiest to create a big PreSeq object for the chromosomal sequence 
> > and then extract sub-sequences for each gene as needed. Since some genes 
> > are on the complementary strand, I needed revcom() to work like str(). 
> > See, for example:
> > http://genome-www.stanford.edu/~sac/perlOOP/bioperl/lib/Bio/Gene/Seq.pm
> 
> Ok; this makes sense.  I had forgotten about revcom's current
> implementation.  One idea was that it would modify the existing object;
> another idea was that it would return a modified object.  Right now it
> seems to be roughly in-between. :)
> 
> My suggested modification (probably can't show up until Bio::Seq) would be
> for revcom to return an object with the required modification.  Probably
> my preferred calling sequence would be:
> 
> $mybackgene = new Bio::PreSeq ($mychromasome->str($end,$beg));
> $mygene = $mybackgene->revcom();
> print $mygene->str(), "\n";
> 
> 
> Or, maybe we should add another method like getseq to return a sequence
> object of a slice:
> 
> $mybackgene = $mychromasome->get_seq_obj($end,$beg);
>    # ick!  get_seq_obj is a horrible method name!    
> $mygene  = $mybackgene->revcom();

I would favor the latter strategy since it is clearer that you are 
dealing with a new sequence object.
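
Something along these lines, as a sketch only. My::Seq is a hypothetical 
stand-in, not the real PreSeq or Bio::Seq interface, and the choice to 
accept coordinates in either order is my assumption. Both get_seq_obj() 
and revcom() return new objects and leave the original untouched.

```perl
use strict;
use warnings;

# Hypothetical sequence class for illustration.
package My::Seq;

sub new {
    my ($class, $str) = @_;
    return bless { str => uc $str }, $class;
}

# With no arguments, return the whole sequence string; with two
# coordinates (1-based, inclusive, either order), return that slice.
sub str {
    my ($self, $beg, $end) = @_;
    return $self->{str} unless defined $beg && defined $end;
    ($beg, $end) = ($end, $beg) if $beg > $end;
    return substr($self->{str}, $beg - 1, $end - $beg + 1);
}

# Return a NEW sequence object holding the requested slice.
sub get_seq_obj {
    my ($self, $beg, $end) = @_;
    return ref($self)->new( $self->str($beg, $end) );
}

# Return a NEW object holding the reverse complement (DNA only here).
sub revcom {
    my $self = shift;
    my $rc = scalar reverse $self->{str};
    $rc =~ tr/ACGT/TGCA/;
    return ref($self)->new($rc);
}

package main;

my $chrom = My::Seq->new('ATGCCGTA');
my $slice = $chrom->get_seq_obj(6, 3);   # positions 3..6
my $gene  = $slice->revcom;

print $slice->str, "\n";
print $gene->str,  "\n";
print $chrom->str, "\n";   # the chromosome object is unchanged
```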

SteveC