Back (for now)

Georg Fuellen fuellen@dali.Mathematik.Uni-Bielefeld.DE
Thu, 3 Jul 1997 12:00:53 +0000 (GMT)


Steve Brenner wrote,
> Dear Steve,
> 
>   Thanks for your detailed thoughts about the relationships between
> modules for 2D and 3D structure.  [My comments deal with your overall
> design; I agree with all the issues, so I haven't repeated the text.]
> 
>   I like the general outline; I think I agree with you that 2D structure
> is a module which should somehow be applicable to both 3D and 1D
> structure.   One consideration is that every (modern) 3D protein structure
> has a known 1D structure (i.e., sequence).   So, perhaps an easy way to
> implement all of this would be that a 3D-structure has a 1D-structure,
> and a 1D-structure has a 2D structure.   

Sounds very elegant!

>   I use 'has a' as one sort of relationship, though I haven't figured out
> whether that is best.  Comments appreciated!  One reason for this approach is
> that 1D-structure and 2D-structure are both discrete and linear.

Actually, RNA 2D-structure may be non-linear: it can have
pseudo-knots, etc. But that information could be stored separately.

> 3D-structure is neither; any atom can be in any place (though there are
> obviously some correlations), and the atomic geometry is not linear. So,
> 2D-structure more neatly maps onto 1D-structure; since we do need to link
> the 1D and 3D structure, we might as well use that link to get to the 2D
> as well.
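A minimal Perl sketch of the 'has a' chain, in which a 3D-structure holds a 1D-structure and the 1D-structure holds a 2D-structure, so a secondary-structure lookup from a 3D object goes through the sequence. The class names (`Struct3D`, `Seq1D`, `SecStruct2D`), the one-letter H/E/C encoding, and the example data are hypothetical, chosen only for illustration. A per-residue string like this would not capture RNA pseudo-knots, which would need a separate pairing list.

```perl
use strict;
use warnings;

# Hypothetical 2D-structure: one secondary-structure code per residue
# (H = helix, E = strand, C = coil), in sequence order.
package SecStruct2D;
sub new {
    my ($class, %args) = @_;
    return bless { states => $args{states} }, $class;
}
sub states { $_[0]{states} }

# Hypothetical 1D-structure: a sequence that "has a" 2D-structure.
package Seq1D;
sub new {
    my ($class, %args) = @_;
    my $self = bless { residues => $args{residues}, ss => $args{ss} }, $class;
    die "2D-structure length must match sequence length\n"
        if defined $self->{ss}
        && length($self->{ss}->states) != length($self->{residues});
    return $self;
}
sub residues { $_[0]{residues} }
sub ss       { $_[0]{ss} }

# Hypothetical 3D-structure: coordinates plus the 1D-structure it "has".
package Struct3D;
sub new {
    my ($class, %args) = @_;
    return bless { seq => $args{seq}, coords => $args{coords} || [] }, $class;
}
sub seq { $_[0]{seq} }

# 2D state of residue $i (0-based), reached via 3D -> 1D -> 2D.
sub ss_at {
    my ($self, $i) = @_;
    return substr($self->seq->ss->states, $i, 1);
}

package main;
my $struct = Struct3D->new(
    seq => Seq1D->new(
        residues => 'MKLVEA',
        ss       => SecStruct2D->new(states => 'CHHHHC'),
    ),
);
print $struct->ss_at(2), "\n";    # prints "H"
```

The 3D object never stores secondary structure itself; it only delegates through the sequence, which is the point of routing the 2D link via 1D.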
> 
>   I like your thoughts about folds (e.g., 4-helix-bundle), as a
> description of the 3D structure; I had not previously considered this.
> However, these describe a domain as a whole rather than any particular
> details of either the secondary or tertiary structure.  Perhaps we should
> have a DomainDescription module which is sort of like the 2D-structure
> module. Where 2D-structure contains secondary structure elements,
> DomainDescriptions have folds. A tricky caveat here is that folds can be
> discontinuous in sequence.
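One way such a DomainDescription might handle the discontinuity caveat, sketched in Perl: each domain stores a fold label plus a list of residue segments, so a domain that is split across the sequence is just one with several segments. `DomainDescription`, `add_domain`, `fold_at`, the 1-based inclusive coordinates, and the residue ranges in the usage lines are all hypothetical.

```perl
use strict;
use warnings;

package DomainDescription;

sub new { my ($class) = @_; return bless { domains => [] }, $class }

# Record a domain: a fold label plus one or more [start, end] segments
# (1-based, inclusive). A domain that is discontinuous in sequence simply
# has more than one segment.
sub add_domain {
    my ($self, $fold, @segments) = @_;
    push @{ $self->{domains} }, { fold => $fold, segments => [@segments] };
    return $self;
}

# Fold label of the domain covering residue $pos, or undef if none does.
sub fold_at {
    my ($self, $pos) = @_;
    for my $domain (@{ $self->{domains} }) {
        for my $seg (@{ $domain->{segments} }) {
            return $domain->{fold}
                if $pos >= $seg->[0] && $pos <= $seg->[1];
        }
    }
    return;
}

package main;
my $desc = DomainDescription->new;
# Illustrative only: one domain built from residues 1-40 and 121-160,
# and a second, continuous domain at 41-120.
$desc->add_domain('4-helix bundle', [1, 40], [121, 160]);
$desc->add_domain('beta sandwich',  [41, 120]);
print $desc->fold_at(130), "\n";    # prints "4-helix bundle"
```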
> 
> 
> > However, there's one case where I can see some overlap between 3D and 2D 
> > structural issues: circular dichroism (CD) experiments. Using CD you can 
> > estimate the overall percentage of helix, sheet, and coil in a protein
> ...
> 
> I think that these data are not archived anywhere and are basically not
> much trusted.  They can be useful and we should keep the possibility of
> using them open.  However, I don't think that they are of sufficient
> import that they should play a large role in building the hierarchy.
> 
> 
> > One more point: my hypothetical Bio::Struct.pm module doesn't know 
> > anything about 3D structures but delegates this task to Bio::Struct::PDB.pm. 
> > Similarly, there could be another module that handles strictly 2D issues. 
> 
> Naming is more of a philosophical and political question than a technical
> one.  On these grounds, I think that it is important that the object which
> knows about coordinates be Bio::Struct.  The reason is that the thing most
> people will want to do most often is parse in a PDB file and do something
> with it -- this "jumble of coordinates" will be the "currency" for
> structures just as "Bio::Seq" will be the corresponding one for sequences.
> 
> To reduce the learning curve and make things appear as simple as possible,
> I think that having a 'Bio::Seq' and a 'Bio::Struct' which are
> more-or-less capable of appearing to do everything necessary is important.
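To make the "jumble of coordinates" concrete, here is a minimal Perl sketch of reading atoms from PDB text using the fixed-column layout of ATOM/HETATM records. The function names and the hash-per-atom representation are assumptions for illustration, not a proposed Bio::Struct interface; real files also need handling for alternate locations, multiple models, and so on.

```perl
use strict;
use warnings;

# Strip leading and trailing whitespace from a fixed-width field.
sub trim { my $s = shift; $s =~ s/^\s+|\s+$//g; return $s }

# Return a reference to a list of atom hashes parsed from PDB-format text.
# Field positions follow the PDB fixed-column layout (1-based columns):
# 13-16 atom name, 18-20 residue name, 22 chain ID, 23-26 residue number,
# 31-38 x, 39-46 y, 47-54 z. The substr() offsets below are 0-based.
sub parse_pdb_atoms {
    my ($text) = @_;
    my @atoms;
    for my $line (split /\n/, $text) {
        next unless $line =~ /^(?:ATOM  |HETATM)/;
        next if length($line) < 54;    # skip truncated records
        push @atoms, {
            name    => trim(substr($line, 12, 4)),
            resname => trim(substr($line, 17, 3)),
            chain   => substr($line, 21, 1),
            resseq  => 0 + substr($line, 22, 4),
            xyz     => [ map { substr($line, $_, 8) + 0 } 30, 38, 46 ],
        };
    }
    return \@atoms;
}

# Usage: build one ATOM record with the column widths above, then parse it.
my $record = sprintf "%-6s%5d %-4s%1s%3s %1s%4d%1s   %8.3f%8.3f%8.3f",
    'ATOM', 1, 'N', ' ', 'MET', 'A', 1, ' ', 38.198, 19.582, 28.998;
my $atoms = parse_pdb_atoms("$record\nEND\n");
printf "%s %s %d (%.3f, %.3f, %.3f)\n", $atoms->[0]{resname},
    $atoms->[0]{chain}, $atoms->[0]{resseq}, @{ $atoms->[0]{xyz} };
```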
> 
> 
> 
> > I decided to go ahead and create a scop module since I knew I
> > would be doing a lot of work with scop data.  scop_dict.cf is a little
> > dictionary I created for converting between class/fold number to class/fold 
> > name. You probably already have such a thing, but it was easy enough to 
> > create. Here's a snippet: 
> 
> I see.  We do have a similar type of thing which uses cdb files.  (It's
> just a set of functions.  For various historical and performance reasons,
> scop is not very OO).  As an aside, cdb files are great!
> 
> 
> 
> > > I have no objection to this, but I'm curious to know why you want to
> > > be able to do slices for revcom, etc.
> > 
> > I needed to process sequences for all genes on a yeast chromosome. It 
> > seemed easiest to create a big PreSeq object for the chromosomal sequence 
> > and then extract sub-sequences for each gene as needed. Since some genes 
> > are on the complementary strand, I needed revcom() to work like str(). 
> > See, for example:
> > http://genome-www.stanford.edu/~sac/perlOOP/bioperl/lib/Bio/Gene/Seq.pm
> 
> Ok; this makes sense.  I had forgotten about revcom's current
> implementation.  One idea was that it would modify the existing object;
> another idea was that it would return a modified object.  Right now it
> seems to be roughly in-between. :)

In UnivAln.pm, there's the 'inplace' flag; if it is set, the existing object
is modified (if that makes sense).
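A small Perl sketch contrasting the two revcom() conventions under discussion: returning a new object by default, or modifying the receiver when an 'inplace'-style flag is set. `SimpleSeq` and its `inplace` accessor are hypothetical and are not UnivAln.pm's actual interface; only plain A/C/G/T (either case) is complemented.

```perl
use strict;
use warnings;

package SimpleSeq;

sub new {
    my ($class, $str) = @_;
    return bless { seq => $str, inplace => 0 }, $class;
}
sub str { $_[0]{seq} }

# Get or set the flag; when true, revcom() changes this object itself.
sub inplace {
    my $self = shift;
    $self->{inplace} = shift if @_;
    return $self->{inplace};
}

# Reverse-complement the sequence. Only A/C/G/T (either case) are
# complemented; other characters are just reversed.
sub revcom {
    my ($self) = @_;
    (my $rc = reverse $self->{seq}) =~ tr/ACGTacgt/TGCAtgca/;
    if ($self->{inplace}) {
        $self->{seq} = $rc;    # modify the existing object
        return $self;
    }
    return ref($self)->new($rc);    # leave $self untouched
}

package main;
my $seq = SimpleSeq->new('ATGCC');
my $rc  = $seq->revcom;                  # new object
print $rc->str, ' ', $seq->str, "\n";    # prints "GGCAT ATGCC"
$seq->inplace(1);
$seq->revcom;                            # now $seq itself is changed
print $seq->str, "\n";                   # prints "GGCAT"
```

Returning the object in both branches lets callers chain calls either way; the flag only decides whether the original is preserved.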

> My suggested modification (probably can't show up until Bio::Seq) would be
> for revcom to return an object with the required modification.  Probably
> my preferred calling sequence would be:
> 
> $mybackgene = Bio::PreSeq->new($mychromasome->str($end,$beg));
> $mygene = $mybackgene->revcom();
> print $mygene->str(), "\n";
> 
> 
> Or maybe we should add another method, like getseq, that returns a
> sequence object for a slice:
> 
> $mybackgene = $mychromasome->get_seq_obj($end,$beg);
>    # ick!  get_seq_obj is a horrible method name!    

I agree.

> $mygene  = $mybackgene->revcom();
> 
> 
> Steve
> 
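A hypothetical Perl sketch of the slice-returns-an-object idea, so that revcom() can be chained directly onto a sub-sequence. The method name `slice`, the `SliceSeq` class, and the 1-based inclusive coordinates (with begin <= end) are placeholders rather than a proposal for the final name; the toy chromosome string is made up.

```perl
use strict;
use warnings;

package SliceSeq;

sub new { my ($class, $str) = @_; return bless { seq => $str }, $class }
sub str { $_[0]{seq} }

# Return a new object holding residues $beg..$end (1-based, inclusive).
sub slice {
    my ($self, $beg, $end) = @_;
    die "invalid range $beg..$end\n"
        if $beg < 1 || $end > length($self->{seq}) || $beg > $end;
    return ref($self)->new(substr($self->{seq}, $beg - 1, $end - $beg + 1));
}

# Return a new, reverse-complemented object (A/C/G/T only).
sub revcom {
    my ($self) = @_;
    (my $rc = reverse $self->{seq}) =~ tr/ACGTacgt/TGCAtgca/;
    return ref($self)->new($rc);
}

package main;
my $chromosome = SliceSeq->new('TTTATGCCAAA');
# Toy example: a gene on the complementary strand at positions 4-8.
my $gene = $chromosome->slice(4, 8)->revcom;
print $gene->str, "\n";    # prints "GGCAT"
```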

best wishes,
georg