Back (for now)

Steve A. Chervitz
Wed, 2 Jul 1997 14:25:25 -0700 (PDT)

Steven E. Brenner wrote:

>   This is interesting.  Could you be a little more specific about where
> you think the overlap between 2D and 3D protein structure is, for module
> design?  (Let's leave 2D RNA aside for now, as that will be even harder,
> though if it can be included -- that's great!)  I have a hard time seeing
> how we could abstract any but the most incidental features.  For example,
> domains usually are not known in 2D structure any more than they are in
> sequence.  'Predicted v. experimental' seems like a characteristic of the
> whole item; just a flag and a pretty incidental one at that.  Active sites
> would be probably defined in quite different ways for 2D structure (which
> would be more like sequence) and 3D structure. 
>   Indeed, I would probably see more overlap between 2D structure and
> sequence than 2D structure and 3D.  But, I'm interested that you see
> otherwise and would like to hear more details. 

Here's my thinking: First, it's clear that "secondary structure" is just  
a mental construct that helps us think about the structural organization 
of biomolecules. Secondary structure can be thought of as a filter that 
can be applied to either a sequence or a 3D structure. So, given a 
primary sequence, you can list the secondary structural state of each 
residue (based on a prediction or analysis of the known structure). Given a 
3D structure, you can do the same thing by inspection of the actual 3D 
structure.

You can think of a linear sequence or a 3D structure as an assembly of 
secondary structural elements. In the case of the 3D structure, you can 
also describe the connectivity of the elements (e.g., 5-stranded, 
anti-parallel beta sheet, 4-helix bundle, etc.). This is still just a 
representation of the 3D structure, not the actual structure, which is a 
collection of connected atoms in 3-space. This is analogous to the way in 
which a string of secondary structural states is a representation of a 
primary sequence.
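
As a rough illustration of the "string of states" versus "assembly of 
elements" views (a Python sketch, not code from any existing module): the 
one-letter codes H/E/C for helix/strand/coil, the function name, and the 
example strings are all assumptions made up for this sketch.

```python
from itertools import groupby

def elements(states):
    """Collapse a per-residue state string into (state, start, end) runs.

    Coordinates are 1-based and inclusive (an assumption of this sketch).
    """
    result = []
    pos = 1
    for state, run in groupby(states):
        length = len(list(run))
        result.append((state, pos, pos + length - 1))
        pos += length
    return result

# Made-up sequence and made-up state assignment, one state per residue.
sequence = "MKTAYIAKQR"
states = "CCHHHHHEEC"

for state, start, end in elements(states):
    print(state, start, end, sequence[start - 1:end])
```

The same function works whether the state string came from a prediction on 
a sequence or from analysis of a 3D structure, which is the sense in which 
secondary structure acts as a filter over either one.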

So (contrary to what I initially thought) it would seem best to NOT 
intermingle secondary structure with 3D structure in a Bio::Struct module. 
Instead, it may be best to keep secondary structural issues in a separate 
module which can deal with sequences or structures. 

However, there's one case where I can see some overlap between 3D and 2D 
structural issues: circular dichroism (CD) experiments. Using CD you can 
estimate the overall percentage of helix, sheet, and coil in a protein 
without knowing anything about the distribution of these regions in the 
molecule. Some NMR experiments can also estimate this. This information 
can, of course, also be obtained by analysis of the 3D structure.
Thus there are some proteins which have data about the overall 
fraction of secondary structure but have no 3D structure. I'm not aware 
of any databases that store this sort of information, so we may not want 
to worry too much about it now, but it is something to keep in mind.
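
For comparison, the overall fractions are trivial to derive when per-residue 
states are available. A minimal Python sketch, assuming the states are 
encoded as H/E/C (helix/sheet/coil); the function name and the example 
string are hypothetical:

```python
from collections import Counter

def composition(states):
    """Return the fraction of helix (H), sheet (E) and coil (C) residues."""
    if not states:
        raise ValueError("empty state string")
    counts = Counter(states)
    total = len(states)
    # Counter returns 0 for states that never occur.
    return {s: counts[s] / total for s in "HEC"}

fractions = composition("CCHHHHHEEC")  # made-up state string
print(fractions)
```

A CD experiment gives only a dictionary like this one, with no state string 
behind it, so a module storing such data could not assume that per-residue 
states exist.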

One more point: my hypothetical module doesn't know 
anything about 3D structures but delegates this task to 
Similarly, there could be another module that handles strictly 2D issues. 

>   As an aside, what is '' in Bio::Struct::Scop_data ?
>   I had no idea you had coded up so much which uses scop!

I decided to go ahead and create a scop module since I knew I 
would be doing a lot of work with scop data. is a little 
dictionary I created for converting from class/fold number to class/fold 
name. You probably already have such a thing, but it was easy enough to 
create. Here's a snippet: 

1:All alpha     1:Globin-like
1:All alpha     2:Long alpha-hairpin
1:All alpha     3:Cytochrome c
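
A file in this layout can be read into lookup tables with a few lines. The 
following is only a Python sketch under stated assumptions: the two columns 
are taken to be separated by a tab or a run of two or more spaces, fold 
numbers are taken to be unique only within a class, and all names in the 
code are hypothetical.

```python
import re

SNIPPET = """\
1:All alpha     1:Globin-like
1:All alpha     2:Long alpha-hairpin
1:All alpha     3:Cytochrome c
"""

def parse_dictionary(text):
    """Build number-to-name tables for classes and folds."""
    class_names = {}
    fold_names = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Names may contain single spaces, so split on a tab or 2+ spaces only.
        class_field, fold_field = re.split(r"\s{2,}|\t", line, maxsplit=1)
        class_num, class_name = class_field.split(":", 1)
        fold_num, fold_name = fold_field.split(":", 1)
        class_names[int(class_num)] = class_name
        # Folds are keyed by (class number, fold number).
        fold_names[(int(class_num), int(fold_num))] = fold_name.strip()
    return class_names, fold_names

class_names, fold_names = parse_dictionary(SNIPPET)
print(class_names[1], "/", fold_names[(1, 2)])
```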

> > My modified version of can be found at:
> >
> > The main change is in the revcom() method to permit slicing. Note that it 
> > would be nice to modify reverse() and complement() similarly.
> I have no objection to this, but curious to know why you want to
> be able to do slices for revcom, etc.

I needed to process sequences for all genes on a yeast chromosome. It 
seemed easiest to create a big PreSeq object for the chromosomal sequence 
and then extract sub-sequences for each gene as needed. Since some genes 
are on the complementary strand, I needed revcom() to work like str(). 
See, for example:
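
Separately from that example, the slicing idea itself can be sketched in a 
few lines. This is an illustration in Python with hypothetical function 
names, not the PreSeq code; it assumes coordinates are 1-based, inclusive, 
and given on the forward strand, and the chromosome string is made up.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def substr(seq, start=1, end=None):
    """Return seq[start..end], 1-based inclusive (a stand-in for str())."""
    if end is None:
        end = len(seq)
    return seq[start - 1:end]

def revcom(seq, start=1, end=None):
    """Reverse complement of the slice start..end (forward-strand coordinates)."""
    return substr(seq, start, end).translate(_COMPLEMENT)[::-1]

chromosome = "ATGCCGTAAGCT"               # made-up sequence
forward_gene = substr(chromosome, 1, 6)   # gene on the given strand
reverse_gene = revcom(chromosome, 7, 12)  # gene on the complementary strand
print(forward_gene, reverse_gene)
```

With both functions taking the same start/end arguments, the per-gene loop 
only has to pick one or the other according to the strand.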