[BioSQL-l] Storing "per letter" annotation?

Peter biopython at maubp.freeserve.co.uk
Sat May 24 08:21:12 EDT 2008


This is a BioSQL related query - but first a little background:

One topic that has recently come up on the Biopython developers
mailing list is extending our sequence classes to deal with "per
letter" annotation.  Such annotation should then survive slicing the
sequence into sub-strings, for example.

For example, with nucleotide sequences, each base-pair may have an
associated quality score (one float per bp).  Or perhaps you might
have a contig region where for each bp you want to record the number
of fragments it is supported by (one integer per bp).

Similarly, for proteins, you might know the secondary structure (for
example held as a character per amino acid, a = alpha helix etc).  For
a PDB file, you might want to have an object for each residue holding
an associated set of atomic coordinates, or maybe just the C-alpha
backbone coordinates (three floats per residue).  One final motivating
example, you might want to hold the solvent accessibility of each
residue (one float per residue).
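
To make the idea concrete, here is a rough Python sketch of what I have
in mind.  The class name AnnotatedSeq and its per_letter attribute are
made up purely for illustration (this is not an existing Biopython
API), and it only handles slicing, not single-letter indexing:

```python
class AnnotatedSeq:
    """Hypothetical sequence wrapper; not an existing Biopython class."""

    def __init__(self, letters, per_letter=None):
        self.letters = letters
        # Maps an annotation name to a list with one value per letter.
        self.per_letter = dict(per_letter or {})
        for name, values in self.per_letter.items():
            if len(values) != len(letters):
                raise ValueError(
                    "annotation %r has %d values for %d letters"
                    % (name, len(values), len(letters))
                )

    def __len__(self):
        return len(self.letters)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("only slicing is supported in this sketch")
        # Apply the same slice to the letters and to every annotation,
        # so the two stay in step.
        sliced = dict(
            (name, values[index]) for name, values in self.per_letter.items()
        )
        return AnnotatedSeq(self.letters[index], sliced)


# Made-up example data: a quality score and a fragment count per base.
read = AnnotatedSeq(
    "ACGTAC",
    {
        "quality": [30.0, 31.5, 12.0, 40.0, 38.5, 22.0],
        "coverage": [1, 2, 2, 3, 3, 1],
    },
)
sub = read[1:4]  # letters "CGT" plus the matching three values of each
```

The same container would cope with the protein examples too, since the
values could just as well be characters or coordinate tuples.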

First of all, have any of the other Bio* projects implemented anything
like this?  If so, I'd like to have a look at the relevant
documentation (and depending on the language, even the
implementation).  And secondly, how would you go about storing it in
BioSQL?  As far as I can see, there isn't anything in BioSQL at the
moment suitable (other than abusing the sequence features).
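
Just to illustrate the kind of thing I am asking about, one option I
could imagine is a separate table with one row per letter.  The sketch
below uses Python's sqlite3 module; the letter_annotation table and its
columns are entirely hypothetical and NOT part of BioSQL, and the
bioentry table here is only a cut-down stand-in for the real one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the real BioSQL bioentry table (only the key is kept).
    CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT);
    -- Hypothetical table, NOT part of BioSQL: one row per letter.
    CREATE TABLE letter_annotation (
        bioentry_id INTEGER NOT NULL REFERENCES bioentry(bioentry_id),
        name TEXT NOT NULL,          -- e.g. 'quality'
        position INTEGER NOT NULL,   -- 1-based position in the sequence
        value TEXT NOT NULL,         -- stored as text in this sketch
        PRIMARY KEY (bioentry_id, name, position)
    );
""")

conn.execute("INSERT INTO bioentry (bioentry_id, name) VALUES (1, 'example_read')")

# Made-up quality scores, one float per base.
qualities = [30.0, 31.5, 12.0, 40.0]
conn.executemany(
    "INSERT INTO letter_annotation VALUES (1, 'quality', ?, ?)",
    [(pos, str(q)) for pos, q in enumerate(qualities, start=1)],
)

# Fetch the scores for positions 2..3 only, i.e. for a sub-sequence.
rows = conn.execute(
    "SELECT value FROM letter_annotation "
    "WHERE bioentry_id = 1 AND name = 'quality' "
    "AND position BETWEEN 2 AND 3 ORDER BY position"
).fetchall()
sub_qualities = [float(v) for (v,) in rows]
```

One row per letter is probably wasteful for long sequences, so this is
meant as a starting point for discussion rather than a proposal.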

I will be away most of next week, so I apologise in advance for any
delayed responses.

Peter
(Biopython)

