[Biopython] Storing SeqRecord objects with annotation

Hilmar Lapp hlapp at gmx.net
Thu Jul 23 09:01:29 EDT 2009


On Jul 23, 2009, at 6:20 AM, Peter wrote:

> Currently the BioSQL schema doesn't have any explicit support
> for "per letter annotation"

I haven't been following the thread closely, so I may be missing what
is really meant by this. If, however, you mean associating annotation
with a specific letter (position) in the sequence, BioSQL does support
this: you'd create a seqfeature with the appropriate location and attach
the annotation to the seqfeature.
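To make the seqfeature route concrete, here is a minimal sketch using Python's built-in sqlite3. It assumes a heavily simplified, hypothetical subset of the BioSQL tables (bioentry, seqfeature, location, seqfeature_qualifier_value). The real schema has more columns and references ontology terms by term_id rather than storing type and tag names as text. The helper names annotate_position and annotations_at, and the "phred_quality" tag, are made up for illustration.

```python
import sqlite3

# Hypothetical, heavily simplified subset of the BioSQL tables. The real schema
# references ontology terms (term_id) instead of storing type/tag names as text.
SCHEMA = """
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY,
                         bioentry_id INTEGER REFERENCES bioentry, type TEXT);
CREATE TABLE location (location_id INTEGER PRIMARY KEY,
                       seqfeature_id INTEGER REFERENCES seqfeature,
                       start_pos INTEGER, end_pos INTEGER);
CREATE TABLE seqfeature_qualifier_value (seqfeature_id INTEGER REFERENCES seqfeature,
                                         tag TEXT, value TEXT);
"""

def annotate_position(conn, bioentry_id, pos, tag, value):
    """Attach annotation to one sequence position by creating a one-letter seqfeature."""
    cur = conn.execute("INSERT INTO seqfeature (bioentry_id, type) VALUES (?, ?)",
                       (bioentry_id, "misc_feature"))
    feature_id = cur.lastrowid
    # A single-letter location: start and end are the same (inclusive) position.
    conn.execute("INSERT INTO location (seqfeature_id, start_pos, end_pos) VALUES (?, ?, ?)",
                 (feature_id, pos, pos))
    # The annotation itself hangs off the seqfeature, not the bioentry.
    conn.execute("INSERT INTO seqfeature_qualifier_value VALUES (?, ?, ?)",
                 (feature_id, tag, value))
    return feature_id

def annotations_at(conn, bioentry_id, pos):
    """Return (tag, value) pairs of all features whose location covers pos."""
    return conn.execute("""
        SELECT q.tag, q.value FROM seqfeature f
        JOIN location l ON l.seqfeature_id = f.seqfeature_id
        JOIN seqfeature_qualifier_value q ON q.seqfeature_id = f.seqfeature_id
        WHERE f.bioentry_id = ? AND ? BETWEEN l.start_pos AND l.end_pos
    """, (bioentry_id, pos)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO bioentry (bioentry_id, name) VALUES (1, 'example')")
annotate_position(conn, 1, 42, "phred_quality", "30")
print(annotations_at(conn, 1, 42))  # [('phred_quality', '30')]
```

The same join works unchanged for multi-letter features, which is the point of going through seqfeature and location rather than the location-less bioentry annotations.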

Bioentry annotations are location-less, by comparison.

>
> The GenBank file format simply doesn't have a concept of "per
> letter annotation"

Since it does in the above sense, I'm inclined to assume that you
really mean something different from the above?

> [...]
> You can record any object in the SeqRecord's annotation dictionary.
> However, saving the result to a file will be tricky - and it wouldn't
> work in BioSQL either.


Note that that's not entirely true. If you have a textual  
serialization (such as XML) of your object, you *can* store it in  
bioentry_qualifier_value. This is what we do in BioPerl with a TagTree  
annotation object that supports a nested hierarchical annotation  
structure needed for lossless representation of some UniProt lines.
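A minimal sketch of the "textual serialization" idea in Python, assuming a nested dict/list annotation and a simplified, hypothetical bioentry_qualifier_value table (tag stored as text instead of a term_id). The XML layout produced by the hypothetical to_xml/from_xml helpers is made up for illustration and is not BioPerl's TagTree format.

```python
import sqlite3
import xml.etree.ElementTree as ET

def to_xml(tag, obj):
    """Serialize nested dicts/lists/scalars into an XML element (hypothetical layout).

    Limitations of this sketch: dict keys must be valid XML tag names, scalars
    come back as strings, and empty containers do not round-trip.
    """
    elem = ET.Element(tag)
    if isinstance(obj, dict):
        for key, value in obj.items():
            elem.append(to_xml(key, value))
    elif isinstance(obj, list):
        for item in obj:
            elem.append(to_xml("item", item))
    else:
        elem.text = str(obj)
    return elem

def from_xml(elem):
    """Inverse of to_xml: leaf -> text, all-<item> children -> list, else dict."""
    children = list(elem)
    if not children:
        return elem.text
    if all(child.tag == "item" for child in children):
        return [from_xml(child) for child in children]
    return {child.tag: from_xml(child) for child in children}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bioentry (bioentry_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bioentry_qualifier_value (bioentry_id INTEGER REFERENCES bioentry,
                                       tag TEXT, value TEXT);
""")
conn.execute("INSERT INTO bioentry VALUES (1, 'example')")

# Hypothetical nested annotation that would not fit a flat tag/value pair.
annotation = {"comment": {"topic": "FUNCTION", "text": "Example text"},
              "keywords": ["a", "b"]}
xml_text = ET.tostring(to_xml("annotation", annotation), encoding="unicode")
conn.execute("INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?)",
             (1, "tag_tree", xml_text))

# Read the text value back and rebuild the nested structure.
stored = conn.execute(
    "SELECT value FROM bioentry_qualifier_value WHERE bioentry_id = ? AND tag = ?",
    (1, "tag_tree")).fetchone()[0]
restored = from_xml(ET.fromstring(stored))
```

The database only ever sees an opaque string; all structure lives in the serializer, which is why querying by individual elements needs something outside the schema.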

Obviously, that won't let you query efficiently on individual
elements of your custom annotation object. But you can build a custom
index (e.g., using Lucene) that does that.
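As a rough illustration of such an index, the sketch below swaps Lucene for a plain in-memory Python dict (an inverted index); it does not use or mimic Lucene's API. It assumes the serialized XML strings have already been read out of bioentry_qualifier_value into a dict keyed by bioentry id, and the build_index name and element paths are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_index(serialized):
    """Map (element path, text) -> set of bioentry ids.

    `serialized` maps bioentry_id -> XML string, e.g. as fetched from
    bioentry_qualifier_value. Only text directly inside an element is indexed.
    """
    index = defaultdict(set)
    for bioentry_id, xml_text in serialized.items():
        root = ET.fromstring(xml_text)
        stack = [(root, root.tag)]  # depth-first walk with slash-joined paths
        while stack:
            elem, path = stack.pop()
            if elem.text and elem.text.strip():
                index[(path, elem.text.strip())].add(bioentry_id)
            for child in elem:
                stack.append((child, f"{path}/{child.tag}"))
    return index

serialized = {
    1: "<annotation><comment><topic>FUNCTION</topic></comment></annotation>",
    2: "<annotation><comment><topic>SUBUNIT</topic></comment></annotation>",
    3: "<annotation><comment><topic>FUNCTION</topic></comment></annotation>",
}
index = build_index(serialized)
print(sorted(index[("annotation/comment/topic", "FUNCTION")]))  # [1, 3]
```

A real deployment would persist the index and keep it in sync with the database; the dict here only shows the lookup shape (element, value) -> bioentries.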

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================