[Biopython-dev] Quality scores (and per-letter-annotation) in a SeqRecord?

Sun Feb 22 21:27:42 UTC 2009

Hi all;

> I was actually thinking about adding a per_letter_annotations (or
> using Brad's suggested name per_symbol_annotations) dictionary which
> could hold phred qualities, solexa qualities, secondary structure,
> atomic coordinates - any python sequence (e.g. string, list or tuple)
> with a length matching the sequence.  This would cover all the use
> cases I have come up with, and we can implement SeqRecord slicing
> which would also slice everything in the per_letter_annotations
> dictionary.
[...]
> I'm not sure if it's exactly what Leighton has in mind, but it seems
> more complicated to have to do
> my_record.per_symbol_annotations["quality"]["phred"] rather than just
> my_record.per_symbol_annotations["quality_phred"].

I agree with you here -- the double dictionary I proposed is ugly
and doesn't add anything useful. I'm +1 on exactly what you wrote
here, and am not picky about the naming.
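
To make the flat-dictionary idea concrete, here is a minimal sketch of a
record whose slicing also slices every per-letter annotation. The class
name AnnotatedRecord and the example keys are made up for illustration;
this is not the SeqRecord API, just one way the behaviour described
above could look.

```python
class AnnotatedRecord(object):
    """Hypothetical stand-in for a SeqRecord with per-letter annotations."""

    def __init__(self, seq, per_letter_annotations=None):
        self.seq = seq
        self.per_letter_annotations = {}
        for key, values in (per_letter_annotations or {}).items():
            # Every annotation must have one entry per letter of the sequence.
            if len(values) != len(seq):
                raise ValueError("%r has length %d, expected %d"
                                 % (key, len(values), len(seq)))
            self.per_letter_annotations[key] = values

    def __getitem__(self, index):
        if not isinstance(index, slice):
            # A single position just returns the letter.
            return self.seq[index]
        # Apply the same slice to the sequence and to each annotation.
        sliced = dict((key, values[index])
                      for key, values in self.per_letter_annotations.items())
        return AnnotatedRecord(self.seq[index], sliced)


record = AnnotatedRecord(
    "ACGTAC",
    {"quality_phred": [10, 20, 50, 40, 30, 20],
     "secondary_structure": "HHHEEE"},
)
sub = record[1:4]
```

Any Python sequence works as a value (string, list or tuple), since
slicing is all that is asked of it. The flat key "quality_phred" keeps
lookups to a single dictionary access.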

> The only catch is the current tables only let us store
> strings.  We could store each per-letter-annotation entry (e.g. a
> single quality score) as a separate table entry (where the rank tells
> us the correct order), but bundling them all into a single long table
> row might be more efficient.  In the case of PHRED or Solexa scores,
> we could even use the FASTQ encoding (but a string "10, 20, 50, ..."
> might be more sensible).  This would require some co-ordination with
> the other Bio* projects, probably on the BioSQL mailing list.

My vote is for bundling them together into a single table row, using
json to stringify the lists. It's a nice compact representation and
is well supported in most languages. Python 2.6 bundles simplejson
in the standard library as the json module, so it's just a matter of doing:

import json

# list -> string for storage, then string -> list on retrieval
jsonified_list = json.dumps(the_quality_list)
the_quality_list = json.loads(jsonified_list)

Since I've been doing more JavaScript and Python, I appreciate not
munging lists into strings with obscure separators and really like
json. As a bonus, a json list of numbers looks just like a Python list.
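
As a rough sketch of the single-row approach, the snippet below stores
one list of scores as a json string in a text-only column and reads it
back. The table and column names (letter_annotation, record_id, name,
value) are invented for this example and are not part of the BioSQL
schema; an in-memory sqlite database is used only to keep it
self-contained.

```python
import json
import sqlite3

# Hypothetical table: one row per (record, annotation name), value as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE letter_annotation (record_id INTEGER, name TEXT, value TEXT)"
)

quality = [10, 20, 50, 40]

# Bundle the whole list into one string-valued row.
conn.execute(
    "INSERT INTO letter_annotation VALUES (?, ?, ?)",
    (1, "quality_phred", json.dumps(quality)),
)

# Read the row back and turn the string into a list again.
row = conn.execute(
    "SELECT value FROM letter_annotation WHERE record_id = ? AND name = ?",
    (1, "quality_phred"),
).fetchone()
restored = json.loads(row[0])
```

The stored string here is "[10, 20, 50, 40]", so it stays readable in
the database and any language with a json parser can recover the list
without knowing a custom separator.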

Brad