[Biopython-dev] Start/end co-ordinates in SeqRecord objects?

Brad Chapman chapmanb at 50mail.com
Tue May 22 01:25:39 UTC 2012


Peter;

> When describing BLAST results, or FASTA alignments, or
> indeed many other local alignments you typically have a
> (gapped) query sequence and match sequence fragment,
> and the co-ordinates describing which part of the full query
> and matched sequence this is. i.e. You are told the start
> and end of the subsequence (and perhaps strand).
[...]
> One idea for doing this is to introduce a new location
> property to the SeqRecord (defaulting to None), which
> would be a FeatureLocation object normally used for
> SeqFeature objects.

I'm not sure if I understand the representation, but could we handle
this as a standard named SeqFeature within the SeqRecord? This would let
you store the metadata like gap information within the SeqFeature
qualifiers and avoid introducing a new property.
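
To make the SeqFeature suggestion concrete, here is a rough sketch under
one reading of the problem: the record holds the full sequence and a
single feature marks the aligned region. The feature type
"alignment_region" and the "gapped_seq" qualifier are names invented for
illustration, not an existing Biopython convention, and the sequence is
toy data.

```python
from Bio.Seq import Seq
from Bio.SeqFeature import FeatureLocation, SeqFeature
from Bio.SeqRecord import SeqRecord

# Full-length query sequence (toy data).
query = SeqRecord(Seq("ACGTACGTTTGACCAGT"), id="query1")

# Hypothetical convention: one feature of type "alignment_region" marks
# the aligned part, in Python counting (0-based start, exclusive end).
hit_region = SeqFeature(
    FeatureLocation(4, 12, strand=+1),
    type="alignment_region",
    # Hypothetical qualifier holding the gapped form of the fragment.
    qualifiers={"gapped_seq": ["ACGT-TTGA"]},
)
query.features.append(hit_region)

# Recover the ungapped fragment from the stored coordinates.
fragment = hit_region.extract(query.seq)
```

Stored this way, the co-ordinates, strand and gap information travel with
the record through the existing features list, with no new SeqRecord
property.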

> Maybe all we need is a common convention about which
> keys to use in the annotation dictionary, and how to store
> the information (e.g. Python counting, start < end, and
> strand as +1 or -1 if present)?

I'm becoming more of a fan of this kind of convention-based key/value
approach as opposed to specific attributes, but it does seem nice to
re-use existing classes if they hold the same information.
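
As a minimal sketch of the key/value convention, assuming the keys are
called "start", "end" and "strand" (hypothetical names; nothing has been
agreed): SeqRecord.annotations is an ordinary dictionary, so a plain dict
stands in for it here, and check_coords is a made-up helper that enforces
the rules quoted above.

```python
# Stand-in for SeqRecord.annotations, which is an ordinary dict.
# The key names are hypothetical, not an agreed convention.
annotations = {"start": 4, "end": 12, "strand": -1}


def check_coords(ann):
    """Check the quoted convention: Python counting, start < end,
    and strand given as +1 or -1 if present."""
    start, end = ann["start"], ann["end"]
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end, got %r, %r" % (start, end))
    strand = ann.get("strand")  # None means the strand was not recorded
    if strand not in (None, 1, -1):
        raise ValueError("strand must be +1 or -1 if present, got %r" % (strand,))
    return start, end, strand


coords = check_coords(annotations)
```

The downside compared with a FeatureLocation is that every parser and
writer has to repeat this kind of check itself.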

> Thoughts or feedback please? Would a worked example
> help with my explanation?

A worked example might help; I'm not totally sure I grasp all the
subtleties.
Brad


