[Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution
bugzilla-daemon at portal.open-bio.org
Tue Dec 9 15:43:19 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2671
------- Comment #21 from lpritc at scri.sari.ac.uk 2008-12-09 10:43 EST -------
(In reply to comment #20)
> (In reply to comment #12)
> >
> > Bio.Graphics.GenomeDiagram.Utilities
> > ====================================
> > This is a collection of utilities for getting information useful for graph
> > values. From the docstring,
> >
> > o apply_to_window (sequence, window_size, function, step=None) Apply a
> > passed function to fragments of the passed sequence of
> > size window_size, with each window separated by the
> > passed step.
>
> This windowing function is rather specific to GenomeDiagram by the nature of
> how it returns the values and their positions. The handling of the end of the
> sequence is also non-general. Suppose we put apply_to_window somewhere under
> Bio.Graphics.GenomeDiagram. It can then be used with any sequence analysis
> function which takes a sequence/string and returns a float, returning the
> scores and window positions as expected by GenomeDiagram for drawing graphical
> tracks.
That seems sensible to me. I like the generality that would result from it,
and it seems like apply_to_window could even be a useful convenience function
to add to Bio.SeqUtils in its own right.
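
To make the idea concrete, here is a minimal sketch of such a windowing helper. It is not the GenomeDiagram implementation: the default step (non-overlapping windows), the choice to report each window's midpoint as its position, and the decision to skip a trailing partial window are all assumptions made for illustration.

```python
def apply_to_window(sequence, window_size, function, step=None):
    """Apply function to successive windows of sequence.

    Illustrative sketch only, not the GenomeDiagram code.
    Returns a list of (position, value) tuples.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if step is None:
        # Assumption: default to non-overlapping windows.
        step = window_size
    results = []
    # Assumption: only full windows are scored; a trailing partial
    # window at the end of the sequence is ignored.
    for start in range(0, len(sequence) - window_size + 1, step):
        window = sequence[start:start + window_size]
        # Assumption: report the window midpoint as its position.
        midpoint = start + window_size // 2
        results.append((midpoint, function(window)))
    return results
```

Any function that takes a sequence or string and returns a float can be passed in, for example `lambda s: s.count("C") / len(s)`.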
[...]
> Because they differ from the existing Bio.SeqUtils code, I think there is a
> case for adding the four non-windowed functions from GenomeDiagram's
> Utilities.py under Bio.SeqUtils. Perhaps under a sub module like
> Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils? The existing GC functions
> in Bio.SeqUtils could be deprecated or at least declared obsolete.
I think that there's value to be had in standardising to a floating-point 0..1
or -1..1 range for some of these kinds of functions, so I would support such a
move on those grounds.
Regarding my GC skew code (and the corresponding AT skew code): the
behaviour when there is no GC in the sequence is misleading (read: wrong ;) ).
Strictly, a divide-by-zero error would be correct here, but I just lazily went
for a zero value for ease of drawing, instead of doing something that properly
indicated 'not a number'. I think that what needs to be done for GenomeDiagram
is to modify the graphing code so that it does something appropriate for NaNs
(however they may be indicated) - this should perhaps be to stop at the
preceding point, and resume at the subsequent point, for line graphs; not to
draw a box for the heat map; and not to draw a bar for the bar chart (not that
this will always be distinguishable from a zero value...).
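
As a rough sketch of the line-graph case, the graph data could be split into separate runs wherever a value is missing, with each run drawn as its own line. The helper name `split_on_missing` is hypothetical, and using None as the 'not a number' marker is an assumption; none of this is existing GenomeDiagram code.

```python
def split_on_missing(points):
    """Split (position, value) pairs into runs of defined values.

    Hypothetical helper: None marks a missing value. Each returned
    run could be drawn as a separate line segment, so the line stops
    at the preceding point and resumes at the subsequent one.
    """
    segments = []
    current = []
    for position, value in points:
        if value is None:
            # Close the current run, if any, and skip the missing point.
            if current:
                segments.append(current)
                current = []
        else:
            current.append((position, value))
    if current:
        segments.append(current)
    return segments
```

For the heat map and bar chart, the equivalent would simply be to skip any point whose value is None.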
The GenomeDiagram GC/AT skew code also needs to be modified to return None or
some other NaN indicator before its behaviour can be considered correct.
Apologies for propagating those shortcuts - my bad.
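
For illustration, a skew function with the corrected behaviour might look like the sketch below. It assumes the usual (G - C) / (G + C) definition of GC skew, returns None when the sequence contains no G or C, and counts case-insensitively; the name `gc_skew` and these choices are assumptions, not the current GenomeDiagram code.

```python
def gc_skew(sequence):
    """Return (G - C) / (G + C) in the range -1..1, or None.

    Sketch only: None is returned when there is no G or C, rather
    than a misleading zero.
    """
    seq = str(sequence).upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        # Skew is undefined here; signal that instead of returning 0.
        return None
    return (g - c) / float(g + c)
```

An AT skew function would follow the same pattern with A and T counts.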