[Biopython-dev] [Bug 2671] Including GenomeDiagram in the main Biopython distribution

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Dec 9 15:43:19 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2671





------- Comment #21 from lpritc at scri.sari.ac.uk  2008-12-09 10:43 EST -------
(In reply to comment #20)
> (In reply to comment #12)
> > 
> > Bio.Graphics.GenomeDiagram.Utilities
> > ====================================
> > This is a collection of utilities for getting information useful for graph
> > values.  From the docstring,
> > 
> >     o apply_to_window (sequence, window_size, function, step=None)  Apply a
> >                         passed function to fragments of the passed sequence of
> >                         size window_size, with each window separated by the
> >                         passed step.
> 
> This windowing function is rather specific to GenomeDiagram by the nature of
> how it returns the values and their positions.  The handling of the end of the
> sequence is also non-general.  Suppose we put apply_to_window somewhere under
> Bio.Graphics.GenomeDiagram.  It can then be used with any sequence analysis
> function which takes a sequence/string and returns a float, returning the
> scores and window positions as expected by GenomeDiagram for drawing graphical
> tracks.

That seems sensible to me.  I like the generality that would result from it,
and apply_to_window could even be a useful addition to Bio.SeqUtils as a
convenience function in its own right.
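
To make the idea concrete, here is a minimal sketch of such a windowing
helper.  It is not the GenomeDiagram implementation: the name
apply_to_window_sketch, the default step of half a window, reporting each
window at its midpoint, and dropping a trailing partial window are all
assumptions made for illustration.

```python
def apply_to_window_sketch(sequence, window_size, function, step=None):
    """Apply function to successive windows of sequence.

    Returns a list of (position, value) tuples, one per full window.
    Assumptions (not taken from GenomeDiagram): the position is the
    window midpoint, the default step is half a window, and a trailing
    partial window is ignored.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    if step is None:
        step = max(1, window_size // 2)  # assumed default
    results = []
    # Stop at the last start position that still gives a full window.
    for start in range(0, len(sequence) - window_size + 1, step):
        fragment = sequence[start:start + window_size]
        midpoint = start + window_size // 2
        results.append((midpoint, function(fragment)))
    return results


def g_fraction(fragment):
    """Toy scoring function: fraction of G in the fragment (0..1)."""
    return fragment.count("G") / len(fragment)


track_data = apply_to_window_sketch("AAGGCCTT", 4, g_fraction, step=2)
```

Any function that takes a sequence or string and returns a float could be
passed in place of g_fraction.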

[...]

> Because they differ from the existing Bio.SeqUtils code, I think there is a
> case for adding the four non-windowed functions from GenomeDiagram's
> Utilities.py under Bio.SeqUtils.  Perhaps under a sub module like
> Bio.SeqUtils.Nucleotides or Bio.SeqUtils.NucUtils?  The existing GC functions
> in Bio.SeqUtils could be deprecated or at least declared obsolete.

I think that there's value to be had in standardising to a floating-point 0..1
or -1..1 range for some of these kinds of functions, so I would support such a
move on those grounds.
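
As an illustration of the 0..1 convention, a GC content function could look
something like the sketch below.  The name gc_fraction_sketch is
hypothetical, and returning None for an empty sequence is an assumption
rather than agreed behaviour.

```python
def gc_fraction_sketch(sequence):
    """Return GC content as a float in the range 0..1.

    Hypothetical helper for illustration only.  Returns None for an
    empty sequence, where the fraction is undefined (an assumption).
    """
    seq = str(sequence).upper()
    if not seq:
        return None
    gc_count = seq.count("G") + seq.count("C")
    return gc_count / len(seq)
```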

Regarding my GC skew code (and the corresponding AT skew code): the
behaviour when there is no G or C in the sequence is misleading (read: wrong ;) ). 
Strictly, a divide-by-zero error would be correct here, but I just lazily went
for a zero value for ease of drawing, instead of doing something that properly
indicated 'not a number'.  I think that what needs to be done for GenomeDiagram
is to modify the graphing code so that it does something appropriate for NaNs
(however they may be indicated) - this should perhaps be to stop at the
preceding point, and resume at the subsequent point, for line graphs; not to
draw a box for the heat map; and not to draw a bar for the bar chart (not that
this will always be distinguishable from a zero value...).
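
For the line graph case, the "stop at the preceding point, resume at the
subsequent point" behaviour amounts to splitting the data into runs of valid
points and drawing each run separately.  A minimal sketch, assuming the data
is a list of (position, value) tuples and that missing values are marked by
either None or a float NaN; the function names are hypothetical and this is
not GenomeDiagram's drawing code:

```python
import math


def is_missing(value):
    """True if value is None or a float NaN (assumed 'missing' markers)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_line_segments(points):
    """Split (position, value) points into runs with no missing values.

    Each returned run can be drawn as its own line, leaving a gap
    wherever a value was missing.
    """
    segments = []
    current = []
    for position, value in points:
        if is_missing(value):
            # Close the current run, if any, and skip the missing point.
            if current:
                segments.append(current)
                current = []
        else:
            current.append((position, value))
    if current:
        segments.append(current)
    return segments


points = [(0, 0.1), (10, 0.2), (20, None), (30, float("nan")), (40, -0.3)]
segments = split_line_segments(points)
```

The same is_missing test could be used to skip the box in a heat map or the
bar in a bar chart.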

The GenomeDiagram GC/AT skew code also needs to be modified to return None or
some other NaN indicator before its behaviour can be considered correct.
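
A minimal sketch of what the corrected skew calculation might look like,
using the usual (G - C) / (G + C) definition and returning None when the
denominator is zero.  The name gc_skew_sketch and the choice of None as the
'not a number' indicator are assumptions; the AT skew case would be analogous.

```python
def gc_skew_sketch(sequence):
    """Return GC skew, (G - C) / (G + C), as a float in the range -1..1.

    Returns None when the sequence contains no G or C, since the skew
    is then undefined.  Hypothetical helper for illustration only.
    """
    seq = str(sequence).upper()
    g_count = seq.count("G")
    c_count = seq.count("C")
    total = g_count + c_count
    if total == 0:
        return None  # undefined, rather than a misleading 0.0
    return (g_count - c_count) / total
```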

Apologies for propagating those shortcuts - my bad.

