[Biopython-dev] Martel debug_level = 2 bug

Tue Jan 16 20:57:23 EST 2001

There's a small bug with the debug_level = 2 option
in Martel.  When the debug position is within the
first 8 characters it does not show the match text.
Here's the context diff for a patch.

*** Generate.py.orig    Tue Jan 16 20:41:48 2001
--- Generate.py Tue Jan 16 12:32:25 2001
***************
*** 460,466 ****
              s = s[:17] + " ... " + s[-17:]
          self.msg = s
      def __call__(self, text, x, end):
!         print "Match %s (x=%d): %s" % (repr(text[x-8:x+8]), x,
                                              repr(self.msg))
          return x

--- 460,466 ----
              s = s[:17] + " ... " + s[-17:]
          self.msg = s
      def __call__(self, text, x, end):
!         print "Match %s (x=%d): %s" % (repr(text[max(0, x-8):x+8]), x,
                                              repr(self.msg))
          return x

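Pretty basic problem: when x is less than 8, x-8 is negative,
and a negative slice index counts from the end of the string,
so the slice usually comes out empty.  It points out the
usability problem in having negative indices mean something.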
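
As a quick illustration of why the unpatched slice comes up
empty, here is a minimal sketch in current Python syntax.  The
sample text and the helper names window_buggy and window_fixed
are made up for this example and are not part of Martel.

```python
# Hypothetical input text, 35 characters long.
text = "AC   Q63631;\nDE   Example protein.\n"

def window_buggy(text, x):
    # Same slice as the unpatched code: for x < 8 the start index
    # is negative, so it is counted from the end of the string.
    return text[x-8:x+8]

def window_fixed(text, x):
    # Same slice as the patch: clamp the start index at 0.
    return text[max(0, x-8):x+8]

print(repr(window_buggy(text, 3)))   # ''  (this is text[-5:11])
print(repr(window_fixed(text, 3)))   # 'AC   Q63631'
```

Once x is 8 or more the two slices are identical, so the patch
only changes the output near the start of the text.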

Really, this output should be improved to be more descriptive.
I have problems figuring out which character is the current
debug position because things like "\012" (newline) add
characters to the string so it isn't always in the same place.

The current output looks like:
Match 'Q63631;\012' (x=29): '(?P<AC>AC   (?P<a ... +)\\;)*(\\n|\\r\\n?))'

The "Match " is present so there is a well definable piece of
text to key off of, which is important when there is other debug
output.

The second field is the text from 8 characters before the scan
position to 8 characters after it.  My problem is I don't know
where that position is without counting manually, and I don't
remember if it's 8 characters or 7 or what.

The third field is the character position in the string.
It is "(x=29)" for this case, but should probably be "pos=29"
to be more understandable.  (The internal variable name is
x but should also likely be "pos".)

The fourth and last field is the string representation of the
part of the regular expression that matched.  It is at most
40 characters.  If the text is longer than 40 characters,
the first and last 17 characters are used and " ... " is
inserted as a marker for the missing text, as you see above.

I chose 40 characters since that seems to keep the whole
line under 80 columns.
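
For reference, the shortening rule can be sketched roughly as
follows.  Only the slice expression comes from the diff above;
the function name abbreviate, its parameters, and the exact
length test are assumptions made for illustration.

```python
def abbreviate(s, limit=40, keep=17, marker=" ... "):
    # Assumed condition: only shorten strings longer than `limit`.
    if len(s) > limit:
        # Same expression as in Generate.py: 17 + 5 + 17 = 39 chars.
        return s[:keep] + marker + s[-keep:]
    return s

short = abbreviate("[0-9]")
shortened = abbreviate("x" * 100)
print(short, len(shortened))
```

In the diff the shortening is applied to the pattern text
before repr() is called, so escaped characters such as
backslashes can still make the printed field somewhat longer
than 40 characters.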

There are a couple of ways to change this.  I could break the
match text into two parts, to make it easy to find where the
pre and post parts are, as in

Match 'Q636' '31;\012' pos=24 '[0-9]'

or I could use two lines of information, like

Match 'Q63631;\012' pos=29 '(?P<AC>AC   (?P<a ... +)\\;)*(\\n|\\r\\n?))'
              ^

With two lines I could make the repr of the regex longer, which
provides more context to the match, as in

Match 'Q63631;\012' pos=29 '(?P<AC>AC   (?P<abcdef>(\w+))some [0-9]'
              ^            'text (\\n|\\r\\n?))'

This probably is the most useful, although it takes up twice
as much space.
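
A minimal sketch of the two-line idea, in current Python
syntax, might look like the following.  The function name
format_match, the width parameter, and the sample text are all
hypothetical, and this is not the code in Martel.  It escapes
the text with the "unicode_escape" codec instead of repr(), so
a newline prints as "\n" rather than "\012", and the caret
column is computed from the escaped length of the text before
the scan position.

```python
def escape(s):
    # Escape control characters one character at a time, so that
    # escape(a) + escape(b) == escape(a + b).
    return s.encode("unicode_escape").decode("ascii")

def format_match(text, pos, pattern, width=8):
    before = escape(text[max(0, pos - width):pos])
    after = escape(text[pos:pos + width])
    line1 = "Match '%s%s' pos=%d %s" % (before, after, pos,
                                        repr(pattern))
    # The caret goes under the first character at `pos`:
    # skip "Match ", the opening quote, and the escaped prefix.
    offset = len("Match ") + 1 + len(before)
    return line1 + "\n" + " " * offset + "^"

text = "AC   Q63631;\nDE   Example protein.\n"  # hypothetical input
report = format_match(text, 12, "[0-9]")
print(report)
```

Because the caret is placed using the escaped text, it stays
under the right character even when escapes like "\n" widen
the displayed string.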

Brad, you use this debug level a lot.  What are your thoughts on
usability?

                    Andrew