[Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author names

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Jul 18 20:30:36 EDT 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2448


mmokrejs at ribosome.natur.cuni.cz changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mmokrejs at ribosome.natur.cuni
                   |                            |.cz




------- Comment #4 from mmokrejs at ribosome.natur.cuni.cz  2008-07-18 20:30 EST -------
(In reply to comment #2)

> {u'LastName': u'Mart\xednez-Oyanedel', u'Initials': 'J', u'ForeName':
> u'Jos\xe9'},

If I remember right, this is just the escaped representation you get when you
call repr() on unicode objects (printing a dict uses repr() on its values);
\xed and \xe9 are the code points of í and é, so the data itself is intact.
Calling code could convert it for display, but it has to do so explicitly. In
my programs I avoid unicode objects, stick to utf8-encoded byte strings, and
pass those back to the user. But as I say, you can't rely on print, str() or
repr() for such values, because in Python 2 they either escape non-ASCII
characters or go through the ASCII codec and fail. That should be one of the
things to be fixed in python-3.
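
To illustrate, here is a minimal sketch written for Python 3, where ascii()
produces the kind of escaped form quoted above. The dict is copied from
comment #2; the variable names are only for illustration.

```python
# Python 3 sketch; in Python 2 the same escapes come from repr() on u'' strings.
author = {'LastName': 'Mart\xednez-Oyanedel', 'Initials': 'J',
          'ForeName': 'Jos\xe9'}

# ascii() escapes non-ASCII characters, much like Python 2's repr() did.
escaped = ascii(author['ForeName'])

# The value itself is intact: it encodes to UTF-8 bytes and decodes back
# to the same text.
utf8_bytes = author['ForeName'].encode('utf8')
round_tripped = utf8_bytes.decode('utf8')

print(escaped)
print(round_tripped == author['ForeName'])
```

The escapes only affect how the value is displayed, not what is stored.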


So in summary: when I raise an exception, these values always get printed in
the above escaped form, but that is the only such case. I believe that as long
as you return the values rather than print them, the current code is OK. I
haven't tested this, though.

Grepping the related bits from my programs, I use e.g.:

self._connection = connect(unix_socket=unix_socket, db=dbname, user=username,
                           passwd=password, init_command='SET AUTOCOMMIT=0',
                           charset='utf8', use_unicode=False)

if self._connection.character_set_name() != 'utf8':
    # test whether we really have a utf8 connection
    raise RuntimeError("Connection to mysql not in utf8 mode: %s" %
                       self._connection.character_set_name())
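
The same check can be wrapped in a small function, roughly like this. This is
only a sketch: ensure_utf8 and FakeConnection are hypothetical names, and the
fake class merely stands in for a real MySQLdb connection so the example runs
without a database.

```python
def ensure_utf8(connection):
    """Raise RuntimeError unless the connection reports a utf8 charset."""
    charset = connection.character_set_name()
    if charset != 'utf8':
        raise RuntimeError("Connection to mysql not in utf8 mode: %s" % charset)
    return connection


class FakeConnection(object):
    """Hypothetical stand-in for a MySQLdb connection (illustration only)."""

    def __init__(self, charset):
        self._charset = charset

    def character_set_name(self):
        return self._charset


ensure_utf8(FakeConnection('utf8'))  # passes silently

try:
    ensure_utf8(FakeConnection('latin1'))
except RuntimeError as err:
    print(err)
```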

value = unicode(value).encode('utf8')
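
Note that in Python 2, unicode(value) decodes byte strings with the ASCII
codec, so the line above raises UnicodeDecodeError if value is already a utf8
byte string with accented characters. A rough Python 3 sketch of the same
normalisation, with to_utf8 as a hypothetical helper name and the assumption
that any incoming bytes are meant to be utf8:

```python
def to_utf8(value):
    """Return value as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        # Assumed to be UTF-8 already; decoding validates that assumption
        # and raises UnicodeDecodeError if it does not hold.
        value.decode('utf8')
        return value
    # Text and other objects: convert to text first, then encode.
    return str(value).encode('utf8')


print(to_utf8('Jos\xe9') == b'Jos\xc3\xa9')
```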

http://evanjones.ca/python-utf8.html
http://www.idealliance.org/proceedings/xtech05/papers/02-08-01/
http://www.amk.ca/python/howto/unicode
http://diveintopython.org/xml_processing/unicode.html
http://www.jorendorff.com/articles/unicode/python.html


from elementtree.ElementTree import parse, Element, SubElement, ElementTree
# use 'utf8' and not 'utf-8' for ElementTree.write() !!!

# We must supply unicode values to ElementTree and not just utf8 encoded
# strings.
_value_node.text = _value.decode('utf8')
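
A self-contained sketch of that last step, using xml.etree.ElementTree (the
standard-library version of elementtree) under Python 3. The element names and
the raw value are made up for illustration.

```python
import xml.etree.ElementTree as ET

raw = b'Jos\xc3\xa9'  # utf8 bytes, e.g. as returned by the database layer

root = ET.Element('Author')
fore_name = ET.SubElement(root, 'ForeName')
# Give ElementTree decoded text, not the utf8 encoded bytes.
fore_name.text = raw.decode('utf8')

# Serialise to utf8 bytes and parse them back to check the round trip.
xml_bytes = ET.tostring(root, encoding='utf-8')
parsed = ET.fromstring(xml_bytes)
print(parsed.find('ForeName').text == 'Jos\xe9')
```

Decoding at the boundary keeps the tree in text form and leaves the choice of
output encoding to the serialisation step.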



