After looking around (see links at bottom) I've come to the conclusion that,
for now, the simplest, yet mostly successful, way to safely include input from
various places is with xml.sax.saxutils and encode('ascii',
'xmlcharrefreplace'), like so:
>>> from xml.sax import saxutils
>>> xml_snippet = u'ö < på > & бб лн'
>>> xml_snippet
u'\xf6 < p\xe5 > & \u0431\u0431 \u043b\u043d'
>>> saxutils.escape(xml_snippet)
u'\xf6 < p\xe5 > & \u0431\u0431 \u043b\u043d'
>>> saxutils.escape(xml_snippet).encode('ascii', 'xmlcharrefreplace')
'ö < på > & бб лн'
The conversion of > to > is not required, but I prefer it, so I'm glad
it's the default in saxutils. Also, we could encode to iso-8859-1 instead of
ascii, but, as Mr. Lundh points out in [3], it's still safer to use ascii.
links
[1] http://www.xml.com/pub/a/2005/06/15/py-xml.html [2] http://www.diveintopython.org/xml_processing/unicode.html [3] http://online.effbot.org/2003_10_01_archive.htm#20031016 [4] http://boodebr.org/main/python/all-about-python-and-unicode