Python 2's Minidom-XML ParseString is Buggy

2014-03-22 | #python, #solution

On the most basic level that is, since this method will not accept unicode strings actually containing non-ascii-characters. Yippie! Totally understandable as well, since the single most important XML-encoding is UTF-8, which is a unicode representation. Congratulations Mr. Python!

This is especially strange considering parseString documentation says it uses StringIO, which according to its documentation should have no problem with unicodes. oO

So, if you are stupid like me, putting XML-responses from, say the requests-module, directly into parseString (btw: why the fuck is this, unpythonically, written camel cased?), thinking "Hey, great, this is already unicode, how convienient!": Nope, you're screwed. Yes, you really have to encode this again first.

from minidom.xml import parseString

xml_obj = parseString(nice_unicode.encode('utf-8'))

Take that, 21st century! Well, at least that works ...