Peter Hoffmann, Director Data Engineering at Blue Yonder. Python Developer, Conference Speaker, Mountaineer

Python's libxml2 can't parse unicode strings

This is my answer to the Stack Overflow question "Python's libxml2 can't parse unicode strings":

Pass the document to libxml2 as a UTF-8 encoded byte string, not as a unicode object. It should be:

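# -*- coding: utf-8 -*-
import libxml2

# Write the document as a unicode literal, then encode it to a UTF-8 byte
# string that matches the encoding declared in the XML prolog.
DOC = u"""<?xml version="1.0" encoding="UTF-8"?>
<data>
  <something>Bäääh!</something>
</data>
""".encode("UTF-8")

xml_doc = libxml2.parseDoc(DOC)
# ... work with xml_doc ...
xml_doc.freeDoc()  # the libxml2 bindings expect documents to be freed explicitly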

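The .encode("UTF-8") call is needed because libxml2 parses bytes, not unicode objects: it turns the unicode string into its UTF-8 byte representation, which is exactly what the encoding="UTF-8" declaration in the prolog tells the parser to expect.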
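The libxml2 bindings are not always installed, so here is a minimal Python 3 sketch of the same idea that uses the standard library's xml.etree.ElementTree as a stand-in parser. This is a swapped-in parser, not the libxml2 call from the question. It shows the bytes that the encode step produces for "ä", and what happens when the bytes do not match the declared encoding:

```python
# Sketch only: xml.etree.ElementTree stands in for libxml2 here.
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" encoding="UTF-8"?>
<data>
  <something>Bäääh!</something>
</data>
"""

# Encode the unicode text to the byte representation the prolog promises.
doc_bytes = DOC.encode("UTF-8")

# Each "ä" (U+00E4) becomes the two bytes 0xC3 0xA4 in UTF-8.
print("ä".encode("UTF-8"))  # b'\xc3\xa4'

# Parsing the UTF-8 bytes recovers the original text.
root = ET.fromstring(doc_bytes)
print(root.find("something").text)  # Bäääh!

# Encoding with a different codec than the one declared breaks the parse:
# Latin-1 writes "ä" as the single byte 0xE4, which is not valid UTF-8.
try:
    ET.fromstring(DOC.encode("latin-1"))
    latin1_failed = False
except ET.ParseError as exc:
    latin1_failed = True
    print("parse failed:", exc)
```

The failing Latin-1 case is the reason to encode with the same codec that the XML declaration names.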