
Peter Hoffmann

Software Engineer

Python's libxml2 can't parse unicode strings

Posted on October 14, 2009
#stackoverflow #python

This is my answer to the Stack Overflow question "Python's libxml2 can't parse unicode strings":

The code should be:

# -*- coding: utf-8 -*-
import libxml2

DOC = u"""<?xml version="1.0" encoding="UTF-8"?>
<data>
  <something>Bäääh!</something>
</data>
""".encode("UTF-8")

xml_doc = libxml2.parseDoc(DOC)

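The .encode("UTF-8") call is needed to turn the unicode string into its byte-string representation in the UTF-8 encoding. The bytes then match the encoding="UTF-8" declared in the XML header, and a byte string is what parseDoc expects.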
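To see what the encode step does, here is a minimal sketch in Python 3. It uses the standard library's xml.etree.ElementTree as a stand-in parser, on the assumption that the libxml2 bindings may not be installed; the names text, raw and value are only illustrative and not part of the original answer.

```python
# -*- coding: utf-8 -*-
# Stand-in parser from the standard library (assumption: libxml2 may not be available).
import xml.etree.ElementTree as ET

# The document as a unicode (text) string.
text = u"""<?xml version="1.0" encoding="UTF-8"?>
<data>
  <something>Bäääh!</something>
</data>
"""

# Encoding yields a byte string; every "ä" becomes two bytes in UTF-8,
# so the byte string is longer than the text it came from.
raw = text.encode("UTF-8")

# The parser reads the bytes and honours the encoding declared in the header.
root = ET.fromstring(raw)
value = root.find("something").text
```

The byte string is what gets handed to the parser, and decoding it with the same encoding gives back the original text.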