Sal
Peter Hoffmann Director Data Engineering at Blue Yonder. Python Developer, Conference Speaker, Mountaineer

Filter out HTML tags and resolve entities in python

This is my answer to the Stack Overflow question "Filter out HTML tags and resolve entities in python":

Use lxml, which in my view is the best XML/HTML library for Python.

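import lxml.html

# Parse the HTML string into an element tree
t = lxml.html.fromstring("...")
# Get the text with tags stripped and entities resolved
print(t.text_content())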
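
As a concrete illustration, here is a minimal sketch with a made-up input string (the sample HTML and the variable names are my own assumptions, not part of the original question). It shows that text_content() drops the tags and resolves the entities in one step:

```python
import lxml.html

# Hypothetical sample input: two tags and two entities
html = "<p>Tom &amp; Jerry: <b>1 &lt; 2</b></p>"

# fromstring() parses the fragment and returns its root element
tree = lxml.html.fromstring(html)

# text_content() concatenates all text nodes; entities are already resolved
text = tree.text_content()
print(text)  # Tom & Jerry: 1 < 2
```

Note that text_content() only returns the text nodes, so any whitespace between block-level elements in the source is kept exactly as it appears.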

If you just want to sanitize the HTML instead of stripping it, look at the lxml.html.clean module (since lxml 5.2 it is distributed separately as the lxml_html_clean package).
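
A rough sketch of what sanitizing could look like, assuming the Cleaner class from that module and a made-up input (the sample HTML is hypothetical). The import fallback is an assumption on my side to cover both the separate lxml_html_clean package and older lxml releases that still bundle the module:

```python
try:
    # Newer setups: the cleaner is installed as its own package
    from lxml_html_clean import Cleaner
except ImportError:
    # Older lxml releases ship the cleaner inside lxml itself
    from lxml.html.clean import Cleaner

# Hypothetical untrusted input with a script tag and an inline event handler
dirty = '<div><script>alert(1)</script><p onclick="evil()">Hi</p></div>'

# scripts=True removes <script> elements, javascript=True removes
# on* attributes and javascript: links; both are also the defaults
cleaner = Cleaner(scripts=True, javascript=True)
cleaned = cleaner.clean_html(dirty)
print(cleaned)
```

Unlike text_content(), this keeps the harmless markup and only removes what the Cleaner options mark as unsafe.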