Sal
Peter Hoffmann Director Data Engineering at Blue Yonder. Python Developer, Conference Speaker, Mountaineer

XML data binding with python descriptors

Python descriptors are used to represent attributes of other classes. A descriptor must implement one or more methods of the descriptor protocol:

__get__(self, instance, owner)
__set__(self, instance, value)
__delete__(self, instance)
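
To make the protocol concrete, here is a minimal sketch of a descriptor that implements all three methods. The names Logged and Point are made up for illustration and are not part of the binding code below; the descriptor simply stores the value in the instance's __dict__ and records which protocol method was called.

```python
class Logged(object):
    """Hypothetical descriptor: stores the value on the instance, logs each call."""

    def __init__(self, name):
        self.name = name      # key used in the instance __dict__
        self.calls = []       # record of protocol calls, for illustration only

    def __get__(self, instance, owner=None):
        if instance is None:  # accessed on the class, not on an instance
            return self
        self.calls.append('get')
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        self.calls.append('set')
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        self.calls.append('delete')
        del instance.__dict__[self.name]


class Point(object):
    x = Logged('x')


p = Point()
p.x = 3        # routed through Logged.__set__
value = p.x    # routed through Logged.__get__
del p.x        # routed through Logged.__delete__
```

Because Logged defines __set__, it is a data descriptor and takes precedence over the entry of the same name in the instance __dict__, so every read still goes through __get__.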

There are already some great articles about descriptors by Raymond Hettinger, Mark Summerfield and Marty Alchin. Descriptors have been used since Python 2.2 to implement new-style classes, and the Django ORM uses them to implement the ForeignKey, OneToOneField and ManyToManyField relations.

The following code shows how you can use descriptors and XPath expressions to access XML data structures in a more Pythonic way.

import lxml.etree


class Bind(object):
    def __init__(self, path, converter=None, first=False):
        '''
        path -- xpath to select elements
        converter -- run result through converter
        first -- return only first element instead of a list of elements
        '''
        self.path = path
        if converter is None:
            converter = lambda x: x
        self.converter = converter
        self.first = first

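    def __get__(self, instance, owner=None):
        # accessed on the class itself: return the descriptor
        if instance is None:
            return self
        res = instance._elem.xpath(self.path)
        if self.first:
            # raises IndexError if the xpath matched nothing
            return self.converter(res[0])
        return [self.converter(r) for r in res]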

The Bind descriptor expects the class instance to have an attribute _elem, which is an lxml.etree._Element.
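
Since Bind only ever calls _elem.xpath(), its behaviour can be tried out without any XML at all. The sketch below uses a condensed copy of the descriptor together with FakeElem, a hypothetical stand-in that returns canned results for a given path; Item and its two bindings are likewise invented for illustration.

```python
class Bind(object):
    # condensed copy of the descriptor above
    def __init__(self, path, converter=None, first=False):
        self.path = path
        self.converter = converter if converter is not None else (lambda x: x)
        self.first = first

    def __get__(self, instance, owner=None):
        res = instance._elem.xpath(self.path)
        if self.first:
            return self.converter(res[0])
        return [self.converter(r) for r in res]


class FakeElem(object):
    """Hypothetical stand-in for an lxml element: returns canned xpath results."""

    def __init__(self, results):
        self._results = results

    def xpath(self, path):
        return self._results.get(path, [])


class Item(object):
    count = Bind('@count', converter=int, first=True)  # single converted value
    tags = Bind('Tag/text()')                          # list of all matches

    def __init__(self, elem):
        self._elem = elem


item = Item(FakeElem({'@count': ['2'], 'Tag/text()': ['a', 'b']}))
count = item.count
tags = item.tags
```

With first=True the descriptor returns a single converted value; without it, a list with one entry per match.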

I'm using a sample XML response from the isbndb.com REST API to show the data binding.

<ISBNdb server_time="2010-07-21T15:56:06Z">
    <BookList total_results="1">
        <BookData book_id="programming_collective_intelligence" isbn="0596529325">
            <Title>Programming collective intelligence</Title>
            <AuthorsText>Toby Segaran</AuthorsText>
            <PublisherText publisher_id="oreilly">O'Reilly, 2007.</PublisherText>
        </BookData>
    </BookList>
</ISBNdb>
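
For comparison, this is roughly what reading a few of those values looks like without any binding. The sketch uses the standard library's xml.etree.ElementTree instead of lxml (an assumption, just to keep it free of dependencies), and test_response is assumed to be the response above held in a string.

```python
import xml.etree.ElementTree as ET

# the sample response from above, held in a string
test_response = """<ISBNdb server_time="2010-07-21T15:56:06Z">
    <BookList total_results="1">
        <BookData book_id="programming_collective_intelligence" isbn="0596529325">
            <Title>Programming collective intelligence</Title>
            <AuthorsText>Toby Segaran</AuthorsText>
            <PublisherText publisher_id="oreilly">O'Reilly, 2007.</PublisherText>
        </BookData>
    </BookList>
</ISBNdb>"""

root = ET.fromstring(test_response)

# every value needs its own lookup and its own conversion
total_results = int(root.find('BookList').get('total_results'))
book = root.find('BookList/BookData')
title = book.find('Title').text
publisher_id = book.find('PublisherText').get('publisher_id')
```

The lookups and conversions end up scattered through the calling code, which is what the declarative mapping is meant to avoid.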

And the data mapping:

import dateutil.parser

class Data(object):
    def __init__(self, elem):
        self._elem = elem

class Book(Data):
    # use xpath text() to get text
    title = Bind('Title/text()', first=True)
    # get text via converter
    author = Bind('AuthorsText', converter=lambda x: x.text, first=True)
    publisher = Bind('PublisherText/text()', first=True)
    publisher_id = Bind('PublisherText/@publisher_id', first=True)

class ISBNdb(Data):
    # use the dateutil.parser to convert string to datetime
    server_time = Bind('@server_time', converter=dateutil.parser.parse, first=True)
    # convert result to integer
    total_results = Bind('BookList/@total_results', converter=int, first=True)
    # bind result to custom class which is itself a mapping
    books = Bind('//BookData', Book)

Now let's play with the mapping (test_response holds the XML above as a string):

>>> db = ISBNdb(lxml.etree.fromstring(test_response))
>>> db.server_time
datetime.datetime(2010, 7, 21, 15, 56, 6, tzinfo=tzutc())
>>> db.total_results
1
>>> db.books
[<Book object at 0x9c1780c>]
>>> book = db.books[0]
>>> book.title
'Programming collective intelligence'
>>> book.author
'Toby Segaran'
>>> book.publisher
"O'Reilly, 2007."
>>> book.publisher_id
'oreilly'

The source code for this example is available at gist.github.com/485977.