Microsoft Sho Word Histogram and the Python Standard Library
#python · Peter Hoffmann
Through a blog post by John D. Cook on Planet Python I became aware of the Microsoft Sho project for data analysis and scientific computing. I haven't installed it yet, but it looks promising, and I always love to see progress and usage of IronPython on Windows.
Here's the "Computing a Word Histogram" example:
>>> fp = System.IO.File.ReadAllText("./declarationofindependence.txt")
>>> table = System.Collections.Hashtable()
>>> for word in fp.split():
...     if table.ContainsKey(word):
...         table[word] += 1
...     else:
...         table[word] = 1
...
>>> pairs = zip(list(table.Keys), list(table.Values))
>>> pairs.sort(key=lambda pair: pair[1], reverse=True)
>>> bar([elt[0] for elt in pairs[0:10]], [elt[1] for elt in pairs[0:10]])
I've used a hashtable for counting in the past too, but the Counter data structure from the Python standard library (added in 2.7) is much better suited to this kind of task:
>>> from collections import Counter
>>> table = Counter()
>>> table.update(fp.split())
>>> pairs = table.most_common(10)
It's shorter and more readable.
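To try this without Sho or the text file, here is a minimal, self-contained sketch of the same counting on a short inline string. The sample sentence and the names `text`, `counts`, and `top` are made up for illustration.

```python
from collections import Counter

# Hypothetical sample text standing in for the contents of the file.
text = "the quick brown fox jumps over the lazy dog and the cat"

# Counter accepts any iterable and tallies its elements.
counts = Counter(text.split())

# most_common(n) returns the n most frequent (word, count) pairs,
# ordered from most to least frequent.
top = counts.most_common(1)
print(top)

# A word that never occurred counts as 0 instead of raising KeyError.
print(counts["elephant"])
```

Because missing keys default to zero, the `if ... ContainsKey ... else` branch from the hashtable version is not needed at all.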
For sorting a list of tuples (or lists) by a specific element, I prefer operator.itemgetter over a lambda expression:
>>> from operator import itemgetter
>>> lst = [('orange', 5), ('banana', 7), ('apple', 2)]
>>> lst.sort(key=itemgetter(1))
>>> lst
[('apple', 2), ('orange', 5), ('banana', 7)]
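Two variations on the same idea, as a rough sketch: `sorted` with `reverse=True` gives a descending order without modifying the list, and `itemgetter` with two indices sorts by several fields at once. The data, including the extra `('cherry', 5)` entry, and the variable names are invented for illustration.

```python
from operator import itemgetter

# Hypothetical data; 'cherry' is added to create a tie on the count.
lst = [('orange', 5), ('banana', 7), ('apple', 2), ('cherry', 5)]

# sorted() returns a new list; reverse=True puts the highest count first.
by_count_desc = sorted(lst, key=itemgetter(1), reverse=True)

# itemgetter(1, 0) builds a (count, name) key: sort by count, then by name.
by_count_then_name = sorted(lst, key=itemgetter(1, 0))

# The equivalent lambda form, for comparison.
by_lambda = sorted(lst, key=lambda item: (item[1], item[0]))

print(by_count_desc)
print(by_count_then_name)
```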
The bottom line is that Python has a great standard library, and it is worth getting to know it well.