
Peter Hoffmann

Software Engineer
User Data Analysis

Posted on August 13, 2012
App.net is a Kickstarter-like project to create a real-time social feed. It is built with Python and Django, so it's worth a look. It reached its funding goal of $500,000 with 7,634 backers in one month, 37 hours before the deadline. At the moment there are 2080 active users on App.net. I have started to collect user and following data, which will be updated once a day for the next few weeks. This will allow an in-depth analysis of the growth of a new social network. The data and scripts will be available on
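Because the data is collected once a day, growth can be measured by comparing consecutive snapshots. Here is a minimal sketch of such a comparison. It assumes each snapshot has been reduced to a mapping from username to the list of usernames that user follows; the function name `snapshot_growth` and the users `alice`, `bob` and `carol` are hypothetical and not part of the published scripts.

```python
# Sketch: compare two daily snapshots to measure growth.
# Assumption: a snapshot maps each username to the list of usernames it
# follows. The function name and the users below are hypothetical.


def snapshot_growth(old, new):
    """Return simple growth numbers between two snapshots."""
    # Every follow relation as a (follower, followed) pair.
    old_edges = {(u, f) for u, follows in old.items() for f in follows}
    new_edges = {(u, f) for u, follows in new.items() for f in follows}
    return {
        "users_before": len(old),
        "users_after": len(new),
        "new_users": sorted(set(new) - set(old)),
        "new_follows": len(new_edges - old_edges),
        "unfollows": len(old_edges - new_edges),
    }


day1 = {"alice": ["bob"], "bob": []}
day2 = {"alice": ["bob", "carol"], "bob": ["alice"], "carol": ["alice"]}
print(snapshot_growth(day1, day2))
```

For these two toy snapshots it reports one new user (carol), three new follows and no unfollows.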

Data Analysis

For the data analysis I'll use pandas, the Python Data Analysis Library, and NetworkX. You'll find more about pandas in the excellent book Python for Data Analysis by Wes McKinney, the author of pandas.


There are 2080 users on App.net. The project currently has 7,634 backers, so not all backers have joined yet: 2080 users is only about 27% of the number of backers.

>>> import json
>>> from pandas import DataFrame
>>> data = json.load(open('user.json'))
>>> frame_data = [(k, (len(v['following']), len(v['followers']))) for k,v in data.items()]
>>> frame = DataFrame.from_items(frame_data, orient='index', columns=['following', 'followers'])
>>> print frame
<class 'pandas.core.frame.DataFrame'>
Index: 2080 entries, andru to cinchel
Data columns:
following    2080  non-null values
followers    2080  non-null values
dtypes: int64(2)
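The post does not show the layout of `user.json`. Judging from the code, each key is a username and each value holds lists of usernames under `following` and `followers`. The sketch below assumes exactly that layout, with a made-up three-user sample, and builds the same (following, followers) counts using only the standard library; `degree_table` is a hypothetical helper name. In current pandas `DataFrame.from_items` no longer exists; `DataFrame.from_dict(table, orient='index', columns=['following', 'followers'])` builds the equivalent frame from such a dictionary.

```python
import json

# Hypothetical sample in the assumed user.json layout:
# {username: {"following": [...], "followers": [...]}}
SAMPLE = """
{
  "alice": {"following": ["bob", "carol"], "followers": ["bob"]},
  "bob":   {"following": ["alice"], "followers": ["alice", "carol"]},
  "carol": {"following": ["bob"], "followers": ["alice"]}
}
"""


def degree_table(data):
    """Map each username to (number followed, number of followers)."""
    return {
        name: (len(info["following"]), len(info["followers"]))
        for name, info in data.items()
    }


data = json.loads(SAMPLE)
table = degree_table(data)
for name in sorted(table):
    following, followers = table[name]
    print(name, following, followers)
```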


>>> frame['following'].describe()
count    2080.000000
mean       12.196154
std        25.445203
min         0.000000
25%         2.000000
50%         6.000000
75%        13.000000
max       499.000000

On average, a user follows 12 other users. At least 25% of users (520) follow 2 or fewer other users, and at least half (1040) follow 6 or fewer. User kuhcoon follows the most other users: 499.
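The quartiles from `describe()` are easy to misread: a 25% value of 2 means that at least a quarter of the users have a count of 2 or less, and ties can push that share higher. The sketch below reproduces the summary with the standard library (Python 3.8+ for `statistics.quantiles`) on nine hypothetical following counts. It assumes that `method="inclusive"` matches the linear interpolation pandas uses by default for percentiles; `describe` and `share_at_most` are hypothetical helper names.

```python
import statistics


def describe(values):
    """Stdlib approximation of count/mean/std/min/quartiles/max.

    method="inclusive" interpolates linearly between sorted data points,
    which should match pandas' default percentile interpolation.
    statistics.stdev is the sample standard deviation (ddof=1), as in pandas.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
        "max": max(values),
    }


def share_at_most(values, threshold):
    """Fraction of values that are <= threshold."""
    return sum(1 for v in values if v <= threshold) / len(values)


# Hypothetical following counts for nine users.
following = [0, 1, 2, 2, 3, 6, 6, 13, 40]
summary = describe(following)
print(summary["25%"], summary["50%"], summary["75%"])
print(share_at_most(following, summary["25%"]))
```

Here the 25% quartile is 2, yet four of the nine users (about 44%) follow 2 or fewer, because two users share the value 2.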

>>> frame.sort('following', ascending=False)[:10]
               following  followers
kuhcoon              499         45
carrie_kane          335         81
ryanmcmillan         330         69
chb                  319         60
icraigt              259         90
teawithcarl          228        131
bjarteminde          226         66
ricki                219         48
jeffio               199         24
thomasmarzano        188         29

>>> frame.sort('following', ascending=False)[:10].plot(kind='barh', rot=0)

[Figure: Appnet user following, top 10]
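`DataFrame.sort` has since been removed from pandas; in current versions the same top-10 selection is `frame.sort_values('following', ascending=False).head(10)`, or simply `frame.nlargest(10, 'following')`. Without pandas, `heapq.nlargest` does the same job. The sketch below uses a few rows copied from the tables in this post; `top_n` and `COLUMNS` are hypothetical names.

```python
import heapq

# A few (following, followers) rows copied from the tables in this post.
rows = {
    "kuhcoon": (499, 45),
    "carrie_kane": (335, 81),
    "teawithcarl": (228, 131),
    "dalton": (152, 1329),
    "gruber": (8, 958),
}

COLUMNS = ("following", "followers")


def top_n(table, column, n=10):
    """Return the n (username, counts) items with the largest value in
    `column`, largest first (hypothetical helper)."""
    idx = COLUMNS.index(column)
    return heapq.nlargest(n, table.items(), key=lambda item: item[1][idx])


for name, (following, followers) in top_n(rows, "following", n=3):
    print(name, following, followers)
```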


>>> frame['followers'].describe()
count    2080.000000
mean       12.055769
std        47.972182
min         0.000000
25%         0.000000
50%         2.000000
75%         7.000000
max      1329.000000

On average, a user is followed by 12 other users, but the distribution is much more skewed than for following (standard deviation 48 versus 25). At least 25% of users (520) have no followers at all, and at least half (1040) are followed by 2 or fewer users. User dalton has the most followers: 1329.

>>> frame.sort('followers', ascending=False)[:10]
                following  followers
dalton                152       1329
gruber                  8        958
siracusa                9        610
joshuatopolsky          2        364
berg                   33        336
adn                     2        327
stevestreza            86        321
chockenberry            0        287
dan                     5        280
marco                   0        267

>>> frame.sort('followers', ascending=False)[:10].plot(kind='barh', rot=0)

[Figure: Appnet user followers, top 10]

PageRank

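"PageRank computes a ranking of the nodes in the graph G based on the structure of the incoming links. It was originally designed as an algorithm to rank web pages." (NetworkX 1.7 documentation)

Here the links are follows, so a user's rank depends both on how many users follow them and on how highly ranked those followers are.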
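To see what `nx.pagerank` computes, here is a minimal power-iteration sketch in plain Python. It is a simplified illustration, not the NetworkX implementation: every user starts with the same rank and in each step passes a damped share of it, split evenly, to the users they follow, so a follow from a highly ranked user who follows few others is worth more than a follow from an obscure one. The damping factor of 0.85 matches NetworkX's default `alpha`; the function name and the users in the example are hypothetical.

```python
def pagerank(edges, alpha=0.85, tol=1e-9, max_iter=500):
    """Plain power-iteration PageRank (a sketch, not NetworkX's code).

    edges: (u, v) pairs meaning "u follows v"; rank flows from u to v.
    alpha: damping factor (0.85 is also the NetworkX default).
    Users who follow nobody spread their rank evenly over all users.
    """
    edges = set(edges)  # ignore duplicate follows, like a DiGraph would
    nodes = sorted({user for edge in edges for user in edge})
    out_links = {user: [] for user in nodes}
    for u, v in edges:
        out_links[u].append(v)
    n = len(nodes)
    rank = {user: 1.0 / n for user in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if not out_links[u])
        base = (1.0 - alpha) / n + alpha * dangling / n
        new_rank = {user: base for user in nodes}
        for u in nodes:
            if out_links[u]:
                share = alpha * rank[u] / len(out_links[u])
                for v in out_links[u]:
                    new_rank[v] += share
        converged = sum(abs(new_rank[u] - rank[u]) for u in nodes) < n * tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical follows: three users follow carol, carol follows alice.
follows = [("alice", "carol"), ("bob", "carol"), ("dave", "carol"), ("carol", "alice")]
ranks = pagerank(follows)
for user, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(user, round(score, 4))
```

alice has only one follower, but because that follower is the top-ranked carol, alice ends up far ahead of bob and dave, who have none.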

>>> import networkx as nx
>>> import operator
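>>> g = nx.DiGraph()
>>> for k, v in data.items():  # edge k -> f for every user f that k follows
...     for f in v['following']:
...         g.add_edge(k, f)
...
>>> rank = nx.pagerank(g).items()
>>> rank.sort(key=operator.itemgetter(1), reverse=True)  # highest rank first
>>> for u, v in rank[:10]:
...     print u, v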
dalton         0.0834293744516
gruber         0.0359611460706
siracusa       0.0203027976961
joshuatopolsky 0.013701136992
chockenberry   0.0136562239226
berg           0.0118944041028
adn            0.0115074654949
leolaporte     0.0114818396115
stevestreza    0.0109684017167
jsnell         0.00999449983332

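At the moment the PageRank list is not much different from the followers list: the first four users are the same, eight of the ten names appear in both lists, and only leolaporte and jsnell are new (replacing dan and marco). The comparison should become more interesting as the network grows.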
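The overlap between the two top-10 lists can be checked directly. This sketch copies both lists exactly as printed in the followers table and the PageRank output, then reports the shared users, the users that appear in only one list, and how many positions the two rankings agree on from the top; `compare_rankings` is a hypothetical helper.

```python
# Top-10 lists copied from the output in this post.
by_followers = ["dalton", "gruber", "siracusa", "joshuatopolsky", "berg",
                "adn", "stevestreza", "chockenberry", "dan", "marco"]
by_pagerank = ["dalton", "gruber", "siracusa", "joshuatopolsky", "chockenberry",
               "berg", "adn", "leolaporte", "stevestreza", "jsnell"]


def compare_rankings(a, b):
    """Compare two ranked lists of usernames."""
    # Count how many leading positions hold the same user.
    same_prefix = 0
    for x, y in zip(a, b):
        if x != y:
            break
        same_prefix += 1
    return {
        "shared": [u for u in a if u in b],
        "only_in_a": [u for u in a if u not in b],
        "only_in_b": [u for u in b if u not in a],
        "same_prefix": same_prefix,
    }


result = compare_rankings(by_followers, by_pagerank)
print(len(result["shared"]), result["only_in_b"], result["same_prefix"])
```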


Cliques

"A clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge." (Clique (graph theory), Wikipedia)
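The definition translates directly into a check over all pairs of vertices. This sketch assumes an undirected graph stored as a mapping from each vertex to the set of its neighbours and uses a made-up four-vertex graph; `is_clique` is a hypothetical helper name.

```python
from itertools import combinations


def is_clique(adjacency, vertices):
    """True if every two of `vertices` are joined by an edge.

    adjacency maps each vertex to the set of its neighbours; the graph is
    undirected, so u in adjacency[v] exactly when v in adjacency[u].
    """
    return all(v in adjacency.get(u, set()) for u, v in combinations(vertices, 2))


adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(is_clique(adjacency, ["a", "b", "c"]))  # True
print(is_clique(adjacency, ["a", "c", "d"]))  # False: a and d are not adjacent
```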

nx.find_cliques works on an undirected graph, so an edge between two users is only added if they follow each other. The largest cliques, groups in which every user follows and is followed by every other user in the group, have 8 members. There are 5 different cliques of this size, made up mostly of the same people: carrie_kane, icraigt, lucypepper, ricki and tanja appear in all five. As expected, the social network is not very diverse yet.
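The rule for building the undirected graph, an edge only where two users follow each other, can also be written without NetworkX. This sketch applies the same criterion as the clique code in this post (the other user appears in both the `following` and the `followers` list) to a hypothetical three-user sample in the assumed `user.json` layout; `mutual_follow_edges` is a hypothetical helper.

```python
def mutual_follow_edges(data):
    """Undirected edges (u, v), u < v, for users who follow each other.

    Like the post's code, it keeps `other` when it appears in both the
    "following" and the "followers" list of `user`.
    """
    edges = set()
    for user, info in data.items():
        for other in set(info["following"]) & set(info["followers"]):
            edges.add(tuple(sorted((user, other))))
    return sorted(edges)


# Hypothetical sample: alice <-> bob and bob <-> carol are mutual,
# alice -> carol is one-way.
data = {
    "alice": {"following": ["bob", "carol"], "followers": ["bob"]},
    "bob": {"following": ["alice", "carol"], "followers": ["alice", "carol"]},
    "carol": {"following": ["bob"], "followers": ["alice", "bob"]},
}
print(mutual_follow_edges(data))  # [('alice', 'bob'), ('bob', 'carol')]
```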

>>> g = nx.Graph()
>>> for k, v in data.items():  # edge only if k and f follow each other
...     for f in [x for x in v['following'] if x in v['followers']]:
...         g.add_edge(k, f)
...
>>> cliques = list(nx.find_cliques(g))
>>> cliques.sort(key=len, reverse=True)  # largest cliques first
>>> for c in cliques[:10]:
...     print len(c), c
8 [u'ernie', u'lucypepper', u'zen', u'carrie_kane', u'icraigt', u'ricki', u'topgold', u'tanja']
8 [u'ernie', u'lucypepper', u'warriorgrrl', u'icraigt', u'topgold', u'carrie_kane', u'tanja', u'ricki']
8 [u'steve', u'icraigt', u'carrie_kane', u'sneakyness', u'ricki', u'katcaverly', u'lucypepper', u'tanja']
8 [u'ricki', u'carrie_kane', u'icraigt', u'lucypepper', u'katcaverly', u'sneakyness', u'tanja', u'warriorgrrl']
8 [u'ricki', u'carrie_kane', u'icraigt', u'lucypepper', u'katcaverly', u'sneakyness', u'tanja', u'zen']
7 [u'ernie', u'lucypepper', u'steve', u'carrie_kane', u'icraigt', u'tanja', u'ricki']
7 [u'steve', u'icraigt', u'carrie_kane', u'conor', u'anu', u'rene', u'timhanlon']
7 [u'steve', u'icraigt', u'carrie_kane', u'anu', u'lucypepper', u'katcaverly', u'ricki']
6 [u'ernie', u'lucypepper', u'warriorgrrl', u'icraigt', u'topgold', u'lucyinglis']
6 [u'ernie', u'lucypepper', u'ward', u'icraigt', u'carrie_kane', u'topgold']

Update: There is a follow-up article available with updated numbers.