App.net is a Kickstarter-like project to create a real-time social feed. It is built with Python and Django, so it's worth a look.
App.net reached its funding goal of $500,000 with 7,634 backers within one month, 37 hours before the deadline. At the moment there are 2080 active users on alpha.app.net. I have started to collect user and following data. The data will be updated once a day for the next few weeks, which will allow an in-depth analysis of the growth of a new social network. The data and scripts will be available on github.com/hoffmann/appnetstats.
For the data analysis I'll use the Python Data Analysis Library pandas and networkx. You'll find more about pandas in the excellent book Python for Data Analysis by pandas author Wes McKinney.
There are 2080 users on alpha.app.net. The app.net project currently has 7,634 backers, so not all backers are on alpha.app.net yet.
>>> import json
>>> from pandas import DataFrame
>>> data = json.load(open('user.json'))
>>> frame_data = [(k, (len(v['following']), len(v['followers']))) for k,v in data.items()]
>>> frame = DataFrame.from_items(frame_data, orient='index', columns=['following', 'followers'])
>>> print frame
<class 'pandas.core.frame.DataFrame'>
Index: 2080 entries, andru to cinchel
Data columns:
following 2080 non-null values
followers 2080 non-null values
dtypes: int64(2)
>>> frame['following'].describe()
count 2080.000000
mean 12.196154
std 25.445203
min 0.000000
25% 2.000000
50% 6.000000
75% 13.000000
max 499.000000
On average, a user follows 12 other users. The lower 25% (520 users) follow 2 or fewer other users, and 50% (1040 users) follow 6 or fewer. User kuhcoon follows the most other users, with 499.
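To make the quartile rows of the describe() output concrete, here is a minimal sketch that computes the same kind of summary with only the standard library (Python 3.8+). The `data` dictionary is a made-up miniature with the same shape as user.json, and the `summarize` helper is a hypothetical name, not part of pandas.

```python
import statistics

# Hypothetical miniature of user.json: username -> following/followers lists.
data = {
    "alice": {"following": ["bob", "carol", "dave"], "followers": ["bob"]},
    "bob": {"following": ["alice"], "followers": ["alice", "carol"]},
    "carol": {"following": ["bob"], "followers": ["alice"]},
    "dave": {"following": [], "followers": ["alice"]},
    "erin": {"following": [], "followers": []},
}


def summarize(counts):
    """Return count, mean, min, quartiles and max for a list of numbers."""
    ordered = sorted(counts)
    # method="inclusive" interpolates linearly between the sorted values.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "count": len(ordered),
        "mean": statistics.mean(ordered),
        "min": ordered[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": ordered[-1],
    }


following = [len(info["following"]) for info in data.values()]
summary = summarize(following)
print(summary)
```

The "50%" entry is the median: half of the users follow that many accounts or fewer, which is how the 6 in the real data should be read.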
>>> frame.sort('following', ascending=False)[:10]
following followers
kuhcoon 499 45
carrie_kane 335 81
ryanmcmillan 330 69
chb 319 60
icraigt 259 90
teawithcarl 228 131
bjarteminde 226 66
ricki 219 48
jeffio 199 24
thomasmarzano 188 29
>>> frame.sort('following', ascending=False)[:10].plot(kind='barh', rot=0)
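The session above was recorded with an old pandas release under Python 2. `DataFrame.from_items` and `DataFrame.sort` have since been removed from pandas, so on a current installation the same steps could look roughly like the sketch below. The `data` dictionary is a made-up stand-in for user.json, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for json.load(open('user.json')).
data = {
    "alice": {"following": ["bob", "carol", "dave"], "followers": ["bob"]},
    "bob": {"following": ["alice"], "followers": ["alice", "carol"]},
    "carol": {"following": ["bob"], "followers": ["alice"]},
    "dave": {"following": [], "followers": ["alice"]},
}

# One row per user: [number of users followed, number of followers].
counts = {
    user: [len(info["following"]), len(info["followers"])]
    for user, info in data.items()
}

# from_dict(orient="index") takes the place of the removed from_items.
frame = pd.DataFrame.from_dict(
    counts, orient="index", columns=["following", "followers"]
)

# sort_values takes the place of the removed DataFrame.sort.
top = frame.sort_values("following", ascending=False).head(10)
print(top)
```

Calling `top.plot(kind='barh', rot=0)` on the result should produce the same kind of bar chart, provided matplotlib is installed.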
>>> frame['followers'].describe()
count 2080.000000
mean 12.055769
std 47.972182
min 0.000000
25% 0.000000
50% 2.000000
75% 7.000000
max 1329.000000
On average, a user is followed by 12 other users. The lower 25% (520 users) are followed by no other users, and 50% (1040 users) are followed by 2 or fewer. User dalton has the most followers, with 1329.
>>> frame.sort('followers', ascending=False)[:10]
following followers
dalton 152 1329
gruber 8 958
siracusa 9 610
joshuatopolsky 2 364
berg 33 336
adn 2 327
stevestreza 86 321
chockenberry 0 287
dan 5 280
marco 0 267
>>> frame.sort('followers', ascending=False)[:10].plot(kind='barh', rot=0)
"PageRank computes a ranking of the nodes in the graph G based on the structure of the incoming links. It was originally designed as an algorithm to rank web pages." (NetworkX 1.7 documentation)
>>> import networkx as nx
>>> import operator
>>> g = nx.DiGraph()
>>> for k,v in data.items():
...     for f in v['following']:
...         g.add_edge(k,f)
...
>>> rank = nx.pagerank(g).items()
>>> rank.sort(key=operator.itemgetter(1), reverse=True)
>>> for u,v in rank[:10]:
... print u, v
...
dalton 0.0834293744516
gruber 0.0359611460706
siracusa 0.0203027976961
joshuatopolsky 0.013701136992
chockenberry 0.0136562239226
berg 0.0118944041028
adn 0.0115074654949
leolaporte 0.0114818396115
stevestreza 0.0109684017167
jsnell 0.00999449983332
At the moment the PageRank list does not differ much from the followers list, but it should become more interesting in the future.
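To illustrate how a ranking based on incoming links comes about, here is a minimal power-iteration sketch in plain Python 3. It is not networkx's implementation. The function name, the fixed iteration count and the toy `following` graph are assumptions made for illustration, and the damping factor of 0.85 is the commonly used value.

```python
def pagerank(following, damping=0.85, iterations=100):
    """Rank users by repeatedly passing rank along 'follows' edges.

    `following` maps each user to the list of users they follow
    (each followed user listed once).
    """
    nodes = set(following)
    for targets in following.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Users who follow nobody spread their rank evenly over everyone.
        dangling = sum(rank[u] for u in nodes if not following.get(u))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for user, targets in following.items():
            if targets:
                share = damping * rank[user] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three users follow "c", and "c" follows "a".
following = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
rank = pagerank(following)
ordered = sorted(rank.items(), key=lambda item: item[1], reverse=True)
for user, score in ordered:
    print(user, round(score, 4))
```

In this toy graph "a" has only one follower but still ranks well above "b" and "d", because that follower is the highly ranked "c". This is the effect that should eventually make the PageRank list diverge from a plain follower count.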
"A clique in an undirected graph is a subset of its vertices such that every two vertices in the subset are connected by an edge." (Clique (graph theory), Wikipedia, the free encyclopedia)
The clique algorithm works on an undirected graph, so an edge between two users is added only if both users follow each other. The largest clique, in which every user follows and is followed by every other user in the clique, has 8 members. There are 5 different maximum cliques, made up of mostly the same people. As expected, the app.net social network is not very diverse yet.
>>> g = nx.Graph()
>>> for k,v in data.items():
...     for f in [x for x in v['following'] if x in v['followers']]:
...         g.add_edge(k,f)
...
>>> cliques = list(nx.find_cliques(g))
>>> cliques.sort(key=lambda x: len(x), reverse=True)
>>> for c in cliques[:10]:
...     print len(c), c
...
8 [u'ernie', u'lucypepper', u'zen', u'carrie_kane', u'icraigt', u'ricki', u'topgold', u'tanja']
8 [u'ernie', u'lucypepper', u'warriorgrrl', u'icraigt', u'topgold', u'carrie_kane', u'tanja', u'ricki']
8 [u'steve', u'icraigt', u'carrie_kane', u'sneakyness', u'ricki', u'katcaverly', u'lucypepper', u'tanja']
8 [u'ricki', u'carrie_kane', u'icraigt', u'lucypepper', u'katcaverly', u'sneakyness', u'tanja', u'warriorgrrl']
8 [u'ricki', u'carrie_kane', u'icraigt', u'lucypepper', u'katcaverly', u'sneakyness', u'tanja', u'zen']
7 [u'ernie', u'lucypepper', u'steve', u'carrie_kane', u'icraigt', u'tanja', u'ricki']
7 [u'steve', u'icraigt', u'carrie_kane', u'conor', u'anu', u'rene', u'timhanlon']
7 [u'steve', u'icraigt', u'carrie_kane', u'anu', u'lucypepper', u'katcaverly', u'ricki']
6 [u'ernie', u'lucypepper', u'warriorgrrl', u'icraigt', u'topgold', u'lucyinglis']
6 [u'ernie', u'lucypepper', u'ward', u'icraigt', u'carrie_kane', u'topgold']
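For readers who want to see what the clique search does, the following is a minimal Python 3 sketch of the basic Bron-Kerbosch algorithm (without pivoting) applied to a mutual-follow graph. It is a simplified illustration rather than the networkx implementation; the helper names and the toy `data` dictionary are made up.

```python
def mutual_graph(data):
    """Build an undirected adjacency map with an edge for each mutual follow."""
    adj = {}
    for user, info in data.items():
        followers = set(info["followers"])
        for other in info["following"]:
            if other in followers and other != user:
                adj.setdefault(user, set()).add(other)
                adj.setdefault(other, set()).add(user)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with basic Bron-Kerbosch."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            if current:
                cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical data in the shape of user.json.
data = {
    "alice": {"following": ["bob", "carol", "dave"],
              "followers": ["bob", "carol", "dave", "erin"]},
    "bob": {"following": ["alice", "carol"], "followers": ["alice", "carol"]},
    "carol": {"following": ["alice", "bob"], "followers": ["alice", "bob"]},
    "dave": {"following": ["alice"], "followers": ["alice"]},
    "erin": {"following": ["alice"], "followers": []},
}

cliques = maximal_cliques(mutual_graph(data))
cliques.sort(key=len, reverse=True)
for clique in cliques:
    print(len(clique), clique)
```

Here erin follows alice without being followed back, so she gets no edge and appears in no clique. Only reciprocal relationships count, as in the analysis above.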
Update: There is a follow-up article available with updated numbers.