Presentation by Pedro Boueke, June 2017
"The front page of the internet"
It's a content-centered anonymous social network.
7th most popular website in the USA, 22nd in the world.
Reddit owes its success to its comments section. Upvotes, downvotes and replies.
Each thread a tree
Each comment a vertex.
Each reply an edge.
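A minimal Python sketch of this representation. It assumes each comment is available as a (comment_id, parent_id) pair and that top-level comments point at the submission, called "root" here as a hypothetical placeholder. The tree is stored as a map from each vertex to its direct replies.

```python
from collections import defaultdict

def build_thread_tree(replies):
    """Build a child-adjacency map from (comment_id, parent_id) pairs.

    Each comment is a vertex and each reply is an edge parent -> child.
    The submission acts as the root (hypothetical id "root" below).
    """
    children = defaultdict(list)
    for comment_id, parent_id in replies:
        children[parent_id].append(comment_id)
    return dict(children)

replies = [("a", "root"), ("b", "root"), ("c", "a"), ("d", "a"), ("e", "c")]
tree = build_thread_tree(replies)
print(tree)  # {'root': ['a', 'b'], 'a': ['c', 'd'], 'c': ['e']}
```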
Great for modelling.
Excellent for mathematical analysis.
Interesting topological metrics.
And more!
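As an illustration of such metrics, the sketch below computes the height, the width and the out-degree counts (direct replies per comment) of a tree stored as a child map. The definitions are assumptions made for the example: height counts edges on the longest root-to-leaf path, and width is the largest number of vertices found at a single depth.

```python
from collections import Counter, deque

def tree_metrics(children, root):
    """Return (height, width, degree_counts) via a breadth-first traversal.

    children: dict mapping a vertex to the list of its direct replies.
    degree_counts: Counter of how many vertices have each number of replies.
    """
    level_sizes = Counter()
    degree_counts = Counter()
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        level_sizes[depth] += 1
        kids = children.get(node, [])
        degree_counts[len(kids)] += 1
        for kid in kids:
            queue.append((kid, depth + 1))
    height = max(level_sizes)           # deepest level reached
    width = max(level_sizes.values())   # most vertices on one level
    return height, width, degree_counts

children = {"root": ["a", "b"], "a": ["c", "d"], "c": ["e"]}
height, width, degrees = tree_metrics(children, "root")
print(height, width)  # 3 2
```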
To create a model capable of representing the real thing.
Studying the real structures and being clever.
(trying to)
Kaggle's May 2015 open comments dataset
Get personal with a dataset of comments from May 2015
About 15 million comments.
~30GB
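A hedged sketch of pulling thread edges out of the dump with Python's sqlite3 module. It assumes the dataset is an SQLite file with a table named May2015 and the columns link_id, id and parent_id, and that parents are written with Reddit's fullname prefixes (t1_ for comments, t3_ for submissions). Check these names against the actual download before relying on them.

```python
import sqlite3
from collections import defaultdict

def load_thread_edges(conn, table="May2015"):
    """Group (comment, parent) fullname pairs by thread (link_id).

    Assumed schema: one row per comment with columns link_id, id and
    parent_id. The default table name is an assumption as well.
    """
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    threads = defaultdict(list)
    # Iterating over the cursor fetches rows incrementally instead of
    # materialising the whole result set at once.
    query = f"SELECT link_id, id, parent_id FROM {table}"
    for link_id, comment_id, parent_id in conn.execute(query):
        # Prefix the bare id so it matches the fullnames used in parent_id.
        threads[link_id].append(("t1_" + comment_id, parent_id))
    return threads

# Tiny in-memory stand-in for the real file (sqlite3.connect(path) in practice).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE May2015 (link_id TEXT, id TEXT, parent_id TEXT)")
conn.executemany(
    "INSERT INTO May2015 VALUES (?, ?, ?)",
    [("t3_x", "a", "t3_x"), ("t3_x", "b", "t1_a"), ("t3_y", "c", "t3_y")],
)
threads = load_thread_edges(conn)
```

The grouped edges still live in memory, which is the trade-off of this simple approach on a file of this size.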
Power laws everywhere.
Many interesting distributions related to tree topology, height density and comment degree.
(more [PT-BR] here)
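One standard way to put a number on a power-law claim, for example for comment degree, is the maximum-likelihood estimate of the exponent. The sketch below uses the discrete approximation from Clauset, Shalizi and Newman (2009); it is not necessarily the method used in this work, and the lower cutoff xmin is assumed to be chosen by hand rather than estimated.

```python
import math

def powerlaw_alpha(samples, xmin):
    """Approximate MLE of the exponent alpha of a discrete power law.

    alpha ~= 1 + n / sum(ln(x / (xmin - 0.5))) over the n samples x >= xmin.
    xmin is supplied by hand here rather than estimated from the data.
    """
    if xmin < 1:
        raise ValueError("xmin must be at least 1")
    tail = [x for x in samples if x >= xmin]
    if not tail:
        raise ValueError("no samples at or above xmin")
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum
```

Since xmin must be at least 1, comments with zero replies are automatically left out of the fit.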
How to recreate trees with such distributions in your garage?
Think of the Barabási-Albert and Price models.
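For reference, a minimal sketch of tree growth by preferential attachment in the spirit of these models. Each new vertex attaches to one existing vertex chosen with probability proportional to (its number of children + 1), which is Price's rule with offset a = 1 and one edge per new vertex. The function name and parameters are illustrative only.

```python
import random

def price_tree(n, seed=None):
    """Grow a tree of n >= 1 vertices by preferential attachment.

    Returns parent[v] for every vertex v; the root (0) has parent None.
    """
    rng = random.Random(seed)
    parent = [None]
    # One "ticket" per vertex plus one extra per child it already has,
    # so a uniform pick from tickets realises the (children + 1) weights.
    tickets = [0]
    for v in range(1, n):
        target = rng.choice(tickets)
        parent.append(target)
        tickets.append(target)  # target gained a child
        tickets.append(v)       # base ticket of the new vertex
    return parent
```

Counting how often each vertex appears in parent[1:] (for instance with collections.Counter) gives the number of replies per vertex; for Price's model this distribution follows a power law with exponent 2 + a/m = 3 in the large-n limit.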
Now think of how Reddit is used. How its users behave and how content is distributed, ranked and shown.
Perfect
The "R(t,p)" model.
A Reddit comment thread tree generating model.
A simple approach to how a Reddit user comments.
Based on an iterative process of random walks guided by preferential attachment.
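The slides do not spell out the exact rules of R(t,p), so the sketch below is a hypothetical reading and not the authors' model (their repository holds the real implementation). Assumed here: t is the number of simulated users, each performing one random walk from the submission; at each visited comment the user replies there with probability p and stops; otherwise it moves to a reply chosen with probability proportional to (that reply's own reply count + 1); a user who reaches a comment with no replies without commenting leaves.

```python
import random

def hypothetical_rtp_tree(t, p, seed=None):
    """HYPOTHETICAL reading of R(t, p); not the authors' definition.

    Returns child lists indexed by vertex; vertex 0 is the submission,
    so the tree size N is len(result).
    """
    rng = random.Random(seed)
    children = [[]]  # vertex 0 is the root
    for _ in range(t):
        node = 0
        while True:
            if rng.random() < p:
                # Reply to the current comment and end this user's walk.
                children.append([])
                children[node].append(len(children) - 1)
                break
            kids = children[node]
            if not kids:
                break  # nothing further to read: the user leaves
            # Preferential attachment: busier replies attract more walkers.
            weights = [len(children[k]) + 1 for k in kids]
            node = rng.choices(kids, weights=weights, k=1)[0]
    return children
```

Under this reading N is at most t + 1 and depends on p, since walkers that run out of replies before commenting add nothing to the tree.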
Distinct results for distinct values of t and p.
Relation between N (size), p and t.
Relations between height and width.
Relations between the parameters: t x p
(more [PT-BR] here)
How does the model compare to the real thing?
It can be hard to compare whole subreddits with static parametrizations.
Parameters change everything. Distinct subreddits have distinct parametrizations.
The probability function p greatly influences topology.
(more [PT-BR] here)
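One possible way to quantify how close generated threads are to real ones (not necessarily the comparison used in this work) is to compare a distribution such as height or comment degree with the two-sample Kolmogorov-Smirnov statistic. A minimal pure-Python sketch follows; scipy.stats.ks_2samp computes the same statistic along with a p-value.

```python
from bisect import bisect_right

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_x(v) - F_y(v)|.

    F_x and F_y are the empirical CDFs. Their difference only changes at
    observed values, so checking those values is enough.
    """
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d
```

A value of D near 0 means the two samples have very similar distributions; a value near 1 means they barely overlap.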
Could be better
Test variations of the model.
Try new probability functions.
More statistical analysis.
Analytical studies.
Collaborators: pboueke (me) and gthurler.
Our repository with a Python implementation.
[PT-BR] The first presentation on the subject.