Reddit Hole

Exploring Subreddit Sidebar Links

By Devon Bain and Mikaela Brown

Algorithmic recommendations, such as recommended videos on YouTube, are widely cited as problematic. But what about recommendations made by humans? In this interactive data visualization, we map how links on Reddit can lead users down "rabbit holes" from mainstream communities to more niche, sometimes toxic ones.

The network structure of sidebar links

On Reddit, every subreddit has a sidebar where moderators often post information and links for their users. Some of these links point to other subreddits, whether to promote them or to suggest further reading. Together, these links form a directed graph: each node is a subreddit, and each edge points from a subreddit to a subreddit its sidebar links to. Following these edges, a user can get from one community to many others in a few short steps.
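To make the "few short steps" idea concrete, here is a minimal Python sketch, assuming the sidebar links have already been collected into an adjacency dictionary. The graph and its subreddit names (`A` through `F`) are hypothetical placeholders, not data from this project, and `reachable_within` is an illustrative helper. It runs a breadth-first search along the directed edges and reports how many clicks each reachable subreddit is from the starting point.

```python
from collections import deque

# Hypothetical sidebar-link graph: each key is a subreddit, each value lists the
# subreddits linked from its sidebar. Names and edges are illustrative only.
SIDEBAR_LINKS: dict[str, list[str]] = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
    "E": [],
    "F": [],
}


def reachable_within(graph: dict[str, list[str]], start: str, max_steps: int) -> dict[str, int]:
    """Breadth-first search over directed edges.

    Returns each subreddit reachable in at most `max_steps` clicks, mapped to
    the minimum number of clicks needed to reach it from `start`.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not follow links beyond the step limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:  # first visit is the shortest path in BFS
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```

For example, `reachable_within(SIDEBAR_LINKS, "A", 2)` returns `{"A": 0, "B": 1, "C": 1, "D": 2, "E": 2}`: four other communities within two clicks, while `F` needs a third. Breadth-first search fits here because it finds the shortest click distance in an unweighted graph.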

While this property may allow users to explore diverse interests, it may also lead them to smaller communities that harbor more extreme or toxic content. Users may also become stuck in an echo chamber with limited exposure to opposing views.

Sentiment Analysis

"Toxicity" is difficult to measure. As a proxy, this visualization uses sentiment intensity. By averaging the strength of the positive or negative tone of the titles of a subreddit's top 100 posts, we get a single sentiment score between -1 and 1, where -1 is most negative and 1 is most positive.

[Legend: color scale from more positive tone to more negative tone]

This sentiment analysis has limitations: for example, it cannot detect sarcasm and it does not recognize coded terms used by communities in ways that differ from their original meaning. Along with each subreddit's sentiment score, we provide example posts to give context to the score.

[Interactive visualization: choose a starting subreddit and follow its sidebar links. For each subreddit on your path, the panel shows its subscribers, sentiment score, and example posts.]

Method

Data Scraping

Data was gathered from Reddit via the Reddit API, using PRAW and BeautifulSoup.
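The scraping code is not shown here. As an illustration of how sidebar text becomes graph edges, the sketch below pulls subreddit references out of a sidebar with a regular expression. This is a swapped-in technique, not necessarily what the authors did with BeautifulSoup. It assumes the sidebar is available as text (for example, the Markdown that PRAW exposes as `Subreddit.description`), and the names `SUBREDDIT_LINK` and `sidebar_links` are hypothetical.

```python
import re

# Roughly matches subreddit references such as "/r/AskHistorians", "r/science",
# or "https://www.reddit.com/r/books/". The lookbehind prevents matches inside
# longer words (e.g. the "r/" in "bar/baz"). Names are 2-21 word characters.
SUBREDDIT_LINK = re.compile(r"(?<![A-Za-z0-9_])/?r/([A-Za-z0-9_]{2,21})", re.IGNORECASE)


def sidebar_links(source: str, sidebar_text: str) -> set[str]:
    """Return the lowercased subreddits referenced in one sidebar, minus self-links."""
    targets = {name.lower() for name in SUBREDDIT_LINK.findall(sidebar_text)}
    targets.discard(source.lower())  # a sidebar linking to itself adds no edge
    return targets
```

Each returned name becomes a directed edge from `source` to that subreddit. Names are lowercased because subreddit names are case-insensitive, which keeps `r/Science` and `r/science` from becoming separate nodes.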

Sentiment Analysis

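The titles of the top 100 posts from each subreddit were scraped. Each title was run through nltk's SentimentIntensityAnalyzer, and the titles' compound scores (a single value from -1 to 1 that combines the positive, neutral, and negative scores) were averaged to produce the subreddit's sentiment score.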
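The averaging step can be sketched in Python as follows, assuming the "combined score" is VADER's `compound` value and that per-title scores are averaged as the overview describes. To keep the sketch runnable without nltk's lexicon download, the scorer is passed in as a function; `subreddit_sentiment` and `top_100_titles` are hypothetical names.

```python
from collections.abc import Callable, Iterable, Mapping
from statistics import mean

# A scorer maps one text to a dict of scores. nltk's
# SentimentIntensityAnalyzer().polarity_scores has this shape: it returns
# "neg", "neu", "pos", and "compound" keys, with "compound" in [-1, 1].
Scorer = Callable[[str], Mapping[str, float]]


def subreddit_sentiment(titles: Iterable[str], scorer: Scorer) -> float:
    """Average the compound score of each post title into one value in [-1, 1]."""
    scores = [scorer(title)["compound"] for title in titles]
    if not scores:
        raise ValueError("need at least one title to score")
    return mean(scores)


# With nltk installed and the "vader_lexicon" resource downloaded:
#   from nltk.sentiment import SentimentIntensityAnalyzer
#   score = subreddit_sentiment(top_100_titles, SentimentIntensityAnalyzer().polarity_scores)
```

Because each compound score already lies in [-1, 1], their mean does too, which is why the subreddit score shares that range.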

Selection of Starting Points

The six starting points were chosen for their size, number of outlinks, and prior knowledge about the content they contain. All have over 100,000 subscribers, and according to redditmetrics.com, all are in the top 1000 subreddits (out of 1.2 million). Their posts regularly get enough upvotes to end up on Reddit's front page, or they are infamous and receive external media coverage. All contain, or are potential gateways to, content that embodies common techniques of media manipulation and disinformation online.

Network Visualizations

The data above can also be viewed in network form.

This visualization displays all the data at once. You can drag individual nodes around or hover over them to see more information.

[Figure: network example]