
Sentiment Analysis with NLTagger: Ranking popular subreddits by the negativity/hostility of their comments


I have been feeling that Reddit is well on its way to taking the title of “internet hate machine” away from 4chan. Even when a subreddit is themed around happiness, it takes little to no effort to find extremely hostile comment chains in which complete strangers argue about the most pointless things. I was curious to see what this looked like across different subreddits, so I decided to use Reddit’s API and iOS’s built-in sentiment analysis tools to visualize how negative each subreddit is.

This is not a new problem, and you can find many GitHub repos from people who’ve done similar things with Reddit comments in the past. I started this “project” by modifying this script made by hein-j, but I wasn’t satisfied with the results from the usual Python NLP libraries. They seemed far too eager to label a subreddit as neutral when it is in reality notorious for being negative (maybe they’re not the best at detecting passive-aggressiveness?), so I wondered if I could get better results with iOS’s NLTagger, which has been available since iOS 12 (its sentiment scoring scheme since iOS 13).

With a simple setup that extracts comments with Python and runs them through a Swift script, I picked a set of popular subreddits, analyzed the comments of the top 10 submissions in each (by Reddit’s “hot” ranking) at the time, and plotted the output by pasting the results into a Google Sheets doc. If you want to try this yourself and/or tweak the parameters, you can find the code I used at the bottom of the article.


Info and Comments

  • Neutrality was defined as a sentiment score between -0.5 and 0.5 (inclusive). The vast majority of values in this range were on the negative side.
  • Happy is the only subreddit that scored higher in positivity than in negativity.
  • Despite naturally ranking high in positivity, subreddits themed around happiness still contained an overwhelming number of negative comments.
  • That subreddits themed around advice/knowledge rank as more hostile than subreddits themed around actual hate is interesting, but not surprising. They are famous among redditors for being hotspots for insecure individuals and often host some of the lowest-quality discussions on the platform.
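
The bucketing rule from the first bullet, as implemented in the Swift script at the end of this article, can be sketched in Python. The function names are made up for illustration; the thresholds match the Swift code (above 0.5 is positive, below -0.5 is negative, and the band in between, endpoints included, is neutral).

```python
from collections import Counter


def bucket(score: float, threshold: float = 0.5) -> str:
    """Map a sentiment score in [-1.0, 1.0] to a label.

    The neutral band includes its endpoints, as in the Swift script.
    """
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def percentages(scores) -> dict:
    """Return each label's share of the scores, as a percentage."""
    counts = Counter(bucket(score) for score in scores)
    total = sum(counts.values())
    labels = ("negative", "neutral", "positive")
    if total == 0:
        return {label: 0.0 for label in labels}
    # Counter returns 0 for labels that never occurred
    return {label: counts[label] / total * 100 for label in labels}


# Made-up scores, just to show the call
print(percentages([-0.8, -0.5, -0.2, 0.0, 0.6]))
```

Because the neutral band is closed, a score of exactly -0.5 or 0.5 counts as neutral.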

Code

Python

The purpose of the Python script is to connect to Reddit and dump the comments of a subreddit into a file. Despite the .json extension, the file is plain text in which the comments are separated by the string ||aa||aa||aa||aa||. The script requires praw and a praw.ini file in the project’s root with a site section named bot that contains the following Reddit App parameters: client_id, client_secret, and user_agent. See here for help with creating a Reddit App and here for help with praw.
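
If the script fails to establish a Reddit instance, a quick sanity check is to confirm that praw.ini has a [bot] section with the three keys listed above. The sketch below uses Python’s standard configparser; check_praw_ini is a hypothetical helper (not part of praw), and it only checks that the keys are present and non-empty, not that the credentials are valid.

```python
import configparser

REQUIRED_KEYS = ("client_id", "client_secret", "user_agent")


def check_praw_ini(text: str, site: str = "bot") -> list:
    """Return the required keys missing from the [site] section of a praw.ini.

    Hypothetical helper: it does not contact Reddit or validate credentials.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(site):
        return list(REQUIRED_KEYS)
    # A key counts as missing if it is absent or has an empty value
    return [key for key in REQUIRED_KEYS if not parser.get(site, key, fallback="").strip()]


# Placeholder values; replace them with your Reddit App's parameters
sample = """
[bot]
client_id=YOUR_CLIENT_ID
client_secret=YOUR_CLIENT_SECRET
user_agent=sentiment-script by u/YOUR_USERNAME
"""
print(check_praw_ini(sample))  # prints []
```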

Make sure to modify the output path of this script before running.

Usage example: python comments.py getmotivated

import argparse
import os
import sys

import praw
from praw.models import MoreComments

# Separator between comments in the output file; the Swift script splits on the same string
DELIMITER = "||aa||aa||aa||aa||"


def parse():
    print('parsing arguments and options...')

    parser = argparse.ArgumentParser(description="Get the comments of a subreddit")
    parser.add_argument('subreddit', type=str, help='name of subreddit')

    return parser.parse_args()


def gather(subreddit):
    """Collect the selftext and loaded comments of the 10 hottest submissions."""
    relevant_strings = []
    print('gathering texts for analysis...')
    try:
        for submission in subreddit.hot(limit=10):
            print('...')
            if submission.selftext:
                relevant_strings.append(str(submission.selftext))
            for comment in submission.comments.list():
                # Skip "load more comments" placeholders instead of fetching them
                if isinstance(comment, MoreComments):
                    continue
                relevant_strings.append(str(comment.body))
    except Exception as error:
        sys.exit(f'ERROR: Failed to fetch posts for the provided subreddit: {error}')

    if not relevant_strings:
        sys.exit('ERROR: No posts were found for the provided subreddit.')
    return relevant_strings


args = parse()
subreddit_str = args.subreddit

print('establishing reddit instance...')

try:
    # "bot" is the name of the site section in praw.ini
    reddit = praw.Reddit("bot")
except Exception:
    sys.exit('ERROR: Failed to establish a reddit instance. Have you correctly set up your praw.ini file? See README.md for more detail.')

print('connecting to subreddit...')
subreddit = reddit.subreddit(subreddit_str)

# Get the texts to analyze
relevant_strings = gather(subreddit)

joined = DELIMITER.join(relevant_strings)

# open() does not expand "~", so expand it explicitly (change this path as needed)
output_path = os.path.expanduser("~/Desktop/r" + subreddit_str + ".json")

# Write the delimited text (not actual JSON, despite the extension)
with open(output_path, "w", encoding="utf-8") as text_file:
    text_file.write(joined)

Swift

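The Swift part of the project loads the files dumped by the Python script and runs each comment through iOS’s NLTagger (from Apple’s NaturalLanguage framework). Make sure to change the input path and the subreddits array in the script to match the subreddits you’re analyzing. The .sentimentScore scheme requires iOS 13 or macOS 10.15 or later.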

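import Foundation
import NaturalLanguage

// Must match the delimiter used by the Python script
let delimiter = "||aa||aa||aa||aa||"

let subreddits = ["gaming", "wholesomememes", "funny", "technology", "eyebleach", "dogswithjobs", "aww", "comedyheaven", "iamatotalpieceofshit", "mildlyinteresting", "mildlyinfuriating", "upliftingnews", "politics", "mademesmile", "interestingasfuck", "memes", "science", "animalsbeingbros", "askreddit", "relationships", "happy", "getmotivated", "rarepuppers"]

// A single tagger is reused; it is pointed at each comment through its `string` property
let tagger = NLTagger(tagSchemes: [.sentimentScore])

for sub in subreddits {

    // String(contentsOfFile:) does not expand "~", so expand it explicitly (change this path as needed)
    let path = NSString(string: "~/Desktop/r\(sub).json").expandingTildeInPath
    guard let str = try? String(contentsOfFile: path, encoding: .utf8) else {
        // Report to stderr so stdout only contains the results
        FileHandle.standardError.write(Data("Could not read \(path), skipping r/\(sub)\n".utf8))
        continue
    }
    let comments = str.components(separatedBy: delimiter)

    var negative: Double = 0
    var neutral: Double = 0
    var positive: Double = 0
    for input in comments where !input.isEmpty {
        tagger.string = input
        // Scores the paragraph containing the first character (the comment's first paragraph);
        // the tag's raw value is a score between -1.0 and 1.0
        let (sentiment, _) = tagger.tag(at: input.startIndex, unit: .paragraph, scheme: .sentimentScore)
        let score = Double(sentiment?.rawValue ?? "0") ?? 0
        // The neutral band includes -0.5 and 0.5
        if score > 0.5 {
            positive += 1
        } else if score < -0.5 {
            negative += 1
        } else {
            neutral += 1
        }
    }

    let total = negative + neutral + positive
    // Avoid dividing by zero when a file had no comments
    guard total > 0 else { continue }

    // Subreddit name, then negative, neutral and positive percentages
    print(sub.capitalized)
    print(negative / total * 100)
    print(neutral / total * 100)
    print(positive / total * 100)

}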
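
The Swift script prints four lines per subreddit: the capitalized name, then the negative, neutral and positive percentages. To rank the subreddits by negativity before pasting into Google Sheets, a small Python sketch like the one below could turn that output (for example, saved with swift sentiment.swift > results.txt, where both file names are hypothetical) into CSV sorted by negative share. The rank_by_negativity helper is illustrative, it assumes stdout contains only those four-line groups, and the numbers in the usage line are made up.

```python
import csv
import io


def rank_by_negativity(output: str) -> str:
    """Turn the Swift script's output into CSV rows sorted by negative share.

    Assumes groups of four non-empty lines: name, negative %, neutral %, positive %.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected groups of four lines per subreddit")

    rows = []
    for i in range(0, len(lines), 4):
        name = lines[i]
        negative, neutral, positive = (float(x) for x in lines[i + 1:i + 4])
        rows.append((name, negative, neutral, positive))
    # Most negative subreddit first
    rows.sort(key=lambda row: row[1], reverse=True)

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["subreddit", "negative", "neutral", "positive"])
    for name, negative, neutral, positive in rows:
        writer.writerow([name, f"{negative:.1f}", f"{neutral:.1f}", f"{positive:.1f}"])
    return buffer.getvalue()


# Made-up numbers, not results from this article
print(rank_by_negativity("Example1\n10.0\n80.0\n10.0\nExample2\n30.0\n60.0\n10.0\n"))
```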


