Saturday, December 11, 2010

Using CrowdFlower for Sentiment Analysis

I recently used CrowdFlower for sentiment analysis of tweets (for CS 424P). I just want to share a few tips that I learned on the way:

  1. Use gold units. This is probably the best feature of CrowdFlower. You can specify gold units (using the gold digger). The gold units are then randomly inserted into jobs. By default, a user needs to answer at least 4 gold units in order to become trusted. This is a great way to ensure the quality of the work.
  2. Target the appropriate country. For my project, I was analyzing sentiment towards Obama. Initially the jobs weren't targeted, so people outside the US were assessing the sentiment. I noticed that many people outside the US would provide bad labels, probably because the tweets were so US-centric that it was hard to understand the context for outsiders. After I targeted the US, the quality of the labels went up significantly.
  3. Use the gold report. In the gold report, workers sometimes provide feedback on the gold units. You can also find items with high disagreement there. I had made a couple of mistakes in my gold units, and I was able to find them through the gold report.
  4. Expect issues with the API. I kept hitting bugs with the API, which was the most frustrating part of CrowdFlower. For example, sometimes jobs would complete but their status would remain "Running" indefinitely. CrowdFlower confirmed that this was a bug. If you're going to use the API, you should start early so that bugs don't delay your project.
  5. Try your job as a worker on mturk.com. It's very educational to try your own job on mturk.com. You'll understand how the workers see your project, and how they get paid.
  6. Read the general Amazon Mechanical Turk tips. This is the only document I could find on best practices for crowdsourcing. I only read this after I did my project. I would have saved time if I had read their advice before my project. Instead, I ended up learning their tips by trial-and-error.
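To make the gold-unit idea from tip 1 concrete, here is a minimal sketch of how a trust check of this kind could work. This is not CrowdFlower's code or API: the function name, the data layout, and the accuracy cutoff are all assumptions for illustration. Only the "at least 4 gold units" default comes from my experience above.

```python
from collections import defaultdict

MIN_GOLD_ANSWERS = 4      # from the post: default number of gold units to answer
MIN_GOLD_ACCURACY = 0.7   # assumption: illustrative cutoff, not a CrowdFlower setting


def trusted_workers(judgments, gold):
    """Return the set of worker ids that pass the gold check.

    judgments: iterable of (worker_id, unit_id, label) tuples (hypothetical layout)
    gold: dict mapping gold unit_id -> the label you marked as correct
    """
    answered = defaultdict(int)
    correct = defaultdict(int)
    for worker_id, unit_id, label in judgments:
        if unit_id in gold:  # only gold units count toward trust
            answered[worker_id] += 1
            if label == gold[unit_id]:
                correct[worker_id] += 1
    return {
        worker
        for worker, n in answered.items()
        if n >= MIN_GOLD_ANSWERS and correct[worker] / n >= MIN_GOLD_ACCURACY
    }
```

Labels from workers outside the returned set would simply be ignored when you aggregate the results.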
Overall, the experience with CrowdFlower was okay. I wish the documentation were a little better and the API more reliable, but it got the job done. Let me know if you have any questions or need any help.

Monday, September 27, 2010

Crowdsourcing and Sentiment Analysis

Quick Summary: I am taking a class called "Extracting Social Meaning and Sentiment" at Stanford this quarter. For the final project, I will explore how to best combine crowdsourcing and automated sentiment analysis. This should yield higher accuracy, but it will come at a monetary cost. If you would like us to optimize the sentiment analysis results for a particular keyword, please donate money (~$20) by clicking on the following button:

We'll then contact you to ask what keyword you're interested in optimizing. The money will be used to pay the crowdsourced workers. This offer stands until December 2010.

Background: There are two schools of thought in the sentiment analysis community:
1. Crowdsourcing. The basic idea behind crowdsourcing is that you can pay a group of people to classify a tweet, through a service like Amazon Mechanical Turk or CrowdFlower. There are many workers on these services, so it can take just a few hours to classify several thousand tweets. I've found that spending about 2 cents per tweet, with the correct settings and tuning, can yield higher accuracy than any automated solution (described next).
2. Automated sentiment analysis. Many computer scientists and linguists are developing algorithms to automatically detect sentiment, without human intervention. We have a whole list of these solutions in our Resources spreadsheet. Also, this is how Twitter Sentiment works today. The problem with automated sentiment analysis is that it's far from perfect. My gut feeling tells me that some human intervention is required to get accuracy to the next level.

Motivation. We want to combine the best of both worlds. By combining crowdsourcing with automated sentiment analysis, we think that we can significantly increase accuracy while being cheaper than a fully crowdsourced system.

Possible solutions. Tweets have unique properties that could be exploited to make crowdsourcing more efficient. For example, an easy way to boost accuracy would be to make sure that very highly retweeted items are classified correctly. If a status is retweeted 500 times, sending one example retweet to a classification system with higher accuracy (i.e. our crowdsourced system) would be worth it, in order to get all 500 correct. We have other interesting ideas like this.
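Here is a rough sketch of the retweet idea, just to illustrate the routing logic. Everything in it is hypothetical: the tweet fields (`id`, `original_id`, `text`), the threshold, and the two classifier callbacks are placeholders I made up, not part of Twitter Sentiment or any real API.

```python
from collections import defaultdict

RETWEET_THRESHOLD = 100  # assumption: how many copies make a status worth paying for


def classify_with_retweet_routing(tweets, auto_classify, crowd_classify,
                                  threshold=RETWEET_THRESHOLD):
    """Label every tweet, paying the crowd only once per heavily retweeted status.

    tweets: list of dicts with 'id', 'original_id' (id of the status being
            retweeted, or the tweet's own id) and 'text' (hypothetical layout)
    auto_classify / crowd_classify: callables mapping text -> label
    Returns (labels by tweet id, number of crowd calls made).
    """
    groups = defaultdict(list)
    for tweet in tweets:
        groups[tweet["original_id"]].append(tweet)

    labels = {}
    crowd_calls = 0
    for group in groups.values():
        representative = group[0]["text"]
        if len(group) >= threshold:
            label = crowd_classify(representative)  # one paid judgment...
            crowd_calls += 1
        else:
            label = auto_classify(representative)
        for tweet in group:                         # ...shared by every copy
            labels[tweet["id"]] = label
    return labels, crowd_calls
```

With a status retweeted 500 times, this makes a single crowd call and applies its label to all 500 copies.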

Why we need your help. Answer: money. Crowdsourcing can be expensive, especially when you start classifying thousands of tweets. Unfortunately, we don't have a budget to run these crowdsourcing tests. Also, it would be good to have some practical scenarios to work with, rather than contrived trials that don't represent the needs of the real world.

Cost. The money will be used towards paying the workers to classify tweets. At about $0.02 per tweet, you could classify 1000 tweets for $20.
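The arithmetic is simple enough to sketch in a few lines. The function name and the `judgments_per_tweet` parameter are my own assumptions for illustration. The figures above assume one paid judgment per tweet.

```python
from decimal import Decimal


def crowdsourcing_cost(num_tweets, price_per_tweet=Decimal("0.02"),
                       judgments_per_tweet=1):
    """Estimated cost in dollars of sending num_tweets to the crowd.

    judgments_per_tweet is an assumption: asking several workers to label the
    same tweet multiplies the cost accordingly.
    """
    return Decimal(num_tweets) * price_per_tweet * judgments_per_tweet


print(crowdsourcing_cost(1000))  # 1000 tweets at $0.02 each
```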

What do you get? We'll work with you on classifying a large set of tweets. We can gather the tweets for you, or you can send us a batch. We think that we can classify tweets at a higher accuracy level than that offered by automated solutions, while costing less than a fully crowdsourced system.

Interested? Simply fill out our feedback form or contact me at alecmgo at stanford dot edu with the query you would like to track and the time frame. We will then respond with an estimated cost.

Thanks,
Alec

Thursday, June 3, 2010

Tracking sentiments on Twitter over time

The sentiment timeline on Twitter Sentiment is a very useful feature that allows you to track sentiments towards a particular query term over time. By default, Twitter Sentiment tracks sentiments for popular queries like "Google", "iPad", "Obama", etc. But you can add your own custom queries to track. For instance, I've been tracking the query "Indian Cricket Team" since May 17th (i.e. after the team's T20 World Cup debacle):



The timeline is helpful in two ways. First, it gives you an indication of how much buzz there is around the topic, i.e. how much it is being talked (tweeted) about. Secondly, it gives you an idea of the sentiment towards the topic.

As the graph above shows*, there was still some residual anger towards the team for a couple of days after the end of the world cup, but then it settled down for a while. On May 27 (PDT) however, there was a surge of negative tweets about the team, following their loss to Zimbabwe. Sentiments started improving (the positive line climbing up and the negative one coming down) on May 29 after their victory against Sri Lanka... but soon came the announcement about the Indian team not being sent to the Asiad and sentiments started being negative again on May 31. And as one can expect, it only grew worse after their second defeat (and a very embarrassing one) to Zimbabwe on June 2.
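A timeline like this boils down to two numbers per day: how many tweets matched the query (the buzz) and how they split between positive and negative. Here is a minimal sketch of that aggregation. It is not the actual Twitter Sentiment code, and the input format of `(day, label)` pairs is an assumption.

```python
from collections import Counter, defaultdict


def sentiment_timeline(labeled_tweets):
    """Aggregate labeled tweets into per-day volume and sentiment counts.

    labeled_tweets: iterable of (day, label) pairs, where day is a
    datetime.date and label is 'positive', 'negative' or 'neutral'
    (hypothetical format). Returns a dict ordered by day.
    """
    per_day = defaultdict(Counter)
    for day, label in labeled_tweets:
        per_day[day][label] += 1
    return {
        day: {
            "volume": sum(counts.values()),   # buzz: all tweets that day
            "positive": counts["positive"],
            "negative": counts["negative"],
        }
        for day, counts in sorted(per_day.items())
    }
```

Plotting the "positive" and "negative" counts against the day gives the two lines of the graph, and "volume" gives the buzz.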

Update (12/28/2011): We temporarily disabled this feature. If you would like to track queries over time, please let us know by describing your use case, so that we know how to prioritize this feature.