A first attempt at understanding how Red Sox and Yankees fans feel about their home team, via Twitter data analytics.
August 25, 2016 | Data Science
Data was collected using both the Twitter Search API and the Twitter Streaming API. It spans ten days for the Yankees and eleven days for the Red Sox.
Questions to consider: Do we expect a lot of positive tweets near Boston on days when the Red Sox win? Do we expect negative tweets in other regions of the US? What about on days when they lose or don't play?
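One way to start on these questions is to join each tweet to the team's result on that date, then compare sentiment by outcome and by region. The sketch below uses Python's built-in sqlite3 module. The table names, the columns, the region labels, and the sentiment score in [-1, 1] are all hypothetical assumptions. They are not the schema or sentiment model used for this post.

```python
import sqlite3

# Hypothetical schema (an assumption, not the post's actual tables).
SCHEMA = """
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    tweet_date TEXT NOT NULL,  -- ISO date such as '2016-08-15'
    region     TEXT NOT NULL,  -- hypothetical labels, e.g. 'Boston area', 'Other US'
    sentiment  REAL NOT NULL   -- hypothetical polarity score in [-1, 1]
);
CREATE TABLE games (
    game_date TEXT PRIMARY KEY,
    outcome   TEXT NOT NULL CHECK (outcome IN ('Win', 'Loss'))
);
"""

# LEFT JOIN keeps tweets from dates with no game; COALESCE labels them 'No game'.
# share_positive is a proportion, so groups of different sizes stay comparable.
SENTIMENT_BY_OUTCOME = """
SELECT COALESCE(g.outcome, 'No game') AS day_type,
       t.region,
       COUNT(*) AS n_tweets,
       AVG(t.sentiment) AS mean_sentiment,
       AVG(CASE WHEN t.sentiment > 0 THEN 1.0 ELSE 0.0 END) AS share_positive
FROM tweets AS t
LEFT JOIN games AS g ON g.game_date = t.tweet_date
GROUP BY day_type, t.region
ORDER BY day_type, t.region;
"""


def sentiment_by_outcome(conn):
    """Return (day_type, region, n_tweets, mean_sentiment, share_positive) rows."""
    return conn.execute(SENTIMENT_BY_OUTCOME).fetchall()
```

Running it against a table per team (or adding a team column) would give a Win / Loss / No game breakdown for the Boston area versus the rest of the country.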
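Our data has significant limitations. Data was collected at a random time each day, which may have been before, during, or after that day's game. Also, a fixed number of tweets (1,000) was collected each day, so differences in daily tweet volume are not captured; comparisons across days have to rely on the share of positive and negative tweets.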
Red Sox
Another interesting question, though answering it would require restructuring our data set, is whether sentiment lags behind results. If the Red Sox lose today, do we see a spike in negative sentiment tomorrow? To find out, we would query for all data points that fall on the day after a Win or a Loss, rather than on the Win or Loss day itself.
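A day-after query of that kind might look like the sketch below. It assumes hypothetical `tweets` and `games` tables (one Win or Loss per game date) and a hypothetical sentiment score; it is an illustration, not a query from this analysis. The only change from a same-day comparison is the join condition, which uses SQLite's `date(..., '-1 day')` to match each tweet with the previous calendar day's game.

```python
import sqlite3

# Hypothetical schema (an assumption, not the post's actual tables).
SCHEMA = """
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    tweet_date TEXT NOT NULL,  -- ISO date such as '2016-08-16'
    sentiment  REAL NOT NULL   -- hypothetical polarity score in [-1, 1]
);
CREATE TABLE games (
    game_date TEXT PRIMARY KEY,
    outcome   TEXT NOT NULL CHECK (outcome IN ('Win', 'Loss'))
);
"""

# Match each tweet to the game played on the previous calendar day.
# The inner JOIN drops tweets whose previous day had no game.
NEXT_DAY_SENTIMENT = """
SELECT g.outcome AS previous_day_outcome,
       COUNT(*) AS n_tweets,
       AVG(t.sentiment) AS mean_sentiment
FROM tweets AS t
JOIN games AS g ON g.game_date = date(t.tweet_date, '-1 day')
GROUP BY g.outcome
ORDER BY g.outcome;
"""


def next_day_sentiment(conn):
    """Return (previous_day_outcome, n_tweets, mean_sentiment) rows."""
    return conn.execute(NEXT_DAY_SENTIMENT).fetchall()
```

Comparing these rows with a same-day breakdown would show whether negative sentiment after a loss peaks on the game day or the day after.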
Yankees
The following SQL query arranges the data points in a 'coin-stacking' style, if desired.
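As a hedged illustration, and not the query used for this post, here is one way such a query could look. It assumes that 'coin-stacking' means a dot plot in which every tweet is one dot, stacked above earlier dots that share its date and sentiment label. `ROW_NUMBER()` (available in SQLite 3.25 and later) gives each tweet its height in the stack; the `tweets` table and its `label` column are hypothetical.

```python
import sqlite3

# Hypothetical table: one row per tweet with a categorical sentiment label.
SCHEMA = """
CREATE TABLE tweets (
    tweet_id   INTEGER PRIMARY KEY,
    tweet_date TEXT NOT NULL,  -- ISO date, used as the x-axis
    label      TEXT NOT NULL   -- hypothetical labels, e.g. 'positive', 'negative'
);
"""

# Each tweet's stack height is its position among tweets that share the same
# date and label, so each day gets one column of "coins" per label.
# Window functions require SQLite 3.25 or later.
COIN_STACK = """
SELECT tweet_id,
       tweet_date,
       label,
       ROW_NUMBER() OVER (
           PARTITION BY tweet_date, label
           ORDER BY tweet_id
       ) AS stack_height
FROM tweets
ORDER BY tweet_date, label, stack_height;
"""


def coin_stack(conn: sqlite3.Connection) -> list:
    """Return (tweet_id, tweet_date, label, stack_height) rows."""
    return conn.execute(COIN_STACK).fetchall()
```

Plotting `stack_height` against `tweet_date`, with a color or small horizontal offset per label, produces the stacked-coin look.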
To gain more insight, we will need much better data collection methods and more historical data.