*Social media* has many layers and, in the business and data senses, it is growing up nicely. Social sharing platforms provide developers access to their data, such as member interactions and status updates, which can come as emotional outpourings, diatribes, celebrations, and affirmations. Armed with time (the most difficult thing on this list to come by), Stack Overflow, reference materials, and an open source coding tool, anyone can quickly #oneup your *social media listening* skills. Not a bad skill to flaunt, since positions managing and creating content on social media are increasing and relevant in every sector and job function. Adding the third word – listening – gives social media scouring, participating, and downloading another lift in professionalism.
Most of the time, working in only two dimensions is not as fruitful. What other datasets would be useful or interesting to mash up with tweet sentiment?
- Stock market data — @BarackObama
- Social stability in Russia/Ukraine — @WeAreUkraine
- Canadian and U.S. Relations — @usahockey
- Additionally, TIL there is diplomacy tracking on Twitter too (per wiki)
I took a great introductory class on *social media listening* from General Assembly in Los Angeles with
@kaitlinmaud. Kaitlin walked through her process for reviewing brand positioning on social media and marketing channels, and also shared her thought process.
Inspired to learn more about social listening, I started playing around with Twitter data in R: first working out the authentication process and documenting it, then figuring out interesting ways to display and combine information. For my data science data products course, I created a Shiny app comparing tweets by digital fashion brands, such as @WhoWhatWear, @refinery29, and @Fashionista_com: Digital Fashion Brands on Twitter. This example demonstrates sentiment analysis of tweets by brand mentions. Text was cleaned and assigned affective ratings from a dictionary of 14,000 English words compiled by Amy Warriner and Victor Kuperman, found here. Simple can be a good approach, but on reviewing this approach for Twitter, the algorithm needs more. Warriner's dictionary is not contextualized for colloquial language or the visual nature of words on Twitter, so a Twitter-specific ratings word list would increase the accuracy of this simple measure of Twitter sentiment.
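The dictionary-scoring idea is easy to sketch. My actual implementation was in R, but here is a minimal Python version of the same technique: look each cleaned word up in a valence dictionary and average the ratings found. The valence numbers below are illustrative placeholders, not actual values from the Warriner & Kuperman list.

```python
import re

# Illustrative word-to-valence map (higher = more positive). These numbers
# are made up for the sketch, NOT real Warriner & Kuperman ratings.
VALENCE = {
    "love": 8.0, "happy": 7.8, "celebrate": 7.5,
    "sad": 2.1, "hate": 1.9, "broken": 2.5,
}

def score_tweet(text, ratings=VALENCE):
    """Average the valence of rated words; return None if no word is rated."""
    words = re.findall(r"[a-z']+", text.lower())
    rated = [ratings[w] for w in words if w in ratings]
    return sum(rated) / len(rated) if rated else None

print(score_tweet("I LOVE this, so happy!"))   # averages 8.0 and 7.8
print(score_tweet("hate that it's broken"))    # averages 1.9 and 2.5
```

Unrated words simply drop out of the average, which is exactly why a tweet full of slang or hashtags can come back with a misleading (or empty) score.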
Notice, the approach in this set of code is the most *simple* approach to sentiment analysis, applying a positive or negative value to a list of single words. In cleaning the text, I removed uppercase, symbols, and numbers, and lemmatized the words. For further analysis of sentiment in tweets, groupings of words (n-grams), word relationships to each other, and capitalization are features that should be added to measure sentiment. *Note to self for v2.0.* Also, natural language processing packages in Python seem to have more options than the ones I researched for R.
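For concreteness, the cleaning steps above can be sketched as a small pipeline. Again, my original code was in R; this Python version only mirrors the steps (lowercase, strip symbols and numbers, lemmatize), and the tiny lemma map is a hypothetical stand-in for a real lemmatizer.

```python
import re

# Hypothetical mini lemma map, standing in for a real lemmatizer.
LEMMAS = {"loved": "love", "loves": "love", "celebrations": "celebration"}

def clean_tweet(text):
    text = text.lower()                       # remove uppercase
    text = re.sub(r"[^a-z\s]", " ", text)     # strip symbols and numbers
    words = text.split()
    return [LEMMAS.get(w, w) for w in words]  # crude lemmatization

print(clean_tweet("She LOVED #NYFW2015!!"))   # -> ['she', 'love', 'nyfw']
```

Note what the cleaning throws away: the hashtag marker, the year, and the all-caps emphasis, each of which carries signal on Twitter. That loss is exactly the v2.0 motivation for n-grams and capitalization features.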
I looked at a list of the top Twitter handles and decided to apply this v1.0 of sentiment analysis to famous people. Even if the *simple* approach has inaccuracies, there is potential value in comparing the outcomes… perhaps, or not. I officially know that this Twitter sentiment approach is broken because it tells me people like @justinbieber, @iggyazalea, and @sarahpalinUSA. According to BuzzFeed, people are apparently not fans of Ms. Azalea.