We should all be sitting on the edge of our seats for the next couple of months. Change is inevitable, but the change agent may be questionable. To get psyched for this last stretch before the election, I apply natural language processing (NLP) to this week’s first presidential debate, with a focus on polarity in sentiment. Visualizations in this post include an interactive candidate polarity graph and word clouds.
Let’s greet the candidates first via important words from the debate. Each candidate in this election has a unique speaking style, and their words appear in their silhouettes. “Country”, “American(s)”, and “jobs” are common among most political candidates. I find it interesting that the central keyword for candidate Clinton is “people”, while candidate Trump’s central focus is his signature prompt “look…[statement to the audience].”
Hello Candidates via Word Clouds
Candidate Sentiment Polarity during the Debate
The candidates’ sentences and word choices give us opportunities to analyze their messages, thought processes, intellect, suitability, etc. (pretty much anything subjective or objective about the candidates). Sentiment polarity is extracted solely from the word choice and sentence structure used by the candidates. It does not include insights on the impact of sentiment or polarity on the observer. In other words, the polarity of a candidate is highly subjective, and its reception by the observer hinges on context.
If a statement is generally positive, then the polarity is positive (above the black horizontal line), and vice versa. Spending too much time below the black horizontal line makes people perceive you as a Negative Nancy or as having an Eeyore disposition. This has nothing to do with any 400 lb programmers, who can be positive or negative in polarity at any given time.
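To make the idea of a polarity score concrete, here is a minimal sketch of a lexicon-based scorer. It is a toy stand-in, not the TextBlob analyzer used for this post: the word lists and the `polarity` function are hypothetical and far smaller than any real sentiment lexicon.

```python
import re

# Hypothetical toy lexicon (an assumption for illustration only);
# real analyzers such as TextBlob rely on much larger, weighted lexicons.
POSITIVE = {"good", "great", "strong", "win", "best"}
NEGATIVE = {"bad", "terrible", "disaster", "lose", "worst"}


def polarity(sentence: str) -> float:
    """Return a score in [-1, 1]: (positive hits - negative hits) / word count."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if not words:
        return 0.0  # nothing to score
    score = sum((w in POSITIVE) - (w in NEGATIVE) for w in words)
    return score / len(words)


if __name__ == "__main__":
    for s in ["This is a great plan", "A terrible, terrible disaster"]:
        print(f"{polarity(s):+.2f}  {s}")
```

A score above zero lands above the horizontal line in the graph, and a score below zero lands below it. This sketch ignores negation and intensity, which a real analyzer would try to handle.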
In this context, descriptive statistics on Clinton’s (mean = 0.033, std = 0.016) and Trump’s (mean = 0.020, std = 0.159) polarity show that Clinton is more positive overall in sentiment. Trump has larger swings in polarity, which is consistent with a breakdown of the candidates’ language by linguistic experts, see article here.
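As a rough sketch of how summary numbers like these can be computed, the snippet below takes a list of per-sentence polarity scores and returns the mean and sample standard deviation. The `summarize` helper and the sample scores are made up for illustration; they are not the debate data.

```python
from statistics import mean, stdev


def summarize(polarities):
    """Mean and sample standard deviation of per-sentence polarity scores."""
    return {"mean": mean(polarities), "std": stdev(polarities)}


if __name__ == "__main__":
    # Hypothetical scores, not taken from the debate transcript.
    steady_speaker = [0.05, 0.02, 0.04, 0.03]
    swingy_speaker = [0.40, -0.35, 0.30, -0.25]
    print("steady:", summarize(steady_speaker))
    print("swingy:", summarize(swingy_speaker))
```

Two speakers can have similar means while one has a much larger standard deviation, which is what "larger swings in polarity" refers to.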
The interactive visualization below shows each candidate’s polarity as they progressed through the debate. Responses from both candidates fluctuated between positive and negative polarity. The cadence of polarity is that candidates start out easy, ramp up after the midpoint, and taper off at the end. This could be a pattern if we observe changes of similar magnitude in more speeches and debates. If you have a limited amount of time to listen to these debates, block out your availability for the stretch from roughly 35% to 80% of the way through the debate.
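One simple way to look at a stretch like this is to map each sentence to its position in the debate as a percentage and keep only the sentences inside a window. This is a sketch under that assumption; `debate_window` is a hypothetical helper, and measuring position by sentence index rather than by clock time is my simplification.

```python
def debate_window(polarities, start_pct=35.0, end_pct=80.0):
    """Return (position_pct, polarity) pairs whose position falls in the window.

    Position is the sentence index scaled to 0-100, so the first sentence
    is at 0% and the last at 100%.
    """
    n = len(polarities)
    selected = []
    for i, p in enumerate(polarities):
        pct = 100.0 * i / (n - 1) if n > 1 else 0.0
        if start_pct <= pct <= end_pct:
            selected.append((pct, p))
    return selected


if __name__ == "__main__":
    # Hypothetical scores for an 11-sentence "debate".
    scores = [0.0, 0.1, 0.0, 0.1, 0.4, -0.5, 0.6, -0.4, 0.3, 0.1, 0.0]
    for pct, p in debate_window(scores):
        print(f"{pct:5.1f}%  {p:+.2f}")
```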
In readability and complexity, Clinton speaks to us at a 7th and 8th grade level, while Trump speaks to us at a 5th and 6th grade level, according to the text standard score (from textstat). In the world of elections, appealing to the largest number of people by addressing the public at a lower grade level is not a bad idea. Speaking at a lower grade level increases “readability”: Trump scored an 85.7 and Clinton scored a 74.19 on the Flesch reading ease score (nominally a 0 to 100 scale, where higher means easier to read).
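For readers curious about what goes into the Flesch reading ease score, here is a minimal sketch of the standard formula: 206.835 - 1.015 × (words per sentence) - 84.6 × (syllables per word). The syllable counter below is a crude vowel-group heuristic of my own and is not how textstat counts syllables, so the numbers will not match the scores reported above.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, discounting a trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    print(flesch_reading_ease("The cat sat. The dog ran."))
    print(flesch_reading_ease(
        "Comprehensive immigration legislation necessitates bipartisan collaboration."
    ))
```

Short sentences with short words push the score up, and long sentences with many-syllable words push it down. Very short, simple text can even score above 100 under this formula, and dense text can go negative.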
Notes about Natural Language Processing and this analysis
I need a bigger dataset for better conclusions. From the text and data analysis perspective, it is hard to draw conclusions from reviewing this debate in a vacuum. These are first observations of the candidates from a different vantage point. It is ironic that I am asking my (currently) unfeeling processor to tell me about sentiment. Luckily, the University of California, Santa Barbara keeps historical presidential debate text for the public, see full data set. Additionally, some presidential inaugural addresses are available through the NLTK Python package.
More data could lead to better conclusions on candidate debate speeches. The analysis of 1,700 blog posts by this author paints an interesting picture of the intersection of sentiment, readability, and post popularity/sharing. It also has lots of good links and visualizations for data readers to drool over.
*See GitHub for code and dataset (TBD).
**Tools for analysis. Python packages for text analysis: textstat, TextBlob, pandas, nltk. Python packages for scraping text from the Washington Post site: requests, Beautiful Soup. Python packages for visualizations: wordcloud, pandas plotting tools, plot.ly for interactive graphs and embedding.