
A data scientist at Twitter, Edwin Chen, has used Twitter to measure the prevalence of the term ‘soda’ versus ‘pop’ or ‘coke’ across the US and the world. He compares his results with a survey-based study carried out ten years earlier, which reveals slight changes over time but essentially concurs with Chen’s conclusions. To arrive at his data set, Chen had to clean the data by removing extraneous references. For example, references to specific drinks, like Coca-Cola, were eliminated, and only uses of the words that referred to drinks were included. That left him with a fairly accurate picture of usage among Americans who use Twitter, and let’s presume for now that they are a statistically representative sample.
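To make the cleaning step concrete, here is a minimal Python sketch. It is not Chen’s actual pipeline: the brand-exclusion pattern, the list of "drink context" words used to weed out non-drink senses such as "pop music", and the function name `term_shares` are all hypothetical assumptions. It takes (region, tweet text) pairs and returns, for each region, the share of cleaned tweets using each term.

```python
import re
from collections import Counter, defaultdict

TERMS = ("soda", "pop", "coke")
TERM_RE = re.compile(r"\b(soda|pop|coke)\b", re.IGNORECASE)
# Hypothetical: tweets naming a specific drink by brand are dropped entirely.
BRAND_RE = re.compile(r"\bcoca[\s-]?cola\b", re.IGNORECASE)
# Hypothetical: require a drink-related word so senses like "pop music" are dropped.
DRINK_CONTEXT_RE = re.compile(r"\b(drink|drinking|drank|thirsty|bottle)\b", re.IGNORECASE)

def term_shares(tweets):
    """Map each region to the share of cleaned tweets using 'soda', 'pop' or 'coke'."""
    counts = defaultdict(Counter)
    for region, text in tweets:
        if BRAND_RE.search(text) or not DRINK_CONTEXT_RE.search(text):
            continue  # extraneous reference: a brand name, or not about drinks
        found = {term.lower() for term in TERM_RE.findall(text)}
        if len(found) != 1:
            continue  # skip tweets using none of the terms, or more than one
        counts[region][found.pop()] += 1
    return {
        region: {term: c[term] / sum(c.values()) for term in TERMS}
        for region, c in counts.items()
    }

tweets = [
    ("CA", "I drink soda every day"),
    ("CA", "Need a soda, so thirsty"),
    ("CA", "Grab me a pop to drink"),
    ("CA", "Drinking a Coca-Cola, best soda ever"),  # dropped: brand reference
    ("MN", "Pop music all day"),                     # dropped: not about drinks
    ("MN", "Grabbing a pop to drink"),
]
print(term_shares(tweets))  # CA: soda 2/3, pop 1/3; MN: pop 1.0
```

A keyword filter like this is crude (sarcasm, misspellings and multi-word brand names all slip through), which is why the cleaning step matters so much to how accurate the final picture is.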
It’s an interesting kind of experiment, but the infrastructure behind it intrigues me more. Let’s say we can do more than simply isolate linguistic idiosyncrasies. We can figure out who says what where, but can we tell who likes what where? Can we determine whether someone, in a binary sense, likes or does not like a drink, a product, or a brand? Well, it turns out we can. IBM (who I work for, but for whom this blog has no relevance or authority whatsoever) has a capability for social sentiment analysis called Cognos Consumer Insight, which allows us to do just that, creating a social sentiment index. The IBM website explains it thus:
The Index uses IBM analytics technology to analyze millions of public tweets, with the goal of creating real-time public opinion snapshots. They transform 140 characters from raw, unstructured data into valuable insights, gauging sentiment and following trends on a variety of topics from the retail, sports and entertainment, including major events like the World Series, the Super Bowl and the Academy Awards.
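The internals of Cognos Consumer Insight are not described here, so the following is not its method. The basic idea of binary sentiment can, however, be illustrated with a deliberately naive word-list classifier. In this Python sketch the lexicons and the names `polarity` and `sentiment_index` are hypothetical; it ignores negation ("don't like" counts as positive), sarcasm and context, all of which real systems have to handle.

```python
import re

# Hypothetical toy lexicons, for illustration only.
POSITIVE = {"love", "like", "great", "awesome", "best"}
NEGATIVE = {"hate", "dislike", "awful", "worst", "gross"}
WORD_RE = re.compile(r"[a-z']+")

def polarity(text):
    """Return +1 (likes), -1 (dislikes) or 0 (neutral) by counting lexicon hits."""
    words = WORD_RE.findall(text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

def sentiment_index(texts):
    """Share of non-neutral tweets that are positive; None if every tweet is neutral."""
    polar = [p for p in map(polarity, texts) if p != 0]
    if not polar:
        return None
    return sum(p == 1 for p in polar) / len(polar)

tweets = ["I love soda", "soda is the worst", "just had a soda", "best pop ever"]
print([polarity(t) for t in tweets])  # [1, -1, 0, 1]
print(sentiment_index(tweets))        # 0.6666666666666666
```

Dropping neutral tweets before computing the index is one design choice among several; counting them in the denominator would instead measure how much of the conversation is positive at all.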
So far so good: we can understand the geographic distribution of language, and we can understand relative sentiment within that distribution. The next step, however, is to begin to understand sentiment around ideas. Can we determine the extent to which a state currently enjoys internal legitimacy based on social media sentiment? Is social media sentiment representative of the population? It is often said that a government has lost its legitimacy when it slumps in the polls (and people would prefer some other government), or when the leader changes mid-term without an election, as happened when Gordon Brown succeeded Tony Blair in the UK in 2007 and when Brian Cowen succeeded Bertie Ahern in Ireland in 2008. But what of the state itself? We know that 30-40% of eligible voters don’t vote in most Western democracies. But voting is not the only thing that contributes to legitimacy. There is also the de facto acknowledgement or recognition of the state by its subjects: through the payment of taxes (which, in a twisted way, could also be seen as tribute), through engagement with state-supported civil society structures, and even through association with national or state culture, including sports teams that represent the country.

This is something I’d like to begin work on: collecting Twitter data and measuring state legitimacy. But first, I need to develop a framework for measuring legitimacy, a kind of index. What are the components that go into measuring state legitimacy? I think we also need to measure some alternative identity factors, and I’ll have to look to psychology sources to model that. For each person, we could build a kind of personal identity pie (like the one on the right), showing the various elements of personal identity. Now, personal identity is not the exclusive determinant of internal state legitimacy, is it? I may not identify with the state, but I may engage with it, as I described above, through taxes and civil society, and that in turn confers a degree of legitimacy. Is that it? In that case, we could see Internal State Legitimacy as the sum of the legitimacy expressed through personal identity plus the legitimacy expressed through state engagement (and therefore implicit recognition and acknowledgement).
So Internal Legitimacy (IL) is equal to the sum of the legitimacy conferred upon the state by each of her citizens through self-identification, or personal identity (PI1 + PI2 + PI3 + … + PIn), plus the legitimacy conferred upon the state by the personal engagement of each of her citizens (PE1 + PE2 + PE3 + … + PEn). Right… my first formula:
IL = (PI1 + PI2 + PI3 + … + PIn) + (PE1 + PE2 + PE3 + … + PEn)
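As a minimal sketch, the formula translates directly into Python, assuming each citizen i already has a numeric personal identity score PIi and personal engagement score PEi. The function name `internal_legitimacy` and the -1 to 1 scale in the example are hypothetical; the formula itself does not fix a scale.

```python
from typing import Sequence

def internal_legitimacy(identity: Sequence[float], engagement: Sequence[float]) -> float:
    """IL = (PI1 + ... + PIn) + (PE1 + ... + PEn).

    identity[i] is PI for citizen i and engagement[i] is PE for citizen i.
    The scale of these scores is a hypothetical choice; the formula does not fix one.
    """
    if len(identity) != len(engagement):
        raise ValueError("expected one PI score and one PE score per citizen")
    return sum(identity) + sum(engagement)

# Hypothetical scores for four citizens on a -1..1 scale.
pi_scores = [1, 0, -1, 1]
pe_scores = [1, 1, 0, 1]
print(internal_legitimacy(pi_scores, pe_scores))  # 4
```

Because IL is a raw sum, it grows with population, so comparing states of different sizes would probably need a per-capita version (dividing by n); that is a choice the formula leaves open.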
We can measure personal identity through social media sentiment analysis, as outlined above. Personal engagement will be more difficult to measure, but we’ll get to that later. And is total State Legitimacy (SL) simply Internal Legitimacy (IL) plus External Legitimacy (EL)? I don’t know about that yet; we shall see. But here’s that formula anyway.
SL = IL + EL
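The two formulas, together with the idea of measuring personal identity through sentiment analysis, can be wired together in a short Python sketch. Several assumptions here go beyond the post: each PI is taken as the mean polarity (+1, 0 or -1) of a citizen’s tweets about the state, as scored by some sentiment classifier; engagement scores (PE) are simply supplied, since how to measure them is still open; and EL is a placeholder number, since External Legitimacy is not yet defined. The names `identity_scores` and `state_legitimacy` are hypothetical.

```python
from collections import defaultdict

def identity_scores(scored_tweets):
    """Hypothetical PI per citizen: mean polarity of that citizen's tweets about the state.

    scored_tweets: iterable of (citizen_id, polarity) pairs, polarity in {+1, 0, -1}.
    """
    by_citizen = defaultdict(list)
    for citizen, polarity in scored_tweets:
        by_citizen[citizen].append(polarity)
    return {c: sum(ps) / len(ps) for c, ps in by_citizen.items()}

def state_legitimacy(scored_tweets, engagement, external):
    """SL = IL + EL, with IL = (PI1 + ... + PIn) + (PE1 + ... + PEn).

    engagement: dict of citizen_id -> PE score, measured by some other means.
    external:   EL, a placeholder value; it is not defined yet.
    """
    identity = identity_scores(scored_tweets)
    internal = sum(identity.values()) + sum(engagement.values())
    return internal + external

scored = [("ann", 1), ("ann", 1), ("bob", -1), ("bob", 0), ("cat", 0)]
engagement = {"ann": 1.0, "bob": 0.5, "cat": 1.0}
print(identity_scores(scored))                     # {'ann': 1.0, 'bob': -0.5, 'cat': 0.0}
print(state_legitimacy(scored, engagement, 0.25))  # 3.25
```

Note that a citizen who never tweets about the state contributes no PI at all in this sketch, which is exactly the representativeness question raised earlier.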