Social media analytics provide some promising new research tools and techniques, but like any research method they have both strengths and weaknesses. Some of their well-recognized weaknesses include:

  • Lack of error and confidence estimates. In fact, most social media analytics are short on even the basic descriptive statistics that would be taken for granted from a data mining/data warehouse or other perspective. Currently, no social media intelligence technology provides standard statistical confidence and error estimates for how well the results can be extrapolated to a market or population as a whole, or even describes the internal characteristics of the study population.
  • Lack of standards and replicability. Every sentiment analysis provider has its own proprietary method for generating results, and there’s considerable controversy about the accuracy and replicability of sentiment analysis as a whole.
  • Noise. Traditional research methods generally don’t have to contend with subjects who agree to take part in a study with the intent of deceiving the researcher or distorting the results, which is essentially what flogs (fake blogs) and splogs (spam blogs) do. While most analytics providers claim to have processes for removing them, it’s unlikely you will be able to get an estimate of how much a given statistic has been contaminated.
  • Complexity of human languages. Consider the word “state”. It could appear in “state of the union”, refer to a physical state (gas, liquid), or be a verb (to state a fact). Many terms (tokens) used in sentiment analysis are subjective and contextual, and computers simply aren’t up to the task of spotting sarcasm yet. Understanding sentence structure is difficult even when proper grammar and spelling are present, and online they rarely are. Many sentiment analysis tools assign weights based on word proximity within a sentence instead of full rules-based natural language parsing of sentence structure (you can guess why); given the run-on sentence “Joe can pretty much always be a jerk Jack however is usually nice”, such a tool might interpret Jack as the jerk!
  • Representativeness. We know from “digital divide” discussions that online users aren’t necessarily representative of all Americans (the Pew Internet and American Life Project has numerous examples). We also know from other research that as few as one in seven participants in social media actually contribute content (see the December 2007 Pew study on teen use of social media). The end result is that the analytics measure a very skewed portion of the public, despite the common assumption that buzz and sentiment findings are representative of the public as a whole.
  • Artifacts. While most social media is syndicated via RSS, an XML-based structured data format, the actual content is usually embedded in arbitrary HTML-based websites, which must be scraped. Blog, discussion forum, and other scraped content can easily be contaminated with artifacts such as navigation items, advertisements, and static text.
  • Firewalls. Unfortunately, some of the most interesting social media markets, such as Facebook, limit access by spiders and crawlers.
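As an illustration of the kind of error estimate the first point says is missing, here is a minimal sketch of a 95% Wilson score interval for the share of positive mentions in a sample. It assumes the scored mentions are a simple random sample from the population you care about, which, per the representativeness point, is rarely true of social media data; the function name and the counts in the example are hypothetical.

```python
import math

def wilson_interval(positive: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    positive: number of sampled mentions scored as positive
    total:    number of mentions sampled
    z:        standard normal quantile (1.96 gives roughly 95% confidence)
    Assumes independent, randomly sampled mentions.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive <= total:
        raise ValueError("positive must be between 0 and total")
    p_hat = positive / total
    denom = 1 + z**2 / total
    # Center is pulled slightly toward 0.5, which behaves better than the
    # plain normal approximation for small samples or extreme proportions.
    center = (p_hat + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    )
    return center - half_width, center + half_width

# Hypothetical example: 312 of 800 sampled mentions were scored positive.
low, high = wilson_interval(312, 800)
print(f"positive share: {312 / 800:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Note that the interval only covers sampling error; it says nothing about classifier error, flog contamination, or how skewed the sampled population is.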
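To see how a proximity shortcut can misattribute sentiment in the Joe/Jack sentence, here is a deliberately naive sketch that credits each sentiment word to the nearest name by token distance, with no parsing at all. The tiny word lists and the function name are hypothetical and are not taken from any particular vendor’s tool.

```python
import re

# Hypothetical, tiny lexicons for illustration only.
ENTITIES = {"joe", "jack"}
SENTIMENT = {"jerk": -1, "nice": 1}

def attribute_by_proximity(sentence: str) -> list[tuple[str, str, int]]:
    """Pair each sentiment word with the nearest entity name by token distance.

    Grammar and clause boundaries are ignored entirely, which is the
    shortcut that produces misattributions on run-on sentences.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    entities = [(i, t) for i, t in enumerate(tokens) if t in ENTITIES]
    results = []
    for i, tok in enumerate(tokens):
        if tok in SENTIMENT and entities:
            # Distance is the only criterion; min() keeps the first entity on ties.
            _, nearest = min(entities, key=lambda e: abs(e[0] - i))
            results.append((tok, nearest, SENTIMENT[tok]))
    return results

sentence = "Joe can pretty much always be a jerk Jack however is usually nice"
for word, target, score in attribute_by_proximity(sentence):
    print(f"{word!r} ({score:+d}) attributed to {target.title()}")
# 'jerk' (-1) attributed to Jack
# 'nice' (+1) attributed to Jack
```

A parser that recognized the clause boundary between “jerk” and “Jack” would attach the negative term to Joe instead; that gap is the point of the bullet.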
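The artifacts problem can be sketched with Python’s standard-library HTML parser: keep visible text, but drop anything nested inside elements that usually hold page furniture. This assumes the page marks navigation and ads with HTML5 elements such as nav and aside; many templates, especially older ones, use generic div elements with site-specific classes instead, which would need site-specific rules. The skip list and function names are hypothetical.

```python
from html.parser import HTMLParser

# Tags whose text is treated as page furniture rather than post content.
# This list is an assumption; real blog and forum templates vary widely.
SKIP_TAGS = {"nav", "aside", "header", "footer", "script", "style"}

class ContentExtractor(HTMLParser):
    """Collect visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0          # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)

def extract_post_text(html: str) -> str:
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = """
<html><body>
  <nav>Home | Archives | About</nav>
  <div class="post"><p>The new phone is great, battery lasts all day.</p></div>
  <aside>Ads: Buy cheap widgets!</aside>
  <footer>Copyright 2008</footer>
</body></html>
"""
print(extract_post_text(page))
# The new phone is great, battery lasts all day.
```

Malformed markup (an unclosed nav, for example) would throw the depth counter off, which is one reason scraped text stays noisy in practice.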

In contrast, social media analytics do have some benefits over most traditional research methods:

  • Cost. Of course, the big firms can carry big price tags, but social media analytics are generally highly automated and can often provide relatively inexpensive data points. I predict the number of analytics providers will double in the next year or so (Nathan, are you tracking the number of providers over time?), and even the mid-market firms will have to drop prices to stay competitive.
  • No Contamination Bias. Social media actually provides one of the first research environments in which the analyst is truly invisible to the subjects. While it can be debated at length how accurate or inaccurate people’s online personas are, at least their reactions will not be colored by the presence of the researcher.
  • No Interval (Timing) Bias. Social media analytics are performed nearly instantaneously. When traditional research is conducted on a population or market over an extended period (perhaps because it simply takes a long time to reach everybody), some inaccuracy can be introduced because participants at the beginning of the study respond at a different point in time from those near the end.
  • Access to markets. Social media provide instant access to tremendously large markets and key demographics, on a scale that would be prohibitively expensive to address using traditional research methods. Performing sentiment/public opinion analysis across 66 million individuals (e.g., bloggers) would be impossible using survey research!
  • Instant Feedback. Speaking from considerable experience, traditional research takes time. With proper controls, traditional methodologies can provide snapshots and even analyze trends, but only as a series of measurements separated in time. Social media tools can track changes more or less as they happen, in a consistent manner.
  • Access to Advanced Research Tools. Sure, sentiment analysis and semantic natural language analysis have been around for decades, but the expertise and technology required to perform them have been pretty much out of reach for all but academics and organizations with large data-mining operations. As the explosion of sentiment charts tracking every presidential candidate this election year shows, there was clearly a need for these tools; now they are fairly affordable… and are improving quickly.
  • Exploration. With the right tools, social media applications allow the analyst to experiment with and explore different ideas and research hypotheses relatively easily. With traditional research, the analyst had better have the research goals, parameters, and methods carefully planned before initiating the research; one simply does not have the luxury of “redoing” a survey! However, if a BlogPulse search starts producing poor results, or the parameters given to an analytics provider are yielding poor responses, it is no big deal to adjust them and try again.

Thanks to “Text Mining Application Programming” by Manu Konchady (2006) and “Mail and Internet Surveys” by Don Dillman (2000) for some thoughts in this post.


COMMENT by Nathan Gilliatt

Wow, I guess you’ve thought about this. I haven’t compiled a timeline, but I do ask companies when they started offering social media analysis services, so it wouldn’t be hard to put together. Clearly, though, we’re seeing new companies enter the space all the time.


COMMENT by Jonathan Moody

I cannot speak for all online sentiment insight providers, but here is what we do (and don’t do) at ASOMO:
Lack of error and confidence estimates: We calculate standard statistical confidence and error estimates. I agree that extrapolating an online sentiment sample to a particular market or to the population as a whole is difficult and/or questionable, and we never claim to do this. What we do is use our search technology and statistics to calculate the total opinion universe on a particular brand/product in particular languages, and offer a percentage of that.
Noise: Yes, people can and do post falsely, just as they can and do reply falsely in surveys. We cannot detect or eliminate all of these references. However false they are, they are still going to affect your clients’ reputations, and clients need to know about this!
Complexity of human languages. All our references are analysed by a multilingual expert human team.
Representativeness: We provide the Alexa Traffic ranking for all media in which references are analysed, and our human team scores each website according to factors that encourage people to read or participate in it, thus gauging its impact to some extent: interactivity, originality of content, and frequency of update. We never claim that online sentiment can replace other forms of market research. It is a complement, and an important one, because unlike other MR (which is essentially C to B communication), this opinion influences other people’s decisions (C to C communication).
Artifacts / firewalls: That’s why our Information Broker Team supplements our search technology platform with manual searches (yes, trawling through forums to find references). See http://ijor.mypublicsquare.com/view/sense-and-online for more on this.



