Metric Failures and Data Assumptions: 4 Myths of Social Analytics

For years now, basic assumptions about social analytics have corrupted social media analysis. From thought leaders and developers to consultants and community managers, professionals in this field have tried to aggregate, analyze, and understand social content, yet few truly appreciate the technical limitations of social data. I believe this lack of understanding is one of the primary reasons the field of social media analytics is fraught with opacity, blind competition, inaccuracy, and unreliability. The following myths are a few of my social analytics pet peeves that I think are worth sharing in this initial blog post. (I’m always looking for feedback and dissent, so be sure to leave some constructive criticism or hate mail – whichever you prefer.)

Myth #1: ALL Social Media Is Analyzed

Despite its public reputation for privacy infringements, Facebook remains a supremely private medium: most users share content within private networks that are entirely unavailable to social media analysts. At most 25% of Facebook content is public (and that’s a generous figure – I’ve heard it’s closer to 7%), yet social analytics tools (and their users) continue to compare Facebook volume – side by side – with Twitter volume and other social media volume. The issue here is not a difficult one to grasp, yet it reflects a fundamental misunderstanding of the Facebook Graph API.


A consequence of this industry standard is – almost surely – the overemphasis of Twitter’s importance and the corresponding underemphasis of various other social media. Don’t get me wrong; I absolutely love Twitter as one source of data and insight, but we cannot continue pretending that it is the world’s primary medium for shared opinion. Facebook users are NOT quiet; they are private. Well-established social media monitoring tools (like Radian6 and Brandwatch) as well as newer social media analytics tools (like Viralheat and a thousand others) provide these parallel social media analyses, often without any disclaimer about their inaccuracy (or API rate limits, for that matter). And while a 2010 Radian6 blog post addressed this very subject, Radian6, its users, and its competitors continue reporting this brand of inaccuracy.


Take, for instance, a social media volume report on the Summer Olympics published in August 2012 by Bluefin Labs (acquired by Twitter in early February). While this report is (slightly) better than most in referring to its Facebook data as “public Facebook data” (and not just “Facebook data”), the Bluefin infographic nevertheless leads one to believe that it has fully incorporated both Twitter and Facebook into its social data volume assessments.

The infographic states that tweets represented 97% of Olympic-related comments: 34.9 million tweets against 1.1 million public Facebook comments. From my standpoint, that percentage seriously lacks any indicia of accuracy or transparency. So… is anyone actually surprised that Twitter acquired Bluefin Labs a couple of weeks ago? I didn’t think so. It’s companies like Bluefin and Radian6 that are keeping Twitter alive… no wonder Twitter created its “Certified Products Program.” We should all be so lucky as to bow down to the almighty Firehose. What a racket.
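To make the coverage gap concrete, here’s a minimal sketch (in Python, with illustrative names – no vendor actually exposes this) of how a tool could report an implied total-volume range instead of treating public Facebook counts as complete, using the 7%–25% public-share estimates above:

```python
# Hypothetical sketch: extrapolating observable Facebook volume before
# comparing it side by side with Twitter volume. The public-share bounds
# come from the estimates cited above and are assumptions, not facts.

def adjusted_volume_range(public_mentions, share_low=0.07, share_high=0.25):
    """Estimate the plausible total-volume range implied by publicly
    visible mentions, given assumed bounds on the public share."""
    # A HIGH public share implies a LOW total volume, and vice versa
    low_estimate = public_mentions / share_high
    high_estimate = public_mentions / share_low
    return low_estimate, high_estimate

# Bluefin's Olympics figure: 1.1 million public Facebook comments
low, high = adjusted_volume_range(1_100_000)
print(f"Implied total Facebook comments: {low:,.0f} to {high:,.0f}")
```

Run against Bluefin’s 1.1 million public comments, the implied total lands anywhere between roughly 4.4 and 15.7 million – a range wide enough that the 97% Twitter-share claim deserves, at minimum, a disclaimer.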

Myth #2: Social Media Data is Clean Enough to Analyze

Spam bots undermine nearly every dimension of social media analytics. Sorry to break the news to you, but @sexygirl20345 doesn’t actually want to chat with you – or anyone else, for that matter. Truthfully, the impact of bots on a given analysis depends on the type of analytics you are performing, but it is rare for bots not to corrupt a social media analysis in some way.


Performing a network-interest analysis? Well, guess what: some 35% of your nodes are going to be influenced by robots. Performing a text analysis of your brand mentions or a conceptual keyword? You guessed it: more than likely, your results are going to be packed full of spam and noise. If you’re not making the requisite methodological adjustments, it will be to the detriment of your analysis and your KPIs – whether they are NLP-based, stats-based, or just some run-of-the-mill sentiment analysis.

Fortunately, Twitter has made recent strides toward curbing the impact of bots and spam, but its efforts do not go nearly far enough. Skeptical? See for yourself. Take any concept (a brand, a person, any word) that you are curious about exploring. Create an account at http://www.shingly.tv, run your query, come back in a day or two, and look at the ratio of unique users to tweets (to say nothing of the percentage of those users that are robots in the first place). In any event, I’ll bet serious money that the unique-user-to-tweet ratio is going to be well below 75%. Social analytics platforms need to start normalizing this kind of data… pronto. Otherwise, we’ll continue to base our KPIs on data that is highly skewed toward the robotic and the ultra-vocal.
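If you want to sanity-check your own query exports, the ratio I’m describing is trivial to compute. A minimal sketch – the toy data and function name are mine, not any tool’s API:

```python
from collections import Counter

def user_tweet_stats(tweets):
    """Compute the unique-user-to-tweet ratio for a list of tweets.
    Each tweet is a (user_handle, text) pair. A low ratio means a few
    ultra-vocal (often automated) accounts dominate the conversation."""
    authors = Counter(user for user, _ in tweets)
    ratio = len(authors) / len(tweets)
    return ratio, authors.most_common(5)

# Toy data standing in for a real query export
sample = [("@alice", "t1"), ("@bob", "t2"), ("@spambot1", "t3"),
          ("@spambot1", "t4"), ("@spambot1", "t5"), ("@carol", "t6")]
ratio, top_posters = user_tweet_stats(sample)
print(f"unique users / tweets = {ratio:.0%}")  # 4 users / 6 tweets = 67%
```

Before trusting any volume-based KPI, I’d at least want this number – and the top-poster list – in the report, so the ultra-vocal accounts can be inspected by hand.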

Extra reading: http://measurematt.tumblr.com/post/17432329170/why-radian6-is-wrong-for-you-part-i

Myth #3: Influencers Should be Targeted

So you’re on a product’s social media team and you’re trying to get a sense of “who’s dominating the conversation” about your brand. So – of course – you do what you’ve always done: you log into Radian6 or Topsy or Brandwatch, you figure out “where the conversation is happening,” and then you engage the “influencers” to the extent that your department’s budget permits. Then – just as your Radian6 client rep recommends – you take the time and money you’ve spent and divide that by the “reach” of the influencers you’ve convinced yourself you’ve influenced. Cost divided by audience = social ROI, right? Yeah… that’s a load of crap.

Admittedly, you can make a case for this outdated method (and KPI calculation) from a basic customer service perspective. Yes, when people with a lot of followers trash your brand or complain about your product, you’d be crazy not to engage them – all the research on brand advocacy indicates as much. Nothing new there. But if you still think you should influence a conversation around mere keyword matches, then you are living in the Stone Age. The wealth of social data and metadata amassed in recent years gives obvious rise to entirely new methods of engagement – according to networks and associated interests. Yet in spite of this data wealth, we are still not utilizing it for communication, engagement, outreach, or product development. This is nuts.

Example: Say you are working with a brand that is targeting white moms on and off social media. If you’re still living in the Stone Age like most professionals in this domain, you probably have an expensive dashboard of topics and sentiments about your brand, and with those limited insights, you are targeting your white-mom demographic according to gender stereotypes that would resonate with Phyllis Schlafly. In all seriousness, you desperately need to start rethinking how you are analyzing this data and begin to consider the extraordinary amount of data that you’re neglecting. Consider that there are 149,707 users on Twitter who self-identify as “Mother of [one/two/Kyle/etc.]” (most of whom are white, by the way; if you’re looking for self-identified African American moms, look into #teammommy). If you’re not tracking what those individuals are talking about, who they’re talking to, whom they are increasingly following, and how their interests are changing over time, then you – like most professionals in this domain – are squandering critical data opportunities. Welcome to the new age of data. (In case you’re curious, white moms are increasingly tweeting about astrology, not about being a mom.)
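For the curious, the kind of bio matching I’m describing takes only a few lines. A rough sketch – the pattern and sample profiles are illustrative, not the actual query behind the 149,707 figure:

```python
import re

# Illustrative pattern for finding self-identified mothers in Twitter
# bios, in the spirit of the "Mother of [one/two/Kyle/etc.]" example.
# This regex and the bios below are assumptions, not a real classifier.
MOM_PATTERN = re.compile(r"\b(?:mother|mom|mommy)\s+(?:of|to)\s+\w+",
                         re.IGNORECASE)

def self_identified_moms(profiles):
    """Return handles whose bio matches the self-identification pattern.
    Each profile is a (handle, bio_text) pair."""
    return [handle for handle, bio in profiles if MOM_PATTERN.search(bio)]

profiles = [
    ("@jane", "Mother of two, coffee addict, astrology nerd"),
    ("@kim",  "Mom to Kyle. #teammommy"),
    ("@dev",  "Backend engineer. Opinions my own."),
]
print(self_identified_moms(profiles))  # ['@jane', '@kim']
```

The point isn’t the regex itself – it’s that once you have this cohort, you can track who they follow, who they talk to, and how their interests shift, instead of staring at keyword-matched brand mentions.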

Myth #4: Sentiment Analysis Works

Having worked closely with a text analytics company for nearly two years, I can comfortably say this: developing sentiment analysis technology is a bit like working in fast food… it looks fine from the outside, but you really don’t want to know how it’s made; it will only disappoint. In other words, consider all sentiment analysis – no matter the “guaranteed accuracy rates” – to be in beta.


It’s a noble ambition that has, by and large, produced pretty visualizations (cough–Crimson Hexagon–cough) and given social media analysts the opportunity to create quick PowerPoint presentations, point to the color red, and ask for a larger budget. To Seth Grimes, Tom Anderson, Topsy, and the rest of the social sentiment club: keep fighting the good fight, and let me know when it actually gets somewhere.

My advice: Overlay network analysis with topic modeling and NLP. If you actually have the time (and, let’s be honest, nobody really does), manually code a random sample of your data and include the resulting margin-of-error (MoE) statistics with your sentiment analysis report. If you don’t, you will be taking a finicky and opaque technology for granted and, by extension, misleading your stakeholders. Sorry, but humans just aren’t replaceable yet.
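For the margin-of-error piece, the math is just the standard normal approximation for a proportion. A quick sketch, assuming you’ve hand-coded a random sample and counted how often the tool agreed with your human coder (the numbers are hypothetical):

```python
import math

def sentiment_moe(agreements, sample_size, z=1.96):
    """Margin of error (95% CI by default) for sentiment accuracy
    estimated from a manually coded random sample: the proportion of
    sampled posts where the tool's label matched the human coder's."""
    p = agreements / sample_size
    moe = z * math.sqrt(p * (1 - p) / sample_size)
    return p, moe

# Hypothetical audit: the tool agreed with a human on 156 of 200 posts
p, moe = sentiment_moe(156, 200)
print(f"accuracy = {p:.0%} +/- {moe:.1%}")
```

Reporting “78% ± 5.7%” instead of a vendor’s bare “guaranteed accuracy” is a small change that keeps stakeholders honest about how soft these numbers really are.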

Wrap-up:

When it comes down to it, the field of social analytics is still very much in its infancy, and industry standards need to evolve. Greater data transparency, more nuanced network analysis, and more substantive trend analysis will be key components of the next generation of social analytics. Ultimately, we need to begin re-educating analytics professionals in the complex realities of this next generation of data analytics. And as platforms adapt to this new way of thinking, they need to make every effort to articulate the ROI of these new methods, which will be the driving force of the industry’s evolution.