I Can See Your Eyes Rolling
From my blog analytics, I can actually see the eyes rolling as my audience quickly skirts my Cloud Computing posts to go to the old stuff (back when I was funny). In some ways I understand the ambivalence. I, perhaps like you, am not overawed by the virtualization and utility computing that define cloud computing today. I would call this cloud hosting more than cloud computing. My urge is to go beyond it and grasp the computing part of the cloud: to understand what new capabilities it may facilitate and what computations it may perform. So why should you care?
I know. Why be concerned with the future when we barely understand the present (from an analytics perspective)? What happens to our data as more customers go mobile? How are we supposed to deal with social media and mashup tools such as Kickin or Cooliris or Yoono, which was the topic of my first blog post “How Web Analytics Sees Reality”? How do we combine brick and mortar marketing with on-line marketing? Is it all new or the old repackaged with a new name? Is Viral Marketing just another name for publicity stunts?
My natural instinct, when I seem to be bobbing up and down like a cork on an ocean in a hurricane of change, is to attempt to see the larger picture in play. What is turbulence (noise) and what is a meaningful trend? I found out recently (from reading Donella Meadows’ “Thinking in Systems: A Primer”) that this is an aspect of thinking in systems: looking at events as part of larger system processes with longer timelines. As we move from reactive analytics (what happened and why?) to predictive analytics (what will happen and how much?), this is an instinct and discipline that should be nurtured by all in analytics.
But returning to Cloud Computing and why we as analysts should care and have some understanding, I will give three reasons. As you may be aware, I am not big on posts that give lists of simple answers that can be quickly absorbed and just as easily forgotten. In this case I will make an exception and try not to spin the topic into simplistic babble.
1. IT is Part of Our Job Description
Someday, if not already, you will find yourself across the table from someone in IT (at Yahoo! the particularly stubborn ones were aptly called Paranoids) whose job description is to maintain the status quo while your objective will be to change the quo. He will have no reason or motivation to understand what you are trying to do, so it will be up to you to be the bridge from your problem into what IT is concerned with.
You will not necessarily have to write the code or even design the solution. But you will need to cross the bridge and present options that IT would be interested in understanding and following up with action. Cloud Computing, with its concern for scalability and availability along with elasticity and utility, presents a lot of options that must be understood by all parties. This includes us data wranglers, since data movement and processing are significant aspects affecting these options. Which leads us to reason two:
2. Analytics is typically the First User of Cloud Computing
Some might argue that pornography was the first user, but all the clicks to all those sites had to be counted by a much larger entity. Web analytics has always had to scale larger than its largest customers. eBay is only a portion of Omniture’s traffic. Omniture doesn’t have to serve all the content, but it does need to record all the transactions.
Having started in this business at the beginning of the switch to Web Analytics as a Service, I have had firsthand experience in meeting collection and processing demand. This meant implementing map reduce before the term was even invented. In my case, we developed appservers based upon the actor model of concurrency: Map Reduce on steroids. It was the same struggle for all the other analytic companies. If you want to go really deep, I have published the documentation for the system I developed called LOAF with JAM.
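For readers who have not met the pattern, here is a minimal sketch of map reduce applied to counting clicks. It is only an illustration in Python and not the LOAF with JAM design: the record layout, the example site names and the function names (map_clicks, shuffle, reduce_counts, count_clicks) are all assumptions made up for this example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical click records as (site, page) pairs, split across two
# collectors ("shards"). Real collection data would be far larger.
shards = [
    [("shop.example", "/home"), ("shop.example", "/cart"), ("news.example", "/home")],
    [("news.example", "/home"), ("shop.example", "/home")],
]

def map_clicks(shard):
    """Map step: emit a (key, 1) pair for every click in one shard."""
    return [((site, page), 1) for site, page in shard]

def shuffle(pairs):
    """Group the emitted values by key, as a shuffle phase would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}

def count_clicks(shards):
    """Run map over every shard, then shuffle and reduce the results."""
    mapped = chain.from_iterable(map_clicks(shard) for shard in shards)
    return reduce_counts(shuffle(mapped))

totals = count_clicks(shards)
```

Each shard can be mapped on its own, which is what lets the counting spread across many machines; only the shuffle and reduce steps need to bring results for the same key together.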
Even for the large companies I worked for (Yahoo, Overture, eBay), processing and warehousing data is a central aspect of the business model. This is one of the reasons that Yahoo! has gone “big” on Hadoop server farms, having lashed together some of the largest outside of Google.
Even Google Analytics takes a small niche analytics company and hosts it in Google’s cloud to become the commodity provider of web analytics to most of the world. This has not abated! I recently met with a startup that has relatively few customers (a couple hundred) and yet has collected several petabytes of data that must be mined and reported. All of this is possible through the evolving discipline of distributed (i.e. cloud) computing.
Now most of us can have a good career simply trying to get IT to put Web Analytic Provider tags on our sites. At some point we may be wildly successful and actually get the business to look at the data and base decisions upon the insights derived from it. At that point you will be cubbyholed with IT to get new capabilities online.
At some further point, the business is going to outgrow the standard offerings and attempt to develop in-house analytic capabilities and tools. This most likely will also become the business’s introduction to cloud computing platforms such as Hadoop, data warehouses such as Teradata or Interwoven, or Customer Experience Management tools such as Tealeaf. As the web analytics person you will be in the middle of this, maybe even the cause of it.
3. Cloud Computing is the Reality We Are Attempting to Measure
In McLuhan-esque terms, Cloud Computing is the Medium and, as we all know, the Medium is the Message. Who knew? You might have thought that Google Search was the message, or Facebook or Twitter or Foursquare or Whrrl or maybe Farmville. These are but ephemeral manifestations enabled by the underlying medium. Ultimately it is the medium that changes us and not the specific babbling that crosses the medium.
The message is not Justin Bieber but the fact that an entity that happens to be Justin Bieber can consume 3% of all Twitter’s infrastructure and that with all the freedoms that we can exercise, we as a mass can coalesce around an entity such as Justin Bieber.
Justin Bieber is the Britney Spears of Twitter, who was the Madonna of Search, who was the Cher of MTV, who was a hippie.
The cloud has manifested itself in Web 2.0 first as search and then as social networking and micro-blogging. It is now mobile and in the future will become something else. It will be advances in computing, doing more and doing it faster, that drive the medium to other manifestations in Web 3.0.
Is Cloud Hosting sufficient to change our reality by enabling all the X as a Service providers to finally have a viable business model, or does the Cloud have to actually do work? It appears that an entire economy is growing around apps empowered by the cloud. What is the potential of these AppStores harnessing cloud computing to shape our world? These are questions I am trying to understand.
The reason why I believe it is important to understand is that our first duty in analytics is to understand the reality we are trying to measure and then determine how we track and measure that reality. Advancements in cloud computing will be the enabler of all the realities we will be tracking in the future. Perhaps this is a more fundamental perspective than most would want to take, but I hope that my contributions will reduce the effort for my readers.
What is Coming!
It has been an interesting journey, which started in midsummer. I thought I saw a narrative arc for researching and organizing the material for understanding cloud computing, but then events intervened and converted it into more of a corkscrew.
It started with Google’s dropping Wave in August, which to me was the most ambiguous mash-up to date. It illustrated what could be accomplished with Google’s massive computing power. Why it failed is a proverb of our times, cautioning us on our punditry about the future. More recent are Larry Ellison’s brouhaha with HP over Mark Hurd and Oracle’s announcement of their “Cloud in a Box”. I don’t want to go all Glenn Beck, but all these events are related.
The result is a white paper with over 70 references and 40 citations. It covers the conventional view of cloud computing [here] and what Google is doing that goes beyond conventional approaches [here]. I will admit that I do have an agenda or primary thesis: that cloud computing is about both virtual hosting AND organizing these virtual computation units into massive distributed computing networks. In other words, cloud computing is about computing, not merely hosting.
So there is a post that covers Massive Distributed Computing and the features that will radically change how we approach computing in the future [here]. Finally we take this newfound perspective for a test drive, looking at the surprising failure of Google Wave and Oracle’s assertion of a Cloud in a Box.
If anyone would like a copy of the complete white paper in Word format, please email me and I will provide a link with no further hassle or obligations. Otherwise, I hope you enjoy the series and find some value in the references and narrative.