That has been around a very long time.
We have seen in a previous post[i] how Map Reduce is analogous to how business and product managers plan and implement projects with Gantt Charts allowing a number of tasks to proceed independently in parallel and integrating their products at the end. A question from those outside of computer science might be: why has it taken this long to figure this out? In fact, if computers are so fast and capable, why don’t they figure the quickest, most parallel way of executing multiple tasks all by themselves?
The answer represents a fascinating history of personalities, ambitions, and feuds that make up what we call computer science. First, computer science is not like any other science in that it is also an engineering discipline that often gets ahead of the science, and a business segment that often gets ahead of the engineering. As a result there is no repository of long-term memory so concepts often have to be rediscovered as the necessity of new situations arise or race to catch up with the hype. A rose by any other name would be considered something entirely different!
What has changed now is the general recognition that having thousands if not 10s of thousands of computers applied to a problem (such as search) can have significant advantages over a super-computer though the latter is tightly coupled, efficient, and designed to be … well, super!
The web discovers Sullivan’s Law[ii], which was defined in the mid 70’s! At the same time, a group at MIT under Hewitt[iii] were developing and evangelizing the concepts of CCP derived from the Actor Model of Concurrency. Also work at Carnegie Mellon by Bennett[iv] involved development of ACID (atomic, consistent, independent, durable) applied to distributed operating systems. All this was done at the same time that Dyjkstra[v] and Hoare[vi] were developing CSP. At the time, there was a great deal of exchange among these groups and the citations provided above give a flavor of the exchange at the time. In the early 80s, CSP was selected as relevant to distributed system problems at that time.
I have had the privilege of working with Sullivan, Hewitt and Bennett at different times in my career as an engineer[vii]. It has puzzled me why their work and contributions have not been more prevalent in our work today. Until recently one could argue the best approach was to build efficient algorithms and rely on Moore’s Law to double processing throughput every 2 years from more transistors on faster chips[viii]. Has anyone noticed that we have been holding at around 3 GHz for 5 years? The increased capacity has come from building multiple processors called cores on a chip. But then applications have to be able exploit parallel execution at the individual computer level. Not so difficult when there a few processors, there always a loose thread around looking for a processor, but with hundreds or thousands, one needs to stop and think.
More recently the computational capacity increase has come from distributing processors into everything from automobiles to cell phones, and interconnecting these via the Internet[ix]. Can there be anyway that this global network of processors can be harnessed to do useful work? To take the next leap in processing capacity requires harnessing distributed computing, and Map Reduce is just a gross (coarse grained) application of this concept.
Concurrency in a Distributed World
In Star Trek Next Generation (I am talking with fellow nerds here, right?), we have the Borg – a collection of assimilated races attached via cyborg appliances to a central controller that is actually the composite of all the Borg minds. All can access anything experienced by one. The Borg operate without individual identity and with relentless efficiency to absorb new races. In their organization and behavior the Borg represent one vision of distributed computing[x].
We humans, enemies of the Borg that to the Borg must be absorbed because we are so inefficient, represent another form of distributed computing[xi]. In comparison to the Borg, we seem highly inefficient, requiring a great deal of support and because we are isolated and individual, unnecessary coddling and communication to get anything done. This nicely lays out the differences and contrasts in positions (including hostilities) of the different approaches to concurrent programming.
There are the Borgist, who insists on tightly coupled processing and who build MIMD (multiple instruction multiple data) and SIMD (single instruction multiple data) processors with shared memory and clock synced instruction execution to squeeze every concurrent advantage with mechanistic efficiency. Then there are us humans that attempt to use tools such as computers as extensions of ourselves to be more effective or productive, and despite our individuality and independent behaviors seem to at be able to organize to perform amazing feats, such as sending someone to the moon, sequencing the human genome, building the Internet or manufacturing tooth picks. What is the essential difference?
The difference is that in the Borg model everyone shares absolute knowledge of the global state and in the human model no one at any time has absolute knowledge of the global state. In the later case everyone works with the assumption of sufficient knowledge of relevant state. This is precisely how concurrency is applied to developing distributed applications. The concept of global shared state is eliminated!
This represents one derivation of the concept of no shared state from CCP Theory. An independent derivation of the same concept (I told you that concepts have to be rediscovered) arises from attempting to unlock the shared constraints in data management systems called Share Nothing Architectures in the 90s[xii]. Yet another line of investigation comes from systems theory and economics called bounded rationality[xiii]. The latter makes clear that concurrent programming is clearly more a systems discipline than a programming discipline, which will become more clear as we continue.
Relativity in a Closed Mechanistic World View
The concept of no global shared state is gut wrenching to many in computer science on a number of different levels and for a number of different reasons. There are many consequences that result from this rule. For one, the concept of global time and ordered sequence of events has to be abandoned. Just as the transition from Newtonian to Relativistic physics absolute time is eliminated, the same is true in the transition from CSP to CCP.
Without absolute time, events in the network cannot be sequenced in absolute order everywhere in the network (unless one waits until all data is in, which can be itself an indeterminate time). But in the meantime, decisions and other events are being generated based upon incomplete knowledge of the global state. As a result, the sequence of events that are “observed” by a process is probabilistic and not guaranteed repeatable.
An example is vote counting. TV networks provide exit polling data based upon sampling; followed later with raw vote precinct counts as they are reported; and followed many weeks later with the certified recorded counts. Each count is different, but the critical threshold of who got the most votes many times can be determined from the exit poll sampling and partial precinct reported counts.
Is absolute certainty of the actual count relevant or the fact the threshold has been passed with high certainty the important fact? This is a question that is often asked in designing decision systems. For example, in designing a distributed operating system, is it important that all processes know exactly the storage available or whether a critical threshold has been crossed such that access must be serialized and tightly coordinated.
In the rare cases when results are too close that a recount is necessary such as the Florida Presidential Recount 2000 or Minnesota Senate recount 2008, one becomes exposed to the messy world of real world data with hanging chads and invalid entries. Ambiguity and noise in voting results is difficult to comprehend much less accept when one believes in a mechanistic closed universe where every vote is counted and the process ensures the same count every time. Whenever the ambiguities of human behavior and the verities of infrastructure are introduced, the system is never closed and the processes must take these variations into consideration. This is fundamental to how all science is performed.
In the real world, for 99.999% of the cases, the exit poll and raw count numbers are sufficient to determine the winner and proceed with high confidence from that result. Both in the real world and in concurrent programming one can insist upon an event being executed simultaneously everywhere; actions preformed only when events are properly sequenced to generate repeatable results; or respond to events in the shortest time possible at a given confidence level. Just not all at the same time.
Massive Distributed Computing in an Open Internet
For massive distributed computing to work over a global open system such as the Internet, one has to be very selective on when to insist upon global understanding of time or any other state parameter if one wants to leverage concurrency to increase computational throughput. To insist on Borg-like omniscience is neither practical nor desirable.
The result is a computer system that is programmed much as businesses are organized. Though businesses seem highly inefficient as a computational model, they do excel at making sure that all their employees (processors) are busy, and organizing them to work productively in parallel. Small start-ups of a few workers can get more done and quicker than one individual alone. Large organizations may seem to not have linear scalability but none-the-less can perform gargantuan enterprises not conceivable for an individual.
There is actually something more going on here than mere programming. The small linear programs, chubby objects, are embedded into a system that can respond to the external environment and optimize to higher-level goals. Enterprises incorporate systems and subsystems that interact at different levels of communication and coordination allowing complex processes to be distributed to simpler processes operating in parallel. The ability to adapt and optimize is a feature of self-organization that can occur when concurrent components communicate forming feedback loops. In computer science these “programs” are referred to as connectionist networks that are an out growth from artificial neural networks and require developing concurrent operating environments over a wide spectrum of couplings in both communications and shared state[xiv].
All this is accomplished by not insisting upon everyone including the president having a global view of the companies state, nor serializing access to shared assets. How this would be accomplished in programming a massively distributive system requires a different programming paradigm. One that identifies not only sequences of processing steps, but also the concurrency of processes and the dependencies of these processes to the completion of other tasks, much as how businesses define work break down structures.
Resilience Comes To The ForeFront Over Predictability
The strongest counter argument by the closed systems advocates is why develop programs that can provide only probabilistic or worst unpredictable results. The thing is we have already developed such programs without being aware we have crossed the line into open systems programming.
The current rage is data driven decisions where managers make decisions based upon analytic data and business intelligence concerning their business. The data is assumed to follow all the data integrity and consistency policies of a “global” view of the business. Companies that insist upon the highest quality data following strict accounting or for online SOX compliance place human decision makers into the process that are neither strict nor consistent. In so doing the linear programs are now part of a non-linear if not unpredictable decision process, especially if the data collected and analyzed reflects the impact of decisions made by those managers.
The fact that humans are making the decisions does not make the enterprise more stable. Until one considers the entire combination of human and automated subsystems as one system, one will not be able to access resilience of the enterprise to competition nor it’s potential for growth. From that system perspective one can see if adding automated decision processes will aid or hinder the performance of the business. At that point one has introduced feedback loops and crossed the line into open system unpredictability.
In place of predictability, one wants to know the resilience of the system to respond to unexpected events. Programs designed assuming a closed system with predictable inputs do not have much resilience when connected to an open environment. Would it not be better to build systems with an understanding of its behavior in real environment rather than expect predictable results in a fantasy environment? It is not a trick question! You would be amazed on the number that insist on the later.
Google makes it’s Move
What is this paradigm is TBD[xv]. However at this point Python can be considered as viable a candidate as C++, C#, or Objective C or even Java or Ruby. The critical question will be how finely concurrent units can be identified and how quickly these units can be integrated into systems that perform useful work. At the moment, most are considering computational units on the scale of individual servers[xvi]. However for Google, the computational units are Objects held as chubby chunks in the Big Table[xvii]. So Google has already broken away from the cloud-computing pylon, huddled around conventional notions of computation.
What can be done that will be New?
So what could be done with thousands potentially millions of servers hosting millions potentially billions of chubby worker bees just waiting to be organized to perform your every whim? Google has the servers and the engine to support chubby worker bees and has experience in some of the ways of organizing these worker bees over a number of applications including Google Analytics, Maps, Finance, and 60 other products[xviii]. Is this sufficient to propel a third wave of Web change? Well Google had a thought about that – NOT!
Agha, G., & Hewitt, C. (1985). Concurrent programming using actors: Exploiting large-scale parallelism . From SpringerLink: http://www.springerlink.com/content/y1wl7q3342006720/
Banerjee, U. (2009, 26-November). Cloud Computing Service: Amazon EC2 vs Google GAE. From Cloud Computing Journal: http://cloudcomputing.sys-con.com/node/1200706
Bhatia, A. (2007, 10-December). Shared Nothing Architecture. From Toolbox.com: http://it.toolbox.com/wiki/index.php/Shared_Nothing_Architecture
Boyer, R., Chen, D., Gray, A., & Lee, T. (1999, 30-November). Operating Systems for the WWW. From Google Docs: http://www.bennetyee.org/ucsd-pages/Courses/cse221.f99/OSSurveyF99/papers/boyer%2Cchen%2Cgray%2Clee.operating_systems_for_the_www.pdf
Business Dictionary. (n.d.). Bounded Rationality. From Business Dictionary: http://www.businessdictionary.com/definition/bounded-rationality.html
Closson, K. (2007, 9-August). Nearly Free or Not, GridSQL for EnterpriseDB is Simply Better Than Real Application Clusters. It is Shared-Nothing Architecture After All! . From Kevin Closson’s Oracle Blog: http://kevinclosson.wordpress.com/2007/08/09/nearly-free-or-not-gridsql-for-enterprisedb-is-simply-better-than-real-application-clusters-it-is-shared-nothing-architecture-after-all/
Dijkstra, E. W. (1975). Guarded commands, nondeterminacy and formal derivation of programs. From ACM: http://doi.acm.org/10.1145/360933.360975
Fay Chang, J. D. (2006, November). Bigtable: A Distributed Storage System for Structured Data. From Google Research Publications: http://labs.google.com/papers/bigtable.html
Hewitt, C., & Baker, H. (1977, 10-May). Laws for Communicating Parallel Processes. From MIT: http://mit.dspace.org/bitstream/handle/1721.1/41962/AI_WP_134A.pdf?sequence=1
Hoare, C. (1978, August). Communicating sequential processes. From ACM: http://portal.acm.org/citation.cfm?doid=359576.359585
Knorr, E., & Gruman, G. (2009, 5-May). What cloud computing really means. From Info World, Cloud Computing: http://www.infoworld.com/d/cloud-computing/what-cloud-computing-really-means-031
Kraft, T. (2010, 02-July). Clouds of Hadoop: How Map Reduce Changed the World . From Mind Before You Mine: https://mindbeforeyoumine.com/2010/07/02/clouds-of-hadoop/
Kraft, T. (2010, 25-June). Send In the Clouds: Old Metaphor Gets New Life. From Mind Before You Mine: https://mindbeforeyoumine.com/2010/06/25/send-in-the-clouds/
Kraft, T. T. (1989). Summary Proposal for the Development of an Architecture Neutral Distribution Format based upon Communicating Concurrent Processes. Neurocomputing , 1(2), 23-28.
Meadows, D. H. (2008). Thinking in Systems – A Primer – . (D. Wright, Ed.) River Junction, Vermont: Chelsea Green Publishing.
Moore, G. (2010). Moore’s Law: Made real by Intel innovation. From Intel: http://www.intel.com/technology/mooreslaw/
Paolo Fogliata, M. T. (2008, 18-August). Intelligence-ready network infrastructure: An ecosystem to control third-party intelligence distribution close to nomadic users. From Wiley Online Library: http://onlinelibrary.wiley.com/doi/10.1002/bltj.20306/abstract
Patton, P. C., & Hersh, C. I. (2003, April). Asynchronous Non-Preemptive Tasking for Parallel and Distributed Systems. From A Quantitative Methods/Computer Science White Paper: http://18.104.22.168/WhitePapers/QMCS/ANT.html
Ricadela, A. (2007, 16-November). Computing Heads for the Clouds. From Bloomberg Businessweek: http://www.businessweek.com/technology/content/nov2007/tc20071116_379585.htm
Sullivan, H., & Bashkow, T. R. (1977). A large scale, homogeneous, fully distributed parallel machine, I. From ACM: http://doi.acm.org/10.1145/800255.810659
Swoyer, S. (2008, 08-July). Is Cloud Computing the Next Disruptive Technology? From ESJ: http://esj.com/articles/2008/07/08/is-cloud-computing-the-next-disruptive-technology.aspx
[iv] John K. Bennett is citied in many papers for his work in distributed operating systems and is currently a professor at University of Colorado. An example paper summarizing various operating systems for the web includes a discussion of Prof. Bennett’s contribution. (Boyer, Chen, Gray, & Lee, 1999)
[vii] Bennett was my first introduction to distributed computing when I worked on a joint DARPA project on distributed operating systems for the Navy. I had been following Hewitt’s work ever since I learned of Actor’s from an old Byte magazine article in 1977. At the time Actors and Objects ala Smalltalk were practically inseparable concepts because of the emphasis on communications (which has been carried over to Objective C). In the late 80’s I began seriously looking at Actor concurrency in developing Neural Network or Connectionist architectures and met Professor Hewitt at a neural network conference. Apparently my advocating Actor’s peeked his interest in neural networks. Later after this experience I met with Dr. Sullivan when I was brought in to consult with one of his companies. At the time I was very much an efficiency wonk and at the same time trying to relate Actors to Ants. For a couple days, Dr. Sullivan took me out to the “wood shed” for an attitude adjustment, which resulted in extensive notes on concurrency and the theory of computation. All the time, he was chain smoking through the entire set of lectures. After being rightly adjusted, I was able to combine what I learned from both Hewitt and Sullivan to develop a fully N-scalable app server based upon actor concurrency for processing web analytics data in one of the first Web Analytics as a Service offerings.
[viii] (Moore, 2010) This is a great reference to Moore’s Law linking to important papers and history. As a cofounder of Intel, it is less a law and more Intel’s mission statement for the last 50 years.
[x] Borg distributed computing model – tightly coupled synchronous processors with shared memory with parallel execution at the instruction level either Same Instruction Multiple Data (SIMD) or Multiple Instructions Multiple Data (MIMD).
[xi] Human distributed computing model – loosely coupled asynchronous distributed.
[xiv] (Kraft T. T., 1989)
[xv] With LOAF with JAM I cover how adding actor language extensions to Java and other languages can meet all the requirements of a massive distributed computing paradigm. See also (Agha & Hewitt, 1985)