The proliferation of layers in the Cloud and what to do about it

There’s an old adage in Computer Science that goes: “There’s no problem that cannot be solved by adding another layer of indirection.” One usually goes on to add: “Except, of course, for the problem of too many layers of indirection.” Truth is, layers are great. They have been intimately involved in almost all the great breakthroughs in IT productivity. Fundamentally, they allow us to abstract away unneeded complexity and get on with the work we are supposed to be doing without having to deal with issues that aren’t relevant to us. Assembly language is cumbersome, so we add one or more layers of programming language, which makes our code (almost) human-readable and makes us many times more productive as programmers. The compiler takes care of the translation, and 99% of the time that causes no major issues. Coding for multiple platforms is hard and time-consuming, but with Java or - these days more likely - a nice shiny web application, we can rely on a virtual machine or a browser to ensure that it runs across client environments, operating systems, and hardware platforms. As any web developer will tell you, the process of making something cross-browser compatible is far from trivial and often aggravating, but imagine having to write separate code bases for every target platform instead. A lot less pleasant, isn’t it?

From one perspective, Cloud Computing in toto is a massive exercise in indirection. IaaS abstracts away the details of hardware and network maintenance and allows us to provision compute and storage resources through a simple interface. PaaS adds another layer that abstracts away the OS and system services, so that you can focus only on the bits that are relevant to your development. SaaS, finally, abstracts away most of the complexities of application management and maintenance and allows you to just get on with using the service.

However, the act of abstracting away complexity by its very nature makes the total system architecture more opaque. And with the distributed nature of Cloud Computing systems, this can become a threat to effectively managing your ICT estate. In any given Cloud system there may be numerous vendors involved, each running platforms with distinct underlying architectures. While some scenarios are relatively simple, for instance a PHP web app running on top of Ubuntu Server on Amazon EC2, others can get quite hairy. Imagine a provider of mobile payment services running a SaaS application on top of a custom mobile provider framework from a second vendor, running on Engine Yard, running on EC2. That is effectively SaaS on PaaS on PaaS on IaaS, and it can easily get more complicated if you add in additional integrations and third-party web services. At the very least, you would be dealing with four different vendors using four separate system architectures, probably hosted in multiple physical locations. How, as a customer, do I really understand the implications of using this service in my Enterprise Architecture? Some might argue that I don’t have to: just sign up and let it be the vendor’s problem. But for any organisation with serious compliance requirements and proactive management of Service Levels, that answer won’t fly further than Superman after a kryptonite sandwich.

This creates two sets of challenges for the ICT organisation: one technical and one commercial. On a technical level, I need to understand how the layers of my proposed stack fit together. What are the potential points of failure, what are the security risks and threat vectors relevant to each layer, and how is the responsibility for my service provision actually distributed across providers? On the commercial level, I need to understand the vendor risk associated with each part of the puzzle, what the real chances are of the vendor meeting its service level commitments given the distributed nature of the system, and who I can point the finger at for different types of failure. For compliance, I will need risk assessments of each vendor, and I will need to know how I will monitor that the vendors continue to live up to the commitments they have made. Standards like ISO 27001 help a lot, but are not catch-alls.
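To make the service level question concrete, here is a minimal sketch of how commitments compound across a layered stack. It assumes that every layer must be up for the service to work and that failures in different layers are independent, which is a simplification. The layer names and availability figures are hypothetical and are not taken from any real vendor's SLA.

```python
from math import prod

HOURS_PER_YEAR = 8760


def composite_availability(layer_availabilities):
    """Availability of a serial stack, assuming independent layer failures."""
    return prod(layer_availabilities)


def downtime_hours_per_year(availability):
    """Expected yearly downtime implied by an availability fraction."""
    return (1 - availability) * HOURS_PER_YEAR


# Hypothetical commitments for a SaaS on PaaS on PaaS on IaaS stack.
stack = {
    "payment_saas": 0.999,
    "mobile_framework": 0.999,
    "paas": 0.9995,
    "iaas": 0.9995,
}

combined = composite_availability(stack.values())
print(f"combined availability: {combined:.4%}")
print(f"implied downtime: {downtime_hours_per_year(combined):.1f} hours/year")
```

With these made-up figures the combined availability comes out at roughly 99.7%, or about 26 hours of potential downtime a year, which is worse than any single commitment in the stack. Real stacks share failure modes across layers, so treat the result as a rough indicator, not a prediction.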

This may seem a daunting endeavour, but fortunately doing these types of assessment needn’t be overly strenuous. The Cloud is consolidating on the lower layers, and you only have to assess Amazon once. It is an achievable task to put together a standard set of questions and clarifications you’ll need from providers. If you’re a bit clever about it, you’ll feed that back to your procurement process and make sure that this information is taken into account when assessing vendor risk and performing technology selection.
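The "assess once, reuse everywhere" idea can be sketched in a few lines. This is only an illustration: the question list, the class, and the placeholder vendor names (PaymentVendor, FrameworkVendor) are all hypothetical, and a real questionnaire would come from your own compliance and procurement teams.

```python
from dataclasses import dataclass, field

# Hypothetical standard question set, shared by every vendor assessment.
STANDARD_QUESTIONS = (
    "Which underlying providers does the service depend on?",
    "Where is customer data physically hosted?",
    "Which certifications (for example ISO 27001) are held?",
    "What service levels are committed, and what are the remedies?",
)


@dataclass
class VendorAssessment:
    vendor: str
    answers: dict = field(default_factory=dict)

    def is_complete(self):
        """True once every standard question has an answer on file."""
        return all(q in self.answers for q in STANDARD_QUESTIONS)


def vendors_needing_assessment(stacks, assessments):
    """Vendors that appear in any stack but lack a complete assessment."""
    used = {vendor for stack in stacks for vendor in stack}
    done = {a.vendor for a in assessments if a.is_complete()}
    return sorted(used - done)


# Two stacks loosely modelled on the scenarios above; both sit on Amazon.
stacks = [
    ["PaymentVendor", "FrameworkVendor", "Engine Yard", "Amazon"],
    ["Canonical", "Amazon"],
]
amazon = VendorAssessment("Amazon", {q: "on file" for q in STANDARD_QUESTIONS})
outstanding = vendors_needing_assessment(stacks, [amazon])
print(outstanding)
```

Because the Amazon assessment is already complete, it drops out of the outstanding list for both stacks, leaving only the vendors that are specific to each one.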

Cloud Computing, for all its great strengths, is not a silver bullet. You still need to own and take responsibility for the architecture that you deploy. As the legendary Fred Brooks told us so many years ago, there is an inherent minimum level of complexity to computer systems. Attempts to reduce complexity below that minimum are bound to fail. At the end of the day, there is no way to simultaneously abdicate responsibility for your Enterprise Architecture and retain a reasonable conviction that it will remain effective. There never was.