MBAs kill IT with efficiency pr0n

If you feel like a good read, pick up a copy of Antifragile by Nassim Nicholas Taleb. Specifically, read Chapter 3, temptingly titled “The Cat and The Washing Machine”.

A summary of the book in one line is: Antifragile things get better under stress. This is distinct from fragile things that break under stress, and robust things that persist.

Some antifragile examples used are:

  • Human bodies, which improve under certain levels of stress (eg. strength training); and
  • The airline system, where each accident improves the system as a whole (MH370 may prove to be an exception).

Chapter 3 argues that antifragile things tend to be organisms like “The Cat” whereas fragile things tend to be like “The Washing Machine”.

What does this have to do with IT Infrastructure? Historically we’ve aimed to build robust infrastructure. We’ve built environments with dual power supplies, backup data centres and RAID disk arrays. We build environments to be robust under stress. They don’t improve.

When IT hardware was expensive and scarce it may not have been justified to buy back-up hardware. You may have bought a single mainframe for one site, leaving your business exposed to a potential extended outage. That is, fragile.

Now in the days of IT abundance, IT infrastructure is vast and geographically dispersed. If one server dies in the cloud, does anyone notice? Shadow IT, iPads, BYOD and opportunistic vendors make IT growth much more organic than it once was.

It isn’t urgent if a power supply or HDD dies anymore. What is tricky is understanding the complexity of key business functions and how they map to IT. Also, software is eating infrastructure. Physical assets can be robust, but never antifragile. Software is much more malleable and can improve.

As an aside I chatted to a lady at a cafe the other day. She was a nurse about to study LEAN. I was horrified. LEAN is all about minimising waste and maximising value, or in my mind, managerial efficiency pr0n. I imagine the LEAN consultants received less-than-lean money from the public health system. LEAN works well for manufacturing, but applying a model born out of Toyota to hospitals is MBA madness.

I have nothing against efficiency. I just don’t think LEAN works outside of manufacturing. In manufacturing you have slow-changing product life cycles, mass production and well understood customer needs.

Applying LEAN thinking to IT assumes IT is like “The Washing Machine” when in fact it is becoming more like “The Cat”. In IT we don’t fully understand customer needs, so we have tight iterations and close proximity to the end-user (ie. Agile). Product life cycles can be incredibly short. Look at mobile apps. Digital software products are not mass-produced; a single version is produced that is copied/distributed one-by-one as needed. A large portion of IT work is not documented. It is done by skilled artisans who don’t see the point of writing down stuff for efficiency experts.

In the naughties IT was consolidated to achieve scale. It was then optimised using methodologies like LEAN and outsourcing to manage cost, but IT never simplified. Complexity, it seems, is always at least preserved.

LEAN, consolidation, outsourcing. We ended up with many gutted-out, off-shore script shops. A disaster waiting to happen when something undocumented or unexpected occurs. Previously salaried employees returned as contractors. IT in some ways became more fragile.

In fact, Taleb argues, and I agree, that excessive size, efficiency/optimisation and complexity generally make things more fragile. Efficiency leads you to put all your eggs in one basket. To lean on single vendors (pun intended) and carry big hidden risks. Times have changed. Efficiency is becoming dangerous.

That’s not to say we should gold-plate IT. I like Agile. Even though it is not always implemented well, Agile recognises IT for “The Cat” it is. It works with small teams, learns and improves (fails fast) quickly.

I suspect a good number of IT professionals will continue to balance efficiency/fragility with robustness when in fact we should start looking at anti-fragility. We should look at more ways for IT to improve itself under stress.

We want failure to make IT better and stronger. Isn’t this the big takeaway from ITIL problem management? Isn’t this what Chaos Monkey does for Netflix? A process that destroys things at random to make the Netflix ecosystem stronger.
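A minimal sketch of the idea – not Netflix’s actual implementation; the instance names and kill hook below are hypothetical:

    import random

    def chaos_monkey(instances, kill):
        """Randomly terminate one instance from a group. If the system
        degrades when an instance dies, a weakness has been found; each
        induced failure makes the ecosystem stronger."""
        if not instances:
            return None
        victim = random.choice(instances)
        kill(victim)  # eg. terminate the VM via your cloud provider's API
        return victim

    # Hypothetical usage: cull one web server at random.
    web_servers = ["web-01", "web-02", "web-03"]
    chaos_monkey(web_servers, kill=lambda name: print("terminating", name))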

What other examples can you think of where IT improves itself under stress? And do MBAs with the latest efficiency fad represent a risk to IT systems?

Is the cloud just… web services?

My work colleague turned to me and said, “Web Services aren’t APIs”. That didn’t sound right, so I started investigating. What is the difference and why is it important?

Applications have been constructed from re-usable and shared components since the bloody epoch. It’s why software is eating everything. Once a standard interface – and usually a corresponding library – is agreed upon, there’s little reason to go back to first principles again. It’s division of labour, driven by pure need, evolving organically.

Then along came ubiquitous, always-available connectivity and expanding bandwidth. Why bother compiling and distributing an API/Library when the same API call could be made to a shared network service? A web service.

So a web service is a type of API. Not all APIs are web services though. An API is the public interface for a piece of code – it is what you can call from within your own code. It may be a web service, but it could also be a Java library or similar.

Wikipedia describes a web service as “… a software function provided at a network address over the web”. That’s a pretty good definition. It must also be loosely coupled, which means that both the service caller and owner agree on the interface but know little about each other. The back end can be substituted as needed.
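To make the distinction concrete, here’s a minimal Python sketch. The library call runs in-process; the web service call goes to a network address (the endpoint below is hypothetical), and the caller knows only the agreed interface:

    import json
    import math
    from urllib.request import urlopen

    # A library API: compiled/distributed code you call in-process.
    print(math.sqrt(2.0))

    # A web service API: the same kind of call, but made to a network
    # address. The caller knows only the agreed interface (URL and JSON
    # shape); the implementation behind it can be swapped at any time.
    # NOTE: this endpoint is hypothetical.
    with urlopen("https://api.example.com/sqrt?x=2") as resp:
        print(json.load(resp)["result"])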

Companies are creating their own web service APIs. Here’s an API for business listings at Australia’s Yellow Pages and here’s one I’m looking at now to do some useful work on my expanding Evernote database. Both of these are web services with some extra API management capabilities. Grab yourself a key and start developing!

Amazon was miles ahead of the curve on this. This famous post by Steve Yegge describes the Jeff Bezos mandate (in 2002!) that all teams at Amazon must:

  1. Expose their data and functionality through service interfaces,
  2. Communicate with each other only through these interfaces (no direct database links etc.), and
  3. Design all interfaces so they can be accessed externally

and a few other things. Talk about ahead of the curve.
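As a minimal sketch (not Amazon’s actual stack), here’s what exposing functionality through a service interface rather than a direct database link might look like, using Python and Flask; the route and fields are made up:

    from flask import Flask, jsonify

    app = Flask(__name__)

    # Rather than letting other teams link directly to our database, we
    # expose the data through a service interface that could, per rule 3,
    # also be opened to the outside world.
    @app.route("/customers/<int:customer_id>")
    def get_customer(customer_id):
        customer = {"id": customer_id, "name": "Example Pty Ltd"}  # stand-in for a real lookup
        return jsonify(customer)

    if __name__ == "__main__":
        app.run(port=8080)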

It’s the future mesh of interconnectivity. One big “programmatical” mash-up.

There are the web services you own and the ones you don’t. With regards to the ones you do own, you’ll have to think about how you maintain some control of the integration. Two possible models are shown below; a rough code sketch of the centralised option follows the figure. Do you have an integration bus/relay/proxy in each cloud so as to minimise network traversals, or do you have a few centralised integration points to make control and standards (and non-functional requirements like logging, access control, templates etc.) easier?

Web service cloud options
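As a rough sketch of that centralised option, a single integration point gives you one place to hang the non-functional requirements. Everything here – the caller list, service names and URLs – is hypothetical:

    import logging
    from urllib.request import urlopen

    logging.basicConfig(level=logging.INFO)

    ALLOWED_CALLERS = {"billing", "crm"}  # hypothetical access-control list
    SERVICES = {"geocode": "https://api.example.com/geocode"}  # hypothetical back ends

    def call_service(caller, service, query):
        """One centralised integration point: a single place for logging,
        access control and standards, at the cost of extra network hops."""
        if caller not in ALLOWED_CALLERS:
            raise PermissionError(caller + " may not call " + service)
        url = SERVICES[service] + "?" + query
        logging.info("caller=%s service=%s url=%s", caller, service, url)
        with urlopen(url) as resp:
            return resp.read()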

For the web services you don’t own but use: how much do you rely on single vendors, how do you manage keys, and how do you manage different standards/skill sets etc.?

I’ve always thought of “The Cloud” as a technology theme that is behind many new implementations, but I’m starting to think “the Cloud” is just a biblical plague of web services. It’s even hidden in the term “as-a-service”! All SaaS, PaaS and IaaS providers have public web service APIs built in from day one.

IaaS could mean twice the work for IT

In my last blog I wrote about Cloud Management platforms, and how they enable integration of multiple clouds (public and private).

One purpose of this is to drive standardisation of infrastructure. This is the usual drive for standards, strategies, life cycling and consolidation that has been with us for years.

Tech-heads get excited about new stuff like node.js, ruby on rails, Ubuntu, skinless servers etc., but in isolation these technologies provide absolutely no benefit to a company. They cost money to buy, build and support. When these component technologies are combined with application logic and data though, they can add immense value. This value must exceed – by a decent margin – the sunk cost of deployment and support.

IT vendors move between mass standardisation/commoditisation and differentiation – sometimes doing both things at the same time. AWS, GCE, and Azure strive to provide a base server at the cheapest cost – ie. commoditisation – but at the same time offer differentiated, non-standard services like Azure Service Bus and Redshift to get customers in (and keep them).

Also, over time enterprises accumulate legacy bits and pieces that are too old, too important or too expensive to replace. There they (dis)gracefully age until something serious happens.

All these drivers work against simplification and standardisation. A good friend I used to work with was asked by the IT Group Manager what he would do if, in his role as IT Infrastructure Manager, he had a blank cheque and infinite time. He said something like: trash the whole lot and start again. From scratch. Clean out the “detritus”.

Head In A Cloud (jb912 via Flickr)

If you’re an established enterprise you’ve probably spent the last couple of years trying to understand The Cloud and what it means. The years before that you probably virtualised the hell out of everything on some hypervisor, and before that you tried to get everything on commodity servers. Along the way you migrated many workloads to the latest and greatest, but not everything. You probably still have non-cloud, non-virtualised, non-commodity gear around. Do you really believe all workloads will end up on IaaS?

(If you’re a start-up, it probably makes sense to start in one cloud. As you grow though, you may end up getting stuck using their proprietary technology. That may be ok, or not. Ask yourself: could Netflix migrate off AWS, and at what cost?)

You have standards for deploying servers in-house (something like Linux or Windows on a hypervisor), you have standard network switch configurations and whatever passes for standards in the storage space.

You don’t want to manage a second (or third or fourth) set of standards for your IaaS provider(s).

Comparing some pretty standard IT infrastructure objects against some AWS objects:

  In-house technology     AWS technology
  VM template             AMI
  Load balancer           ELB
  Firewall policy         Security Groups

At best they are only packaged differently (VMs vs AMIs) and their guts are pretty similar. At worst, they have different capabilities and configurations, and therefore different standards and expertise (load balancers vs ELB).
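For example, deploying the “same” standard server into AWS means driving AWS-shaped objects with AWS tooling. A sketch using boto3, with placeholder IDs:

    import boto3

    # The same "standard server" deployed the AWS way: the in-house VM
    # template becomes an AMI, the firewall policy a security group.
    # All IDs below are placeholders.
    ec2 = boto3.client("ec2", region_name="ap-southeast-2")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",            # AMI instead of VM template
        InstanceType="t2.micro",
        MinCount=1,
        MaxCount=1,
        SecurityGroupIds=["sg-0123456789abcdef0"],  # security group instead of firewall policy
    )
    print(response["Instances"][0]["InstanceId"])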

If you buy the hybrid cloud direction we’re heading in, according to Gartner and your own observations, then…

It’s two – or more – standards trying to be one, and that’s possibly double the work for IT.

Another argument for Cloud Management Platforms such as RightScale, Scalr and Kumolus? Thoughts?

IT will go on forever

The economist William Stanley Jevons made an observation about coal in his 1865 book, “The Coal Question”. It became known as the Jevons paradox, and it was:

      increases in energy production efficiency lead to more, not less, consumption

In Jevons’ time the big worry was that England would run out of coal. It was hoped that more efficient coal use (eg. better steam engines etc.) would lead to lower consumption, and therefore England’s coal reserves would last a lot longer.

“write down my name …” (josef.stuefer via flickr)

Economics is very often counter-intuitive. Jevons argued the opposite would happen. If a doubling of fuel efficiency more than doubled the work demanded, then overall coal use would increase. If improved technology had a lesser effect on demand, coal use would decrease, as perhaps expected.
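A toy calculation, with numbers that are mine rather than Jevons’, makes the mechanism clear:

    # Coal burned = work demanded / efficiency. Illustrative numbers only.
    work, efficiency = 100.0, 1.0
    print(work / efficiency)   # baseline: 100 units of coal

    # Efficiency doubles, and cheaper work MORE than doubles demand:
    work, efficiency = 250.0, 2.0
    print(work / efficiency)   # 125 units -- more coal, not less

    # If demand had grown by less than 2x, coal use would fall as hoped:
    work = 150.0
    print(work / efficiency)   # 75 units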

What does this mean for IT? Let’s consider the raw input costs of IT projects: Infrastructure, Software & People.

Infrastructure

… is of course getting cheaper and cheaper, while at the same time getting more powerful. Moore’s Law still holds and is driving greater processing power at cheaper cost. There are grumblings about the law’s demise, that chips are getting more expensive to develop, but big brains such as Andy Bechtolsheim assert that the law is still alive and well.

Networking speeds double every two years or so (Nielsen’s Law) and the value of any network increases with more connected devices (Metcalfe’s Law). Connecting to decent bandwidth is cheap but highly valuable and necessary. With the rise of mobile networking the same trends are occurring but now the network is available anywhere.

Manufacturing costs have plummeted. One just has to look at the cost of a Raspberry Pi. Better manufacturing automation, low margins and stiff competition are driving continual investment in lower production costs.

Then of course you have cloud computing, which is consolidating data centres into mega data centres and banking huge economies of scale. Did you know that every dollar of revenue to Amazon Web Services results in three or four lost dollars to established vendors? This is showing up in the results of IBM and Oracle.

Software

Cloud computing extends up into the software realm too and this impacts those previously mentioned tier-1 vendors. COTS software is now SaaS.

Agile development is reducing the risk of development projects and this lowers cost further. A minimum viable product can be produced to prototype new platforms at very low cost.

Open Source has been underwriting software cost reductions for almost 20 years.

The age of trolling software rent seekers may be coming to an end. We live in hope.

People

Massive outsourcing and off-shoring have had some impact on stagnating wages in IT, thereby limiting – or perhaps burying – costs. (Phew, got that distasteful line out of the way.) Countering this, there is downward rigidity in wage costs or, as normal non-economists say, wages don’t go down much. The people cost can be one of the most expensive parts of any project.

That said, collectively people have become much more efficient. IT departments are more efficient through consolidation of data centres, standardisation, automation, orchestration, Green IT and commoditisation of particular platforms (eg. VOIP, Email, OS). IT operating budgets shrink or stagnate, but more is done.

Where IT departments are not efficient and have high transaction costs (eg. Deploying a DB costs $100,000 and takes 4 weeks) the department is being circumvented and a cloud solution deployed, even if the overall cost is higher over time.

The outcome

Overall costs in IT continue to drop for all the above reasons. In line with Jevons and England’s coal problem though, IT doesn’t bank the savings and use less computing resources. The opposite happens. IT consumes more. Why is that?

Projects we’d never have done in the past because they were too expensive, or difficult to tie to economic transactions, now become viable (eg. engagement systems such as social marketing). Companies can pursue a wider range of projects. Smaller companies can develop capabilities previously only available to large organisations. And these new capabilities, once pursued and attained in a market, become a cost of doing business. They can also support the creation of further advanced capabilities. More demand for “work” is generated in IT than is saved by the extra efficiency.

Some costs and problems across the enterprise get worse, especially those that have to deal with the complexity of the entire ecosystem. For example: performance management, change transformation, security, SOA, enterprise architecture, orchestration, cloud architecture, networking, data stewardship, power and cooling, and application testing. These costs and problems are systemic though. They don’t stop new projects. In fact, over time they create their own requirements and therefore projects.

Where will this all end? Back to Jevons again, it was expected to end with the exhaustion of coal. What resource will be exhausted first in IT? Electricity? Software skills? Our ability to manage complexity?  Processing power? Physical data centre costs and space? I can’t see any end in sight yet folks… feed the beast!

Software is eating infrastructure

Infrastructure people increasingly deal in software, not hardware. Software is eating the world.

Servers are “guests”. Orchestration is no longer a nice-to-have but a requirement. Cloud management software. Application and log monitoring tools. Even storage vendors spruik their cloud values more than their hardware “creds” these days.

In the past applications would run on one big old server in the corner. Every night someone would change a backup tape. Occasionally someone would walk up to the server and power-cycle it. Over time servers got cheaper and smaller, data centres consolidated and grew, and servers got remote management cards then became virtualised. Infrastructure guys got more and more distant from the hardware. Then the basic operations jobs got outsourced. It’s no wonder we need software for everything. As storage and networks get commoditised – like servers were before them – the consumption of the profession by software will be complete.

The past was “big tin” with leading-edge hardware and unmatched reliability and power inside the box. The future is tiny disposable units of compute, storage and network that move across an ethereal fabric. These units have a life cycle of potentially minutes. The big ol’ dinosaurs will have been replaced by the most elemental of hardware life forms. The management of this ever growing and sprawling environment will be performed by increasing layers of software. There is no other way.
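In practice those layers of software look something like a reconciliation loop: declare how many disposable units you want, and let code – not people – replace the ones that die. A hypothetical sketch:

    import time

    DESIRED = 3  # how many disposable compute units we want alive

    def running_units():
        return ["unit-a", "unit-b"]  # stand-in for a cloud API listing call

    def launch_unit():
        print("launching a replacement unit")  # stand-in for a cloud API call

    # The management layer: software, not people, notices dead units and
    # replaces them. Nobody walks up to a server any more.
    while True:
        shortfall = DESIRED - len(running_units())
        for _ in range(shortfall):
            launch_unit()
        time.sleep(30)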

Tea strainer

An organisation could be thought of as a series of tea strainers with infrastructure as the teacup.

Every issue unresolved or missed at higher levels gets dripped down to the next strainer and there’s always unexpected tea that gets through to the teacup. CPU and memory get thrown at a poorly sized application. Storage specialists fix your database problems. And “the network” fixes everything else.

Even with cheap hardware and ever increasing software layers, tea is still going to keep dripping through all those strainers (especially now that we’re dealing with software!).

The ongoing drama of aligning:

  • A company’s broader culture and proficiency
  • A company’s actual needs
  • The always shifting technology landscape

is timeless.

The personality and skills required are those of people who have lived the “pain” and happily catch all the missed bits in their “teacup”. Infrastructure thinking has been crystallised by late night call-outs and unreasonable demands by those less technically savvy. That’s why we’re quite pessimistic and “failure-focused” compared to everyone else. (I do personally build loads of redundancy into my life!)

Infrastructure will always be there, hidden in these layers of software, process, methods and patterns that keep a company’s core business going. It is just going to get a lot messier.

You see, IT infrastructure isn’t really about server “specs” and stuff. Sure, that’s part of it, but mainly it’s about keeping the business going no matter what the technology looks like.