I will make this simple. There is only one question you need to ask yourself or your IT department to determine if what you have is really an Infrastructure-as-a-Service cloud.
Can I get a VM in 5-10 minutes?
Perhaps a little bit more detailed?
Can a properly credentialed user, with a legitimate need for cloud resources, log into your cloud portal or use your cloud API, request a set of cloud resources (compute, network, storage), and have them provisioned for them automatically in a matter of a few minutes (typically less than 10 and often less than 5)?
If you can answer yes, congratulations – it’s very likely a cloud. If you cannot answer yes, it is NOT cloud IaaS. There is no wriggle room here.
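For anyone who wants to turn the test into a check they can run against their own numbers, here is a minimal sketch. The class name, field names and function name are made up for illustration; the 10-minute ceiling and the self-service and no-manual-approval conditions are the ones described in this post.

```python
from dataclasses import dataclass

# Ceiling from the post: "typically less than 10 and often less than 5" minutes.
MAX_PROVISION_MINUTES = 10


@dataclass
class ProvisioningRun:
    """One observed request, from portal/API submission to usable resources.

    All field names here are hypothetical, chosen for this sketch only.
    """
    minutes_to_ready: float   # wall-clock time until the resources were usable
    manual_approvals: int     # human sign-offs needed along the way
    self_service: bool        # requested through a portal or API by the user


def passes_five_minute_vm_test(run: ProvisioningRun) -> bool:
    """Return True only if the run was self-service, fully automated and fast."""
    return (
        run.self_service
        and run.manual_approvals == 0
        and run.minutes_to_ready <= MAX_PROVISION_MINUTES
    )


fast_run = ProvisioningRun(minutes_to_ready=4, manual_approvals=0, self_service=True)
three_week_run = ProvisioningRun(
    minutes_to_ready=3 * 7 * 24 * 60, manual_approvals=2, self_service=True
)
```

With these two sample runs, `passes_five_minute_vm_test(fast_run)` is true and the three-week cycle fails, no matter how good the metering or resource pooling behind it may be.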
Cloud is an operating model supported by technology. And that operating model has as its core defining characteristic the ability to request and receive resources in real-time, on-demand. All of the other NIST characteristics are great, but no amount of metering (measured service), resource pooling, elasticity, or broad network access (aka Internet) can overcome a 3-week (or worse) provisioning cycle for a set of VMs.
Tie this to your business drivers for cloud.
Agility? Only if you get your VMs when you need them. Like NOW!
Cost? If you have lots of manual approvals and provisioning, you have not taken the cost of labor out. 5 Minute VMs require 100% end-to-end automation with no manual approvals.
Quality? Back to manual processes – these are error prone because humans suck at repetitive tasks as compared to machines.
Does that thing you call a cloud give you a 5 Minute VM? If not, stop calling it a cloud and get serious about building the IT Factory of the Future.
“You keep using that word [cloud]. I do not think it means what you think it means.”
– The Princess Cloud
(c) 2012 CloudBzz / TechBzz Media, LLC. All rights reserved. This post originally appeared at http://www.cloudbzz.com/. You can follow CloudBzz on Twitter @CloudBzz.
You’d think as we head into the waning months of 2011 that there’d be little left to discuss regarding the definition of cloud IT. Well, not quite yet.
Having spent a lot of time with clients working on their cloud strategies and planning, I’ve come to learn that the definition of cloud IT is fundamentally different depending on your perspective. Note that I am using “cloud IT” and not “cloud computing” to make it clear I’m talking only about IT services and not consumer Internet services.
Users of cloud IT – those requesting and getting access to cloud resources – define clouds by the benefits they derive. All those NIST-y terms like resource pooling, rapid elasticity, measured service, etc. can sound like gibberish to users. Self-service is just a feature – but users need to understand the benefits. For a user – cloud IT is about control, flexibility, improved productivity, (potentially) lower costs, and greater transparency. There are other benefits, perhaps – but these are commonly what I hear.
For providers – whether internal IT groups or commercial service providers – cloud IT means something entirely different. First and foremost, it’s about providing services that align with the benefits valued by users described above. Beyond that, cloud IT is about achieving the benefits of mass production and automation, a “factory IT” model that fundamentally and forever changes the way we deliver IT services. In fact, factory IT (McKinsey blog) is a far better term to describe what we call cloud today when you’re talking to service providers.
Factory IT standardizes on a reasonable number of standard configurations (service catalog), automates repetitive processes (DevOps), then manages and monitors ongoing operations more tightly (management). Unlike typical IT, with its heavily manual processes and hand-crafted custom output, factory IT generates economies of scale that produce more services in a given time period, at a far lower marginal cost per unit of output.
Delivering these economies end-to-end is where self-service comes in. Like a vending machine, you put your money (or budget) in, make a selection, and out pops your IT service. Without factory IT, self service – and the control, transparency, productivity and other benefits end users value – would not be possible.
Next time someone asks you to define cloud, make sure you understand which side of the cloud they are standing on before you answer.
(c) 2011 CloudBzz / TechBzz Media, LLC. All rights reserved. This post originally appeared at http://www.cloudbzz.com/. You can follow CloudBzz on Twitter @CloudBzz.
How the Meek Shall Inherit The Data Center, Change The Way We Build and Deploy Applications, And Kill the Public Cloud Virtualization Market
The tiny ant. Capable of lifting up to 50 times its body weight, an ant is an amazing workhorse with by far the highest “power to weight” ratio of any living creature. Ants are also among the most populous creatures on the planet. They do the most work as well – a bit at a time, ants can move mountains.
Atom chips (and ARM chips too) are the new ants of the data center. They are what power our smartphones, tablets and ever more consumer electronics devices. They are now very fast, but surprisingly thrifty with energy – giving them the highest ratio of computing power to energy consumed of any microprocessor.
I predict that significantly more than half of new data center compute capacity deployed in 2016 and beyond will be based on Atoms, ARMs and other ultra-low-power processors. These mighty mites will change much about how application architectures will evolve too. Lastly, I seriously believe that the small, low-power server model will eliminate the use of virtualization in a majority of public cloud capacity by 2018. The impact in the enterprise will be initially less significant, and will take longer to play out, but in the end it will be the same result.
So, let’s take a look at this in more detail to see if you agree.
This week I had the great pleasure to spend an hour with Andrew Feldman, CEO and founder of SeaMicro, Inc., one of the emerging leaders in the nascent low-power server market. SeaMicro has had quite a great run of publicity lately, appearing twice in the Wall Street Journal related to their recent launch of their second-generation product – the SM10000-64 based on a new dual-core 1.66 GHz 64-bit Atom chip created by Intel specifically for SeaMicro.
Note – the rest of this article is based on SeaMicro and their Atom-based servers. Calxeda is another company in this space, but uses ARM chips instead.
These little beasties, taking up a mere 10 rack units of space (out of 42 in a typical rack), pack an astonishing 256 individual servers (512 cores), 64 SATA or SSD drives, up to 160 Gbps of external network connectivity (16 x 10GigE), and 1.024 TB of DRAM. Further, SeaMicro uses ¼ of the power and ¼ of the space, and costs a fraction of what a similar amount of capacity costs in a traditional 1U configuration. Internally, the 256 servers are connected by a 1.28 Tbps “3D torus” fabric modeled on the IBM Blue Gene/L supercomputer.
A single rack of these units would boast 1,024 individual servers (1 CPU per server), 2,048 cores (total of 3,400 GHz of compute), 4.1TB of DRAM, and 256TB of storage using 1TB SATA drives, and communicate at 1.28Tbps at a cost of around half a million dollars (< $500 per server).
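The rack figures follow directly from the per-unit specs. Here is a quick sketch of the arithmetic in Python; the constant names are mine, the inputs are the numbers quoted in this article, and the $500,000 rack price is the rounded "around half a million dollars" figure rather than an actual quote.

```python
# Per-unit figures quoted in this article for the SM10000-64.
RACK_UNITS = 42            # rack units in a typical rack
UNITS_PER_CHASSIS = 10     # rack units per SeaMicro chassis
SERVERS_PER_CHASSIS = 256
CORES_PER_SERVER = 2       # dual-core Atom
CLOCK_GHZ = 1.66
DRAM_GB_PER_SERVER = 4
DRIVES_PER_CHASSIS = 64
DRIVE_TB = 1               # using 1TB SATA drives
RACK_COST_USD = 500_000    # assumption: "around half a million dollars"

chassis_per_rack = RACK_UNITS // UNITS_PER_CHASSIS       # whole chassis that fit
servers = chassis_per_rack * SERVERS_PER_CHASSIS
cores = servers * CORES_PER_SERVER
total_ghz = cores * CLOCK_GHZ
dram_tb = servers * DRAM_GB_PER_SERVER / 1000
storage_tb = chassis_per_rack * DRIVES_PER_CHASSIS * DRIVE_TB
cost_per_server = RACK_COST_USD / servers

print(f"{chassis_per_rack} chassis, {servers} servers, {cores} cores")
print(f"{total_ghz:.0f} GHz, {dram_tb:.1f} TB DRAM, {storage_tb} TB storage")
print(f"${cost_per_server:.0f} per server")
```

Four chassis fit in 42U with two rack units to spare, and the per-server cost works out to a little under $490.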
$500/server – really? Yup.
Now, let’s briefly consider the power issue. SeaMicro saves power through a couple of key innovations. First, they’re using these low power chips. But CPU power is typically only 1/3 of the load in a traditional server. To get real savings, they had to build custom ASICs and FPGAs to get 90% of the components off of a typical motherboard (which is now the size of a credit card, with 4 of them on each “blade”). Aside from capacitors, each motherboard has only three types of components – the Atom CPU, DRAM, and the SeaMicro ASIC. The result is 75% less power per server. Google has stated that, even at their scale, the cost of electricity to run servers exceeds the cost to buy them. Power and space consume >75% of data center operating expense. If you save 75% of the cost of electricity and space, these servers pay for themselves – quickly.
If someone just gave you 256 1U traditional servers to run – for free – it would be far more expensive than purchasing and operating the SeaMicro servers.
Think about it.
Why would anybody buy traditional Xeon-based servers for web farms ever again? As the saying goes, you’d have to pay me to take a standard server now.
This is why I predict that, subject to supply chain capacity, more than 50% of new data center servers will be based on this model in the next 4-5 years.
Atoms and Applications
So let’s dig a bit deeper into the specifics of these 256 servers and how they might impact application architectures. Each has a dual-core 1.66GHz 64-bit Intel Atom N570 processor with 4GB of DRAM. These are just about ideal Web servers and, according to Intel, the highest performance per watt of any Internet workload processor they’ve ever built.
They’re really ideal “everyday” servers that can run a huge range of computing tasks. You wouldn’t run HPC workloads on these devices – such as CAD/CAM, simulations, etc. – or a scale-up database like Oracle RAC. My experience is that 4GB is actually a fairly typical VM size in an enterprise environment, so it seems like a pretty good all-purpose machine that can run the vast majority of traditional workloads.
They’d even be ideal as VDI (virtual desktop infrastructure) servers, where literally every running Windows desktop would get its own dedicated server. Cool!
Forrester’s James Staten, in a keynote address at CloudConnect 2011, recommended that people write applications that use many small instances when needed vs. fewer larger instances, and aggressively scale down (e.g. turn off) their instances when demand drops. That’s the best way to optimize economics in metered on-demand cloud business models.
So, with a little thought there’s really no need for most applications to require instances that are larger than 4GB of RAM and 1.66GHz of compute. You just need to build for that.
And databases are going this way too. New and future “scale out” database technologies such as ScaleBase, Akiban, Xeround, dbShards, TransLattice, and (at some future point) NimbusDB can actually run quite well in a SeaMicro configuration, just creating more instances as needed to meet workload demand. The SeaMicro model will accelerate demand for scale-out database technologies in all settings – including the enterprise.
In fact, some enterprises are already buying SeaMicro units for use with Hadoop MapReduce environments. Your own massively scalable distributed analytics farm can be a very compelling first use case.
This model heavily favors Linux due to the far smaller OS memory footprint as compared with Windows Server. Microsoft will have to put Windows Server on a diet to support this model of data center or risk a really bad TCO equation. SeaMicro is adding Windows certification soon, but I’m not sure how popular that will be.
If I’m right, then it would seem that application architectures will indeed be impacted by this – though in the scheme of things it’s probably pretty minor and in line with current trends in cloud.
Virtualization? No Thank You… I’ll Take My Public Cloud Single Tenant, Please!
SeaMicro claims that they can support running virtualization hosts on their servers, but for the life of me I don’t know why you’d want to in most cases.
What do you normally use virtualization for? Typically it’s to take big honking servers and chunk them up into smaller “virtual” servers that match application workload requirements. For that you pay a performance and license penalty. Sure, there are some other capabilities that you get with virtualization solutions, but these can be accomplished in other ways.
With small servers being the standard model going forward, most workloads won’t need to be virtualized.
And consider the tenancy issue. Your 4GB 1.66GHz instance can now run on its own physical server. Nobody else will be on your server impacting your workload or doing nefarious things. All of the security and performance concerns over multi-tenancy go away. With a 1.28 Tbps connectivity fabric, it’s unlikely that you’ll feel other tenants’ impact at the network layer either. SeaMicro claims 12x the available bandwidth per unit of compute compared with traditional servers. Faster, more secure, what’s not to love?
And then there’s the cost of virtualization licenses. According to a now-missing blog post on the Virtualization for Services Providers blog (thank you Google) written by a current employee of the VCE Company, the service provider (VSPP) cost for VMware Standard is $5/GB per month. On a 4GB VM, that’s $240 per year, or $720 over three years – roughly 150% of the cost of the SeaMicro node itself! (VMware Premier is $15/GB, but in fairness you do get a lot of incremental functionality in that version). And for all that you get a decrease in performance from having the hypervisor between you and the bare metal server.
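To make the license comparison concrete, here is the arithmetic as a short Python sketch. The $5/GB per month rate and the 4GB VM size come from this article; the node cost is an assumption, derived by dividing the roughly $500,000 rack price by 1,024 servers.

```python
LICENSE_USD_PER_GB_MONTH = 5      # VMware Standard VSPP rate cited above
VM_SIZE_GB = 4
YEARS = 3
NODE_COST_USD = 500_000 / 1_024   # assumption: rack price spread over 1,024 servers

annual_license = LICENSE_USD_PER_GB_MONTH * VM_SIZE_GB * 12
three_year_license = annual_license * YEARS
license_vs_node = three_year_license / NODE_COST_USD

print(f"License: ${annual_license}/year, ${three_year_license} over {YEARS} years")
print(f"That is {license_vs_node:.0%} of a ${NODE_COST_USD:.0f} node")
```

With these inputs the three-year license bill comes to a little under one and a half times the hardware price, which is where the "150%" figure comes from.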
Undoubtedly, Citrix (XenServer), RedHat (KVM), Microsoft (Hyper-V) and VMware will find ways to add value to the SeaMicro equation, but I suspect that many new approaches may emerge that make public clouds without the need for hypervisors a reality. As Feldman put it, SeaMicro represents a potential shift away from virtualization towards the old model of “physicalization” of infrastructure.
The SeaMicro approach represents the first truly new approach to data center architectures since the introduction of blades over a decade ago. You could argue – and I believe you’d be right – that low-power super-dense server clusters are a far more significant and disruptive innovation than blades ever were.
Because of the enormous decrease in TCO represented by this model, as much as 80% or more overall, it’s fairly safe to say that any prior predictions of future aggregate data center compute capacity are probably too low by a very wide margin. Perhaps even by an order of magnitude or more, depending on the price-elasticity of demand in this market.
Whew! This is some seriously good sh%t.
It’s the dawn of a new era in the data center, where the ants will reign supreme and will carry on their backs an unimaginably larger cloud than we had ever anticipated. Combined with hyper-efficient cloud operating models, information technology is about to experience a capacity and value-enablement explosion of Cambrian proportions.
What should you do? Embrace the ants as soon as possible, or face the inevitable Darwinian outcome.
On many of the key points – such as elasticity being a side-effect of how Amazon and Google built their infrastructure – I totally agree. We have defined cloud computing in our business in a similar way to how most patients define their conditions – by the symptoms (runny nose, fever, headache) and not the underlying causes (caught the flu because I didn’t get the vaccine…). Sure, the result of the infrastructure that Amazon built is that it is elastic, can be automatically provisioned by users, scales out, etc. But the reasons they have this type of infrastructure are based on their underlying drivers – the need to scale massively, at a very low cost, while achieving high performance.
Here is the diagram from Randy’s post. I put it here so I can discuss it, and then provide my own take below.
My big challenge with this is how Randy characterizes the middle tier. Sure, Amazon and Google needed unprecedented scale, efficiency and speed to do what they have done. How they achieve this is a matter of the tactics, tools and methods shown in the middle tier. The cause and the results are the same – scale because I need to. Efficient because it has to be. These are the requirements. The middle layer here is not the results – but the method chosen to achieve them. You could successfully argue that achieving their level of scale with different contents in the grey boxes would not be possible – and I would not disagree. Few need to scale to 10,000+ servers per admin today.
However, I believe that what makes an infrastructure a “cloud” is far more about the top and bottom layers than about the middle. The middle, especially the first row above, impacts the characteristics of the cloud – not its definition. Different types of automation and infrastructure will change the cost model (negatively impacting efficiency). I can achieve an environment that is fully automated from bare metal up, uses classic enterprise tools (BMC) on branded (IBM) heterogeneous infrastructure (within reason), and is built with the underlying constraints of assumed failure, distribution, self-service and some level of over-built environment. And this 2nd grey row is the key – without these core principles I agree that what you might have is a fairly uninteresting model of automated VM provisioning. Too often, as Randy points out, this is the case. But if you do build to these row 2 principles…?
Below I have switched the middle tier around to put the core principles as the hands that guide the methods and tools used to achieve the intended outcome (and the side effects).
The core difference between Amazon and an enterprise IaaS private cloud is now the grey “methods/tools” row. Again, I might use a very different set of tools here than Amazon (e.g. BMC, et al). This enterprise private cloud model may not be as cost-efficient as Amazon’s, or as scalable as Google’s, but it can still be a cloud if it meets the requirement, core principles and side effects components. In addition, the enterprise methods/tools have other constraints that Amazon and Google don’t have at such a high priority. Like internal governance and risk issues, the fact that I might have regulated data, or perhaps that I have already a very large investment in the processes, tools and infrastructure needed to run my systems.
Whatever my concerns as an enterprise, the fact that I chose a different road to reach a similar (though perhaps less lofty) destination does not mean I have not achieved an environment that can rightly be called a cloud. Randy’s approach of dev/ops and homogeneous commodity hardware might be more efficient at scale, but it is simply not the case that an “internal infrastructure cloud” is not cloud by default.
One part of the debate on cloudonomics that often gets overlooked is the effect of over-provisioning. Many people look at the numbers and say they can run a server for less money than they can buy the same capacity in the cloud. And, assuming that you optimize the utilization of that server, that may be true for some of us. But that’s a very big and risky assumption.
People are optimists – well, at least most of us are. We naturally believe that the application we spend our valuable time creating and perfecting will be widely used. That holds true whether the application is internal- or external-facing.
In IT projects, such optimism can be very expensive because we feel the need to purchase many more servers than we typically need. On the other hand, and with the typical lead time of many weeks or even months to provision new servers and storage in a traditional IT shop, it’s important to not get caught with too little infrastructure. Nothing kills a new system’s acceptance more than poor performance or significant downtime due to overloaded servers. The result is that new systems typically get provisioned with far more infrastructure than they really need. When in doubt, it’s better to have too much than too little.
As proof of this it is typical for an enterprise to have server utilization rates below 15%. That means that, on average, 85% of the money companies spend with IBM, HP, Dell, EMC, NetApp, Cisco and other infrastructure providers is wasted. Most would peg ideal utilization rates at somewhere in the 70% range (performance degrades above a certain level), so that means that somewhere between $5 and $6 of every $10 we spend on hardware only enriches the vendors and adds no value to the enterprise.
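The "$5 to $6 of every $10" figure is the gap between typical and ideal utilization. Here it is as a small Python sketch, using the 15% and 70% rates from the paragraph above; the variable names are mine.

```python
ACTUAL_UTILIZATION_PCT = 15   # typical enterprise server utilization cited above
IDEAL_UTILIZATION_PCT = 70    # rough ceiling before performance degrades

idle_pct = 100 - ACTUAL_UTILIZATION_PCT                              # capacity sitting idle
avoidable_gap_pct = IDEAL_UTILIZATION_PCT - ACTUAL_UTILIZATION_PCT   # idle capacity that could be used
wasted_per_10_usd = 10 * avoidable_gap_pct / 100                     # dollars wasted per $10 of hardware

print(f"{idle_pct}% idle, of which {avoidable_gap_pct} points are avoidable")
print(f"${wasted_per_10_usd:.2f} of every $10 adds no value")
```

The remaining 30 points of idle capacity are the headroom you keep on purpose, which is why the waste lands at $5.50 rather than $8.50.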
Even with virtualization we tend to over-provision. It takes a lot of discipline, planning and expense to drive utilization above 50%, and like most things in life, it gets harder the closer we are to the top. And more expensive. The automation tools, processes, monitoring and management of an optimized environment require a substantial investment of money, people and time. And after all, are most companies even capable of sustaining that investment?
I haven’t even touched on the variability of demand. Very few systems have a stable demand curve. For business applications, there are peaks and valleys even during business hours (10-11 AM and 2-3 PM tend to be peaks while early, late and lunchtime are valleys). If you own your infrastructure, you’re paying for it even when you’re not using it. How many people are on your systems at 3:00 in the morning?
If a company looks at their actual utilization rate for infrastructure, is it still cheaper to run it in-house? Or does the cloud look more attractive? Consider that cloud servers are on-demand, pay as you go. Same for storage.
If you build your shiny new application to scale out – that is, use a larger quantity of smaller commodity servers when demand is high – and you enable the auto-scaling features available in some clouds and cloud tools – your applications will always use what they need, and only what they need, at any time. For example, at peak you might need 20 front-end Web servers to handle the load of your application, but perhaps only one in the middle of the night. In this case a cloud infrastructure will be far less costly than in-house servers. See the demand chart below for a typical application accessed from only one geography.
So, back to the point about over-provisioning. If you buy for the peak plus some % to ensure availability, most of the time you’ll have too much infrastructure on hand. In the above chart, assume that we purchased 25 servers to cover the peak load. In that case, only 29% of the available server hours in a day are used: 174 hours out of 600 available hours (25 servers x 24 hours).
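Here is a sketch of that calculation in Python. The hourly profile is hypothetical: I made it up so that it peaks at 20 servers, drops to one overnight and adds up to the 174 server-hours quoted above. It is not the data behind the chart.

```python
# Hypothetical servers needed for each hour of the day (index 0 = midnight).
# Invented for illustration: peaks of 20 at 10 AM and 2 PM, one server overnight.
HOURLY_DEMAND = [
    1, 1, 1, 1, 1, 1,      # 00:00-05:59
    2, 4, 8, 14, 20, 16,   # 06:00-11:59
    10, 14, 20, 16, 12, 9, # 12:00-17:59
    6, 5, 4, 3, 3, 2,      # 18:00-23:59
]
PURCHASED_SERVERS = 25     # peak of 20 plus headroom, as in the text

used_server_hours = sum(HOURLY_DEMAND)
available_server_hours = PURCHASED_SERVERS * len(HOURLY_DEMAND)
utilization = used_server_hours / available_server_hours

print(f"{used_server_hours} of {available_server_hours} server-hours used "
      f"({utilization:.0%})")
```

An autoscaled deployment would pay only for the 174 server-hours it uses, while the owned fleet pays for all 600.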
Now, if you take the simple math a step further, you can see that if your internal cost per hour is $1 (for simplicity), then the cloud cost would need to be $3.45 to approach equivalency ($1 / 0.29). A well-architected application that uses autoscaling in the cloud has the ability to run far cheaper than in a traditional environment.
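The break-even price is just the internal hourly cost divided by utilization. Here is a minimal Python sketch; the function name is made up for this example, and the inputs are the simplified $1 per hour and 174-of-600 figures used in this post.

```python
def break_even_cloud_price(internal_cost_per_hour: float, utilization: float) -> float:
    """Hourly cloud price at which pay-per-use matches an owned, under-used server.

    An owned server costs money every hour but does useful work only a
    `utilization` fraction of the time, so each useful hour effectively
    costs internal_cost_per_hour / utilization.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return internal_cost_per_hour / utilization


utilization = 174 / 600   # server-hours used / server-hours purchased
price = break_even_cloud_price(1.00, utilization)
print(f"Break-even cloud price: ${price:.2f} per server-hour")
```

At 100% utilization the function returns the internal cost unchanged, which is the "assuming you optimize the utilization of that server" case from the start of this post.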
Build your applications to scale out, and take advantage of autoscaling in a public cloud, and you’ll never have to over-provision again.
I have no doubt in my mind that Thomas Edison, were he alive today, would instantly spot the real value of cloud computing. Most people think it’s the economics. To one of history’s most prolific inventors, cloud computing would mean innovation.
You see, cloud isn’t just about how cheap you can make a VM, or how much less money Amazon costs than your internal infrastructure, even though it’s absolutely critical to the success of cloud computing that this is the case. Instead, the real value being created is how cloud computing dramatically lowers the barriers to experimentation and new models of delivering capability, thus increasing the chance that true innovation can occur.
Consider the following. Cloud computing is still way more prevalent in the U.S. than elsewhere in the world. There are relatively few non-U.S. clouds, even in Western Europe, though the pace of new cloud investments there appears to be increasing. All VCs now tell their Web-based portfolio companies to save their capital and use cloud computing to launch their service. Why? Because it reduces the amount of capital required to get to market. If the U.S. has a more vibrant cloud ecosystem, it has a positive impact on the level of start-up activity driving new innovations to market. Cloud computing, therefore, increases start-up activity.
Here’s another example. There’s a new pre-market company in the complex event processing (CEP) market called Cloud Event Processing (disclaimer – I am an advisor) using cloud instances, Map/Reduce for massively parallel computation, and a lot of other techniques that are becoming more prevalent in a cloud environment (e.g. NoSQL databases). This is a new model that promises to dramatically change the face of the CEP market. Founder Colin Clark’s blog is incredibly open and forthcoming about why he feels that cloud-based CEP is a great and disruptive innovation. His post on Why CEP in the Cloud Makes Sense lays it out for all to see.
When Eli Lilly needed lots of processing power to drive drug discovery, they too turned to the cloud to generate a large map/reduce cluster in a matter of hours vs. the hundreds of days it would have taken through traditional provisioning. As pointed out in this Information Week article on Eli Lilly’s cloud success, the key value is the enabling effect that this cloud project has had on innovation (a word used 5 times in this article).
Have you ever had to go in front of your company’s capital committee to ask for money? This is one of the more painful things most companies put people through. Sometimes it takes months of analysis, justification, and back room lobbying to get a capex request approved. What do you think that does to innovation?? Imagine that tomorrow, instead of asking for $1m for your new speculative project’s infrastructure, you ask your boss for a few thousand dollars of expense budget to try out this new idea. If it takes off, you’re a hero. If it’s a dud, at least the company is not out $1m. That’s how cloud lowers the barriers to innovation and encourages experimentation.
While cost does matter, the cloud is really about eliminating the opportunity cost, and the opportunities lost, that come from discouraging innovation. Next time someone argues with you over “cloudonomics,” change the discussion. To borrow from a famous campaign office sign from Bill Clinton’s 1992 presidential campaign – “It’s the Innovation, stupid!”
Setting aside the shameless cloud-washing that’s going on from some vendors, there are a lot of cloud service providers (CSPs – providers of cloud) today. Many of those listed in Sys-Con’s Top 150 report are CSPs, while others are providing extensions, tools or services for clouds. Everybody’s a cloud provider these days – and as Larry Ellison recently said “All boats are cloud boats.”
Every telco, every hoster, every data center outsourcer, most systems providers and many, many startups are becoming CSPs. After all, there have been thousands of hosting providers over the past several years competing for your business. A few were huge, several were large, and most were small but often profitable. I’m convinced this time it might be different — the cloud provider market will be increasingly consolidated with fewer opportunities for new entrants or profit from the tier 2 or 3 CSPs. The APIs, data center economics and proprietary platforms will make cloud a much more consolidated market.
The chart below depicts the scenario that I see taking place over the next few years, where the number of new entrants and the hyper-efficiencies gained by the biggest (Amazon, Google, Microsoft) result in razor thin margins that can’t be met by most of the players going forward. The pricing curve will drive adoption, solidifying the economies of scale by these mega CSPs.
One of the ways that the biggest guys will get scale is through completely proprietary cloud stacks that have a marginal cost of $0 to deploy new customers. Contrast that with the vendor stacks from VMware, 3tera, Enomaly, and VMOps. If you have to pay money to others based on the number of servers you have, the number of VMs or other components, it puts you at a disadvantage. You can still win, but your profits will be lower and it will be harder to find the capital to invest to stay competitive.
There are the open source alternatives such as Nebula and OpenNebula, and the VMware-free version of Eucalyptus. Some will go this route, innovate around the core, and survive. Others will rely on the vendors or community to keep them competitive and some may not be happy with the results.
This goes back to the excellent recent post by James Urquhart on differentiation. There are many ways that CSPs can differentiate their offerings, but price is probably not one of them (unless you’re in the mega category). That said, your premium needs to be reasonable – selling cloud VMs at 5x the price of Amazon is not sustainable in the long run. Relationships, custom services, security, applications, and other content that’s harder to commoditize need to be part of your strategy.
I predict that there will be many new CSPs over the next 18 months, but even before the new entrants stop coming many companies will exit the cloud business. Some exits will be via consolidation/merger, but many will just pull out of unprofitable businesses in the face of blistering competition. My take is that the great shakeout will be in full force in the 2012 time frame, with a bottom reached over the following 5-10 years.
So, will you be a survivor, or will you be cloudkill?