To conclude our featured talks from Microservices Day London, we share this talk by Anne Currie, Co-founder at Microscaling Systems, who provides a good overview of the “why” in container adoption.
We hope you enjoy the presentation as much as we did!
Microservices and Containers. How much faster than a VM?!
Presented at Microservices Day London
Today you’ve heard an awful lot about containers. You might feel a little bit like you’ve been beaten around the head with containers, and I’m going to continue that by talking about them again. My background is that I’m CTO of Microscaling Systems. I’ve been in the business for a long time, worked in a lot of different areas, been CTO of several companies, and worked on quite a lot of systems, backend systems and servers as well, for a very long time. The reason I’m going to be talking to you about containers, and the reason I think everybody has been talking about containers today, is that paired with microservices and with orchestrators there’s an incredible amount of power you can get, which can potentially completely revolutionize the way that you run your data center and the way that you architect your systems. It’s quite futuristic, it’s not there immediately, but it is worth bearing in mind that this is one of the benefits you get from moving to a microservices architecture. I’ll talk a little bit about it.
One of the things we’ve been talking about today is: what is the platform of the future? Who is going to be running on the bare metal in data centers in a year’s time, in two years’ time, in five years’ time? Is it going to be just bare metal, which is making a bit of a comeback? Probably not. Is it VMs, which have been absolutely dominating the world for nearly ten years now? Or is it containers, which seem to be the “new” way of running your applications in production?
In order to think about that, I think it is quite helpful to step back and think about what are the benefits, what are the trade-offs for each of those particular types of technology.
Why Bare metal?
So, let’s go back and think about bare metal. Why do we not run on bare metal now? Why has bare metal been superseded by VMs? It is still what we are running on underneath, and you’ve got the incredible power of actually running directly on the machine. The reason we stopped doing it, the reason we went over to VMs, is that although physical machines are incredibly powerful, they were quite inflexible. It is very difficult to scale up with a physical machine: you have to go out and buy new machines and you have to put them into data centers. There were all those downsides, which we experienced ten years ago when we were actually running our systems on bare metal, probably in our own data centers. Bare metal also led to incredibly poor server utilization, incredibly poor resource utilization, because you typically put just one thing on each machine, so you didn’t really share your resources very effectively. So, that is what made us move off bare metal, although we still run on it underneath.
So, then came along VMs and VMs seemed to solve all of the problems that bare metal had. Rather than being inflexible, they were remarkably and astonishingly flexible. They were completely agnostic about what you were running. You could run any operating system, any application, on any hardware on any host OS. Networking was phenomenally good. The security was phenomenally good. Their ability to divide up your resources and apply them to specific machines was astonishingly good.
They have been around a long time and they have really bedded in. There is an advantage of VMs that wasn’t immediately apparent to me when I first started thinking about it, but has become more apparent over time: VMs are a technology where all of the benefits accrued to the operations team. As far as the developers were concerned it was transparent, it was an invisible technology. It was all about operations, it was all about how things were deployed, at least to start with. So, all of the benefits accrued to operations, and all of the pain associated with learning about VMs also went to the operations team. There was a lot of pain. Learning how to orchestrate VMs effectively was a lot of work, and it took years and years to develop that expertise across any particular business. Orchestration of VMs has got an awful lot better as time has progressed. I think that is something we quite often forget: one of the reasons why VMs took off was that the pains and the gains all fell to the same team, and that’s a much easier situation to manage than when the pains accrue to one team but the gains accrue to a different team. So with VMs that’s an interesting …
So, what’s wrong with VMs? Is there anything wrong with VMs? Well, they are still slightly overweight. They aren’t perfect as a way of effectively utilizing the resources you have in your data center, because you are still running a full guest operating system on top of a full host operating system. So, even that alone means that you’ve got an awful lot of overhead. Although they are vastly faster than scaling up, or scaling out, when you’re talking about physical machines, they are still not real-time scaling. It still takes a couple of minutes to bring a new VM online, even if you don’t have to scale out your underlying physical infrastructure, which you do if you are running your own cloud but you don’t if you’re in the public cloud. So, there are huge advantages to VMs, but they don’t necessarily solve everything, they still do have some disadvantages as far as operations are concerned.
Then you’ve got containers, the new contender, and in many ways they solve the weaknesses of both of the previous ways of running your applications. They are incredibly lightweight, because fundamentally a container is just a package of processes. It is just a way of getting your operating system to do some of the clever stuff that VMs do for you, in terms of limiting what resources on a system are available to a particular process, or set of processes. They do some of the things that VMs do, but in a very lightweight way, with almost no overhead. This is completely different from the advantages that we have all been talking about all day, which are amazingly good advantages, but which are about deployment and packaging.
Docker is fantastic! It is an amazingly useful developer productivity tool and all the tooling and the continuous integration. That is a fantastic thing that containers can do for you, but it is not the only thing that containers can do for you and it is not the reason why containers were invented. Now, when we are thinking about it, we tend to think that containers are a continuous delivery, a developer productivity tool. They were actually invented around ten years ago by people like Google and Sun as an operational tool to help massively cut down their operational costs, both in terms of the resource used and the amount of operational time that was required to look after systems. I’ll be talking about that through this talk today, but really what I want to say is that containers are great, Docker is great, but they are not the only reason for using containers. It’s not the reason why they were invented and I don’t think it’s the way that we’ll fundamentally change the world. I think it’s the operational use of containers that is really, utterly, groundbreaking.
They are not without their faults in production environments. The networking is still pretty poor, although we are getting better at that, and there are a couple of companies in London, Project Calico and Weaveworks, who are working to get networking to scale up to meet the potential of containers, because you could be running millions of containers and you need new ways of networking to handle that well.
Security in containers
Security, obviously, has always been a concern, a risk, a worry about containers in production. Because VMs are so secure and so well bedded in, and because we are in an increasingly aggressive and insecure world, we worry about putting in any kind of new operational technology that has not been as hardened in the field as VMs now are. But security is so much the focus of everybody’s interest in containers, everybody’s going container-security mad, that I think this will not be a problem in the long run. It’s definitely worth worrying about, but not to such an extent that you don’t move towards a containerized infrastructure in production in the long run.
As I said, Docker is great! That whole way of packaging applications so they can be moved easily from place to place, continuous delivery, is all absolutely fantastic, and I’m not going to slag off Docker, because we wouldn’t be here talking about containers if they hadn’t popularized them and given you value from containers right now, without having to go through any pain or learn new techniques. So, Docker is fantastic, but for me the really interesting thing about containers is how lightweight they are and how quickly they instantiate compared to any similar technology. If we say that VMs are similar to containers in that they assign a certain amount of resource to a particular application, then a VM will do that, but it will take minutes; a container will do that, but it will take seconds. That is quite a game changer.
Why is it a game changer? Let’s talk first about something that Google uses it for. They use this for throwing away the idea of autoscaling. So, autoscaling, obviously, is bringing additional resource into your data center to cope with, for example, increasing demand. Autoscaling used to be incredibly hard. If you ran your own data center, you had to plan it six months in advance, you had to bring in additional machines, you had to get the power for those machines. There was loads and loads to do, so it was a bit of a headache, and you never really wanted to do it, or you planned it a very long time in advance.
One of the huge advantages of the Cloud is the ability to autoscale really, quite quickly. So, you can autoscale in minutes, you can scale in minutes, which means you can effectively autoscale, but that’s still not realtime. It slightly lulls us into a false sense of security because you still have to plan in advance, you can’t respond in realtime to an unexpected peak, an unexpected demand that falls on your systems.
With containers, actually, there is the possibility of responding to things in real time. It’s not autoscaling, because in order to autoscale you still have to add in additional resource, and right now that is never a realtime operation. But you can potentially reuse your existing resources in realtime using containers. For example, imagine you were a video hosting service, Kittens R’ Us, hosting incredibly cute kitten videos, and you offered two services to your customers. You showed them kitten videos, and that was a very demand-dependent operation: if people come to you, they want to see a kitten video right now. You also offered kitten video transcoding on upload, so they could upload their own videos, which is a very expensive operation. Actually, that’s less time sensitive for your visitors. If it takes a minute to upload a video, that’s great; if it takes an hour, well, that’s not the end of the world. You’ve got two services there: one is incredibly important and time critical, and one is important, but not time critical.
What Google came up with was the idea that, “Well, hang on a minute, if we could switch off these non-time-critical services in order to use all our capacity for the time-critical services and cope with the peak in demand, then we wouldn’t necessarily need to autoscale at all and we wouldn’t need to predict that demand in advance”. That is the kind of way that people like Google and Netflix are actually using containers to make their systems more realtime and self-managing. I think it’s a really interesting concept, I think it’s absolutely fascinating. It does require microservices. My talk is not completely divorced from microservices: you have to have a microservices approach in order to do this. If you had one monolith that was doing both the video serving and the video encoding, there is nothing to turn off. You have to have at least two services, ideally more than that. So if you are going to reuse your existing resources in realtime to handle demand, microservices are utterly key.
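As a rough illustration of the idea just described, here is a minimal sketch of handing out a fixed pool of container slots in priority order, so that lower-priority work only gets whatever the time-critical services leave over. The function name, the service names and the convention that a lower number means more urgent are all assumptions made for this example; this is not the scheduler used by Google, Netflix or Microscaling Systems.

```python
def allocate_by_priority(capacity, requests):
    """Share `capacity` container slots among services, most urgent first.

    `requests` is a list of (name, priority, wanted) tuples. A lower
    priority number means more urgent (an assumption of this sketch).
    """
    if capacity < 0:
        raise ValueError("capacity must not be negative")
    allocation = {}
    remaining = capacity
    # Serve the most urgent service first; later ones get what is left.
    for name, _priority, wanted in sorted(requests, key=lambda r: r[1]):
        if wanted < 0:
            raise ValueError("wanted must not be negative")
        granted = min(wanted, remaining)
        allocation[name] = granted
        remaining -= granted
    return allocation


# Hypothetical Kittens R' Us example: serving is urgent, transcoding is not.
quiet = allocate_by_priority(10, [("transcode", 2, 10), ("serve-video", 1, 3)])
peak = allocate_by_priority(10, [("transcode", 2, 10), ("serve-video", 1, 9)])
print(quiet)
print(peak)
```

When demand for video serving rises from 3 to 9 slots, the transcoding service shrinks from 7 slots to 1 and no new capacity is added, which is the trade described in the talk.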
Then there is a cattle not pets approach. I don’t know if people are familiar with cattle not pets? It’s the idea that you can turn a service on and off very quickly. You don’t have to spend ten minutes, or hours, or weeks, carefully tidying it up before you can turn it off. If the service that you need to turn off in order to free up additional space takes ten minutes to turn off, then you might as well have autoscaled; you are not realtime. In order to take advantage of any of this realtime reactiveness, you have to have both moved to microservices and adopted a cattle not pets approach.
This is all very well, but is it actually true? This seems like a crazy thing, Google is saying they’re doing it, Netflix is saying they’re doing it, but is it really true? It relies on three things, it relies on microservices, and that seems quite plausibly true, I think we’re all here and we believe that microservices is possible. Cattle not pets, again that’s not crazy talk, that is a common industry move towards a cattle not pets approach to how we manage services. That kind of instantiation of containers within a second, that’s not necessarily true, or achievable. Just because Google is doing it, doesn’t mean that everybody can do it, maybe they’re using special orchestrators that are not available to us, or they’re running on special hardware, that isn’t available to us. There are a lot of reasons to feel suspicious of anything that they’re saying that they’re doing in this area. Is it something that only they could do? We went and took a look at this, because the only way to find out is to actually experiment.
We don’t believe in taking anybody’s word for these things, so if we wanted to find out whether containers were instantiatable in seconds, the only way to do that was to actually find out. We built a tool, a very simple scheduler that sits on top of a variety of different orchestrators, because we wanted to try it with various orchestrators. What we did is we said, “Okay, let’s take the most simplified form of what we’re proposing here, which is that there are two types of service: a high priority service, which is incredibly time critical, and a low priority service, maybe a batch service, something where we really don’t mind how long it takes to run.” The high priority service is in blue and the low priority service is in lilac up there. We also produced randomized, simulated demand, which is the red line there. We asked our scheduler to talk to the orchestrator and try to meet the demand, whilst at the same time using all of the rest of the resources for the batch process. Is this possible? Is this even vaguely possible on standard servers, on standard infrastructure?
We built this and we ran it against ECS, the Elastic Container Service, the Amazon scheduler for containers. We ran it against Mesos Marathon, which is another popular Mesosphere scheduler. We ran it against Swarm, we ran it against Kubernetes, and we ran it against just the Docker API directly. In terms of infrastructure, we tried it on bare metal, we tried it on EC2 and we tried it on Azure. In every case it did, more or less, behave in exactly the same way, and we saw more or less the same performance in every one as well. Yes, it could keep up with a plausibly real-time demand. The standard orchestrators, and that’s pretty much all of the standard orchestrators, on the standard physical infrastructure, can keep up with changing which containers you have running on your infrastructure in plausible realtime. That is the basis of the idea of having infrastructure which is effectively self-managing, self-regulating and self-healing, depending on the demand that is currently coming into your system.
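To make the shape of that experiment concrete, here is a small, self-contained simulation sketch. It is not the tool described in the talk: the names `Running`, `reconcile` and `simulate` are invented for illustration, demand is simply a random integer per tick, and where a real scheduler would call an orchestrator API to start and stop containers, this sketch only computes the changes it would request.

```python
import random
from dataclasses import dataclass


@dataclass
class Running:
    high: int  # containers of the time-critical service
    low: int   # containers of the batch service


def reconcile(current, demand, capacity):
    """Return the target container counts and the changes needed to reach them."""
    target_high = min(max(demand, 0), capacity)
    # Everything not needed by the high priority service goes to batch work.
    target = Running(high=target_high, low=capacity - target_high)
    changes = {
        "high": target.high - current.high,  # positive: start, negative: stop
        "low": target.low - current.low,
    }
    return target, changes


def simulate(capacity=10, ticks=5, seed=0):
    """Feed randomized demand to the scheduler and record each tick."""
    rng = random.Random(seed)
    state = Running(high=0, low=capacity)
    history = []
    for _ in range(ticks):
        demand = rng.randint(0, capacity)
        # A real tool would send `_changes` to the orchestrator here.
        state, _changes = reconcile(state, demand, capacity)
        history.append((demand, state))
    return history


for demand, state in simulate():
    print(f"demand={demand:2d} high={state.high:2d} low={state.low:2d}")
```

The interesting measurement in the real experiment is how quickly the orchestrator can apply those changes, which this sketch assumes to be instantaneous.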
Google and Netflix are doing this, and what are they really doing it for? Well, they’re doing it because the kind of resource utilization that most of us get was not acceptable to these guys. The average data center utilization worldwide is about 10-15%, which is pretty terrible, isn’t it? The move to VMs and the move to cloud has actually made that worse, rather than better. It is now very easy to over-provision, and really we don’t have an awful lot of choice. If you’re going to autoscale, you either have to get your demand prediction utterly right, which we know is pretty much impossible, or you have to massively over-provision, so we over-provision by a factor of nearly ten. Which is quite astonishing really, isn’t it?
For these guys that was too expensive. They couldn’t possibly do that, so they needed to adopt a more reactive way of managing their capacity, one that didn’t always, or only very rarely, involved autoscaling, and instead involved reusing existing capacity differently depending on the current demand profile, which we call microscaling. With that, Google has achieved around 65-70% resource utilization. That’s versus the 10-15% that we’re achieving. Netflix achieves around 50%, because they use batch processes a little less heavily than Google does. Again, that’s four or five times what we achieve. I think it was Keith who mentioned earlier that it is not unrealistic to assume that you could double or triple your resource utilization by moving to a clever, orchestrated, containerized model, and that can make an enormous difference to your hosting costs and also to your operational costs, how hard your system is to manage. It is something that is well worth achieving, I think.
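The utilization figures above translate into server counts with some simple arithmetic. The sketch below is a back-of-envelope calculation only, assuming the workload stays the same and that average utilization is the only thing that changes; the function name and the fleet of 100 servers are made up for the example.

```python
def servers_needed(current_servers, current_pct, target_pct):
    """Servers needed for the same work at a different average utilization.

    Utilization is given in whole percent so the arithmetic stays exact.
    """
    if current_servers < 0:
        raise ValueError("current_servers must not be negative")
    if not (0 < current_pct <= 100 and 0 < target_pct <= 100):
        raise ValueError("utilization must be in the range 1-100 percent")
    busy = current_servers * current_pct  # work, in server-percent units
    return -(-busy // target_pct)         # ceiling division on integers


# A hypothetical fleet of 100 servers at 10% average utilization.
print(servers_needed(100, 10, 20))  # utilization doubled
print(servers_needed(100, 10, 50))  # roughly the Netflix figure quoted above
```

Under those assumptions, doubling utilization halves the number of servers needed, which is why even the more modest “double or triple” estimate matters for hosting costs.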
What is it? What is the throne? What are we all fighting for here? I think in terms of ops, we’re fighting for lower cost with higher server density, lower maintenance, more self-healing, more reactive systems, systems that will manage and look after themselves without you having to constantly anticipate demand that might not be anticipatable. That’s everything there is to be gained by considering containers as an operational technology, as opposed to purely a deployment or development technology.
Was that my buzzer for my time going? Awesome! Well, in that case I have pretty much finished. My last slide was a summary. I talked in the beginning about how there was a battle for who was going to win, who was going to be on the bare metal in the data centers in five years’ time. That battle has already been fought. I hadn’t realized this, but that battle has already been fought by people like Google and Netflix, who looked at VMs, and looked at containers, and said, well, only containers can deliver the kind of performance, the kind of self-healing and the kind of resource usage that we need.
Although we might think that there is a battle to be won, actually containers have kind of already won it. We will get there eventually. I have no doubt about that. It does deliver lower maintenance and lower costs. Operationally, what is required for you to get there? Well, you’ve got to containerize, but there has been a lot of talk about that earlier. You have also really got to put an orchestrator in, and that is quite painful. You do need to have a vision for what that’s going to deliver for you. There are some absolutely excellent orchestrators out there: Mesos, Kubernetes, Swarm, ECS, they are all very good. They’re in a decent state for you to at least be playing with, to at least be starting to think about. Microservices that are containerized and orchestrated are really a different world for the data center, and it’s a world which is well worth moving towards, I believe. Yes, there we go! Containers are coming! I think there is no doubt about that after what you’ve heard today; it matches so well with microservices. That’s me!
About the speaker
Anne Currie has been in the tech industry for over 20 years, working on everything from core server technology to pioneering e-commerce platforms. She is currently CTO of Microscaling Systems, with a focus on the convergence of microservices and containers.