Simple IT made complex

Dryerr's blog

vCPU performance degradation


This post lightly touches on vCPU performance degradation when a VM is given multiple vCPU. It’s not a scientific test; it’s just meant to help you decide how many vCPU to use 🙂 

How is the test made?

It’s pretty simple. I got the program Super Pi, and for the single thread test, I ran one instance of the program. For the multi thread test, I ran 10 instances at the same time. All tests used the 2M calculation. The tests ran on the same ESX host, which is in a production environment, during a low load period. They were repeated multiple times over the course of 3 hours, to compensate somewhat for inaccuracy caused by sudden loads. 

The numbers, how should I read them?

Well, the number of vCPU is pretty self-explanatory. The number at the end of each bar is the average finish time (let’s call that AFT from now on) of the threads. In the single thread test, that’s simply the average of how long Super Pi took to finish. In the multi thread test, it’s all 10 finish times added together, then divided by 10. 

Single thread

Let me warn you, this first test is slightly weird, and I’m still unable to explain it. But have a look at the numbers first, then we’ll dive into it. 

 

1 vCPU takes 38,3 seconds to finish. Okay, noted, no problem.
2 vCPU takes 46,1 seconds to finish. Hmm, that’s quite a performance hit, roughly 20%. 4 vCPU must be horrible then!
4 vCPU takes 39,3 seconds… Wait, what? That’s only about 2,6% slower than 1 vCPU. 
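
To make the percentages easy to re-check, here is a minimal Python sketch of the arithmetic. The function name `percent_slower` is my own, not something from Super Pi or ESX; the three times are the average finish times listed above.

```python
def percent_slower(time_s: float, baseline_s: float) -> float:
    """Return how much longer time_s is than baseline_s, in percent."""
    return (time_s / baseline_s - 1.0) * 100.0


# Average finish times in seconds from the single thread test (vCPU count -> AFT).
single_thread_aft = {1: 38.3, 2: 46.1, 4: 39.3}

baseline = single_thread_aft[1]
for vcpus, seconds in single_thread_aft.items():
    slowdown = percent_slower(seconds, baseline)
    print(f"{vcpus} vCPU: {seconds} s ({slowdown:.1f}% slower than 1 vCPU)")
```

The same helper works for any pair of runs, as long as both times are in the same unit.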

These numbers are consistent: I ran this test on the same ESX host every 15-20 minutes over 3 hours, and they kept ending up this way. If anyone reading has the slightest idea why this happens, I would love to be enlightened. 

Anyway, this shows a slight performance degradation from 1 vCPU to 4 vCPU, roughly 2,6%. This means that if your environment is able to reserve the cores needed to run a 4 vCPU VM, it won’t really perform worse when the workload isn’t multi threaded. 

Multi thread

This is mainly why I made the test, to clarify how much multi threading you actually gain from more than one vCPU. Like the last time, take a look at the numbers. 

 

Notice how the 2 vCPU performs how you would expect when multi threading. 

1 vCPU takes roughly 10 times longer to calculate 10 threads than 1, makes sense.
2 vCPU finishes in ~49,4% less time than 1 vCPU, pretty good deal.
4 vCPU finishes in ~73,4% less time than 1 vCPU, and ~47,4% less time than 2 vCPU. Not as good a deal, but still pretty good. 
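
The percentages above can be reproduced with a short Python sketch. One loud assumption: the raw multi thread AFT values are not written out in the text, so the sketch back-calculates them from the per-thread times listed later in this post (38,6, 39,1 and 41,1 seconds) by inverting the formula (AFT * Number of vCPU) / Number of threads. The function names are my own.

```python
THREADS = 10  # Super Pi instances run at once in the multi thread test

# Per-thread times in seconds, taken from the final list in this post.
per_thread_s = {1: 38.6, 2: 39.1, 4: 41.1}


def aft_from_per_thread(per_thread: float, vcpus: int, threads: int = THREADS) -> float:
    """Invert per_thread = (AFT * vCPU) / threads to recover the AFT (assumed, not measured)."""
    return per_thread * threads / vcpus


def percent_less_time(time_s: float, baseline_s: float) -> float:
    """Return how much shorter time_s is than baseline_s, in percent."""
    return (1.0 - time_s / baseline_s) * 100.0


aft = {vcpus: aft_from_per_thread(t, vcpus) for vcpus, t in per_thread_s.items()}

print(f"2 vs 1 vCPU: {percent_less_time(aft[2], aft[1]):.1f}% less time")
print(f"4 vs 1 vCPU: {percent_less_time(aft[4], aft[1]):.1f}% less time")
print(f"4 vs 2 vCPU: {percent_less_time(aft[4], aft[2]):.1f}% less time")
```

Note that "less time" is measured against the slower run, so halving the finish time shows up as 50%, not 100%.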

Putting it together

I’ve put the two graphs in the same chart, simply because I could, which gives me the opportunity to post one more. Behold! 

 

A final round of numbers. This is the average finish time per thread, in seconds. For the multi thread test, the calculation is (AFT * Number of vCPU) / Number of threads; the single thread numbers are simply the AFT from the first test.

Single thread
1 vCPU: 38,3
2 vCPU: 46,1 (20,3% slower than 1 vCPU)
4 vCPU: 39,3 (2,6% slower than 1 vCPU) 

Multi thread
1 vCPU: 38,6 (0,8% slower than 1 vCPU single thread)
2 vCPU: 39,1 (17,9% faster than 2 vCPU single thread, 5,1% faster than 4 vCPU multi thread)
4 vCPU: 41,1 (4,6% slower than 4 vCPU single thread) 
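
One way to read the multi thread list is as overhead per thread compared with the best case, a single thread on 1 vCPU. The Python sketch below does only that comparison, using the numbers from the two lists above. Calling it "overhead" is my own framing, and the function name is made up for the example.

```python
# Best case from the single thread list: one thread on 1 vCPU, in seconds.
SINGLE_THREAD_1_VCPU_S = 38.3

# Per-thread times in seconds from the multi thread list (vCPU count -> time).
multi_thread_per_thread_s = {1: 38.6, 2: 39.1, 4: 41.1}


def overhead_percent(per_thread_s: float, baseline_s: float = SINGLE_THREAD_1_VCPU_S) -> float:
    """Return how much longer a thread takes than the baseline, in percent."""
    return (per_thread_s / baseline_s - 1.0) * 100.0


for vcpus, seconds in multi_thread_per_thread_s.items():
    print(f"{vcpus} vCPU: {seconds} s per thread, {overhead_percent(seconds):.1f}% overhead")
```

The closer the overhead stays to zero as vCPU are added, the closer the VM is to scaling perfectly.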

A note: I did also do the test with 8 vCPU, but because the ESX host only has 8 cores, it’s pretty hard for the hypervisor to reserve the CPU cores, and the tests were horrible 🙂 

What does this tell us? Well, if you’re not running a terminal server, examine whether you need more than 1 vCPU. CPU resources aren’t the most expensive in a virtual environment, but if you just pop 4 vCPU into every VM you create, you will run into a CPU limit very fast. Imagine you have an ESX host that can run 100 VMs with 1 vCPU (fictional numbers here); you might instead only be able to support 50-70 with 2 vCPU, or 10-20 with 4 vCPU. These numbers aren’t real, but what I’m trying to say is that the drop-off is steeper than linear, meaning you will lose more than you gain. 

As said, the above is not a fact sheet, but it should help you make decisions. Sure, you might lose some extra CPU by creating a 4 vCPU VM instead of 2 VMs with 2 vCPU each. On the other hand, you might save RAM and storage by creating a single 4 vCPU VM, which might be the better choice for you, because those two resources might cost you more than CPU. Usually we run out of RAM before anything else 🙂

Written by dryerr

April 2, 2010 at 22:45

Posted in VMWare ESX

