I have an interesting scenario (HP vs DELL hardware) with potentially degraded performance (specific to the DELL R815 hardware), and I would like to know whether I am interpreting what I am seeing correctly, or whether I am simply being overcautious and don't actually have an issue.
Summary;
- Although they have a much higher hardware specification, the DELL R815 ESXi hosts are not scheduling CPU cycles as efficiently as the HP DL585 G6 hosts. The impact we are seeing is increased CPU ready time and degraded guest VM performance. This is already evident with very few guest VMs on the host, and it gets worse as the consolidation ratio is ramped up or the CPU load on any of the guest VMs increases.
- There also appears to be an imbalance across the NUMA nodes: one particular node is favoured, and the % NUMA local memory is lower than it should be (i.e. the HP hardware performs much better than the DELL hardware).
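When comparing ready time between hosts it helps to put vCenter and esxtop figures on the same scale. vCenter charts report "CPU ready" as a summation in milliseconds per sample, while esxtop shows %RDY directly. The sketch below does the usual conversion; the 20 s default assumes the vCenter real-time chart interval (the "Past Day" chart uses 300 s samples instead), and the function name is purely illustrative.

```python
def ready_ms_to_percent(ready_ms: float, interval_s: float = 20.0) -> float:
    """Convert a CPU ready summation (milliseconds accumulated during one
    sample) into the percentage of that sample spent ready but not running.

    interval_s is the chart's sampling period. 20 s is assumed here because
    that is the vCenter real-time chart interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return ready_ms / (interval_s * 1000.0) * 100.0


# Example: 1000 ms of ready time in a 20 s real-time sample.
print(f"{ready_ms_to_percent(1000.0):.1f} %")
```

Note that a VM-level summation in vCenter covers all of the VM's vCPUs, so it should be divided by the vCPU count before comparing it with a per-vCPU figure.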
DELL Technical Details;
Hypervisor : VMware ESXi 4.1.0, build 582267
Hardware specification;
Dell PowerEdge R815
- Model : AMD Opteron(tm) Processor 6174
- Processor Speed : 2.2 GHz
- Processor Sockets : 4
- Processor Cores per Socket : 12
- Logical Processors : 48
- Memory : 256 GB
esxtop performance statistics;
DELL Memory (incl NUMA statistics);
DELL CPU;
Observations;
- NUMA home node #7 is favoured, rather than the load being balanced across all 8 nodes
- % NUMA local memory is low, i.e. memory is not being allocated locally as efficiently as expected
- Very low consolidation ratio of guest VMs per host
- Very low load on the host, yet ready time is already present
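The two NUMA observations above can be quantified from the esxtop memory screen, which shows per-VM NUMA statistics such as NHN (current home node), NLMEM (local memory, MB), NRMEM (remote memory, MB) and N%L (% local). The sketch below assumes N%L is effectively NLMEM as a share of NLMEM + NRMEM, and tallies VMs per home node to make an imbalance like "node #7 is favoured" visible. Both helper names are hypothetical, and the values would have to be copied from your own esxtop output.

```python
from collections import Counter


def numa_local_percent(nlmem_mb: float, nrmem_mb: float) -> float:
    """Share of a VM's memory that lives on its home node, as a percentage.

    Assumes this matches esxtop's N%L column: NLMEM / (NLMEM + NRMEM) * 100.
    """
    total = nlmem_mb + nrmem_mb
    if total <= 0:
        raise ValueError("VM has no NUMA memory accounted")
    return nlmem_mb / total * 100.0


def home_node_spread(home_nodes: dict[str, int], node_count: int) -> dict[int, int]:
    """Count how many VMs have each NUMA node as their home node (NHN).

    Nodes with no VMs are reported as 0 so that an imbalance stands out.
    """
    counts = Counter(home_nodes.values())
    return {node: counts.get(node, 0) for node in range(node_count)}
```

For example, a VM with 3072 MB local and 1024 MB remote would show 75 % local, well short of the near-100 % expected when a VM fits inside one node.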
Example of an affected guest VM;
The DELL host is under no load whatsoever;
For contrast, this is what I would “expect” to see, taken from a heavily loaded HP DL585 G6 host;
HP Technical Details;
Hypervisor : VMware ESXi 4.1.0, build 582267
HP Hardware specification;
HP ProLiant DL585 G6
- Model : Six-Core AMD Opteron(tm) Processor 8435
- Processor Speed : 2.6 GHz
- Processor Sockets : 4
- Processor Cores per Socket : 6
- Logical Processors : 24
- Memory : 128 GB
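The two specifications above translate into different NUMA layouts. The Opteron 6174 (Magny-Cours) package contains two six-core dies, each with its own memory controller, so a 4-socket R815 presents 8 NUMA nodes, which matches the 8 nodes seen on the DELL host. The Opteron 8435 (Istanbul) is a single six-core die per socket, so the DL585 G6 presents 4 nodes. The sketch below works out cores and memory per node, assuming DIMMs are populated evenly across all memory controllers on both hosts; the `Host` class is only an illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    sockets: int
    cores_per_socket: int
    dies_per_socket: int  # assumed: one NUMA node per die
    memory_gb: int

    @property
    def numa_nodes(self) -> int:
        return self.sockets * self.dies_per_socket

    @property
    def cores_per_node(self) -> float:
        return self.sockets * self.cores_per_socket / self.numa_nodes

    @property
    def memory_per_node_gb(self) -> float:
        # Assumes memory is spread evenly across all NUMA nodes
        return self.memory_gb / self.numa_nodes


dell = Host("DELL R815 (Opteron 6174)", sockets=4, cores_per_socket=12,
            dies_per_socket=2, memory_gb=256)
hp = Host("HP DL585 G6 (Opteron 8435)", sockets=4, cores_per_socket=6,
          dies_per_socket=1, memory_gb=128)

for h in (dell, hp):
    print(f"{h.name}: {h.numa_nodes} nodes, {h.cores_per_node:g} cores/node, "
          f"{h.memory_per_node_gb:g} GB/node")
```

Under that assumption both hosts offer 6 cores and 32 GB per node; the DELL host simply has twice as many nodes, so a VM needs to fit within 6 vCPUs and 32 GB on either host to stay entirely node-local.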
esxtop performance statistics;
HP Memory (incl NUMA statistics);
HP CPU;
Observations;
- HP host is of a lower hardware specification than the DELL host
- HP host has almost 4x as many guest VMs and does not suffer from the same performance issues
- NUMA home node #0 is favoured, but NUMA local memory is allocated much more efficiently (close to 100%)
- Much higher consolidation ratio of guest VMs per host without performance issues
- Much higher load on the host and almost ZERO ready time
HP host still has capacity, but is under much more load than the affected DELL host;
In both cases (HP and DELL) we do expect to see a certain level of ready time, but the levels seen on the DELL hardware are a concern, as is the inefficient use of NUMA local memory. These issues are not seen on the HP hardware, including both earlier and later hardware generations.
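When judging what "a certain level of ready time" means, keep in mind that esxtop's %RDY for a VM group is the sum over all of that VM's worlds, so a 4-vCPU VM can show 20 % while each vCPU is only 5 % ready. The sketch below normalises group %RDY per vCPU and flags VMs above a threshold. The 5 % per-vCPU default is a commonly quoted rule of thumb, not a VMware limit, and the function names are hypothetical.

```python
def per_vcpu_ready(group_rdy_percent: float, vcpus: int) -> float:
    """Average %RDY per vCPU from an esxtop VM group %RDY value."""
    if vcpus <= 0:
        raise ValueError("vcpus must be positive")
    return group_rdy_percent / vcpus


def flag_high_ready(vms: dict[str, tuple[float, int]],
                    threshold: float = 5.0) -> list[str]:
    """Return the names of VMs whose per-vCPU %RDY exceeds threshold.

    vms maps VM name -> (group %RDY from esxtop, number of vCPUs).
    threshold is an assumed rule-of-thumb value; adjust to taste.
    """
    return sorted(
        name
        for name, (rdy, vcpus) in vms.items()
        if per_vcpu_ready(rdy, vcpus) > threshold
    )
```

Normalising this way makes the DELL and HP figures directly comparable even when the VMs on each host have different vCPU counts.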
So the questions are;
- Have I interpreted this correctly?
- Has anyone else seen this before? If yes, how was it resolved?
- What next steps can be taken to test and verify this information?
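One way to gather comparable evidence from both hosts is to run esxtop in batch mode for the same period on each, e.g. `esxtop -b -d 5 -n 120 > host.csv` (5 s samples, 120 iterations), and then average %RDY per VM. The sketch below parses such a CSV. It assumes the ready-time columns are headed like `\\host\Group Cpu(<id>:<vm name>)\% Ready`; please check that against the header row of your own file before relying on it. The function name is hypothetical.

```python
import csv
import io
import re
from statistics import mean

# Assumed header format for a VM's ready column:
#   \\host\Group Cpu(<id>:<name>)\% Ready
READY_COL = re.compile(r"Group Cpu\(\d+:(?P<name>[^)]+)\)\\% Ready$")


def mean_ready_by_group(csv_text: str) -> dict[str, float]:
    """Average the '% Ready' counter of every CPU group in an esxtop batch CSV."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, samples = rows[0], rows[1:]
    result: dict[str, float] = {}
    for idx, column in enumerate(header):
        match = READY_COL.search(column)
        if not match:
            continue
        # Skip missing or empty cells rather than treating them as zero
        values = [float(row[idx]) for row in samples
                  if idx < len(row) and row[idx] != ""]
        if values:
            result[match.group("name")] = mean(values)
    return result
```

Usage would be along the lines of `mean_ready_by_group(open("host.csv").read())`. Note that these are group values, so they still need dividing by each VM's vCPU count for a per-vCPU comparison.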