Friday, November 28, 2014

Monitoring Load average


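Load average represents the number of processes that are running or waiting in the OS scheduler queue; on Linux it also counts processes blocked in uninterruptible I/O waits. Unlike CPU utilization, load average will increase when any resource that processes wait on is limited (e.g. CPU, disk, memory, and in some cases network). Please refer to the blog “Understanding Linux Load Average” for more details.
We can use the following rules of thumb, comparing load average with the number of CPU cores, to decide whether a machine is loaded: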
  • If load average < number of cores, then the machine is not loaded
  • If load average == number of cores, then the machine is in full use
  • If load average >= 4X number of cores, then the machine is highly loaded
  • If load average >= around 40X number of cores, then the machine is unusable
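
The thresholds above can be sketched as a small Python check. This is only an illustration: the function name classify_load is made up for this post, and treating everything between 1X and 4X the number of cores as "in full use" is an assumption, since the list does not say what to call that range.

```python
import os


def classify_load(load_avg, cores):
    """Map a load average to the rough categories from the list above.

    Assumption: values between 1X and 4X the core count are reported
    as "in full use", because the list leaves that range unnamed.
    """
    ratio = load_avg / cores
    if ratio < 1:
        return "not loaded"
    if ratio < 4:
        return "in full use"
    if ratio < 40:
        return "highly loaded"
    return "unusable"


if __name__ == "__main__":
    # os.getloadavg() is available on Unix-like systems only.
    cores = os.cpu_count() or 1
    one_min, five_min, fifteen_min = os.getloadavg()
    print(f"{one_min:.2f} on {cores} cores: {classify_load(one_min, cores)}")
```

For example, a load average of 16 on a 4-core machine is 4X the number of cores, so it falls into the "highly loaded" category.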

One common mistake is profiling the CPU without first determining whether the workload is truly CPU bound. Even when the CPU utilization shown by the top command is low, the machine may be busy doing I/O (e.g. reading from disk, writing to the network). Load average is a much better metric for determining whether the machine is loaded.

The top and uptime commands generally show three figures for load average, something like:
1.08, 0.8, 4.0. These are the load averages over the last 1, 5 and 15 minutes.

In the top command, you should see your load averages displayed as three numbers in the top right-hand corner. These numbers are laid out as follows:

Load Averages: 1.00 2.00 3.00
1.00 - 1 Minute average
2.00 - 5 Minute average
3.00 - 15 Minute average
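
If you want to pick these three numbers out of a line of text in a script, something like the following Python sketch could work. The function name parse_load_averages is hypothetical, and the pattern assumes the line contains "load average:" or "load averages:" followed by three numbers separated by commas or spaces; the exact output of top and uptime differs between systems.

```python
import re

# Assumed layout: "load average(s):" followed by three numbers that are
# separated by commas and/or spaces.
LOAD_PATTERN = re.compile(
    r"load averages?:\s*([\d.]+)[,\s]+([\d.]+)[,\s]+([\d.]+)",
    re.IGNORECASE,
)


def parse_load_averages(line):
    """Return the (1, 5, 15) minute load averages found in a line of text."""
    match = LOAD_PATTERN.search(line)
    if match is None:
        raise ValueError("no load averages found in: " + line)
    one_min, five_min, fifteen_min = (float(value) for value in match.groups())
    return one_min, five_min, fifteen_min


if __name__ == "__main__":
    print(parse_load_averages("Load Averages: 1.00 2.00 3.00"))
```

On Unix-like systems, Python's os.getloadavg() returns the same three values directly, so parsing is only needed when all you have is captured command output.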

In the example above, the server averaged a load of 3.00 over the last 15 minutes, 2.00 over the last 5 minutes, and 1.00 over the last minute. Each figure is an average over a window ending now, not a snapshot of the load at that moment in the past. Because the shorter windows show lower values, the load is falling: the server was doing a lot of work earlier in the 15-minute window, and the workload has dropped since then. Had the 1-minute figure been the highest of the three, the load would be rising instead.
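
The same comparison can be written as a short Python sketch. The function name load_trend and the tolerance of 0.05 are assumptions chosen for illustration, not standard values.

```python
def load_trend(one_min, fifteen_min, tolerance=0.05):
    """Compare the 1-minute and 15-minute load averages.

    A 1-minute value above the 15-minute value suggests the load is
    rising; below it, the load is falling. The tolerance is an arbitrary
    margin so that tiny differences are reported as "steady".
    """
    if one_min > fifteen_min + tolerance:
        return "rising"
    if one_min < fifteen_min - tolerance:
        return "falling"
    return "steady"


if __name__ == "__main__":
    # The example figures from this post: 1.00 (1 minute), 3.00 (15 minutes).
    print(load_trend(1.00, 3.00))
```

With the example figures from this post, the 1-minute average (1.00) is well below the 15-minute average (3.00), so the sketch reports the load as falling.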

The Linux load average is a rough estimate of how much work is currently running or waiting to run on a server.
