Checking System Performance - top, htop, And nmon

Ted LeRoy
A free video tutorial from Ted LeRoy
Enterprise Security Architect - Online Instructor

Lecture description

Ubuntu Server - Managing The System - top

In this lesson, you’ll learn how to see what processes are using system resources with top.

top is like a text-based version of Windows Task Manager. Like Task Manager, it refreshes every few seconds so you can see what’s going on on your system in real time.

top

The top command is very capable and flexible. Its man page is long enough that it even has its own table of contents!

We’ll look at basic usage and some of the things for you to be aware of when using it in this lesson.

Output

top refreshes every 3 seconds by default, constantly telling you what processes are the top users of CPU and Memory. 

You can see, it provides a LOT of information.

We’ll have a look at the output now.

Load average, toward the end of the first line, shows you the average system load over the last minute, 5 minutes, and 15 minutes.

You can see if this is increasing or decreasing over time.

A load average of 1.00 for a single CPU or single core system means the CPU is at full capacity. A value greater than 1.00 means things are getting queued and users could experience delays. On Linux the figure also counts processes blocked waiting on disk I/O, so a high load average does not always mean the CPU itself is busy.

If you have a dual-core processor, a number of 2.00 would mean both cores are at full capacity.

With four cores, 4.00 means the system is fully utilized.

These numbers may jump into the 1 to 10 range for brief periods when some system-intensive process, like a virus scan, kicks off. They should not stay above 1.0 (or 1.0 × number of cores) for extended periods.

If it jumps really high (254 for example), processes may start timing out and dying.
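
The per-core rule of thumb above can be checked with a few lines of shell. This is a minimal sketch, assuming a Linux system where /proc/loadavg exists and the nproc command is available; the function name load_per_core is made up for this example and is not part of top.

```shell
#!/bin/sh
# Divide a load average by the core count so the result can be read
# against the 1.00 "full capacity" mark described above.
# load_per_core is a hypothetical helper name, not a standard command.
load_per_core() {
    # $1 = load average, $2 = number of cores
    awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

# On Linux, /proc/loadavg starts with the 1-, 5-, and 15-minute averages.
if [ -r /proc/loadavg ]; then
    read -r one five fifteen rest < /proc/loadavg
    cores=$(nproc 2>/dev/null || echo 1)
    echo "cores: $cores"
    echo "1 min per core:  $(load_per_core "$one" "$cores")"
    echo "5 min per core:  $(load_per_core "$five" "$cores")"
    echo "15 min per core: $(load_per_core "$fifteen" "$cores")"
fi
```

A per-core value that stays above 1.00 across all three windows is the sustained condition the lesson warns about; a spike in the 1-minute figure alone is usually harmless.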

Line 2 shows how many tasks are running, and their status.

Possible statuses are:

  • Running - Process was running, or ready to run, at the last polling interval.
  • Sleeping - Normal state for a process that is waiting for input or some other event.
  • Stopped - Process has been suspended, for example with Ctrl+Z or a SIGSTOP signal, or is paused under a debugger. It stays stopped until it is resumed or killed.
  • Zombie - Process has exited, but its parent has not yet collected its exit status. These aren’t good, but aren’t something to worry about unless there are a lot of them. A growing number usually means the program that created them is not written well, so you may want to look into that. They go away once the parent collects them or the parent itself exits, and they will also be gone after a reboot.
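
To see the same tally outside of top, the state codes printed by ps can be counted. The following is a sketch only: count_states is a hypothetical helper name, and the ps call assumes the procps version of ps that ships with Ubuntu.

```shell
#!/bin/sh
# Tally process states from a list of state codes, one per line.
# Only the first letter matters: R running, S sleeping, D uninterruptible
# sleep, T stopped, Z zombie, I idle kernel thread.
# count_states is a hypothetical helper name.
count_states() {
    awk 'NF { n[substr($1, 1, 1)]++ } END { for (s in n) print s, n[s] }' | sort
}

# "stat=" prints the state column with no header line.
if command -v ps >/dev/null 2>&1; then
    ps -eo stat= 2>/dev/null | count_states
fi
```

A Z count that keeps growing between runs is the sign to go and find the parent program.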

Line 3 is CPU activity.  

us: user space. Time spent running normal-priority user-space code, meaning applications and services rather than the kernel.

sy: system space. Usually kernel routines doing their thing.

ni: time spent on user processes that have been niced to a lower priority.

id: inactivity. CPU idle time. Indicates processor isn’t doing anything.

wa: waiting. Time the CPU sat idle waiting for I/O to complete. Should always be low. If not, your storage may not be keeping up with the workload.

hi: hardware interrupt. Time the CPU spent servicing interrupts from hardware. Could be high at certain times (when reading from a CD or DVD) but should normally be low.

si: software interrupt. Time the CPU spent servicing software interrupts, which is deferred kernel work such as network packet processing. Should be low.

st: stolen time. On a virtual machine, such as one running on VMware ESXi Server, this is time the hypervisor gave to other VMs while this one was waiting for CPU. It can go up when many VMs are working hard at the same time.
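
The percentages on this line come from counters the kernel keeps in /proc/stat. As a rough sketch of the arithmetic (assuming Linux, and using a made-up helper name cpu_percentages), two samples of the aggregate cpu line can be turned into the same eight figures:

```shell
#!/bin/sh
# Turn two samples of the aggregate "cpu" line from /proc/stat into the
# percentages top shows. The fields after the label are cumulative ticks:
# user nice system idle iowait irq softirq steal.
# cpu_percentages is a hypothetical helper name.
cpu_percentages() {
    # $1 = earlier "cpu ..." line, $2 = later one
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { for (i = 2; i <= 9; i++) a[i] = $i }
        NR == 2 {
            for (i = 2; i <= 9; i++) { d[i] = $i - a[i]; total += d[i] }
            if (total == 0) total = 1
            printf "us %.1f sy %.1f ni %.1f id %.1f wa %.1f hi %.1f si %.1f st %.1f\n",
                100*d[2]/total, 100*d[4]/total, 100*d[3]/total, 100*d[5]/total,
                100*d[6]/total, 100*d[7]/total, 100*d[8]/total, 100*d[9]/total
        }'
}

# Take two samples one second apart and print the breakdown.
if [ -r /proc/stat ]; then
    first=$(grep '^cpu ' /proc/stat)
    sleep 1
    second=$(grep '^cpu ' /proc/stat)
    cpu_percentages "$first" "$second"
fi
```

Because the counters are cumulative since boot, only the difference between two samples says anything about what the CPU is doing right now.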

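Lines 4 and 5 show memory information. Keep an eye on swap usage. It should almost always be low. If it is consistently high, the system is using the disk in place of memory, which is many times slower, and you may need more RAM.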
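
For a quick check of swap without opening top, the figure can be worked out from /proc/meminfo. This is a small sketch assuming Linux; swap_used_kb is a name invented for the example.

```shell
#!/bin/sh
# Report swap in use, in kB, from /proc/meminfo-style input on stdin.
# Swap used = SwapTotal - SwapFree.
# swap_used_kb is a hypothetical helper name.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }'
}

if [ -r /proc/meminfo ]; then
    echo "swap used: $(swap_used_kb < /proc/meminfo) kB"
fi
```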

Under the headings is information about running processes.

You’re familiar with most of the headings.

Here are some you may not have seen before:

PR: priority - a low number means it will get some CPU time soon. rt means real time.

NI: niceness value. 

-20 very high priority (don’t assign this to a process manually. It may take over and not let other processes run).

19 very nice (very low priority). Process will be the last to receive CPU time.

Pressing the r key will let you renice, or change the niceness level of, a process you specify.
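
Niceness can also be set from the command line with nice and renice. The sketch below assumes Linux with /proc mounted and the util-linux renice found on Ubuntu; niceness_of is a hypothetical helper, and the sleep job is only a throwaway process to experiment on.

```shell
#!/bin/sh
# Read the niceness of a PID from /proc/<pid>/stat (Linux only).
# Everything up to the ") " that closes the command name is dropped first,
# which leaves niceness as the 17th remaining field.
# niceness_of is a hypothetical helper name.
niceness_of() {
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $17 }'
}

# Start a throwaway job with its niceness raised by 5 (lower priority).
nice -n 5 sleep 30 &
pid=$!
sleep 1   # give the job a moment to start before inspecting it
echo "niceness after nice:   $(niceness_of "$pid")"

# Make it nicer still. Raising niceness needs no special privileges;
# lowering it again normally requires root.
renice 10 -p "$pid" >/dev/null 2>&1 || echo "renice was not permitted"
echo "niceness after renice: $(niceness_of "$pid")"

kill "$pid"
```

Moving a value toward 19 is always safe to try. Moving it toward -20 is the change the warning above is about.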

VIRT: Total amount of virtual memory claimed by the process.

RES: Resident memory size. The physical memory the process is actually using.

SHR: Amount of shared memory.

S: Status - Similar to the statuses with ps.

%CPU and %MEM are the same as with ps, as is command.

Some things to watch closely if your system is having trouble are: 

Load average - should stay below 1.0 per core (i.e. below 2.0 for dual core, and below 4.0 for quad core). On the CPU line, us and sy are percentages, so watch for them staying so high that id sits near zero, and wa should be a low number.

See what processes are very high in CPU and memory utilization. If it’s a database server that is busy doing database transactions, then it’s normal for the database app to be at the top of the list.

If not, see what’s taking the CPU resources.
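
When you want that answer in a script or a log rather than on screen, ps output can be sorted the same way. This is a sketch, not the only way to do it: busiest is a hypothetical helper name, and the ps column list assumes the procps ps on Ubuntu.

```shell
#!/bin/sh
# Print the N busiest entries from "PID %CPU COMMAND" lines on stdin,
# highest %CPU first. N defaults to 5.
# busiest is a hypothetical helper name.
busiest() {
    LC_ALL=C sort -k2,2nr | head -n "${1:-5}"
}

# The trailing "=" on each column name suppresses the header line.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid=,pcpu=,comm= 2>/dev/null | busiest 5
fi
```

Note that ps reports %CPU averaged over each process’s lifetime, while top shows usage since its last refresh, so the two lists will not always agree.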

HTOP

htop is a top-like program, but it is not installed by default.

I like it and you may want to check it out. 

Just run:

sudo apt update

sudo apt install htop

And to run htop, just type htop.

NMON

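nmon is similar to top and htop, but shows several types of data, such as disk status along with process data. It is not installed by default either.

To install it, run:

sudo apt install nmon

To start it, just type nmon. Once it is running, press c for CPU, m for memory, and d for disk I/O. Press h to toggle the help screen.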

English [Auto]

In this lesson you'll learn to see what processes are using system resources with the top command. top is a flexible command with many, many possible uses. The man page has its own table of contents. So to run it, all you have to do is type top. Of course, this runs it with the default options. You can see the output refreshes about every three seconds by default, and we'll look at each of these options and what they mean.

The first field in the top row is uptime. So this system's been up for five days. It's had two users log in. Load average shows the average load on the CPU for one minute, then five minutes, and then 15 minutes. It's very low right now because this is just a virtual machine we're using for this lesson. But if it were a server, it could be actually doing things, like serving up web pages if it's a web server, or managing a database if it's a database server. This load average should always be below 1.0 if it's a single-processor system. If it's a dual-core processor you're running on, it should be below 2. For a quad core, it should be below 4. If it jumps up above one per processor for a short time, that's OK. If it stays up consistently, you could have a problem with your system, and you look at the load average in conjunction with some other things, like what process is using the most CPU and memory, to figure out what to look at and possibly kill.

The next line is tasks. So there are 106 total tasks running right now. One is running, 105 are sleeping. There are none that are stopped. Stopped would only be a really transient thing. You wouldn't see something stuck in stopped normally. If you saw one stopped, for example because a process was stopping, it would probably go away at the next refresh. And we have no zombies. Zombies, kind of like The Walking Dead, are processes that have stopped but can't be unloaded from memory. They weren't able to send their exit status to their parent when they were closing. It's usually due to bad programming. If you see a lot of zombies, you might want to take a look at what process started them. Could be a programming flaw in that program. Those will go away when you reboot your system, but if it's a programming problem that brought them on, they'll be back, hungry for brains.

Line 3 is the CPU activity. Again, it's very low because this system isn't really doing anything right now except running top. So us is for user space. These would be processes that you may have started or a regular user on the system may have started. sy is, of course, system space, usually just kernel routines doing their thing. ni is the time spent on low priority processes. id is idle time. Again, since our system's not doing much, it's mostly idle. It's hovering around 97 percent to 100 percent idle. wa is wait time. This should always be low. If not, have a look at your hardware. You could have a mismatch between your hard drives and your system. hi is hardware interrupt. This is time the CPU has spent communicating with hardware. It could be high at certain times, like if you're reading from a DVD drive, but should normally be quite low. si is software interrupt. This is time the CPU spent communicating with software, and this should generally be low. st is stolen time. If you have many virtual servers running and several are busy, they can steal time from each other so they can all be using the system efficiently. st will tell you how much of your time was stolen by other systems. st should be zero on a hardware install, but may go up some if a virtualization server is kind of busy.

Lines 4 and 5 show memory information. Your swap space utilization should always be fairly low. If swap usage is consistently high, you may need more memory, or you may be trying to run too many VMs on the system. The problem with high swap utilization is that you're using your hard drive in place of memory, which is many, many times slower and is very rough on your hard drive.

Below this bar we have information on the running processes. You're familiar with many of these, like PID and USER. Some that you haven't seen before are PR, which is priority. A low number means it will get some CPU time soon. rt means it's running real time, so you can see this process, migration/0, is real time. NI is for niceness value. If a process is nice, it doesn't put many demands on the system. It lets other processes run first. A low value here means it's not very nice to other processes and will take the system every time it gets a chance. It could take over the system if it's a heavy-use process and it's minus 20. You would not want to manually set a process to minus 20, especially if it could be a high-use process. 19 is very nice, or very low priority. So this khugepaged daemon is very nice and will let almost anything run before it. Use caution when changing the niceness level manually. It'll be covered in a different lesson. VIRT is the total amount of virtual memory claimed by a process. RES is resident memory size, and SHR is the amount of shared memory. S stands for status and is similar to the statuses we spoke about in ps. We have %MEM and %CPU again. You understand those. It's telling us how much CPU and memory each process is using. And then we have the command that was run to actually start the process.

Some things to watch closely if your system is having trouble are CPU. You want to make sure us and sy are consistently below 1 if it's a single-processor system, 2 if it's dual core, or 4 if it's a quad-core system. If they jump up once in a while, that's OK. You just don't want them to go up and stay there and be above one consistently. If they are, take a look at your system and your processes and try and figure out what's going on. You'll also want to see which processes are consistently high on your list. So if this is a database server and you see MySQL consistently up here, that's OK. It's a database server. If it's a database server and you see another service consistently above that, that could be a problem. You might look at what else your database server is running.

So that's it for top. top is installed by default on almost any Linux, Unix, or BSD system. top is a very powerful program, so please experiment with it and try it out. There's another free program you can check out, another open source program, called htop. It's not installed by default, so we'll have to install it. All you have to do to install it is type sudo apt update and then sudo apt install htop. To start it, all you have to do is type htop. You can see it gives kind of a graphical interface. The information is organized a little differently, and you have a menu down here at the bottom. It's kind of a modernized version of top. I don't often recommend downloading and installing new software in the course, but this is one that I like.

Another similar program is nmon. Again, nmon is not installed by default, so to get it you type sudo apt install nmon. Again, to start it you just type nmon. So I just typed c, then m, then d: c for CPU, m for memory, and d for disk I/O, so we can see all those statistics in an easy-to-read, nicely formatted manner. All you do is bring up help if you want to see how to bring something else up. You toggle into help by pressing h and toggle out by pressing h again. nmon and htop are two that I recommend you download and take a look at. Onto the next lesson.