How to Check Why a Process Has Stopped Unexpectedly in Linux


JB Benjamin
How-To-Guides from Kryotech

--


Why do Processes Die?

Unexpected behavior on a web server can at times be caused by system resource limitations, and memory is the usual culprit: processes consume it constantly, sometimes quite excessively. Linux by design aims to use all of the available physical memory as efficiently as possible. In practice, the kernel follows a basic rule: a page of free RAM is wasted RAM. The system holds a lot more in RAM than just application data, most importantly cached data from storage drives for faster access.

Unexpectedly Stopped Processes

Suddenly killed tasks are often the result of the system running out of memory, which is when the so-called out-of-memory killer, or ‘OOM killer’, makes its presence known. It is a bit like the infamous blue screen of death, or ‘BSoD’, that plagued early versions of Windows.

You can search the logs for out-of-memory alerts:

sudo grep -i -r 'out of memory' /var/log/

Grep searches recursively through all the logs in the directory. Note that the first match will likely be the command you just ran, since sudo records it in /var/log/auth.log. Actual indicators of an OOM-killed process would look something like this:
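Below is a reconstructed example of the kind of kernel log entry to look for; the exact wording varies between kernel versions and distributions, and the process name, PID, and score shown here are taken from the discussion that follows:

Out of memory: Kill process 9163 (mysqld) score 511 or sacrifice child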

The log noted here shows that the killed process was mysqld, with PID 9163 and an OOM score of 511 at the time it was killed. Messages and their content will vary depending on the Linux distribution you’re using. If a process crucial to your web application was killed as a result of an OOM situation, you have a few options: reduce the amount of memory requested by the process, disallow processes from overcommitting memory, or simply add more memory to your server configuration.

Current Resource Usage

Linux comes with a number of awesome tools for tracking processes that can help with identifying potential resource shortfalls:

free -h
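As a rough illustration, on older systems (procps versions before 3.3.10) the output includes a ‘-/+ buffers/cache’ line; the figures below are purely illustrative, chosen to match the discussion that follows:

             total       used       free     shared    buffers     cached
Mem:          993M       738M       255M       5.5M        66M       438M
-/+ buffers/cache:       234M       759M
Swap:           0B         0B         0B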

Here it’s important to make the distinction between application-used memory, buffers, and caches. On the ‘Mem’ line of the output it would appear that nearly 75% of our RAM is in use, but over half of that used memory is occupied by cached data.

The difference is that while applications reserve memory for their own use, the cache is commonly used hard drive data that the kernel stores temporarily in memory for faster access; at the application level, this cache is considered free memory.

Keeping that in mind, it’s easier to understand why used and free memory are listed twice; on the second line is the actual memory usage when taking into account the amount of memory occupied by buffers and cache.

In this example, the system is using merely 234MB of the total available 993MB, and no process is in immediate danger of being killed to save resources.

Another useful tool for memory monitoring is ‘top’, which displays continuously updated information about processes’ memory and CPU usage, runtime, and other statistics. It’s particularly useful for identifying resource-intensive tasks:

top

On a healthy system, top will show nominal memory and CPU usage, with no single process monopolizing either.
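If you’re on a newer version of top (procps-ng), you can sort the process list by memory consumption straight from launch, which brings the heaviest consumers to the top of the list; pressing Shift+M within a running session does the same:

top -o %MEM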

Check if Your Processes are at Risk

If your server’s memory gets used up to the extent that system stability is threatened, the out-of-memory killer will choose which process to eliminate based on a number of variables, such as the amount of work that would be lost and the total memory that would be freed. Linux keeps a score for each running process, which represents how likely that process is to be killed in an OOM situation.

This score is stored in /proc/<pid>/oom_score, where <pid> is the identification number of the process you’re looking into. The PID can be found easily using the following command:

ps aux | grep <process name>

When searching for MySQL, for example, the command would be similar to the following:
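Here, using the mysqld process and the PID 9163 from the earlier log entry:

ps aux | grep mysql

With the PID in hand, you can read the score directly:

cat /proc/9163/oom_score

The higher the score, the more likely the process is to be the OOM killer’s first target.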

Next Steps

After completing the above steps, consider whether to disable the memory ‘overcommit’ behavior that is the default in the Linux kernel. By default, the kernel allows processes to request more memory than is currently free in the system in order to improve memory utilization, based on the heuristic that processes never truly use all the memory they request. However, if your system is at risk of running out of memory and you wish to prevent losing tasks to the OOM killer, it’s possible to disable memory overcommit. And let’s not forget: if you’re running low on usable memory, you can always consider upgrading your web server hardware.
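As a minimal sketch, overcommit can be switched off with the vm.overcommit_memory sysctl. Be careful: strict accounting (mode 2) can cause allocations to fail outright on a busy server, so test this before relying on it in production:

sudo sysctl vm.overcommit_memory=2
sudo sysctl vm.overcommit_ratio=100

The vm.overcommit_ratio setting caps the commit limit at swap plus, in this case, 100% of physical RAM. To persist the change across reboots, add the same settings to /etc/sysctl.conf.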

Conclusion

Overconsumption of system resources can be an indicator of a number of things, including an actual attack by bad actors. Be aware, take precautions, and stay secure.
