Thursday, 12 September 2013

Enable Core File Dumps for Application Crashes or Segmentation Faults

Core dumps are a very useful debugging tool for system administrators, especially when applications like Apache or MySQL crash. When an application is poorly written or has a bug, it may try to access a memory location that it is not supposed to, and a segmentation fault occurs.

On Red Hat and its clones, core file creation is disabled by default. You can verify this with:

# ulimit -c
0

Here zero is the maximum size of a core file, so no core file will be created. Follow the steps below to enable core dumps.
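ulimit -c shows the soft limit, which is the one applied when a core file is written; ulimit -H -c shows the hard limit, the ceiling up to which an ordinary user can raise the soft limit. In bash both values are in kilobytes. A small helper that reports both for the current shell might look like this (core_limit_status is a made-up name for this example):

```shell
#!/usr/bin/env bash
# Report whether the current shell (and anything it starts) can write core files.
# core_limit_status is a hypothetical helper name used only in this example.
core_limit_status() {
  local soft hard
  soft=$(ulimit -S -c)   # limit applied when a core file is written
  hard=$(ulimit -H -c)   # ceiling for raising the soft limit without root
  if [ "$soft" = "0" ]; then
    echo "disabled (soft=0, hard=$hard)"
  else
    echo "enabled (soft=$soft, hard=$hard)"
  fi
}

core_limit_status
```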

Step 1: Set a limit for the core dump file size

You can restrict core dump files to 100 MB. The limit must be given in kilobytes, so 100 MB is 102400 KB. Edit /etc/security/limits.conf:

# vim /etc/security/limits.conf

Replace the line

#*               soft    core            0

with

*               soft    core            102400
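If you prefer to script this edit, a sed substitution can do the replacement. This is only a sketch of one way to automate it (set_core_limit_conf is a hypothetical name); it assumes GNU sed and the commented-out default line shown above, so try it on a copy of the file first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn the commented-out "core" line in a limits.conf-style
# file into an active soft limit of SIZE_KB kilobytes. Requires GNU sed (-i, -E).
set_core_limit_conf() {
  local file=$1 size_kb=$2
  # Matches lines such as "#*   soft   core   0" or "*   soft   core   51200",
  # keeping the original whitespace between the columns.
  sed -i -E "s/^#?\*([[:space:]]+)soft([[:space:]]+)core([[:space:]]+)[0-9]+[[:space:]]*\$/*\1soft\2core\3${size_kb}/" "$file"
}

# Usage (as root): set_core_limit_conf /etc/security/limits.conf 102400
```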

Step 2: Set limit for all the users

Next, change the limit in /etc/profile. On RHEL you will find the following line:

# vim /etc/profile
ulimit -S -c 0 > /dev/null 2>&1

Replace that ulimit command in /etc/profile with the following:

ulimit -S -c 102400 > /dev/null 2>&1

If you don't find this line in your Red Hat clone, don't worry; just add it to /etc/profile.
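The replace-or-append logic of this step can also be scripted. The sketch below assumes the existing line starts at the beginning of a line, as on RHEL; set_profile_core_limit is a hypothetical name. It uses awk rather than sed because the "&" in "2>&1" has a special meaning in a sed replacement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make FILE (normally /etc/profile) set the soft core limit
# to SIZE_KB, replacing an existing "ulimit -S -c ..." line or appending a new one.
# Assumes the existing line starts at the beginning of the line, as on RHEL.
set_profile_core_limit() {
  local file=$1 size_kb=$2 tmp
  local line="ulimit -S -c ${size_kb} > /dev/null 2>&1"
  if grep -q '^ulimit -S -c ' "$file"; then
    tmp=$(mktemp) || return 1
    # awk instead of sed: "&" in the new line would be special in a sed replacement.
    awk -v line="$line" '/^ulimit -S -c / { print line; next } { print }' "$file" > "$tmp" &&
      cat "$tmp" > "$file"   # writing through cat keeps the file's owner and mode
    rm -f "$tmp"
  else
    printf '%s\n' "$line" >> "$file"
  fi
}

# Usage (as root): set_profile_core_limit /etc/profile 102400
```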

Step 3: Enable core dumps for all daemons

To enable core dumps for all daemons started by the init scripts, edit the /etc/sysconfig/init file and add the following:
# vim /etc/sysconfig/init

DAEMON_COREFILE_LIMIT=102400

If you want to enable core dumps for one particular application only, set it in that application's file in the /etc/sysconfig/ directory (for example, /etc/sysconfig/httpd for Apache).
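The init scripts apply DAEMON_COREFILE_LIMIT by raising the soft core limit in the shell that starts the daemon, so the daemon inherits it. The function below is a simplified illustration of that idea, not the actual code in /etc/init.d/functions; start_with_core_limit is a hypothetical name.

```shell
#!/usr/bin/env bash
# Simplified illustration (not the real /etc/init.d/functions code): start a
# command with its soft core limit set from DAEMON_COREFILE_LIMIT, defaulting to 0.
start_with_core_limit() {
  (
    # The subshell keeps the limit change away from the caller's shell.
    ulimit -S -c "${DAEMON_COREFILE_LIMIT:-0}" || exit 1
    exec "$@"   # the command replaces the subshell and inherits the limit
  )
}

# Usage: DAEMON_COREFILE_LIMIT=102400 start_with_core_limit /usr/sbin/httpd
```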

Now edit the /etc/sysctl.conf file and add the following:

# vim /etc/sysctl.conf

kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
fs.suid_dumpable = 2

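By default, when an application crashes or is killed by a signal, the core dump file is named core and is written to the process's current working directory. The kernel.core_pattern setting above writes it to /tmp instead, using a file name template that can contain % specifiers, which are substituted by the following values when a core file is created. The fs.suid_dumpable = 2 setting allows processes that have changed their user or group ID (for example, daemons that start as root and then switch to an unprivileged user) to dump core as well; those core files are owned by root.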

%% - A single % character
%p - PID of dumped process
%u - real UID of dumped process
%g - real GID of dumped process
%s - number of the signal causing dump
%t - time of dump (seconds since 00:00:00 UTC, 1 January 1970)
%h - hostname
%e - executable file name (without the path)
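To see how the specifiers combine, here is a small shell function that expands a pattern using the specifiers listed above. It is only an illustration: expand_core_pattern and its argument order are made up, and unknown specifiers are simply dropped in this sketch.

```shell
#!/usr/bin/env bash
# Illustration only: expand a core_pattern template for the specifiers above.
# Arguments: PATTERN EXE SIGNAL UID GID PID TIME HOSTNAME
expand_core_pattern() {
  local pattern=$1 exe=$2 sig=$3 uid=$4 gid=$5 pid=$6 time=$7 host=$8
  local out="" spec i=0
  while [ "$i" -lt "${#pattern}" ]; do
    if [ "${pattern:i:1}" != "%" ]; then
      out+=${pattern:i:1}      # ordinary character: copy it unchanged
      i=$((i + 1))
      continue
    fi
    spec=${pattern:i+1:1}      # the character after the %
    case $spec in
      %) out+=% ;;
      p) out+=$pid ;;
      u) out+=$uid ;;
      g) out+=$gid ;;
      s) out+=$sig ;;
      t) out+=$time ;;
      h) out+=$host ;;
      e) out+=$exe ;;
      *) ;;                    # unknown specifier: dropped in this sketch
    esac
    i=$((i + 2))
  done
  printf '%s\n' "$out"
}

expand_core_pattern '/tmp/core-%e-%s-%u-%g-%p-%t' httpd 3 0 0 4277 1379034432 myhost
# prints /tmp/core-httpd-3-0-0-4277-1379034432
```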

Reload the settings in /etc/sysctl.conf by running the following command
# sysctl -p
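After sysctl -p you can confirm that the running kernel picked up both values. sysctl kernel.core_pattern fs.suid_dumpable prints them, and the same values are readable under /proc/sys; the helper name show_core_settings below is just for this example.

```shell
#!/usr/bin/env bash
# Print the live values of the two settings added to /etc/sysctl.conf.
# Same information as: sysctl kernel.core_pattern fs.suid_dumpable
# show_core_settings is a hypothetical helper name.
show_core_settings() {
  printf 'kernel.core_pattern = %s\n' "$(cat /proc/sys/kernel/core_pattern)"
  printf 'fs.suid_dumpable = %s\n' "$(cat /proc/sys/fs/suid_dumpable)"
}

show_core_settings
```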

Step 4: Test that a core dump file is created on a crash

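Now I'm going to kill Apache with the SIGQUIT signal, whose default action is to terminate the process and dump core. Restart Apache first (# service httpd restart) so that it runs with the new core file limit. Don't try this in a production environment.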

# kill -s SIGQUIT `cat /var/run/httpd/httpd.pid`

Now you can see that a core dump file has been created in the /tmp directory.

# ls -l /tmp
-rw-------  1 root  root  5431296 Sep 12 21:07 core-httpd-3-0-0-4277-1379034432

Here httpd is the name of the application that was killed by SIGQUIT, 3 is the signal number, the two zeros are the UID and GID of root, 4277 is the PID, and 1379034432 is the time of the crash in seconds since 1 January 1970.
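Going the other way, a file name produced by this pattern can be split back into its fields. This sketch assumes exactly the pattern /tmp/core-%e-%s-%u-%g-%p-%t set above, bash, and GNU date; parse_core_name is a made-up name.

```shell
#!/usr/bin/env bash
# Split a core file name created with kernel.core_pattern = /tmp/core-%e-%s-%u-%g-%p-%t
# back into its fields. Assumes exactly that pattern; parse_core_name is hypothetical.
parse_core_name() {
  local name=${1##*/}   # drop the directory part
  local re='^core-(.+)-([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)-([0-9]+)$'
  if [[ ! $name =~ $re ]]; then
    echo "not a core file name for this pattern: $name" >&2
    return 1
  fi
  echo "exe=${BASH_REMATCH[1]}"
  echo "signal=${BASH_REMATCH[2]} ($(kill -l "${BASH_REMATCH[2]}"))"   # 3 -> QUIT
  echo "uid=${BASH_REMATCH[3]}"
  echo "gid=${BASH_REMATCH[4]}"
  echo "pid=${BASH_REMATCH[5]}"
  # GNU date turns seconds since the epoch into a readable UTC timestamp.
  echo "time=$(date -u -d "@${BASH_REMATCH[6]}" '+%Y-%m-%d %H:%M:%S UTC')"
}

parse_core_name /tmp/core-httpd-3-0-0-4277-1379034432
```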

Step 5: Analyse the core dump file

You need gdb (the GNU Debugger) to analyse the core dump file. If gdb is not installed on your system, install it:

# yum install gdb

Use the gdb command as follows:
# gdb <path of the application> <path of the core dump file>

To find the path of the application, use which:

# which httpd
/usr/sbin/httpd

Now run gdb:

# gdb /usr/sbin/httpd /tmp/core-httpd-3-0-0-4277-1379034432
Core was generated by `/usr/sbin/httpd'.
Program terminated with signal 3, Quit.

You may get an error message like "Missing separate debuginfos, use: debuginfo-install httpd-2.2.15-28.el6.centos.i686". In that case you need the yum-utils package, which provides the debuginfo-install command. Check whether yum-utils is installed on your system:

# rpm -qa | grep yum-utils
yum-utils-1.1.30-14.el6.noarch

If it is not, install it, then run the debuginfo-install command that gdb suggested. If you still get an error message like "Could not find debuginfo for main pkg: httpd-2.2.15-28.el6.centos.i686", edit the file /etc/yum.repos.d/CentOS-Debuginfo.repo and set enabled=1. If you use another Red Hat clone, use the appropriate debuginfo repo file.
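Once the debuginfo packages are installed, the most useful first command inside gdb is usually bt, which prints the call stack. You can also get backtraces without an interactive session; the wrapper below is a sketch around gdb's -q, -batch and -ex options, and core_backtrace is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print backtraces of every thread in a core file without an interactive gdb session.
# core_backtrace is a hypothetical wrapper; it needs gdb installed.
core_backtrace() {
  local binary=$1 core=$2
  if [ ! -x "$binary" ] || [ ! -r "$core" ]; then
    echo "usage: core_backtrace <path of the application> <path of the core dump file>" >&2
    return 1
  fi
  # -q hides the banner; -batch exits after running the -ex command.
  gdb -q -batch -ex 'thread apply all bt' "$binary" "$core"
}

# Usage: core_backtrace /usr/sbin/httpd /tmp/core-httpd-3-0-0-4277-1379034432
```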

Tuesday, 10 September 2013

Processes, Daemons, Signals and Services

Process - A process is a running program. Each process is uniquely identified by a number called a process ID (PID). Like a file, each process has an owner and a group, and the owner and group permissions determine which files and devices the process can open. Init, the parent of all processes, is the first process to start at boot time and has a PID of 1. A process can be running (R), sleeping (S), stopped (T) or a zombie* (Z). You can find the state of a process in the STAT field of the ps command or in the S field of the top command. To find zombie processes:

# ps aux | grep Z
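grep Z matches any line of ps output that contains a capital Z anywhere, including in command names. A more precise sketch reads the state field straight from /proc/PID/stat and also prints the parent PID; list_zombies is a hypothetical helper and assumes a Linux /proc filesystem.

```shell
#!/usr/bin/env bash
# List zombie (state Z) processes as "PID PPID COMMAND" by reading /proc (Linux only).
# list_zombies is a hypothetical helper name.
list_zombies() {
  local stat_file stat comm
  for stat_file in /proc/[0-9]*/stat; do
    stat=$(cat "$stat_file" 2>/dev/null) || continue   # the process may have exited
    comm=${stat#*"("}; comm=${comm%")"*}                # command name sits inside (...)
    set -- ${stat##*") "}                               # $1 = state, $2 = PPID, ...
    if [ "$1" = "Z" ]; then
      echo "${stat%% *} $2 $comm"
    fi
  done
}

list_zombies
```

To produce a zombie for testing, sh -c 'sleep 0.5 & exec sleep 5' works: the shell execs sleep 5, which never reaps the finished sleep 0.5 child.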

Daemon - A daemon is a process that runs in the background and has no controlling terminal.

Signal - A signal is a notification sent to a process, or to a specific thread within the process, to tell it that an event occurred. Signals are used for communication between user processes and from the kernel to user processes. You can communicate with a daemon or any running process by sending it a signal with the kill command.

Signal names start with SIG, and each signal also has a number; on Linux the numbers lie between 1 and 64. The kill -l command displays every signal with its number and name, while fuser -l lists only the signal names.

# kill -l

 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
.
.
.
# fuser -l
HUP INT QUIT ILL TRAP ABRT IOT BUS FPE KILL USR1 SEGV USR2 PIPE ALRM TERM
STKFLT CHLD CONT STOP TSTP TTIN TTOU URG XCPU XFSZ VTALRM PROF WINCH IO PWR SYS UNUSED
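kill -l also translates in the other direction. In bash, kill -l QUIT prints 3, kill -l 3 prints QUIT, and a number above 128 is treated as an exit status (128 plus the signal number, which is what the shell reports for a process killed by a signal) and mapped back to the signal name. A small wrapper around the bash builtin, with a made-up name:

```shell
#!/usr/bin/env bash
# Translate between signal names and numbers with the bash kill builtin.
# describe_signal is a hypothetical helper for this example.
describe_signal() {
  local arg=${1#SIG}   # accept SIGQUIT as well as QUIT
  case $arg in
    *[!0-9]*) echo "$arg = $(kill -l "$arg")" ;;     # name -> number
    *)        echo "$arg = SIG$(kill -l "$arg")" ;;  # number or exit status -> name
  esac
}

describe_signal SIGQUIT   # QUIT = 3
describe_signal 9         # 9 = SIGKILL
describe_signal 131       # 131 = SIGQUIT (exit status of a process killed by SIGQUIT)
```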

Each signal has one of five default dispositions:

1. Term (terminate the process)
2. Ign (ignore the signal)
3. Core (terminate the process and dump core)
4. Stop (stop the process)
5. Cont (continue the process if it is currently stopped)
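These dispositions are documented in the signal(7) manual page. As a quick reference for the signals discussed below, here they are as a lookup function; the table is transcribed by hand for common signals only, and default_action is a hypothetical name.

```shell
#!/usr/bin/env bash
# Default action of common signals, transcribed from the signal(7) man page.
# default_action is a hypothetical helper; see signal(7) for the full list.
default_action() {
  case ${1#SIG} in
    HUP|INT|KILL|USR1|USR2|PIPE|ALRM|TERM) echo Term ;;
    QUIT|ILL|TRAP|ABRT|BUS|FPE|SEGV)       echo Core ;;
    CHLD)                                  echo Ign ;;
    STOP|TSTP|TTIN|TTOU)                   echo Stop ;;
    CONT)                                  echo Cont ;;
    *) echo "unknown (see signal(7))"; return 1 ;;
  esac
}

default_action SIGQUIT   # Core: this is why SIGQUIT leaves a core file
```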

The Signals every system admin should know!

1 SIGHUP This signal (hang up) is sent to a process when its controlling terminal is closed, for example when the user closes the terminal window or logs out. Default handler for this signal will terminate your program. You can make a process immune to SIGHUP with the nohup command, so that it keeps running after the user logs out. Because daemons have no controlling terminal, many of them treat SIGHUP as a request to re-read their configuration files; for init, it has the same effect as running init q.

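After changing a web server's configuration file, the web server needs to be told to re-read its configuration. Stopping and starting Apache would result in a brief outage. Instead, send the daemon the SIGHUP signal. For Apache, SIGHUP is the same as # apachectl -k restart: the parent process stays up, re-reads its configuration and replaces its child processes immediately. A graceful restart (# /etc/init.d/httpd graceful), which lets the children finish their current requests first, uses SIGUSR1 instead.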
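A daemon supports this convention by installing a handler for SIGHUP. In shell the equivalent is trap; the sketch below uses a hypothetical config file and load_config function, and sends SIGHUP to itself (as you would with kill -HUP PID) to show the handler firing.

```shell
#!/usr/bin/env bash
# Minimal sketch of the "re-read configuration on SIGHUP" convention.
# hup_reload_demo, load_config and the config file are hypothetical.
hup_reload_demo() {
  local conf=$1
  (
    load_config() { greeting=$(cat "$conf"); }
    trap load_config HUP          # run load_config whenever SIGHUP arrives
    load_config
    echo "before: $greeting"
    echo "Hello again" > "$conf"  # the administrator edits the file...
    kill -s HUP "$BASHPID"        # ...and sends SIGHUP to this process
    echo "after: $greeting"
  )
}

# Usage: hup_reload_demo /tmp/demo.conf   (the file must exist)
```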

2 SIGINT This signal is sent by the kernel to your application when a user tries to end it by pressing Ctrl+C (mostly when a process freezes). It is a request to terminate the current operation. Most programs either catch the signal and stop, or simply let themselves be killed, which is the default if the signal is not caught. Default handler for this signal will terminate your program.

3 SIGQUIT This signal is sent when the user presses Ctrl+\ and can be used to stop a process that could not be killed with SIGINT. Default handler for this signal will terminate the process and dump a core file.

6 SIGABRT This is yet another way to terminate your program, used as an emergency stop. The abort() function raises SIGABRT, which terminates your program. It is normally initiated by the process itself when it detects an internal error, or by a debugging process. Default handler for this signal will terminate your program and leave a core file for debugging purposes.

9 SIGKILL This signal terminates the program ungracefully; the program gets no chance to save open files or clean up. This signal cannot be caught or ignored by a process. This is the “I do not care what you are doing, stop right now” signal. Sending SIGKILL to a process will usually stop that process there and then. Default handler for this signal will terminate your program.

10 SIGUSR1 This is a general purpose signal available for programs to use in whatever way they’d like. For example, the Apache web server interprets the SIGUSR1 signal as a request to gracefully restart.

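11 SIGSEGV If an application is badly written and tries to access memory that it is not supposed to, the kernel sends the process the segmentation violation signal. This is often caused by reading off the end of an array or by dereferencing an invalid pointer. Default handler for this signal will terminate your program and dump core.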

12 SIGUSR2 This is also a general purpose signal available for programs to use in whatever way they’d like. You may use this signal to synchronize your program with some other program or to communicate with it.
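As an example of using SIGUSR2 for synchronization, the sketch below starts a hypothetical background worker that sends SIGUSR2 to its parent once it has finished initializing, so the parent waits for that signal instead of guessing with a fixed sleep. The function name and the sleeps standing in for real work are made up; a timeout keeps the parent from waiting forever.

```shell
#!/usr/bin/env bash
# Sketch: a parent waits until a background worker reports "ready" with SIGUSR2.
# wait_for_ready_worker is hypothetical; the sleeps stand in for real work.
wait_for_ready_worker() {
  (
    ready=0 tries=0 parent=$BASHPID
    trap 'ready=1' USR2           # installed before the worker starts
    (
      sleep 0.2                   # pretend to initialize
      kill -s USR2 "$parent"      # tell the parent we are ready
      sleep 1                     # pretend to do the real work
    ) &
    # Poll in short steps so the trap can run between commands; give up after ~5 s.
    while [ "$ready" -eq 0 ] && [ "$tries" -lt 50 ]; do
      sleep 0.1
      tries=$((tries + 1))
    done
    if [ "$ready" -eq 1 ]; then echo "worker is ready"; else echo "timed out"; fi
    wait                          # reap the worker before returning
  )
}

wait_for_ready_worker
```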

15 SIGTERM This signal asks the program to terminate gracefully: the program can close any log files it has open and attempt to finish what it is doing before shutting down. A process can also catch or ignore SIGTERM, for example if it is in the middle of a task that cannot be interrupted. This is the default signal sent by the kill command. Default handler for this signal will terminate your program.
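The graceful shutdown only happens if the program catches SIGTERM. In shell this is done with trap; in the sketch below a hypothetical worker records its PID in a file and removes that file when it receives SIGTERM, then exits with status 143 (128 + 15), the same status the default action would produce.

```shell
#!/usr/bin/env bash
# Sketch of graceful SIGTERM handling; graceful_worker and its PID file are hypothetical.
graceful_worker() {
  local pidfile=$1
  (
    # Install the handler first so a SIGTERM can never arrive before it exists.
    trap 'rm -f "$pidfile"; echo "cleaned up" >&2; exit 143' TERM
    echo "$BASHPID" > "$pidfile"
    while :; do
      sleep 1                  # pretend to do some work
    done
  )
}

# Usage: graceful_worker /tmp/worker.pid &    later: kill "$(cat /tmp/worker.pid)"
```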

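17 SIGCHLD The kernel sends this signal to a process when one of its child processes stops or terminates. You can use it to try to clear a zombie process: first find the zombie's parent PID (PPID), then send the parent the SIGCHLD signal with kill -17 PPID. This only helps if the parent's SIGCHLD handler calls wait() to collect the child's exit status; otherwise the zombie remains until the parent exits, at which point init adopts and reaps it. Default handler for this signal will ignore the signal.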
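Finding the parent and sending it SIGCHLD can be scripted. This sketch reads the PPID from /proc/PID/stat (Linux only); ppid_of and nudge_zombie_parent are hypothetical names, and the nudge only helps if the parent's SIGCHLD handler calls wait().

```shell
#!/usr/bin/env bash
# Print the parent PID of a process, read from /proc/PID/stat (Linux only).
ppid_of() {
  local stat
  stat=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
  set -- ${stat##*") "}        # fields after "(command) ": state, PPID, ...
  echo "$2"
}

# Hypothetical helper: ask the parent of zombie ZPID to reap it (like kill -17 PPID).
nudge_zombie_parent() {
  local ppid
  ppid=$(ppid_of "$1") || return 1
  echo "sending SIGCHLD to parent $ppid"
  kill -s CHLD "$ppid"
}

# Usage: nudge_zombie_parent 12345   (12345 being a PID reported in state Z)
```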

18 SIGCONT If a process has been suspended by a SIGSTOP or SIGTSTP signal, it continues its execution when it receives a SIGCONT signal. Default handler for this signal will continue your process, if stopped.

19 SIGSTOP This signal suspends (stops) a process; the process stays stopped until it receives a SIGCONT signal. Default handler for this signal will stop your process.

20 SIGTSTP Both SIGTSTP and SIGSTOP are designed to suspend a process, which will eventually be resumed with SIGCONT. The main differences are that SIGSTOP is sent programmatically (for example, kill -STOP pid) and cannot be caught or ignored, while SIGTSTP is usually sent by the terminal when the user presses Ctrl+Z, and a program can catch or ignore it. Default handler for this signal will stop your process.
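You can watch SIGSTOP and SIGCONT at work on a throwaway process. The sketch starts a background sleep, stops it, resumes it, and reads the state letter (T for stopped, S for sleeping) from /proc after each step; state_of, wait_for_state and stop_cont_demo are hypothetical names, and /proc is Linux-specific.

```shell
#!/usr/bin/env bash
# Print the one-letter state of a process (R, S, T, Z, ...) from /proc/PID/stat.
state_of() {
  local stat
  stat=$(cat "/proc/$1/stat") || return 1
  set -- ${stat##*") "}
  echo "$1"
}

# Wait (up to ~2 s) until process $1 reaches state $2, then print its state.
wait_for_state() {
  local i
  for i in $(seq 20); do
    [ "$(state_of "$1")" = "$2" ] && break
    sleep 0.1
  done
  state_of "$1"
}

stop_cont_demo() {
  local pid
  sleep 30 &
  pid=$!
  kill -s STOP "$pid"
  echo "after SIGSTOP: $(wait_for_state "$pid" T)"
  kill -s CONT "$pid"
  echo "after SIGCONT: $(wait_for_state "$pid" S)"
  kill "$pid"                       # SIGTERM: clean up the sleep
  wait "$pid" 2>/dev/null || true
}

stop_cont_demo
```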

Services - In Windows, daemons are called services. You can manage them by typing services.msc at the command prompt, which opens the Services console. Linux has a command called /sbin/service, which runs the init scripts located in /etc/init.d/; for example, service SCRIPT start runs /etc/init.d/SCRIPT start.

* A zombie process or defunct process is a process that has completed execution but still has an entry in the process table, because its parent process has not yet collected its exit status with the wait() system call. Zombies that linger are usually caused by bugs or coding errors in the parent. A zombie process is different from an orphan process. An orphan process is a process that is still executing but whose parent process has died; orphans are adopted by init.

Note 1
The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored. SIGKILL destroys the receiving process, and SIGSTOP suspends its execution until a SIGCONT signal is received. SIGCONT can be caught, blocked, or ignored, but a stopped process is always resumed when it receives SIGCONT.

Note 2
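You can view the key combinations that send specific signals to the foreground process with the “stty -a” command, as shown below. For example, intr (^C) sends SIGINT, quit (^\) sends SIGQUIT and susp (^Z) sends SIGTSTP.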

# stty -a
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q;
stop = ^S; susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; flush = ^O;