Abstract: | In UNIX terminal sessions, you usually have a key like C-c (Control-C) to immediately end whatever program you have running in the foreground. This should work even when the program you called has called other programs in turn. Everything should be aborted, giving you your command prompt back, no matter how deep the call stack is. Basically, it's trivial. But the existence of interactive applications that use SIGINT and/or SIGQUIT for purposes other than a complete immediate abort makes matters complicated and, as was to be expected, left us with several ways to solve the problems. Of course, existing shells and applications follow different ways. This web page outlines the different ways to solve the problem and argues that only one of them can do everything right, although it means that we have to fix some existing software. |
---|---|
Intended audience: | Programmers who implement programs that catch SIGINT/SIGQUIT. Programmers who implement shells or shell-like programs that execute batches of programs. Users who have problems getting rid of runaway shell scripts using C-c. |
Required knowledge: | You have to know what it means to catch SIGINT or SIGQUIT and how processes wait for other processes (children) they spawned. |
You may change the key that triggers the signal using stty. Running programs may remap the SIGINT-sending key at any time they like, without your intervention and without asking you first.
The usual reaction of a running program to SIGINT is to exit. However, not all programs exit on SIGINT. Programs are free to use the signal for other actions or to ignore it altogether.
All programs running in the foreground receive the signal. This may be a nested "stack" of programs: You started a program that started another and the outer is waiting for the inner to exit. This nesting may be arbitrarily deep.
The innermost program is the one that decides what to do on SIGINT. It may exit, do something else or do nothing. Still, when the user hits the SIGINT key, all the outer programs are woken up, get the signal and may react to it.
Let us consider the most basic script:

    #! /bin/sh
    program1
    program2

and the usual run looks like this:

    $ sh myscript
    [output of program1]
    [output of program2]
    $

Let us assume that both programs do nothing special on SIGINT; they just exit when you hit Control-C.
Now imagine the user hits C-c while the shellscript is executing its
first program. The following programs receive SIGINT:
1) program1 and
2) also the shell executing the script.
Program1 exits on that SIGINT, so much is clear.
But what should the shell do? If we say that it is only the innermost program's business to react to SIGINT, the shell will do nothing special (not exit) and it will continue the execution of the script and run program2. But this is wrong: The user's intention in hitting C-c is to abort the whole script, to get the command prompt back. If he hits C-c while the first program is running, he does not want program2 to be even started.
Here is what would happen if the shell didn't do anything:

    $ sh myscript
    [first half of program1's output]
    C-c   [user presses C-c]
    [second half of program1's output will not be displayed]
    [output of program2 will appear]

Consider a more annoying example:

    #! /bin/sh
    # let's assume there are 300 *.dat files
    for file in *.dat ; do
        dat2ascii "$file"
    done

If your shell did not exit when the user hits C-c, C-c would just end one dat2ascii run and the script would continue. Thus, you would have to hit C-c up to 300 times to end this script.
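There are several ways to handle abortion of shell scripts when SIGINT is received while a foreground child runs:
1) The shell does nothing special: it does not exit and continues with the next command of the script, as in the examples above.
2) The shell exits immediately when it receives SIGINT, without waiting for the child. This is called "IUE" below.
3) The shell remembers that it received SIGINT and exits after the current foreground child has exited. This is called "WUE" below.
4) The shell waits for the child and exits only if the child reports that it was killed by SIGINT. This is called "WCE" below.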
There are programs that use the signal SIGINT for other purposes than exiting. They use it as a normal keystroke. The user is expected to use the key that sends SIGINT during a perfectly normal program run. As a result, the user sends SIGINT in situations where he/she does not want the program or the script to end.
The primary example is the Emacs editor: C-g does what ESC does in other applications: It cancels a partially executed or prepared operation. Technically, Emacs remaps the key that sends SIGINT from C-c to C-g and catches SIGINT.
Remember that the SIGINT is sent to all programs running in the foreground. If Emacs is executing from a shell script, both emacs and the shell get SIGINT. Emacs is the program that decides what to do: Exit on SIGINT or not. Emacs decides not to exit. The problem arises when the shell draws its own conclusions from receiving SIGINT without consulting Emacs for its "opinion" on the matter of exiting.
Consider this script:

    #! /bin/sh
    emacs /tmp/foo
    # this is supposed to be called when the user is finished editing foo
    cp /tmp/foo /home/user/mail/sent

If C-g is used in Emacs, both the shell and Emacs will have received SIGINT. Emacs will not exit: the user used C-g as a normal editing keystroke and does not want the script to be aborted on C-g.
The central problem is that the second command (cp) may unintentionally be skipped when the shell draws its own conclusion about the user's intention. The innermost program is the only one to judge. If the shell decides "hey, I have seen SIGINT, so I exit after this program", then your cp command will never be issued and you lose this file.
Imagine a mail session using a curses mailer in a tty. You called your mailer and started to compose a message. Your mailer calls emacs. C-g is a normal editing key in emacs. Technically it sends SIGINT to every program in the foreground: emacs, the shell that the mailer started to run emacs, and the mailer itself.
If everyone just exits on SIGINT, you will be left with nothing but your login shell, without asking.
But surely, when you use what is an ordinary keystroke in your Emacs session, you don't want to be dropped out of your editor and out of your mailer back to the command line, with your edited data and mailer status lost.
Understand the difference: While C-g is used as a kind of abort key in emacs, it isn't the major "abort everything" key. When you use C-g in emacs, you want to end some internal emacs command. You don't want your whole emacs and mailer session to end.
So, if the shell exits immediately when the user sends SIGINT (the second of the four ways shown above), the parent of emacs would die, leaving emacs without the controlling tty. The user will lose the editing session immediately and unrecoverably. If the "main" shell of the operating system defaults to this behavior, every editor session that is spawned from a mailer or such will break (because it is usually executed by system(3), which calls /bin/sh). This was the case in FreeBSD before I and Bruce Evans changed it in 1998.
If the shell recognizes that SIGINT was sent and exits after the current foreground process has exited (the third way of the four), the editor session will not be disturbed, but things will still fail because the post-editing commands will never be called.
Consider this script again, to examine the shell's actions in the IUE, WUE and WCE ways of handling SIGINT:

    #! /bin/sh
    emacs /tmp/foo
    cp /tmp/foo /home/user/mail/sent

The IUE ("immediate unconditional exit") way does not work at all: emacs wants to survive the SIGINT (it's a normal editing key for emacs), but its parent shell unconditionally thinks "We received SIGINT. Abort everything. Now.". The shell will exit even before emacs exits. But this will leave emacs in an unusable state, since the death of its calling shell will leave it without required resources (file descriptors). This way does not work at all for shellscripts that call programs that use SIGINT for purposes other than immediate exit. Even programs that exit on SIGINT, but want to do some cleanup between the signal and the exit, may fail before they complete their cleanup.
It should be noted that this way has one advantage: If a child blocks SIGINT and does not exit at all, this way will get control back to the user's terminal. But since such programs should be banned from your system anyway, I don't think that weighs against the disadvantages.
WUE ("wait and unconditional exit") is a little more clever: If C-g was used in emacs, the shell will get SIGINT. It will not immediately exit, but remember the fact that a SIGINT happened. When emacs ends (maybe a long time after the SIGINT), it will say "Ok, a SIGINT happened sometime while the child was executing, the user wants the script to be discontinued". It will then exit. The cp will not be executed. But that's bad. The "cp" will be executed when the emacs session ended without the C-g key ever being used, but it will not be executed when the user used C-g at least once. That is clearly not desired. Since C-g is a normal editing key in emacs, the user expects the rest of the script to behave identically no matter what keys he used.
As a result, the "WUE" way is better than the "IUE" way in that it does not break SIGINT-using programs completely. The emacs session will end undisturbed. But it still does not support scripts where other actions should be performed after a program that uses SIGINT for non-exit purposes. Since the behavior is basically unpredictable for the user, this can lead to nasty surprises.
The "WCE" way fixes this by "asking" the called program whether it exited on SIGINT or not. While emacs receives SIGINT, it does not exit on it and a calling shell waiting for its exit will not be told that it exited on SIGINT. (Although it receives SIGINT at some point in time, the system does not enforce that emacs will exit with "I-exited-on-SIGINT" status. This is under emacs' control, see below).
This still works for the normal script without SIGINT-using programs:

    #! /bin/sh
    program1
    program2

Unless program1 and program2 mess around with signal handling, the system will tell the calling shell whether the programs exited normally or as a result of SIGINT.
The "WCE" way then has an easy way to do things right: When one called program exits with "I-exited-on-SIGINT" status, the shell will discontinue the script after this program. If the program ends without this status, the next command in the script is started.
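The decision rule can be sketched as a small runner written in shell itself. This is only an approximation under stated assumptions: `run_wce` is a hypothetical helper, not part of any shell. A script sees only `$?`, where most shells report death by signal N as 128+N, whereas a real WCE shell inspects the wait status directly. The children send SIGINT to themselves as a stand-in for the user's keystroke.

```shell
#!/bin/sh
# Hypothetical helper: run each command line in turn, WCE style.
# The sequence is discontinued only when a child reports that it
# died from SIGINT; any other exit lets the next command start.
run_wce() {
    for cmd in "$@"; do
        status=0
        sh -c "$cmd" || status=$?
        # Most shells encode "killed by signal N" as 128+N in $?;
        # kill -l maps such a status back to the signal name.
        if [ "$status" -gt 128 ] &&
           [ "$(kill -l "$status" 2>/dev/null)" = "INT" ]; then
            echo "stopped after: $cmd"
            return "$status"
        fi
    done
    return 0
}

# Stand-ins (assumptions): the first child catches SIGINT and carries
# on, like an editor; the third leaves SIGINT at its default and dies.
rc=0
out=$(run_wce \
    'trap : INT; kill -INT $$; echo "editor done"' \
    'echo "cp runs"' \
    'kill -INT $$' \
    'echo "never runs"') || rc=$?

printf '%s\n' "$out"
echo "runner status: $rc"
```

The first child received SIGINT but exited normally, so the second command still runs. The third child died from the signal, so the fourth command is never started.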
It is important to understand that a shell in "WCE" mode does not need to listen to the SIGINT signal at all. Both in the "emacs-then-cp" script and in the "several-normal-programs" script, it will be woken up and receive SIGINT when the user hits the corresponding key. But the shell does not need to react to this event and it doesn't need to remember the event of any SIGINT, either. Telling whether the user wants to end a script is done by asking the program that has to decide, the program that interprets keystrokes from the user: the innermost program.
The problem with the "WCE" mode is that there are broken programs that do not properly communicate the required information up to the calling program.
Unless a program messes with signal handling, the right thing happens automatically.
There are programs that want to exit on SIGINT, but they don't let the system do the automatic exit, because they want to do some cleanup. To do so, they catch SIGINT, do the cleanup and then exit by themselves.
And here is where the problem arises: Once they catch the signal, the system will no longer communicate the "I-exited-on-SIGINT" status to the calling program (the parent), even if the program exits immediately in the signal handler of SIGINT. Once it catches the signal, it has to take care of communicating the signal status itself.
Some programs don't do this. On SIGINT, they do cleanup and exit immediately, but the calling shell isn't told about the non-normal exit and it will call the next program in the script.
As a result, the user hits the SIGINT key and while one program exits, the shellscript continues. To him/her it looks like the shell fails to obey the abort command.
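The difference is visible from the calling side. The following sketch, under the assumption that the shell reports death by signal N as 128+N in `$?` (bash and dash do), runs two stand-in children that each send SIGINT to themselves: one leaves the signal at its default, the other catches it and leaves through exit. The variable names are made up for this illustration.

```shell
#!/bin/sh
# Child A leaves SIGINT at its default action; the system ends it.
default_status=0
sh -c 'kill -INT $$' || default_status=$?

# Child B catches SIGINT, would do its cleanup, and calls exit itself.
# The trailing "exit 3" is only reached if the trap did not run.
caught_status=0
sh -c 'trap "exit 1" INT; kill -INT $$; exit 3' || caught_status=$?

echo "default action:    $default_status"
echo "caught, then exit: $caught_status"
```

Child A is reported as killed by SIGINT (130 under the assumption above). Child B is reported as an ordinary exit with status 1, so a WCE shell has no reason to stop the script after it.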
Neither an IUE nor a WUE shell would have this problem, since they discontinue the script on their own. But as I said, they don't support programs using SIGINT for non-exiting purposes, no matter whether these programs properly communicate their signal status to the calling shell or not.
Since some shells in wide use implement the WUE way, there is a considerable number of broken programs out there that break WCE shells. The programmers just don't recognize it if their shell isn't WCE.
(Short note in advance: What you need to achieve is that WIFSIGNALED(status) is true in the calling program and that WTERMSIG(status) returns SIGINT.)
If you don't catch SIGINT, the system automatically does the right thing for you: Your program exits and the calling program gets the right "I-exited-on-SIGINT" status after waiting for your exit.
But once you catch SIGINT, you have to take care of the proper way to exit after whatever cleanup you do in your SIGINT handler.
Decide whether the SIGINT is used for exit/abort purposes and hence a shellscript calling this program should discontinue. This is hopefully obvious. If you just need to do some cleanup on SIGINT, but then exit immediately, the answer is "yes".
If so, you have to tell the calling program about it by exiting with the "I-exited-on-SIGINT" status.
There is no other way of doing this than to kill yourself with a SIGINT signal. Do it by resetting the SIGINT handler to SIG_DFL, then send yourself the signal.
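
    #include <signal.h>
    #include <unistd.h>

    void sigint_handler(int sig)
    {
        /* [do some cleanup] */
        signal(SIGINT, SIG_DFL);   /* restore the default action */
        kill(getpid(), SIGINT);    /* die from SIGINT so the parent sees it */
    }
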
In a Bourne shell script, you can catch signals using the trap command. Here, the same applies as for C programs. If the intention of SIGINT is to end your program, you have to exit in a way that the calling program "sees" that you have been killed. If you don't catch SIGINT, this happens automatically, but if you catch SIGINT, e.g. to do cleanup work, you have to end the program by killing yourself, not by calling exit.
Consider this example from FreeBSD's mkdep, which is a Bourne shell script.

    TMP=_mkdep$$
    trap 'rm -f $TMP ; trap 2 ; kill -2 $$' 1 2 3 13 15

Yes, you have to do it the hard way. It's even more annoying in shell scripts than in C programs since you can't "pre-delete" temporary files (which isn't really portable in C, though).
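The same clean-up-then-kill-yourself pattern can be tried out in isolation. This is a minimal sketch, not the mkdep code: the scratch file name and the `cleanup` function are invented for the illustration, the child sends SIGINT to itself in place of a real keystroke, and the expected status of 130 assumes a shell that reports death by signal N as 128+N.

```shell
#!/bin/sh
# Hypothetical demo: the child creates a scratch file, then gets SIGINT.
scratch="${TMPDIR:-/tmp}/sigint_demo.$$"

status=0
sh -c '
    scratch=$1
    : > "$scratch"

    cleanup() {
        rm -f "$scratch"   # the cleanup work
        trap - INT         # back to the default action
        kill -INT $$       # die from SIGINT, do not call exit
    }
    trap cleanup INT

    kill -INT $$           # stands in for the user pressing C-c
    exit 0                 # not reached
' demo "$scratch" || status=$?

echo "child status: $status"
if [ -e "$scratch" ]; then
    echo "scratch file left behind"
else
    echo "scratch file removed"
fi
```

The cleanup has run and the parent still sees a child that was killed by SIGINT, which is what a WCE shell needs in order to discontinue the script.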
All this applies to programs in all languages, not only C and Bourne shell. Every language implementation that lets you catch SIGINT should also give you the option to reset the signal and kill yourself.
It is always desirable to exit the right way. Even if you don't expect your usual callers to depend on it, some unusual one will come along. This proper exit status will be needed for WCE and will not hurt when the calling shell uses IUE or WUE.
Make sure people understand why you can't fake an exit-on-signal by doing exit(...) using any numerical status.
Make sure you use a shell that behaves right, especially if you develop programs, since it will help you see problems. The major reason why we have command-line programs that don't properly exit with a WIFSIGNALED-detectable status is broken shells that just exit unconditionally no matter what. Programmers who use these shells will never notice the problems in their programs.
Method sign | Does what? | Example shells that implement it: | What happens when a shellscript called emacs, the user used C-g and the script has additional commands in it? | What happens when a shellscript called emacs, the user did not use C-g and the script has additional commands in it? | What happens if a non-interactive child catches SIGINT? | To behave properly, children must do what? |
---|---|---|---|---|---|---|
IUE | The shell executing a script exits immediately if it receives SIGINT. | 4.4BSD ash (ash), NetBSD, FreeBSD prior to 3.0/2.2.8 | The editor session is lost and subsequent commands are not executed. | The editor continues as normal and the subsequent commands are executed. | The script ends immediately, returning to the caller even before the current foreground child of the shell exits. | It doesn't matter what the child does or how it exits; even if the child continues to operate, the shell returns. |
WUE | If the shell executing a script received SIGINT while a foreground process was running, it will exit after that child's exit. | pdksh (OpenBSD /bin/sh) | The editor continues as normal, but subsequent commands from the script are not executed. | The editor continues as normal and subsequent commands are executed. | The script returns to its caller after the current foreground child exits, no matter how the child exited. | It doesn't matter how the child exits (signal status or not), but if it doesn't return at all, the shell will not return. In no case will further commands from the script be executed. |
WCE | The shell exits if a child signaled that it was killed on a signal (either it had the default handler for SIGINT or it killed itself). | bash (Linux /bin/sh), most commercial /bin/sh, FreeBSD /bin/sh from 3.0/2.2.8. | The editor continues as normal and subsequent commands are executed. | The editor continues as normal and subsequent commands are executed. | The script returns to its caller after the current foreground child exits, but only if the child exited with signal status. If the child did a normal exit (even if it received SIGINT, but catches it), the script will continue. | The child must be implemented right, or the user will not be able to break shell scripts reliably. |