What if 'kill -9' does not work?

  • I have a process I can't kill with kill -9 <pid>. What's the problem in such a case, especially since I am the owner of that process? I thought nothing could evade that kill option.

  • kill -9 (SIGKILL) always works, provided you have the permission to kill the process. Basically either the process must be started by you and not be setuid or setgid, or you must be root. There is one exception: even root cannot send a fatal signal to PID 1 (the init process).

    However, kill -9 is not guaranteed to work immediately. All signals, including SIGKILL, are delivered asynchronously: the kernel may take its time to deliver them. Usually, delivering a signal takes at most a few microseconds, just the time it takes for the target to get a time slice. However, if the target has blocked the signal, the signal will be queued until the target unblocks it.
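    Because delivery is asynchronous, what matters is waiting for the signal to take effect, not how many times it is sent. A minimal sketch, assuming a POSIX shell; `kill_and_wait` is a hypothetical helper name (not a standard command) and its default timeout is arbitrary:

```shell
# kill_and_wait PID [TIMEOUT]: send SIGKILL once, then poll until PID is gone.
# Hypothetical helper for illustration; the default timeout of 5 s is arbitrary.
kill_and_wait() {
    pid=$1
    timeout=${2:-5}
    # Fails if the process does not exist or we lack permission.
    kill -9 "$pid" 2>/dev/null || return 1
    elapsed=0
    # kill -0 delivers no signal; it only tests whether PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 2    # still present after the timeout
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

    A return status of 2 means the signal was accepted but the process entry is still there, typically because the process is stuck in a system call (or is a zombie that its parent has not yet collected).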

    Normally, processes cannot block SIGKILL. But kernel code can, and processes execute kernel code when they call system calls. Kernel code blocks all signals when interrupting the system call would result in a badly formed data structure somewhere in the kernel, or more generally in some kernel invariant being violated. So if (due to a bug or misdesign) a system call blocks indefinitely, there may effectively be no way to kill the process. (But the process will be killed if it ever completes the system call.)
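    The user-space half of this can be checked from a shell. The sketch below (assuming a POSIX shell; the variable names are illustrative) starts a child that ignores SIGTERM, shows that it survives SIGTERM, and then shows that SIGKILL still terminates it:

```shell
# Ignore SIGTERM while starting the child: ignored signals are inherited,
# so the child ignores SIGTERM from its very first instruction.
trap '' TERM
sh -c 'while :; do sleep 1; done' &
child=$!
trap - TERM                      # restore the default disposition in this shell

kill -TERM "$child"
sleep 1
if kill -0 "$child" 2>/dev/null; then
    term_survived=yes            # SIGTERM was ignored, as arranged
else
    term_survived=no
fi

kill -KILL "$child"              # cannot be ignored, caught or blocked in user space
status=0
wait "$child" || status=$?
echo "survived SIGTERM: $term_survived, exit status after SIGKILL: $status"
```

    In bash and dash the status reported by `wait` for a process killed by SIGKILL is 137 (128 + 9); other shells may encode it differently.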

    A process blocked in a system call is in uninterruptible sleep. The ps or top command will (on most unices) show it in state D (originally for “disk”, I think).
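    To check the state of one particular process without scanning the whole ps or top listing, something like the following can be used. This is a sketch: `proc_state` is a hypothetical helper name, the /proc branch assumes Linux, and the `stat` column in the fallback is a common ps extension rather than a POSIX one.

```shell
# proc_state PID: print the one-letter state of a process (R, S, D, Z, T, ...).
# Hypothetical helper; prefers Linux's /proc and falls back to ps elsewhere.
proc_state() {
    if [ -r "/proc/$1/status" ]; then
        # The line looks like "State:  S (sleeping)"; keep only the letter.
        sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status"
    else
        # "stat" is not a POSIX ps column, but procps and the BSDs accept it.
        ps -o stat= -p "$1" | tr -d ' ' | cut -c1
    fi
}

proc_state "$$"    # state of the current shell
```

    A process that keeps reporting D after a SIGKILL has been sent is the case described above: the signal is pending, but the system call has not returned yet.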

    A classical case of long uninterruptible sleep is processes accessing files over NFS when the server is not responding; modern implementations tend not to impose uninterruptible sleep (e.g. under Linux, the intr mount option allows a signal to interrupt NFS file accesses).

    You may sometimes see entries marked Z (or H under Linux, I don't know what the distinction is) in the ps or top output. These are technically not processes; they are zombie processes, which are nothing more than an entry in the process table, kept around so that the parent process can be notified of the death of its child. They will go away when the parent process pays attention (or dies).
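    A zombie can be produced on purpose to see this behaviour. The sketch below is Linux-only (it reads /proc) and assumes a POSIX shell; `state_of` and the variable names are made up for the example. It creates a child whose parent never waits for it, then shows that kill -9 is accepted but changes nothing:

```shell
# Linux-only sketch: reads the process state from /proc.
state_of() { sed -n 's/^State:[[:space:]]*\(.\).*/\1/p' "/proc/$1/status"; }

pidfile=$(mktemp)
# The inner shell starts a short-lived child, records its PID, then replaces
# itself with "sleep 5", a parent that never calls wait() on that child.
sh -c 'sleep 1 & echo $! > "$1"; exec sleep 5' sh "$pidfile" &
parent=$!

sleep 2                          # the child has exited by now, but is unreaped
zombie=$(cat "$pidfile")
rm -f "$pidfile"

state_before=$(state_of "$zombie")
kill -9 "$zombie"                # accepted, but there is nothing left to kill
state_after=$(state_of "$zombie")
echo "before: $state_before, after kill -9: $state_after"

# Once the parent dies, the zombie is handed to init, which normally reaps it.
kill "$parent"
wait "$parent" 2>/dev/null || :
```

    On Linux both states should read Z: the entry only disappears once a parent collects it, which is why the last step ends the parent rather than the zombie.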

    Your reply looks self-contradictory. You start by saying SIGKILL always works but end by citing the uninterruptible sleep case, where SIGKILL might never work short of shutting down the kernel. There are also two cases where SIGKILL doesn't work: with zombies, obviously, as you can't kill already dead processes, and with init, which by design ignores SIGKILL signals.

    @jlliagre: Killing a zombie doesn't make sense, it's not alive to begin with. And killing a process in interruptible sleep *does* work, it's just (as with other signals) asynchronous. I've tried to clarify this in my edit.

    I also wrote that killing a zombie doesn't make sense, but that doesn't prevent many people from trying it and complaining. Killing a process in interruptible sleep indeed works by design, but I was talking about killing a process in uninterruptible sleep, which can fail if the system call never wakes up.

    `man 5 nfs`: "The `intr`/`nointr` mount option is deprecated after kernel 2.6.25. Only SIGKILL can interrupt a pending NFS operation on these kernels, and if specified, this mount option is ignored to provide backwards compatibility with older kernels."

    I had problems killing an `ls` process accessing an `sshfs` mount when the remote server had become unreachable. Is there a mount option for FUSE or sshfs which I could use in future to avoid such situations? (2.6.30 kernel.)

    @imz--IvanZakharyaschev Not that I know of (but I might not know). With sshfs, as a last resort, you can kill the `sshfs` process (and likewise with any other FUSE filesystem: you can always force-unmount this way).

    @Gilles Thanks, your advice (what to do with any FUSE fs) helped. I haven't been able to get rid of those hanging sshfs mounts for months, and some GUIs listing the filesystem tree were unusable simply because there were such mountpoints in my home directory (Thunar, "open file" and "save as" dialogs in most programs). Now I was able to simply kill the sshfs processes, and everything is fine again! So, in a sense, user-space FSs are superior to kernel-space FSs in terms of the usability/convenience of the system for the user!

    @imz--IvanZakharyaschev: Heh, users of microkernels have known this convenience for a long time. Disk totally stuck? Kill and respawn disk server. NFS stuck? Kill and respawn nfs daemon. Since everything is a process, it is very hard to really hang a microkernel OS.

    *"`kill -9` (SIGKILL) always works, provided you have the permission..."* - I'm not sure that's correct. It depends on the process state in the kernel. There's a couple of states the process can be in such that it can't be killed. I wish I could find the post that discusses it....

    @jww There are process states where the process can't be killed, but the signal is queued up and can't be cancelled. As I explain, SIGKILL always *works*, but it doesn't always work *immediately*.

    Something that worked for me is `for i in $(seq 1 1000); do sudo kill -9 ; done;`. I just did it out of frustration but it actually worked.

    @GreenRaccoon23 What worked wasn't sending the signal multiple times, sending it once would have had exactly the same effect. What worked was waiting long enough for the signal to be processed.

    @Tshepang .. lot of options here. If you know the port.. kill $(lsof -t -i:) Or else pkill .. or if nothing else works you should go ahead with killing tty :)

    As indicated here, the uninterruptible sleep may be ended by unmounting, remounting, or reconnecting the media or link: https://www.redhat.com/archives/rhl-list/2004-January/msg04543.html Unplugging a USB device (appearing as a serial port and mass-storage) fixed my issue of sublime-text hanging.

    Any chance that "killing the kernel" (rebooting?) during this blockage could permanently mess something up on the bootup disk or in a peripheral?

    @sudo Rebooting won't increase the likelihood of messing something up. It isn't mathematically impossible, but the reboot wouldn't be the cause of the mess. If the reason for the effectively unkillable process is a buggy driver or hardware, then the buggy driver or hardware could cause a mess, but rebooting won't make it worse.

    The process can also be blocked if it was accessing `sshfs`

    We recently had a case where a daemon using the `bluez` API would stop responding to the KILL signal.

    @AyberkÖzgür That's presumably a bug in a Bluetooth driver.

    @Gilles I suspect the same.

    @Gilles zombie processes can consume resources. I was in the python debugger and the python program started another process. I hit "ctrl" + 'c' which left a zombie that used 1/2 of a CPU. I owned it, but kill -9 didn't kill it, ever. Exiting the debugger killed it though.

    @VectorVortec If it used CPU time, it wasn't a zombie.

    @Gilles I agree "Killing a zombie doesn't make sense" --- but "what if zombie come to life" :-)))))) Happy Christmas

    `Kill -9 ALWAYS works` Except when it doesn't. How do you kill a zombie process? I have a FM that's hung, and I can't open new instances of it, but I can't kill it either. In a case like this, what options do you have other than restarting?

    @DouglasGaskell See the last paragraph of my answer. A zombie process is already dead. If you can't open new instances, it isn't because of the zombie. It may be because the program died without deleting some lock file, or because of some completely different bug in the program.

    I had a process in the state of waiting for a system call, and eventually it exited, but it was in a Z, not D state...thanks for the tip!

    The idea is there (sort of) that a process being suspended while executing kernel code cannot be killed, but the devil is in the details. Where shall I start?

Licensed under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM