Linux kernel parameter – hung_task_timeout_secs

GNU/Linux or UNIX Parameter

$cat /proc/sys/kernel/hung_task_timeout_secs 120 $

Parameter Related

TEST-MAIL1 ~ #dmesg [cut] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. rm D ffff88107f472c40 0 16705 22512 0x00000000 ffff881014693810 0000000000000086 ffff881000000000 ffff88102013b040 0000000000012c40 ffff880471855fd8 0000000000012c40 ffff880471854010 ffff880471855fd8 0000000000012c40 ffff881017ff8e40 0000000100000000 Call Trace: [] ? schedule_timeout+0x1ed/0x2d0 [] ? dlmlock+0x8a/0xda0 [ocfs2_dlm] [] ? wait_for_common+0x12c/0x1a0 [] ? try_to_wake_up+0x280/0x280 [] ? __ocfs2_cluster_lock+0x1f0/0x780 [ocfs2] [] ? wait_for_common+0x150/0x1a0 [] ? ocfs2_buffer_cached+0x8c/0x180 [ocfs2] [] ? ocfs2_inode_lock_full_nested+0x126/0x540 [ocfs2] [] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2] [] ? ocfs2_lookup_lock_orphan_dir+0x6e/0x1b0 [ocfs2] [] ? ocfs2_prepare_orphan_dir+0x4a/0x290 [ocfs2] [] ? ocfs2_unlink+0x6e1/0xbb0 [ocfs2] [] ? may_link+0xda/0x170 [] ? vfs_unlink+0x9e/0x100 [] ? do_unlinkat+0x1a1/0x1d0 [] ? vfs_readdir+0xa0/0xe0 [] ? fsnotify_find_inode_mark+0x2b/0x40 [] ? dnotify_flush+0x54/0x110 [] ? filp_close+0x5c/0x90 [] ? system_call_fastpath+0x16/0x1b


While waiting for read() or write() to/from a file descriptor return, the process will be put in a special kind of sleep, known as "D" or "Disk Sleep". This is special, because the process can not be killed or interrupted while in such a state. A process waiting for a return from ioctl() would also be put to sleep in this manner. source :

Parameter Code Internals

/* * Check whether a TASK_UNINTERRUPTIBLE does not get woken up for * a really long time (120 seconds). If that happens, print out * a warning. */ static void check_hung_uninterruptible_tasks(unsigned long timeout) { int max_count = sysctl_hung_task_check_count; int batch_count = HUNG_TASK_BATCHING; struct task_struct *g, *t; /* * If the system crashed already then all bets are off, * do not report extra hung tasks: */ if (test_taint(TAINT_DIE) || did_panic) return; rcu_read_lock(); do_each_thread(g, t) { if (!max_count--) goto unlock; if (!--batch_count) { batch_count = HUNG_TASK_BATCHING; rcu_lock_break(g, t); /* Exit if t or g was unhashed during refresh. */ if (t->state == TASK_DEAD || g->state == TASK_DEAD) goto unlock; } /* use "==" to skip the TASK_KILLABLE tasks waiting on NFS */ if (t->state == TASK_UNINTERRUPTIBLE) check_hung_task(t, timeout); } while_each_thread(g, t); unlock: rcu_read_unlock(); }

Related From Research Paper

Kernel data collection tools. Several monitoring facilities are provided by the Linux kernel, which have been exploited in this work. In particular, we use KProbes which inserts breakpoints in arbitrary binary code locations in charge of triggering user-defined handler functions. Handlers can be used to collect information about internal kernel variables; subsequently, kernel execution is restored. Kdump is a tool for failure data collection based on the execution of a secondary kernel, namely capture kernel, which is preliminarily loaded into a reserved memory region. When the primary kernel fails, the capture kernel is executed; then, it can collect failure data by reading the main memory state. Built-in hang detection mechanisms. Several hang detection mechanisms are available in the Linux OS, which can be enabled by recompiling the kernel. In particular, the following facilities can be used for hang detection: Soft lockup detection, i.e., the kernel detects whether a "canary" task is not scheduled within a timeout; Hard lockup detection, i.e., if any CPU in the system does not handles local timer interrupt for longer than a timeout; Sleep-inside-spinlock checking, i.e., assertions that verify whether there are spinlocks that have been acquired before calling a sleeping function (i.e., a function during which the current thread may block and be preempted by the scheduler); Checks on lock API usage, that is: missing lock initialization, release of an already freed lock, release of a lock by a thread or CPU different from the lock holder, lock data structure corruption. source : Assessment and Improvement of Hang Detection in the Linux Operating System 2009 28th IEEE International Symposium on Reliable Distributed Systems

Copyright © 2009,  2010,  2011,  2012,  2013, 2014, 2015     BeautifulWork Project    e-mail:
BeautifulWork Project comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.