summaryrefslogtreecommitdiff
path: root/kernel/workqueue.c
AgeCommit message (Collapse)Author
2013-03-13workqueue: rename @id to @pi in for_each_each_pool()Tejun Heo
Rename @id argument of for_each_pool() to @pi so that it doesn't get reused accidentally when for_each_pool() is used in combination with other iterators. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-13workqueue: update comments and a warning messageTejun Heo
* Update incorrect and add missing synchronization labels. * Update incorrect or misleading comments. Add new comments where clarification is necessary. Reformat / rephrase some comments. * drain_workqueue() can be used separately from destroy_workqueue() but its warning message was incorrectly referring to destruction. Other than the warning message change, this patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-13workqueue: fix max_active handling in init_and_link_pwq()Tejun Heo
Since 9e8cd2f589 ("workqueue: implement apply_workqueue_attrs()"), init_and_link_pwq() may be called to initialize a new pool_workqueue for a workqueue which is already online, but the function was setting pwq->max_active to wq->saved_max_active without proper synchronization. Fix it by calling pwq_adjust_max_active() under proper locking instead of manually setting max_active. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-13workqueue: implement and use pwq_adjust_max_active()Tejun Heo
Rename pwq_set_max_active() to pwq_adjust_max_active() and move pool_workqueue->max_active synchronization and max_active determination logic into it. The new function should be called with workqueue_lock held for stable workqueue->saved_max_active, determines the current max_active value the target pool_workqueue should be using from @wq->saved_max_active and the state of the associated pool, and applies it with proper synchronization. The current two users - workqueue_set_max_active() and thaw_workqueues() - are updated accordingly. In addition, the manual freezing handling in __alloc_workqueue_key() and freeze_workqueues_begin() are replaced with calls to pwq_adjust_max_active(). This centralizes max_active handling so that it's less error-prone. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-13workqueue: relocate pwq_set_max_active()Tejun Heo
pwq_set_max_active() is gonna be modified and used during pool_workqueue init. Move it above init_and_link_pwq(). This patch is pure code reorganization and doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-13workqueue: convert to idr_alloc()Tejun Heo
idr_get_new*() and friends are about to be deprecated. Convert to the new idr_alloc() interface. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-12workqueue: implement current_is_workqueue_rescuer()Tejun Heo
Implement a function which queries whether it currently is running off a workqueue rescuer. This will be used to convert writeback to workqueue. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-12workqueue: implement sysfs interface for workqueuesTejun Heo
There are cases where workqueue users want to expose control knobs to userland. e.g. Unbound workqueues with custom attributes are scheduled to be used for writeback workers and depending on configuration it can be useful to allow admins to tinker with the priority or allowed CPUs. This patch implements workqueue_sysfs_register(), which makes the workqueue visible under /sys/bus/workqueue/devices/WQ_NAME. There currently are two attributes common to both per-cpu and unbound pools and extra attributes for unbound pools including nice level and cpumask. If alloc_workqueue*() is called with WQ_SYSFS, workqueue_sysfs_register() is called automatically as part of workqueue creation. This is the preferred method unless the workqueue user wants to apply workqueue_attrs before making the workqueue visible to userland. v2: Disallow exposing ordered workqueues as ordered workqueues can't be tuned in any way. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-12workqueue: reject adjusting max_active or applying attrs to ordered workqueuesTejun Heo
Adjusting max_active of or applying new workqueue_attrs to an ordered workqueue breaks its ordering guarantee. The former is obvious. The latter is because applying attrs creates a new pwq (pool_workqueue) and there is no ordering constraint between the old and new pwqs. Make apply_workqueue_attrs() and workqueue_set_max_active() trigger WARN_ON() if those operations are requested on an ordered workqueue and fail / ignore respectively. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: make it clear that WQ_DRAINING is an internal flagTejun Heo
We're gonna add another internal WQ flag. Let's make the distinction clear. Prefix WQ_DRAINING with __ and move it to bit 16. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: implement apply_workqueue_attrs()Tejun Heo
Implement apply_workqueue_attrs() which applies workqueue_attrs to the specified unbound workqueue by creating a new pwq (pool_workqueue) linked to worker_pool with the specified attributes. A new pwq is linked at the head of wq->pwqs instead of tail and __queue_work() verifies that the first unbound pwq has positive refcnt before choosing it for the actual queueing. This is to cover the case where creation of a new pwq races with queueing. As base ref on a pwq won't be dropped without making another pwq the first one, __queue_work() is guaranteed to make progress and not add work item to a dead pwq. init_and_link_pwq() is updated to return the last first pwq the new pwq replaced, which is put by apply_workqueue_attrs(). Note that apply_workqueue_attrs() is almost identical to unbound pwq part of alloc_and_link_pwqs(). The only difference is that there is no previous first pwq. apply_workqueue_attrs() is implemented to handle such cases and replaces unbound pwq handling in alloc_and_link_pwqs(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: perform non-reentrancy test when queueing to unbound workqueues tooTejun Heo
Because per-cpu workqueues have multiple pwqs (pool_workqueues) to serve the CPUs, to guarantee that a single work item isn't queued on one pwq while still executing another, __queue_work() takes a look at the previous pool the target work item was on and if it's still executing there, queue the work item on that pool. To support changing workqueue_attrs on the fly, unbound workqueues too will have multiple pwqs and thus need non-reentrancy test when queueing. This patch modifies __queue_work() such that the reentrancy test is performed regardless of the workqueue type. per_cpu_ptr(wq->cpu_pwqs, cpu) used to be used to determine the matching pwq for the last pool. This can't be used for unbound workqueues and is replaced with worker->current_pwq which also happens to be simpler. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: prepare flush_workqueue() for dynamic creation and destrucion of ↵Tejun Heo
unbound pool_workqueues Unbound pwqs (pool_workqueues) will be dynamically created and destroyed with the scheduled unbound workqueue w/ custom attributes support. This patch synchronizes pwq linking and unlinking against flush_workqueue() so that its operation isn't disturbed by pwqs coming and going. Linking and unlinking a pwq into wq->pwqs is now protected also by wq->flush_mutex and a new pwq's work_color is initialized to wq->work_color during linking. This ensures that pwqs changes don't disturb flush_workqueue() in progress and the new pwq's work coloring stays in sync with the rest of the workqueue. flush_mutex during unlinking isn't strictly necessary but it's simpler to do it anyway. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: implement get/put_pwq()Tejun Heo
Add pool_workqueue->refcnt along with get/put_pwq(). Both per-cpu and unbound pwqs have refcnts and any work item inserted on a pwq increments the refcnt which is dropped when the work item finishes. For per-cpu pwqs the base ref is never dropped and destroy_workqueue() frees the pwqs as before. For unbound ones, destroy_workqueue() simply drops the base ref on the first pwq. When the refcnt reaches zero, pwq_unbound_release_workfn() is scheduled on system_wq, which unlinks the pwq, puts the associated pool and frees the pwq and wq as necessary. This needs to be done from a work item as put_pwq() needs to be protected by pool->lock but release can't happen with the lock held - e.g. put_unbound_pool() involves blocking operations. Unbound pool->locks are marked with lockdep subclas 1 as put_pwq() will schedule the release work item on system_wq while holding the unbound pool's lock and triggers recursive locking warning spuriously. This will be used to implement dynamic creation and destruction of unbound pwqs. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: restructure __alloc_workqueue_key()Tejun Heo
* Move initialization and linking of pool_workqueues into init_and_link_pwq(). * Make the failure path use destroy_workqueue() once pool_workqueue initialization succeeds. These changes are to prepare for dynamic management of pool_workqueues and don't introduce any functional changes. While at it, convert list_del(&wq->list) to list_del_init() as a precaution as scheduled changes will make destruction more complex. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: drop WQ_RESCUER and test workqueue->rescuer for NULL insteadTejun Heo
WQ_RESCUER is superflous. WQ_MEM_RECLAIM indicates that the user wants a rescuer and testing wq->rescuer for NULL can answer whether a given workqueue has a rescuer or not. Drop WQ_RESCUER and test wq->rescuer directly. This will help simplifying __alloc_workqueue_key() failure path by allowing it to use destroy_workqueue() on a partially constructed workqueue, which in turn will help implementing dynamic management of pool_workqueues. While at it, clear wq->rescuer after freeing it in destroy_workqueue(). This is a precaution as scheduled changes will make destruction more complex. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: add pool ID to the names of unbound kworkersTejun Heo
There are gonna be multiple unbound pools. Include pool ID in the name of unbound kworkers. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: drop "std" from cpu_std_worker_pools and for_each_std_worker_pool()Tejun Heo
All per-cpu pools are standard, so there's no need to use both "cpu" and "std" and for_each_std_worker_pool() is confusing in that it can be used only for per-cpu pools. * s/cpu_std_worker_pools/cpu_worker_pools/ * s/for_each_std_worker_pool()/for_each_cpu_worker_pool()/ Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: remove unbound_std_worker_pools[] and related helpersTejun Heo
Workqueue no longer makes use of unbound_std_worker_pools[]. All unbound worker_pools are created dynamically and there's nothing special about the standard ones. With unbound_std_worker_pools[] unused, workqueue no longer has places where it needs to treat the per-cpu pools-cpu and unbound pools together. Remove unbound_std_worker_pools[] and the helpers wrapping it to present unified per-cpu and unbound standard worker_pools. * for_each_std_worker_pool() now only walks through per-cpu pools. * for_each[_online]_wq_cpu() which don't have any users left are removed. * std_worker_pools() and std_worker_pool_pri() are unused and removed. * get_std_worker_pool() is removed. Its only user - alloc_and_link_pwqs() - only used it for per-cpu pools anyway. Open code per_cpu access in alloc_and_link_pwqs() instead. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: implement attribute-based unbound worker_pool managementTejun Heo
This patch makes unbound worker_pools reference counted and dynamically created and destroyed as workqueues needing them come and go. All unbound worker_pools are hashed on unbound_pool_hash which is keyed by the content of worker_pool->attrs. When an unbound workqueue is allocated, get_unbound_pool() is called with the attributes of the workqueue. If there already is a matching worker_pool, the reference count is bumped and the pool is returned. If not, a new worker_pool with matching attributes is created and returned. When an unbound workqueue is destroyed, put_unbound_pool() is called which decrements the reference count of the associated worker_pool. If the refcnt reaches zero, the worker_pool is destroyed in sched-RCU safe way. Note that the standard unbound worker_pools - normal and highpri ones with no specific cpumask affinity - are no longer created explicitly during init_workqueues(). init_workqueues() only initializes workqueue_attrs to be used for standard unbound pools - unbound_std_wq_attrs[]. The pools are spawned on demand as workqueues are created. v2: - Comment added to init_worker_pool() explaining that @pool should be in a condition which can be passed to put_unbound_pool() even on failure. - pool->refcnt reaching zero and the pool being removed from unbound_pool_hash should be dynamic. pool->refcnt is converted to int from atomic_t and now manipulated inside workqueue_lock. - Removed an incorrect sanity check on nr_idle in put_unbound_pool() which may trigger spuriously. All changes were suggested by Lai Jiangshan. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: introduce workqueue_attrsTejun Heo
Introduce struct workqueue_attrs which carries worker attributes - currently the nice level and allowed cpumask along with helper routines alloc_workqueue_attrs() and free_workqueue_attrs(). Each worker_pool now carries ->attrs describing the attributes of its workers. All functions dealing with cpumask and nice level of workers are updated to follow worker_pool->attrs instead of determining them from other characteristics of the worker_pool, and init_workqueues() is updated to set worker_pool->attrs appropriately for all standard pools. Note that create_worker() is updated to always perform set_user_nice() and use set_cpus_allowed_ptr() combined with manual assertion of PF_THREAD_BOUND instead of kthread_bind(). This simplifies handling random attributes without affecting the outcome. This patch doesn't introduce any behavior changes. v2: Missing cpumask_var_t definition caused build failure on some archs. linux/cpumask.h included. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: kbuild test robot <fengguang.wu@intel.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: separate out init_worker_pool() from init_workqueues()Tejun Heo
This will be used to implement unbound pools with custom attributes. This patch doesn't introduce any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: replace POOL_MANAGING_WORKERS flag with worker_pool->manager_arbTejun Heo
POOL_MANAGING_WORKERS is used to synchronize the manager role. Synchronizing among workers doesn't need blocking and that's why it's implemented as a flag. It got converted to a mutex a while back to add blocking wait from CPU hotplug path - 6037315269 ("workqueue: use mutex for global_cwq manager exclusion"). Later it turned out that synchronization among workers and cpu hotplug need to be done separately. Eventually, POOL_MANAGING_WORKERS is restored and workqueue->manager_mutex got morphed into workqueue->assoc_mutex - 552a37e936 ("workqueue: restore POOL_MANAGING_WORKERS") and b2eb83d123 ("workqueue: rename manager_mutex to assoc_mutex"). Now, we're gonna need to be able to lock out managers from destroy_workqueue() to support multiple unbound pools with custom attributes making it again necessary to be able to block on the manager role. This patch replaces POOL_MANAGING_WORKERS with worker_pool->manager_arb. This patch doesn't introduce any behavior changes. v2: s/manager_mutex/manager_arb/ Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-12workqueue: update synchronization rules on worker_pool_idrTejun Heo
Make worker_pool_idr protected by workqueue_lock for writes and sched-RCU protected for reads. Lockdep assertions are added to for_each_pool() and get_work_pool() and all their users are converted to either hold workqueue_lock or disable preemption/irq. worker_pool_assign_id() is updated to hold workqueue_lock when allocating a pool ID. As idr_get_new() always performs RCU-safe assignment, this is enough on the writer side. As standard pools are never destroyed, there's nothing to do on that side. The locking is superflous at this point. This is to help implementation of unbound pools/pwqs with custom attributes. This patch doesn't introduce any behavior changes. v2: Updated for_each_pwq() use if/else for the hidden assertion statement instead of just if as suggested by Lai. This avoids confusing the following else clause. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: update synchronization rules on workqueue->pwqsTejun Heo
Make workqueue->pwqs protected by workqueue_lock for writes and sched-RCU protected for reads. Lockdep assertions are added to for_each_pwq() and first_pwq() and all their users are converted to either hold workqueue_lock or disable preemption/irq. alloc_and_link_pwqs() is updated to use list_add_tail_rcu() for consistency which isn't strictly necessary as the workqueue isn't visible. destroy_workqueue() isn't updated to sched-RCU release pwqs. This is okay as the workqueue should have on users left by that point. The locking is superflous at this point. This is to help implementation of unbound pools/pwqs with custom attributes. This patch doesn't introduce any behavior changes. v2: Updated for_each_pwq() use if/else for the hidden assertion statement instead of just if as suggested by Lai. This avoids confusing the following else clause. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: replace get_pwq() with explicit per_cpu_ptr() accesses and ↵Tejun Heo
first_pwq() get_pwq() takes @cpu, which can also be WORK_CPU_UNBOUND, and @wq and returns the matching pwq (pool_workqueue). We want to move away from using @cpu for identifying pools and pwqs for unbound pools with custom attributes and there is only one user - workqueue_congested() - which makes use of the WQ_UNBOUND conditional in get_pwq(). All other users already know whether they're dealing with a per-cpu or unbound workqueue. Replace get_pwq() with explicit per_cpu_ptr(wq->cpu_pwqs, cpu) for per-cpu workqueues and first_pwq() for unbound ones, and open-code WQ_UNBOUND conditional in workqueue_congested(). Note that this makes workqueue_congested() behave sligntly differently when @cpu other than WORK_CPU_UNBOUND is specified. It ignores @cpu for unbound workqueues and always uses the first pwq instead of oopsing. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: remove workqueue_struct->pool_wq.singleTejun Heo
workqueue->pool_wq union is used to point either to percpu pwqs (pool_workqueues) or single unbound pwq. As the first pwq can be accessed via workqueue->pwqs list, there's no reason for the single pointer anymore. Use list_first_entry(workqueue->pwqs) to access the unbound pwq and drop workqueue->pool_wq.single pointer and the pool_wq union. It simplifies the code and eases implementing multiple unbound pools w/ custom attributes. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: consistently use int for @cpu variablesTejun Heo
Workqueue is mixing unsigned int and int for @cpu variables. There's no point in using unsigned int for cpus - many of cpu related APIs take int anyway. Consistently use int for @cpu variables so that we can use negative values to mark special ones. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: add wokrqueue_struct->maydays list to replace mayday cpu iteratorsTejun Heo
Similar to how pool_workqueue iteration used to be, raising and servicing mayday requests is based on CPU numbers. It's hairy because cpumask_t may not be able to handle WORK_CPU_UNBOUND and cpumasks are assumed to be always set on UP. This is ugly and can't handle multiple unbound pools to be added for unbound workqueues w/ custom attributes. Add workqueue_struct->maydays. When a pool_workqueue needs rescuing, it gets chained on the list through pool_workqueue->mayday_node and rescuer_thread() consumes the list until it's empty. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: restructure pool / pool_workqueue iterations in freeze/thaw functionsTejun Heo
The three freeze/thaw related functions - freeze_workqueues_begin(), freeze_workqueues_busy() and thaw_workqueues() - need to iterate through all pool_workqueues of all freezable workqueues. They did it by first iterating pools and then visiting all pwqs (pool_workqueues) of all workqueues and process it if its pwq->pool matches the current pool. This is rather backwards and done this way partly because workqueue didn't have fitting iteration helpers and partly to avoid the number of lock operations on pool->lock. Workqueue now has fitting iterators and the locking operation overhead isn't anything to worry about - those locks are unlikely to be contended and the same CPU visiting the same set of locks multiple times isn't expensive. Restructure the three functions such that the flow better matches the logical steps and pwq iteration is done using for_each_pwq() inside workqueue iteration. * freeze_workqueues_begin(): Setting of FREEZING is moved into a separate for_each_pool() iteration. pwq iteration for clearing max_active is updated as described above. * freeze_workqueues_busy(): pwq iteration updated as described above. * thaw_workqueues(): The single for_each_wq_cpu() iteration is broken into three discrete steps - clearing FREEZING, restoring max_active, and kicking workers. The first and last steps use for_each_pool() and the second step uses pwq iteration described above. This makes the code easier to understand and removes the use of for_each_wq_cpu() for walking pwqs, which can't support multiple unbound pwqs which will be needed to implement unbound workqueues with custom attributes. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: introduce for_each_pool()Tejun Heo
With the scheduled unbound pools with custom attributes, there will be multiple unbound pools, so it wouldn't be able to use for_each_wq_cpu() + for_each_std_worker_pool() to iterate through all pools. Introduce for_each_pool() which iterates through all pools using worker_pool_idr and use it instead of for_each_wq_cpu() + for_each_std_worker_pool() combination in freeze_workqueues_begin(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: replace for_each_pwq_cpu() with for_each_pwq()Tejun Heo
Introduce for_each_pwq() which iterates all pool_workqueues of a workqueue using the recently added workqueue->pwqs list and replace for_each_pwq_cpu() usages with it. This is primarily to remove the single unbound CPU assumption from pwq iteration for the scheduled unbound pools with custom attributes support which would introduce multiple unbound pwqs per workqueue; however, it also simplifies iterator users. Note that pwq->pool initialization is moved to alloc_and_link_pwqs() as that now is the only place which is explicitly handling the two pwq types. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: add workqueue_struct->pwqs listTejun Heo
Add workqueue_struct->pwqs list and chain all pool_workqueues belonging to a workqueue there. This will be used to implement generic pool_workqueue iteration and handle multiple pool_workqueues for the scheduled unbound pools with custom attributes. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: introduce kmem_cache for pool_workqueuesTejun Heo
pool_workqueues need to be aligned to 1 << WORK_STRUCT_FLAG_BITS as the lower bits of work->data are used for flags when they're pointing to pool_workqueues. Due to historical reasons, unbound pool_workqueues are allocated using kzalloc() with sufficient buffer area for alignment and aligned manually. The original pointer is stored at the end which free_pwqs() retrieves when freeing it. There's no reason for this hackery anymore. Set alignment of struct pool_workqueue to 1 << WORK_STRUCT_FLAG_BITS, add kmem_cache for pool_workqueues with proper alignment and replace the hacky alloc and free implementation with plain kmem_cache_zalloc/free(). In case WORK_STRUCT_FLAG_BITS gets shrunk too much and makes fields of pool_workqueues misaligned, trigger WARN if the alignment of struct pool_workqueue becomes smaller than that of long long. Note that assertion on IS_ALIGNED() is removed from alloc_pwqs(). We already have another one in pwq init loop in __alloc_workqueue_key(). This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: make workqueue_lock irq-safeTejun Heo
workqueue_lock will be used to synchronize areas which require irq-safety and there isn't much benefit in keeping it not irq-safe. Make it irq-safe. This patch doesn't introduce any visible behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-12workqueue: make sanity checks less punshing using WARN_ON[_ONCE]()sTejun Heo
Workqueue has been using mostly BUG_ON()s for sanity checks, which fail unnecessarily harshly when the assertion doesn't hold. Most assertions can converted to be less drastic such that things can limp along instead of dying completely. Convert BUG_ON()s to WARN_ON[_ONCE]()s with softer failure behaviors - e.g. if assertion check fails in destroy_worker(), trigger WARN and silently ignore destruction request. Most conversions are trivial. Note that sanity checks in destroy_workqueue() are moved above removal from workqueues list so that it can bail out without side-effects if assertion checks fail. This patch doesn't introduce any visible behavior changes during normal operation. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-03-08workqueue: fix possible pool stall bug in wq_unbind_fn()Lai Jiangshan
Since multiple pools per cpu have been introduced, wq_unbind_fn() has a subtle bug which may theoretically stall work item processing. The problem is two-fold. * wq_unbind_fn() depends on the worker executing wq_unbind_fn() itself to start unbound chain execution, which works fine when there was only single pool. With multiple pools, only the pool which is running wq_unbind_fn() - the highpri one - is guaranteed to have such kick-off. The other pool could stall when its busy workers block. * The current code is setting WORKER_UNBIND / POOL_DISASSOCIATED of the two pools in succession without initiating work execution inbetween. Because setting the flags requires grabbing assoc_mutex which is held while new workers are created, this could lead to stalls if a pool's manager is waiting for the previous pool's work items to release memory. This is almost purely theoretical tho. Update wq_unbind_fn() such that it sets WORKER_UNBIND / POOL_DISASSOCIATED, goes over schedule() and explicitly kicks off execution for a pool and then moves on to the next one. tj: Updated comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
2013-03-04workqueue: better define synchronization rule around rescuer->pool updatesLai Jiangshan
Rescuers visit different worker_pools to process work items from pools under pressure. Currently, rescuer->pool is updated outside any locking and when an outsider looks at a rescuer, there's no way to tell when and whether rescuer->pool is gonna change. While this doesn't currently cause any problem, it is nasty. With recent worker_maybe_bind_and_lock() changes, we can move rescuer->pool updates inside pool locks such that if rescuer->pool equals a locked pool, it's guaranteed to stay that way until the pool is unlocked. Move rescuer->pool inside pool->lock. This patch doesn't introduce any visible behavior difference. tj: Updated the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-04workqueue: change argument of worker_maybe_bind_and_lock() to @poolLai Jiangshan
worker_maybe_bind_and_lock() currently takes @worker but only cares about @worker->pool. This patch updates worker_maybe_bind_and_lock() to take @pool instead of @worker. This will be used to better define synchronization rules regarding rescuer->pool updates. This doesn't introduce any functional change. tj: Updated the comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-03-04workqueue: use %current instead of worker->task in worker_maybe_bind_and_lock()Lai Jiangshan
worker_maybe_bind_and_lock() uses both @worker->task and @current at the same time. As worker_maybe_bind_and_lock() can only be called by the current worker task, they are always the same. Update worker_maybe_bind_and_lock() to use %current consistently. This doesn't introduce any functional change. tj: Massaged the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-27hlist: drop the node parameter from iteratorsSasha Levin
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: Peter Senna Tschudin <peter.senna@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-19workqueue: un-GPL function delayed_work_timer_fn()Konstantin Khlebnikov
commit d8e794dfd51c368ed3f686b7f4172830b60ae47b ("workqueue: set delayed_work->timer function on initialization") exports function delayed_work_timer_fn() only for GPL modules. This makes delayed-works unusable for non-GPL modules, because initialization macro now requires GPL symbol. For example schedule_delayed_work() available for non-GPL. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # 3.7
2013-02-13workqueue: rename cpu_workqueue to pool_workqueueTejun Heo
workqueue has moved away from global_cwqs to worker_pools and with the scheduled custom worker pools, wforkqueues will be associated with pools which don't have anything to do with CPUs. The workqueue code went through significant amount of changes recently and mass renaming isn't likely to hurt much additionally. Let's replace 'cpu' with 'pool' so that it reflects the current design. * s/struct cpu_workqueue_struct/struct pool_workqueue/ * s/cpu_wq/pool_wq/ * s/cwq/pwq/ This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-13workqueue: reimplement is_chained_work() using current_wq_worker()Tejun Heo
is_chained_work() was added before current_wq_worker() and implemented its own ham-fisted way of finding out whether %current is a workqueue worker - it iterates through all possible workers. Drop the custom implementation and reimplement using current_wq_worker(). Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-13workqueue: fix is_chained_work() regressionTejun Heo
c9e7cf273f ("workqueue: move busy_hash from global_cwq to worker_pool") incorrectly converted is_chained_work() to use get_gcwq() inside for_each_gcwq_cpu() while removing get_gcwq(). As cwq might not exist for all possible workqueue CPUs, @cwq can be NULL and the following cwq deferences can lead to oops. Fix it by using for_each_cwq_cpu() instead, which is the better one to use anyway as we only need to check pools that the wq is associated with. Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-07workqueue: pick cwq instead of pool in __queue_work()Lai Jiangshan
Currently, __queue_work() chooses the pool to queue a work item to and then determines cwq from the target wq and the chosen pool. This is a bit backwards in that we can determine cwq first and simply use cwq->pool. This way, we can skip get_std_worker_pool() in queueing path which will be a hurdle when implementing custom worker pools. Update __queue_work() such that it chooses the target cwq and then use cwq->pool instead of the other way around. While at it, add missing {} in an if statement. This patch doesn't introduce any functional changes. tj: The original patch had two get_cwq() calls - the first to determine the pool by doing get_cwq(cpu, wq)->pool and the second to determine the matching cwq from get_cwq(pool->cpu, wq). Updated the function such that it chooses cwq instead of pool and removed the second call. Rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-07workqueue: make get_work_pool_id() cheaperLai Jiangshan
get_work_pool_id() currently first obtains pool using get_work_pool() and then return pool->id. For an off-queue work item, this involves obtaining pool ID from worker->data, performing idr_find() to find the matching pool and then returning its pool->id which of course is the same as the one which went into idr_find(). Just open code WORK_STRUCT_CWQ case and directly return pool ID from work->data. tj: The original patch dropped on-queue work item handling and renamed the function to offq_work_pool_id(). There isn't much benefit in doing so. Handling it only requires a single if() and we need at least BUG_ON(), which is also a branch, even if we drop on-queue handling. Open code WORK_STRUCT_CWQ case and keep the function in line with get_work_pool(). Rewrote the description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2013-02-07workqueue: move nr_running into worker_poolTejun Heo
As nr_running is likely to be accessed from other CPUs during try_to_wake_up(), it was kept outside worker_pool; however, while less frequent, other fields in worker_pool are accessed from other CPUs for, e.g., non-reentrancy check. Also, with recent pool related changes, accessing nr_running matching the worker_pool isn't as simple as it used to be. Move nr_running inside worker_pool. Keep it aligned to cacheline and define CPU pools using DEFINE_PER_CPU_SHARED_ALIGNED(). This should give at least the same cacheline behavior. get_pool_nr_running() is replaced with direct pool->nr_running accesses. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Joonsoo Kim <js1304@gmail.com>
2013-02-06workqueue: cosmetic update in try_to_grab_pending()Tejun Heo
With the recent is-work-queued-here test simplification, the nested if() in try_to_grab_pending() can be collapsed. Collapse it. This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
2013-02-06workqueue: simplify is-work-item-queued-here testLai Jiangshan
Currently, determining whether a work item is queued on a locked pool involves somewhat convoluted memory barrier dancing. It goes like the following. * When a work item is queued on a pool, work->data is updated before work->entry is linked to the pending list with a wmb() inbetween. * When trying to determine whether a work item is currently queued on a pool pointed to by work->data, it locks the pool and looks at work->entry. If work->entry is linked, we then do rmb() and then check whether work->data points to the current pool. This works because, work->data can only point to a pool if it currently is or were on the pool and, * If it currently is on the pool, the tests would obviously succeed. * It it left the pool, its work->entry was cleared under pool->lock, so if we're seeing non-empty work->entry, it has to be from the work item being linked on another pool. Because work->data is updated before work->entry is linked with wmb() inbetween, work->data update from another pool is guaranteed to be visible if we do rmb() after seeing non-empty work->entry. So, we either see empty work->entry or we see updated work->data pointin to another pool. While this works, it's convoluted, to put it mildly. With recent updates, it's now guaranteed that work->data points to cwq only while the work item is queued and that updating work->data to point to cwq or back to pool is done under pool->lock, so we can simply test whether work->data points to cwq which is associated with the currently locked pool instead of the convoluted memory barrier dancing. This patch replaces the memory barrier based "are you still here, really?" test with much simpler "does work->data points to me?" test - if work->data points to a cwq which is associated with the currently locked pool, the work item is guaranteed to be queued on the pool as work->data can start and stop pointing to such cwq only under pool->lock and the start and stop coincide with queue and dequeue. tj: Rewrote the comments and description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>