summaryrefslogtreecommitdiff
path: root/block/blk-throttle.c
AgeCommit message (Collapse)Author
2013-05-14blk-throttle: add throtl_service_queue->parent_sqTejun Heo
To prepare for hierarchy support, this patch adds throtl_service_queue->service_sq which points to the arent service_queue. Currently, for all service_queues embedded in throtl_grps, it points to throtl_data->service_queue. As throtl_data->service_queue doesn't have a parent its parent_sq is set to NULL. There are a number of functions which take both throtl_grp *tg and throtl_service_queue *parent_sq. With this patch, the parent service_queue can be determined from @tg and the @parent_sq arguments are removed. This patch doesn't make any behavior differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: generalize update_disptime optimization in blk_throtl_bio()Tejun Heo
When blk_throtl_bio() wants to queue a bio to a tg (throtl_grp), it avoids invoking tg_update_disptime() and throtl_schedule_next_dispatch() if the tg already has bios queued in that direction. As a new bio is appeneded after the existing ones, it can't change the tg's next dispatch time or the parent's dispatch schedule. This optimization is currently open coded in blk_throtl_bio(). Whether the target biolist was occupied was recorded in a local variable and later used to skip disptime update. This patch moves generalizes it so that throtl_add_bio_tg() sets a new flag THROTL_TG_WAS_EMPTY if the biolist was empty before the new bio was added. tg_update_disptime() clears the flag automatically. blk_throtl_bio() is updated to simply test the flag before updating disptime. This patch doesn't make any functional differences now but will enable using the same optimization for recursive dispatch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: dispatch to throtl_data->service_queue.bio_lists[]Tejun Heo
throtl_service_queues will eventually form a tree which is anchored at throtl_data->service_queue and queue bios will climb the tree to the top service_queue to be executed. This patch makes the dispatch paths in blk_throtl_dispatch_work_fn() and blk_throtl_drain() to dispatch bios to throtl_data->service_queue.bio_lists[] instead of the on-stack bio_lists. This will keep the final dispatch to the top level service_queue share the same mechanism as dispatches through the rest of the hierarchy. As bio's should be issued in a sleepable context, blk_throtl_dispatch_work_fn() transfers all dispatched bio's from the service_queue bio_lists[] into an onstack one before dropping queue_lock and issuing the bio's. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: move bio_lists[] and friends to throtl_service_queueTejun Heo
throtl_service_queues will eventually form a tree which is anchored at throtl_data->service_queue and queue bios will climb the tree to the top service_queue to be executed. This patch moves bio_lists[] and nr_queued[] from throtl_grp to its service_queue to prepare for that. As currently only the throtl_data->service_queue is in use, this patch just ends up moving throtl_grp->bio_lists[] and ->nr_queued[] to throtl_grp->service_queue.bio_lists[] and ->nr_queued[] without making any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: add throtl_grp->service_queueTejun Heo
Currently, there's single service_queue per queue - throtl_data->service_queue. All active throtl_grp's are queued on the queue and dispatched according to their limits. To support hierarchy, this will be expanded such that active throtl_grp's form a tree anchored at throtl_data->service_queue and chained through each intermediate throtl_grp's service_queue. This patch adds throtl_grp->service_queue to prepare for hierarchy support. The initialization function - throtl_service_queue_init() - is added and replaces the macro initializer. The newly added tg->service_queue isn't used yet. Following patches will do. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: reorganize throtl_service_queue passed around as argumentTejun Heo
throtl_service_queue will be the building block of hierarchy support and will form a tree. This patch updates its usages as arguments to reduce confusion. * When a service queue is used as the parent role - the host of the rbtree - use @parent_sq instead of @sq. * For functions taking both @tg and @parent_sq, reorder them so that the order is (@tg, @parent_sq) not the other way around. This makes the code follow the usual convention of specifying the primary target of the operation as the first argument. This patch doesn't make any functional differences. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: pass around throtl_service_queue instead of throtl_dataTejun Heo
throtl_service_queue will be used as the basic block to implement hierarchy support. Pass around throtl_service_queue *sq instead of throtl_data *td in the following functions which will be used across multiple levels of hierarchy. * [__]throtl_enqueue/dequeue_tg() * throtl_add_bio_tg() * tg_update_disptime() * throtl_select_dispatch() Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: add backlink pointer from throtl_grp to throtl_dataTejun Heo
Add throtl_grp->td so that the td (throtl_data) a given tg (throtl_grp) belongs to can be determined, and remove @td argument from functions which take both @td and @tg as the former now can be determined from the latter. This generally simplifies the code and removes a number of cases where @td is passed as an argument without being actually used. This will also help hierarchy support implementation. While at it, in multi-line conditions, move the logical operators leading broken lines to the end of the previous line. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: simplify throtl_grp flag handlingTejun Heo
blk-throttle is still using function-defining macros to define flag handling functions, which went out style at least a decade ago. Just define the flag as bitmask and use direct bit operations. This patch doesn't make any functional changes. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: rename throtl_rb_root to throtl_service_queueTejun Heo
throtl_rb_root will be expanded to cover more roles for hierarchy support. Rename it to throtl_service_queue and make its fields more descriptive. * rb -> pending_tree * left -> first_pending * count -> nr_pending * min_disptime -> first_pending_disptime This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: remove pointless throtl_nr_queued() optimizationsTejun Heo
throtl_nr_queued() is used in several places to avoid performing certain operations when the throtl_data is empty. This usually is useless as those paths usually aren't traveled if there's no bio queued. * throtl_schedule_delayed_work() skips scheduling dispatch work item if @td doesn't have any bios queued; however, the only case it can be called when @td is empty is from tg_set_conf() which isn't something we should be optimizing for. * throtl_schedule_next_dispatch() takes a quick exit if @td is empty; however, right after that it triggers BUG if the service tree is empty. The two conditions are equivalent and it can just test @st->count for the quick exit. * blk_throtl_dispatch_work_fn() skips dispatch if @td is empty. This work function isn't usually invoked when @td is empty. The only possibility is from tg_set_conf() and when it happens the normal dispatching path can handle empty @td fine. No need to add special skip path. This patch removes the above three unnecessary optimizations, which leave throtl_log() call in blk_throtl_dispatch_work_fn() the only user of throtl_nr_queued(). Remove throtl_nr_queued() and open code it in throtl_log(). I don't think we need td->nr_queued[] at all. Maybe we can remove it later. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: relocate throtl_schedule_delayed_work()Tejun Heo
Move throtl_schedule_delayed_work() above its first user so that the forward declaration can be removed. This patch is pure relocaiton. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: collapse throtl_dispatch() into the work functionTejun Heo
blk-throttle is about to go through major restructuring to support hierarchy. Do cosmetic updates in preparation. * s/throtl_data->throtl_work/throtl_data->dispatch_work/ * s/blk_throtl_work()/blk_throtl_dispatch_work_fn()/ * Collapse throtl_dispatch() into blk_throtl_dispatch_work_fn() This patch is purely cosmetic. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: remove deferred config application mechanismTejun Heo
When bps or iops configuration changes, blk-throttle records the new configuration and sets a flag indicating that the config has changed. The flag is checked in the bio dispatch path and applied. This deferred config application was necessary due to limitations in blkcg framework, which haven't existed for quite a while now. This patch removes the deferred config application mechanism and applies new configurations directly from tg_set_conf(), which is simpler. v2: Dropped unnecessary throtl_schedule_delayed_work() call from tg_set_conf() as suggested by Vivek Goyal. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2013-05-14blk-throttle: remove spurious throtl_enqueue_tg() call from ↵Tejun Heo
throtl_select_dispatch() throtl_select_dispatch() calls throtl_enqueue_tg() right after tg_update_disptime(), which always calls the function anyway. The call is, while harmless, unnecessary. Remove it. This patch doesn't introduce any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com>
2012-12-06block: Rename queue dead flagBart Van Assche
QUEUE_FLAG_DEAD is used to indicate that queuing new requests must stop. After this flag has been set queue draining starts. However, during the queue draining phase it is still safe to invoke the queue's request_fn, so QUEUE_FLAG_DYING is a better name for this flag. This patch has been generated by running the following command over the kernel source tree: git grep -lEw 'blk_queue_dead|QUEUE_FLAG_DEAD' | xargs sed -i.tmp -e 's/blk_queue_dead/blk_queue_dying/g' \ -e 's/QUEUE_FLAG_DEAD/QUEUE_FLAG_DYING/g'; \ sed -i.tmp -e "s/QUEUE_FLAG_DYING$(printf \\t)*5/QUEUE_FLAG_DYING$(printf \\t)5/g" \ include/linux/blkdev.h; \ sed -i.tmp -e 's/ DEAD/ DYING/g' -e 's/dead queue/a dying queue/' \ -e 's/Dead queue/A dying queue/' block/blk-core.c Signed-off-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <JBottomley@Parallels.com> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Jens Axboe <axboe@kernel.dk> Cc: Chanho Min <chanho.min@lge.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-08-21workqueue: use mod_delayed_work() instead of __cancel + queueTejun Heo
Now that mod_delayed_work() is safe to call from IRQ handlers, __cancel_delayed_work() followed by queue_delayed_work() can be replaced with mod_delayed_work(). Most conversions are straight-forward except for the following. * net/core/link_watch.c: linkwatch_schedule_work() was doing a quite elaborate dancing around its delayed_work. Collapse it such that linkwatch_work is queued for immediate execution if LW_URGENT and existing timer is kept otherwise. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
2012-08-20workqueue: deprecate system_nrt[_freezable]_wqTejun Heo
system_nrt[_freezable]_wq are now spurious. Mark them deprecated and convert all users to system[_freezable]_wq. If you're cc'd and wondering what's going on: Now all workqueues are non-reentrant, so there's no reason to use system_nrt[_freezable]_wq. Please use system[_freezable]_wq instead. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-By: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: David Airlie <airlied@linux.ie> Cc: Jiri Kosina <jkosina@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: David Howells <dhowells@redhat.com>
2012-06-25block: allocate io_context upfrontTejun Heo
Block layer very lazy allocation of ioc. It waits until the moment ioc is absolutely necessary; unfortunately, that time could be inside queue lock and __get_request() performs unlock - try alloc - retry dancing. Just allocate it up-front on entry to block layer. We're not saving the rain forest by deferring it to the last possible moment and complicating things unnecessarily. This patch is to prepare for further updates to request allocation path. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-23blkcg: tg_stats_alloc_lock is an irq lockTejun Heo
tg_stats_alloc_lock nests inside queue lock and should always be held with irq disabled. throtl_pd_{init|exit}() were using non-irqsafe spinlock ops which triggered inverse lock ordering via irq warning via RCU freeing of blkg invoking throtl_pd_exit() w/o disabling IRQ. Update both functions to use irq safe operations. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Sasha Levin <sasha.levin@oracle.com> LKML-Reference: <1335339396.16988.80.camel@lappy> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-05-01Merge tag 'v3.4-rc5' into for-3.5/coreJens Axboe
The core branch is behind driver commits that we want to build on for 3.5, hence I'm pulling in a later -rc. Linux 3.4-rc5 Conflicts: Documentation/feature-removal-schedule.txt Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: collapse blkcg_policy_ops into blkcg_policyTejun Heo
There's no reason to keep blkcg_policy_ops separate. Collapse it into blkcg_policy. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: embed struct blkg_policy_data in policy specific dataTejun Heo
Currently blkg_policy_data carries policy specific data as char flex array instead of being embedded in policy specific data. This was forced by oddities around blkg allocation which are all gone now. This patch makes blkg_policy_data embedded in policy specific data - throtl_grp and cfq_group so that it's more conventional and consistent with how io_cq is handled. * blkcg_policy->pdata_size is renamed to ->pd_size. * Functions which used to take void *pdata now takes struct blkg_policy_data *pd. * blkg_to_pdata/pdata_to_blkg() updated to blkg_to_pd/pd_to_blkg(). * Dummy struct blkg_policy_data definition added. Dummy pdata_to_blkg() definition was unused and inconsistent with the non-dummy version - correct dummy pd_to_blkg() added. * throtl and cfq updated accordingly. * As dummy blkg_to_pd/pd_to_blkg() are provided, blkg_to_cfqg/cfqg_to_blkg() don't need to be ifdef'd. Moved outside ifdef block. This patch doesn't introduce any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: mass rename of blkcg APITejun Heo
During the recent blkcg cleanup, most of blkcg API has changed to such extent that mass renaming wouldn't cause any noticeable pain. Take the chance and cleanup the naming. * Rename blkio_cgroup to blkcg. * Drop blkio / blkiocg prefixes and consistently use blkcg. * Rename blkio_group to blkcg_gq, which is consistent with io_cq but keep the blkg prefix / variable name. * Rename policy method type and field names to signify they're dealing with policy data. * Rename blkio_policy_type to blkcg_policy. This patch doesn't cause any functional change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: remove blkio_group->path[]Tejun Heo
blkio_group->path[] stores the path of the associated cgroup and is used only for debug messages. Just format the path from blkg->cgroup when printing debug messages. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: drop stuff unused after per-queue policy activation updateTejun Heo
* All_q_list is unused. Drop all_q_{mutex|list}. * @for_root of blkg_lookup_create() is always %false when called from outside blk-cgroup.c proper. Factor out __blkg_lookup_create() so that it doesn't check whether @q is bypassing and use the underscored version for the @for_root callsite. * blkg_destroy_all() is used only from blkcg proper and @destroy_root is always %true. Make it static and drop @destroy_root. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: implement per-queue policy activationTejun Heo
All blkcg policies were assumed to be enabled on all request_queues. Due to various implementation obstacles, during the recent blkcg core updates, this was temporarily implemented as shooting down all !root blkgs on elevator switch and policy [de]registration combined with half-broken in-place root blkg updates. In addition to being buggy and racy, this meant losing all blkcg configurations across those events. Now that blkcg is cleaned up enough, this patch replaces the temporary implementation with proper per-queue policy activation. Each blkcg policy should call the new blkcg_[de]activate_policy() to enable and disable the policy on a specific queue. blkcg_activate_policy() allocates and installs policy data for the policy for all existing blkgs. blkcg_deactivate_policy() does the reverse. If a policy is not enabled for a given queue, blkg printing / config functions skip the respective blkg for the queue. blkcg_activate_policy() also takes care of root blkg creation, and cfq_init_queue() and blk_throtl_init() are updated accordingly. This replaces blkcg_bypass_{start|end}() and update_root_blkg_pd() unnecessary. Dropped. v2: cfq_init_queue() was returning uninitialized @ret on root_group alloc failure if !CONFIG_CFQ_GROUP_IOSCHED. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: add request_queue->root_blkgTejun Heo
With per-queue policy activation, root blkg creation will be moved to blkcg core. Add q->root_blkg in preparation. For blk-throtl, this replaces throtl_data->root_tg; however, cfq needs to keep cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: make blkg_conf_prep() take @pol and return with queue lock heldTejun Heo
Add @pol to blkg_conf_prep() and let it return with queue lock held (to be released by blkg_conf_finish()). Note that @pol isn't used yet. This is to prepare for per-queue policy activation and doesn't cause any visible difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: remove static policy ID enumsTejun Heo
Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate @pol->plid dynamically on registration. The maximum number of blkcg policies which can be registered at the same time is defined by BLKCG_MAX_POLS constant added to include/linux/blkdev.h. Note that blkio_policy_register() now may fail. Policy init functions updated accordingly and unnecessary ifdefs removed from cfq_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-20blkcg: use @pol instead of @plid in update_root_blkg_pd() and ↵Tejun Heo
blkcg_print_blkgs() The two functions were taking "enum blkio_policy_id plid". Make them take "const struct blkio_policy_type *pol" instead. This is to prepare for per-queue policy activation and doesn't cause any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-04-01blkcg: drop BLKCG_STAT_{PRIV|POL|OFF} macrosTejun Heo
Now that all stat handling code lives in policy implementations, there's no need to encode policy ID in cft->private. * Export blkcg_prfill_[rw]stat() from blkcg, remove blkcg_print_[rw]stat(), and implement cfqg_print_[rw]stat() which use hard-code BLKIO_POLICY_PROP. * Use cft->private for offset of the target field directly and drop BLKCG_STAT_{PRIV|POL|OFF}(). Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: pass around pd->pdata instead of pd itself in prfill functionsTejun Heo
Now that all conf and stat fields are moved into policy specific blkio_policy_data->pdata areas, there's no reason to use blkio_policy_data itself in prfill functions. Pass around @pd->pdata instead of @pd. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move blkio_group_conf->iops and ->bps to blk-throttleTejun Heo
blkio_cgroup_conf->iops and ->bps are owned by blk-throttle and has no reason to be defined in blkcg core. Drop them and let conf setting functions directly manipulate throtl_grp->bps[] and ->iops[]. This makes blkio_group_conf empty. Drop it. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move blkio_group_stats_cpu and friends to blk-throttle.cTejun Heo
blkio_group_stats_cpu is used only by blk-throtl and has no reason to be defined in blkcg core. * Move blkio_group_stats_cpu to blk-throttle.c and rename it to tg_stats_cpu. * blkg_policy_data->stats_cpu is replaced with throtl_grp->stats_cpu. prfill functions updated accordingly. * All related macros / functions are renamed so that they have tg_ prefix and the unnecessary @pol arguments are dropped. * Per-cpu stats allocation code is also moved from blk-cgroup.c to blk-throttle.c and gets simplified to only deal with BLKIO_POLICY_THROTL. percpu stat free is performed by the exit method throtl_exit_blkio_group(). * throtl_reset_group_stats() implemented for blkio_reset_group_stats_fn method so that tg->stats_cpu can be reset. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: cfq doesn't need per-cpu dispatch statsTejun Heo
blkio_group_stats_cpu is used to count dispatch stats using per-cpu counters. This is used by both blk-throtl and cfq-iosched but the sharing is rather silly. * cfq-iosched doesn't need per-cpu dispatch stats. cfq always updates those stats while holding queue_lock. * blk-throtl needs per-cpu dispatch stats but only service_bytes and serviced. It doesn't make use of sectors. This patch makes cfq add and use global stats for service_bytes, serviced and sectors, removes per-cpu sectors counter and moves per-cpu stat printing code to blk-throttle.c. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move statistics update code to policiesTejun Heo
As with conf/stats file handling code, there's no reason for stat update code to live in blkcg core with policies calling into update them. The current organization is both inflexible and complex. This patch moves stat update code to specific policies. All blkiocg_update_*_stats() functions which deal with BLKIO_POLICY_PROP stats are collapsed into their cfq_blkiocg_update_*_stats() counterparts. blkiocg_update_dispatch_stats() is used by both policies and duplicated as throtl_update_dispatch_stats() and cfq_blkiocg_update_dispatch_stats(). This will be cleaned up later. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: move conf/stat file handling code to policiesTejun Heo
blkcg conf/stat handling is convoluted in that details which belong to specific policy implementations are all out in blkcg core and then policies hook into core layer to access and manipulate confs and stats. This sadly achieves both inflexibility (confs/stats can't be modified without messing with blkcg core) and complexity (all the call-ins and call-backs). The previous patches restructured conf and stat handling code such that they can be separated out. This patch relocates the file handling part. All conf/stat file handling code which belongs to BLKIO_POLICY_PROP is moved to cfq-iosched.c and all BKLIO_POLICY_THROTL code to blk-throtl.c. The move is verbatim except for blkio_update_group_{weight|bps|iops}() callbacks which relays conf changes to policies. The configuration settings are handled in policies themselves so the relaying isn't necessary. Conf setting functions are modified to directly call per-policy update functions and the relaying mechanism is dropped. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-04-01blkcg: remove unused @pol and @plid parametersTejun Heo
@pol to blkg_to_pdata() and @plid to blkg_lookup_create() are no longer necessary. Drop them. Signed-off-by: Tejun Heo <tj@kernel.org>
2012-03-30block: use lockdep_assert_held for queue lockingAndi Kleen
Instead of an ugly open coded variant. Cc: axboe@kernel.dk Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06block: make blk-throttle preserve the issuing task on delayed biosTejun Heo
Make blk-throttle call bio_associate_current() on bios being delayed such that they get issued to block layer with the original io_context. This allows stacking blk-throttle and cfq-iosched propio policies. bios will always be issued with the correct ioc and blkcg whether it gets delayed by blk-throttle or not. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06block: make block cgroup policies follow bio task associationTejun Heo
Implement bio_blkio_cgroup() which returns the blkcg associated with the bio if exists or %current's blkcg, and use it in blk-throttle and cfq-iosched propio. This makes both cgroup policies honor task association for the bio instead of always assuming %current. As nobody is using bio_set_task() yet, this doesn't introduce any behavior change. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: drop unnecessary RCU lockingTejun Heo
Now that blkg additions / removals are always done under both q and blkcg locks, the only places RCU locking is necessary are blkg_lookup[_create]() for lookup w/o blkcg lock. This patch drops unncessary RCU locking replacing it with plain blkcg locking as necessary. * blkiocg_pre_destroy() already perform proper locking and don't need RCU. Dropped. * blkio_read_blkg_stats() now uses blkcg->lock instead of RCU read lock. This isn't a hot path. * Now unnecessary synchronize_rcu() from queue exit paths removed. This makes q->nr_blkgs unnecessary. Dropped. * RCU annotation on blkg->q removed. -v2: Vivek pointed out that blkg_lookup_create() still needs to be called under rcu_read_lock(). Updated. -v3: After the update, stats_lock locking in blkio_read_blkg_stats() shouldn't be using _irq variant as it otherwise ends up enabling irq while blkcg->lock is locked. Fixed. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: unify blkg's for blkcg policiesTejun Heo
Currently, blkg is per cgroup-queue-policy combination. This is unnatural and leads to various convolutions in partially used duplicate fields in blkg, config / stat access, and general management of blkgs. This patch make blkg's per cgroup-queue and let them serve all policies. blkgs are now created and destroyed by blkcg core proper. This will allow further consolidation of common management logic into blkcg core and API with better defined semantics and layering. As a transitional step to untangle blkg management, elvswitch and policy [de]registration, all blkgs except the root blkg are being shot down during elvswitch and bypass. This patch adds blkg_root_update() to update root blkg in place on policy change. This is hacky and racy but should be good enough as interim step until we get locking simplified and switch over to proper in-place update for all blkgs. -v2: Root blkgs need to be updated on elvswitch too and blkg_alloc() comment wasn't updated according to the function change. Fixed. Both pointed out by Vivek. -v3: v2 updated blkg_destroy_all() to invoke update_root_blkg_pd() for all policies. This freed root pd during elvswitch before the last queue finished exiting and led to oops. Directly invoke update_root_blkg_pd() only on BLKIO_POLICY_PROP from cfq_exit_queue(). This also is closer to what will be done with proper in-place blkg update. Reported by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: let blkcg core manage per-queue blkg list and counterTejun Heo
With the previous patch to move blkg list heads and counters to request_queue and blkg, logic to manage them in both policies are almost identical and can be moved to blkcg core. This patch moves blkg link logic into blkg_lookup_create(), implements common blkg unlink code in blkg_destroy(), and updates blkg_destory_all() so that it's policy specific and can skip root group. The updated blkg_destroy_all() is now used to both clear queue for bypassing and elv switching, and release all blkgs on q exit. This patch introduces a race window where policy [de]registration may race against queue blkg clearing. This can only be a problem on cfq unload and shouldn't be a real problem in practice (and we have many other places where this race already exists). Future patches will remove these unlikely races. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: move per-queue blkg list heads and counters to queue and blkgTejun Heo
Currently, specific policy implementations are responsible for maintaining list and number of blkgs. This duplicates code unnecessarily, and hinders factoring common code and providing blkcg API with better defined semantics. After this patch, request_queue hosts list heads and counters and blkg has list nodes for both policies. This patch only relocates the necessary fields and the next patch will actually move management code into blkcg core. Note that request_queue->blkg_list[] and ->nr_blkgs[] are hardcoded to have 2 elements. This is to avoid include dependency and will be removed by the next patch. This patch doesn't introduce any behavior change. -v2: Now unnecessary conditional on CONFIG_BLK_CGROUP_MODULE removed as pointed out by Vivek. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: don't use blkg->plid in stat related functionsTejun Heo
blkg is scheduled to be unified for all policies and thus there won't be one-to-one mapping from blkg to policy. Update stat related functions to take explicit @pol or @plid arguments and not use blkg->plid. This is painful for now but most of specific stat interface functions will be replaced with a handful of generic helpers. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: move refcnt to blkcg coreTejun Heo
Currently, blkcg policy implementations manage blkg refcnt duplicating mostly identical code in both policies. This patch moves refcnt to blkg and let blkcg core handle refcnt and freeing of blkgs. * cfq blkgs now also get freed via RCU. * cfq blkgs lose RB_EMPTY_ROOT() sanity check on blkg free. If necessary, we can add blkio_exit_group_fn() to resurrect this. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: let blkcg core handle policy private data allocationTejun Heo
Currently, blkg's are embedded in private data blkcg policy private data structure and thus allocated and freed by policies. This leads to duplicate codes in policies, hinders implementing common part in blkcg core with strong semantics, and forces duplicate blkg's for the same cgroup-q association. This patch introduces struct blkg_policy_data which is a separate data structure chained from blkg. Policies specifies the amount of private data it needs in its blkio_policy_type->pdata_size and blkcg core takes care of allocating them along with blkg which can be accessed using blkg_to_pdata(). blkg can be determined from pdata using pdata_to_blkg(). blkio_alloc_group_fn() method is accordingly updated to blkio_init_group_fn(). For consistency, tg_of_blkg() and cfqg_of_blkg() are replaced with blkg_to_tg() and blkg_to_cfqg() respectively, and functions to map in the reverse direction are added. Except that policy specific data now lives in a separate data structure from blkg, this patch doesn't introduce any functional difference. This will be used to unify blkg's for different policies. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2012-03-06blkcg: add blkcg_{init|drain|exit}_queue()Tejun Heo
Currently block core calls directly into blk-throttle for init, drain and exit. This patch adds blkcg_{init|drain|exit}_queue() which wraps the blk-throttle functions. This is to give more control and visiblity to blkcg core layer for proper layering. Further patches will add logic common to blkcg policies to the functions. While at it, collapse blk_throtl_release() into blk_throtl_exit(). There's no reason to keep them separate. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>