summaryrefslogtreecommitdiff
path: root/block
AgeCommit message (Collapse)Author
2009-10-07cfq-iosched: apply bool value where we return 0/1Jens Axboe
Saves 16 bytes of text, woohoo. But the more important point is that it makes the code more readable when returning bool for 0/1 cases. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-07cfq-iosched: fix think time allowed for seekersCorrado Zoccolo
CFQ enables idle only for processes that think less than the allowed idle time. Since idle time is lower for seeky queues, we should use the correct value in the comparison. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06cfq-iosched: fix the slice residual signJens Axboe
We should subtract the slice residual from the rb tree key, since a negative residual count indicates that the cfqq overran its slice the last time. Hence we want to add the overrun time, to position it a bit further away in the service tree. Reported-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06cfq-iosched: abstract out the 'may this cfqq dispatch' logicJens Axboe
Makes the whole thing easier to read, cfq_dispatch_requests() was a bit messy before. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06block: use proper BLK_RW_ASYNC in blk_queue_start_tag()Jens Axboe
Makes it easier to read than the 0. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-06block: Seperate read and write statistics of in_flight requests v2Nikanth Karthikesan
Commit a9327cac440be4d8333bba975cbbf76045096275 added seperate read and write statistics of in_flight requests. And exported the number of read and write requests in progress seperately through sysfs. But Corrado Zoccolo <czoccolo@gmail.com> reported getting strange output from "iostat -kx 2". Global values for service time and utilization were garbage. For interval values, utilization was always 100%, and service time is higher than normal. So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b The problem was in part_round_stats_single(), I missed the following: if (now == part->stamp) return; - if (part->in_flight) { + if (part_in_flight(part)) { __part_stat_add(cpu, part, time_in_queue, part_in_flight(part) * (now - part->stamp)); __part_stat_add(cpu, part, io_ticks, (now - part->stamp)); With this chunk included, the reported regression gets fixed. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> -- Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05block: get rid of kblock_schedule_delayed_work()Jens Axboe
It was briefly introduced to allow CFQ to to delayed scheduling, but we ended up removing that feature again. So lets kill the function and export, and just switch CFQ back to the normal work schedule since it is now passing in a '0' delay from all call sites. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05cfq-iosched: fix possible problem with jiffies wraparoundCorrado Zoccolo
The RR service tree is indexed by a key that is relative to current jiffies. This can cause problems on jiffies wraparound. The patch fixes it using time_before comparison, and changing the add_front path to use a relative number, too. Signed-off-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-05cfq-iosched: fix issue with rq-rq merging and fifo list orderingJens Axboe
cfq uses rq->start_time as the fifo indicator, but that field may get modified prior to cfq doing it's fifo list adjustment when a request gets merged with another request. This can cause the fifo list to become unordered. Reported-by: Corrado Zoccolo <czoccolo@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-04Revert "Seperate read and write statistics of in_flight requests"Jens Axboe
This reverts commit a9327cac440be4d8333bba975cbbf76045096275. Corrado Zoccolo <czoccolo@gmail.com> reports: "with 2.6.32-rc1 I started getting the following strange output from "iostat -kx 2": Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 10,70 0,00 3,16 15,75 0,00 70,38 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 18,22 0,00 0,67 0,01 14,77 0,02 43,94 0,01 10,53 39043915,03 2629219,87 sdb 60,89 9,68 50,79 3,04 1724,43 50,52 65,95 0,70 13,06 488437,47 2629219,87 avg-cpu: %user %nice %system %iowait %steal %idle 2,72 0,00 0,74 0,00 0,00 96,53 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 6,68 0,00 0,99 0,00 0,00 92,33 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 avg-cpu: %user %nice %system %iowait %steal %idle 4,40 0,00 0,73 1,47 0,00 93,40 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util sda 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 0,00 100,00 sdb 0,00 4,00 0,00 3,00 0,00 28,00 18,67 0,06 19,50 333,33 100,00 Global values for service time and utilization are garbage. For interval values, utilization is always 100%, and service time is higher than normal. I bisected it down to: [a9327cac440be4d8333bba975cbbf76045096275] Seperate read and write statistics of in_flight requests and verified that reverting just that commit indeed solves the issue on 2.6.32-rc1." So until this is debugged, revert the bad commit. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-04cfq-iosched: don't delay async queue if it hasn't dispatched at allJens Axboe
We cannot delay for the first dispatch of the async queue if it hasn't dispatched at all, since that could present a local user DoS attack vector using an app that just did slow timed sync reads while filling memory. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03block: Topology ioctlsMartin K. Petersen
Not all users of the topology information want to use libblkid. Provide the topology information through bdev ioctls. Also clarify sector size comments for existing BLK ioctls. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: use assigned slice sync value, not defaultJens Axboe
We should use the sysfs modified slice sync value, in case it differs from the default. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: rename 'desktop' sysfs entry to 'low_latency'Jens Axboe
Don't think that's necessarily a perfect description of what this option fiddles with, but it's probably better than 'desktop'. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: implement slower async initiate and queue ramp upJens Axboe
This slowly ramps up the async queue depth based on the time passed since the sync IO, and doesn't allow async at all until a sync slice period has passed. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-03cfq-iosched: delay async IO dispatch, if sync IO was just doneVivek Goyal
o Do not allow more than max_dispatch requests from an async queue, if some sync request has finished recently. This is in the hope that sync activity is still going on in the system and we might receive a sync request soon. Most likely from a sync queue which finished a request and we did not enable idling on it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-02cfq-iosched: add a knob for desktop interactivenessJens Axboe
This is basically identical to what Vivek Goyal posted, but combined into one and labelled 'desktop' instead of 'fairness'. The goal is to continue to improve on the latency side of things as it relates to interactiveness, keeping the questionable bits under this sysfs tunable so it would be easy for throughput-only people to turn off. Apart from adding the interactive sysfs knob, it also adds the behavioural change of allowing slice idling even if the hardware does tagged command queuing. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01Add a tracepoint for block request remappingJun'ichi Nomura
Since 2.6.31 now has request-based device-mapper, it's useful to have a tracepoint for request-remapping as well as bio-remapping. This patch adds a tracepoint for request-remapping, trace_block_rq_remap(). Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: allow large discard requestsChristoph Hellwig
Currently we set the bio size to the byte equivalent of the blocks to be trimmed when submitting the initial DISCARD ioctl. That means it is subject to the max_hw_sectors limitation of the HBA which is much lower than the size of a DISCARD request we can support. Add a separate max_discard_sectors tunable to limit the size for discard requests. We limit the max discard request size in bytes to 32bit as that is the limit for bio->bi_size. This could be much larger if we had a way to pass that information through the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: use normal I/O path for discard requestsChristoph Hellwig
prepare_discard_fn() was being called in a place where memory allocation was effectively impossible. This makes it inappropriate for all but the most trivial translations of Linux's DISCARD operation to the block command set. Additionally adding a payload there makes the ownership of the bio backing unclear as it's now allocated by the device driver and not the submitter as usual. It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether the queue supports discard operations or not. blkdev_issue_discard now allocates a one-page, sector-length payload which is the right thing for the common ATA and SCSI implementations. The mtd implementation of prepare_discard_fn() is replaced with simply checking for the request being a discard. Largely based on a previous patch from Matthew Wilcox <matthew@wil.cx> which did the prepare_discard_fn but not the different payload allocation yet. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfsZdenek Kabelac
Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82. Release kobject also in case the request_fn is NULL. Problem was noticed via kmemleak backtrace when some sysfs entries were note properly destroyed during device removal: unreferenced object 0xffff88001aa76640 (size 80): comm "lvcreate", pid 2120, jiffies 4294885144 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e...... 90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S..... backtrace: [<ffffffff813f9cc6>] kmemleak_alloc+0x26/0x60 [<ffffffff8111d693>] kmem_cache_alloc+0x133/0x1c0 [<ffffffff81195891>] sysfs_new_dirent+0x41/0x120 [<ffffffff81194b0c>] sysfs_add_file_mode+0x3c/0xb0 [<ffffffff81197c81>] internal_create_group+0xc1/0x1a0 [<ffffffff81197d93>] sysfs_create_group+0x13/0x20 [<ffffffff810d8004>] blk_trace_init_sysfs+0x14/0x20 [<ffffffff8123f45c>] blk_register_queue+0x3c/0xf0 [<ffffffff812447e4>] add_disk+0x94/0x160 [<ffffffffa00d8b08>] dm_create+0x598/0x6e0 [dm_mod] [<ffffffffa00de951>] dev_create+0x51/0x350 [dm_mod] [<ffffffffa00de823>] ctl_ioctl+0x1a3/0x240 [dm_mod] [<ffffffffa00de8f2>] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod] [<ffffffff81177bfd>] compat_sys_ioctl+0xcd/0x4f0 [<ffffffff81036ed8>] sysenter_dispatch+0x7/0x2c [<ffffffffffffffff>] 0xffffffffffffffff Signed-off-by: Zdenek Kabelac <zkabelac@redhat.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: Do not clamp max_hw_sectors for stacking devicesMartin K. Petersen
Stacking devices do not have an inherent max_hw_sector limit. Set the default to INT_MAX so we are bounded only by capabilities of the underlying storage. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-10-01block: Set max_sectors correctly for stacking devicesMartin K. Petersen
The topology changes unintentionally caused SAFE_MAX_SECTORS to be set for stacking devices. Set the default limit to BLK_DEF_MAX_SECTORS and provide SAFE_MAX_SECTORS in blk_queue_make_request() for legacy hw drivers that depend on the old behavior. Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-19Driver-Core: extend devnode callbacks to provide permissionsKay Sievers
This allows subsytems to provide devtmpfs with non-default permissions for the device node. Instead of the default mode of 0600, null, zero, random, urandom, full, tty, ptmx now have a mode of 0666, which allows non-privileged processes to access standard device nodes in case no other userspace process applies the expected permissions. This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev debugfs: Modify default debugfs directory for debugging pktcdvd. debugfs: Modified default dir of debugfs for debugging UHCI. debugfs: Change debugfs directory of IWMC3200 debugfs: Change debuhgfs directory of trace-events-sample.h debugfs: Fix mount directory of debugfs by default in events.txt hpilo: add poll f_op hpilo: add interrupt handler hpilo: staging for interrupt handling driver core: platform_device_add_data(): use kmemdup() Driver core: Add support for compatibility classes uio: add generic driver for PCI 2.3 devices driver-core: move dma-coherent.c from kernel to driver/base mem_class: fix bug mem_class: use minor as index instead of searching the array driver model: constify attribute groups UIO: remove 'default n' from Kconfig Driver core: Add accessor for device platform data Driver core: move dev_get/set_drvdata to drivers/base/dd.c Driver core: add new device to bus's list before probing
2009-09-15driver model: constify attribute groupsDavid Brownell
Let attribute group vectors be declared "const". We'd like to let most attribute metadata live in read-only sections... this is a start. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-15Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits) powerpc64: convert to dynamic percpu allocator sparc64: use embedding percpu first chunk allocator percpu: kill lpage first chunk allocator x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA percpu: update embedding first chunk allocator to handle sparse units percpu: use group information to allocate vmap areas sparsely vmalloc: implement pcpu_get_vm_areas() vmalloc: separate out insert_vmalloc_vm() percpu: add chunk->base_addr percpu: add pcpu_unit_offsets[] percpu: introduce pcpu_alloc_info and pcpu_group_info percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward percpu: add @align to pcpu_fc_alloc_fn_t percpu: make @dyn_size mandatory for pcpu_setup_first_chunk() percpu: drop @static_size from first chunk allocators percpu: generalize first chunk allocator selection percpu: build first chunk allocators selectively percpu: rename 4k first chunk allocator to page percpu: improve boot messages percpu: fix pcpu_reclaim() locking ... Fix trivial conflict as by Tejun Heo in kernel/sched.c
2009-09-14Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits) block: use blkdev_issue_discard in blk_ioctl_discard Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads block: don't assume device has a request list backing in nr_requests store block: Optimal I/O limit wrapper cfq: choose a new next_req when a request is dispatched Seperate read and write statistics of in_flight requests aoe: end barrier bios with EOPNOTSUPP block: trace bio queueing trial only when it occurs block: enable rq CPU completion affinity by default cfq: fix the log message after dispatched a request block: use printk_once cciss: memory leak in cciss_init_one() splice: update mtime and atime on files block: make blk_iopoll_prep_sched() follow normal 0/1 return convention cfq-iosched: get rid of must_alloc flag block: use interrupts disabled version of raise_softirq_irqoff() block: fix comment in blk-iopoll.c block: adjust default budget for blk-iopoll block: fix long lines in block/blk-iopoll.c block: add blk-iopoll, a NAPI like approach for block devices ...
2009-09-14block: use blkdev_issue_discard in blk_ioctl_discardChristoph Hellwig
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard, the only difference between the two is that blkdev_issue_discard needs to send a barrier discard request and blk_ioctl_discard a non-barrier one, and blk_ioctl_discard needs to wait on the request. To facilitates this add a flags argument to blkdev_issue_discard to control both aspects of the behaviour. This will be very useful later on for using the waiting funcitonality for other callers. Based on an earlier patch from Matthew Wilcox <matthew@wil.cx>. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14block: don't assume device has a request list backing in nr_requests storeJens Axboe
Stacked devices do not. For now, just error out with -EINVAL. Later we could make the limit apply on stacked devices too, for throttling reasons. This fixes 5a54cd13353bb3b88887604e2c980aa01e314309 and should go into 2.6.31 stable as well. Cc: stable@kernel.org Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14block: Optimal I/O limit wrapperMartin K. Petersen
Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper around it. DM needs this to avoid poking at the queue_limits directly. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14cfq: choose a new next_req when a request is dispatchedJeff Moyer
This patch addresses http://bugzilla.kernel.org/show_bug.cgi?id=13401, a regression introduced in 2.6.30. From the bug report: Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-14Seperate read and write statistics of in_flight requestsNikanth Karthikesan
Currently, there is a single in_flight counter measuring the number of requests in the request_queue. But some monitoring tools would like to know how many read requests and write requests are in progress. Split the current in_flight counter into two seperate counters for read and write. This information is exported as a sysfs attribute, as changing the currently available stat files would break the existing tools. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: trace bio queueing trial only when it occursMinchan Kim
If BIO is discarded or cross over end of device, BIO queueing trial doesn't occur. Actually the trace was called just before make_request at first: [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23      2056a782f8e7e65fd4bfd027506b4ce1c5e9ccd4 And then 2 patches added some checks between them: [PATCH] md: check bio address after mapping through partitions        5ddfe9691c91a244e8d1be597b6428fcefd58103, [BLOCK] Don't allow empty barriers to be passed down to queues that don't grok them        51fd77bd9f512ab6cc9df0733ba1caaab89eb957 It breaks original goal. Let's trace it only when it happens. Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Acked-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq: fix the log message after dispatched a requestShan Wei
The blktrace tools can show process id when cfq dispatched a request, using cfq_log_cfqq() instead of cfq_log(). Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq-iosched: get rid of must_alloc flagJens Axboe
It's not currently used, as pointed out by Gui Jianfeng <guijianfeng@cn.fujitsu.com>. We already check the wait_request flag to allow an idling queue priority allocation access, so we don't need this extra flag. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: use interrupts disabled version of raise_softirq_irqoff()Jens Axboe
We already have interrupts disabled at that point, so use the __raise_softirq_irqoff() variant. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: fix comment in blk-iopoll.cJens Axboe
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: adjust default budget for blk-iopollJens Axboe
It's not exported, I doubt we'll have a reason to change this... Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: fix long lines in block/blk-iopoll.cJens Axboe
Note sure why they happened in the first place, probably some bad terminal setting. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: add blk-iopoll, a NAPI like approach for block devicesJens Axboe
This borrows some code from NAPI and implements a polled completion mode for block devices. The idea is the same as NAPI - instead of doing the command completion when the irq occurs, schedule a dedicated softirq in the hopes that we will complete more IO when the iopoll handler is invoked. Devices have a budget of commands assigned, and will stay in polled mode as long as they continue to consume their budget from the iopoll softirq handler. If they do not, the device is set back to interrupt completion mode. This patch holds the core bits for blk-iopoll, device driver support sold separately. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: improve queue_should_plug() by looking at IO depthsJens Axboe
Instead of just checking whether this device uses block layer tagging, we can improve the detection by looking at the maximum queue depth it has reached. If that crosses 4, then deem it a queuing device. This is important on high IOPS devices, since plugging hurts the performance there (it can be as much as 10-15% of the sys time). Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11bio: first step in sanitizing the bio->bi_rw flag testingJens Axboe
Get rid of any functions that test for these bits and make callers use bio_rw_flagged() directly. Then it is at least directly apparent what variable and flag they check. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11Send uevents for write_protect changesHannes Reinecke
Whenever a block device changes it's read-only attribute notify the userspace about it. Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq-iosched: no need to keep track of busy_rt_queuesVivek Goyal
o Get rid of busy_rt_queues infrastructure. Looks like it is redundant. o Once an RT queue gets request it will preempt any of the BE or IDLE queues immediately. Otherwise this queue will be put on service tree and scheduler will anyway select this queue before any of the BE or IDLE queue. Hence looks like there is no need to keep track of how many busy RT queues are currently on service tree. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11cfq-iosched: drain device queue before switching to a sync queueJens Axboe
To lessen the impact of async IO on sync IO, let the device drain of any async IO in progress when switching to a sync cfqq that has idling enabled. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11scsi,block: update SCSI to handle mixed merge failuresTejun Heo
Update scsi_io_completion() such that it only fails requests till the next error boundary and retry the leftover. This enables block layer to merge requests with different failfast settings and still behave correctly on errors. Allow merge of requests of different failfast settings. As SCSI is currently the only subsystem which follows failfast status, there's no need to worry about other block drivers for now. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: implement mixed merge of different failfast requestsTejun Heo
Failfast has characteristics from other attributes. When issuing, executing and successuflly completing requests, failfast doesn't make any difference. It only affects how a request is handled on failure. Allowing requests with different failfast settings to be merged cause normal IOs to fail prematurely while not allowing has performance penalties as failfast is used for read aheads which are likely to be located near in-flight or to-be-issued normal IOs. This patch introduces the concept of 'mixed merge'. A request is a mixed merge if it is merge of segments which require different handling on failure. Currently the only mixable attributes are failfast ones (or lack thereof). When a bio with different failfast settings is added to an existing request or requests of different failfast settings are merged, the merged request is marked mixed. Each bio carries failfast settings and the request always tracks failfast state of the first bio. When the request fails, blk_rq_err_bytes() can be used to determine how many bytes can be safely failed without crossing into an area which requires further retrials. This allows request merging regardless of failfast settings while keeping the failure handling correct. This patch only implements mixed merge but doesn't enable it. The next one will update SCSI to make use of mixed merge. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Niel Lambrechts <niel.lambrechts@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11block: use the same failfast bits for bio and requestTejun Heo
bio and request use the same set of failfast bits. This patch makes the following changes to simplify things. * enumify BIO_RW* bits and reorder bits such that BIOS_RW_FAILFAST_* bits coincide with __REQ_FAILFAST_* bits. * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV but the matching is useless anyway. init_request_from_bio() is responsible for setting FAILFAST bits on FS requests and non-FS requests never use BIO_RW_AHEAD. Drop the code and comment from blk_rq_bio_prep(). * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and simplify FAILFAST flags handling in init_request_from_bio(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11writeback: add name to backing_dev_infoJens Axboe
This enables us to track who does what and print info. Its main use is catching dirty inodes on the default_backing_dev_info, so we can fix that up. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>