summaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)Author
2013-02-04gpio: mpc8xxx: don't set IRQ_TYPE_NONE when creating irq mappingAnatolij Gustschin
Exporting gpios over sysfs GPIO interface throws genirq error messages, i.e. on an mpc5121 based board exporting GPIO 5 triggers it: # echo 229 > /sys/class/gpio/export genirq: Setting trigger mode 0 for irq 44 failed (mpc512x_irq_set_type+0x0/0x18c) Similar error messages appear in the kernel boot log since the board specifies GPIOs for matrix keypad and also SD Card write protect and card detect GPIOs in its device tree. For all these GPIOs there is an error message in the log. The issue is triggered by setting the irq type to IRQ_TYPE_NONE in the driver's irq_domain map function mpc8xxx_gpio_irq_map(). ... mpc8xxx_gpio_irq_map irq_set_irq_type __irq_set_trigger __irq_set_trigger() calls irq_set_type() callback of the mpc8xxx gpio irq chip with the IRQ_TYPE_NONE in its 'flags' argument. This callback is either mpc8xxx_irq_set_type() or mpc512x_irq_set_type(). Both these functions return -EINVAL in the case if IRQ_TYPE_NONE is passed in the flow_type argument. This return value triggers the observed error message in __irq_set_trigger(). Modifying these callbacks to not return an error in IRQ_TYPE_NONE case doesn't make any sense to me. The line setting IRQ_TYPE_NONE type has been originally added by commit 345e5c8a "powerpc: Add interrupt support to mpc8xxx_gpio". At this time set_irq_type() checked its type argument and returned 0 if the type argument didn't specify any meaningful type in its type sense bits (and thus was equal to IRQ_TYPE_NONE). Effectively this line was a nop and I wonder what was the point of adding it. Remove IRQ_TYPE_NONE setting in the irq_domain mapping function. Signed-off-by: Anatolij Gustschin <agust@denx.de> Acked-by: Peter Korsgaard <jacmet@sunsite.dk> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-02-04gpiolib: remove gpiochip_reserve()Alexandre Courbot
gpiochip_reserve() has no user and stands in the way of the removal of the static gpio_desc[] array. Remove this function as well as the now unneeded RESERVED flag of struct gpio_desc. Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-29gpio: mxs: Add IRQ_TYPE_EDGE_BOTH supportGwenhael Goavec-Merou
This patch adds support for IRQ_TYPE_EDGE_BOTH needed for some driver (gpio-keys). Inspired from gpio-mxc.c Acked-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Gwenhael Goavec-Merou <gwenhael.goavec-merou@armadeus.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-29gpiolib-acpi: Add ACPI5 event model support to gpio.Mathias Nyman
Add ability to handle ACPI events signalled by GPIO interrupts. ACPI5 platforms can use GPIO signaled ACPI events. These GPIO interrupts are handled by ACPI event methods which need to be called from the GPIO controller's interrupt handler. acpi_gpio_request_interrupt() finds out which gpio pins have acpi event methods and assigns interrupt handlers that calls the acpi event methods for those pins. Partially based on work by Rui Zhang <rui.zhang@intel.com> Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-28gpio: pca953x: use managed resourcesLinus Walleij
Using the devm_* managed resources the pca driver can be simplified and cut down on boilerplate code. [gcl: fixed a inccorect reference to a removed label, "goto fail_out" became "return ret"] Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-28gpio: pca953x: use simple irqdomainGregory CLEMENT
This switches the legacy irqdomain to the simple one, which will auto-allocate descriptors, and also make sure that we use irq_create_mapping() in the to_irq function. Also use the map function of irq_domain_ops to setup the irq configuration on demand and no more statically during the initialization of the driver. Based on a initial patch from Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-25gpio: pxa: set initcall level to module initHaojian Zhuang
gpio & pinctrl driver are used together. The pinctrl driver is already launched before gpio driver in Makefile. So set gpio driver to module init level. Otherwise, the sequence will be inverted. Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-25gpio: pca953x: add support for pca9505Gregory CLEMENT
Now that pca953x driver can handle GPIO expanders with more than 32 bits this patch adds the support for the pca9505 which cam with 40 GPIOs. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-25gpio: pca953x: make the register access by GPIO bankGregory CLEMENT
Until now the pca953x driver accessed all the bank of a given register in a single command using only a 32 bits variable. New expanders from the pca53x family come with 40 GPIOs which no more fit in a 32 variable. This patch make access to the registers more generic by relying on an array of u8 variables. This fits exactly the way the registers are represented in the hardware. It also adds helpers to access to a single register of a bank instead of reading or writing all the banks for a given register. Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Maxime Ripard <maxime.ripard@free-electrons.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-22gpio: pl061: set initcall level to module initHaojian Zhuang
Replace subsys initcall by module initcall level. Since pinctrl driver is already launched before gpio driver. It's unnecessary to set gpio driver in subsys init call level. Signed-off-by: Haojian Zhuang <haojian.zhuang@linaro.org> Tested-by: Dinh Nguyen <dinguyen@altera.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-22gpio: devm_gpio_* support should not depend on GPIOLIBShawn Guo
Some architectures (e.g. blackfin) provide gpio API without requiring GPIOLIB support (ARCH_WANT_OPTIONAL_GPIOLIB). devm_gpio_* functions should also work for these architectures, since they do not really depend on GPIOLIB. Add a new option GPIO_DEVRES (enabled by default) to control the build of devres.c. It also removes the empty version of devm_gpio_* functions for !GENERIC_GPIO build from linux/gpio.h, and moves the function declarations from asm-generic/gpio.h into linux/gpio.h. Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-17gpio: twl4030: Cache the direction and output states in private dataPeter Ujfalusi
Use more coherent locking in the driver. Use bitfield to store the GPIO direction and if the pin is configured as output store the status also in a bitfiled. In this way we can just look at these bitfields when we need information about the pin status and only reach out to the chip when it is needed. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-17gpio: twl4030: Introduce private structure to store variables needed runtimePeter Ujfalusi
Move most of the global variables inside a private structure and allocate it dynamically. Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2013-01-10gpio: vt8500: Export dedicated GPIO before multifunction pins.Tony Prisk
The vendor does not provide numbering for gpio pins. Vendor source exports dedicated gpio pins first, followed by multifunction pins. As this is what end users expect, this patch changes vt8500 and wm8505 to do the same. Signed-off-by: Tony Prisk <linux@prisktech.co.nz> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2012-12-21Merge git://www.linux-watchdog.org/linux-watchdogLinus Torvalds
Pull watchdog updates from Wim Van Sebroeck: "This includes some fixes and code improvements (like clk_prepare_enable and clk_disable_unprepare), conversion from the omap_wdt and twl4030_wdt drivers to the watchdog framework, addition of the SB8x0 chipset support and the DA9055 Watchdog driver and some OF support for the davinci_wdt driver." * git://www.linux-watchdog.org/linux-watchdog: (22 commits) watchdog: mei: avoid oops in watchdog unregister code path watchdog: Orion: Fix possible null-deference in orion_wdt_probe watchdog: sp5100_tco: Add SB8x0 chipset support watchdog: davinci_wdt: add OF support watchdog: da9052: Fix invalid free of devm_ allocated data watchdog: twl4030_wdt: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER watchdog: remove depends on CONFIG_EXPERIMENTAL watchdog: Convert dev_printk(KERN_<LEVEL> to dev_<level>( watchdog: DA9055 Watchdog driver watchdog: omap_wdt: eliminate goto watchdog: omap_wdt: delete redundant platform_set_drvdata() calls watchdog: omap_wdt: convert to devm_ functions watchdog: omap_wdt: convert to new watchdog core watchdog: WatchDog Timer Driver Core: fix comment watchdog: s3c2410_wdt: use clk_prepare_enable and clk_disable_unprepare watchdog: imx2_wdt: Select the driver via ARCH_MXC watchdog: cpu5wdt.c: add missing del_timer call watchdog: hpwdt.c: Increase version string watchdog: Convert twl4030_wdt to watchdog core davinci_wdt: preparation for switch to common clock framework ...
2012-12-21Merge tag 'dm-3.8-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm Pull dm update from Alasdair G Kergon: "Miscellaneous device-mapper fixes, cleanups and performance improvements. Of particular note: - Disable broken WRITE SAME support in all targets except linear and striped. Use it when kcopyd is zeroing blocks. - Remove several mempools from targets by moving the data into the bio's new front_pad area(which dm calls 'per_bio_data'). - Fix a race in thin provisioning if discards are misused. - Prevent userspace from interfering with the ioctl parameters and use kmalloc for the data buffer if it's small instead of vmalloc. - Throttle some annoying error messages when I/O fails." * tag 'dm-3.8-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm: (36 commits) dm stripe: add WRITE SAME support dm: remove map_info dm snapshot: do not use map_context dm thin: dont use map_context dm raid1: dont use map_context dm flakey: dont use map_context dm raid1: rename read_record to bio_record dm: move target request nr to dm_target_io dm snapshot: use per_bio_data dm verity: use per_bio_data dm raid1: use per_bio_data dm: introduce per_bio_data dm kcopyd: add WRITE SAME support to dm_kcopyd_zero dm linear: add WRITE SAME support dm: add WRITE SAME support dm: prepare to support WRITE SAME dm ioctl: use kmalloc if possible dm ioctl: remove PF_MEMALLOC dm persistent data: improve improve space map block alloc failure message dm thin: use DMERR_LIMIT for errors ...
2012-12-21Merge tag 'rdma-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull more infiniband changes from Roland Dreier: "Second batch of InfiniBand/RDMA changes for 3.8: - cxgb4 changes to fix lookup engine hash collisions - mlx4 changes to make flow steering usable - fix to IPoIB to avoid pinning dst reference for too long" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: RDMA/cxgb4: Fix bug for active and passive LE hash collision path RDMA/cxgb4: Fix LE hash collision bug for passive open connection RDMA/cxgb4: Fix LE hash collision bug for active open connection mlx4_core: Allow choosing flow steering mode mlx4_core: Adjustments to Flow Steering activation logic for SR-IOV mlx4_core: Fix error flow in the flow steering wrapper mlx4_core: Add QPN enforcement for flow steering rules set by VFs cxgb4: Add LE hash collision bug fix path in LLD driver cxgb4: Add T4 filter support IPoIB: Call skb_dst_drop() once skb is enqueued for sending
2012-12-21dm stripe: add WRITE SAME supportMike Snitzer
Rename stripe_map_discard to stripe_map_range and reuse it for WRITE SAME bio processing. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: remove map_infoMikulas Patocka
This patch removes map_info from bio-based device mapper targets. map_info is still used for request-based targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm snapshot: do not use map_contextMikulas Patocka
Eliminate struct map_info from dm-snap. map_info->ptr was used in dm-snap to indicate if the bio was tracked. If map_info->ptr was non-NULL, the bio was linked in tracked_chunk_hash. This patch removes the use of map_info->ptr. We determine if the bio was tracked based on hlist_unhashed(&c->node). If hlist_unhashed is true, the bio is not tracked, if it is false, the bio is tracked. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: dont use map_contextMikulas Patocka
This patch removes endio_hook_pool from dm-thin and uses per-bio data instead. This patch removes any use of map_info in preparation for the next patch that removes map_info from bio-based device mapper. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: dont use map_contextMikulas Patocka
Don't use map_info any more in dm-raid1. map_info was used for writes to hold the region number. For this purpose we add a new field dm_bio_details to dm_raid1_bio_record. map_info was used for reads to hold a pointer to dm_raid1_bio_record (if the pointer was non-NULL, bio details were saved; if the pointer was NULL, bio details were not saved). We use dm_raid1_bio_record.details->bi_bdev for this purpose. If bi_bdev is NULL, details were not saved, if bi_bdev is non-NULL, details were saved. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm flakey: dont use map_contextMikulas Patocka
Replace map_info with a per-bio structure "struct per_bio_data" in dm-flakey. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: rename read_record to bio_recordMikulas Patocka
Rename struct read_record to bio_record in dm-raid1. In the following patch, the structure will be used for both read and write bios, so rename it. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: move target request nr to dm_target_ioMikulas Patocka
This patch moves target_request_nr from map_info to dm_target_io and makes it accessible with dm_bio_get_target_request_nr. This patch is a preparation for the next patch that removes map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm snapshot: use per_bio_dataMikulas Patocka
Replace tracked_chunk_pool with per_bio_data in dm-snap. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm verity: use per_bio_dataMikulas Patocka
Replace io_mempool with per_bio_data in dm-verity. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: use per_bio_dataMikulas Patocka
Replace read_record_pool with per_bio_data in dm-raid1. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: introduce per_bio_dataMikulas Patocka
Introduce a field per_bio_data_size in struct dm_target. Targets can set this field in the constructor. If a target sets this field to a non-zero value, "per_bio_data_size" bytes of auxiliary data are allocated for each bio submitted to the target. These data can be used for any purpose by the target and help us improve performance by removing some per-target mempools. Per-bio data is accessed with dm_per_bio_data. The argument data_size must be the same as the value per_bio_data_size in dm_target. If the target has a pointer to per_bio_data, it can get a pointer to the bio with dm_bio_from_per_bio_data() function (data_size must be the same as the value passed to dm_per_bio_data). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm kcopyd: add WRITE SAME support to dm_kcopyd_zeroMike Snitzer
Add WRITE SAME support to dm-io and make it accessible to dm_kcopyd_zero(). dm_kcopyd_zero() provides an asynchronous interface whereas the blkdev_issue_write_same() interface is synchronous. WRITE SAME is a SCSI command that can be leveraged for more efficient zeroing of a specified logical extent of a device which supports it. Only a single zeroed logical block is transfered to the target for each WRITE SAME and the target then writes that same block across the specified extent. The dm thin target uses this. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm linear: add WRITE SAME supportMike Snitzer
The linear target can already support WRITE SAME requests so signal this by setting num_write_same_requests to 1. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: add WRITE SAME supportMike Snitzer
WRITE SAME bios have a payload that contain a single page. When cloning WRITE SAME bios DM has no need to modify the bi_io_vec attributes (and doing so would be detrimental). DM need only alter the start and end of the WRITE SAME bio accordingly. Rather than duplicate __clone_and_map_discard, factor out a common function that is also used by __clone_and_map_write_same. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm: prepare to support WRITE SAMEMike Snitzer
Allow targets to opt in to WRITE SAME support by setting 'num_write_same_requests' in the dm_target structure. A dm device will only advertise WRITE SAME support if all its targets and all its underlying devices support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm ioctl: use kmalloc if possibleMikulas Patocka
If the parameter buffer is small enough, try to allocate it with kmalloc() rather than vmalloc(). vmalloc is noticeably slower than kmalloc because it has to manipulate page tables. In my tests, on PA-RISC this patch speeds up activation 13 times. On Opteron this patch speeds up activation by 5%. This patch introduces a new function free_params() to free the parameters and this uses new flags that record whether or not vmalloc() was used and whether or not the input buffer must be wiped after use. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm ioctl: remove PF_MEMALLOCMikulas Patocka
When allocating memory for the userspace ioctl data, set some appropriate GPF flags directly instead of using PF_MEMALLOC. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm persistent data: improve improve space map block alloc failure messageJoe Thornber
Improve space map error message when unable to allocate a new metadata block. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: use DMERR_LIMIT for errorsMike Snitzer
Throttle all errors logged from the IO path by dm thin. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm persistent data: use DMERR_LIMIT for errorsMike Snitzer
Nearly all of persistent-data is in the IO path so throttle error messages with DMERR_LIMIT to limit the amount logged when something has gone wrong. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm block manager: reinstate message when validator failsMike Snitzer
Reinstate a useful error message when the block manager buffer validator fails. This was mistakenly eliminated when the block manager was converted to use dm-bufio. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid: round region_size to power of twoJonathan Brassow
If the user does not supply a bitmap region_size to the dm raid target, a reasonable size is computed automatically. If this is not a power of 2, the md code will report an error later. This patch catches the problem early and rounds the region_size to the next power of two. Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: cleanup dead codeJoe Thornber
Remove unused @data_block parameter from cell_defer. Change thin_bio_map to use many returns rather than setting a variable. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: rename cell_defer_except to cell_defer_no_holderJoe Thornber
Rename cell_defer_except() to cell_defer_no_holder() which describes its function more clearly. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm snapshot: optimize track_chunkMikulas Patocka
track_chunk is always called with interrupts enabled. Consequently, we do not need to save and restore interrupt state in "flags" variable. This patch changes spin_lock_irqsave to spin_lock_irq and spin_unlock_irqrestore to spin_unlock_irq. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid: use DM_ENDIO_INCOMPLETEMikulas Patocka
Use a defined macro DM_ENDIO_INCOMPLETE instead of a numeric constant. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm raid1: remove impossible mempool_alloc error testMikulas Patocka
mempool_alloc can't fail if __GFP_WAIT is specified, so the condition that tests if read_record is non-NULL is always true. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: emit ignore_discard in status when discards disabledMike Snitzer
If "ignore_discard" is specified when creating the thin pool device then discard support is disabled for that device. The pool device's status should reflect this fact rather than stating "no_discard_passdown" (which implies discards are enabled but passdown is disabled). Reported-by: Zdenek Kabelac <zkabelac@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm persistent data: fix nested btree deletionJoe Thornber
When deleting nested btrees, the code forgets to delete the innermost btree. The thin-metadata code serendipitously compensates for this by claiming there is one extra layer in the tree. This patch corrects both problems. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: wake worker when discard is preparedJoe Thornber
When discards are prepared it is best to directly wake the worker that will process them. The worker will be woken anyway, via periodic commit, but there is no reason to not wake_worker here. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2012-12-21dm thin: fix race between simultaneous io and discards to same blockJoe Thornber
There is a race when discard bios and non-discard bios are issued simultaneously to the same block. Discard support is expensive for all thin devices precisely because you have to be careful to quiesce the area you're discarding. DM thin must handle this conflicting IO pattern (simultaneous non-discard vs discard) even though a sane application shouldn't be issuing such IO. The race manifests as follows: 1. A non-discard bio is mapped in thin_bio_map. This doesn't lock out parallel activity to the same block. 2. A discard bio is issued to the same block as the non-discard bio. 3. The discard bio is locked in a dm_bio_prison_cell in process_discard to lock out parallel activity against the same block. 4. The non-discard bio's mapping continues and its all_io_entry is incremented so the bio is accounted for in the thin pool's all_io_ds which is a dm_deferred_set used to track time locality of non-discard IO. 5. The non-discard bio is finally locked in a dm_bio_prison_cell in process_bio. The race can result in deadlock, leaving the block layer hanging waiting for completion of a discard bio that never completes, e.g.: INFO: task ruby:15354 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ruby D ffffffff8160f0e0 0 15354 15314 0x00000000 ffff8802fb08bc58 0000000000000082 ffff8802fb08bfd8 0000000000012900 ffff8802fb08a010 0000000000012900 0000000000012900 0000000000012900 ffff8802fb08bfd8 0000000000012900 ffff8803324b9480 ffff88032c6f14c0 Call Trace: [<ffffffff814e5a19>] schedule+0x29/0x70 [<ffffffff814e3d85>] schedule_timeout+0x195/0x220 [<ffffffffa06b9bc1>] ? _dm_request+0x111/0x160 [dm_mod] [<ffffffff814e589e>] wait_for_common+0x11e/0x190 [<ffffffff8107a170>] ? try_to_wake_up+0x2b0/0x2b0 [<ffffffff814e59ed>] wait_for_completion+0x1d/0x20 [<ffffffff81233289>] blkdev_issue_discard+0x219/0x260 [<ffffffff81233e79>] blkdev_ioctl+0x6e9/0x7b0 [<ffffffff8119a65c>] block_ioctl+0x3c/0x40 [<ffffffff8117539c>] do_vfs_ioctl+0x8c/0x340 [<ffffffff8119a547>] ? block_llseek+0x67/0xb0 [<ffffffff811756f1>] sys_ioctl+0xa1/0xb0 [<ffffffff810561f6>] ? sys_rt_sigprocmask+0x86/0xd0 [<ffffffff814ef099>] system_call_fastpath+0x16/0x1b The thinp-test-suite's test_discard_random_sectors reliably hits this deadlock on fast SSD storage. The fix for this race is that the all_io_entry for a bio must be incremented whilst the dm_bio_prison_cell is held for the bio's associated virtual and physical blocks. That cell locking wasn't occurring early enough in thin_bio_map. This patch fixes this. Care is taken to always call the new function inc_all_io_entry() with the relevant cells locked, but they are generally unlocked before calling issue() to try to avoid holding the cells locked across generic_submit_request. Also, now that thin_bio_map may lock bios in a cell, process_bio() is no longer the only thread that will do so. Because of this we must be sure to use cell_defer_except() to release all non-holder entries, that were added by the other thread, because they must be deferred. This patch depends on "dm thin: replace dm_cell_release_singleton with cell_defer_except". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Cc: stable@vger.kernel.org
2012-12-21dm thin: replace dm_cell_release_singleton with cell_defer_exceptJoe Thornber
Change existing users of the function dm_cell_release_singleton to share cell_defer_except instead, and then remove the now-unused function. Everywhere that calls dm_cell_release_singleton, the bio in question is the holder of the cell. If there are no non-holder entries in the cell then cell_defer_except behaves exactly like dm_cell_release_singleton. Conversely, if there *are* non-holder entries then dm_cell_release_singleton must not be used because those entries would need to be deferred. Consequently, it is safe to replace use of dm_cell_release_singleton with cell_defer_except. This patch is a pre-requisite for "dm thin: fix race between simultaneous io and discards to same block". Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Alasdair G Kergon <agk@redhat.com>