summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-07-22workqueue: remove the stale comment in pwq_unbound_release_workfn()Lai Jiangshan
In 75ccf5950f82 ("workqueue: prepare flush_workqueue() for dynamic creation and destrucion of unbound pool_workqueues"), a comment about the synchronization for the pwq in pwq_unbound_release_workfn() was added. The comment claimed the flush_mutex wasn't strictly necessary, it was correct in that time, due to the pwq was protected by workqueue_lock. But it is incorrect now since the wq->flush_mutex was renamed to wq->mutex and workqueue_lock was removed, the wq->mutex is strictly needed. But the comment was miss-updated when the synchronization was changed. This patch removes the incorrect comments and doesn't add any new comment to explain why wq->mutex is needed here, which is definitely obvious and wq->pwqs_node has "WQ" notation in its definition which is better comment. The old commit mentioned above also introduced a comment in link_pwq() about the synchronization. This comment is also removed in this patch since the whole link_pwq() is proteced by wq->mutex. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22workqueue: move rescuer pool detachment to the endLai Jiangshan
In 51697d393922 ("workqueue: use generic attach/detach routine for rescuers"), The rescuer detaches itself from the pool before put_pwq() so that the put_unbound_pool() will not destroy the rescuer-attached pool. It is unnecessary. worker_detach_from_pool() can be used as the last statement to access to the pool just like the regular workers, put_unbound_pool() will wait for it to detach and then free the pool. So we move the worker_detach_from_pool() down, make it coincide with the regular workers. tj: Minor description update. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22workqueue: unfold start_worker() into create_worker()Lai Jiangshan
Simply unfold the code of start_worker() into create_worker() and remove the original start_worker() and create_and_start_worker(). The only trade-off is the introduced overhead that the pool->lock is released and regrabbed after the newly worker is started. The overhead is acceptible since the manager is slow path. And because this new locking behavior, the newly created worker may grab the lock earlier than the manager and go to process work items. In this case, the recheck need_to_create_worker() may be true as expected and the manager goes to restart which is the correct behavior. tj: Minor updates to description and comments. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22workqueue: remove @wakeup from worker_set_flags()Lai Jiangshan
worker_set_flags() has only two callers, each specifying %true and %false for @wakeup. Let's push the wake up to the caller and remove @wakeup from worker_set_flags(). The caller can use the following instead if wakeup is necessary: worker_set_flags(); if (need_more_worker(pool)) wake_up_worker(pool); This makes the code simpler. This patch doesn't introduce behavior changes. tj: Updated description and comments. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-22workqueue: remove an unneeded UNBOUND test before waking up the next workerLai Jiangshan
In process_one_work(): if ((worker->flags & WORKER_UNBOUND) && need_more_worker(pool)) wake_up_worker(pool); the first test is unneeded. Even if the first test is removed, it doesn't affect the wake-up logic for WORKER_UNBOUND, and it will not introduce any useless wake-ups for normal per-cpu workers since nr_running is always >= 1. It will introduce useless/redundant wake-ups for CPU_INTENSIVE, but this case is rare and the next patch will also remove this redundant wake-up. tj: Minor updates to the description and comment. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-18workqueue: wake regular worker if need_more_worker() when rescuer leave the poolLai Jiangshan
We don't need to wake up regular worker when nr_running==1, so need_more_worker() is sufficient here. And need_more_worker() gives us better readability due to the name of "keep_working()" implies the rescuer should keep working now but the rescuer is actually leaving. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-15workqueue: alloc struct worker on its local nodeLai Jiangshan
When the create_worker() is called from non-manager, the struct worker is allocated from the node of the caller which may be different from the node of pool->node. So we add a node ID argument for the alloc_worker() to ensure the struct worker is allocated from the preferable node. tj: @nid renamed to @node for consistency. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-11workqueue: reuse the already calculated pwq in try_to_grab_pending()Lai Jiangshan
try_to_grab_pending() was re-calculating the associated pwq using get_work_pwq() when it already has it cached in a local varible and the association can't change. Reuse the local variable instead. This doesn't introduce any functional changes. tj: Updated description. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-01workqueue: stronger test in process_one_work()Lai Jiangshan
When POOL_DISASSOCIATED is cleared, the running worker's local CPU should be the same as pool->cpu without any exception even during cpu-hotplug. This patch changes "(proposition_A && proposition_B && proposition_C)" to "(proposition_B && proposition_C)", so if the old compound proposition is true, the new one must be true too. so this won't hide any possible bug which can be hit by old test. tj: Minor description update and dropped the obvious comment. CC: Jason J. Herne <jjherne@linux.vnet.ibm.com> CC: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-07-01workqueue: clear POOL_DISASSOCIATED in rebind_workers()Lai Jiangshan
a9ab775bcadf ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") moved pool locking into rebind_workers() but left "pool->flags &= ~POOL_DISASSOCIATED" in workqueue_cpu_up_callback(). There is nothing necessarily wrong with it, but there is no benefit either. Let's move it into rebind_workers() and achieve the following benefits: 1) better readability, POOL_DISASSOCIATED is cleared in rebind_workers() as expected. 2) we can guarantee that, when POOL_DISASSOCIATED is clear, the running workers of the pool are on the local CPU (pool->cpu). tj: Minor description update. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-19workqueue: sanity check pool->cpu in wq_worker_sleeping()Lai Jiangshan
In theory, pool->cpu is equals to @cpu in wq_worker_sleeping() after worker->flags is checked. And "pool->cpu != cpu" sanity check will help us if something wrong. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-19workqueue: clear leftover flags when detachedLai Jiangshan
When a worker is detached, the worker->flags may still have WORKER_UNBOUND or WORKER_REBOUND, it is OK for all cases: 1) if it is a normal worker, the worker will be dead, it is OK. 2) if it is a rescuer, it may re-attach to a pool with this leftover flag[s], it is still correct except it may cause unneeded wakeup. It is correct but not good, so we just remove the leftover flags. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-19workqueue: remove useless WARN_ON_ONCE()Lai Jiangshan
The @cpu is fetched via smp_processor_id() in this function, so the check is useless. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-19workqueue: use schedule_timeout_interruptible() instead of open codeLai Jiangshan
schedule_timeout_interruptible(CREATE_COOLDOWN) is exactly the same as the original code. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-19workqueue: remove the empty check in too_many_workers()Lai Jiangshan
The commit ea1abd6197d5 ("workqueue: reimplement idle worker rebinding") used a trick which simply removes all to-be-bound idle workers from the idle list and lets them add themselves back after completing rebinding. And this trick caused the @worker_pool->nr_idle may deviate than the actual number of idle workers on @worker_pool->idle_list. More specifically, nr_idle may be non-zero while ->idle_list is empty. All users of ->nr_idle and ->idle_list are audited. The only affected one is too_many_workers() which is updated to check %false if ->idle_list is empty regardless of ->nr_idle. The commit/trick was complicated due to it just tried to simplify an even more complicated problem (workers had to rebind itself). But the commit a9ab775bcadf ("workqueue: directly restore CPU affinity of workers from CPU_ONLINE") fixed all these problems and the mentioned trick was useless and is gone. So, now the @worker_pool->nr_idle is exactly the actual number of workers on @worker_pool->idle_list. too_many_workers() should recover as it was before the trick. So we remove the empty check. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-19workqueue: use "pool->cpu < 0" to stand for an unbound poolLai Jiangshan
There is a piece of sanity checks code in the put_unbound_pool(). The meaning of this code is "if it is not an unbound pool, it will complain and return" IIUC. But the code uses "pool->flags & POOL_DISASSOCIATED" imprecisely due to a non-unbound pool may also have this flags. We should use "pool->cpu < 0" to stand for an unbound pool, so we covert the code to it. There is no strictly wrong if we still keep "pool->flags & POOL_DISASSOCIATED" here, but it is just a noise if we keep it: 1) we focus on "unbound" here, not "[dis]association". 2) "pool->cpu < 0" already implies "pool->flags & POOL_DISASSOCIATED". Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2014-06-16epoll: fix use-after-free in eventpoll_release_fileKonstantin Khlebnikov
This fixes use-after-free of epi->fllink.next inside list loop macro. This loop actually releases elements in the body. The list is rcu-protected but here we cannot hold rcu_read_lock because we need to lock mutex inside. The obvious solution is to use list_for_each_entry_safe(). RCU-ness isn't essential because nobody can change this list under us, it's final fput for this file. The bug was introduced by ae10b2b4eb01 ("epoll: optimize EPOLL_CTL_DEL using rcu") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Stable <stable@vger.kernel.org> # 3.13+ Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-16Revert "offb: Add palette hack for little endian"Benjamin Herrenschmidt
This reverts commit e1edf18b20076da83dd231dbd2146cbbc31c0b14. This patch was a misguided attempt at fixing offb for LE ppc64 kernels on BE qemu but is just wrong ... it breaks real LE/LE setups, LE with real HW, and existing mixed endian systems that did the fight thing with the appropriate device-tree property. Bad reviewing on my part, sorry. The right fix is to either make qemu change its endian when the guest changes endian (working on that) or to use the existing foreign endian support. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.13+] ---
2014-06-15Linux 3.16-rc1Linus Torvalds
2014-06-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix checksumming regressions, from Tom Herbert. 2) Undo unintentional permissions changes for SCTP rto_alpha and rto_beta sysfs knobs, from Denial Borkmann. 3) VXLAN, like other IP tunnels, should advertize it's encapsulation size using dev->needed_headroom instead of dev->hard_header_len. From Cong Wang. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: net: sctp: fix permissions for rto_alpha and rto_beta knobs vxlan: Checksum fixes net: add skb_pop_rcv_encapsulation udp: call __skb_checksum_complete when doing full checksum net: Fix save software checksum complete net: Fix GSO constants to match NETIF flags udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookup vxlan: use dev->needed_headroom instead of dev->hard_header_len MAINTAINERS: update cxgb4 maintainer
2014-06-15Merge tag 'clk-for-linus-3.16-part2' of ↵Linus Torvalds
git://git.linaro.org/people/mike.turquette/linux Pull more clock framework updates from Mike Turquette: "This contains the second half the of the clk changes for 3.16. They are simply fixes and code refactoring for the OMAP clock drivers. The sunxi clock driver changes include splitting out the one mega-driver into several smaller pieces and adding support for the A31 SoC clocks" * tag 'clk-for-linus-3.16-part2' of git://git.linaro.org/people/mike.turquette/linux: (25 commits) clk: sunxi: document PRCM clock compatible strings clk: sunxi: add PRCM (Power/Reset/Clock Management) clks support clk: sun6i: Protect SDRAM gating bit clk: sun6i: Protect CPU clock clk: sunxi: Rework clock protection code clk: sunxi: Move the GMAC clock to a file of its own clk: sunxi: Move the 24M oscillator to a file of its own clk: sunxi: Remove calls to clk_put clk: sunxi: document new A31 USB clock compatible clk: sunxi: Implement A31 USB clock ARM: dts: OMAP5/DRA7: use omap5-mpu-dpll-clock capable of dealing with higher frequencies CLK: TI: dpll: support OMAP5 MPU DPLL that need special handling for higher frequencies ARM: OMAP5+: dpll: support Duty Cycle Correction(DCC) CLK: TI: clk-54xx: Set the rate for dpll_abe_m2x2_ck CLK: TI: Driver for DRA7 ATL (Audio Tracking Logic) dt:/bindings: DRA7 ATL (Audio Tracking Logic) clock bindings ARM: dts: dra7xx-clocks: Correct name for atl clkin3 clock CLK: TI: gate: add composite interface clock to OMAP2 only build ARM: OMAP2: clock: add DT boot support for cpufreq_ck CLK: TI: OMAP2: add clock init support ...
2014-06-15Merge git://git.infradead.org/users/willy/linux-nvmeLinus Torvalds
Pull NVMe update from Matthew Wilcox: "Mostly bugfixes again for the NVMe driver. I'd like to call out the exported tracepoint in the block layer; I believe Keith has cleared this with Jens. We've had a few reports from people who're really pounding on NVMe devices at scale, hence the timeout changes (and new module parameters), hotplug cpu deadlock, tracepoints, and minor performance tweaks" [ Jens hadn't seen that tracepoint thing, but is ok with it - it will end up going away when mq conversion happens ] * git://git.infradead.org/users/willy/linux-nvme: (22 commits) NVMe: Fix START_STOP_UNIT Scsi->NVMe translation. NVMe: Use Log Page constants in SCSI emulation NVMe: Define Log Page constants NVMe: Fix hot cpu notification dead lock NVMe: Rename io_timeout to nvme_io_timeout NVMe: Use last bytes of f/w rev SCSI Inquiry NVMe: Adhere to request queue block accounting enable/disable NVMe: Fix nvme get/put queue semantics NVMe: Delete NVME_GET_FEAT_TEMP_THRESH NVMe: Make admin timeout a module parameter NVMe: Make iod bio timeout a parameter NVMe: Prevent possible NULL pointer dereference NVMe: Fix the buffer size passed in GetLogPage(CDW10.NUMD) NVMe: Update data structures for NVMe 1.2 NVMe: Enable BUILD_BUG_ON checks NVMe: Update namespace and controller identify structures to the 1.1a spec NVMe: Flush with data support NVMe: Configure support for block flush NVMe: Add tracepoints NVMe: Protect against badly formatted CQEs ...
2014-06-15net: sctp: fix permissions for rto_alpha and rto_beta knobsDaniel Borkmann
Commit 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") has silently changed permissions for rto_alpha and rto_beta knobs from 0644 to 0444. The purpose of this was to discourage users from tweaking rto_alpha and rto_beta knobs in production environments since they are key to correctly compute rtt/srtt. RFC4960 under section 6.3.1. RTO Calculation says regarding rto_alpha and rto_beta under rule C3 and C4: [...] C3) When a new RTT measurement R' is made, set RTTVAR <- (1 - RTO.Beta) * RTTVAR + RTO.Beta * |SRTT - R'| and SRTT <- (1 - RTO.Alpha) * SRTT + RTO.Alpha * R' Note: The value of SRTT used in the update to RTTVAR is its value before updating SRTT itself using the second assignment. After the computation, update RTO <- SRTT + 4 * RTTVAR. C4) When data is in flight and when allowed by rule C5 below, a new RTT measurement MUST be made each round trip. Furthermore, new RTT measurements SHOULD be made no more than once per round trip for a given destination transport address. There are two reasons for this recommendation: First, it appears that measuring more frequently often does not in practice yield any significant benefit [ALLMAN99]; second, if measurements are made more often, then the values of RTO.Alpha and RTO.Beta in rule C3 above should be adjusted so that SRTT and RTTVAR still adjust to changes at roughly the same rate (in terms of how many round trips it takes them to reflect new values) as they would if making only one measurement per round-trip and using RTO.Alpha and RTO.Beta as given in rule C3. However, the exact nature of these adjustments remains a research issue. [...] While it is discouraged to adjust rto_alpha and rto_beta and not further specified how to adjust them, the RFC also doesn't explicitly forbid it, but rather gives a RECOMMENDED default value (rto_alpha=3, rto_beta=2). We have a couple of users relying on the old permissions before they got changed. That said, if someone really has the urge to adjust them, we could allow it with a warning in the log. Fixes: 3fd091e73b81 ("[SCTP]: Remove multiple levels of msecs to jiffies conversions.") Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15Merge branch 'csum_fixes'David S. Miller
Tom Herbert says: ==================== Fixes related to some recent checksum modifications. - Fix GSO constants to match NETIF flags - Fix logic in saving checksum complete in __skb_checksum_complete - Call __skb_checksum_complete from UDP if we are checksumming over whole packet in order to save checksum. - Fixes to VXLAN to work correctly with checksum complete ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15vxlan: Checksum fixesTom Herbert
Call skb_pop_rcv_encapsulation and postpull_rcsum for the Ethernet header to work properly with checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15net: add skb_pop_rcv_encapsulationTom Herbert
This function is used by UDP encapsulation protocols in RX when crossing encapsulation boundary. If ip_summed is set to CHECKSUM_UNNECESSARY and encapsulation is not set, change to CHECKSUM_NONE since the checksum has not been validated within the encapsulation. Clears csum_valid by the same rationale. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15udp: call __skb_checksum_complete when doing full checksumTom Herbert
In __udp_lib_checksum_complete check if checksum is being done over all the data (len is equal to skb->len) and if it is call __skb_checksum_complete instead of __skb_checksum_complete_head. This allows checksum to be saved in checksum complete. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15net: Fix save software checksum completeTom Herbert
Geert reported issues regarding checksum complete and UDP. The logic introduced in commit 7e3cead5172927732f51fde ("net: Save software checksum complete") is not correct. This patch: 1) Restores code in __skb_checksum_complete_header except for setting CHECKSUM_UNNECESSARY. This function may be calculating checksum on something less than skb->len. 2) Adds saving checksum to __skb_checksum_complete. The full packet checksum 0..skb->len is calculated without adding in pseudo header. This value is saved in skb->csum and then the pseudo header is added to that to derive the checksum for validation. 3) In both __skb_checksum_complete_header and __skb_checksum_complete, set skb->csum_valid to whether checksum of zero was computed. This allows skb_csum_unnecessary to return true without changing to CHECKSUM_UNNECESSARY which was done previously. 4) Copy new csum related bits in __copy_skb_header. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-15net: Fix GSO constants to match NETIF flagsTom Herbert
Joseph Gasparakis reported that VXLAN GSO offload stopped working with i40e device after recent UDP changes. The problem is that the SKB_GSO_* bits are out of sync with the corresponding NETIF flags. This patch fixes that. Also, we add BUILD_BUG_ONs in net_gso_ok for several GSO constants that were missing to avoid the problem in the future. Reported-by: Joseph Gasparakis <joseph.gasparakis@intel.com> Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-14Merge tag 'scsi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull more SCSI updates from James Bottomley: "This is just a couple of drivers (hpsa and lpfc) that got left out for further testing in linux-next. We also have one fix to a prior submission (qla2xxx sparse)" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (36 commits) qla2xxx: fix sparse warnings introduced by previous target mode t10-dif patch lpfc: Update lpfc version to driver version 10.2.8001.0 lpfc: Fix ExpressLane priority setup lpfc: mark old devices as obsolete lpfc: Fix for initializing RRQ bitmap lpfc: Fix for cleaning up stale ring flag and sp_queue_event entries lpfc: Update lpfc version to driver version 10.2.8000.0 lpfc: Update Copyright on changed files from 8.3.45 patches lpfc: Update Copyright on changed files lpfc: Fixed locking for scsi task management commands lpfc: Convert runtime references to old xlane cfg param to fof cfg param lpfc: Fix FW dump using sysfs lpfc: Fix SLI4 s abort loop to process all FCP rings and under ring_lock lpfc: Fixed kernel panic in lpfc_abort_handler lpfc: Fix locking for postbufq when freeing lpfc: Fix locking for lpfc_hba_down_post lpfc: Fix dynamic transitions of FirstBurst from on to off hpsa: fix handling of hpsa_volume_offline return value hpsa: return -ENOMEM not -1 on kzalloc failure in hpsa_get_device_id hpsa: remove messages about volume status VPD inquiry page not supported ...
2014-06-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull more btrfs updates from Chris Mason: "This has a few fixes since our last pull and a new ioctl for doing btree searches from userland. It's very similar to the existing ioctl, but lets us return larger items back down to the app" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: btrfs: fix error handling in create_pending_snapshot btrfs: fix use of uninit "ret" in end_extent_writepage() btrfs: free ulist in qgroup_shared_accounting() error path Btrfs: fix qgroups sanity test crash or hang btrfs: prevent RCU warning when dereferencing radix tree slot Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting btrfs: new ioctl TREE_SEARCH_V2 btrfs: tree_search, search_ioctl: direct copy to userspace btrfs: new function read_extent_buffer_to_user btrfs: tree_search, copy_to_sk: return needed size on EOVERFLOW btrfs: tree_search, copy_to_sk: return EOVERFLOW for too small buffer btrfs: tree_search, search_ioctl: accept varying buffer btrfs: tree_search: eliminate redundant nr_items check
2014-06-14Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds
Pull aio fix and cleanups from Ben LaHaise: "This consists of a couple of code cleanups plus a minor bug fix" * git://git.kvack.org/~bcrl/aio-next: aio: cleanup: flatten kill_ioctx() aio: report error from io_destroy() when threads race in io_destroy() fs/aio.c: Remove ctx parameter in kiocb_cancel
2014-06-14fix __swap_writepage() compile failure on old gcc versionsAl Viro
Tetsuo Handa wrote: "Commit 62a8067a7f35 ("bio_vec-backed iov_iter") introduced an unnamed union inside a struct which gcc-4.4.7 cannot handle. Name the unnamed union as u in order to fix build failure" Let's do this instead: there is only one place in the entire tree that steps into this breakage. Anon structs and unions work in older gcc versions; as the matter of fact, we have those in the tree - see e.g. struct ieee80211_tx_info in include/net/mac80211.h What doesn't work is handling their initializers: struct { int a; union { int b; char c; }; } x[2] = {{.a = 1, .c = 'a'}, {.a = 0, .b = 1}}; is the obvious syntax for initializer, perfectly fine for C11 and handled correctly by gcc-4.7 or later. Earlier versions, though, break on it - declaration is fine and so's access to fields (i.e. x[0].c = 'a'; would produce the right code), but members of the anon structs and unions are not inserted into the right namespace. Tellingly, those older versions will not barf on struct {int a; struct {int a;};}; - looks like they just have it hacked up somewhere around the handling of . and -> instead of doing the right thing. The easiest way to deal with that crap is to turn initialization of those fields (in the only place where we have such initializer of iov_iter) into plain assignment. Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-14Merge tag 'hsi-for-3.16-fixes1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi Pull HSI build fixes from Sebastian Reichel: - tighten dependency between ssi-protocol and omap-ssi to fix build failures with randconfig. - use normal module refcounting in omap driver to fix build with disabled module support * tag 'hsi-for-3.16-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-hsi: hsi: omap_ssi_port: use normal module refcounting HSI: fix omap ssi driver dependency
2014-06-14Merge tag 'gpio-v3.16-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO fix from Linus Walleij: "A first GPIO fix for the v3.16 series, this was serious since it blocks the OMAP boot. Sending you this vital fix before leaving for a short vacation so it does not sit collecting dust in my tree for no good reason. Apart from this, our v3.16 cycle looks like a good start" * tag 'gpio-v3.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: gpio: of: Fix handling for deferred probe for -gpio suffix
2014-06-14Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso fixes from Peter Anvin: "Fixes for x86/vdso. One is a simple build fix for bigendian hosts, one is to make "make vdso_install" work again, and the rest is about working around a bug in Google's Go language -- two are documentation patches that improves the sample code that the Go coders took, modified, and broke; the other two implements a workaround that keeps existing Go binaries from segfaulting at least" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Fix vdso_install x86/vdso: Hack to keep 64-bit Go programs working x86/vdso: Add PUT_LE to store little-endian values x86/vdso/doc: Make vDSO examples more portable x86/vdso/doc: Rename vdso_test.c to vdso_standalone_test_x86.c x86, vdso: Remove one final use of htole16()
2014-06-14Merge tag 'hwmon-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon updates from Guenter Roeck: - new driver for Sensirion SHTC1 humidity / temperature sensor - convert ltc4151 and vexpress drivers to use devm functions - drop generic chip detection from lm85 driver - avoid forward declarations in atxp1 driver - fix sign extensions in ina2xx driver * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: vexpress: Use devm helper for hwmon device registration hwmon: (atxp1) Avoid forward declaration hwmon: add support for Sensirion SHTC1 sensor hwmon: (ltc4151) Convert to devm_hwmon_device_register_with_groups hwmon: (lm85) Drop generic detection hwmon: (ina2xx) Cast to s16 on shunt and current regs
2014-06-13udp: ipv4: do not waste time in __udp4_lib_mcast_demux_lookupEric Dumazet
Its too easy to add thousand of UDP sockets on a particular bucket, and slow down an innocent multicast receiver. Early demux is supposed to be an optimization, we should avoid spending too much time in it. It is interesting to note __udp4_lib_demux_lookup() only tries to match first socket in the chain. 10 is the threshold we already have in __udp4_lib_lookup() to switch to secondary hash. Fixes: 421b3885bf6d5 ("udp: ipv4: Add udp early demux") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: David Held <drheld@google.com> Cc: Shawn Bohrer <sbohrer@rgmadvisors.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13vxlan: use dev->needed_headroom instead of dev->hard_header_lenCong Wang
When we mirror packets from a vxlan tunnel to other device, the mirror device should see the same packets (that is, without outer header). Because vxlan tunnel sets dev->hard_header_len, tcf_mirred() resets mac header back to outer mac, the mirror device actually sees packets with outer headers Vxlan tunnel should set dev->needed_headroom instead of dev->hard_header_len, like what other ip tunnels do. This fixes the above problem. Cc: "David S. Miller" <davem@davemloft.net> Cc: stephen hemminger <stephen@networkplumber.org> Cc: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: Cong Wang <cwang@twopensource.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13MAINTAINERS: update cxgb4 maintainerDimitris Michailidis
Hari's been doing the patch submissions for a while now and he'll be taking over as maintainer. Signed-off-by: Dimitris Michailidis <dm@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-13x86/vdso: Fix vdso_installAndy Lutomirski
"make vdso_install" installs unstripped versions of the vdso objects for the benefit of the debugger. This was broken by checkin: 6f121e548f83 x86, vdso: Reimplement vdso.so preparation in build-time C The filenames are different now, so update the Makefile to cope. This still installs the 64-bit vdso as vdso64.so. We believe this will be okay, as the only known user is a patched gdb which is known to use build-ids, but if it turns out to be a problem we may have to add a link. Inspired by a patch from Sam Ravnborg. Acked-by: Sam Ravnborg <sam@ravnborg.org> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Tested-by: Josh Boyer <jwboyer@fedoraproject.org> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/b10299edd8ba98d17e07dafcd895b8ecf4d99eff.1402586707.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-06-13NVMe: Fix START_STOP_UNIT Scsi->NVMe translation.Dan McLeran
This patch contains several fixes for Scsi START_STOP_UNIT. The previous code did not account for signed vs. unsigned arithmetic which resulted in an invalid lowest power state caculation when the device only supports 1 power state. The code for Power Condition == 2 (Idle) was not following the spec. The spec calls for setting the device to specific power states, depending upon Power Condition Modifier, without accounting for the number of power states supported by the device. The code for Power Condition == 3 (Standby) was using a hard-coded '0' which is replaced with the macro POWER_STATE_0. Signed-off-by: Dan McLeran <daniel.mcleran@intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2014-06-13btrfs: fix error handling in create_pending_snapshotEric Sandeen
fcebe456 cut and pasted some code to a later point in create_pending_snapshot(), but didn't switch to the appropriate error handling for this stage of the function. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-13btrfs: fix use of uninit "ret" in end_extent_writepage()Eric Sandeen
If this condition in end_extent_writepage() is false: if (tree->ops && tree->ops->writepage_end_io_hook) we will then test an uninitialized "ret" at: ret = ret < 0 ? ret : -EIO; The test for ret is for the case where ->writepage_end_io_hook failed, and we'd choose that ret as the error; but if there is no ->writepage_end_io_hook, nothing sets ret. Initializing ret to 0 should be sufficient; if writepage_end_io_hook wasn't set, (!uptodate) means non-zero err was passed in, so we choose -EIO in that case. Signed-of-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-13btrfs: free ulist in qgroup_shared_accounting() error pathEric Sandeen
If tmp = ulist_alloc(GFP_NOFS) fails, we return without freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS) and cause a memory leak. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-13Btrfs: fix qgroups sanity test crash or hangFilipe Manana
Often when running the qgroups sanity test, a crash or a hang happened. This is because the extent buffer the test uses for the root node doesn't have an header level explicitly set, making it have a random level value. This is a problem when it's not zero for the btrfs_search_slot() calls the test ends up doing, resulting in crashes or hangs such as the following: [ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on (...) [ 6454.127760] BTRFS: selftest: Running qgroup tests [ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup [ 6454.127966] BTRFS: selftest: Qgroup basic add [ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383] [ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs] [ 6480.152005] irq event stamp: 188448 [ 6480.152005] hardirqs last enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30 [ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80 [ 6480.152005] softirqs last enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450 [ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0 [ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4 [ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000 [ 6480.152005] RIP: 0010:[<ffffffff81349a63>] [<ffffffff81349a63>] __write_lock_failed+0x13/0x20 [ 6480.152005] RSP: 0018:ffff8800d0d038e8 EFLAGS: 00000287 [ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852 [ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8 [ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb [ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858 [ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0 [ 6480.152005] FS: 00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000 [ 6480.152005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0 [ 6480.152005] Stack: [ 6480.152005] ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8 [ 6480.152005] ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007 [ 6480.152005] ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16 [ 6480.152005] Call Trace: [ 6480.152005] [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0 [ 6480.152005] [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80 [ 6480.152005] [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs] [ 6480.152005] [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs] [ 6480.152005] [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs] [ 6480.152005] [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs] [ 6480.152005] [<ffffffff811a727f>] ? create_object+0x23f/0x300 [ 6480.152005] [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs] [ 6480.152005] [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs] [ 6480.152005] [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs] [ 6480.152005] [<ffffffff8108cad6>] ? local_clock+0x16/0x30 [ 6480.152005] [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs] [ 6480.152005] [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs] [ 6480.152005] [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs] [ 6480.152005] [<ffffffff81000352>] do_one_initcall+0x102/0x150 [ 6480.152005] [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50 [ 6480.152005] [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74 [ 6480.152005] [<ffffffff810d91cc>] load_module+0x1cdc/0x2630 (...) Therefore initialize the extent buffer as an empty leaf (level 0). Issue easy to reproduce when btrfs is built as a module via: $ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-13btrfs: prevent RCU warning when dereferencing radix tree slotSasha Levin
Mark the dereference as protected by lock. Not doing so triggers an RCU warning since the radix tree assumed that RCU is in use. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-13Btrfs: fix unfinished readahead thread for raid5/6 degraded mountingWang Shilong
Steps to reproduce: # mkfs.btrfs -f /dev/sd[b-f] -m raid5 -d raid5 # mkfs.ext4 /dev/sdc --->corrupt one of btrfs device # mount /dev/sdb /mnt -o degraded # btrfs scrub start -BRd /mnt This is because readahead would skip missing device, this is not true for RAID5/6, because REQ_GET_READ_MIRRORS return 1 for RAID5/6 block mapping. If expected data locates in missing device, readahead thread would not call __readahead_hook() which makes event @rc->elems=0 wait forever. Fix this problem by checking return value of btrfs_map_block(),we can only skip missing device safely if there are several mirrors. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
2014-06-13btrfs: new ioctl TREE_SEARCH_V2Gerhard Heift
This new ioctl call allows the user to supply a buffer of varying size in which a tree search can store its results. This is much more flexible if you want to receive items which are larger than the current fixed buffer of 3992 bytes or if you want to fetch more items at once. Items larger than this buffer are for example some of the type EXTENT_CSUM. Signed-off-by: Gerhard Heift <Gerhard@Heift.Name> Signed-off-by: Chris Mason <clm@fb.com> Acked-by: David Sterba <dsterba@suse.cz>
2014-06-13NVMe: Use Log Page constants in SCSI emulationMatthew Wilcox
The nvme-scsi file defined its own Log Page constant. Use the newly-defined one from the header file instead. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>