summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2015-11-01mm: get rid of 'vmalloc_info' from /proc/meminfoLinus Torvalds
It turns out that at least some versions of glibc end up reading /proc/meminfo at every single startup, because glibc wants to know the amount of memory the machine has. And while that's arguably insane, it's just how things are. And it turns out that it's not all that expensive most of the time, but the vmalloc information statistics (amount of virtual memory used in the vmalloc space, and the biggest remaining chunk) can be rather expensive to compute. The 'get_vmalloc_info()' function actually showed up on my profiles as 4% of the CPU usage of "make test" in the git source repository, because the git tests are lots of very short-lived shell-scripts etc. It turns out that apparently this same silly vmalloc info gathering shows up on the facebook servers too, according to Dave Jones. So it's not just "make test" for git. We had two patches to just cache the information (one by me, one by Ingo) to mitigate this issue, but the whole vmalloc information of of rather dubious value to begin with, and people who *actually* want to know what the situation is wrt the vmalloc area should just look at the much more complete /proc/vmallocinfo instead. In fact, according to my testing - and perhaps more importantly, according to that big search engine in the sky: Google - there is nothing out there that actually cares about those two expensive fields: VmallocUsed and VmallocChunk. So let's try to just remove them entirely. Actually, this just removes the computation and reports the numbers as zero for now, just to try to be minimally intrusive. If this breaks anything, we'll obviously have to re-introduce the code to compute this all and add the caching patches on top. But if given the option, I'd really prefer to just remove this bad idea entirely rather than add even more code to work around our historical mistake that likely nobody really cares about. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-01Merge branch 'fs-file-descriptor-optimization'Linus Torvalds
Merge file descriptor allocation speedup. Eric Dumazet has a test-case for a fairly common network deamon load pattern: openign and closing a lot of sockets that each have very little work done on them. It turns out that in that case, the cost of just finding the correct file descriptor number can be a dominating factor. We've long had a trivial optimization for allocating file descriptors sequentially, but that optimization ends up being not very effective when other file descriptors are being closed concurrently, and the fd patterns are not some simple FIFO pattern. In such cases we ended up spending a lot of time just scanning the bitmap of open file descriptors in order to find the next file descriptor number to open. This trivial patch-series mitigates that by simply introducing a second-level bitmap of which words in the first bitmap are already fully allocated. That cuts down the cost of scanning by an order of magnitude in some pathological (but realistic) cases. The second patch is an even more trivial patch to avoid unnecessarily dirtying the cacheline for the close-on-exec bit array that normally ends up being all empty. * fs-file-descriptor-optimization: vfs: conditionally clear close-on-exec flag vfs: Fix pathological performance case for __alloc_fd()
2015-11-01Linux 4.3Linus Torvalds
2015-11-01Merge branch 'libnvdimm-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull memremap fix from Dan Williams: "The new memremap() api introduced in the 4.3 cycle to unify/replace ioremap_cache() and ioremap_wt() is mishandling the highmem case. This patch has received a build success notification from a 0day-kbuild-robot run and has received an ack from Ard" From the commit message: "The impact of this bug is low for now since the pmem driver is the only user of memremap(), but this is important to fix before more conversions to memremap arrive in 4.4" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: memremap: fix highmem support
2015-11-01Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "This set of updates contains: - Another bugfix for the pathologic vm86 machinery. Clear thread.vm86 on fork to prevent corrupting the parent state. This comes along with an update to the vm86 selftest case - Fix another corner case in the ioapic setup code which causes a boot crash on some oddball systems - Fix the fallout from the dma allocation consolidation work, which leads to a NULL pointer dereference when the allocation code is called with a NULL device" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vm86: Set thread.vm86 to NULL on fork/clone selftests/x86: Add a fork() to entry_from_vm86 to catch fork bugs x86/ioapic: Prevent NULL pointer dereference in setup_ioapic_dest() x86/dma-mapping: Fix arch_dma_alloc_attrs() oops with NULL dev
2015-11-01Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "The last round of minimalistic fixes for clocksource drivers: - Prevent multiple shutdown of the sh_mtu2 clocksource - Annotate a bunch of clocksource/schedclock functions with notrace to prevent an annoying ftrace recursion issue" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: clocksource/drivers/sh_mtu2: Fix multiple shutdown call issue clocksource/drivers/digicolor: Prevent ftrace recursion clocksource/drivers/fsl_ftm_timer: Prevent ftrace recursion clocksource/drivers/vf_pit_timer: Prevent ftrace recursion clocksource/drivers/prima2: Prevent ftrace recursion clocksource/drivers/samsung_pwm_timer: Prevent ftrace recursion clocksource/drivers/pistachio: Prevent ftrace recursion clocksource/drivers/arm_global_timer: Prevent ftrace recursion
2015-11-01Merge branch 'irq-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "The last two one-liners for 4.3 from the irqchip space: - Regression fix for armada SoC which addresses the fallout from the set_irq_flags() cleanup - Add the missing propagation of the irq_set_type() callback to the parent interrupt controller of the tegra interrupt chip" * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/tegra: Propagate IRQ type setting to parent irqchip/armada-370-xp: Fix regression by clearing IRQ_NOAUTOEN
2015-10-31Merge tag 'armsoc-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC fixes from Olof Johansson: "This should be our final batch of fixes for 4.3: - A patch from Sudeep Holla that fixes annotation of wakeup sources properly, old unused format seems to have spread through copying. - Two patches from Tony for OMAP. One dealing with MUSB setup problems due to runtime PM being enabled too early on the parent device. The other fixes IRQ numbering for OMAP1" * tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: usb: musb: omap2430: Fix regression caused by driver core change ARM: OMAP1: fix incorrect INT_DMA_LCD ARM: dts: fix gpio-keys wakeup-source property
2015-10-31Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "This is three essential bug fixes for various SCSI parts. The only affected users are SCSI multi-path via device handler (basically all the enterprise) and mvsas users. The dh bugs are an async entanglement in boot resulting in a serious WARN_ON trip and a use after free on remove leading to a crash with strict memory accounting. The mvsas bug manifests as a null deref oops but only on abort sequences; however, these can commonly occur with SATA attached devices, hence the fix" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi_dh: don't try to load a device handler during async probing scsi_dh: fix use-after-free when removing scsi device mvsas: Fix NULL pointer dereference in mvs_slot_task_free
2015-10-31Merge tag 'md/4.3-rc7-fixes' of git://neil.brown.name/mdLinus Torvalds
Pull md bug fixes from Neil Brown: "Two more bug fixes for md. One bugfix for a list corruption in raid5 because of incorrect locking. Other for possible data corruption when a recovering device is failed, removed, and re-added. Both tagged for -stable" * tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md: Revert "md: allow a partially recovered device to be hot-added to an array." md/raid5: fix locking in handle_stripe_clean_event()
2015-10-31Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds
Pull drm fixes from Dave Airlie: "Two drm atomic core fixes. And two radeon patches needed to fix a backlight regression on some older hardware" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm: Correct arguments to list_tail_add in create blob ioctl drm: crtc: integer overflow in drm_property_create_blob() drm/radeon: fix dpms when driver backlight control is disabled drm/radeon: move bl encoder assignment into bl init
2015-10-31vfs: conditionally clear close-on-exec flagLinus Torvalds
We clear the close-on-exec flag when opening and closing files, and the bit was almost always already clear before. Avoid dirtying the cacheline if the clearning isn't necessary. That avoids unnecessary cacheline dirtying and bouncing in multi-socket environments. Eric Dumazet has a file descriptor benchmark that goes 4% faster from this on his two-socket machine. It's probably partly superlinear improvement due to getting slightly less spinlock contention on the file_lock spinlock due to less work in the critical section. Tested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-31vfs: Fix pathological performance case for __alloc_fd()Linus Torvalds
Al Viro points out that: > > * [Linux-specific aside] our __alloc_fd() can degrade quite badly > > with some use patterns. The cacheline pingpong in the bitmap is probably > > inevitable, unless we accept considerably heavier memory footprint, > > but we also have a case when alloc_fd() takes O(n) and it's _not_ hard > > to trigger - close(3);open(...); will have the next open() after that > > scanning the entire in-use bitmap. And Eric Dumazet has a somewhat realistic multithreaded microbenchmark that opens and closes a lot of sockets with minimal work per socket. This patch largely fixes it. We keep a 2nd-level bitmap of the open file bitmaps, showing which words are already full. So then we can traverse that second-level bitmap to efficiently skip already allocated file descriptors. On his benchmark, this improves performance by up to an order of magnitude, by avoiding the excessive open file bitmap scanning. Tested-and-acked-by: Eric Dumazet <edumazet@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-31Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fix from Sage Weil: "This sets the stable pages flag on the RBD block device when we have CRCs enabled. (This is necessary since the default assumption for block devices changed in 3.9)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: rbd: require stable pages if message data CRCs are enabled
2015-10-31Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs bug fixes from Miklos Szeredi: "This contains fixes for bugs that appeared in earlier kernels (all are marked for -stable)" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: free lower_mnt array in ovl_put_super ovl: free stack of paths in ovl_fill_super ovl: fix open in stacked overlay ovl: fix dentry reference leak ovl: use O_LARGEFILE in ovl_copy_up()
2015-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix two regressions in ipv6 route lookups, particularly wrt output interface specifications in the lookup key. From David Ahern. 2) Fix checks in ipv6 IPSEC tunnel pre-encap fragmentation, from Herbert Xu. 3) Fix mis-advertisement of 1000BASE-T on bcm63xx_enet, from Simon Arlott. 4) Some smsc phys misbehave with energy detect mode enabled, so add a DT property and disable it on such switches. From Heiko Schocher. 5) Fix TSO corruption on TX in mv643xx_eth, from Philipp Kirchhofer. 6) Fix regression added by removal of openvswitch vport stats, from James Morse. 7) Vendor Kconfig options should be bool, not tristate, from Andreas Schwab. 8) Use non-_BH() net stats bump in tcp_xmit_probe_skb(), otherwise we barf during TCP REPAIR operations. 9) Fix various bugs in openvswitch conntrack support, from Joe Stringer. 10) Fix NETLINK_LIST_MEMBERSHIPS locking, from David Herrmann. 11) Don't have VSOCK do sock_put() in interrupt context, from Jorgen Hansen. 12) Fix skb_realloc_headroom() failures properly in ISDN, from Karsten Keil. 13) Add some device IDs to qmi_wwan, from Bjorn Mork. 14) Fix ovs egress tunnel information when using lwtunnel devices, from Pravin B Shelar. 15) Add missing NETIF_F_FRAGLIST to macvtab feature list, from Jason Wang. 16) Fix incorrect handling of throw routes when the result of the throw cannot find a match, from Xin Long. 17) Protect ipv6 MTU calculations from wrap-around, from Hannes Frederic Sowa. 18) Fix failed autonegotiation on KSZ9031 micrel PHYs, from Nathan Sullivan. 19) Add missing memory barries in descriptor accesses or xgbe driver, from Thomas Lendacky. 20) Fix release conditon test in pppoe_release(), from Guillaume Nault. 21) Fix gianfar bugs wrt filter configuration, from Claudiu Manoil. 22) Fix violations of RX buffer alignment in sh_eth driver, from Sergei Shtylyov. 23) Fixing missing of_node_put() calls in various places around the networking, from Julia Lawall. 24) Fix incorrect leaf now walking in ipv4 routing tree, from Alexander Duyck. 25) RDS doesn't check pskb_pull()/pskb_trim() return values, from Sowmini Varadhan. 26) Fix VLAN configuration in mlx4 driver, from Jack Morgenstein. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits) ipv6: protect mtu calculation of wrap-around and infinite loop by rounding issues Revert "Merge branch 'ipv6-overflow-arith'" net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present vhost: fix performance on LE hosts bpf: sample: define aarch64 specific registers amd-xgbe: Fix race between access of desc and desc index RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in rds_tcp_data_recv forcedeth: fix unilateral interrupt disabling in netpoll path openvswitch: Fix skb leak using IPv6 defrag ipv6: Export nf_ct_frag6_consume_orig() openvswitch: Fix double-free on ip_defrag() errors fib_trie: leaf_walk_rcu should not compute key if key is less than pn->key net: mv643xx_eth: add missing of_node_put ath6kl: add missing of_node_put net: phy: mdio: add missing of_node_put netdev/phy: add missing of_node_put net: netcp: add missing of_node_put net: thunderx: add missing of_node_put ipv6: gre: support SIT encapsulation ...
2015-10-31x86/vm86: Set thread.vm86 to NULL on fork/cloneAndy Lutomirski
thread.vm86 points to per-task information -- the pointer should not be copied on clone. Fixes: d4ce0f26c790 ("x86/vm86: Move fields from 'struct kernel_vm86_struct' to 'struct vm86'") Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Stas Sergeev <stsp@list.ru> Link: http://lkml.kernel.org/r/71c5d6985d70ec8197c8d72f003823c81b7dcf99.1446270067.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-31selftests/x86: Add a fork() to entry_from_vm86 to catch fork bugsAndy Lutomirski
Mere possession of vm86 state is strange. Make sure that nothing gets corrupted if we fork after calling vm86(). Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Stas Sergeev <stsp@list.ru> Link: http://lkml.kernel.org/r/08f83295460a80e41dc5e3e81ec40d6844d316f5.1446270067.git.luto@kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-10-30Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input layer fixes from Dmitry Torokhov: - a change to the ALPS driver where we had limit the quirk for trackstick handling from being active on all Dells to just a few models - a fix for a build dependency issue in the sur40 driver - a small clock handling fixup in the LPC32xx touchscreen driver * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits Input: sur40 - add dependency on VIDEO_V4L2 Input: lpc32xx_ts - fix warnings caused by enabling unprepared clock
2015-10-30Merge tag 'pci-v4.3-fixes-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Sorry for this last-minute update; it's been in -next for quite a while, but I forgot about it until I started getting ready for the merge window. It's small and fixes a way a user could cause a panic via sysfs, so I think it's worth getting it in v4.3. NUMA: - Prevent out of bounds access in sysfs numa_node override (Sasha Levin)" * tag 'pci-v4.3-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: PCI: Prevent out of bounds access in numa_node override
2015-10-31Merge tag 'omap-for-v4.3/fixes-rc7' of ↵Olof Johansson
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into fixes Two omap regression fixes: - Fix omap3 MUSB with DMA caused by driver core changes - Fix LCD DMA interrupt number for omap1 that did not get changed for sparse IRQ changes * tag 'omap-for-v4.3/fixes-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: usb: musb: omap2430: Fix regression caused by driver core change ARM: OMAP1: fix incorrect INT_DMA_LCD Signed-off-by: Olof Johansson <olof@lixom.net>
2015-10-31drm: Correct arguments to list_tail_add in create blob ioctlManeet Singh
Arguments passed to list_add_tail were reversed resulting in deletion of old blob property everytime the new one is added. Fixes commit e2f5d2ea479b9b2619965d43db70939589afe43a Author: Daniel Stone <daniels@collabora.com> Date: Fri May 22 13:34:51 2015 +0100 drm/mode: Add user blob-creation ioctl Signed-off-by: Maneet Singh <mmaneetsingh@nvidia.com> [seanpaul tweaked commit subject a little] Signed-off-by: Sean Paul <seanpaul@chromium.org> Cc: stable@kernel.org # v4.2 Reviewed-by: Daniel Stone <daniels@collabora.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-10-31Revert "md: allow a partially recovered device to be hot-added to an array."NeilBrown
This reverts commit 7eb418851f3278de67126ea0c427641ab4792c57. This commit is poorly justified, I can find not discusison in email, and it clearly causes a problem. If a device which is being recovered fails and is subsequently re-added to an array, there could easily have been changes to the array *before* the point where the recovery was up to. So the recovery must start again from the beginning. If a spare is being recovered and fails, then when it is re-added we really should do a bitmap-based recovery up to the recovery-offset, and then a full recovery from there. Before this reversion, we only did the "full recovery from there" which is not corect. After this reversion with will do a full recovery from the start, which is safer but not ideal. It will be left to a future patch to arrange the two different styles of recovery. Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com> Signed-off-by: NeilBrown <neilb@suse.com> Cc: stable@vger.kernel.org (3.14+) Fixes: 7eb418851f32 ("md: allow a partially recovered device to be hot-added to an array.")
2015-10-31drm: crtc: integer overflow in drm_property_create_blob()Dan Carpenter
The size here comes from the user via the ioctl, it is a number between 1-u32max so the addition here could overflow on 32 bit systems. Fixes: f453ba046074 ('DRM: add mode setting support') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Stone <daniels@collabora.com> Cc: stable@kernel.org # v4.2 Signed-off-by: Dave Airlie <airlied@gmail.com>
2015-10-30Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "Apologies for this being so late, but we've uncovered a few nasty issues on arm64 which didn't settle down until yesterday and the fixes all look suitable for 4.3. Of the four patches, three of them are Cc'd to stable, with the remaining patch fixing an issue that only took effect during the merge window. Summary: - Fix corruption in SWP emulation when STXR fails due to contention - Fix MMU re-initialisation when resuming from a low-power state - Fix stack unwinding code to match what ftrace expects - Fix relocation code in the EFI stub when DRAM base is not 2MB aligned" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64/efi: do not assume DRAM base is aligned to 2 MB Revert "ARM64: unwind: Fix PC calculation" arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmap arm64: compat: fix stxr failure case in SWP emulation
2015-10-30Merge tag 'please-pull-syscalls' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 kcmp syscall from Tony Luck: "Missed adding the kcmp() syscall a long time ago. Now it seems that it is essential to build systemd" * tag 'please-pull-syscalls' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: [IA64] Wire up kcmp syscall
2015-10-31md/raid5: fix locking in handle_stripe_clean_event()Roman Gushchin
After commit 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()") __find_stripe() is called under conf->hash_locks + hash. But handle_stripe_clean_event() calls remove_hash() under conf->device_lock. Under some cirscumstances the hash chain can be circuited, and we get an infinite loop with disabled interrupts and locked hash lock in __find_stripe(). This leads to hard lockup on multiple CPUs and following system crash. I was able to reproduce this behavior on raid6 over 6 ssd disks. The devices_handle_discard_safely option should be set to enable trim support. The following script was used: for i in `seq 1 32`; do dd if=/dev/zero of=large$i bs=10M count=100 & done neilb: original was against a 3.x kernel. I forward-ported to 4.3-rc. This verison is suitable for any kernel since Commit: 59fc630b8b5f ("RAID5: batch adjacent full stripe write") (v4.1+). I'll post a version for earlier kernels to stable. Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Fixes: 566c09c53455 ("raid5: relieve lock contention in get_active_stripe()") Signed-off-by: NeilBrown <neilb@suse.com> Cc: Shaohua Li <shli@kernel.org> Cc: <stable@vger.kernel.org> # 3.13 - 4.2
2015-10-30rbd: require stable pages if message data CRCs are enabledRonny Hegewald
rbd requires stable pages, as it performs a crc of the page data before they are send to the OSDs. But since kernel 3.9 (patch 1d1d1a767206fbe5d4c69493b7e6d2a8d08cc0a0 "mm: only enforce stable page writes if the backing device requires it") it is not assumed anymore that block devices require stable pages. This patch sets the necessary flag to get stable pages back for rbd. In a ceph installation that provides multiple ext4 formatted rbd devices "bad crc" messages appeared regularly (ca 1 message every 1-2 minutes on every OSD that provided the data for the rbd) in the OSD-logs before this patch. After this patch this messages are pretty much gone (only ca 1-2 / month / OSD). Cc: stable@vger.kernel.org # 3.9+, needs backporting Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de> [idryomov@gmail.com: require stable pages only in crc case, changelog] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-10-30Merge branch 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes regression fix for backlight on old laptops. * 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux: drm/radeon: fix dpms when driver backlight control is disabled drm/radeon: move bl encoder assignment into bl init
2015-10-29arm64/efi: do not assume DRAM base is aligned to 2 MBArd Biesheuvel
The current arm64 Image relocation code in the UEFI stub assumes that the dram_base argument it receives is always a multiple of 2 MB. In reality, it is simply the lowest start address of all RAM entries in the UEFI memory map, which means it could be any multiple of 4 KB. Since the arm64 kernel Image needs to reside TEXT_OFFSET bytes beyond a 2 MB aligned base, or it will fail to boot, make sure we round dram_base to 2 MB before using it to calculate the relocation address. Fixes: e38457c361b30c5a ("arm64: efi: prefer AllocatePages() over efi_low_alloc() for vmlinux") Reported-by: Timur Tabi <timur@codeaurora.org> Tested-by: Timur Tabi <timur@codeaurora.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-29drm/radeon: fix dpms when driver backlight control is disabledAlex Deucher
If driver backlight control is disabled, either by driver parameter or default per-asic setting, revert to the old behavior. Fixes a regression in commit: 4281f46ef839050d2ef60348f661eb463c21cc2e Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2015-10-29drm/radeon: move bl encoder assignment into bl initAlex Deucher
So that the bl encoder will be null if the GPU does not control the backlight. Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2015-10-29ipv6: protect mtu calculation of wrap-around and infinite loop by rounding ↵Hannes Frederic Sowa
issues Raw sockets with hdrincl enabled can insert ipv6 extension headers right into the data stream. In case we need to fragment those packets, we reparse the options header to find the place where we can insert the fragment header. If the extension headers exceed the link's MTU we actually cannot make progress in such a case. Instead of ending up in broken arithmetic or rounding towards 0 and entering an endless loop in ip6_fragment, just prevent those cases by aborting early and signal -EMSGSIZE to user space. This is the second version of the patch which doesn't use the overflow_usub function, which got reverted for now. Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-29Revert "Merge branch 'ipv6-overflow-arith'"Hannes Frederic Sowa
Linus dislikes these changes. To not hold up the net-merge let's revert it for now and fix the bug like Linus suggested. This reverts commit ec3661b42257d9a06cf0d318175623ac7a660113, reversing changes made to c80dbe04612986fd6104b4a1be21681b113b5ac9. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-28[IA64] Wire up kcmp syscallÉmeric MASCHINO
systemd > 218 fails to compile on ia64 with: error: ‘__NR_kcmp’ undeclared [1]. I've been told that this is because the kcmp syscall hasn't been wired up for the ia64 arch [2]. The proposed patch thus wire up the kcmp syscall for the ia64 arch. [1] https://bugs.gentoo.org/show_bug.cgi?id=560492 [2] https://bugs.gentoo.org/show_bug.cgi?id=560492#c17 Signed-off-by: Émeric MASCHINO <emeric.maschino@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
2015-10-28usb: musb: omap2430: Fix regression caused by driver core changeTony Lindgren
Commit ddef08dd00f5 ("Driver core: wakeup the parent device before trying probe") started automatically ensuring the parent device is enabled when the child gets probed. This however caused a regression for MUSB omap2430 interface as the runtime PM for the parent device needs the child initialized to access the MUSB hardware registers. Let's delay the enabling of PM runtime for the parent until the child has been properly initialized as suggested in an earlier patch by Grygorii Strashko <grygorii.strashko@ti.com>. In addition to delaying pm_runtime_enable, we now also need to make sure the parent is enabled during omap2430_musb_init. We also want to propagate an error from omap2430_runtime_resume if struct musb is not initialized. Note that we use pm_runtime_put_noidle here for both the child and parent to prevent an extra runtime_suspend/resume cycle. Let's also add some comments to avoid confusion between the two different devices. Fixes: ddef08dd00f5 ("Driver core: wakeup the parent device before trying probe") Suggested-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Felipe Balbi <balbi@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com>
2015-10-28Revert "ARM64: unwind: Fix PC calculation"Will Deacon
This reverts commit e306dfd06fcb44d21c80acb8e5a88d55f3d1cf63. With this patch applied, we were the only architecture making this sort of adjustment to the PC calculation in the unwinder. This causes problems for ftrace, where the PC values are matched against the contents of the stack frames in the callchain and fail to match any records after the address adjustment. Whilst there has been some effort to change ftrace to workaround this, those patches are not yet ready for mainline and, since we're the odd architecture in this regard, let's just step in line with other architectures (like arch/arm/) for now. Cc: <stable@vger.kernel.org> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-28arm64: kernel: fix tcr_el1.t0sz restore on systems with extended idmapLorenzo Pieralisi
Commit dd006da21646 ("arm64: mm: increase VA range of identity map") introduced a mechanism to extend the virtual memory map range to support arm64 systems with system RAM located at very high offset, where the identity mapping used to enable/disable the MMU requires additional translation levels to map the physical memory at an equal virtual offset. The kernel detects at boot time the tcr_el1.t0sz value required by the identity mapping and sets-up the tcr_el1.t0sz register field accordingly, any time the identity map is required in the kernel (ie when enabling the MMU). After enabling the MMU, in the cold boot path the kernel resets the tcr_el1.t0sz to its default value (ie the actual configuration value for the system virtual address space) so that after enabling the MMU the memory space translated by ttbr0_el1 is restored as expected. Commit dd006da21646 ("arm64: mm: increase VA range of identity map") also added code to set-up the tcr_el1.t0sz value when the kernel resumes from low-power states with the MMU off through cpu_resume() in order to effectively use the identity mapping to enable the MMU but failed to add the code required to restore the tcr_el1.t0sz to its default value, when the core returns to the kernel with the MMU enabled, so that the kernel might end up running with tcr_el1.t0sz value set-up for the identity mapping which can be lower than the value required by the actual virtual address space, resulting in an erroneous set-up. This patchs adds code in the resume path that restores the tcr_el1.t0sz default value upon core resume, mirroring this way the cold boot path behaviour therefore fixing the issue. Cc: <stable@vger.kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: dd006da21646 ("arm64: mm: increase VA range of identity map") Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-28arm64: compat: fix stxr failure case in SWP emulationWill Deacon
If the STXR instruction fails in the SWP emulation code, we leave *data overwritten with the loaded value, therefore corrupting the data written by a subsequent, successful attempt. This patch re-jigs the code so that we only write back to *data once we know that the update has happened. Cc: <stable@vger.kernel.org> Fixes: bd35a4adc413 ("arm64: Port SWP/SWPB emulation support from arm") Reported-by: Shengjiu Wang <shengjiu.wang@freescale.com> Reported-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-10-28ARM: OMAP1: fix incorrect INT_DMA_LCDAaro Koskinen
Commit 685e2d08c54b ("ARM: OMAP1: Change interrupt numbering for sparse IRQ") turned on SPARSE_IRQ on OMAP1, but forgot to change the number of INT_DMA_LCD. This broke the boot at least on Nokia 770, where the device hangs during framebuffer initialization. Fix by defining INT_DMA_LCD like the other interrupts. Cc: stable@vger.kernel.org # v4.2+ Fixes: 685e2d08c54b ("ARM: OMAP1: Change interrupt numbering for sparse IRQ") Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi> Signed-off-by: Tony Lindgren <tony@atomide.com>
2015-10-28clocksource/drivers/sh_mtu2: Fix multiple shutdown call issueMagnus Damm
On the r7s72100 Genmai board the MTU2 driver currently triggers a common clock framework WARN_ON(enable_count) when disabling the clock due to the MTU2 driver after recent callback rework may call ->set_state_shutdown() multiple times. A similar issue was spotted for the TMU driver and fixed in: 452b132 clocksource/drivers/sh_tmu: Fix traceback spotted in -next On r7s72100 Genmai v4.3-rc7 built with shmobile_defconfig spits out the following during boot: sh_mtu2 fcff0000.timer: ch0: used for clock events ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1 at drivers/clk/clk.c:675 clk_core_disable+0x2c/0x6c() CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc7 #1 Hardware name: Generic R7S72100 (Flattened Device Tree) Backtrace: [<c00133d4>] (dump_backtrace) from [<c0013570>] (show_stack+0x18/0x1c) [<c0013558>] (show_stack) from [<c01c7aac>] (dump_stack+0x74/0x90) [<c01c7a38>] (dump_stack) from [<c00272fc>] (warn_slowpath_common+0x88/0xb4) [<c0027274>] (warn_slowpath_common) from [<c0027400>] (warn_slowpath_null+0x24/0x2c) [<c00273dc>] (warn_slowpath_null) from [<c03a9320>] (clk_core_disable+0x2c/0x6c) [<c03a92f4>] (clk_core_disable) from [<c03aa0a0>] (clk_disable+0x40/0x4c) [<c03aa060>] (clk_disable) from [<c0395d2c>] (sh_mtu2_disable+0x24/0x50) [<c0395d08>] (sh_mtu2_disable) from [<c0395d6c>] (sh_mtu2_clock_event_shutdown+0x14/0x1c) [<c0395d58>] (sh_mtu2_clock_event_shutdown) from [<c007d7d0>] (clockevents_switch_state+0xc8/0x114) [<c007d708>] (clockevents_switch_state) from [<c007d834>] (clockevents_shutdown+0x18/0x28) [<c007d81c>] (clockevents_shutdown) from [<c007dd58>] (clockevents_exchange_device+0x70/0x78) [<c007dce8>] (clockevents_exchange_device) from [<c007e578>] (tick_check_new_device+0x88/0xe0) [<c007e4f0>] (tick_check_new_device) from [<c007daf0>] (clockevents_register_device+0xac/0x120) [<c007da44>] (clockevents_register_device) from [<c0395be8>] (sh_mtu2_probe+0x230/0x350) [<c03959b8>] (sh_mtu2_probe) from [<c028b6f0>] (platform_drv_probe+0x50/0x98) Reported-by: Chris Brandt <chris.brandt@renesas.com> Fixes: 19a9ffb ("clockevents/drivers/sh_mtu2: Migrate to new 'set-state' interface") Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Signed-off-by: Magnus Damm <damm+renesas@opensource.se> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Reviewed-by: Viresh Kumar <viresh.kumar@linaro.org>
2015-10-28Merge tag 'powerpc-4.3-6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fix from Michael Ellerman: - powerpc/dma: dma_set_coherent_mask() should not be GPL only from Ben * tag 'powerpc-4.3-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/dma: dma_set_coherent_mask() should not be GPL only
2015-10-28powerpc/dma: dma_set_coherent_mask() should not be GPL onlyBenjamin Herrenschmidt
When turning this from inline to an exported function I was a bit over-eager and made it GPL only. This prevents the use of pretty much all non-GPL PCI driver which is a bit over the top. Let's bring it back in line with other architecture. Fixes: 817820b0226a ("powerpc/iommu: Support "hybrid" iommu/direct DMA ops for coherent_mask < dma_mask") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-27Merge branch 'mlx4-fixes'David S. Miller
Or Gerlitz says: ==================== Mellanox mlx4 driver fixes for 4.3-rc7 Jack's fix is for a regression introduced in 4.3-rc1 Carol's fix addresses an issue which exists for while and turns to beat us hard on PPC, please queue for -stable. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27net/mlx4: Copy/set only sizeof struct mlx4_eqe bytesCarol L Soto
When doing memcpy/memset of EQEs, we should use sizeof struct mlx4_eqe as the base size and not caps.eqe_size which could be bigger. If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt data in the master context. When using a 64 byte stride, the memcpy copied over 63 bytes to the slave_eq structure. This resulted in copying over the entire eqe of interest, including its ownership bit -- and also 31 bytes of garbage into the next WQE in the slave EQ -- which did NOT include the ownership bit (and therefore had no impact). However, once the stride is increased to 128, we are overwriting the ownership bits of *three* eqes in the slave_eq struct. This results in an incorrect ownership bit for those eqes, which causes the eq to seem to be full. The issue therefore surfaced only once 128-byte EQEs started being used in SRIOV and (overarchitectures that have 128/256 byte cache-lines such as PPC) - e.g after commit 77507aa249ae "net/mlx4_core: Enable CQE/EQE stride support". Fixes: 08ff32352d6f ('mlx4: 64-byte CQE/EQE support') Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is ↵Jack Morgenstein
present We do not set the ins_vlan field to zero when no vlan id is present in the packet. Since WQEs in the TX ring are not zeroed out between uses, this oversight could result in having vlan flags present in the WQE ctrl segment when no vlan is preset. Fixes: e38af4faf01d ('net/mlx4_en: Add support for hardware accelerated 802.1ad vlan') Reported-by: Gideon Naim <gideonn@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27vhost: fix performance on LE hostsMichael S. Tsirkin
commit 2751c9882b947292fcfb084c4f604e01724af804 ("vhost: cross-endian support for legacy devices") introduced a minor regression: even with cross-endian disabled, and even on LE host, vhost_is_little_endian is checking is_le flag so there's always a branch. To fix, simply check virtio_legacy_is_little_endian first. Cc: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27bpf: sample: define aarch64 specific registersYang Shi
Define aarch64 specific registers for building bpf samples correctly. Signed-off-by: Yang Shi <yang.shi@linaro.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27amd-xgbe: Fix race between access of desc and desc indexLendacky, Thomas
During Tx cleanup it's still possible for the descriptor data to be read ahead of the descriptor index. A memory barrier is required between the read of the descriptor index and the start of the Tx cleanup loop. This allows a change to a lighter-weight barrier in the Tx transmit routine just before updating the current descriptor index. Since the memory barrier does result in extra overhead on arm64, keep the previous change to not chase the current descriptor value. This prevents the execution of the barrier for each loop performed. Suggested-by: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-27RDS-TCP: Recover correctly from pskb_pull()/pksb_trim() failure in ↵Sowmini Varadhan
rds_tcp_data_recv Either of pskb_pull() or pskb_trim() may fail under low memory conditions. If rds_tcp_data_recv() ignores such failures, the application will receive corrupted data because the skb has not been correctly carved to the RDS datagram size. Avoid this by handling pskb_pull/pskb_trim failure in the same manner as the skb_clone failure: bail out of rds_tcp_data_recv(), and retry via the deferred call to rds_send_worker() that gets set up on ENOMEM from rds_tcp_read_sock() Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>