Age | Commit message (Collapse) | Author |
|
Pull block layer fixes from Jens Axboe:
"A final set of fixes for 4.3.
It is (again) bigger than I would have liked, but it's all been
through the testing mill and has been carefully reviewed by multiple
parties. Each fix is either a regression fix for this cycle, or is
marked stable. You can scold me at KS. The pull request contains:
- Three simple fixes for NVMe, fixing regressions since 4.3. From
Arnd, Christoph, and Keith.
- A single xen-blkfront fix from Cathy, fixing a NULL dereference if
an error is returned through the staste change callback.
- Fixup for some bad/sloppy code in nbd that got introduced earlier
in this cycle. From Markus Pargmann.
- A blk-mq tagset use-after-free fix from Junichi.
- A backing device lifetime fix from Tejun, fixing a crash.
- And finally, a set of regression/stable fixes for cgroup writeback
from Tejun"
* 'for-linus' of git://git.kernel.dk/linux-block:
writeback: remove broken rbtree_postorder_for_each_entry_safe() usage in cgwb_bdi_destroy()
NVMe: Fix memory leak on retried commands
block: don't release bdi while request_queue has live references
nvme: use an integer value to Linux errno values
blk-mq: fix use-after-free in blk_mq_free_tag_set()
nvme: fix 32-bit build warning
writeback: fix incorrect calculation of available memory for memcg domains
writeback: memcg dirty_throttle_control should be initialized with wb->memcg_completions
writeback: bdi_writeback iteration must not skip dying ones
writeback: fix bdi_writeback iteration in wakeup_dirtytime_writeback()
writeback: laptop_mode_timer_fn() needs rcu_read_lock() around bdi_writeback iteration
nbd: Add locking for tasks
xen-blkfront: check for null drvdata in blkback_changed (XenbusStateClosing)
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Misc fixes: two KASAN fixes, two EFI boot fixes, two boot-delay
optimization fixes, and a fix for a IRQ handling hang observed on
virtual platforms"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm, kasan: Silence KASAN warnings in get_wchan()
compiler, atomics, kasan: Provide READ_ONCE_NOCHECK()
x86, kasan: Fix build failure on KASAN=y && KMEMCHECK=y kernels
x86/smpboot: Fix CPU #1 boot timeout
x86/smpboot: Fix cpu_init_udelay=10000 corner case boot parameter misbehavior
x86/ioapic: Disable interrupts when re-routing legacy IRQs
x86/setup: Extend low identity map to cover whole kernel range
x86/efi: Fix multiple GOP device support
|
|
Merge fixes from Andrew Morton:
"9 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
ocfs2/dlm: unlock lockres spinlock before dlm_lockres_put
fault-inject: fix inverted interval/probability values in printk
lib/Kconfig.debug: disable -Wframe-larger-than warnings with KASAN=y
mm: make sendfile(2) killable
thp: use is_zero_pfn() only after pte_present() check
mailmap: update Javier Martinez Canillas' email
MAINTAINERS: add Sergey as zsmalloc reviewer
mm: cma: fix incorrect type conversion for size during dma allocation
kmod: don't run async usermode helper as a child of kworker thread
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"There is nothing to worry you much, only a few small & stable patches
are found for usual stuff, HD-audio (a Lenovo laptop quirk, a fix for
minor error handling) and ASoC (trivial fixes for RT298 and WM
codecs).
The only remaining major change is the fix for ASoC SX_TLV control
that was overseen during refactoring, but the fix itself is trivial
and safe"
* tag 'sound-4.3-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ASoC: wm8962: mark cache_dirty flag after software reset in pm_resume
ASoC: rt298: fix wrong setting of gpio2_en
ASoC: wm8904: Correct number of EQ registers
ALSA: hda - Fix deadlock at error in building PCM
ASoC: Add info callback for SX_TLV controls
ASoC: rt298: correct index default value
ALSA: hda - Fix inverted internal mic on Lenovo G50-80
ALSA: hdac: Explicitly add io.h
|
|
This was found during userspace fuzzing test when a large size dma cma
allocation is made by driver(like ion) through userspace.
show_stack+0x10/0x1c
dump_stack+0x74/0xc8
kasan_report_error+0x2b0/0x408
kasan_report+0x34/0x40
__asan_storeN+0x15c/0x168
memset+0x20/0x44
__dma_alloc_coherent+0x114/0x18c
Signed-off-by: Rohit Vaswani <rvaswani@codeaurora.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
'asoc/fix/wm8904' and 'asoc/fix/wm8962' into asoc-linus
|
|
There are 24 EQ registers not 25, I suspect this bug came about because
the registers start at EQ1 not zero. The bug is relatively harmless as
the extra register written is an unused one.
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
|
|
Some code may perform racy by design memory reads. This could be
harmless, yet such code may produce KASAN warnings.
To hide such accesses from KASAN this patch introduces
READ_ONCE_NOCHECK() macro. KASAN will not check the memory
accessed by READ_ONCE_NOCHECK(). The KernelThreadSanitizer
(KTSAN) is going to ignore it as well.
This patch creates __read_once_size_nocheck() a clone of
__read_once_size(). The only difference between them is
'no_sanitized_address' attribute appended to '*_nocheck'
function. This attribute tells the compiler that instrumentation
of memory accesses should not be applied to that function. We
declare it as static '__maybe_unsed' because GCC is not capable
to inline such function:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67368
With KASAN=n READ_ONCE_NOCHECK() is just a clone of READ_ONCE().
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wolfram Gloger <wmglo@dent.med.uni-muenchen.de>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Link: http://lkml.kernel.org/r/1445243838-17763-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Pull networking fixes from David Miller:
1) Account for extra headroom in ath9k driver, from Felix Fietkau.
2) Fix OOPS in pppoe driver due to incorrect socket state transition,
from Guillaume Nault.
3) Kill memory leak in amd-xgbe debugfx, from Geliang Tang.
4) Power management fixes for iwlwifi, from Johannes Berg.
5) Fix races in reqsk_queue_unlink(), from Eric Dumazet.
6) Fix dst_entry usage in ARP replies, from Jiri Benc.
7) Cure OOPSes with SO_GET_FILTER, from Daniel Borkmann.
8) Missing allocation failure check in amd-xgbe, from Tom Lendacky.
9) Various resource allocation/freeing cures in DSA< from Neil
Armstrong.
10) A series of bug fixes in the openvswitch conntrack support, from
Joe Stringer.
11) Fix two cases (BPF and act_mirred) where we have to clean the sender
cpu stored in the SKB before transmitting. From WANG Cong and
Alexei Starovoitov.
12) Disable VLAN filtering in promiscuous mode in mlx5 driver, from
Achiad Shochat.
13) Older bnx2x chips cannot do 4-tuple UDP hashing, so prevent this
configuration via ethtool. From Yuval Mintz.
14) Don't call rt6_uncached_list_flush_dev() from rt6_ifdown() when
'dev' is NULL, from Eric Biederman.
15) Prevent stalled link synchronization in tipc, from Jon Paul Maloy.
16) kcalloc() gstrings ethtool buffer before having driver fill it in,
in order to prevent kernel memory leaking. From Joe Perches.
17) Fix mixxing rt6_info initialization for blackhole routes, from
Martin KaFai Lau.
18) Kill VLAN regression in via-rhine, from Andrej Ota.
19) Missing pfmemalloc check in sk_add_backlog(), from Eric Dumazet.
20) Fix spurious MSG_TRUNC signalling in netlink dumps, from Ronen Arad.
21) Scrube SKBs when pushing them between namespaces in openvswitch,
from Joe Stringer.
22) bcmgenet enables link interrupts too early, fix from Florian
Fainelli.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (92 commits)
net: bcmgenet: Fix early link interrupt enabling
tunnels: Don't require remote endpoint or ID during creation.
openvswitch: Scrub skb between namespaces
xen-netback: correctly check failed allocation
net: asix: add support for the Billionton GUSB2AM-1G-B USB adapter
netlink: Trim skb to alloc size to avoid MSG_TRUNC
net: add pfmemalloc check in sk_add_backlog()
via-rhine: fix VLAN receive handling regression.
ipv6: Initialize rt6_info properly in ip6_blackhole_route()
ipv6: Move common init code for rt6_info to a new function rt6_info_init()
Bluetooth: Fix initializing conn_params in scan phase
Bluetooth: Fix conn_params list update in hci_connect_le_scan_cleanup
Bluetooth: Fix remove_device behavior for explicit connects
Bluetooth: Fix LE reconnection logic
Bluetooth: Fix reference counting for LE-scan based connections
Bluetooth: Fix double scan updates
mlxsw: core: Fix race condition in __mlxsw_emad_transmit
tipc: move fragment importance field to new header position
ethtool: Use kcalloc instead of kmalloc for ethtool_get_strings
tipc: eliminate risk of stalled link synchronization
...
|
|
Greg reported crashes hitting the following check in __sk_backlog_rcv()
BUG_ON(!sock_flag(sk, SOCK_MEMALLOC));
The pfmemalloc bit is currently checked in sk_filter().
This works correctly for TCP, because sk_filter() is ran in
tcp_v[46]_rcv() before hitting the prequeue or backlog checks.
For UDP or other protocols, this does not work, because the sk_filter()
is ran from sock_queue_rcv_skb(), which might be called _after_ backlog
queuing if socket is owned by user by the time packet is processed by
softirq handler.
Fixes: b4b9e35585089 ("netvm: set PF_MEMALLOC as appropriate during SKB processing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Greg Thelen <gthelen@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull drm fixes from Dave Airlie:
"Nothing too crazy or exciting:
- two MAINTAINERS entries that I didn't see the point in delaying.
- one drm mst fix to stop sending uninitialised data to monitors
- two amdgpu fixes
- one radeon mst tiling fix
- one vmwgfx regression fix
- one virtio warning fix.
I have found one locking problem that needs a bit of reorg to fix, but
I'm not sure it's worth putting in -fixes as I don't think we've seen
it hit in the real world ever, I just found it using the virtio-gpu
driver when working on it. I'll possibly send it next week once I've
time to discuss with Daniel"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/virtio: use %llu format string form atomic64_t
MAINTAINERS: Add myself as maintainer for the gma500 driver
MAINTAINERS: add a maintainer for the atmel-hlcdc DRM driver
drm/amdgpu: Keep the pflip interrupts always enabled v7
drm/amdgpu: adjust default dispclk (v2)
drm/dp/mst: make mst i2c transfer code more robust.
drm/radeon: attach tile property to mst connector
drm/vmwgfx: Fix kernel NULL pointer dereference on older hardware
|
|
SX_TLV controls are intended for situations where the register behind
the control has some non-zero value indicating the minimum gain
and then gains increasing from there and eventually overflowing through
zero.
Currently every CODEC implementing these controls specifies the minimum
as the non-zero value for the minimum and the maximum as the number of
gain settings available.
This means when the info callback subtracts the minimum value from the
maximum value to calculate the number of gain levels available it is
actually under reporting the available levels. This patch fixes this
issue by adding a new snd_soc_info_volsw_sx callback that does not
subtract the minimum value.
Fixes: 1d99f2436d0d ("ASoC: core: Rework SOC_DOUBLE_R_SX_TLV add SOC_SINGLE_SX_TLV")
Signed-off-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: Brian Austin <brian.austin@cirrus.com>
Tested-by: Brian Austin <brian.austin@cirrus.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
|
|
bdi's are initialized in two steps, bdi_init() and bdi_register(), but
destroyed in a single step by bdi_destroy() which, for a bdi embedded
in a request_queue, is called during blk_cleanup_queue() which makes
the queue invisible and starts the draining of remaining usages.
A request_queue's user can access the congestion state of the embedded
bdi as long as it holds a reference to the queue. As such, it may
access the congested state of a queue which finished
blk_cleanup_queue() but hasn't reached blk_release_queue() yet.
Because the congested state was embedded in backing_dev_info which in
turn is embedded in request_queue, accessing the congested state after
bdi_destroy() was called was fine. The bdi was destroyed but the
memory region for the congested state remained accessible till the
queue got released.
a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in
bdi_writeback") changed the situation. Now, the root congested state
which is expected to be pinned while request_queue remains accessible
is separately reference counted and the base ref is put during
bdi_destroy(). This means that the root congested state may go away
prematurely while the queue is between bdi_dstroy() and
blk_cleanup_queue(), which was detected by Andrey's KASAN tests.
The root cause of this problem is that bdi doesn't distinguish the two
steps of destruction, unregistration and release, and now the root
congested state actually requires a separate release step. To fix the
issue, this patch separates out bdi_unregister() and bdi_exit() from
bdi_destroy(). bdi_unregister() is called from blk_cleanup_queue()
and bdi_exit() from blk_release_queue(). bdi_destroy() is now just a
simple wrapper calling the two steps back-to-back.
While at it, the prototype of bdi_destroy() is moved right below
bdi_setup_and_register() so that the counterpart operations are
located together.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: a13f35e87140 ("writeback: don't embed root bdi_writeback_congested in bdi_writeback")
Cc: stable@vger.kernel.org # v4.2+
Reported-and-tested-by: Andrey Konovalov <andreyknvl@google.com>
Link: http://lkml.kernel.org/g/CAAeHK+zUJ74Zn17=rOyxacHU18SgCfC6bsYW=6kCY5GXJBwGfQ@mail.gmail.com
Reviewed-by: Jan Kara <jack@suse.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This zeroes the msg so no random stack data ends up getting
sent, it also limits the function to not accepting > 4
i2c msgs.
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
RTA_ALIGNTO is currently define as 4. It has to be 4U to prevent warning
for RTA_ALIGN and RTA_DATA expansions when -Wconversion gcc option is
enabled.
This follows NLMSG_ALIGNTO definition in <include/uapi/linux/netlink.h>.
Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
MINSIGSTKSZ and SIGSTKSZ for ARM64 are not correctly set in latest kernel.
This patch fixes this issue.
This issue is reported in LTP (testcase: sigaltstack02.c).
Testcase failed when sigaltstack() called with stack size "MINSIGSTKSZ - 1"
Since in Glibc-2.22, MINSIGSTKSZ is set to 5120 but in kernel
it is set to 2048 so testcase gets failed.
Testcase Output:
sigaltstack02 1 TPASS : stgaltstack() fails, Invalid Flag value,errno:22
sigaltstack02 2 TFAIL : sigaltstack() returned 0, expected -1,errno:12
Reported Issue in Glibc Bugzilla:
Bugfix in Glibc-2.22: [Bug 16850]
https://sourceware.org/bugzilla/show_bug.cgi?id=16850
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Akhilesh Kumar <akhilesh.k@samsung.com>
Signed-off-by: Manjeet Pawar <manjeet.p@samsung.com>
Signed-off-by: Rohit Thapliyal <r.thapliyal@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
|
|
For memcg domains, the amount of available memory was calculated as
min(the amount currently in use + headroom according to memcg,
total clean memory)
This isn't quite correct as what should be capped by the amount of
clean memory is the headroom, not the sum of memory in use and
headroom. For example, if a memcg domain has a significant amount of
dirty memory, the above can lead to a value which is lower than the
current amount in use which doesn't make much sense. In most
circumstances, the above leads to a number which is somewhat but not
drastically lower.
As the amount of memory which can be readily allocated to the memcg
domain is capped by the amount of system-wide clean memory which is
not already assigned to the memcg itself, the number we want is
the amount currently in use +
min(headroom according to memcg, clean memory elsewhere in the system)
This patch updates mem_cgroup_wb_stats() to return the number of
filepages and headroom instead of the calculated available pages.
mdtc_cap_avail() is renamed to mdtc_calc_avail() and performs the
above calculation from file, headroom, dirty and globally clean pages.
v2: Dummy mem_cgroup_wb_stats() implementation wasn't updated leading
to build failure when !CGROUP_WRITEBACK. Fixed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Fixes: c2aa723a6093 ("writeback: implement memcg writeback domain based throttling")
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
bdi_for_each_wb() is used in several places to wake up or issue
writeback work items to all wb's (bdi_writeback's) on a given bdi.
The iteration is performed by walking bdi->cgwb_tree; however, the
tree only indexes wb's which are currently active.
For example, when a memcg gets associated with a different blkcg, the
old wb is removed from the tree so that the new one can be indexed.
The old wb starts dying from then on but will linger till all its
inodes are drained. As these dying wb's may still host dirty inodes,
writeback operations which affect all wb's must include them.
bdi_for_each_wb() skipping dying wb's led to sync(2) missing and
failing to sync the inodes belonging to those wb's.
This patch adds a RCU protected @bdi->wb_list which lists all wb's
beloinging to that bdi. wb's are added on creation and removed on
release rather than on the start of destruction. bdi_for_each_wb()
usages are replaced with list_for_each[_continue]_rcu() iterations
over @bdi->wb_list and bdi_for_each_wb() and its helpers are removed.
v2: Updated as per Jan. last_wb ref leak in bdi_split_work_to_wbs()
fixed and unnecessary list head severing in cgwb_bdi_destroy()
removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Artem Bityutskiy <dedekind1@gmail.com>
Fixes: ebe41ab0c79d ("writeback: implement bdi_for_each_wb()")
Link: http://lkml.kernel.org/g/1443012552.19983.209.camel@gmail.com
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Three trivial commits:
- Fix a kerneldoc regression
- Export handle_bad_irq to unbreak a driver in next
- Add an accessor for the of_node field so refactoring in next does
not depend on merge ordering"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Add an accessor for the of_node field
genirq: Fix handle_bad_irq kerneldoc comment
genirq: Export handle_bad_irq
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB and PHY fixes and quirk updates for 4.3-rc5.
Nothing major here, full details in the shortlog, and all of these
have been in linux-next for a while"
* tag 'usb-4.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: Add device quirk for Logitech PTZ cameras
USB: chaoskey read offset bug
USB: Add reset-resume quirk for two Plantronics usb headphones.
usb: renesas_usbhs: Add support for R-Car H3
usb: renesas_usbhs: fix build warning if 64-bit architecture
usb: gadget: bdc: fix memory leak
phy: berlin-sata: Fix module autoload for OF platform driver
phy: rockchip-usb: power down phy when rockchip phy probe
phy: qcom-ufs: fix build error when the component is built as a module
|
|
As we're about to remove the of_node field from the irqdomain
structure, introduce an accessor for it. Subsequent patches
will take care of the actual repainting.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Jason Cooper <jason@lakedaemon.net>
Link: http://lkml.kernel.org/r/1444402211-1141-1-git-send-email-marc.zyngier@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Make unix_sk() just like inet[6]_sk() by constify'ing the sock
parameter.
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Previously, the CT_ATTR_FLAGS attribute, when nested under the
OVS_ACTION_ATTR_CT, encoded a 32-bit bitmask of flags that modify the
semantics of the ct action. It's more extensible to just represent each
flag as a nested attribute, and this requires no additional error
checking to reject flags that aren't currently supported.
Suggested-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The ct_state field was initially added as an 8-bit field, however six of
the bits are already being used and use cases are already starting to
appear that may push the limits of this field. This patch extends the
field to 32 bits while retaining the internal representation of 8 bits.
This should cover forward compatibility of the ABI for the foreseeable
future.
This patch also reorders the OVS_CS_F_* bits to be sequential.
Suggested-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
These comments hadn't caught up to their implementations, fix them.
Fixes: 7f8a436eaa2c "openvswitch: Add conntrack action"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen bug fixes from David Vrabel:
- Fix VM save performance regression with x86 PV guests
- Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI)
- Other minor fixes.
* tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
x86/xen/p2m: hint at the last populated P2M entry
x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map
x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
xen/x86: Don't try to write syscall-related MSRs for PV guests
xen: use correct type for HYPERVISOR_memory_op()
|
|
Conntrack LABELS (plural) are exposed by conntrack; rename the OVS name
for these to be consistent with conntrack.
Fixes: c2ac667 "openvswitch: Allow matching on conntrack label"
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
I’m using the compilation flag -Werror=old-style-declaration, which
requires that the “inline” word would come at the beginning of the code
line.
$ make drivers/net/ethernet/intel/e1000e/e1000e.ko
...
include/net/inet_timewait_sock.h:116:1: error: ‘inline’ is not at
beginning of declaration [-Werror=old-style-declaration]
static void inline inet_twsk_schedule(struct inet_timewait_sock *tw, int
timeo)
include/net/inet_timewait_sock.h:121:1: error: ‘inline’ is not at
beginning of declaration [-Werror=old-style-declaration]
static void inline inet_twsk_reschedule(struct inet_timewait_sock *tw,
int timeo)
Fixes: ed2e92394589 ("tcp/dccp: fix timewait races in timer handling")
Signed-off-by: Raanan Avargil <raanan.avargil@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.
Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.
The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.
strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result. To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.
strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string. Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated. It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.
strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG. It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.
So why did I waffle about this for so long?
Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.
And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.
So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches. Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.
* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
tile: use global strscpy() rather than private copy
string: provide strscpy()
Make asm/word-at-a-time.h available on all architectures
|
|
Pull drm fixes from Dave Airlie:
"Bunch of fixes all over the place, all pretty small: amdgpu, i915,
exynos, one qxl and one vmwgfx.
There is also a bunch of mst fixes, I left some cleanups in the series
as I didn't think it was worth splitting up the tested series"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
drm/dp/mst: add some defines for logical/physical ports
drm/dp/mst: drop cancel work sync in the mstb destroy path (v2)
drm/dp/mst: split connector registration into two parts (v2)
drm/dp/mst: update the link_address_sent before sending the link address (v3)
drm/dp/mst: fixup handling hotplug on port removal.
drm/dp/mst: don't pass port into the path builder function
drm/radeon: drop radeon_fb_helper_set_par
drm: handle cursor_set2 in restore_fbdev_mode
drm/exynos: Staticize local function in exynos_drm_gem.c
drm/exynos: fimd: actually disable dp clock
drm/exynos: dp: remove suspend/resume functions
drm/qxl: recreate the primary surface when the bo is not primary
drm/amdgpu: only print meaningful VM faults
drm/amdgpu/cgs: remove import_gpu_mem
drm/i915: Call non-locking version of drm_kms_helper_poll_enable(), v2
drm: Add a non-locking version of drm_kms_helper_poll_enable(), v2
drm/vmwgfx: Fix a command submission hang regression
drm/exynos: remove unused mode_fixup() code
drm/exynos: remove decon_mode_fixup()
drm/exynos: remove fimd_mode_fixup()
...
|
|
Pull block fixes from Jens Axboe:
"Another week, another round of fixes.
These have been brewing for a bit and in various iterations, but I
feel pretty comfortable about the quality of them. They fix real
issues. The pull request is mostly blk-mq related, and the only one
not fixing a real bug, is the tag iterator abstraction from Christoph.
But it's pretty trivial, and we'll need it for another fix soon.
Apart from the blk-mq fixes, there's an NVMe affinity fix from Keith,
and a single fix for xen-blkback from Roger fixing failure to free
requests on disconnect"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: factor out a helper to iterate all tags for a request_queue
blk-mq: fix racy updates of rq->errors
blk-mq: fix deadlock when reading cpu_list
blk-mq: avoid inserting requests before establishing new mapping
blk-mq: fix q->mq_usage_counter access race
blk-mq: Fix use after of free q->mq_map
blk-mq: fix sysfs registration/unregistration race
blk-mq: avoid setting hctx->tags->cpumask before allocation
NVMe: Set affinity after allocating request queues
xen/blkback: free requests on disconnection
|
|
Pull IOVA fixes from David Woodhouse:
"The main fix here is the first one, fixing the over-allocation of
size-aligned requests. The other patches simply make the existing
IOVA code available to users other than the Intel VT-d driver, with no
functional change.
I concede the latter really *should* have been submitted during the
merge window, but since it's basically risk-free and people are
waiting to build on top of it and it's my fault I didn't get it in, I
(and they) would be grateful if you'd take it"
* git://git.infradead.org/intel-iommu:
iommu: Make the iova library a module
iommu: iova: Export symbols
iommu: iova: Move iova cache management to the iova library
iommu/iova: Avoid over-allocating when size-aligned
|
|
This just removes the magic number.
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
In order to cache the EDID properly for tiled displays, we
need to retrieve it before we register the connector with
userspace, otherwise userspace can call get resources
and try and get the edid before we've even cached it.
This fixes some problems when hotplugging mst monitors,
with X/mutter running. As mutter seems to get 0 modes
for one of the monitors in the tile.
v2: fix warning in radeon
handle tile setting in cached path rather than
get edid path.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
Merge misc fixes from Andrew Morton:
"12 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
dmapool: fix overflow condition in pool_find_page()
thermal: avoid division by zero in power allocator
memcg: remove pcp_counter_lock
kprobes: use _do_fork() in samples to make them work again
drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE
memcg: make mem_cgroup_read_stat() unsigned
memcg: fix dirty page migration
dax: fix NULL pointer in __dax_pmd_fault()
mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)
userfaultfd: remove kernel header include from uapi header
arch/x86/include/asm/efi.h: fix build failure
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes mostly, for a few changes made in this cycle (the
intel_idle driver, the OPP library, the ACPI EC driver, turbostat) and
for some issues that have just been discovered (ACPI PCI IRQ
management, PCI power management documentation, turbostat), with a
couple of cleanups on top of them.
Specifics:
- intel_idle driver fixup for the recently added Skylake chips
support (Len Brown).
- Operating Performance Points (OPP) library fix related to the
recently added support for new DT bindings and a fix for a typo in
a comment (Viresh Kumar, Stephen Boyd).
- ACPI EC driver fix for a recently introduced memory leak in an
error code path (Lv Zheng).
- ACPI PCI IRQ management fix for the issue where an ISA IRQ is
shared with a PCI device which requires it to be configured in a
different way and may cause an interrupt storm to happen as a
result with an extra ACPI SCI IRQ handling simplification on top of
it (Jiang Liu).
- Update of the PCI power management documentation that became
outdated and started to actively confuse the readers to make it
actually reflect the code (Rafael J Wysocki).
- turbostat fixes including an IVB Xeon regression fix (related to
the --debug command line option), Skylake adjustment for the TSC
running at a frequency that doesn't match the base one exactly, and
a Knights Landing quirk to account for the fact that it only
updates APERF and MPERF every 1024 clock cycles plus bumping up the
turbostat version number (Len Brown, Hubert Chrzaniuk)"
* tag 'pm+acpi-4.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
tools/power turbosat: update version number
tools/power turbostat: SKL: Adjust for TSC difference from base frequency
tools/power turbostat: KNL workaround for %Busy and Avg_MHz
tools/power turbostat: IVB Xeon: fix --debug regression
ACPI / PCI: Remove duplicated penalty on SCI IRQ
ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
ACPI / EC: Fix a memory leak issue in acpi_ec_query()
PM / OPP: Fix typo modifcation -> modification
PCI / PM: Update runtime PM documentation for PCI devices
PM / OPP: of_property_count_u32_elems() can return errors
intel_idle: Skylake Client Support - updated
|
|
Pull networking fixes from David Miller:
1) Fix regression in SKB partial checksum handling, from Pravin B
Shalar.
2) Fix VLAN inside of VXLAN handling in i40e driver, from Jesse
Brandeburg.
3) Cure softlockups during accept() in SCTP, from Karl Heiss.
4) MSG_PEEK should return multiple SKBs worth of data in AF_UNIX, from
Aaron Conole.
5) IPV6 erroneously ignores output interface specifier in lookup key for
route lookups, fix from David Ahern.
6) In Marvell DSA driver, forward unknown frames to CPU port, from
Andrew Lunn.
7) Mission flow flag initializations in some code paths, from David
Ahern.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: Initialize flow flags in input path
net: dsa: fix preparation of a port STP update
testptp: Silence compiler warnings on ppc64
net/mlx4: Handle return codes in mlx4_qp_attach_common
dsa: mv88e6xxx: Enable forwarding for unknown to the CPU port
skbuff: Fix skb checksum partial check.
net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set
net sysfs: Print link speed as signed integer
bna: fix error handling
af_unix: return data from multiple SKBs on recv() with MSG_PEEK flag
af_unix: Convert the unix_sk macro to an inline function for type safety
net: sctp: Don't use 64 kilobyte lookup table for four elements
l2tp: protect tunnel->del_work by ref_count
net/ibm/emac: bump version numbers for correct work with ethtool
sctp: Prevent soft lockup when sctp_accept() is called during a timeout event
sctp: Whitespace fix
i40e/i40evf: check for stopped admin queue
i40e: fix VLAN inside VXLAN
r8169: fix handling rtl_readphy result
net: hisilicon: fix handling platform_get_irq result
|
|
Commit 733a572e66d2 ("memcg: make mem_cgroup_read_{stat|event}() iterate
possible cpus instead of online") removed the last use of the per memcg
pcp_counter_lock but forgot to remove the variable.
Kill the vestigial variable.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The problem starts with a file backed dirty page which is charged to a
memcg. Then page migration is used to move oldpage to newpage.
Migration:
- copies the oldpage's data to newpage
- clears oldpage.PG_dirty
- sets newpage.PG_dirty
- uncharges oldpage from memcg
- charges newpage to memcg
Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
count.
However, because newpage is not yet charged, setting newpage.PG_dirty
does not increment the memcg's dirty page count. After migration
completes newpage.PG_dirty is eventually cleared, often in
account_page_cleaned(). At this time newpage is charged to a memcg so
the memcg's dirty page count is decremented which causes underflow
because the count was not previously incremented by migration. This
underflow causes balance_dirty_pages() to see a very large unsigned
number of dirty memcg pages which leads to aggressive throttling of
buffered writes by processes in non root memcg.
This issue:
- can harm performance of non root memcg buffered writes.
- can report too small (even negative) values in
memory.stat[(total_)dirty] counters of all memcg, including the root.
To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
page_memcg() and set_page_memcg() helpers.
Test:
0) setup and enter limited memcg
mkdir /sys/fs/cgroup/test
echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/test/cgroup.procs
1) buffered writes baseline
dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
sync
grep ^dirty /sys/fs/cgroup/test/memory.stat
2) buffered writes with compaction antagonist to induce migration
yes 1 > /proc/sys/vm/compact_memory &
rm -rf /data/tmp/foo
dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
kill %
sync
grep ^dirty /sys/fs/cgroup/test/memory.stat
3) buffered writes without antagonist, should match baseline
rm -rf /data/tmp/foo
dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
sync
grep ^dirty /sys/fs/cgroup/test/memory.stat
(speed, dirty residue)
unpatched patched
1) 841 MB/s 0 dirty pages 886 MB/s 0 dirty pages
2) 611 MB/s -33427456 dirty pages 793 MB/s 0 dirty pages
3) 114 MB/s -33427456 dirty pages 891 MB/s 0 dirty pages
Notice that unpatched baseline performance (1) fell after
migration (3): 841 -> 114 MB/s. In the patched kernel, post
migration performance matches baseline.
Fixes: c4843a7593a9 ("memcg: add per cgroup dirty page accounting")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As include/uapi/linux/userfaultfd.h is a user visible header file, it
should not include kernel-exclusive header files.
So trying to build the userfaultfd test program from the selftests
directory fails, since it contains a reference to linux/compiler.h. As
it turns out, that header is not really needed there, so we can simply
remove it to fix that issue.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
- Fixes for mlx5 related issues
- Fixes for ipoib multicast handling
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/ipoib: increase the max mcast backlog queue
IB/ipoib: Make sendonly multicast joins create the mcast group
IB/ipoib: Expire sendonly multicast joins
IB/mlx5: Remove pa_lkey usages
IB/mlx5: Remove support for IB_DEVICE_LOCAL_DMA_LKEY
IB/iser: Add module parameter for always register memory
xprtrdma: Replace global lkey with lkey local to PD
|
|
* pm-pci:
PCI / PM: Update runtime PM documentation for PCI devices
* acpi-pci:
ACPI / PCI: Remove duplicated penalty on SCI IRQ
ACPI, PCI, irq: Do not share PCI IRQ with ISA IRQ
|
|
And replace the blk_mq_tag_busy_iter with it - the driver use has been
replaced with a new helper a while ago, and internal to the block we
only need the new version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
blk_mq_complete_request may be a no-op if the request has already
been completed by others means (e.g. a timeout or cancellation), but
currently drivers have to set rq->errors before calling
blk_mq_complete_request, which might leave us with the wrong error value.
Add an error parameter to blk_mq_complete_request so that we can
defer setting rq->errors until we known we won the race to complete the
request.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
This patch fixes the following warning if 64-bit architecture environment:
./drivers/usb/renesas_usbhs/common.c:496:25: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
dparam->type = of_id ? (u32)of_id->data : 0;
Acked-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Felipe Balbi <balbi@ti.com>
|
|
drm_kms_helper_poll_enable() was converted to lock the mode_config
mutex in commit 8c4ccc4ab6f64e859d4ff8d7c02c2ed2e956e07f
("drm/probe-helper: Grab mode_config.mutex in poll_init/enable").
This disregarded the cases where this function is called from a context
where this mutex is already locked.
Add a non-locking version as well.
Changes since v1:
- use function name suffix '_locked' for the function that
is to be called from a locked context.
Signed-off-by: Egbert Eich <eich@suse.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
|
|
Earlier patch 6ae459bda tried to detect void ckecksum partial
skb by comparing pull length to checksum offset. But it does
not work for all cases since checksum-offset depends on
updates to skb->data.
Following patch fixes it by validating checksum start offset
after skb-data pointer is updated. Negative value of checksum
offset start means there is no need to checksum.
Fixes: 6ae459bda ("skbuff: Fix skb checksum flag on skb pull")
Reported-by: Andrew Vagin <avagin@odin.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
As suggested by Eric Dumazet this change replaces the
#define with a static inline function to enjoy
complaints by the compiler when misusing the API.
Signed-off-by: Aaron Conole <aconole@bytheb.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
There is a race between cpu hotplug handling and adding/deleting
gendisk for blk-mq, where both are trying to register and unregister
the same sysfs entries.
null_add_dev
--> blk_mq_init_queue
--> blk_mq_init_allocated_queue
--> add to 'all_q_list' (*)
--> add_disk
--> blk_register_queue
--> blk_mq_register_disk (++)
null_del_dev
--> del_gendisk
--> blk_unregister_queue
--> blk_mq_unregister_disk (--)
--> blk_cleanup_queue
--> blk_mq_free_queue
--> del from 'all_q_list' (*)
blk_mq_queue_reinit
--> blk_mq_sysfs_unregister (-)
--> blk_mq_sysfs_register (+)
While the request queue is added to 'all_q_list' (*),
blk_mq_queue_reinit() can be called for the queue anytime by CPU
hotplug callback. But blk_mq_sysfs_unregister (-) and
blk_mq_sysfs_register (+) in blk_mq_queue_reinit must not be called
before blk_mq_register_disk (++) and after blk_mq_unregister_disk (--)
is finished. Because '/sys/block/*/mq/' is not exists.
There has already been BLK_MQ_F_SYSFS_UP flag in hctx->flags which can
be used to track these sysfs stuff, but it is only fixing this issue
partially.
In order to fix it completely, we just need per-queue flag instead of
per-hctx flag with appropriate locking. So this introduces
q->mq_sysfs_init_done which is properly protected with all_q_mutex.
Also, we need to ensure that blk_mq_map_swqueue() is called with
all_q_mutex is held. Since hctx->nr_ctx is reset temporarily and
updated in blk_mq_map_swqueue(), so we should avoid
blk_mq_register_hctx() seeing the temporary hctx->nr_ctx value
in CPU hotplug handling or adding/deleting gendisk .
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Cc: Ming Lei <tom.leiming@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
Currently there is a number of issues preventing PVHVM Xen guests from
doing successful kexec/kdump:
- Bound event channels.
- Registered vcpu_info.
- PIRQ/emuirq mappings.
- shared_info frame after XENMAPSPACE_shared_info operation.
- Active grant mappings.
Basically, newly booted kernel stumbles upon already set up Xen
interfaces and there is no way to reestablish them. In Xen-4.7 a new
feature called 'soft reset' is coming. A guest performing kexec/kdump
operation is supposed to call SCHEDOP_shutdown hypercall with
SHUTDOWN_soft_reset reason before jumping to new kernel. Hypervisor
(with some help from toolstack) will do full domain cleanup (but
keeping its memory and vCPU contexts intact) returning the guest to
the state it had when it was first booted and thus allowing it to
start over.
Doing SHUTDOWN_soft_reset on Xen hypervisors which don't support it is
probably OK as by default all unknown shutdown reasons cause domain
destroy with a message in toolstack log: 'Unknown shutdown reason code
5. Destroying domain.' which gives a clue to what the problem is and
eliminates false expectations.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
|