Age | Commit message (Collapse) | Author |
|
startsync is a no-op, has been for years. Remove it.
Link: http://tracker.ceph.com/issues/20604
Signed-off-by: Yanhu Cao <gmayyyha@gmail.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
- Prevent a potential inconistency in the perf user space access which
might lead to evading sanity checks.
- Prevent perf recording function trace entries twice
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/ftrace: Fix double traces of perf on ftrace:function
perf/core: Fix potential double-fetch bug
|
|
Pull networking fixes from David Miller:
1) Fix handling of pinned BPF map nodes in hash of maps, from Daniel
Borkmann.
2) IPSEC ESP error paths leak memory, from Steffen Klassert.
3) We need an RCU grace period before freeing fib6_node objects, from
Wei Wang.
4) Must check skb_put_padto() return value in HSR driver, from FLorian
Fainelli.
5) Fix oops on PHY probe failure in ftgmac100 driver, from Andrew
Jeffery.
6) Fix infinite loop in UDP queue when using SO_PEEK_OFF, from Eric
Dumazet.
7) Use after free when tcf_chain_destroy() called multiple times, from
Jiri Pirko.
8) Fix KSZ DSA tag layer multiple free of SKBS, from Florian Fainelli.
9) Fix leak of uninitialized memory in sctp_get_sctp_info(),
inet_diag_msg_sctpladdrs_fill() and inet_diag_msg_sctpaddrs_fill().
From Stefano Brivio.
10) L2TP tunnel refcount fixes from Guillaume Nault.
11) Don't leak UDP secpath in udp_set_dev_scratch(), from Yossi
Kauperman.
12) Revert a PHY layer change wrt. handling of PHY_HALTED state in
phy_stop_machine(), it causes regressions for multiple people. From
Florian Fainelli.
13) When packets are sent out of br0 we have to clear the
offload_fwdq_mark value.
14) Several NULL pointer deref fixes in packet schedulers when their
->init() routine fails. From Nikolay Aleksandrov.
15) Aquantium devices cannot checksum offload correctly when the packet
is <= 60 bytes. From Pavel Belous.
16) Fix vnet header access past end of buffer in AF_PACKET, from
Benjamin Poirier.
17) Double free in probe error paths of nfp driver, from Dan Carpenter.
18) QOS capability not checked properly in DCB init paths of mlx5
driver, from Huy Nguyen.
19) Fix conflicts between firmware load failure and health_care timer in
mlx5, also from Huy Nguyen.
20) Fix dangling page pointer when DMA mapping errors occur in mlx5,
from Eran Ben ELisha.
21) ->ndo_setup_tc() in bnxt_en driver doesn't count rings properly,
from Michael Chan.
22) Missing MSIX vector free in bnxt_en, also from Michael Chan.
23) Refcount leak in xfrm layer when using sk_policy, from Lorenzo
Colitti.
24) Fix copy of uninitialized data in qlge driver, from Arnd Bergmann.
25) bpf_setsockopts() erroneously always returns -EINVAL even on
success. Fix from Yuchung Cheng.
26) tipc_rcv() needs to linearize the SKB before parsing the inner
headers, from Parthasarathy Bhuvaragan.
27) Fix deadlock between link status updates and link removal in netvsc
driver, from Stephen Hemminger.
28) Missed locking of page fragment handling in ESP output, from Steffen
Klassert.
29) Fix refcnt leak in ebpf congestion control code, from Sabrina
Dubroca.
30) sxgbe_probe_config_dt() doesn't check devm_kzalloc()'s return value,
from Christophe Jaillet.
31) Fix missing ipv6 rx_dst_cookie update when rx_dst is updated during
early demux, from Paolo Abeni.
32) Several info leaks in xfrm_user layer, from Mathias Krause.
33) Fix out of bounds read in cxgb4 driver, from Stefano Brivio.
34) Properly propagate obsolete state of route upwards in ipv6 so that
upper holders like xfrm can see it. From Xin Long.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (118 commits)
udp: fix secpath leak
bridge: switchdev: Clear forward mark when transmitting packet
mlxsw: spectrum: Forbid linking to devices that have uppers
wl1251: add a missing spin_lock_init()
Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
kcm: do not attach PF_KCM sockets to avoid deadlock
sch_tbf: fix two null pointer dereferences on init failure
sch_sfq: fix null pointer dereference on init failure
sch_netem: avoid null pointer deref on init failure
sch_fq_codel: avoid double free on init failure
sch_cbq: fix null pointer dereferences on init failure
sch_hfsc: fix null pointer deref and double free on init failure
sch_hhf: fix null pointer dereference on init failure
sch_multiq: fix double free on init failure
sch_htb: fix crash on init failure
net/mlx5e: Fix CQ moderation mode not set properly
net/mlx5e: Fix inline header size for small packets
net/mlx5: E-Switch, Unload the representors in the correct order
net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address
...
|
|
The mlxsw driver relies on NETDEV_CHANGEUPPER events to configure the
device in case a port is enslaved to a master netdev such as bridge or
bond.
Since the driver ignores events unrelated to its ports and their
uppers, it's possible to engineer situations in which the device's data
path differs from the kernel's.
One example to such a situation is when a port is enslaved to a bond
that is already enslaved to a bridge. When the bond was enslaved the
driver ignored the event - as the bond wasn't one of its uppers - and
therefore a bridge port instance isn't created in the device.
Until such configurations are supported forbid them by checking that the
upper device doesn't have uppers of its own.
Fixes: 0d65fc13042f ("mlxsw: spectrum: Implement LAG port join/leave")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Nogah Frankel <nogahf@mellanox.com>
Tested-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull block fixes from Jens Axboe:
"Unfortunately a few issues that warrant sending another pull request,
even if I had hoped to avoid it. This contains:
- A fix for multiqueue xen-blkback, on tear down / disconnect.
- A few fixups for NVMe, including a wrong bit definition, fix for
host memory buffers, and an nvme rdma page size fix"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: fix the definition of the doorbell buffer config support bit
nvme-pci: use dma memory for the host memory buffer descriptors
nvme-rdma: default MR page size to 4k
xen-blkback: stop blkback thread of every queue in xen_blkif_disconnect
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- A couple fixes for bugs introduced as part of the blk_status_t block
layer changes during the 4.13 merge window
- A printk throttling fix to use discrete rate limiting state for each
DM log level
- A stable@ fix for DM multipath that delays request requeueing to
avoid CPU lockup if/when the request queue is "dying"
* tag 'for-4.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm mpath: do not lock up a CPU with requeuing activity
dm: fix printk() rate limiting code
dm mpath: retry BLK_STS_RESOURCE errors
dm: fix the second dec_pending() argument in __split_and_process_bio()
|
|
Merge more fixes from Andrew Morton:
"6 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
scripts/dtc: fix '%zx' warning
include/linux/compiler.h: don't perform compiletime_assert with -O0
mm, madvise: ensure poisoned pages are removed from per-cpu lists
mm, uprobes: fix multiple free of ->uprobes_state.xol_area
kernel/kthread.c: kthread_worker: don't hog the cpu
mm,page_alloc: don't call __node_reclaim() with oom_lock held.
|
|
Commit c7acec713d14 ("kernel.h: handle pointers to arrays better in
container_of()") made use of __compiletime_assert() from container_of()
thus increasing the usage of this macro, allowing developers to notice
type conflicts in usage of container_of() at compile time.
However, the implementation of __compiletime_assert relies on compiler
optimizations to report an error. This means that if a developer uses
"-O0" with any code that performs container_of(), the compiler will always
report an error regardless of whether there is an actual problem in the
code.
This patch disables compile_time_assert when optimizations are disabled to
allow such code to compile with CFLAGS="-O0".
Example compilation failure:
./include/linux/compiler.h:547:38: error: call to `__compiletime_assert_94' declared with attribute error: pointer type mismatch in container_of()
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^
./include/linux/compiler.h:530:4: note: in definition of macro `__compiletime_assert'
prefix ## suffix(); \
^~~~~~
./include/linux/compiler.h:547:2: note: in expansion of macro `_compiletime_assert'
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)
^~~~~~~~~~~~~~~~~~~
./include/linux/build_bug.h:46:37: note: in expansion of macro `compiletime_assert'
#define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
^~~~~~~~~~~~~~~~~~
./include/linux/kernel.h:860:2: note: in expansion of macro `BUILD_BUG_ON_MSG'
BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) && \
^~~~~~~~~~~~~~~~
[akpm@linux-foundation.org: use do{}while(0), per Michal]
Link: http://lkml.kernel.org/r/20170829230114.11662-1-joe@ovn.org
Fixes: c7acec713d14c6c ("kernel.h: handle pointers to arrays better in container_of()")
Signed-off-by: Joe Stringer <joe@ovn.org>
Cc: Ian Abbott <abbotti@mev.co.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The invalidate_page callback suffered from two pitfalls. First it used
to happen after the page table lock was release and thus a new page
might have setup before the call to invalidate_page() happened.
This is in a weird way fixed by commit c7ab0d2fdc84 ("mm: convert
try_to_unmap_one() to use page_vma_mapped_walk()") that moved the
callback under the page table lock but this also broke several existing
users of the mmu_notifier API that assumed they could sleep inside this
callback.
The second pitfall was invalidate_page() being the only callback not
taking a range of address in respect to invalidation but was giving an
address and a page. Lots of the callback implementers assumed this
could never be THP and thus failed to invalidate the appropriate range
for THP.
By killing this callback we unify the mmu_notifier callback API to
always take a virtual address range as input.
Finally this also simplifies the end user life as there is now two clear
choices:
- invalidate_range_start()/end() callback (which allow you to sleep)
- invalidate_range() where you can not sleep but happen right after
page table update under page table lock
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Replace all mmu_notifier_invalidate_page() calls by *_invalidate_range()
and make sure it is bracketed by calls to *_invalidate_range_start()/end().
Note that because we can not presume the pmd value or pte value we have
to assume the worst and unconditionaly report an invalidation as
happening.
Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Adam Borowski <kilobyte@angband.pl>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: axie <axie@amd.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
MLX5_INTERFACE_STATE_SHUTDOWN is not used in the code.
Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
There is an issue where the firmware fails during mlx5_load_one,
the health_care timer detects the issue and schedules a health_care call.
Then the mlx5_load_one detects the issue, cleans up and quits. Then
the health_care starts and calls mlx5_unload_one to clean up the resources
that no longer exist and causes kernel panic.
The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is removing the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.
Fixes: 5fc7197d3a25 ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
|
|
Pull NVMe fixes from Christoph:
"Three more fixes for 4.13 below:
- fix the incorrect bit for the doorbell buffer features (Changpeng Liu)
- always use a 4k MR page size for RDMA, to not get in trouble with
offset in non-4k page size systems (no-op for x86) (Max Gurtovoy)
- and a fix for the new nvme host memory buffer support to keep the
descriptor list DMA mapped when the buffer is enabled (me)"
|
|
NVMe 1.3 specification defines the Optional Admin Command Support feature
flags, bit 8 set to '1' then the controller supports the Doorbell Buffer
Config command. Bit 7 is used for Virtualization Mangement command.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Fixes: f9f38e33 ("nvme: improve performance for virtual NVMe devices")
Cc: stable@vger.kernel.org
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata
Pull libata fixes from Tejun Heo:
"Late fixes for libata. There's a minor platform driver fix but the
important one is READ LOG PAGE.
This is a new ATA command which is used to test some optional features
but it broke probing of some devices - they locked up instead of
failing the unknown command.
Christoph tried blacklisting, but, after finding out there are
multiple devices which fail this way, backed off to testing feature
bit in IDENTIFY data first, which is a bit lossy (we can miss features
on some devices) but should be a lot safer"
* 'for-4.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
Revert "libata: quirk read log on no-name M.2 SSD"
libata: check for trusted computing in IDENTIFY DEVICE data
libata: quirk read log on no-name M.2 SSD
sata: ahci-da850: Fix some error handling paths in 'ahci_da850_probe()'
|
|
This reverts commit 35f0b6a779b8b7a98faefd7c1c660b4dac9a5c26.
We now conditionalize issuing of READ LOG PAGE on the TRUSTED
COMPUTING SUPPORTED bit in the identity data and this shouldn't be
necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
ATA-8 and later mirrors the TRUSTED COMPUTING SUPPORTED bit in word 48 of
the IDENTIFY DEVICE data. Check this before issuing a READ LOG PAGE
command to avoid issues with buggy devices. The only downside is that
we can't support Security Send / Receive for a device with an older
revision due to the conflicting use of this field in earlier
specifications.
tj: The reason we need this is because some devices which don't
support READ LOG PAGE lock up after getting issued that command.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
When running perf on the ftrace:function tracepoint, there is a bug
which can be reproduced by:
perf record -e ftrace:function -a sleep 20 &
perf record -e ftrace:function ls
perf script
ls 10304 [005] 171.853235: ftrace:function:
perf_output_begin
ls 10304 [005] 171.853237: ftrace:function:
perf_output_begin
ls 10304 [005] 171.853239: ftrace:function:
task_tgid_nr_ns
ls 10304 [005] 171.853240: ftrace:function:
task_tgid_nr_ns
ls 10304 [005] 171.853242: ftrace:function:
__task_pid_nr_ns
ls 10304 [005] 171.853244: ftrace:function:
__task_pid_nr_ns
We can see that all the function traces are doubled.
The problem is caused by the inconsistency of the register
function perf_ftrace_event_register() with the probe function
perf_ftrace_function_call(). The former registers one probe
for every perf_event. And the latter handles all perf_events
on the current cpu. So when two perf_events on the current cpu,
the traces of them will be doubled.
So this patch adds an extra parameter "event" for perf_tp_event,
only send sample data to this event when it's not NULL.
Signed-off-by: Zhou Chengming <zhouchengming1@huawei.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: huawei.libin@huawei.com
Link: http://lkml.kernel.org/r/1503668977-12526-1-git-send-email-zhouchengming1@huawei.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Ido reported that reading the log page on his systems fails,
so quirk it as it won't support ZBC or security protocols.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
|
|
Using the same rate limiting state for different kinds of messages
is wrong because this can cause a high frequency message to suppress
a report of a low frequency message. Hence use a unique rate limiting
state per message type.
Fixes: 71a16736a15e ("dm: use local printk ratelimit")
Cc: stable@vger.kernel.org
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fix from Joerg Roedel:
"Another fix, this time in common IOMMU sysfs code.
In the conversion from the old iommu sysfs-code to the
iommu_device_register interface, I missed to update the release path
for the struct device associated with an IOMMU. It freed the 'struct
device', which was a pointer before, but is now embedded in another
struct.
Freeing from the middle of allocated memory had all kinds of nasty
side effects when an IOMMU was unplugged. Unfortunatly nobody
unplugged and IOMMU until now, so this was not discovered earlier. The
fix is to make the 'struct device' a pointer again"
* tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix wrong freeing of iommu_device->dev
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/iio fixes from Greg KH:
"Here are few small staging driver fixes, and some more IIO driver
fixes for 4.13-rc7. Nothing major, just resolutions for some reported
problems.
All of these have been in linux-next with no reported problems"
* tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: magnetometer: st_magn: remove ihl property for LSM303AGR
iio: magnetometer: st_magn: fix status register address for LSM303AGR
iio: hid-sensor-trigger: Fix the race with user space powering up sensors
iio: trigger: stm32-timer: fix get trigger mode
iio: imu: adis16480: Fix acceleration scale factor for adis16480
PATCH] iio: Fix some documentation warnings
staging: rtl8188eu: add RNX-N150NUB support
Revert "staging: fsl-mc: be consistent when checking strcmp() return"
iio: adc: stm32: fix common clock rate
iio: adc: ina219: Avoid underflow for sleeping time
iio: trigger: stm32-timer: add enable attribute
iio: trigger: stm32-timer: fix get/set down count direction
iio: trigger: stm32-timer: fix write_raw return value
iio: trigger: stm32-timer: fix quadrature mode get routine
iio: bmp280: properly initialize device for humidity reading
|
|
We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.
It turns out that our 32-bit value for that limit was bogus. On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong. We used to
define that value to
(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).
Neither of those limitations make sense. The index is actually the full
32 bit unsigned value, and we can use that whole full page. So the
maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".
However, we do wan tto avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow. So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.
So the actual limit is ULONG_MAX << PAGE_SHIFT. That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.
The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume. It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.
This was invisible until commit c2a9737f45e2 ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.
NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed. But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.
So just use LLONG_MAX for the 64-bit case. That was what the value had
been before too, just written out as a hex constant.
Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar <nazard@nazar.ca>
Cc: Andreas Dilger <adilger@dilger.ca>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Dave Kleikamp <shaggy@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull block fixes from Jens Axboe:
"A small batch of fixes that should be included for the 4.13 release.
This contains:
- Revert of the 4k loop blocksize support. Even with a recent batch
of 4 fixes, we're still not really happy with it. Rather than be
stuck with an API issue, let's revert it and get it right for 4.14.
- Trivial patch from Bart, adding a few flags to the blk-mq debugfs
exports that were added in this release, but not to the debugfs
parts.
- Regression fix for bsg, fixing a potential kernel panic. From
Benjamin.
- Tweak for the blk throttling, improving how we account discards.
From Shaohua"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq-debugfs: Add names for recently added flags
bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer
Revert "loop: support 4k physical blocksize"
blk-throttle: cap discard request size
|
|
The implementation of TIOCGPTPEER has two issues.
When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened the wrong
vfsmount is passed to dentry_open. Which results in the kernel displaying
the wrong pathname for the peer.
The second is simply by caching the vfsmount and dentry of the peer it leaves
them open, in a way they were not previously Which because of the inreased
reference counts can cause unnecessary behaviour differences resulting in
regressions.
To fix these move the ioctl into tty_io.c at a generic level allowing
the ioctl to have access to the struct file on which the ioctl is
being called. This allows the path of the slave to be derived when
opening the slave through TIOCGPTPEER instead of requiring the path to
the slave be cached. Thus removing the need for caching the path.
A new function devpts_ptmx_path is factored out of devpts_acquire and
used to implement a function devpts_mntget. The new function devpts_mntget
takes a filp to perform the lookup on and fsi so that it can confirm
that the superblock that is found by devpts_ptmx_path is the proper superblock.
v2: Lots of fixes to make the code actually work
v3: Suggestions by Linus
- Removed the unnecessary initialization of filp in ptm_open_peer
- Simplified devpts_ptmx_path as gotos are no longer required
[ This is the fix for the issue that was reverted in commit
143c97cc6529, but this time without breaking 'pbuilder' due to
increased reference counts - Linus ]
Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl")
Reported-by: Christian Brauner <christian.brauner@canonical.com>
Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Since we split the scsi_request out of struct request bsg fails to
provide a reply-buffer for the drivers. This was done via the pointer
for sense-data, that is not preallocated anymore.
Failing to allocate/assign it results in illegal dereferences because
LLDs use this pointer unquestioned.
An example panic on s390x, using the zFCP driver, looks like this (I had
debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
s390x):
Unable to handle kernel pointer dereference in virtual kernel address space
Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
Fault in home space mode while using kernel ASCE.
AS:0000000001590007 R3:0000000000000024
Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Modules linked in: <Long List>
CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
task: 0000000065cb0100 task.stack: 0000000065cb4000
Krnl PSW : 0704e00180000000 000003ff801e4156 (zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
Krnl GPRS: 0000000000000001 000000005fa9d0d0 000000005fa9d078 0000000000e16866
000003ff00000290 6b6b6b6b6b6b6b6b 0000000059f78f00 000000000000000f
00000000593a0958 00000000593a0958 0000000060d88800 000000005ddd4c38
0000000058b50100 07000000659cba08 000003ff801e8556 00000000659cb9a8
Krnl Code: 000003ff801e4146: e31020500004 lg %r1,80(%r2)
000003ff801e414c: 58402040 l %r4,64(%r2)
#000003ff801e4150: e35020200004 lg %r5,32(%r2)
>000003ff801e4156: 50405004 st %r4,4(%r5)
000003ff801e415a: e54c50080000 mvhi 8(%r5),0
000003ff801e4160: e33010280012 lt %r3,40(%r1)
000003ff801e4166: a718fffb lhi %r1,-5
000003ff801e416a: 1803 lr %r0,%r3
Call Trace:
([<000003ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
[<000003ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
[<000003ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
[<00000000009b91b6>] qdio_kick_handler+0x2ae/0x2c8
[<00000000009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
[<00000000001684c2>] tasklet_action+0x15a/0x1d8
[<0000000000bd28ec>] __do_softirq+0x3ec/0x848
[<00000000001675a4>] irq_exit+0x74/0xf8
[<000000000010dd6a>] do_IRQ+0xba/0xf0
[<0000000000bd19e8>] io_int_handler+0x104/0x2d4
[<00000000001033b6>] enabled_wait+0xb6/0x188
([<000000000010339e>] enabled_wait+0x9e/0x188)
[<000000000010396a>] arch_cpu_idle+0x32/0x50
[<0000000000bd0112>] default_idle_call+0x52/0x68
[<00000000001cd0fa>] do_idle+0x102/0x188
[<00000000001cd41e>] cpu_startup_entry+0x3e/0x48
[<0000000000118c64>] smp_start_secondary+0x11c/0x130
[<0000000000bd2016>] restart_int_handler+0x62/0x78
[<0000000000000000>] (null)
INFO: lockdep is turned off.
Last Breaking-Event-Address:
[<000003ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]
Kernel panic - not syncing: Fatal exception in interrupt
This patch moves bsg-lib to allocate and setup struct bsg_job ahead of
time, including the allocation of a buffer for the reply-data.
This means, struct bsg_job is not allocated separately anymore, but as part
of struct request allocation - similar to struct scsi_cmd. Reflect this in
the function names that used to handle creation/destruction of struct
bsg_job.
Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Fixes: 82ed4db499b8 ("block: split scsi_request out of struct request")
Cc: <stable@vger.kernel.org> #4.11+
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Rename skb_pad() into __skb_pad() and make it take a third argument:
free_on_error which controls whether kfree_skb() should be called or
not, skb_pad() directly makes use of it and passes true to preserve its
existing behavior. Do exactly the same thing with __skb_put_padto() and
skb_put_padto().
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Woojung Huh <Woojung.Huh@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit c8c03f1858331e85d397bacccd34ef409aae993c.
It turns out that while fixing the ptmx file descriptor to have the
correct 'struct path' to the associated slave pty is a really good
thing, it breaks some user space tools for a very annoying reason.
The problem is that /dev/ptmx and its associated slave pty (/dev/pts/X)
are on different mounts. That was what caused us to have the wrong path
in the first place (we would mix up the vfsmount of the 'ptmx' node,
with the dentry of the pty slave node), but it also means that now while
we use the right vfsmount, having the pty master open also keeps the pts
mount busy.
And it turn sout that that makes 'pbuilder' very unhappy, as noted by
Stefan Lippers-Hollmann:
"This patch introduces a regression for me when using pbuilder
0.228.7[2] (a helper to build Debian packages in a chroot and to
create and update its chroots) when trying to umount /dev/ptmx (inside
the chroot) on Debian/ unstable (full log and pbuilder configuration
file[3] attached).
[...]
Setting up build-essential (12.3) ...
Processing triggers for libc-bin (2.24-15) ...
I: unmounting dev/ptmx filesystem
W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
(In some cases useful info about processes that
use the device is found by lsof(8) or fuser(1).)"
apparently pbuilder tries to unmount the /dev/pts filesystem while still
holding at least one master node open, which is arguably not very nice,
but we don't break user space even when fixing other bugs.
So this commit has to be reverted.
I'll try to figure out a way to avoid caching the path to the slave pty
in the master pty. The only thing that actually wants that slave pty
path is the "TIOCGPTPEER" ioctl, and I think we could just recreate the
path at that time.
Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
Cc: Eric W Biederman <ebiederm@xmission.com>
Cc: Christian Brauner <christian.brauner@canonical.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull networking fixes from David Miller:
1) Fix IGMP handling wrt VRF, from David Ahern.
2) Fix timer access to freed object in dccp, from Eric Dumazet.
3) Use kmalloc_array() in ptr_ring to avoid overflow cases which are
triggerable by userspace. Also from Eric Dumazet.
4) Fix infinite loop in unmapping cleanup of nfp driver, from Colin Ian
King.
5) Correct datagram peek handling of empty SKBs, from Matthew Dawson.
6) Fix use after free in TIPC, from Eric Dumazet.
7) When replacing a route in ipv6 we need to reset the round robin
pointer, from Wei Wang.
8) Fix bug in pci_find_pcie_root_port() which was unearthed by the
relaxed ordering changes, from Thierry Redding. I made sure to get
an explicit ACK from Bjorn this time around :-)
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (27 commits)
ipv6: repair fib6 tree in failure case
net_sched: fix order of queue length updates in qdisc_replace()
tools lib bpf: improve warning
switchdev: documentation: minor typo fixes
bpf, doc: also add s390x as arch to sysctl description
net: sched: fix NULL pointer dereference when action calls some targets
rxrpc: Fix oops when discarding a preallocated service call
irda: do not leak initialized list.dev to userspace
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled
PCI: Allow PCI express root ports to find themselves
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
net: check and errout if res->fi is NULL when RTM_F_FIB_MATCH is set
ipv6: reset fn->rr_ptr when replacing route
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
tipc: fix use-after-free
tun: handle register_netdevice() failures properly
datagram: When peeking datagrams with offset < 0 don't skip empty skbs
bpf, doc: improve sysctl knob description
netxen: fix incorrect loop counter decrement
nfp: fix infinite loop on umapping cleanup
...
|
|
This was reported many times, and this was even mentioned in commit
52ee2dfdd4f5 ("pids: refactor vnr/nr_ns helpers to make them safe") but
somehow nobody bothered to fix the obvious problem: task_tgid_nr_ns() is
not safe because task->group_leader points to nowhere after the exiting
task passes exit_notify(), rcu_read_lock() can not help.
We really need to change __unhash_process() to nullify group_leader,
parent, and real_parent, but this needs some cleanups. Until then we
can turn task_tgid_nr_ns() into another user of __task_pid_nr_ns() and
fix the problem.
Reported-by: Troy Kensinger <tkensinger@google.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus
Jonathan writes:
Second set of IIO fixes for the 4.13 cycle.
Given the late stage of this series, some more involved fixes have been
held back for the upcoming merge window.
The hid-sensor issue has been causing problems for a long time so it
is great to have that one finally fixed! No more bug reports for the
userspace guys (well about that anyway).
* documentation
- some warning fixes due to missing colons in kernel-doc.
* adis16480
- fix accel scale factor.
* bmp280
- properly initialize the device for humidity readings - without this
the humidity readings may be skipped and a magic value of 0x8000 returned.
* hid-sensor-strigger
- fix a race with user space when powering up the sensor.
* ina291
- Avoid an underflow for the sleeping time as a result of supporting the
fastest rates.
* st-magnetometer
- Fix the status register address for hte LSM303AGR,
- Remove the ihl property for LSM303AGR as the sensor doesn't support
active low for the dataready line.
* stm32-adc
- Fix use of a common clock rate.
* stm32-timer
- fix the quadrature mode get routine to account for the magic 0 value.
set on boot.
- fix the return value of write_raw,
- fix the get/set down count direction as the enum value was not being
converted to the relevant bit field,
- add an enable attribute to actually turn it on when in encoder mode,
- missing mask when reading the trigger mode.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"Two fixes for the perf subsystem:
- Fix an inconsistency of RDPMC mm struct tagging across exec() which
causes RDPMC to fault.
- Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
causes incorrect timestamps and total time calculations"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix time on IOC_ENABLE
perf/x86: Fix RDPMC vs. mm_struct tracking
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull watchdog fix from Thomas Gleixner:
"A fix for the hardlockup watchdog to prevent false positives with
extreme Turbo-Modes which make the perf/NMI watchdog fire faster than
the hrtimer which is used to verify.
Slightly larger than the minimal fix, which just would increase the
hrtimer frequency, but comes with extra overhead of more watchdog
timer interrupts and thread wakeups for all users.
With this change we restrict the overhead to the extreme Turbo-Mode
systems"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kernel/watchdog: Prevent false positives with turbo modes
|
|
The kerneldoc description for the trig_readonly field of struct iio_dev
lacked a colon, leading to this doc build warning:
./include/linux/iio/iio.h:603: warning: No description found for parameter 'trig_readonly'
A similar issue for iio_trigger_set_immutable() in trigger.h yielded:
./include/linux/iio/trigger.h:151: warning: No description found for parameter 'indio_dev'
./include/linux/iio/trigger.h:151: warning: No description found for parameter 'trig'
Fix the formatting and silence the warnings.
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
|
|
Wenwei Tao has noticed that our current assumption that the oom victim
is dying and never doing any visible changes after it dies, and so the
oom_reaper can tear it down, is not entirely true.
__task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
So there is a race window when some threads won't have
fatal_signal_pending while the oom_reaper could start unmapping the
address space. Moreover some paths might not check for fatal signals
before each PF/g-u-p/copy_from_user.
We already have a protection for oom_reaper vs. PF races by checking
MMF_UNSTABLE. This has been, however, checked only for kernel threads
(use_mm users) which can outlive the oom victim. A simple fix would be
to extend the current check in handle_mm_fault for all tasks but that
wouldn't be sufficient because the current check assumes that a kernel
thread would bail out after EFAULT from get_user*/copy_from_user and
never re-read the same address which would succeed because the PF path
has established page tables already. This seems to be the case for the
only existing use_mm user currently (virtio driver) but it is rather
fragile in general.
This is even more fragile in general for more complex paths such as
generic_perform_write which can re-read the same address more times
(e.g. iov_iter_copy_from_user_atomic to fail and then
iov_iter_fault_in_readable on retry).
Therefore we have to implement MMF_UNSTABLE protection in a robust way
and never make a potentially corrupted content visible. That requires
to hook deeper into the PF path and check for the flag _every time_
before a pte for anonymous memory is established (that means all
!VM_SHARED mappings).
The corruption can be triggered artificially
(http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
but there doesn't seem to be any real life bug report. The race window
should be quite tight to trigger most of the time.
Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
Fixes: aac453635549 ("mm, oom: introduce oom reaper")
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Tested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is existing use after free bug when deferred struct pages are
enabled:
The memblock_add() allocates memory for the memory array if more than
128 entries are needed. See comment in e820__memblock_setup():
* The bootstrap memblock region count maximum is 128 entries
* (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
* than that - so allow memblock resizing.
This memblock memory is freed here:
free_low_memory_core_early()
We access the freed memblock.memory later in boot when deferred pages
are initialized in this path:
deferred_init_memmap()
for_each_mem_pfn_range()
__next_mem_pfn_range()
type = &memblock.memory;
One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been
exceeded at least on systems where deferred struct pages were enabled.
Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
and verifying in qemu that this code is getting excuted and that the
freed pages are sane.
Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 7e18adb4f80b ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
These are the few pending fixes I have queued up for v4.13-final. One
is a a generic regression fix for recursive loops on kmod and the other
one is a trivial print out correction.
During the v4.13 development we assumed that recursive kmod loops were
no longer possible. Clearly that is not true. The regression fix makes
use of a new killable wait. We use a killable wait to be paranoid in
how signals might be sent to modprobe and only accept a proper SIGKILL.
The signal will only be available to userspace to issue *iff* a thread
has already entered a wait state, and that happens only if we've already
throttled after 50 kmod threads have been hit.
Note that although it may seem excessive to trigger a failure afer 5
seconds if all kmod thread remain busy, prior to the series of changes
that went into v4.13 we would actually *always* fatally fail any request
which came in if the limit was already reached. The new waiting
implemented in v4.13 actually gives us *more* breathing room -- the wait
for 5 seconds is a wait for *any* kmod thread to finish. We give up and
fail *iff* no kmod thread has finished and they're *all* running
straight for 5 consecutive seconds. If 50 kmod threads are running
consecutively for 5 seconds something else must be really bad.
Recursive loops with kmod are bad but they're also hard to implement
properly as a selftest without currently fooling current userspace tools
like kmod [1]. For instance kmod will complain when you run depmod if
it finds a recursive loop with symbol dependency between modules as such
this type of recursive loop cannot go upstream as the modules_install
target will fail after running depmod.
These tests already exist on userspace kmod upstream though (refer to
the testsuite/module-playground/mod-loop-*.c files). The same is not
true if request_module() is used though, or worst if aliases are used.
Likewise the issue with 64-bit kernels booting 32-bit userspace without
a binfmt handler built-in is also currently not detected and proactively
avoided by userspace kmod tools, or kconfig for all architectures.
Although we could complain in the kernel when some of these individual
recursive issues creep up, proactively avoiding these situations in
userspace at build time is what we should keep striving for.
Lastly, since recursive loops could happen with kmod it may mean
recursive loops may also be possible with other kernel usermode helpers,
this should be investigated and long term if we can come up with a more
sensible generic solution even better!
[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final
[1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git
This patch (of 3):
This wait is similar to wait_event_interruptible_timeout() but only
accepts SIGKILL interrupt signal. Other signals are ignored.
Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Matt Redfearn <matt.redfearn@imgetc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
to update the memcg stats:
BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
IP: test_clear_page_writeback+0x12e/0x2c0
[...]
RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
Call Trace:
<IRQ>
end_page_writeback+0x47/0x70
f2fs_write_end_io+0x76/0x180 [f2fs]
bio_endio+0x9f/0x120
blk_update_request+0xa8/0x2f0
scsi_end_request+0x39/0x1d0
scsi_io_completion+0x211/0x690
scsi_finish_command+0xd9/0x120
scsi_softirq_done+0x127/0x150
__blk_mq_complete_request_remote+0x13/0x20
flush_smp_call_function_queue+0x56/0x110
generic_smp_call_function_single_interrupt+0x13/0x30
smp_call_function_single_interrupt+0x27/0x40
call_function_single_interrupt+0x89/0x90
RIP: 0010:native_safe_halt+0x6/0x10
(gdb) l *(test_clear_page_writeback+0x12e)
0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
614 mod_node_page_state(page_pgdat(page), idx, val);
615 if (mem_cgroup_disabled() || !page->mem_cgroup)
616 return;
617 mod_memcg_state(page->mem_cgroup, idx, val);
618 pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
619 this_cpu_add(pn->lruvec_stat->count[idx], val);
620 }
621
622 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
623 gfp_t gfp_mask,
The issue is that writeback doesn't hold a page reference and the page
might get freed after PG_writeback is cleared (and the mapping is
unlocked) in test_clear_page_writeback(). The stat functions looking up
the page's node or zone are safe, as those attributes are static across
allocation and free cycles. But page->mem_cgroup is not, and it will
get cleared if we race with truncation or migration.
It appears this race window has been around for a while, but less likely
to trigger when the memcg stats were updated first thing after
PG_writeback is cleared. Recent changes reshuffled this code to update
the global node stats before the memcg ones, though, stretching the race
window out to an extent where people can reproduce the problem.
Update test_clear_page_writeback() to look up and pin page->mem_cgroup
before clearing PG_writeback, then not use that pointer afterward. It
is a partial revert of 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
but leaves the pageref-holding callsites that aren't affected alone.
Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
Fixes: 62cccb8c8e7a ("mm: simplify lock_page_memcg()")
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Jaegeuk Kim <jaegeuk@kernel.org>
Tested-by: Jaegeuk Kim <jaegeuk@kernel.org>
Reported-by: Bradley Bolen <bradleybolen@gmail.com>
Tested-by: Brad Bolen <bradleybolen@gmail.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org> [4.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.
The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.
A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.
Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.
That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.
Fixes: 58687acba592 ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: Kan Liang <Kan.liang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos
|
|
Christian Brauner reported that if you use the TIOCGPTPEER ioctl() to
get a slave pty file descriptor, the resulting file descriptor doesn't
look right in /proc/<pid>/fd/<fd>. In particular, he wanted to use
readlink() on /proc/self/fd/<fd> to get the pathname of the slave pty
(basically implementing "ptsname{_r}()").
The reason for that was that we had generated the wrong 'struct path'
when we create the pty in ptmx_open().
In particular, the dentry was correct, but the vfsmount pointed to the
mount of the ptmx node. That _can_ be correct - in case you use
"/dev/pts/ptmx" to open the master - but usually is not. The normal
case is to use /dev/ptmx, which then looks up the pts/ directory, and
then the vfsmount of the ptmx node is obviously the /dev directory, not
the /dev/pts/ directory.
We actually did have the right vfsmount available, but in the wrong
place (it gets looked up in 'devpts_acquire()' when we get a reference
to the pts filesystem), and so ptmx_open() used the wrong mnt pointer.
The end result of this confusion was that the pty worked fine, but when
if you did TIOCGPTPEER to get the slave side of the pty, end end result
would also work, but have that dodgy 'struct path'.
And then when doing "d_path()" on to get the pathname, the vfsmount
would not match the root of the pts directory, and d_path() would return
an empty pathname thinking that the entry had escaped a bind mount into
another mount.
This fixes the problem by making devpts_acquire() return the vfsmount
for the pts filesystem, allowing ptmx_open() to trivially just use the
right mount for the pts dentry, and create the proper 'struct path'.
Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.
Lets remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.
Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Pull networking fixes from David Miller:
1) Fix TCP checksum offload handling in iwlwifi driver, from Emmanuel
Grumbach.
2) In ksz DSA tagging code, free SKB if skb_put_padto() fails. From
Vivien Didelot.
3) Fix two regressions with bonding on wireless, from Andreas Born.
4) Fix build when busypoll is disabled, from Daniel Borkmann.
5) Fix copy_linear_skb() wrt. SO_PEEK_OFF, from Eric Dumazet.
6) Set SKB cached route properly in inet_rtm_getroute(), from Florian
Westphal.
7) Fix PCI-E relaxed ordering handling in cxgb4 driver, from Ding
Tianhong.
8) Fix module refcnt leak in ULP code, from Sabrina Dubroca.
9) Fix use of GFP_KERNEL in atomic contexts in AF_KEY code, from Eric
Dumazet.
10) Need to purge socket write queue in dccp_destroy_sock(), also from
Eric Dumazet.
11) Make bpf_trace_printk() work properly on 32-bit architectures, from
Daniel Borkmann.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
bpf: fix bpf_trace_printk on 32 bit archs
PCI: fix oops when try to find Root Port for a PCI device
sfc: don't try and read ef10 data on non-ef10 NIC
net_sched: remove warning from qdisc_hash_add
net_sched/sfq: update hierarchical backlog when drop packet
net_sched: reset pointers to tcf blocks in classful qdiscs' destructors
ipv4: fix NULL dereference in free_fib_info_rcu()
net: Fix a typo in comment about sock flags.
ipv6: fix NULL dereference in ip6_route_dev_notify()
tcp: fix possible deadlock in TCP stack vs BPF filter
dccp: purge write queue in dccp_destroy_sock()
udp: fix linear skb reception with PEEK_OFF
ipv6: release rt6->rt6i_idev properly during ifdown
af_key: do not use GFP_KERNEL in atomic contexts
tcp: ulp: avoid module refcnt leak in tcp_set_ulp
net/cxgb4vf: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
net/cxgb4: Use new PCI_DEV_FLAGS_NO_RELAXED_ORDERING flag
PCI: Disable Relaxed Ordering Attributes for AMD A1100
PCI: Disable Relaxed Ordering for some Intel processors
PCI: Disable PCIe Relaxed Ordering if unsupported
...
|
|
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
The struct iommu_device has a 'struct device' embedded into
it, not as a pointer, but the whole struct. In the
conversion of the iommu drivers to use struct iommu_device
it was forgotten that the relase function for that struct
device simply calls kfree() on the pointer.
This frees memory that was never allocated and causes memory
corruption.
To fix this issue, use a pointer to struct device instead of
embedding the whole struct. This needs some updates in the
iommu sysfs code as well as the Intel VT-d and AMD IOMMU
driver.
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Fixes: 39ab9555c241 ('iommu: Add sysfs bindings for struct iommu_device')
Cc: stable@vger.kernel.org # >= v4.11
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
When bit4 is set in the PCIe Device Control register, it indicates
whether the device is permitted to use relaxed ordering.
On some platforms using relaxed ordering can have performance issues or
due to erratum can cause data-corruption. In such cases devices must avoid
using relaxed ordering.
The patch adds a new flag PCI_DEV_FLAGS_NO_RELAXED_ORDERING to indicate that
Relaxed Ordering (RO) attribute should not be used for Transaction Layer
Packets (TLP) targeted towards these affected root complexes.
This patch checks if there is any node in the hierarchy that indicates that
using relaxed ordering is not safe. In such cases the patch turns off the
relaxed ordering by clearing the capability for this device.
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Acked-by: Ashok Raj <ashok.raj@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are two tty serial driver fixes for 4.13-rc5. One is a revert of
a -rc1 patch that turned out to not be a good idea, and the other is a
fix for the pl011 serial driver.
Both have been in linux-next with no reported issues"
* tag 'tty-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "serial: Delete dead code for CIR serial ports"
tty: pl011: fix initialization order of QDF2400 E44
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/iio fixes from Greg KH:
"Here are some Staging and IIO driver fixes for 4.13-rc5.
Nothing major, just a number of small fixes for reported issues. All
of these have been in linux-next for a while now with no reported
issues. Full details are in the shortlog"
* tag 'staging-4.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: comedi: comedi_fops: do not call blocking ops when !TASK_RUNNING
iio: aspeed-adc: wait for initial sequence.
iio: accel: bmc150: Always restore device to normal mode after suspend-resume
staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
iio: adc: axp288: Fix the GPADC pin reading often wrongly returning 0
iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
iio: accel: st_accel: add SPI-3wire support
iio: adc: Revert "axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications"
iio: adc: sun4i-gpadc-iio: fix unbalanced irq enable/disable
iio: pressure: st_pressure_core: disable multiread by default for LPS22HB
iio: light: tsl2563: use correct event code
|
|
Pull block fixes from Jens Axboe:
"A set of fixes that should go into this series. This contains:
- Fix from Bart for blk-mq requeue queue running, preventing a
continued loop of run/restart.
- Fix for a bio/blk-integrity issue, in two parts. One from
Christoph, fixing where verification happens, and one from Milan,
for a NULL profile.
- NVMe pull request, most of the changes being for nvme-fc, but also
a few trivial core/pci fixes"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme: fix directive command numd calculation
nvme: fix nvme reset command timeout handling
nvme-pci: fix CMB sysfs file removal in reset path
lpfc: support nvmet_fc defer_rcv callback
nvmet_fc: add defer_req callback for deferment of cmd buffer return
nvme: strip trailing 0-bytes in wwid_show
block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time
bio-integrity: only verify integrity on the lowest stacked driver
bio-integrity: Fix regression if profile verify_fn is NULL
|