Age | Commit message (Collapse) | Author |
|
git://anongit.freedesktop.org/drm-intel into drm-next
- make modeset hw state checker atomic aware (Maarten)
- close races in gpu stuck detection/seqno reading (Chris)
- tons&tons of small improvements from Chris Wilson all over the gem code
- more dsi/bxt work from Ramalingam&Jani
- macro polish from Joonas
- guc fw loading fixes (Arun&Dave)
- vmap notifier (acked by Andrew) + i915 support by Chris Wilson
- create bottom half for execlist irq processing (Chris Wilson)
- vlv/chv pll cleanup (Ville)
- rework DP detection, especially sink detection (Shubhangi Shrivastava)
- make color manager support fully atomic (Maarten)
- avoid livelock on chv in execlist irq handler (Chris)
* tag 'drm-intel-next-2016-04-11' of git://anongit.freedesktop.org/drm-intel: (82 commits)
drm/i915: Update DRIVER_DATE to 20160411
drm/i915: Avoid allocating a vmap arena for a single page
drm,i915: Introduce drm_malloc_gfp()
drm/i915/shrinker: Restrict vmap purge to objects with vmaps
drm/i915: Refactor duplicate object vmap functions
drm/i915: Consolidate common error handling in intel_pin_and_map_ringbuffer_obj
drm/i915/dmabuf: Tighten struct_mutex for unmap_dma_buf
drm/i915: implement WaClearTdlStateAckDirtyBits
drm/i915/bxt: Reversed polarity of PORT_PLL_REF_SEL bit
drm/i915: Rename hw state checker to hw state verifier.
drm/i915: Move modeset state verifier calls.
drm/i915: Make modeset state verifier take crtc as argument.
drm/i915: Replace manual barrier() with READ_ONCE() in HWS accessor
drm/i915: Use simplest form for flushing the single cacheline in the HWS
drm/i915: Harden detection of missed interrupts
drm/i915: Separate out the seqno-barrier from engine->get_seqno
drm/i915: Remove forcewake dance from seqno/irq barrier on legacy gen6+
drm/i915: Fixup the free space logic in ring_prepare
drm/i915: Simplify check for idleness in hangcheck
drm/i915: Apply a mb between emitting the request and hangcheck
...
|
|
Backmerge 4.6-rc3 for i915.
Linux 4.6-rc3
|
|
Linux 4.6-rc3
Backmerge requested by Chris Wilson to make his patches apply cleanly.
Tiny conflict in vmalloc.c with the (properly acked and all) patch in
drm-intel-next:
commit 4da56b99d99e5a7df2b7f11e87bfea935f909732
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Mon Apr 4 14:46:42 2016 +0100
mm/vmap: Add a notifier for when we run out of vmap address space
and Linus' tree.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
|
|
Pull networking fixes from David Miller:
1) Stale SKB data pointer access across pskb_may_pull() calls in L2TP,
from Haishuang Yan.
2) Fix multicast frame handling in mac80211 AP code, from Felix
Fietkau.
3) mac80211 station hashtable insert errors not handled properly, fix
from Johannes Berg.
4) Fix TX descriptor count limit handling in e1000, from Alexander
Duyck.
5) Revert a buggy netdev refcount fix in netpoll, from Bjorn Helgaas.
6) Must assign rtnl_link_ops of the device before registering it, fix
in ip6_tunnel from Thadeu Lima de Souza Cascardo.
7) Memory leak fix in tc action net exit, from WANG Cong.
8) Add missing AF_KCM entries to name tables, from Dexuan Cui.
9) Fix regression in GRE handling of csums wrt. FOU, from Alexander
Duyck.
10) Fix memory allocation alignment and congestion map corruption in
RDS, from Shamir Rabinovitch.
11) Fix default qdisc regression in tuntap driver, from Jason Wang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
bridge, netem: mark mailing lists as moderated
tuntap: restore default qdisc
mpls: find_outdev: check for err ptr in addition to NULL check
ipv6: Count in extension headers in skb->network_header
RDS: fix congestion map corruption for PAGE_SIZE > 4k
RDS: memory allocated must be align to 8
GRE: Disable segmentation offloads w/ CSUM and we are encapsulated via FOU
net: add the AF_KCM entries to family name tables
MAINTAINERS: intel-wired-lan list is moderated
lib/test_bpf: Add additional BPF_ADD tests
lib/test_bpf: Add test to check for result of 32-bit add that overflows
lib/test_bpf: Add tests for unsigned BPF_JGT
lib/test_bpf: Fix JMP_JSET tests
VSOCK: Detach QP check should filter out non matching QPs.
stmmac: fix adjust link call in case of a switch is attached
af_packet: tone down the Tx-ring unsupported spew.
net_sched: fix a memory leak in tc action
samples/bpf: Enable powerpc support
samples/bpf: Use llc in PATH, rather than a hardcoded value
samples/bpf: Fix build breakage with map_perf_test_user.c
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu
Pull IOMMU fixes from Joerg Roedel:
- compile-time fixes (warnings and failures)
- a bug in iommu core code which could cause the group->domain pointer
to be falsly cleared
- fix in scatterlist handling of the ARM common DMA-API code
- stall detection fix for the Rockchip IOMMU driver
* tag 'iommu-fixes-v4.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/vt-d: Silence an uninitialized variable warning
iommu/rockchip: Fix "is stall active" check
iommu: Don't overwrite domain pointer when there is no default_domain
iommu/dma: Restore scatterlist offsets correctly
iommu: provide of_xlate pointer unconditionally
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bugfixes from Ted Ts'o:
"These changes contains a fix for overlayfs interacting with some
(badly behaved) dentry code in various file systems. These have been
reviewed by Al and the respective file system mtinainers and are going
through the ext4 tree for convenience.
This also has a few ext4 encryption bug fixes that were discovered in
Android testing (yes, we will need to get these sync'ed up with the
fs/crypto code; I'll take care of that). It also has some bug fixes
and a change to ignore the legacy quota options to allow for xfstests
regression testing of ext4's internal quota feature and to be more
consistent with how xfs handles this case"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: ignore quota mount options if the quota feature is enabled
ext4 crypto: fix some error handling
ext4: avoid calling dquot_get_next_id() if quota is not enabled
ext4: retry block allocation for failed DIO and DAX writes
ext4: add lockdep annotations for i_data_sem
ext4: allow readdir()'s of large empty directories to be interrupted
btrfs: fix crash/invalid memory access on fsync when using overlayfs
ext4 crypto: use dget_parent() in ext4_d_revalidate()
ext4: use file_dentry()
ext4: use dget_parent() in ext4_file_open()
nfs: use file_dentry()
fs: add file_dentry()
ext4 crypto: don't let data integrity writebacks fail with ENOMEM
ext4: check if in-inode xattr is corrupted in ext4_expand_extra_isize_ea()
|
|
This patch fixes an issue I found in which we were dropping frames if we
had enabled checksums on GRE headers that were encapsulated by either FOU
or GUE. Without this patch I was barely able to get 1 Gb/s of throughput.
With this patch applied I am now at least getting around 6 Gb/s.
The issue is due to the fact that with FOU or GUE applied we do not provide
a transport offset pointing to the GRE header, nor do we offload it in
software as the GRE header is completely skipped by GSO and treated like a
VXLAN or GENEVE type header. As such we need to prevent the stack from
generating it and also prevent GRE from generating it via any interface we
create.
Fixes: c3483384ee511 ("gro: Allow tunnel stacking in the case of FOU/GUE")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
git://anongit.freedesktop.org/drm-intel into drm-next
* tag 'topic/drm-misc-2016-04-01' of git://anongit.freedesktop.org/drm-intel:
drm: Add new DCS commands in the enum list
drm: Make uapi headers C89 pendantic compliant
drm/atomic: export drm_atomic_helper_wait_for_fences()
drm: Untangle __KERNEL__ guards
drm: Move DRM_MODE_OBJECT_* to uapi headers
drm: align #include directives with libdrm in uapi headers
drm: Make drm.h uapi header safe for C++
vgacon: dummy implementation for vgacon_text_force
drm/sysfs: Nuke TV/DVI property files
drm/ttm: Remove TTM_HAS_AGP
drm: bridge/dw-hdmi: Remove pre_enable/post_disable dummy funcs
Revert "drm: Don't pass negative delta to ktime_sub_ns()"
drm/atmel: Fixup drm_connector_/unplug/unregister/_all
drm: Rename drm_connector_unplug_all() to drm_connector_unregister_all()
drm: bridge: Make (pre/post) enable/disable callbacks optional
|
|
Pull KVM fixes from Paolo Bonzini:
"Miscellaneous bugfixes.
The ARM and s390 fixes are for new regressions from the merge window,
others are usual stable material"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
compiler-gcc: disable -ftracer for __noclone functions
kvm: x86: make lapic hrtimer pinned
s390/mm/kvm: fix mis-merge in gmap handling
kvm: set page dirty only if page has been writable
KVM: x86: reduce default value of halt_poll_ns parameter
KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled
KVM: x86: Inject pending interrupt even if pending nmi exist
arm64: KVM: Register CPU notifiers when the kernel runs at HYP
arm64: kvm: 4.6-rc1: Fix VTCR_EL2 VS setting
|
|
-ftracer can duplicate asm blocks causing compilation to fail in
noclone functions. For example, KVM declares a global variable
in an asm like
asm("2: ... \n
.pushsection data \n
.global vmx_return \n
vmx_return: .long 2b");
and -ftracer causes a double declaration.
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: stable@vger.kernel.org
Cc: kvm@vger.kernel.org
Reported-by: Linda Walsh <lkml@tlinx.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
iommu drivers that support the standard DT bindings use a of_xlate
callback pointer, but that is only part of struct iommu_ops when
CONFIG_OF_IOMMU is enabled, leading to build errors in randconfig
builds when that is not provided:
drivers/iommu/mtk_iommu.c:497:2: error: unknown field 'of_xlate' specified in initializer
.of_xlate = mtk_iommu_of_xlate,
^
drivers/iommu/mtk_iommu.c:497:14: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
.of_xlate = mtk_iommu_of_xlate,
^~~~~~~~~~~~~~~~~~
drivers/iommu/mtk_iommu.c:497:14: note: (near initialization for 'mtk_iommu_ops.domain_get_attr')
We can work around it by adding more #ifdefs in each driver, but
it seems nicer to just allow setting the pointer even if it is
unused. This makes the driver code look nicer, and it gives better
compile-time coverage when test building on other architectures.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 0df4fabe208d ("iommu/mediatek: Add mt8173 IOMMU driver")
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
|
|
vmaps are temporary kernel mappings that may be of long duration.
Reusing a vmap on an object is preferrable for a driver as the cost of
setting up the vmap can otherwise dominate the operation on the object.
However, the vmap address space is rather limited on 32bit systems and
so we add a notification for vmap pressure in order for the driver to
release any cached vmappings.
The interface is styled after the oom-notifier where the callees are
passed a pointer to an unsigned long counter for them to indicate if they
have freed any space.
v2: Guard the blocking notifier call with gfpflags_allow_blocking()
v3: Correct typo in forward declaration and move to head of file
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Roman Peniaev <r.peniaev@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Acked-by: Andrew Morton <akpm@linux-foundation.org> # for inclusion via DRM
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459777603-23618-3-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
|
|
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
"PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
Let's stop pretending that pages in page cache are special. They are
not.
The first patch with most changes has been done with coccinelle. The
second is manual fixups on top.
The third patch removes macros definition"
[ I was planning to apply this just before rc2, but then I spaced out,
so here it is right _after_ rc2 instead.
As Kirill suggested as a possibility, I could have decided to only
merge the first two patches, and leave the old interfaces for
compatibility, but I'd rather get it all done and any out-of-tree
modules and patches can trivially do the converstion while still also
working with older kernels, so there is little reason to try to
maintain the redundant legacy model. - Linus ]
* PAGE_CACHE_SIZE-removal:
mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
|
|
All users gone. We can remove these macros.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Mostly direct substitution with occasional adjustment or removing
outdated comments.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core kernel fixes from Ingo Molnar:
"This contains the nohz/atomic cleanup/fix for the fetch_or() ugliness
you noted during the original nohz pull request, plus there's also
misc fixes:
- fix liblockdep build bug
- fix uapi header build bug
- print more lockdep hash collision info to help debug recent reports
of hash collisions
- update MAINTAINERS email address"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
MAINTAINERS: Update my email address
locking/lockdep: Print chain_key collision information
uapi/linux/stddef.h: Provide __always_inline to userspace headers
tools/lib/lockdep: Fix unsupported 'basename -s' in run_tests.sh
locking/atomic, sched: Unexport fetch_or()
timers/nohz: Convert tick dependency mask to atomic_t
locking/atomic: Introduce atomic_fetch_or()
|
|
Pull configfs fix from Christoph Hellwig:
"A trivial fix to the recently introduced binary attribute helper
macros"
* tag 'configfs-for-linus-2' of git://git.infradead.org/users/hch/configfs:
configfs: fix CONFIGFS_BIN_ATTR_[RW]O definitions
|
|
Pull networking fixes from David Miller:
1) Missing device reference in IPSEC input path results in crashes
during device unregistration. From Subash Abhinov Kasiviswanathan.
2) Per-queue ISR register writes not being done properly in macb
driver, from Cyrille Pitchen.
3) Stats accounting bugs in bcmgenet, from Patri Gynther.
4) Lightweight tunnel's TTL and TOS were swapped in netlink dumps, from
Quentin Armitage.
5) SXGBE driver has off-by-one in probe error paths, from Rasmus
Villemoes.
6) Fix race in save/swap/delete options in netfilter ipset, from
Vishwanath Pai.
7) Ageing time of bridge not set properly when not operating over a
switchdev device. Fix from Haishuang Yan.
8) Fix GRO regression wrt nested FOU/GUE based tunnels, from Alexander
Duyck.
9) IPV6 UDP code bumps wrong stats, from Eric Dumazet.
10) FEC driver should only access registers that actually exist on the
given chipset, fix from Fabio Estevam.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (73 commits)
net: mvneta: fix changing MTU when using per-cpu processing
stmmac: fix MDIO settings
Revert "stmmac: Fix 'eth0: No PHY found' regression"
stmmac: fix TX normal DESC
net: mvneta: use cache_line_size() to get cacheline size
net: mvpp2: use cache_line_size() to get cacheline size
net: mvpp2: fix maybe-uninitialized warning
tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter
net: usb: cdc_ncm: adding Telit LE910 V2 mobile broadband card
rtnl: fix msg size calculation in if_nlmsg_size()
fec: Do not access unexisting register in Coldfire
net: mvneta: replace MVNETA_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES
net: mvpp2: replace MVPP2_CPU_D_CACHE_LINE_SIZE with L1_CACHE_BYTES
net: dsa: mv88e6xxx: Clear the PDOWN bit on setup
net: dsa: mv88e6xxx: Introduce _mv88e6xxx_phy_page_{read, write}
bpf: make padding in bpf_tunnel_key explicit
ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates
bnxt_en: Fix ethtool -a reporting.
bnxt_en: Fix typo in bnxt_hwrm_set_pause_common().
bnxt_en: Implement proper firmware message padding.
...
|
|
The return value of pmd_trans_huge_lock() is a pointer, not a boolean
value, so use NULL instead of false as the return value.
Signed-off-by: Chen Gang <gang.chen.5i5j@gmail.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Initially the phy_bus_name was added to manipulate the
driver name but it was recently just used to manage the
fixed-link and then to take some decision at run-time.
So the patch uses the is_pseudo_fixed_link and removes
the phy_bus_name variable not necessary anymore.
The driver can manage the mdio registration by using phy-handle,
dwmac-mdio and own parameter e.g. snps,phy-addr.
This patch takes care about all these possible configurations
and fixes the mdio registration in case of there is a real
transceiver or a switch (that needs to be managed by using
fixed-link).
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Tested-by: Frank Schäfer <fschaefer.oss@googlemail.com>
Cc: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Cc: Dinh Nguyen <dinh.linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Phil Reid <preid@electromag.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This reverts commit 88f8b1bb41c6208f81b6a480244533ded7b59493.
due to problems on GeekBox and Banana Pi M1 board when
connected to a real transceiver instead of a switch via
fixed-link.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Cc: Gabriel Fernandez <gabriel.fernandez@linaro.org>
Cc: Andreas Färber <afaerber@suse.de>
Cc: Frank Schäfer <fschaefer.oss@googlemail.com>
Cc: Dinh Nguyen <dinh.linux@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:
[ 52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
[ 52.765688] other info that might help us debug this:
[ 52.765695] rcu_scheduler_active = 1, debug_locks = 1
[ 52.765701] 1 lock held by a.out/1525:
[ 52.765704] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
[ 52.765721] stack backtrace:
[ 52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
[...]
[ 52.765768] Call Trace:
[ 52.765775] [<ffffffff813e488d>] dump_stack+0x85/0xc8
[ 52.765784] [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
[ 52.765792] [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
[ 52.765801] [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
[ 52.765810] [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
[ 52.765818] [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
[ 52.765827] [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
[ 52.765834] [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
[ 52.765843] [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
[ 52.765850] [<ffffffff81261519>] SyS_ioctl+0x79/0x90
[ 52.765858] [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
[ 52.765866] [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25
Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.
Since the fix in f91ff5b9ff52 ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.
Since its introduction in 994051625981 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.
Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
This allows us to ditch a ton of ugly #ifdefs from a bunch of drm modeset
drivers.
v2: Make the dummy function actually return a sane value, spotted by
Ville.
v3: Because the patch is still in limbo there's no more drivers to
convert, noticed by Emil.
v4: Rebase once more, because hooray. I'll just go ahead an apply this
one later on to drm-misc.
Cc: Emil Velikov <emil.l.velikov@gmail.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
|
|
This patch functionally reverts:
5fd7a09cfb8c ("atomic: Export fetch_or()")
During the merge Linus observed that the generic version of fetch_or()
was messy:
" This makes the ugly "fetch_or()" macro that the scheduler used
internally a new generic helper, and does a bad job at it. "
e23604edac2a Merge branch 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Now that we have introduced atomic_fetch_or(), fetch_or() is only used
by the scheduler in order to deal with thread_info flags which type
can vary across architectures.
Lets confine fetch_or() back to the scheduler so that we encourage
future users to use the more robust and well typed atomic_t version
instead.
While at it, fetch_or() gets robustified, pasting improvements from a
previous patch by Ingo Molnar that avoids needless expression
re-evaluations in the loop.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458830281-4255-4-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The tick dependency mask was intially unsigned long because this is the
type on which clear_bit() operates on and fetch_or() accepts it.
But now that we have atomic_fetch_or(), we can instead use
atomic_andnot() to clear the bit. This consolidates the type of our
tick dependency mask, reduce its size on structures and benefit from
possible architecture optimizations on atomic_t operations.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458830281-4255-3-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
This is deemed to replace the type generic fetch_or() which brings a lot
of issues such as macro induced block variable aliasing and sloppy types.
Not to mention fetch_or() doesn't refer to any namespace, adding even
more confusion.
So lets provide an atomic_t version. Current and next users of fetch_or()
are thus encouraged to use atomic_t.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1458830281-4255-2-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Update the definition of memcpy_from_pmem() to return 0 or a negative
error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe().
Cc: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
|
|
This fix adds a new reference counter (ref_netlink) for the struct ip_set.
The other reference counter (ref) can be swapped out by ip_set_swap and we
need a separate counter to keep track of references for netlink events
like dump. Using the same ref counter for dump causes a race condition
which can be demonstrated by the following script:
ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \
counters
ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \
counters
ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \
counters
ipset save &
ipset swap hash_ip3 hash_ip2
ipset destroy hash_ip3 /* will crash the machine */
Swap will exchange the values of ref so destroy will see ref = 0 instead of
ref = 1. With this fix in place swap will not succeed because ipset save
still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).
Both delete and swap will error out if ref_netlink != 0 on the set.
Note: The changes to *_head functions is because previously we would
increment ref whenever we called these functions, we don't do that
anymore.
Reviewed-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
"There is quite a bit here, including some overdue refactoring and
cleanup on the mon_client and osd_client code from Ilya, scattered
writeback support for CephFS and a pile of bug fixes from Zheng, and a
few random cleanups and fixes from others"
[ I already decided not to pull this because of it having been rebased
recently, but ended up changing my mind after all. Next time I'll
really hold people to it. Oh well. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
libceph: use KMEM_CACHE macro
ceph: use kmem_cache_zalloc
rbd: use KMEM_CACHE macro
ceph: use lookup request to revalidate dentry
ceph: kill ceph_get_dentry_parent_inode()
ceph: fix security xattr deadlock
ceph: don't request vxattrs from MDS
ceph: fix mounting same fs multiple times
ceph: remove unnecessary NULL check
ceph: avoid updating directory inode's i_size accidentally
ceph: fix race during filling readdir cache
libceph: use sizeof_footer() more
ceph: kill ceph_empty_snapc
ceph: fix a wrong comparison
ceph: replace CURRENT_TIME by current_fs_time()
ceph: scattered page writeback
libceph: add helper that duplicates last extent operation
libceph: enable large, variable-sized OSD requests
libceph: osdc->req_mempool should be backed by a slab pool
libceph: make r_request msg_size calculation clearer
...
|
|
This series fixes bugs in nfs and ext4 due to 4bacc9c9234c ("overlayfs:
Make f_path always point to the overlay and f_inode to the underlay").
Regular files opened on overlayfs will result in the file being opened on
the underlying filesystem, while f_path points to the overlayfs
mount/dentry.
This confuses filesystems which get the dentry from struct file and assume
it's theirs.
Add a new helper, file_dentry() [*], to get the filesystem's own dentry
from the file. This checks file->f_path.dentry->d_flags against
DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not
set (this is the common, non-overlayfs case).
In the uncommon case it will call into overlayfs's ->d_real() to get the
underlying dentry, matching file_inode(file).
The reason we need to check against the inode is that if the file is copied
up while being open, d_real() would return the upper dentry, while the open
file comes from the lower dentry.
[*] If possible, it's better simply to use file_inode() instead.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: <stable@vger.kernel.org> # v4.2
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Axtens <dja@axtens.net>
|
|
Pull NTB bug fixes from Jon Mason:
"NTB bug fixes for tasklet from spinning forever, link errors,
translation window setup, NULL ptr dereference, and ntb-perf errors.
Also, a modification to the driver API that makes _addr functions
optional"
* tag 'ntb-4.6' of git://github.com/jonmason/ntb:
NTB: Remove _addr functions from ntb_hw_amd
NTB: Make _addr functions optional in the API
NTB: Fix incorrect clean up routine in ntb_perf
NTB: Fix incorrect return check in ntb_perf
ntb: fix possible NULL dereference
ntb: add missing setup of translation window
ntb: stop link work when we do not have memory
ntb: stop tasklet from spinning forever during shutdown.
ntb: perf test: fix address space confusion
|
|
Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks. The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.
IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.
Right now stackdepot support is only enabled in SLAB allocator. Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.
This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.
Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.
[akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabinin@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
KASAN needs to know whether the allocation happens in an IRQ handler.
This lets us strip everything below the IRQ entry point to reduce the
number of unique stack traces needed to be stored.
Move the definition of __irq_entry to <linux/interrupt.h> so that the
users don't need to pull in <linux/ftrace.h>. Also introduce the
__softirq_entry macro which is similar to __irq_entry, but puts the
corresponding functions to the .softirqentry.text section.
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add GFP flags to KASAN hooks for future patches to use.
This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add KASAN hooks to SLAB allocator.
This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A leftover from commit c32b3cbe0d06 ("oom, PM: make OOM detection in the
freezer path raceless").
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
"oom, oom_reaper: disable oom_reaper for oom_kill_allocating_task" tried
to protect oom_reaper_list using MMF_OOM_KILLED flag. But we can do it
by simply checking tsk->oom_reaper_list != NULL.
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Entries are only added/removed from oom_reaper_list at head so we can
use a single linked list and hence save a word in task_struct.
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Tetsuo has reported that oom_kill_allocating_task=1 will cause
oom_reaper_list corruption because oom_kill_process doesn't follow
standard OOM exclusion (aka ignores TIF_MEMDIE) and allows to enqueue
the same task multiple times - e.g. by sacrificing the same child
multiple times.
This patch fixes the issue by introducing a new MMF_OOM_KILLED mm flag
which is set in oom_kill_process atomically and oom reaper is disabled
if the flag was already set.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
wake_oom_reaper has allowed only 1 oom victim to be queued. The main
reason for that was the simplicity as other solutions would require some
way of queuing. The current approach is racy and that was deemed
sufficient as the oom_reaper is considered a best effort approach to
help with oom handling when the OOM victim cannot terminate in a
reasonable time. The race could lead to missing an oom victim which can
get stuck
out_of_memory
wake_oom_reaper
cmpxchg // OK
oom_reaper
oom_reap_task
__oom_reap_task
oom_victim terminates
atomic_inc_not_zero // fail
out_of_memory
wake_oom_reaper
cmpxchg // fails
task_to_reap = NULL
This race requires 2 OOM invocations in a short time period which is not
very likely but certainly not impossible. E.g. the original victim
might have not released a lot of memory for some reason.
The situation would improve considerably if wake_oom_reaper used a more
robust queuing. This is what this patch implements. This means adding
oom_reaper_list list_head into task_struct (eat a hole before embeded
thread_struct for that purpose) and a oom_reaper_lock spinlock for
queuing synchronization. wake_oom_reaper will then add the task on the
queue and oom_reaper will dequeue it.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When oom_reaper manages to unmap all the eligible vmas there shouldn't
be much of the freable memory held by the oom victim left anymore so it
makes sense to clear the TIF_MEMDIE flag for the victim and allow the
OOM killer to select another task.
The lack of TIF_MEMDIE also means that the victim cannot access memory
reserves anymore but that shouldn't be a problem because it would get
the access again if it needs to allocate and hits the OOM killer again
due to the fatal_signal_pending resp. PF_EXITING check. We can safely
hide the task from the OOM killer because it is clearly not a good
candidate anymore as everyhing reclaimable has been torn down already.
This patch will allow to cap the time an OOM victim can keep TIF_MEMDIE
and thus hold off further global OOM killer actions granted the oom
reaper is able to take mmap_sem for the associated mm struct. This is
not guaranteed now but further steps should make sure that mmap_sem for
write should be blocked killable which will help to reduce such a lock
contention. This is not done by this patch.
Note that exit_oom_victim might be called on a remote task from
__oom_reap_task now so we have to check and clear the flag atomically
otherwise we might race and underflow oom_victims or wake up waiters too
early.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This patch (of 5):
This is based on the idea from Mel Gorman discussed during LSFMM 2015
and independently brought up by Oleg Nesterov.
The OOM killer currently allows to kill only a single task in a good
hope that the task will terminate in a reasonable time and frees up its
memory. Such a task (oom victim) will get an access to memory reserves
via mark_oom_victim to allow a forward progress should there be a need
for additional memory during exit path.
It has been shown (e.g. by Tetsuo Handa) that it is not that hard to
construct workloads which break the core assumption mentioned above and
the OOM victim might take unbounded amount of time to exit because it
might be blocked in the uninterruptible state waiting for an event (e.g.
lock) which is blocked by another task looping in the page allocator.
This patch reduces the probability of such a lockup by introducing a
specialized kernel thread (oom_reaper) which tries to reclaim additional
memory by preemptively reaping the anonymous or swapped out memory owned
by the oom victim under an assumption that such a memory won't be needed
when its owner is killed and kicked from the userspace anyway. There is
one notable exception to this, though, if the OOM victim was in the
process of coredumping the result would be incomplete. This is
considered a reasonable constrain because the overall system health is
more important than debugability of a particular application.
A kernel thread has been chosen because we need a reliable way of
invocation so workqueue context is not appropriate because all the
workers might be busy (e.g. allocating memory). Kswapd which sounds
like another good fit is not appropriate as well because it might get
blocked on locks during reclaim as well.
oom_reaper has to take mmap_sem on the target task for reading so the
solution is not 100% because the semaphore might be held or blocked for
write but the probability is reduced considerably wrt. basically any
lock blocking forward progress as described above. In order to prevent
from blocking on the lock without any forward progress we are using only
a trylock and retry 10 times with a short sleep in between. Users of
mmap_sem which need it for write should be carefully reviewed to use
_killable waiting as much as possible and reduce allocations requests
done with the lock held to absolute minimum to reduce the risk even
further.
The API between oom killer and oom reaper is quite trivial.
wake_oom_reaper updates mm_to_reap with cmpxchg to guarantee only
NULL->mm transition and oom_reaper clear this atomically once it is done
with the work. This means that only a single mm_struct can be reaped at
the time. As the operation is potentially disruptive we are trying to
limit it to the ncessary minimum and the reaper blocks any updates while
it operates on an mm. mm_struct is pinned by mm_count to allow parallel
exit_mmap and a race is detected by atomic_inc_not_zero(mm_users).
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Mel Gorman <mgorman@suse.de>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This will be needed in the patch "mm, oom: introduce oom reaper".
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The type should be struct configfs_bin_attribute and not struct
configfs_attribute.
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
When security is enabled, security module can call filesystem's
getxattr/setxattr callbacks during d_instantiate(). For cephfs,
d_instantiate() is usually called by MDS' dispatch thread, while
handling MDS reply. If the MDS reply does not include xattrs and
corresponding caps, getxattr/setxattr need to send a new request
to MDS and waits for the reply. This makes MDS' dispatch sleep,
nobody handles later MDS replies.
The fix is make sure lookup/atomic_open reply include xattrs and
corresponding caps. So getxattr can be handled by cached xattrs.
This requires some modification to both MDS and request message.
(Client tells MDS what caps it wants; MDS encodes proper caps in
the reply)
Smack security module may call setxattr during d_instantiate().
Unlike getxattr, we can't force MDS to issue CEPH_CAP_XATTR_EXCL
to us. So just make setxattr return error when called by MDS'
dispatch thread.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
|
|
This helper duplicates last extent operation in OSD request, then
adjusts the new extent operation's offset and length. The helper
is for scatterd page writeback, which adds nonconsecutive dirty
pages to single OSD request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests. The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.
r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once. ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc(). req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
This avoids defining large array of r_reply_op_{len,result} in
in struct ceph_osd_request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|
Follow userspace nomenclature on this - the next commit adds
outdata_len.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|