Age | Commit message (Collapse) | Author |
|
Pull UBI and UBIFS updates from Richard Weinberger:
- the UBI on-disk format header file is now dual licensed
- new way to detect Fastmap problems during runtime
- bugfix for Fastmap
- minor updates for UBIFS (spelling, comments, vm_fault_t, ...)
* tag 'upstream-4.18-rc1' of git://git.infradead.org/linux-ubifs:
mtd: ubi: Update ubi-media.h to dual license
ubi: fastmap: Detect EBA mismatches on-the-fly
ubi: fastmap: Check each mapping only once
ubi: fastmap: Correctly handle interrupted erasures in EBA
ubi: fastmap: Cancel work upon detach
ubifs: lpt: Fix wrong pnode number range in comment
ubifs: gc: Fix typo
ubifs: log: Some spelling fixes
ubifs: Spelling fix someting -> something
ubifs: journal: Remove wrong comment
ubifs: remove set but never used variable
ubifs, xattr: remove misguided quota flags
fs: ubifs: Adding new return type vm_fault_t
|
|
Pull cifs fixes from Steve French:
- one smb3 (ACL related) fix for stable
- one SMB3 security enhancement (when mounting -t smb3 forbid less
secure dialects)
- some RDMA and compounding fixes
* tag '4.18-fixes-smb3' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix a buffer leak in smb2_query_symlink
smb3: do not allow insecure cifs mounts when using smb3
CIFS: Fix NULL ptr deref
CIFS: fix encryption in SMB3.1.1
CIFS: Pass page offset for encrypting
CIFS: Pass page offset for calculating signature
CIFS: SMBD: Support page offset in memory registration
CIFS: SMBD: Support page offset in RDMA recv
CIFS: SMBD: Support page offset in RDMA send
CIFS: When sending data on socket, pass the correct page offset
CIFS: Introduce helper function to get page offset and length in smb_rqst
CIFS: Calculate the correct request length based on page offset and tail size
cifs: For SMB2 security informaion query, check for minimum sized security descriptor instead of sizeof FileAllInformation class
CIFS: Fix signing for SMB2/3
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull restartable sequence support from Thomas Gleixner:
"The restartable sequences syscall (finally):
After a lot of back and forth discussion and massive delays caused by
the speculative distraction of maintainers, the core set of
restartable sequences has finally reached a consensus.
It comes with the basic non disputed core implementation along with
support for arm, powerpc and x86 and a full set of selftests
It was exposed to linux-next earlier this week, so it does not fully
comply with the merge window requirements, but there is really no
point to drag it out for yet another cycle"
* 'core-rseq-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq/selftests: Provide Makefile, scripts, gitignore
rseq/selftests: Provide parametrized tests
rseq/selftests: Provide basic percpu ops test
rseq/selftests: Provide basic test
rseq/selftests: Provide rseq library
selftests/lib.mk: Introduce OVERRIDE_TARGETS
powerpc: Wire up restartable sequences system call
powerpc: Add syscall detection for restartable sequences
powerpc: Add support for restartable sequences
x86: Wire up restartable sequence system call
x86: Add support for restartable sequences
arm: Wire up restartable sequences system call
arm: Add syscall detection for restartable sequences
arm: Add restartable sequences support
rseq: Introduce restartable sequences system call
uapi/headers: Provide types_32_64.h
|
|
If the server returns NFS4ERR_SEQ_FALSE_RETRY or NFS4ERR_RETRY_UNCACHED_REP,
then it thinks we're trying to replay an existing request. If so, then
let's just bump the sequence ID and retry the operation.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Merge proc_cmdline simplifications.
This re-writes the get_mm_cmdline() logic to be rather simpler than it
used to be, and makes the semantics for "cmdline goes past the end of
the original area" more natural.
You _can_ use prctl(PR_SET_MM) to just point your command line somewhere
else entirely, but the traditional model is to just edit things in place
and that still needs to continue to work. At least this way the code
makes some sense.
* proc-cmdline:
fs/proc: simplify and clarify get_mm_cmdline() function
fs/proc: re-factor proc_pid_cmdline_read() a bit
|
|
Use the error code EUCLEAN for filesystem errors because other
filesystems use this code too.
[ And remove unused EMEMERROR - Linus ]
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO updates from Greg KH:
"Here is the big staging and IIO driver update for 4.18-rc1.
It was delayed as I wanted to make sure the final driver deletions did
not cause any major merge issues, and all now looks good.
There are a lot of patches here, just over 1000. The diffstat summary
shows the major changes here:
1007 files changed, 16828 insertions(+), 227770 deletions(-)
Because of this, we might be close to shrinking the overall kernel
source code size for two releases in a row.
There was loads of work in this release cycle, primarily:
- tons of ks7010 driver cleanups
- lots of mt7621 driver fixes and cleanups
- most driver cleanups
- wilc1000 fixes and cleanups
- lots and lots of IIO driver cleanups and new additions
- debugfs cleanups for all staging drivers
- lots of other staging driver cleanups and fixes, the shortlog has
the full details.
but the big user-visable things here are the removal of 3 chunks of
code:
- ncpfs and ipx were removed on schedule, no one has cared about this
code since it moved to staging last year, and if it needs to come
back, it can be reverted.
- lustre file system is removed.
I've ranted at the lustre developers about once a year for the past
5 years, with no real forward progress at all to clean things up
and get the code into the "real" part of the kernel.
Given that the lustre developers continue to work on an external
tree and try to port those changes to the in-kernel tree every once
in a while, this whole thing really really is not working out at
all. So I'm deleting it so that the developers can spend the time
working in their out-of-tree location and get things cleaned up
properly to get merged into the tree correctly at a later date.
Because of these file removals, you will have merge issues on some of
these files (2 in the ipx code, 1 in the ncpfs code, and 1 in the
atomisp driver). Just delete those files, it's a simple merge :)
All of this has been in linux-next for a while with no reported
problems"
* tag 'staging-4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (1011 commits)
staging: ipx: delete it from the tree
ncpfs: remove uapi .h files
ncpfs: remove Documentation
ncpfs: remove compat functionality
staging: ncpfs: delete it
staging: lustre: delete the filesystem from the tree.
staging: vc04_services: no need to save the log debufs dentries
staging: vc04_services: vchiq_debugfs_log_entry can be a void *
staging: vc04_services: remove struct vchiq_debugfs_info
staging: vc04_services: move client dbg directory into static variable
staging: vc04_services: remove odd vchiq_debugfs_top() wrapper
staging: vc04_services: no need to check debugfs return values
staging: mt7621-gpio: reorder includes alphabetically
staging: mt7621-gpio: change gc_map to don't use pointers
staging: mt7621-gpio: use GPIOF_DIR_OUT and GPIOF_DIR_IN macros instead of custom values
staging: mt7621-gpio: change 'to_mediatek_gpio' to make just a one line return
staging: mt7621-gpio: dt-bindings: update documentation for #interrupt-cells property
staging: mt7621-gpio: update #interrupt-cells for the gpio node
staging: mt7621-gpio: dt-bindings: complete documentation for the gpio
staging: mt7621-dts: add missing properties to gpio node
...
|
|
We want to compare the slot_id to the highest slot number advertised by the
server.
Fixes: 3be0f80b5fe9c ("NFSv4.1: Fix up replays of interrupted requests")
Cc: stable@vger.kernel.org # 4.15+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
The correct behaviour for NFSv4 sequence IDs is to wrap around
to the value 0 after 0xffffffff.
See https://tools.ietf.org/html/rfc5661#section-2.10.6.1
Fixes: 5f83d86cf531d ("NFSv4.x: Fix wraparound issues when validing...")
Cc: stable@vger.kernel.org # 4.6+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This adds a user for the new 'bytes-remaining' updates to
memcpy_mcsafe() that you already received through Ingo via the
x86-dax- for-linus pull.
Not included here, but still targeting this cycle, is support for
handling memory media errors (poison) consumed via userspace dax
mappings.
Summary:
- DAX broke a fundamental assumption of truncate of file mapped
pages. The truncate path assumed that it is safe to disconnect a
pinned page from a file and let the filesystem reclaim the physical
block. With DAX the page is equivalent to the filesystem block.
Introduce dax_layout_busy_page() to enable filesystems to wait for
pinned DAX pages to be released. Without this wait a filesystem
could allocate blocks under active device-DMA to a new file.
- DAX arranges for the block layer to be bypassed and uses
dax_direct_access() + copy_to_iter() to satisfy read(2) calls.
However, the memcpy_mcsafe() facility is available through the pmem
block driver. In order to safely handle media errors, via the DAX
block-layer bypass, introduce copy_to_iter_mcsafe().
- Fix cache management policy relative to the ACPI NFIT Platform
Capabilities Structure to properly elide cache flushes when they
are not necessary. The table indicates whether CPU caches are
power-fail protected. Clarify that a deep flush is always performed
on REQ_{FUA,PREFLUSH} requests"
* tag 'libnvdimm-for-4.18' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (21 commits)
dax: Use dax_write_cache* helpers
libnvdimm, pmem: Do not flush power-fail protected CPU caches
libnvdimm, pmem: Unconditionally deep flush on *sync
libnvdimm, pmem: Complete REQ_FLUSH => REQ_PREFLUSH
acpi, nfit: Remove ecc_unit_size
dax: dax_insert_mapping_entry always succeeds
libnvdimm, e820: Register all pmem resources
libnvdimm: Debug probe times
linvdimm, pmem: Preserve read-only setting for pmem devices
x86, nfit_test: Add unit test for memcpy_mcsafe()
pmem: Switch to copy_to_iter_mcsafe()
dax: Report bytes remaining in dax_iomap_actor()
dax: Introduce a ->copy_to_iter dax operation
uio, lib: Fix CONFIG_ARCH_HAS_UACCESS_MCSAFE compilation
xfs, dax: introduce xfs_break_dax_layouts()
xfs: prepare xfs_break_layouts() for another layout type
xfs: prepare xfs_break_layouts() to be called with XFS_MMAPLOCK_EXCL
mm, fs, dax: handle layout changes to pinned dax mappings
mm: fix __gup_device_huge vs unmap
mm: introduce MEMORY_DEVICE_FS_DAX and CONFIG_DEV_PAGEMAP_OPS
...
|
|
|
|
|
|
I noticed a memory corruption crash in nfsd in
4.17-rc1. This patch corrects the issue.
Fix to return error if the delegation couldn't be hashed or there was
a recall in progress. Use the existing error path instead of
destroy_delegation() for readability.
Signed-off-by: Andrew Elble <aweits@rit.edu>
Fixes: 353601e7d323c ("nfsd: create a separate lease for each delegation")
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When running a fuzz tester against a KASAN-enabled kernel, the following
splat periodically occurs.
The problem occurs when the test sends a GETDEVICEINFO request with a
malformed xdr array (size but no data) for gdia_notify_types and the
array size is > 0x3fffffff, which results in an overflow in the value of
nbytes which is passed to read_buf().
If the array size is 0x40000000, 0x80000000, or 0xc0000000, then after
the overflow occurs, the value of nbytes 0, and when that happens the
pointer returned by read_buf() points to the end of the xdr data (i.e.
argp->end) when really it should be returning NULL.
Fix this by returning NFS4ERR_BAD_XDR if the array size is > 1000 (this
value is arbitrary, but it's the same threshold used by
nfsd4_decode_bitmap()... in could really be any value >= 1 since it's
expected to get at most a single bitmap in gdia_notify_types).
[ 119.256854] ==================================================================
[ 119.257611] BUG: KASAN: use-after-free in nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.258422] Read of size 4 at addr ffff880113ada000 by task nfsd/538
[ 119.259146] CPU: 0 PID: 538 Comm: nfsd Not tainted 4.17.0+ #1
[ 119.259662] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
[ 119.261202] Call Trace:
[ 119.262265] dump_stack+0x71/0xab
[ 119.263371] print_address_description+0x6a/0x270
[ 119.264609] kasan_report+0x258/0x380
[ 119.265854] ? nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.267291] nfsd4_decode_getdeviceinfo+0x5a4/0x5b0 [nfsd]
[ 119.268549] ? nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.269873] ? nfsd4_decode_sequence+0x490/0x490 [nfsd]
[ 119.271095] nfs4svc_decode_compoundargs+0xa5b/0x13c0 [nfsd]
[ 119.272393] ? nfsd4_release_compoundargs+0x1b0/0x1b0 [nfsd]
[ 119.273658] nfsd_dispatch+0x183/0x850 [nfsd]
[ 119.274918] svc_process+0x161c/0x31a0 [sunrpc]
[ 119.276172] ? svc_printk+0x190/0x190 [sunrpc]
[ 119.277386] ? svc_xprt_release+0x451/0x680 [sunrpc]
[ 119.278622] nfsd+0x2b9/0x430 [nfsd]
[ 119.279771] ? nfsd_destroy+0x1c0/0x1c0 [nfsd]
[ 119.281157] kthread+0x2db/0x390
[ 119.282347] ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 119.283756] ret_from_fork+0x35/0x40
[ 119.286041] Allocated by task 436:
[ 119.287525] kasan_kmalloc+0xa0/0xd0
[ 119.288685] kmem_cache_alloc+0xe9/0x1f0
[ 119.289900] get_empty_filp+0x7b/0x410
[ 119.291037] path_openat+0xca/0x4220
[ 119.292242] do_filp_open+0x182/0x280
[ 119.293411] do_sys_open+0x216/0x360
[ 119.294555] do_syscall_64+0xa0/0x2f0
[ 119.295721] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 119.298068] Freed by task 436:
[ 119.299271] __kasan_slab_free+0x130/0x180
[ 119.300557] kmem_cache_free+0x78/0x210
[ 119.301823] rcu_process_callbacks+0x35b/0xbd0
[ 119.303162] __do_softirq+0x192/0x5ea
[ 119.305443] The buggy address belongs to the object at ffff880113ada000
which belongs to the cache filp of size 256
[ 119.308556] The buggy address is located 0 bytes inside of
256-byte region [ffff880113ada000, ffff880113ada100)
[ 119.311376] The buggy address belongs to the page:
[ 119.312728] page:ffffea00044eb680 count:1 mapcount:0 mapping:0000000000000000 index:0xffff880113ada780
[ 119.314428] flags: 0x17ffe000000100(slab)
[ 119.315740] raw: 0017ffe000000100 0000000000000000 ffff880113ada780 00000001000c0001
[ 119.317379] raw: ffffea0004553c60 ffffea00045c11e0 ffff88011b167e00 0000000000000000
[ 119.319050] page dumped because: kasan: bad access detected
[ 119.321652] Memory state around the buggy address:
[ 119.322993] ffff880113ad9f00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.324515] ffff880113ad9f80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 119.326087] >ffff880113ada000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.327547] ^
[ 119.328730] ffff880113ada080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ 119.330218] ffff880113ada100: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
[ 119.331740] ==================================================================
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
If the attempt to recall the delegation fails because the inode is
in the process of being evicted from cache, then use NFS4ERR_DELAY
to ask the server to retry later.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
do_mod() is a hold-over from when we have different sizes for file
offsets and and other internal values for 40 bit XFS filesystems.
Hence depending on build flags variables passed to do_mod() could
change size. We no longer support those small format filesystems and
hence everything is of fixed size theses days, even on 32 bit
platforms.
As such, we can convert all the do_mod() callers to platform
optimised modulus operations as defined by linux/math64.h.
Individual conversions depend on the types of variables being used.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
xfs_attr3_leaf_create may have errored out before instantiating a buffer,
for example if the blkno is out of range. In that case there is no work
to do to remove it, and in fact xfs_da_shrink_inode will lead to an oops
if we try.
This also seems to fix a flaw where the original error from
xfs_attr3_leaf_create gets overwritten in the cleanup case, and it
removes a pointless assignment to bp which isn't used after this.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199969
Reported-by: Xu, Wen <wen.xu@gatech.edu>
Tested-by: Xu, Wen <wen.xu@gatech.edu>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
Get rid of the MIN/MAX macros and just use the native min/max macros
directly in the XFS code.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
New verification functions like xfs_verify_fsbno() and
xfs_verify_agino() are spread across multiple files and different
header files. They really don't fit cleanly into the places they've
been put, and have wider scope than the current header includes.
Move the type verifiers to a new file in libxfs (xfs-types.c) and
the prototypes to xfs_types.h where they will be visible to all the
code that uses the types.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
xfs_reflink_convert_cow() manipulates the incore extent list
in GFP_KERNEL context in the IO submission path whilst holding
locked pages under writeback. This is a memory reclaim deadlock
vector. This code is not in a transaction, so any memory allocations
it makes aren't protected via the memalloc_nofs_save() context that
transactions carry.
Hence we need to run this call under memalloc_nofs_save() context to
prevent potential memory allocations from being run as GFP_KERNEL
and deadlocking.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
When lockdep is enabled, it changes the type of the inode i_rwsem
semaphore before unlocking a newly instantiated inode. THere is the
possibility that there is already a waiter on that inode lock by the
time we unlock the new inode, so having lockdep re-initialise the
lock is a vector for trouble.
Avoid this whole situation by setting up the i_rwsem lockdep class
at the same time we set up the XFS inode i_ilock classes and so the
VFS doesn't have to change the lock class itself when it is
potentially unsafe.
This change is necessary because the equivalent fixes to the VFS code
made in commit 1e2e547a93a0 ("do d_instantiate/unlock_new_inode
combinations safely") are not relevant to XFS as it has it's own
internal inode cache lookup and instantiation routines.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull aio iopriority support from Al Viro:
"The rest of aio stuff for this cycle - Adam's aio ioprio series"
* 'work.aio' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: aio ioprio use ioprio_check_cap ret val
fs: aio ioprio add explicit block layer dependence
fs: iomap dio set bio prio from kiocb prio
fs: blkdev set bio prio from kiocb prio
fs: Add aio iopriority support
fs: Convert kiocb rw_hint from enum to u16
block: add ioprio_check_cap function
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull proc_fill_cache regression fix from Al Viro:
"Regression fix for proc_fill_cache() braino introduced when switching
instantiate() callback to d_splice_alias()"
* 'work.lookup' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix proc_fill_cache() in case of d_alloc_parallel() failure
|
|
If d_alloc_parallel() returns ERR_PTR(...), we don't want to dput()
that. Small reorganization allows to have all error-in-lookup
cases rejoin the main codepath after dput(child), avoiding the
entire problem.
Spotted-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Fixes: 0168b9e38c42 "procfs: switch instantiate_t to d_splice_alias()"
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
This leak was introduced in 91cb74f5142c14dd921ab2d064b7b128054f9fae and caused us
to leak one small buffer for every symlink query.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
We don't set "*name" so it's slightly nicer to just pass "name" instead
of "&name".
Link: http://lkml.kernel.org/r/20180531064736.lnisb55eajwjynvk@kili.mountain
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Remove includes that aren't needed from autofs (and fs/compat_ioctl.c).
Link: http://lkml.kernel.org/r/152635085258.5968.9743527195522188148.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Due to the autofs4 module using a file system type name of autofs
different from the module containing directory name autoload did not
function properly. To work around this kernel configurations have often
elected to build the module into the kernel.
This can result in selinux policies that prohibit autoloading of the
autofs module which need to be changed.
Add a comment about this to "possible changes" section of the autofs4
module help.
Link: http://lkml.kernel.org/r/152686474171.6155.1239659539983577463.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Create Makefile and Kconfig for autofs module.
[raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/152626705591.28589.356365986974038383.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Delete the now unused autofs4 module files.
Link: http://lkml.kernel.org/r/152626707391.28589.3553309771262313504.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Update Makefile to build from source in fs/autofs instead of fs/autofs4.
Link: http://lkml.kernel.org/r/152626706824.28589.1915028175544560855.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Update Kconfig and add a depricated warning.
[raven@themaw.net: make autofs4 Kconfig depend on AUTOFS_FS]
Link: http://lkml.kernel.org/r/152687649097.8263.7046086367407522029.stgit@pluto.themaw.net
Link: http://lkml.kernel.org/r/152626706133.28589.11994171621899212952.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Copy source files from the autofs4 directory to the autofs directory.
Link: http://lkml.kernel.org/r/152626705013.28589.931913083997578251.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Update naming within autofs source to be consistent by changing
occurrences of autofs4 to autofs.
Link: http://lkml.kernel.org/r/152626703688.28589.8315406711135226803.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The autofs module has long since been removed so there's no need to have
two separate include files for autofs.
Link: http://lkml.kernel.org/r/152626703024.28589.9571964661718767929.stgit@pluto.themaw.net
Signed-off-by: Ian Kent <raven@themaw.net>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
WHen registering a new binfmt_misc handler, it is possible to overflow
the offset to get a negative value, which might crash the system, or
possibly leak kernel data.
Here is a crash log when 2500000000 was used as an offset:
BUG: unable to handle kernel paging request at ffff989cfd6edca0
IP: load_misc_binary+0x22b/0x470 [binfmt_misc]
PGD 1ef3e067 P4D 1ef3e067 PUD 0
Oops: 0000 [#1] SMP NOPTI
Modules linked in: binfmt_misc kvm_intel ppdev kvm irqbypass joydev input_leds serio_raw mac_hid parport_pc qemu_fw_cfg parpy
CPU: 0 PID: 2499 Comm: bash Not tainted 4.15.0-22-generic #24-Ubuntu
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/2014
RIP: 0010:load_misc_binary+0x22b/0x470 [binfmt_misc]
Call Trace:
search_binary_handler+0x97/0x1d0
do_execveat_common.isra.34+0x667/0x810
SyS_execve+0x31/0x40
do_syscall_64+0x73/0x130
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
Use kstrtoint instead of simple_strtoul. It will work as the code
already set the delimiter byte to '\0' and we only do it when the field
is not empty.
Tested with offsets -1, 2500000000, UINT_MAX and INT_MAX. Also tested
with examples documented at Documentation/admin-guide/binfmt-misc.rst
and other registrations from packages on Ubuntu.
Link: http://lkml.kernel.org/r/20180529135648.14254-1-cascardo@canonical.com
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
struct stack_trace::nr_entries is defined as "unsigned int" (YAY!) so
the iterator should be unsigned as well.
It saves 1 byte of code or something like that.
Link: http://lkml.kernel.org/r/20180423215248.GG9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
It's defined as atomic_t and really long signal queues are unheard of.
Link: http://lkml.kernel.org/r/20180423215119.GF9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
All those lengths are unsigned as they should be.
Link: http://lkml.kernel.org/r/20180423213751.GC9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
struct kstat is thread local.
Link: http://lkml.kernel.org/r/20180423213626.GB9043@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Code can be sonsolidated if a dummy region of 0 length is used in normal
case of \0-separated command line:
1) [arg_start, arg_end) + [dummy len=0]
2) [arg_start, arg_end) + [env_start, env_end)
Link: http://lkml.kernel.org/r/20180221193335.GB28678@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
"rv" variable is used both as a counter of bytes transferred and an
error value holder but it can be reduced solely to error values if
original start of userspace buffer is stashed and used at the very end.
[akpm@linux-foundation.org: simplify cleanup code]
Link: http://lkml.kernel.org/r/20180221193009.GA28678@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
"final" variable is OK but we can get away with less lines.
Link: http://lkml.kernel.org/r/20180221192751.GC28548@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
access_remote_vm() doesn't return negative errors, it returns number of
bytes read/written (0 if error occurs). This allows to delete some
comparisons which never trigger.
Reuse "nr_read" variable while I'm at it.
Link: http://lkml.kernel.org/r/20180221192605.GB28548@avx2
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If a process monitored with userfaultfd changes it's memory mappings or
forks() at the same time as uffd monitor fills the process memory with
UFFDIO_COPY, the actual creation of page table entries and copying of
the data in mcopy_atomic may happen either before of after the memory
mapping modifications and there is no way for the uffd monitor to
maintain consistent view of the process memory layout.
For instance, let's consider fork() running in parallel with
userfaultfd_copy():
process | uffd monitor
---------------------------------+------------------------------
fork() | userfaultfd_copy()
... | ...
dup_mmap() | down_read(mmap_sem)
down_write(mmap_sem) | /* create PTEs, copy data */
dup_uffd() | up_read(mmap_sem)
copy_page_range() |
up_write(mmap_sem) |
dup_uffd_complete() |
/* notify monitor */ |
If the userfaultfd_copy() takes the mmap_sem first, the new page(s) will
be present by the time copy_page_range() is called and they will appear
in the child's memory mappings. However, if the fork() is the first to
take the mmap_sem, the new pages won't be mapped in the child's address
space.
If the pages are not present and child tries to access them, the monitor
will get page fault notification and everything is fine. However, if
the pages *are present*, the child can access them without uffd
noticing. And if we copy them into child it'll see the wrong data.
Since we are talking about background copy, we'd need to decide whether
the pages should be copied or not regardless #PF notifications.
Since userfaultfd monitor has no way to determine what was the order,
let's disallow userfaultfd_copy in parallel with the non-cooperative
events. In such case we return -EAGAIN and the uffd monitor can
understand that userfaultfd_copy() clashed with a non-cooperative event
and take an appropriate action.
Link: http://lkml.kernel.org/r/1527061324-19949-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Acked-by: Pavel Emelyanov <xemul@virtuozzo.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrei Vagin <avagin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Define a new PageTable bit in the page_type and use it to mark pages in
use as page tables. This can be helpful when debugging crashdumps or
analysing memory fragmentation. Add a KPF flag to report these pages to
userspace and update page-types.c to interpret that flag.
Note that only pages currently accounted as NR_PAGETABLES are tracked as
PageTable; this does not include pgd/p4d/pud/pmd pages. Those will be the
subject of a later patch.
Link: http://lkml.kernel.org/r/20180518194519.3820-4-willy@infradead.org
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jérôme Glisse <jglisse@redhat.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In commit ab676b7d6fbf ("pagemap: do not leak physical addresses to
non-privileged userspace"), the /proc/PID/pagemap is restricted to be
readable only by CAP_SYS_ADMIN to address some security issue.
In commit 1c90308e7a77 ("pagemap: hide physical addresses from
non-privileged users"), the restriction is relieved to make
/proc/PID/pagemap readable, but hide the physical addresses for
non-privileged users.
But the swap entries are readable for non-privileged users too. This
has some security issues. For example, for page under migrating, the
swap entry has physical address information. So, in this patch, the
swap entries are hided for non-privileged users too.
Link: http://lkml.kernel.org/r/20180508012745.7238-1-ying.huang@intel.com
Fixes: 1c90308e7a77 ("pagemap: hide physical addresses from non-privileged users")
Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Zi Yan <zi.yan@cs.rutgers.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With the addition of memfd hugetlbfs support, we now have the situation
where memfd depends on TMPFS -or- HUGETLBFS. Previously, memfd was only
supported on tmpfs, so it made sense that the code resided in shmem.c.
In the current code, memfd is only functional if TMPFS is defined. If
HUGETLFS is defined and TMPFS is not defined, then memfd functionality
will not be available for hugetlbfs. This does not cause BUGs, just a
lack of potentially desired functionality.
Code is restructured in the following way:
- include/linux/memfd.h is a new file containing memfd specific
definitions previously contained in shmem_fs.h.
- mm/memfd.c is a new file containing memfd specific code previously
contained in shmem.c.
- memfd specific code is removed from shmem_fs.h and shmem.c.
- A new config option MEMFD_CREATE is added that is defined if TMPFS
or HUGETLBFS is defined.
No functional changes are made to the code: restructuring only.
Link: http://lkml.kernel.org/r/20180415182119.4517-4-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Marc-Andr Lureau <marcandre.lureau@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mmap_sem is on the hot path of kernel, and it very contended, but it is
abused too. It is used to protect arg_start|end and evn_start|end when
reading /proc/$PID/cmdline and /proc/$PID/environ, but it doesn't make
sense since those proc files just expect to read 4 values atomically and
not related to VM, they could be set to arbitrary values by C/R.
And, the mmap_sem contention may cause unexpected issue like below:
INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
ps D 0 14018 1 0x00000004
Call Trace:
schedule+0x36/0x80
rwsem_down_read_failed+0xf0/0x150
call_rwsem_down_read_failed+0x18/0x30
down_read+0x20/0x40
proc_pid_cmdline_read+0xd9/0x4e0
__vfs_read+0x37/0x150
vfs_read+0x96/0x130
SyS_read+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xc5
Both Alexey Dobriyan and Michal Hocko suggested to use dedicated lock
for them to mitigate the abuse of mmap_sem.
So, introduce a new spinlock in mm_struct to protect the concurrent
access to arg_start|end, env_start|end and others, as well as replace
write map_sem to read to protect the race condition between prctl and
sys_brk which might break check_data_rlimit(), and makes prctl more
friendly to other VM operations.
This patch just eliminates the abuse of mmap_sem, but it can't resolve
the above hung task warning completely since the later
access_remote_vm() call needs acquire mmap_sem. The mmap_sem
scalability issue will be solved in the future.
[yang.shi@linux.alibaba.com: add comment about mmap_sem and arg_lock]
Link: http://lkml.kernel.org/r/1524077799-80690-1-git-send-email-yang.shi@linux.alibaba.com
Link: http://lkml.kernel.org/r/1523730291-109696-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|