Age | Commit message (Collapse) | Author |
|
iput() tests whether its argument is NULL and then returns immediately.
Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Reviewed-by: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
fsnotify_destroy_mark_locked() is subtle to use because it temporarily
releases group->mark_mutex. To avoid future problems with this
function, split it into two.
fsnotify_detach_mark() is the part that needs group->mark_mutex and
fsnotify_free_mark() is the part that must be called outside of
group->mark_mutex. This way it's much clearer what's going on and we
also avoid some pointless acquisitions of group->mark_mutex.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Free list is used when all marks on given inode / mount should be
destroyed when inode / mount is going away. However we can free all of
the marks without using a special list with some care.
Signed-off-by: Jan Kara <jack@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
A check in inotify_fdinfo() checking whether mark is valid was always
true due to a bug. Luckily we can never get to invalidated marks since
we hold mark_mutex and invalidated marks get removed from the group list
when they are invalidated under that mutex.
Anyway fix the check to make code more future proof.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
I have a _tiny_ microbenchmark that sits in a loop and writes single
bytes to a file. Writing one byte to a tmpfs file is around 2x slower
than reading one byte from a file, which is a _bit_ more than I expecte.
This is a dumb benchmark, but I think it's hard to deny that write() is
a hot path and we should avoid unnecessary overhead there.
I did a 'perf record' of 30-second samples of read and write. The top
item in a diffprofile is srcu_read_lock() from fsnotify(). There are
active inotify fd's from systemd, but nothing is actually listening to
the file or its part of the filesystem.
I *think* we can avoid taking the srcu_read_lock() for the common case
where there are no actual marks on the file. This means that there will
both be nothing to notify for *and* implies that there is no need for
clearing the ignore mask.
This patch gave a 13.1% speedup in writes/second on my test, which is an
improvement from the 10.8% that I saw with the last version.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Credit where credit is due: this idea comes from Christoph Lameter with
a lot of valuable input from Serge Hallyn. This patch is heavily based
on Christoph's patch.
===== The status quo =====
On Linux, there are a number of capabilities defined by the kernel. To
perform various privileged tasks, processes can wield capabilities that
they hold.
Each task has four capability masks: effective (pE), permitted (pP),
inheritable (pI), and a bounding set (X). When the kernel checks for a
capability, it checks pE. The other capability masks serve to modify
what capabilities can be in pE.
Any task can remove capabilities from pE, pP, or pI at any time. If a
task has a capability in pP, it can add that capability to pE and/or pI.
If a task has CAP_SETPCAP, then it can add any capability to pI, and it
can remove capabilities from X.
Tasks are not the only things that can have capabilities; files can also
have capabilities. A file can have no capabilty information at all [1].
If a file has capability information, then it has a permitted mask (fP)
and an inheritable mask (fI) as well as a single effective bit (fE) [2].
File capabilities modify the capabilities of tasks that execve(2) them.
A task that successfully calls execve has its capabilities modified for
the file ultimately being excecuted (i.e. the binary itself if that
binary is ELF or for the interpreter if the binary is a script.) [3] In
the capability evolution rules, for each mask Z, pZ represents the old
value and pZ' represents the new value. The rules are:
pP' = (X & fP) | (pI & fI)
pI' = pI
pE' = (fE ? pP' : 0)
X is unchanged
For setuid binaries, fP, fI, and fE are modified by a moderately
complicated set of rules that emulate POSIX behavior. Similarly, if
euid == 0 or ruid == 0, then fP, fI, and fE are modified differently
(primary, fP and fI usually end up being the full set). For nonroot
users executing binaries with neither setuid nor file caps, fI and fP
are empty and fE is false.
As an extra complication, if you execute a process as nonroot and fE is
set, then the "secure exec" rules are in effect: AT_SECURE gets set,
LD_PRELOAD doesn't work, etc.
This is rather messy. We've learned that making any changes is
dangerous, though: if a new kernel version allows an unprivileged
program to change its security state in a way that persists cross
execution of a setuid program or a program with file caps, this
persistent state is surprisingly likely to allow setuid or file-capped
programs to be exploited for privilege escalation.
===== The problem =====
Capability inheritance is basically useless.
If you aren't root and you execute an ordinary binary, fI is zero, so
your capabilities have no effect whatsoever on pP'. This means that you
can't usefully execute a helper process or a shell command with elevated
capabilities if you aren't root.
On current kernels, you can sort of work around this by setting fI to
the full set for most or all non-setuid executable files. This causes
pP' = pI for nonroot, and inheritance works. No one does this because
it's a PITA and it isn't even supported on most filesystems.
If you try this, you'll discover that every nonroot program ends up with
secure exec rules, breaking many things.
This is a problem that has bitten many people who have tried to use
capabilities for anything useful.
===== The proposed change =====
This patch adds a fifth capability mask called the ambient mask (pA).
pA does what most people expect pI to do.
pA obeys the invariant that no bit can ever be set in pA if it is not
set in both pP and pI. Dropping a bit from pP or pI drops that bit from
pA. This ensures that existing programs that try to drop capabilities
still do so, with a complication. Because capability inheritance is so
broken, setting KEEPCAPS, using setresuid to switch to nonroot uids, and
then calling execve effectively drops capabilities. Therefore,
setresuid from root to nonroot conditionally clears pA unless
SECBIT_NO_SETUID_FIXUP is set. Processes that don't like this can
re-add bits to pA afterwards.
The capability evolution rules are changed:
pA' = (file caps or setuid or setgid ? 0 : pA)
pP' = (X & fP) | (pI & fI) | pA'
pI' = pI
pE' = (fE ? pP' : pA')
X is unchanged
If you are nonroot but you have a capability, you can add it to pA. If
you do so, your children get that capability in pA, pP, and pE. For
example, you can set pA = CAP_NET_BIND_SERVICE, and your children can
automatically bind low-numbered ports. Hallelujah!
Unprivileged users can create user namespaces, map themselves to a
nonzero uid, and create both privileged (relative to their namespace)
and unprivileged process trees. This is currently more or less
impossible. Hallelujah!
You cannot use pA to try to subvert a setuid, setgid, or file-capped
program: if you execute any such program, pA gets cleared and the
resulting evolution rules are unchanged by this patch.
Users with nonzero pA are unlikely to unintentionally leak that
capability. If they run programs that try to drop privileges, dropping
privileges will still work.
It's worth noting that the degree of paranoia in this patch could
possibly be reduced without causing serious problems. Specifically, if
we allowed pA to persist across executing non-pA-aware setuid binaries
and across setresuid, then, naively, the only capabilities that could
leak as a result would be the capabilities in pA, and any attacker
*already* has those capabilities. This would make me nervous, though --
setuid binaries that tried to privilege-separate might fail to do so,
and putting CAP_DAC_READ_SEARCH or CAP_DAC_OVERRIDE into pA could have
unexpected side effects. (Whether these unexpected side effects would
be exploitable is an open question.) I've therefore taken the more
paranoid route. We can revisit this later.
An alternative would be to require PR_SET_NO_NEW_PRIVS before setting
ambient capabilities. I think that this would be annoying and would
make granting otherwise unprivileged users minor ambient capabilities
(CAP_NET_BIND_SERVICE or CAP_NET_RAW for example) much less useful than
it is with this patch.
===== Footnotes =====
[1] Files that are missing the "security.capability" xattr or that have
unrecognized values for that xattr end up with has_cap set to false.
The code that does that appears to be complicated for no good reason.
[2] The libcap capability mask parsers and formatters are dangerously
misleading and the documentation is flat-out wrong. fE is *not* a mask;
it's a single bit. This has probably confused every single person who
has tried to use file capabilities.
[3] Linux very confusingly processes both the script and the interpreter
if applicable, for reasons that elude me. The results from thinking
about a script's file capabilities and/or setuid bits are mostly
discarded.
Preliminary userspace code is here, but it needs updating:
https://git.kernel.org/cgit/linux/kernel/git/luto/util-linux-playground.git/commit/?h=cap_ambient&id=7f5afbd175d2
Here is a test program that can be used to verify the functionality
(from Christoph):
/*
* Test program for the ambient capabilities. This program spawns a shell
* that allows running processes with a defined set of capabilities.
*
* (C) 2015 Christoph Lameter <cl@linux.com>
* Released under: GPL v3 or later.
*
*
* Compile using:
*
* gcc -o ambient_test ambient_test.o -lcap-ng
*
* This program must have the following capabilities to run properly:
* Permissions for CAP_NET_RAW, CAP_NET_ADMIN, CAP_SYS_NICE
*
* A command to equip the binary with the right caps is:
*
* setcap cap_net_raw,cap_net_admin,cap_sys_nice+p ambient_test
*
*
* To get a shell with additional caps that can be inherited by other processes:
*
* ./ambient_test /bin/bash
*
*
* Verifying that it works:
*
* From the bash spawed by ambient_test run
*
* cat /proc/$$/status
*
* and have a look at the capabilities.
*/
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <cap-ng.h>
#include <sys/prctl.h>
#include <linux/capability.h>
/*
* Definitions from the kernel header files. These are going to be removed
* when the /usr/include files have these defined.
*/
#define PR_CAP_AMBIENT 47
#define PR_CAP_AMBIENT_IS_SET 1
#define PR_CAP_AMBIENT_RAISE 2
#define PR_CAP_AMBIENT_LOWER 3
#define PR_CAP_AMBIENT_CLEAR_ALL 4
static void set_ambient_cap(int cap)
{
int rc;
capng_get_caps_process();
rc = capng_update(CAPNG_ADD, CAPNG_INHERITABLE, cap);
if (rc) {
printf("Cannot add inheritable cap\n");
exit(2);
}
capng_apply(CAPNG_SELECT_CAPS);
/* Note the two 0s at the end. Kernel checks for these */
if (prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_RAISE, cap, 0, 0)) {
perror("Cannot set cap");
exit(1);
}
}
int main(int argc, char **argv)
{
int rc;
set_ambient_cap(CAP_NET_RAW);
set_ambient_cap(CAP_NET_ADMIN);
set_ambient_cap(CAP_SYS_NICE);
printf("Ambient_test forking shell\n");
if (execv(argv[1], argv + 1))
perror("Cannot exec");
return 0;
}
Signed-off-by: Christoph Lameter <cl@linux.com> # Original author
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Aaron Jones <aaronmdjones@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Andrew G. Morgan <morgan@kernel.org>
Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
Cc: Austin S Hemmelgarn <ahferroin7@gmail.com>
Cc: Markku Savela <msa@moth.iki.fi>
Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: James Morris <james.l.morris@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
ocfs2_file_write_iter() is usng the wrong return value ('written'). This
will cause ocfs2_rw_unlock() be called both in write_iter & end_io,
triggering a BUG_ON.
This issue was introduced by commit 7da839c47589 ("ocfs2: use
__generic_file_write_iter()").
Orabug: 21612107
Fixes: 7da839c47589 ("ocfs2: use __generic_file_write_iter()")
Signed-off-by: Ryan Ding <ryan.ding@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This reverts commit f895c53f8ace3c3e49ebf9def90e63fc6d46d2bf.
This commit causes a NFSv4 regression in that close()+unlink() can end
up failing. The reason is that we no longer have a guarantee that the
CLOSE has completed on the server, meaning that the subsequent call to
REMOVE may fail with NFS4ERR_FILE_OPEN if the server implements Windows
unlink() semantics.
Reported-by: <Olga Kornievskaia <aglo@umich.edu>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If there is no cached data, then there is no need to track the file
change attribute on close.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
'regmap/topic/force-update' into regmap-next
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"The major work includes fixing and enhancing the existing extent_cache
feature, which has been well settling down so far and now it becomes a
default mount option accordingly.
Also, this version newly registers a f2fs memory shrinker to reclaim
several objects consumed by a couple of data structures in order to
avoid memory pressures.
Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which
triggers a cleaning job explicitly by users.
Most of the other patches are to fix bugs occurred in the corner cases
across the whole code area"
* tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits)
f2fs: upset segment_info repair
f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
f2fs: update extent tree in batches
f2fs: fix to release inode correctly
f2fs: handle f2fs_truncate error correctly
f2fs: avoid unneeded initializing when converting inline dentry
f2fs: atomically set inode->i_flags
f2fs: fix wrong pointer access during try_to_free_nids
f2fs: use __GFP_NOFAIL to avoid infinite loop
f2fs: lookup neighbor extent nodes for merging later
f2fs: split __insert_extent_tree_ret for readability
f2fs: kill dead code in __insert_extent_tree
f2fs: adjust showing of extent cache stat
f2fs: add largest/cached stat in extent cache
f2fs: fix incorrect mapping for bmap
f2fs: add annotation for space utilization of regular/inline dentry
f2fs: fix to update cached_en of extent tree properly
f2fs: fix typo
f2fs: check the node block address of newly allocated nid
f2fs: go out for insert_inode_locked failure
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm
Pull dlm updates from David Teigland:
"This set mainly includes a change to the way the dlm uses the SCTP API
in the kernel, removing the direct dependency on the sctp module.
Other odd SCTP-related fixes are also included.
The other notable fix is for a long standing regression in the
behavior of lock value blocks for user space locks"
* tag 'dlm-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
dlm: print error from kernel_sendpage
dlm: fix lvb copy for user locks
dlm: sctp_accept_from_sock() can be static
dlm: fix reconnecting but not sending data
dlm: replace BUG_ON with a less severe handling
dlm: use sctp 1-to-1 API
dlm: fix not reconnecting on connecting error handling
dlm: fix race while closing connections
dlm: fix connection stealing if using SCTP
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Pretty much all bug fixes and clean ups for 4.3, after a lot of
features and other churn going into 4.2"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
Revert "ext4: remove block_device_ejected"
ext4: ratelimit the file system mounted message
ext4: silence a format string false positive
ext4: simplify some code in read_mmp_block()
ext4: don't manipulate recovery flag when freezing no-journal fs
jbd2: limit number of reserved credits
ext4 crypto: remove duplicate header file
ext4: update c/mtime on truncate up
jbd2: avoid infinite loop when destroying aborted journal
ext4, jbd2: add REQ_FUA flag when recording an error in the superblock
ext4 crypto: fix spelling typo in comment
ext4 crypto: exit cleanly if ext4_derive_key_aes() fails
ext4: reject journal options for ext2 mounts
ext4: implement cgroup writeback support
ext4: replace ext4_io_submit->io_op with ->io_wbc
ext4 crypto: check for too-short encrypted file names
ext4 crypto: use a jbd2 transaction when adding a crypto policy
jbd2: speedup jbd2_journal_dirty_metadata()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 removal, quota & udf fixes from Jan Kara:
"The biggest change in the pull is the removal of ext3 filesystem
driver (~28k lines removed). Ext4 driver is a full featured
replacement these days and both RH and SUSE use it for several years
without issues. Also there are some workarounds in VM & block layer
mainly for ext3 which we could eventually get rid of.
Other larger change is addition of proper error handling for
dquot_initialize(). The rest is small fixes and cleanups"
[ I wasn't convinced about the ext3 removal and worried about things
falling through the cracks for legacy users, but ext4 maintainers
piped up and were all unanimously in favor of removal, and maintaining
all legacy ext3 support inside ext4. - Linus ]
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Don't modify filesystem for read-only mounts
quota: remove an unneeded condition
ext4: memory leak on error in ext4_symlink()
mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition
ext4: Improve ext4 Kconfig test
block: Remove forced page bouncing under IO
fs: Remove ext3 filesystem driver
doc: Update doc about journalling layer
jfs: Handle error from dquot_initialize()
reiserfs: Handle error from dquot_initialize()
ocfs2: Handle error from dquot_initialize()
ext4: Handle error from dquot_initialize()
ext2: Handle error from dquot_initalize()
quota: Propagate error from ->acquire_dquot()
|
|
Merge hpfs upddate from Mikulas Patocka.
* emailed patches from Mikulas Patocka <mikulas@twibright.com>:
hpfs: update ctime and mtime on directory modification
hpfs: support hotfixes
|
|
Update ctime and mtime when a directory is modified. (though OS/2 doesn't
update them anyway)
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@kernel.org # v3.3+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
When the OS/2 driver hits a disk write error, it writes the sector to
another location and adds the sector mapping to the hotfix map.
This patch makes the hpfs driver understand the hotfix map and remap
accesses accoring to it.
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
When I/O cannot complete due to a fatal error on the DS, ensure that we
invalidate the corresponding layout segment and return it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Pull core block updates from Jens Axboe:
"This first core part of the block IO changes contains:
- Cleanup of the bio IO error signaling from Christoph. We used to
rely on the uptodate bit and passing around of an error, now we
store the error in the bio itself.
- Improvement of the above from myself, by shrinking the bio size
down again to fit in two cachelines on x86-64.
- Revert of the max_hw_sectors cap removal from a revision again,
from Jeff Moyer. This caused performance regressions in various
tests. Reinstate the limit, bump it to a more reasonable size
instead.
- Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
Most devices have huge trim limits, which can cause nasty latencies
when deleting files. Enable the admin to configure the size down.
We will look into having a more sane default instead of UINT_MAX
sectors.
- Improvement of the SGP gaps logic from Keith Busch.
- Enable the block core to handle arbitrarily sized bios, which
enables a nice simplification of bio_add_page() (which is an IO hot
path). From Kent.
- Improvements to the partition io stats accounting, making it
faster. From Ming Lei.
- Also from Ming Lei, a basic fixup for overflow of the sysfs pending
file in blk-mq, as well as a fix for a blk-mq timeout race
condition.
- Ming Lin has been carrying Kents above mentioned patches forward
for a while, and testing them. Ming also did a few fixes around
that.
- Sasha Levin found and fixed a use-after-free problem introduced by
the bio->bi_error changes from Christoph.
- Small blk cgroup cleanup from Viresh Kumar"
* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
blk: Fix bio_io_vec index when checking bvec gaps
block: Replace SG_GAPS with new queue limits mask
block: bump BLK_DEF_MAX_SECTORS to 2560
Revert "block: remove artifical max_hw_sectors cap"
blk-mq: fix race between timeout and freeing request
blk-mq: fix buffer overflow when reading sysfs file of 'pending'
Documentation: update notes in biovecs about arbitrarily sized bios
block: remove bio_get_nr_vecs()
fs: use helper bio_add_page() instead of open coding on bi_io_vec
block: kill merge_bvec_fn() completely
md/raid5: get rid of bio_fits_rdev()
md/raid5: split bio for chunk_aligned_read
block: remove split code in blkdev_issue_{discard,write_same}
btrfs: remove bio splitting and merge_bvec_fn() calls
bcache: remove driver private bio splitting code
block: simplify bio_add_page()
block: make generic_make_request handle arbitrarily sized bios
blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
block: don't access bio->bi_error after bio_put()
block: shrink struct bio down to 2 cache lines again
...
|
|
We have observed the server sending recalls for delegation stateids
that have already been successfully returned. Change
nfsd4_cb_recall_done() to return success if the client has returned
the delegation. While this does not completely eliminate the sending
of recalls for delegations that have already been returned, this
does prevent unnecessarily declaring the callback path to be down.
Reported-by: Eric Meddaugh <etmsys@rit.edu>
Signed-off-by: Andrew Elble <aweits@rit.edu>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
"The usual stuff from trivial tree for 4.3 (kerneldoc updates, printk()
fixes, Documentation and MAINTAINERS updates)"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (28 commits)
MAINTAINERS: update my e-mail address
mod_devicetable: add space before */
scsi: a100u2w: trivial typo in printk
i2c: Fix typo in i2c-bfin-twi.c
treewide: fix typos in comment blocks
Doc: fix trivial typo in SubmittingPatches
proportions: Spelling s/consitent/consistent/
dm: Spelling s/consitent/consistent/
aic7xxx: Fix typo in error message
pcmcia: Fix typo in locking documentation
scsi/arcmsr: Fix typos in error log
drm/nouveau/gr: Fix typo in nv10.c
[SCSI] Fix printk typos in drivers/scsi
staging: comedi: Grammar s/Enable support a/Enable support for a/
Btrfs: Spelling s/consitent/consistent/
README: GTK+ is a acronym
ASoC: omap: Fix typo in config option description
mm: tlb.c: Fix error message
ntfs: super.c: Fix error log
fix typo in Documentation/SubmittingPatches
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace updates from Eric Biederman:
"This finishes up the changes to ensure proc and sysfs do not start
implementing executable files, as the there are application today that
are only secure because such files do not exist.
It akso fixes a long standing misfeature of /proc/<pid>/mountinfo that
did not show the proper source for files bind mounted from
/proc/<pid>/ns/*.
It also straightens out the handling of clone flags related to user
namespaces, fixing an unnecessary failure of unshare(CLONE_NEWUSER)
when files such as /proc/<pid>/environ are read while <pid> is calling
unshare. This winds up fixing a minor bug in unshare flag handling
that dates back to the first version of unshare in the kernel.
Finally, this fixes a minor regression caused by the introduction of
sysfs_create_mount_point, which broke someone's in house application,
by restoring the size of /sys/fs/cgroup to 0 bytes. Apparently that
application uses the directory size to determine if a tmpfs is mounted
on /sys/fs/cgroup.
The bind mount escape fixes are present in Al Viros for-next branch.
and I expect them to come from there. The bind mount escape is the
last of the user namespace related security bugs that I am aware of"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
fs: Set the size of empty dirs to 0.
userns,pidns: Force thread group sharing, not signal handler sharing.
unshare: Unsharing a thread does not require unsharing a vm
nsfs: Add a show_path method to fix mountinfo
mnt: fs_fully_visible enforce noexec and nosuid if !SB_I_NOEXEC
vfs: Commit to never having exectuables on proc and sysfs.
|
|
The return value from scnprintf always less than the buffer length.
So, result >= len always false. This patch removes those checking.
int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
{
int i;
i = vsnprintf(buf, size, fmt, args);
if (likely(i < size))
return i;
if (size != 0)
return size - 1;
return 0;
}
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
The length of "Linux NFSv4.0 " is 14, not 10.
Without this patch, I get a truncated client owner id as,
"Linux NFSv4.0 ::1/::1"
With this patch,
"Linux NFSv4.0 ::1/::1 tcp"
Fixes: a319268891 ("nfs: make nfs4_init_nonuniform_client_string use a dynamically allocated buffer")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If a read-write layout has an invalid mirror, then we should
mark it as invalid, and return it.
If a read-only layout has an invalid mirror, then mark it as invalid
and check if there is still at least one valid mirror before we return
it.
Note: Also fix incorrect use of pnfs_generic_mark_devid_invalid().
We really want nfs4_mark_deviceid_unavailable().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Unlike read layouts, the writeable layout cannot fall back to using only
one of the mirrors. It need to write to all of them.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Unlike the files layout, flexfiles does not test for the NFS_DEVICEID_INVALID
flag. Instead it relies on NFS_DEVICEID_UNAVAILABLE.
Fix is to replace with nfs4_mark_deviceid_unavailable().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
upset segment_info like this:
276000|161 0|0 4|70 3|0 3|0 0|0 0|91 4|0 4|232 4|39
276104|0 4|0 4|1 4|0 4|0 4|280 4|0 4|42 4|262 4|38
276204|179 4|89 4|39 4|24 4|0 4|96 4|3 4|428 4|0 4|118
276304|112 4|97 4|0 4|0 4|0 4|68 4|0 4|0 4|86 4|138
276404|0 4|0 0|166 5|39 4|101 0|111
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Mirrors are now shared objects, so we should not be freeing them directly
inside ff_layout_free_lseg(). We should already be doing the right thing
in _ff_layout_free_lseg(), so just let it handle things.
Also ensure that ff_layout_free_mirror() frees the RPC credential if it
is set.
Fixes: 28a0d72c6867 ("Add refcounting to struct nfs4_ff_layout_mirror")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Somebody with a Solaris client was hitting this case. We haven't
figured out why yet, and don't have a reproducer. Meanwhile Frank
noticed that RFC 7530 actually recommends CLID_INUSE for this case.
Unlikely to help the original reporter, but may as well fix it.
Reported-by: Frank Filz <ffilzlnx@mindspring.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
It's possible that a DELEGRETURN could race with (e.g.) client expiry,
in which case we could end up putting the delegation hash reference more
than once.
Have unhash_delegation_locked return a bool that indicates whether it
was already unhashed. In the case of destroy_delegation we only
conditionally put the hash reference if that returns true.
The other callers of unhash_delegation_locked call it while walking
list_heads that shouldn't yet be detached. If we find that it doesn't
return true in those cases, then throw a WARN_ON as that indicates that
we have a partially hashed delegation, and that something is likely very
wrong.
Tested-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
When an open or lock stateid is hashed, we take an extra reference to
it. When we unhash it, we drop that reference. The code however does
not properly account for the case where we have two callers concurrently
trying to unhash the stateid. This can lead to list corruption and the
hash reference being put more than once.
Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
list_head, and then testing to see if that list_head is empty before
releasing the hash reference. This means that some of the unhashing
wrappers now become bool return functions so we can test to see whether
the stateid was unhashed before we put the reference.
Reported-by: Andrew W Elble <aweits@rit.edu>
Tested-by: Andrew W Elble <aweits@rit.edu>
Reported-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
We can potentially have several nfs4_laundromat jobs running if there
are multiple namespaces running nfsd on the box. Those are effectively
separated from one another though, so I don't see any reason to
serialize them.
Also, create_singlethread_workqueue automatically adds the
WQ_MEM_RECLAIM flag. Since we run this job on a timer, it's not really
involved in any reclaim paths. I see no need for a rescuer thread.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
These messages, combined with the backtrace they trigger, makes it seem
like a serious problem, though a quick search shows distros marking
it as a "won't fix" non-issue when the problem is reported by users.
The backtrace is overkill, and only really manages to show that if
you follow the code path, you can't really avoid it with bootargs
or configuration settings in the container.
Given that, lets tone it down a bit and get rid of the WARN severity,
and the associated backtrace, so people aren't needlessly alarmed.
Also, lets drop the split printk line, since they are grep unfriendly.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Fix kernel-doc warnings in fs/locks.c:
Warning(..//fs/locks.c:1577): No description found for parameter 'flags'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
|
|
Security label can be set in OPEN/CREATE request, nfsd should set
the bitmask in word2 if setting success.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
According to rfc5661 18.16.4,
"If EXCLUSIVE4_1 was used, the client determines the attributes
used for the verifier by comparing attrset with cva_attrs.attrmask;"
So, EXCLUSIVE4_1 also needs those bitmask used to store the verifier.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
The encode order should be as the bitmask defined order.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
Currently we'll respond correctly to a request for either
FS_LAYOUT_TYPES or LAYOUT_TYPES, but not to a request for both
attributes simultaneously.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
After commit ae7095a7c4 (nfsd4: helper function for getting mounted_on
ino) we ignore the return value from get_parent_attributes().
Also, the following FATTR4_WORD2_LAYOUT_BLKSIZE uses stat.blksize, so to
avoid overwriting that, use an independent value for the parent's
attributes.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver patches from Greg KH:
"Here's the "big" char/misc driver update for 4.3-rc1.
Not much really interesting here, just a number of little changes all
over the place, and some nice consolidation of the nvmem drivers to a
common framework. As usual, the mei drivers stand out as the largest
"churn" to handle new devices and features in their hardware.
All have been in linux-next for a while with no issues"
* tag 'char-misc-4.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits)
auxdisplay: ks0108: initialize local parport variable
extcon: palmas: Fix build break due to devm_gpiod_get_optional API change
extcon: palmas: Support GPIO based USB ID detection
extcon: Fix signedness bugs about break error handling
extcon: Drop owner assignment from i2c_driver
extcon: arizona: Simplify pdata symantics for micd_dbtime
extcon: arizona: Declare 3-pole jack if we detect open circuit on mic
extcon: Add exception handling to prevent the NULL pointer access
extcon: arizona: Ensure variables are set for headphone detection
extcon: arizona: Use gpiod inteface to handle micd_pol_gpio gpio
extcon: arizona: Add basic microphone detection DT/ACPI bindings
extcon: arizona: Update to use the new device properties API
extcon: palmas: Remove the mutually_exclusive array
extcon: Remove optional print_state() function pointer of struct extcon_dev
extcon: Remove duplicate header file in extcon.h
extcon: max77843: Clear IRQ bits state before request IRQ
toshiba laptop: replace ioremap_cache with ioremap
misc: eeprom: max6875: clean up max6875_read()
misc: eeprom: clean up eeprom_read()
misc: eeprom: 93xx46: clean up eeprom_93xx46_bin_read/write
...
|
|
If we have a read layout, then sanity check the minimal layout length
so that it does not extend beyond the end of file.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
According to RFC5661 section 18.43.3, if the server cannot satisfy
the loga_minlength argument to LAYOUTGET, there are 2 cases:
1) If loga_minlength == 0, it returns NFS4ERR_LAYOUTTRYLATER
2) If loga_minlength != 0, it returns NFS4ERR_BADLAYOUT
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
According to RFC5661 Section 18.2.4, CLOSE is supposed to return
the zero stateid. This means that nfs_clear_open_stateid_locked()
cannot assume that the result stateid will always match the 'other'
field of the existing open stateid when trying to determine a race
with a parallel OPEN.
Instead, we look at the argument, and check for matches.
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If the file was fenced and/or has been deleted on the DS, then we want
to retry pNFS after a layoutreturn with error report. If the server
cannot fix the problem, then we rely on it to tell us so in the
response to the LAYOUTGET.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|
If extent cache is disable, we will encounter oops when triggering direct
IO as below:
BUG: unable to handle kernel NULL pointer dereference at 0000000c
IP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs]
*pdpt = 000000002bb9a001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: f2fs(O) fuse bnep rfcomm bluetooth nfsd dm_crypt nfs_acl auth_rpcgss oid_registry nfs binfmt_misc fscache lockd
sunrpc grace snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer
snd_seq_device snd soundcore joydev psmouse hid_generic i2c_piix4 serio_raw ppdev mac_hid parport_pc lp parport ext4 jbd2 mbcache
usbhid hid e1000
CPU: 3 PID: 3608 Comm: dd Tainted: G O 4.2.0-rc4 #12
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
task: ef161600 ti: ebd5e000 task.ti: ebd5e000
EIP: 0060:[<f0b9c61e>] EFLAGS: 00010202 CPU: 3
EIP is at f2fs_drop_largest_extent+0xe/0x30 [f2fs]
EAX: 00000000 EBX: ddebc000 ECX: 00000000 EDX: 00000000
ESI: ebd5fdf8 EDI: 00000000 EBP: ebd5fd58 ESP: ebd5fd58
DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
CR0: 80050033 CR2: 0000000c CR3: 2c24ee40 CR4: 000006f0
Stack:
ebd5fda4 f0b8c005 00000000 00000001 00000000 f0b8c430 c816cd68 ddebc000
ddebc088 00001000 00000555 00000555 ffffffff c160bb00 00055501 00000000
00000000 00000100 00000000 ebd5fe20 f0b8c430 00000046 ef161600 00001000
Call Trace:
[<f0b8c005>] __allocate_data_block+0x1a5/0x260 [f2fs]
[<f0b8c430>] ? f2fs_direct_IO+0x370/0x440 [f2fs]
[<c160bb00>] ? down_read+0x30/0x50
[<f0b8c430>] f2fs_direct_IO+0x370/0x440 [f2fs]
[<c113e115>] generic_file_direct_write+0xa5/0x260
[<c10b53f8>] ? current_fs_time+0x18/0x50
[<c113e38b>] __generic_file_write_iter+0xbb/0x210
[<c113e50f>] ? generic_file_write_iter+0x2f/0x320
[<c113e63c>] generic_file_write_iter+0x15c/0x320
[<f0b77f29>] f2fs_file_write_iter+0x39/0x80 [f2fs]
[<c11984d9>] __vfs_write+0xa9/0xe0
[<c1199227>] vfs_write+0x97/0x180
[<c119955b>] SyS_write+0x5b/0xd0
[<c160dcd0>] sysenter_do_call+0x12/0x12
Code: 10 8b 50 1c 89 53 14 eb ca 8d 74 26 00 85 f6 74 86 eb a6 0f 0b 90 8d b4 26 00 00 00 00 55 89 e5 3e 8d 74 26 00 8b 80 d4 02 00
00 <8b> 48 0c 39 d1 77 0e 03 48 14 39 ca 73 07 c7 40 14 00 00 00 00
EIP: [<f0b9c61e>] f2fs_drop_largest_extent+0xe/0x30 [f2fs] SS:ESP 0068:ebd5fd58
CR2: 000000000000000c
---[ end trace a38c07026a1afffd ]---
This is because when extent cache is disable, extent_tree pointer in struct
f2fs_inode_info should be NULL, but in f2fs_drop_largest_extent we access
this NULL pointer directly without checking state of extent cache, then,
the oops occurs. Let's fix it by checking state of extent cache before
accessing.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
As the code stands today, if xfs_trans_reserve() fails, we
goto out_dqrele, which does not free the allocated transaction.
Fix up the goto targets to undo everything properly.
Addresses-Coverity-Id: 145571
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|