summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2017-05-08NFS append COMMIT after synchronous COPYOlga Kornievskaia
Instead of messing with the commit path which has been causing issues, add a COMMIT op after the COPY and ask for stable copies in the first space. It saves a round trip, since after the COPY, the client sends a COMMIT anyway. Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-08lockd: fix lockd shutdown raceJ. Bruce Fields
As reported by David Jeffery: "a signal was sent to lockd while lockd was shutting down from a request to stop nfs. The signal causes lockd to call restart_grace() which puts the lockd_net structure on the grace list. If this signal is received at the wrong time, it will occur after lockd_down_net() has called locks_end_grace() but before lockd_down_net() stops the lockd thread. This leads to lockd putting the lockd_net structure back on the grace list, then exiting without anything removing it from the list." So, perform the final locks_end_grace() from the the lockd thread; this ensures it's serialized with respect to restart_grace(). Reported-by: David Jeffery <djeffery@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-05-08Merge tag 'for-f2fs-4.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've focused on enhancing performance with regards to block allocation, GC, and discard/in-place-update IO controls. There are a bunch of clean-ups as well as minor bug fixes. Enhancements: - disable heap-based allocation by default - issue small-sized discard commands by default - change the policy of data hotness for logging - distinguish IOs in terms of size and wbc type - start SSR earlier to avoid foreground GC - enhance data structures managing discard commands - enhance in-place update flow - add some more fault injection routines - secure one more xattr entry Bug fixes: - calculate victim cost for GC correctly - remain correct victim segment number for GC - race condition in nid allocator and initializer - stale pointer produced by atomic_writes - fix missing REQ_SYNC for flush commands - handle missing errors in more corner cases" * tag 'for-f2fs-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (111 commits) f2fs: fix a mount fail for wrong next_scan_nid f2fs: enhance scalability of trace macro f2fs: relocate inode_{,un}lock in F2FS_IOC_SETFLAGS f2fs: Make flush bios explicitely sync f2fs: show available_nids in f2fs/status f2fs: flush dirty nats periodically f2fs: introduce CP_TRIMMED_FLAG to avoid unneeded discard f2fs: allow cpc->reason to indicate more than one reason f2fs: release cp and dnode lock before IPU f2fs: shrink size of struct discard_cmd f2fs: don't hold cmd_lock during waiting discard command f2fs: nullify fio->encrypted_page for each writes f2fs: sanity check segment count f2fs: introduce valid_ipu_blkaddr to clean up f2fs: lookup extent cache first under IPU scenario f2fs: reconstruct code to write a data page f2fs: introduce __wait_discard_cmd f2fs: introduce __issue_discard_cmd f2fs: enable small discard by default f2fs: delay awaking discard thread ...
2017-05-08ubifs: Fix a typo in comment of ioctl2ubifs & ubifs2ioctlRock Lee
Change 'convert' to 'converts' Change 'UBIFS' to 'UBIFS inode flags' Signed-off-by: Rock Lee <rockdotlee@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-05-08ubifs: Remove unnecessary assignmentStefan Agner
Assigning a value of a variable to itself is not useful. Signed-off-by: Stefan Agner <stefan@agner.ch> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-05-08ubifs: Fix cut and paste error on sb type comparisonsColin Ian King
The check for the bad node type of sb->type is checking sa->type and not sb-type. This looks like a cut and paste error. Fix this. Detected by PVS-Studio, warning: V581 Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-05-08ubifs: Add CONFIG_UBIFS_FS_SECURITY to disable/enable security labelsHyunchul Lee
When write syscall is called, every time security label is searched to determine that file's privileges should be changed. If LSM(Linux Security Model) is not used, this is useless. So introduce CONFIG_UBIFS_SECURITY to disable security labels. it's default value is "y". Signed-off-by: Hyunchul Lee <cheol.lee@lge.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2017-05-08Merge tag 'fscrypt_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Only bug fixes and cleanups for this merge window" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: fscrypt: correct collision claim for digested names MAINTAINERS: fscrypt: update mailing list, patchwork, and git ext4: clean up ext4_match() and callers f2fs: switch to using fscrypt_match_name() ext4: switch to using fscrypt_match_name() fscrypt: introduce helper function for filename matching fscrypt: avoid collisions when presenting long encrypted filenames f2fs: check entire encrypted bigname when finding a dentry ubifs: check for consistent encryption contexts in ubifs_lookup() f2fs: sync f2fs_lookup() with ext4_lookup() ext4: remove "nokey" check from ext4_lookup() fscrypt: fix context consistency check when key(s) unavailable fscrypt: Remove __packed from fscrypt_policy fscrypt: Move key structure and constants to uapi fscrypt: remove fscrypt_symlink_data_len() fscrypt: remove unnecessary checks for NULL operations
2017-05-08Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - add GETFSMAP support - some performance improvements for very large file systems and for random write workloads into a preallocated file - bug fixes and cleanups. * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: jbd2: cleanup write flags handling from jbd2_write_superblock() ext4: mark superblock writes synchronous for nobarrier mounts ext4: inherit encryption xattr before other xattrs ext4: replace BUG_ON with WARN_ONCE in ext4_end_bio() ext4: avoid unnecessary transaction stalls during writeback ext4: preload block group descriptors ext4: make ext4_shutdown() static ext4: support GETFSMAP ioctls vfs: add common GETFSMAP ioctl definitions ext4: evict inline data when writing to memory map ext4: remove ext4_xattr_check_entry() ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries() ext4: merge ext4_xattr_list() into ext4_listxattr() ext4: constify static data that is never modified ext4: trim return value and 'dir' argument from ext4_insert_dentry() jbd2: fix dbench4 performance regression for 'nobarrier' mounts jbd2: Fix lockdep splat with generic/270 test mm: retry writepages() on ENOMEM when doing an data integrity writeback
2017-05-08block, dax: move "select DAX" from BLOCK to FS_DAXDan Williams
For configurations that do not enable DAX filesystems or drivers, do not require the DAX core to be built. Given that the 'direct_access' method has been removed from 'block_device_operations', we can also go ahead and remove the block-related dax helper functions from fs/block_dev.c to drivers/dax/super.c. This keeps dax details out of the block layer and lets the DAX core be built as a module in the FS_DAX=n case. Filesystems need to include dax.h to call bdev_dax_supported(). Cc: linux-xfs@vger.kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Jan Kara <jack@suse.com> Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-08NFSv4: Fix exclusive create attributes encodingTrond Myklebust
When using NFS4_CREATE_EXCLUSIVE4_1 mode, the client will overestimate the amount of space that it needs for the attributes because it does so before checking whether or not the server supports a given attribute. Fix by checking the attribute mask earlier. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-08NFSv4: Fix an rcu lock leakTrond Myklebust
The intention in the original patch was to release the lock when we put the inode, however something got screwed up. Reported-by: Jason Yan <yanaijie@huawei.com> Fixes: 7b410d9ce460f ("pNFS: Delay getting the layout header in..") Cc: stable@vger.kernel.org # v4.10+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-06Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Various fixes for stable for CIFS/SMB3 especially for better interoperability for SMB3 to Macs. It also includes Pavel's improvements to SMB3 async i/o support (which is much faster now)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: add misssing SFM mapping for doublequote SMB3: Work around mount failure when using SMB3 dialect to Macs cifs: fix CIFS_IOC_GET_MNT_INFO oops CIFS: fix mapping of SFM_SPACE and SFM_PERIOD CIFS: fix oplock break deadlocks cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops cifs: fix leak in FSCTL_ENUM_SNAPS response handling Set unicode flag on cifs echo request to avoid Mac error CIFS: Add asynchronous write support through kernel AIO CIFS: Add asynchronous read support through kernel AIO CIFS: Add asynchronous context to support kernel AIO cifs: fix IPv6 link local, with scope id, address parsing cifs: small underflow in cnvrtDosUnixTm()
2017-05-06Merge tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.12. The big new feature for this release is the new space mapping ioctl that we've been discussing since LSF2016, but other than that most of the patches are larger bug fixes, memory corruption prevention, and other cleanups. Summary: - various code cleanups - introduce GETFSMAP ioctl - various refactoring - avoid dio reads past eof - fix memory corruption and other errors with fragmented directory blocks - fix accidental userspace memory corruptions - publish fs uuid in superblock - make fstrim terminatable - fix race between quotaoff and in-core inode creation - avoid use-after-free when finishing up w/ buffer heads - reserve enough space to handle bmap tree resizing during cow remap" * tag 'xfs-4.12-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (53 commits) xfs: fix use-after-free in xfs_finish_page_writeback xfs: reserve enough blocks to handle btree splits when remapping xfs: wait on new inodes during quotaoff dquot release xfs: update ag iterator to support wait on new inodes xfs: support ability to wait on new inodes xfs: publish UUID in struct super_block xfs: Allow user to kill fstrim process xfs: better log intent item refcount checking xfs: fix up quotacheck buffer list error handling xfs: remove xfs_trans_ail_delete_bulk xfs: don't use bool values in trace buffers xfs: fix getfsmap userspace memory corruption while setting OF_LAST xfs: fix __user annotations for xfs_ioc_getfsmap xfs: corruption needs to respect endianess too! xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap xfs: simplify validation of the unwritten extent bit xfs: remove unused values from xfs_exntst_t xfs: remove the unused XFS_MAXLINK_1 define xfs: more do_div cleanups ...
2017-05-06Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes and updates from Jens Axboe: "Some fixes and followup features/changes that should go in, in this merge window. This contains: - Two fixes for lightnvm from Javier, fixing problems in the new code merge previously in this merge window. - A fix from Jan for the backing device changes, fixing an issue in NFS that causes a failure to mount on certain setups. - A change from Christoph, cleaning up the blk-mq init and exit request paths. - Remove elevator_change(), which is now unused. From Bart. - A fix for queue operation invocation on a dead queue, from Bart. - A series fixing up mtip32xx for blk-mq scheduling, removing a bandaid we previously had in place for this. From me. - A regression fix for this series, fixing a case where we wait on workqueue flushing from an invalid (non-blocking) context. From me. - A fix/optimization from Ming, ensuring that we don't both quiesce and freeze a queue at the same time. - A fix from Peter on lock ordering for CPU hotplug. Not a real problem right now, but will be once the CPU hotplug rework goes in. - A series from Omar, cleaning up out blk-mq debugfs support, and adding support for exporting info from schedulers in debugfs as well. This is really useful in debugging stalls or livelocks. From Omar" * 'for-linus' of git://git.kernel.dk/linux-block: (28 commits) mq-deadline: add debugfs attributes kyber: add debugfs attributes blk-mq-debugfs: allow schedulers to register debugfs attributes blk-mq: untangle debugfs and sysfs blk-mq: move debugfs declarations to a separate header file blk-mq: Do not invoke queue operations on a dead queue blk-mq-debugfs: get rid of a bunch of boilerplate blk-mq-debugfs: rename hw queue directories from <n> to hctx<n> blk-mq-debugfs: don't open code strstrip() blk-mq-debugfs: error on long write to queue "state" file blk-mq-debugfs: clean up flag definitions blk-mq-debugfs: separate flags with | nfs: Fix bdi handling for cloned superblocks block/mq: Cure cpu hotplug lock inversion lightnvm: fix bad back free on error path lightnvm: create cmd before allocating request blk-mq: don't use sync workqueue flushing from drivers mtip32xx: convert internal commands to regular block infrastructure mtip32xx: cleanup internal tag assumptions block: don't call blk_mq_quiesce_queue() after queue is frozen ...
2017-05-05Merge tag 'libnvdimm-for-4.12' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: "The bulk of this has been in multiple -next releases. There were a few late breaking fixes and small features that got added in the last couple days, but the whole set has received a build success notification from the kbuild robot. Change summary: - Region media error reporting: A libnvdimm region device is the parent to one or more namespaces. To date, media errors have been reported via the "badblocks" attribute attached to pmem block devices for namespaces in "raw" or "memory" mode. Given that namespaces can be in "device-dax" or "btt-sector" mode this new interface reports media errors generically, i.e. independent of namespace modes or state. This subsequently allows userspace tooling to craft "ACPI 6.1 Section 9.20.7.6 Function Index 4 - Clear Uncorrectable Error" requests and submit them via the ioctl path for NVDIMM root bus devices. - Introduce 'struct dax_device' and 'struct dax_operations': Prompted by a request from Linus and feedback from Christoph this allows for dax capable drivers to publish their own custom dax operations. This fixes the broken assumption that all dax operations are related to a persistent memory device, and makes it easier for other architectures and platforms to add customized persistent memory support. - 'libnvdimm' core updates: A new "deep_flush" sysfs attribute is available for storage appliance applications to manually trigger memory controllers to drain write-pending buffers that would otherwise be flushed automatically by the platform ADR (asynchronous-DRAM-refresh) mechanism at a power loss event. Support for "locked" DIMMs is included to prevent namespaces from surfacing when the namespace label data area is locked. Finally, fixes for various reported deadlocks and crashes, also tagged for -stable. - ACPI / nfit driver updates: General updates of the nfit driver to add DSM command overrides, ACPI 6.1 health state flags support, DSM payload debug available by default, and various fixes. Acknowledgements that came after the branch was pushed: - commmit 565851c972b5 "device-dax: fix sysfs attribute deadlock": Tested-by: Yi Zhang <yizhan@redhat.com> - commit 23f498448362 "libnvdimm: rework region badblocks clearing" Tested-by: Toshi Kani <toshi.kani@hpe.com>" * tag 'libnvdimm-for-4.12' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (52 commits) libnvdimm, pfn: fix 'npfns' vs section alignment libnvdimm: handle locked label storage areas libnvdimm: convert NDD_ flags to use bitops, introduce NDD_LOCKED brd: fix uninitialized use of brd->dax_dev block, dax: use correct format string in bdev_dax_supported device-dax: fix sysfs attribute deadlock libnvdimm: restore "libnvdimm: band aid btt vs clear poison locking" libnvdimm: fix nvdimm_bus_lock() vs device_lock() ordering libnvdimm: rework region badblocks clearing acpi, nfit: kill ACPI_NFIT_DEBUG libnvdimm: fix clear length of nvdimm_forget_poison() libnvdimm, pmem: fix a NULL pointer BUG in nd_pmem_notify libnvdimm, region: sysfs trigger for nvdimm_flush() libnvdimm: fix phys_addr for nvdimm_clear_poison x86, dax, pmem: remove indirection around memcpy_from_pmem() block: remove block_device_operations ->direct_access() block, dax: convert bdev_dax_supported() to dax_direct_access() filesystem-dax: convert to dax_direct_access() Revert "block: use DAX for partition table reads" ext2, ext4, xfs: retrieve dax_device for iomap operations ...
2017-05-05Merge tag 'gfs2-4.12.fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 updates from Bob Peterson: "We've got ten GFS2 patches for this merge window. - Andreas Gruenbacher wrote a patch to replace the deprecated call to rhashtable_walk_init with rhashtable_walk_enter. - Andreas also wrote a patch to eliminate redundant code in two of our debugfs sequence files. - Andreas also cleaned up the rhashtable key ugliness Linus pointed out during this cycle, following Linus's suggestions. - Andreas also wrote a patch to take advantage of his new function rhashtable_lookup_get_insert_fast. This makes glock lookup faster and more bullet-proof. - Andreas also wrote a patch to revert a patch in the evict path that caused occasional deadlocks, and is no longer needed. - Andrew Price wrote a patch to re-enable fallocate for the rindex system file to enable gfs2_grow to grow properly on secondary file system grow operations. - I wrote a patch to initialize an inode number field to make certain kernel trace points more understandable. - I also wrote a patch that makes GFS2 file system "withdraw" work more like it should by ignoring operations after a withdraw that would formerly cause a BUG() and kernel panic. - I also reworked the entire truncate/delete algorithm, scrapping the old recursive algorithm in favor of a new non-recursive algorithm. This was done for performance: This way, GFS2 no longer needs to lock multiple resource groups while doing truncates and deletes of files that cross multiple resource group boundaries, allowing for better parallelism. It also solves a problem whereby deleting large files would request a large chunk of kernel memory, which resulted in a get_page_from_freelist warning. - Due to a regression found during testing, I added a new patch to correct 'GFS2: Prevent BUG from occurring when normal Withdraws occur'." * tag 'gfs2-4.12.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Allow glocks to be unlocked after withdraw GFS2: Non-recursive delete gfs2: Re-enable fallocate for the rindex Revert "GFS2: Wait for iopen glock dequeues" gfs2: Switch to rhashtable_lookup_get_insert_fast GFS2: Temporarily zero i_no_addr when creating a dinode gfs2: Don't pack struct lm_lockname gfs2: Deduplicate gfs2_{glocks,glstats}_open gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter GFS2: Prevent BUG from occurring when normal Withdraws occur
2017-05-05Merge tag 'for-linus-4.12-ofs-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Orangefs cleanups, fixes and statx support. Some cleanups: - remove unused get_fsid_from_ino - fix bounds check for listxattr - clean up oversize xattr validation - do not set getattr_time on orangefs_lookup - return from orangefs_devreq_read quickly if possible - do not wait for timeout if umounting - handle zero size write in debugfs Bug fixes: - do not check possibly stale size on truncate - ensure the userspace component is unmounted if mount fails - total reimplementation of dir.c New feature: - implement statx The new implementation of dir.c is kind of a big deal, all new code. It has been posted to fs-devel during the previous rc period, we didn't get much review or feedback from there, but it has been reviewed very heavily here, so much so that we have two entire versions of the reimplementation. Not only does the new implementation fix some xfstests, but it passes all the new tests we made here that involve seeking and rewinding and giant directories and long file names. The new dir code has three patches itself: - skip forward to the next directory entry if seek is short - invalidate stored directory on seek - count directory pieces correctly" * tag 'for-linus-4.12-ofs-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: count directory pieces correctly orangefs: invalidate stored directory on seek orangefs: skip forward to the next directory entry if seek is short orangefs: handle zero size write in debugfs orangefs: do not wait for timeout if umounting orangefs: return from orangefs_devreq_read quickly if possible orangefs: ensure the userspace component is unmounted if mount fails orangefs: do not check possibly stale size on truncate orangefs: implement statx orangefs: remove ORANGEFS_READDIR macros orangefs: support very large directories orangefs: support llseek on directories orangefs: rewrite readdir to fix several bugs orangefs: do not set getattr_time on orangefs_lookup orangefs: clean up oversize xattr validation orangefs: fix bounds check for listxattr orangefs: remove unused get_fsid_from_ino
2017-05-05Merge tag 'befs-v4.12-rc1' of git://github.com/luisbg/linux-befsLinus Torvalds
Pull befs fix from Luis de Bethencourt: "One fix from Fabian Frederick making the nfs client still work after a cache drop" * tag 'befs-v4.12-rc1' of git://github.com/luisbg/linux-befs: befs: make export work with cold dcache
2017-05-05fs/affs: add rename exchangeFabian Frederick
Process RENAME_EXCHANGE in affs_rename2() adding static affs_xrename() based on affs_rename(). We remove headers from respective directories then affect bh to other inode directory entries for swapping. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-05fs/affs: add rename2 to prepare multiple methodsFabian Frederick
Currently AFFS only supports RENAME_NOREPLACE. This patch isolates that method to a static function to prepare RENAME_EXCHANGE addition. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-05-05GFS2: Allow glocks to be unlocked after withdrawBob Peterson
This bug fixes a regression introduced by patch 0d1c7ae9d8. The intent of the patch was to stop promoting glocks after a file system is withdrawn due to a variety of errors, because doing so results in a BUG(). (You should be able to unmount after a withdraw rather than having the kernel panic.) Unfortunately, it also stopped demotions, so glocks could not be unlocked after withdraw, which means the unmount would hang. This patch allows function do_xmote to demote locks to an unlocked state after a withdraw, but not promote them. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2017-05-05xfs: fix use-after-free in xfs_finish_page_writebackEryu Guan
Commit 28b783e47ad7 ("xfs: bufferhead chains are invalid after end_page_writeback") fixed one use-after-free issue by pre-calculating the loop conditionals before calling bh->b_end_io() in the end_io processing loop, but it assigned 'next' pointer before checking end offset boundary & breaking the loop, at which point the bh might be freed already, and caused use-after-free. This is caught by KASAN when running fstests generic/127 on sub-page block size XFS. [ 2517.244502] run fstests generic/127 at 2017-04-27 07:30:50 [ 2747.868840] ================================================================== [ 2747.876949] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3d3/0x4e0 [xfs] at addr ffff8801395ae698 ... [ 2747.918245] Call Trace: [ 2747.920975] dump_stack+0x63/0x84 [ 2747.924673] kasan_object_err+0x21/0x70 [ 2747.928950] kasan_report+0x271/0x530 [ 2747.933064] ? xfs_destroy_ioend+0x3d3/0x4e0 [xfs] [ 2747.938409] ? end_page_writeback+0xce/0x110 [ 2747.943171] __asan_report_load8_noabort+0x19/0x20 [ 2747.948545] xfs_destroy_ioend+0x3d3/0x4e0 [xfs] [ 2747.953724] xfs_end_io+0x1af/0x2b0 [xfs] [ 2747.958197] process_one_work+0x5ff/0x1000 [ 2747.962766] worker_thread+0xe4/0x10e0 [ 2747.966946] kthread+0x2d3/0x3d0 [ 2747.970546] ? process_one_work+0x1000/0x1000 [ 2747.975405] ? kthread_create_on_node+0xc0/0xc0 [ 2747.980457] ? syscall_return_slowpath+0xe6/0x140 [ 2747.985706] ? do_page_fault+0x30/0x80 [ 2747.989887] ret_from_fork+0x2c/0x40 [ 2747.993874] Object at ffff8801395ae690, in cache buffer_head size: 104 [ 2748.001155] Allocated: [ 2748.003782] PID = 8327 [ 2748.006411] save_stack_trace+0x1b/0x20 [ 2748.010688] save_stack+0x46/0xd0 [ 2748.014383] kasan_kmalloc+0xad/0xe0 [ 2748.018370] kasan_slab_alloc+0x12/0x20 [ 2748.022648] kmem_cache_alloc+0xb8/0x1b0 [ 2748.027024] alloc_buffer_head+0x22/0xc0 [ 2748.031399] alloc_page_buffers+0xd1/0x250 [ 2748.035968] create_empty_buffers+0x30/0x410 [ 2748.040730] create_page_buffers+0x120/0x1b0 [ 2748.045493] __block_write_begin_int+0x17a/0x1800 [ 2748.050740] iomap_write_begin+0x100/0x2f0 [ 2748.055308] iomap_zero_range_actor+0x253/0x5c0 [ 2748.060362] iomap_apply+0x157/0x270 [ 2748.064347] iomap_zero_range+0x5a/0x80 [ 2748.068624] iomap_truncate_page+0x6b/0xa0 [ 2748.073227] xfs_setattr_size+0x1f7/0xa10 [xfs] [ 2748.078312] xfs_vn_setattr_size+0x68/0x140 [xfs] [ 2748.083589] xfs_file_fallocate+0x4ac/0x820 [xfs] [ 2748.088838] vfs_fallocate+0x2cf/0x780 [ 2748.093021] SyS_fallocate+0x48/0x80 [ 2748.097006] do_syscall_64+0x18a/0x430 [ 2748.101186] return_from_SYSCALL_64+0x0/0x6a [ 2748.105948] Freed: [ 2748.108189] PID = 8327 [ 2748.110816] save_stack_trace+0x1b/0x20 [ 2748.115093] save_stack+0x46/0xd0 [ 2748.118788] kasan_slab_free+0x73/0xc0 [ 2748.122969] kmem_cache_free+0x7a/0x200 [ 2748.127247] free_buffer_head+0x41/0x80 [ 2748.131524] try_to_free_buffers+0x178/0x250 [ 2748.136316] xfs_vm_releasepage+0x2e9/0x3d0 [xfs] [ 2748.141563] try_to_release_page+0x100/0x180 [ 2748.146325] invalidate_inode_pages2_range+0x7da/0xcf0 [ 2748.152087] xfs_shift_file_space+0x37d/0x6e0 [xfs] [ 2748.157557] xfs_collapse_file_space+0x49/0x120 [xfs] [ 2748.163223] xfs_file_fallocate+0x2a7/0x820 [xfs] [ 2748.168462] vfs_fallocate+0x2cf/0x780 [ 2748.172642] SyS_fallocate+0x48/0x80 [ 2748.176629] do_syscall_64+0x18a/0x430 [ 2748.180810] return_from_SYSCALL_64+0x0/0x6a Fixed it by checking on offset against end & breaking out first, dereference bh only if there're still bufferheads to process. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-05-05Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This is a set of small fixes that were mostly stumbled over during more significant development. This proc fix and the fix to posix-timers are the most significant of the lot. There is a lot of good development going on but unfortunately it didn't quite make the merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Fix unbalanced hard link numbers signal: Make kill_proc_info static rlimit: Properly call security_task_setrlimit signal: Remove unused definition of sig_user_definied ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET ipc: Remove unused declaration of recompute_msgmni posix-timers: Correct sanity check in posix_cpu_nsleep sysctl: Remove dead register_sysctl_root
2017-05-05nfs: use kmap/kunmap directlyFabian Frederick
This patch removes useless nfs_readdir_get_array() and nfs_readdir_release_array() as suggested by Trond Myklebust nfs_readdir() calls nfs_revalidate_mapping() before readdir_search_pagecache() , nfs_do_filldir(), uncached_readdir() so mapping should be correct. While kmap() can't fail, all subsequent error checks were removed as well as unused labels. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-05NFS: always treat the invocation of nfs_getattr as cache hit when noac is onHou Tao
When using 'ls -l' to display a large directory, if noac option is used, in function nfs_getattr() nfs_need_revalidate_inode() will always be true for NFSv3 and the nfs_entry cache of the directory will be flushed. The flush will lead to a fully reread of the directory entries from server. To prevent the unnecessary RPCs, we need to check whether or not the noac option is used, and always report the invocation of nfs_getattr() as cache hit instead cache miss when it's on. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-05Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and ↵Dave Wysochanski
nfs4_proc_async_renew If memory allocation fails for the callback data, we need to put the nfs_client or we end up with an elevated refcount. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2017-05-05NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSIONTrond Myklebust
If the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION because we are trunking, then RECLAIM_COMPLETE must handle that by calling nfs4_schedule_session_recovery() and then retrying. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Chuck Lever <chuck.lever@oracle.com>
2017-05-05CIFS: add misssing SFM mapping for doublequoteBjörn Jacke
SFM is mapping doublequote to 0xF020 Without this patch creating files with doublequote fails to Windows/Mac Signed-off-by: Bjoern Jacke <bjacke@samba.org> Signed-off-by: Steve French <smfrench@gmail.com> CC: stable <stable@vger.kernel.org>
2017-05-05befs: make export work with cold dcacheFabian Frederick
based on commit b3b42c0deaa1 ("fs/affs: make export work with cold dcache") This adds get_parent function so that nfs client can still work after cache drop (Tested on NFS v4 with echo 3 > /proc/sys/vm/drop_caches) Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Luis de Bethencourt <luisbg@osg.samsung.com>
2017-05-05ovl: persistent inode numbers for upper hardlinksAmir Goldstein
An upper type non directory dentry that is a copy up target should have a reference to its lower copy up origin. There are three ways for an upper type dentry to be instantiated: 1. A lower type dentry that is being copied up 2. An entry that is found in upper dir by ovl_lookup() 3. A negative dentry is hardlinked to an upper type dentry In the first case, the lower reference is set before copy up. In the second case, the lower reference is found by ovl_lookup(). In the last case of hardlinked upper dentry, it is not easy to update the lower reference of the negative dentry. Instead, drop the newly hardlinked negative dentry from dcache and let the next access call ovl_lookup() to find its lower reference. This makes sure that the inode number reported by stat(2) after the hardlink is created is the same inode number that will be reported by stat(2) after mount cycle, which is the inode number of the lower copy up origin of the hardlink source. NOTE that this does not fix breaking of lower hardlinks on copy up, but only fixes the case of lower nlink == 1, whose upper copy up inode is hardlinked in upper dir. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: merge getattr for dir and nondirMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: constant st_ino/st_dev across copy upAmir Goldstein
When all layers are on the same underlying filesystem, let stat(2) return st_dev/st_ino values of the copy up origin inode if it is known. This results in constant st_ino/st_dev representation of files in an overlay mount before and after copy up. When the underlying filesystem support NFS exportfs, the result is also persistent st_ino/st_dev representation before and after mount cycle. Lower hardlinks are broken on copy up to different upper files, so we cannot use the lower origin st_ino for those different files, even for the same fs case. When all overlay layers are on the same fs, use overlay st_dev for non-dirs to get the correct result from du -x. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: persistent inode number for directoriesAmir Goldstein
stat(2) on overlay directories reports the overlay temp inode number, which is constant across copy up, but is not persistent. When all layers are on the same fs, report the copy up origin inode number for directories. This inode number is persistent, unique across the overlay mount and constant across copy up. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: set the ORIGIN type flagAmir Goldstein
For directory entries, non zero oe->numlower implies OVL_TYPE_MERGE. Define a new type flag OVL_TYPE_ORIGIN to indicate that an entry holds a reference to its lower copy up origin. For directory entries ORIGIN := MERGE && UPPER. For non-dir entries ORIGIN means that a lower type dentry has been recently copied up or that we were able to find the copy up origin from overlay.origin xattr. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: lookup non-dir copy-up-origin by file handleAmir Goldstein
If overlay.origin xattr is found on a non-dir upper inode try to get lower dentry by calling exportfs_decode_fh(). On failure to lookup by file handle to lower layer, do not lookup the copy up origin by name, because the lower found by name could be another file in case the upper file was renamed. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: use an auxiliary var for overlay root entryAmir Goldstein
Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: store file handle of lower inode on copy upAmir Goldstein
Sometimes it is interesting to know if an upper file is pure upper or a copy up target, and if it is a copy up target, it may be interesting to find the copy up origin. This will be used to preserve lower inode numbers across copy up. Store the lower inode file handle in upper inode extended attribute overlay.origin on copy up to use it later for these cases. Store the lower filesystem uuid along side the file handle, so we can validate that we are looking for the origin file in the original fs. If lower fs does not support NFS export ops store a zero sized xattr so we can always use the overlay.origin xattr to distinguish between a copy up and a pure upper inode. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-05ovl: check if all layers are on the same fsAmir Goldstein
Some features can only work when all layers are on the same fs. Test this condition during mount time, so features can check them later. Add helper ovl_same_sb() to return the common super block in case all layers are on the same fs. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2017-05-04Merge tag 'char-misc-4.12-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of new char/misc driver drivers and features for 4.12-rc1. There's lots of new drivers added this time around, new firmware drivers from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and a bunch of other driver updates. Nothing major, except if you happen to have the hardware for these drivers, and then you will be happy :) All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) firmware: google memconsole: Fix return value check in platform_memconsole_init() firmware: Google VPD: Fix return value check in vpd_platform_init() goldfish_pipe: fix build warning about using too much stack. goldfish_pipe: An implementation of more parallel pipe fpga fr br: update supported version numbers fpga: region: release FPGA region reference in error path fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe() mei: drop the TODO from samples firmware: Google VPD sysfs driver firmware: Google VPD: import lib_vpd source files misc: lkdtm: Add volatile to intentional NULL pointer reference eeprom: idt_89hpesx: Add OF device ID table misc: ds1682: Add OF device ID table misc: tsl2550: Add OF device ID table w1: Remove unneeded use of assert() and remove w1_log.h w1: Use kernel common min() implementation uio_mf624: Align memory regions to page size and set correct offsets uio_mf624: Refactor memory info initialization uio: Allow handling of non page-aligned memory regions hangcheck-timer: Fix typo in comment ...
2017-05-04btrfs: fix the gfp_mask for the reada_zones radix treeChris Mason
Commits cc8385b59e17 and 7ef70b4d9987a7 added preallocation for the reada radix trees and also switched them over to GFP_KERNEL for the default gfp mask. Since we're doing radix tree insertions under spinlocks, we need to make sure the mask doesn't allow sleeping. This fix keeps the radix preallocation but switches back to the original gfp_mask. Reported-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2017-05-04orangefs: count directory pieces correctlyMartin Brandenburg
A large directory full of differently sized file names triggered this. Most directories, even very large directories with shorter names, would be lucky enough to fit in one server response. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-05-04orangefs: invalidate stored directory on seekMartin Brandenburg
If an application seeks to a position before the point which has been read, it must want updates which have been made to the directory. So delete the copy stored in the kernel so it will be fetched again. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-05-04orangefs: skip forward to the next directory entry if seek is shortMartin Brandenburg
If userspace seeks to a position in the stream which is not correct, it would have returned EIO because the data in the buffer at that offset would be incorrect. This and the userspace daemon returning a corrupt directory are indistinguishable. Now if the data does not look right, skip forward to the next chunk and try again. The motivation is that if the directory changes, an application may seek to a position that was valid and no longer is valid. It is not yet possible for a directory to change. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-05-04ext4: clean up ext4_match() and callersEric Biggers
When ext4 encryption was originally merged, we were encrypting the user-specified filename in ext4_match(), introducing a lot of additional complexity into ext4_match() and its callers. This has since been changed to encrypt the filename earlier, so we can remove the gunk that's no longer needed. This more or less reverts ext4_search_dir() and ext4_find_dest_de() to the way they were in the v4.0 kernel. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-04f2fs: switch to using fscrypt_match_name()Eric Biggers
Switch f2fs directory searches to use the fscrypt_match_name() helper function. There should be no functional change. Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-04ext4: switch to using fscrypt_match_name()Eric Biggers
Switch ext4 directory searches to use the fscrypt_match_name() helper function. There should be no functional change. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-04fscrypt: introduce helper function for filename matchingEric Biggers
Introduce a helper function fscrypt_match_name() which tests whether a fscrypt_name matches a directory entry. Also clean up the magic numbers and document things properly. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-04fscrypt: avoid collisions when presenting long encrypted filenamesEric Biggers
When accessing an encrypted directory without the key, userspace must operate on filenames derived from the ciphertext names, which contain arbitrary bytes. Since we must support filenames as long as NAME_MAX, we can't always just base64-encode the ciphertext, since that may make it too long. Currently, this is solved by presenting long names in an abbreviated form containing any needed filesystem-specific hashes (e.g. to identify a directory block), then the last 16 bytes of ciphertext. This needs to be sufficient to identify the actual name on lookup. However, there is a bug. It seems to have been assumed that due to the use of a CBC (ciphertext block chaining)-based encryption mode, the last 16 bytes (i.e. the AES block size) of ciphertext would depend on the full plaintext, preventing collisions. However, we actually use CBC with ciphertext stealing (CTS), which handles the last two blocks specially, causing them to appear "flipped". Thus, it's actually the second-to-last block which depends on the full plaintext. This caused long filenames that differ only near the end of their plaintexts to, when observed without the key, point to the wrong inode and be undeletable. For example, with ext4: # echo pass | e4crypt add_key -p 16 edir/ # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l 100000 # sync # echo 3 > /proc/sys/vm/drop_caches # keyctl new_session # find edir/ -type f | xargs stat -c %i | sort | uniq | wc -l 2004 # rm -rf edir/ rm: cannot remove 'edir/_A7nNFi3rhkEQlJ6P,hdzluhODKOeWx5V': Structure needs cleaning ... To fix this, when presenting long encrypted filenames, encode the second-to-last block of ciphertext rather than the last 16 bytes. Although it would be nice to solve this without depending on a specific encryption mode, that would mean doing a cryptographic hash like SHA-256 which would be much less efficient. This way is sufficient for now, and it's still compatible with encryption modes like HEH which are strong pseudorandom permutations. Also, changing the presented names is still allowed at any time because they are only provided to allow applications to do things like delete encrypted directories. They're not designed to be used to persistently identify files --- which would be hard to do anyway, given that they're encrypted after all. For ease of backports, this patch only makes the minimal fix to both ext4 and f2fs. It leaves ubifs as-is, since ubifs doesn't compare the ciphertext block yet. Follow-on patches will clean things up properly and make the filesystems use a shared helper function. Fixes: 5de0b4d0cd15 ("ext4 crypto: simplify and speed up filename encryption") Reported-by: Gwendal Grignou <gwendal@chromium.org> Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-05-04f2fs: check entire encrypted bigname when finding a dentryJaegeuk Kim
If user has no key under an encrypted dir, fscrypt gives digested dentries. Previously, when looking up a dentry, f2fs only checks its hash value with first 4 bytes of the digested dentry, which didn't handle hash collisions fully. This patch enhances to check entire dentry bytes likewise ext4. Eric reported how to reproduce this issue by: # seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch # find edir -type f | xargs stat -c %i | sort | uniq | wc -l 100000 # sync # echo 3 > /proc/sys/vm/drop_caches # keyctl new_session # find edir -type f | xargs stat -c %i | sort | uniq | wc -l 99999 Cc: <stable@vger.kernel.org> Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> (fixed f2fs_dentry_hash() to work even when the hash is 0) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>