summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-02-06Merge tag 'libnvdimm-for-4.16' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Ross Zwisler: - Require struct page by default for filesystem DAX to remove a number of surprising failure cases. This includes failures with direct I/O, gdb and fork(2). - Add support for the new Platform Capabilities Structure added to the NFIT in ACPI 6.2a. This new table tells us whether the platform supports flushing of CPU and memory controller caches on unexpected power loss events. - Revamp vmem_altmap and dev_pagemap handling to clean up code and better support future future PCI P2P uses. - Deprecate the ND_IOCTL_SMART_THRESHOLD command whose payload has become out-of-sync with recent versions of the NVDIMM_FAMILY_INTEL spec, and instead rely on the generic ND_CMD_CALL approach used by the two other IOCTL families, NVDIMM_FAMILY_{HPE,MSFT}. - Enhance nfit_test so we can test some of the new things added in version 1.6 of the DSM specification. This includes testing firmware download and simulating the Last Shutdown State (LSS) status. * tag 'libnvdimm-for-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (37 commits) libnvdimm, namespace: remove redundant initialization of 'nd_mapping' acpi, nfit: fix register dimm error handling libnvdimm, namespace: make min namespace size 4K tools/testing/nvdimm: force nfit_test to depend on instrumented modules libnvdimm/nfit_test: adding support for unit testing enable LSS status libnvdimm/nfit_test: add firmware download emulation nfit-test: Add platform cap support from ACPI 6.2a to test libnvdimm: expose platform persistence attribute for nd_region acpi: nfit: add persistent memory control flag for nd_region acpi: nfit: Add support for detect platform CPU cache flush on power loss device-dax: Fix trailing semicolon libnvdimm, btt: fix uninitialized err_lock dax: require 'struct page' by default for filesystem dax ext2: auto disable dax instead of failing mount ext4: auto disable dax instead of failing mount mm, dax: introduce pfn_t_special() mm: Fix devm_memremap_pages() collision handling mm: Fix memory size alignment in devm_memremap_pages_release() memremap: merge find_dev_pagemap into get_dev_pagemap memremap: change devm_memremap_pages interface to use struct dev_pagemap ...
2018-02-06afs: Support the AFS dynamic rootDavid Howells
Support the AFS dynamic root which is a pseudo-volume that doesn't connect to any server resource, but rather is just a root directory that dynamically creates mountpoint directories where the name of such a directory is the name of the cell. Such a mount can be created thus: mount -t afs none /afs -o dyn Dynamic root superblocks aren't shared except by bind mounts and propagation. Cell root volumes can then be mounted by referring to them by name, e.g.: ls /afs/grand.central.org/ ls /afs/.grand.central.org/ The kernel will upcall to consult the DNS if the address wasn't supplied directly. Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06afs: Rearrange afs_select_fileserver() a littleDavid Howells
Rearrange afs_select_fileserver() a little to put the use_server chunk before the next_server chunk so that with the removal of a couple of gotos the main path through the function is all one sequence. Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06afs: Remove unused codeDavid Howells
Remove some old unused code. Signed-off-by: David Howells <dhowells@redhat.com>
2018-02-06afs: Fix server list handlingDavid Howells
Fix server list handling in the following ways: (1) In afs_alloc_volume(), remove duplicate server list build code. This was already done by afs_alloc_server_list() which afs_alloc_volume() previously called. This just results in twice as many VL RPCs. (2) In afs_deliver_vl_get_entry_by_name_u(), use the number of server records indicated by ->nServers in the UVLDB record returned by the VL.GetEntryByNameU RPC call rather than scanning all NMAXNSERVERS slots. Unused slots may contain garbage. (3) In afs_alloc_server_list(), don't stop converting a UVLDB record into a server list just because we can't look up one of the servers. Just skip that server and go on to the next. If we can't look up any of the servers then we'll fail at the end. Without this patch, an attempt to view the umich.edu root cell using something like "ls /afs/umich.edu" on a dynamic root (future patch) mount or an autocell mount will result in ENOMEDIUM. The failure is due to kafs not stopping after nServers'worth of records have been read, but then trying to access a server with a garbage UUID and getting an error, which aborts the server list build. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Jonathan Billings <jsbillings@jsbillings.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
2018-02-06afs: Need to clear responded flag in addr cursorDavid Howells
In afs_select_fileserver(), we need to clear the ->responded flag in the address list when reusing it. We should also clear it in afs_select_current_fileserver(). To this end, just memset() the object before initialising it. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
2018-02-06afs: Fix missing cursor clearanceDavid Howells
afs_select_fileserver() ends the address cursor it is using in the case in which we get some sort of network error and run out of addresses to iterate through, before it jumps to try the next server. This also needs to be done when the server aborts with some sort of error that means we should try the next server. Fix this by: (1) Move the iterate_address afs_end_cursor() call to the next_server case. (2) End the cursor in the failed case. (3) Make afs_end_cursor() clear the ->begun flag and ->addr pointer in the address cursor. (4) Make afs_end_cursor() able to be called on an already cleared cursor. Without this, something like the following oops may occur: AFS: Assertion failed 18446612134397189888 == 0 is false 0xffff88007c279f00 == 0x0 is false ------------[ cut here ]------------ kernel BUG at fs/afs/rotate.c:360! RIP: 0010:afs_select_fileserver+0x79b/0xa30 [kafs] Call Trace: afs_statfs+0xcc/0x180 [kafs] ? p9_client_statfs+0x9e/0x110 [9pnet] ? _cond_resched+0x19/0x40 statfs_by_dentry+0x6d/0x90 vfs_statfs+0x1b/0xc0 user_statfs+0x4b/0x80 SYSC_statfs+0x15/0x30 SyS_statfs+0xe/0x10 entry_SYSCALL_64_fastpath+0x20/0x83 Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
2018-02-06afs: Add missing afs_put_cell()David Howells
afs_alloc_volume() needs to release the cell ref it obtained in the case of an error. Fix this by adding an afs_put_cell() call into the error path. This can triggered when a lookup for a cell in a dynamic root or an autocell mount returns an error whilst trying to look up the server (such as ENOMEDIUM). This results in an assertion failure oops when the module is unloaded due to outstanding refs on a cell record. Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: David Howells <dhowells@redhat.com> cc: stable@vger.kernel.org
2018-02-05nfsd4: don't set lock stateid's sc_type to CLOSEDJ. Bruce Fields
There's no point I can see to stp->st_stid.sc_type = NFS4_CLOSED_STID; given release_lock_stateid immediately sets sc_type to 0. That set of sc_type to 0 should be enough to prevent it being used where we don't want it to be; NFS4_CLOSED_STID should only be needed for actual open stateid's that are actually closed. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-05nfsd: Detect unhashed stids in nfsd4_verify_open_stid()Trond Myklebust
The state of the stid is guaranteed by 2 locks: - The nfs4_client 'cl_lock' spinlock - The nfs4_ol_stateid 'st_mutex' mutex so it is quite possible for the stid to be unhashed after lookup, but before calling nfsd4_lock_ol_stateid(). So we do need to check for a zero value for 'sc_type' in nfsd4_verify_open_stid(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Checuk Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Fixes: 659aefb68eca "nfsd: Ensure we don't recognise lock stateids..." Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2018-02-05Merge tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "As promised, here's a (much smaller) second pull request for the second week of the merge cycle. This time around we have a couple patches shutting off unsupported fs configurations, and a couple of cleanups. Last, we turn off EXPERIMENTAL for the reverse mapping btree, since the primary downstream user of that information (online fsck) is now upstream and I haven't seen any major failures in a few kernel releases. Summary: - Print scrub build status in the xfs build info. - Explicitly call out the remaining two scenarios where we don't support reflink and never have. - Remove EXPERIMENTAL tag from reverse mapping btree!" * tag 'xfs-4.16-merge-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: remove experimental tag for reverse mapping xfs: don't allow reflink + realtime filesystems xfs: don't allow DAX on reflink filesystems xfs: add scrub to XFS_BUILD_OPTIONS xfs: fix u32 type usage in sb validation function
2018-02-05Merge branch 'overlayfs-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs updates from Miklos Szeredi: "This work from Amir adds NFS export capability to overlayfs. NFS exporting an overlay filesystem is a challange because we want to keep track of any copy-up of a file or directory between encoding the file handle and decoding it. This is achieved by indexing copied up objects by lower layer file handle. The index is already used for hard links, this patchset extends the use to NFS file handle decoding" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (51 commits) ovl: check ERR_PTR() return value from ovl_encode_fh() ovl: fix regression in fsnotify of overlay merge dir ovl: wire up NFS export operations ovl: lookup indexed ancestor of lower dir ovl: lookup connected ancestor of dir in inode cache ovl: hash non-indexed dir by upper inode for NFS export ovl: decode pure lower dir file handles ovl: decode indexed dir file handles ovl: decode lower file handles of unlinked but open files ovl: decode indexed non-dir file handles ovl: decode lower non-dir file handles ovl: encode lower file handles ovl: copy up before encoding non-connectable dir file handle ovl: encode non-indexed upper file handles ovl: decode connected upper dir file handles ovl: decode pure upper file handles ovl: encode pure upper file handles ovl: document NFS export vfs: factor out helpers d_instantiate_anon() and d_alloc_anon() ovl: store 'has_upper' and 'opaque' as bit flags ...
2018-02-05btrfs: Fix use-after-free when cleaning up fs_devs with a single stale deviceNikolay Borisov
Commit 4fde46f0cc71 ("Btrfs: free the stale device") introduced btrfs_free_stale_device which iterates the device lists for all registered btrfs filesystems and deletes those devices which aren't mounted. In a btrfs_devices structure has only 1 device attached to it and it is unused then btrfs_free_stale_devices will proceed to also free the btrfs_fs_devices struct itself. Currently this leads to a use after free since list_for_each_entry will try to perform a check on the already freed memory to see if it has to terminate the loop. The fix is to use 'break' when we know we are freeing the current fs_devs. Fixes: 4fde46f0cc71 ("Btrfs: free the stale device") Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-05ovl: check ERR_PTR() return value from ovl_encode_fh()Amir Goldstein
Another fix for an issue reported by 0-day robot. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 8ed5eec9d6c4 ("ovl: encode pure upper file handles") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-05ovl: fix regression in fsnotify of overlay merge dirAmir Goldstein
A re-factoring patch in NFS export series has passed the wrong argument to ovl_get_inode() causing a regression in the very recent fix to fsnotify of overlay merge dir. The regression has caused merge directory inodes to be hashed by upper instead of lower real inode, when NFS export and directory indexing is disabled. That caused an inotify watch to become obsolete after directory copy up and drop caches. LTP test inotify07 was improved to catch this regression. The regression also caused multiple redirect dirs to same origin not to be detected on lookup with NFS export disabled. An xfstest was added to cover this case. Fixes: 0aceb53e73be ("ovl: do not pass overlay dentry to ovl_get_inode()") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-02-04Merge tag 'fscrypt_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt Pull fscrypt updates from Ted Ts'o: "Refactor support for encrypted symlinks to move common code to fscrypt" Ted also points out about the merge: "This makes the f2fs symlink code use the fscrypt_encrypt_symlink() from the fscrypt tree. This will end up dropping the kzalloc() -> f2fs_kzalloc() change, which means the fscrypt-specific allocation won't get tested by f2fs's kmalloc error injection system; which is fine" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/fscrypt: (26 commits) fscrypt: fix build with pre-4.6 gcc versions fscrypt: remove 'ci' parameter from fscrypt_put_encryption_info() fscrypt: document symlink length restriction fscrypt: fix up fscrypt_fname_encrypted_size() for internal use fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names fscrypt: calculate NUL-padding length in one place only fscrypt: move fscrypt_symlink_data to fscrypt_private.h fscrypt: remove fscrypt_fname_usr_to_disk() ubifs: switch to fscrypt_get_symlink() ubifs: switch to fscrypt ->symlink() helper functions ubifs: free the encrypted symlink target f2fs: switch to fscrypt_get_symlink() f2fs: switch to fscrypt ->symlink() helper functions ext4: switch to fscrypt_get_symlink() ext4: switch to fscrypt ->symlink() helper functions fscrypt: new helper function - fscrypt_get_symlink() fscrypt: new helper functions for ->symlink() fscrypt: trim down fscrypt.h includes fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h ...
2018-02-03Merge tag 'usercopy-v4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardened usercopy whitelisting from Kees Cook: "Currently, hardened usercopy performs dynamic bounds checking on slab cache objects. This is good, but still leaves a lot of kernel memory available to be copied to/from userspace in the face of bugs. To further restrict what memory is available for copying, this creates a way to whitelist specific areas of a given slab cache object for copying to/from userspace, allowing much finer granularity of access control. Slab caches that are never exposed to userspace can declare no whitelist for their objects, thereby keeping them unavailable to userspace via dynamic copy operations. (Note, an implicit form of whitelisting is the use of constant sizes in usercopy operations and get_user()/put_user(); these bypass all hardened usercopy checks since these sizes cannot change at runtime.) This new check is WARN-by-default, so any mistakes can be found over the next several releases without breaking anyone's system. The series has roughly the following sections: - remove %p and improve reporting with offset - prepare infrastructure and whitelist kmalloc - update VFS subsystem with whitelists - update SCSI subsystem with whitelists - update network subsystem with whitelists - update process memory with whitelists - update per-architecture thread_struct with whitelists - update KVM with whitelists and fix ioctl bug - mark all other allocations as not whitelisted - update lkdtm for more sensible test overage" * tag 'usercopy-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (38 commits) lkdtm: Update usercopy tests for whitelisting usercopy: Restrict non-usercopy caches to size 0 kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl kvm: whitelist struct kvm_vcpu_arch arm: Implement thread_struct whitelist for hardened usercopy arm64: Implement thread_struct whitelist for hardened usercopy x86: Implement thread_struct whitelist for hardened usercopy fork: Provide usercopy whitelisting for task_struct fork: Define usercopy region in thread_stack slab caches fork: Define usercopy region in mm_struct slab caches net: Restrict unwhitelisted proto caches to size 0 sctp: Copy struct sctp_sock.autoclose to userspace using put_user() sctp: Define usercopy region in SCTP proto slab cache caif: Define usercopy region in caif proto slab cache ip: Define usercopy region in IP proto slab cache net: Define usercopy region in struct proto slab cache scsi: Define usercopy region in scsi_sense_cache slab cache cifs: Define usercopy region in cifs_request slab cache vxfs: Define usercopy region in vxfs_inode slab cache ufs: Define usercopy region in ufs_inode_cache slab cache ...
2018-02-03Merge tag 'pstore-v4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore update from Kees Cook: "Only a header cleanup this release; nice and quiet. :) - clean up hardirq header usage (Yang Shi)" * tag 'pstore-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fs: pstore: remove unused hardirq.h
2018-02-03Merge tag 'ext4_for_linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Only miscellaneous cleanups and bug fixes for ext4 this cycle" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: create ext4_kset dynamically ext4: create ext4_feat kobject dynamically ext4: release kobject/kset even when init/register fail ext4: fix incorrect indentation of if statement ext4: correct documentation for grpid mount option ext4: use 'sbi' instead of 'EXT4_SB(sb)' ext4: save error to disk in __ext4_grp_locked_error() jbd2: fix sphinx kernel-doc build warnings ext4: fix a race in the ext4 shutdown path mbcache: make sure c_entry_count is not decremented past zero ext4: no need flush workqueue before destroying it ext4: fixed alignment and minor code cleanup in ext4.h ext4: fix ENOSPC handling in DAX page fault handler dax: pass detailed error code from dax_iomap_fault() mbcache: revert "fs/mbcache.c: make count_objects() more robust" mbcache: initialize entry->e_referenced in mb_cache_entry_create() ext4: fix up remaining files with SPDX cleanups
2018-02-03Merge tag 'gfs2-4.16.fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull GFS2 fixes from Bob Peterson: "Andreas Gruenbacher wrote two additional patches that we would like merged in this time. Both are regressions: - fix another kernel build dependency problem - fix a performance regression in glock dumps" * tag 'gfs2-4.16.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: Glock dump performance regression fix gfs2: Fix the crc32c dependency
2018-02-03Merge branch 'for-4.16/nfit' into libnvdimm-for-nextRoss Zwisler
2018-02-02Btrfs: fix null pointer dereference when replacing missing deviceFilipe Manana
When we are replacing a missing device we mount the filesystem with the degraded mode option in which case we are allowed to have a btrfs device structure without a backing device member (its bdev member is NULL) and therefore we can't dereference that member. Commit 38b5f68e9811 ("btrfs: drop btrfs_device::can_discard to query directly") started to dereference that member when discarding extents, resulting in a null pointer dereference: [ 3145.322257] BTRFS warning (device sdf): devid 2 uuid 4d922414-58eb-4880-8fed-9c3840f6c5d5 is missing [ 3145.364116] BTRFS info (device sdf): dev_replace from <missing disk> (devid 2) to /dev/sdg started [ 3145.413489] BUG: unable to handle kernel NULL pointer dereference at 00000000000000e0 [ 3145.415085] IP: btrfs_discard_extent+0x6a/0xf8 [btrfs] [ 3145.415085] PGD 0 P4D 0 [ 3145.415085] Oops: 0000 [#1] PREEMPT SMP PTI [ 3145.415085] Modules linked in: ppdev ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper evdev psmouse parport_pc serio_raw i2c_piix4 i2 [ 3145.415085] CPU: 0 PID: 11989 Comm: btrfs Tainted: G W 4.15.0-rc9-btrfs-next-55+ #1 [ 3145.415085] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014 [ 3145.415085] RIP: 0010:btrfs_discard_extent+0x6a/0xf8 [btrfs] [ 3145.415085] RSP: 0018:ffffc90004813c60 EFLAGS: 00010293 [ 3145.415085] RAX: ffff88020d39cc00 RBX: ffff88020c4ea2a0 RCX: 0000000000000002 [ 3145.415085] RDX: 0000000000000000 RSI: ffff88020c4ea240 RDI: 0000000000000000 [ 3145.415085] RBP: 0000000000000000 R08: 0000000000004000 R09: 0000000000000000 [ 3145.415085] R10: ffffc90004813ae8 R11: 0000000000000000 R12: 0000000000000000 [ 3145.415085] R13: ffff88020c418000 R14: 0000000000000000 R15: 0000000000000000 [ 3145.415085] FS: 00007f565681f8c0(0000) GS:ffff88023fc00000(0000) knlGS:0000000000000000 [ 3145.415085] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3145.415085] CR2: 00000000000000e0 CR3: 000000020d208006 CR4: 00000000001606f0 [ 3145.415085] Call Trace: [ 3145.415085] btrfs_finish_extent_commit+0x9a/0x1be [btrfs] [ 3145.415085] btrfs_commit_transaction+0x649/0x7a0 [btrfs] [ 3145.415085] ? start_transaction+0x2b0/0x3b3 [btrfs] [ 3145.415085] btrfs_dev_replace_start+0x274/0x30c [btrfs] [ 3145.415085] btrfs_dev_replace_by_ioctl+0x45/0x59 [btrfs] [ 3145.415085] btrfs_ioctl+0x1a91/0x1d62 [btrfs] [ 3145.415085] ? lock_acquire+0x16a/0x1af [ 3145.415085] ? vfs_ioctl+0x1b/0x28 [ 3145.415085] ? trace_hardirqs_on_caller+0x14c/0x1a6 [ 3145.415085] vfs_ioctl+0x1b/0x28 [ 3145.415085] do_vfs_ioctl+0x5a9/0x5e0 [ 3145.415085] ? _raw_spin_unlock_irq+0x34/0x46 [ 3145.415085] ? entry_SYSCALL_64_fastpath+0x5/0x8b [ 3145.415085] ? trace_hardirqs_on_caller+0x14c/0x1a6 [ 3145.415085] SyS_ioctl+0x52/0x76 [ 3145.415085] entry_SYSCALL_64_fastpath+0x1e/0x8b [ 3145.415085] RIP: 0033:0x7f56558b3c47 [ 3145.415085] RSP: 002b:00007ffdcfac4c58 EFLAGS: 00000202 [ 3145.415085] Code: be 02 00 00 00 4c 89 ef e8 b9 e7 03 00 85 c0 89 c5 75 75 48 8b 44 24 08 45 31 f6 48 8d 58 60 eb 52 48 8b 03 48 8b b8 a0 00 00 00 <48> 8b 87 e0 00 [ 3145.415085] RIP: btrfs_discard_extent+0x6a/0xf8 [btrfs] RSP: ffffc90004813c60 [ 3145.415085] CR2: 00000000000000e0 [ 3145.458185] ---[ end trace 06302e7ac31902bf ]--- This is trivially reproduced by running the test btrfs/027 from fstests like this: $ MOUNT_OPTIONS="-o discard" ./check btrfs/027 Fix this by skipping devices without a backing device before attempting to discard. Fixes: 38b5f68e9811 ("btrfs: drop btrfs_device::can_discard to query directly") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodesZygo Blaxell
Until v4.14, this warning was very infrequent: WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 find_parent_nodes+0xc41/0x14e0 Modules linked in: [...] CPU: 3 PID: 18172 Comm: bees Tainted: G D W L 4.11.9-zb64+ #1 Hardware name: System manufacturer System Product Name/M5A78L-M/USB3, BIOS 2101 12/02/2014 Call Trace: dump_stack+0x85/0xc2 __warn+0xd1/0xf0 warn_slowpath_null+0x1d/0x20 find_parent_nodes+0xc41/0x14e0 __btrfs_find_all_roots+0xad/0x120 ? extent_same_check_offsets+0x70/0x70 iterate_extent_inodes+0x168/0x300 iterate_inodes_from_logical+0x87/0xb0 ? iterate_inodes_from_logical+0x87/0xb0 ? extent_same_check_offsets+0x70/0x70 btrfs_ioctl+0x8ac/0x2820 ? lock_acquire+0xc2/0x200 do_vfs_ioctl+0x91/0x700 ? __fget+0x112/0x200 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x23/0xc6 ? trace_hardirqs_off_caller+0x1f/0x140 Starting with v4.14 (specifically 86d5f9944252 ("btrfs: convert prelimary reference tracking to use rbtrees")) the WARN_ON occurs three orders of magnitude more frequently--almost once per second while running workloads like bees. Replace the WARN_ON() with a comment rationale for its removal. The rationale is paraphrased from an explanation by Edmund Nadolski <enadolski@suse.de> on the linux-btrfs mailing list. Fixes: 8da6d5815c59 ("Btrfs: added btrfs_find_all_roots()") Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Reviewed-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02btrfs: Ignore errors from btrfs_qgroup_trace_extent_postNikolay Borisov
Running generic/019 with qgroups on the scratch device enabled is almost guaranteed to trigger the BUG_ON in btrfs_free_tree_block. It's supposed to trigger only on -ENOMEM, in reality, however, it's possible to get -EIO from btrfs_qgroup_trace_extent_post. This function just finds the roots of the extent being tracked and sets the qrecord->old_roots list. If this operation fails nothing critical happens except the quota accounting can be considered wrong. In such case just set the INCONSISTENT flag for the quota and print a warning, rather than killing off the system. Additionally, it's possible to trigger a BUG_ON in btrfs_truncate_inode_items as well. Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ error message adjustments ] Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02Btrfs: fix unexpected -EEXIST when creating new inodeLiu Bo
The highest objectid, which is assigned to new inode, is decided at the time of initializing fs roots. However, in cases where log replay gets processed, the btree which fs root owns might be changed, so we have to search it again for the highest objectid, otherwise creating new inode would end up with -EEXIST. cc: <stable@vger.kernel.org> v4.4-rc6+ Fixes: f32e48e92596 ("Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02Btrfs: fix use-after-free on root->orphan_block_rsvLiu Bo
I got these from running generic/475, WARNING: CPU: 0 PID: 26384 at fs/btrfs/inode.c:3326 btrfs_orphan_commit_root+0x1ac/0x2b0 [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: btrfs_block_rsv_release+0x1c/0x70 [btrfs] Call Trace: btrfs_orphan_release_metadata+0x9f/0x200 [btrfs] btrfs_orphan_del+0x10d/0x170 [btrfs] btrfs_setattr+0x500/0x640 [btrfs] notify_change+0x7ae/0x870 do_truncate+0xca/0x130 vfs_truncate+0x2ee/0x3d0 do_sys_truncate+0xaf/0xf0 SyS_truncate+0xe/0x10 entry_SYSCALL_64_fastpath+0x1f/0x96 The race is between btrfs_orphan_commit_root and btrfs_orphan_del, t1 t2 btrfs_orphan_commit_root btrfs_orphan_del spin_lock check (&root->orphan_inodes) root->orphan_block_rsv = NULL; spin_unlock atomic_dec(&root->orphan_inodes); access root->orphan_block_rsv Accessing root->orphan_block_rsv must be done before decreasing root->orphan_inodes. cc: <stable@vger.kernel.org> v3.12+ Fixes: 703c88e03524 ("Btrfs: fix tracking of orphan inode count") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctlyLiu Bo
This regression is introduced in commit 3d48d9810de4 ("btrfs: Handle uninitialised inode eviction"). There are two problems, a) it is ->destroy_inode() that does the final free on inode, not ->evict_inode(), b) clear_inode() must be called before ->evict_inode() returns. This could end up hitting BUG_ON(inode->i_state != (I_FREEING | I_CLEAR)); in evict() because I_CLEAR is set in clear_inode(). Fixes: commit 3d48d9810de4 ("btrfs: Handle uninitialised inode eviction") Cc: <stable@vger.kernel.org> # v4.7-rc6+ Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02Btrfs: fix extent state leak from tree logLiu Bo
It's possible that btrfs_sync_log() bails out after one of the two btrfs_write_marked_extents() which convert extent state's state bit into EXTENT_NEED_WAIT from EXTENT_DIRTY/EXTENT_NEW, however only EXTENT_DIRTY and EXTENT_NEW are searched by free_log_tree() so that those extent states with EXTENT_NEED_WAIT lead to memory leak. cc: <stable@vger.kernel.org> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02Btrfs: fix crash due to not cleaning up tree log block's dirty bitsLiu Bo
In cases that the whole fs flips into readonly status due to failures in critical sections, then log tree's blocks are still dirty, and this leads to a crash during umount time, the crash is about use-after-free, umount -> close_ctree -> stop workers -> iput(btree_inode) -> iput_final -> write_inode_now -> ... -> queue job on stop'd workers cc: <stable@vger.kernel.org> v3.12+ Fixes: 681ae50917df ("Btrfs: cleanup reserved space when freeing tree log on error") Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-02Btrfs: fix deadlock in run_delalloc_nocowLiu Bo
@cur_offset is not set back to what it should be (@cow_start) if btrfs_next_leaf() returns something wrong, and the range [cow_start, cur_offset) remains locked forever. cc: <stable@vger.kernel.org> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-02-01xfs: remove experimental tag for reverse mappingDarrick J. Wong
Reverse mapping has had a while to soak, so remove the experimental tag. Now that we've landed space metadata cross-referencing in scrub, the feature actually has a purpose. Reject rmap filesystems with an rt device until the code to support it is actually implemented. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-02-01xfs: don't allow reflink + realtime filesystemsDarrick J. Wong
We don't support realtime filesystems with reflink either, so fail those mounts. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com>
2018-02-01xfs: don't allow DAX on reflink filesystemsDarrick J. Wong
Now that reflink is no longer experimental, reject attempts to mount with DAX until that whole mess gets sorted out. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Bill O'Donnell <billodo@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
2018-02-01xfs: add scrub to XFS_BUILD_OPTIONSEric Sandeen
Advertise this config option along with the others. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-02-01Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add a console_msg_format command line option: The value "default" keeps the old "[time stamp] text\n" format. The value "syslog" allows to see the syslog-like "<log level>[timestamp] text" format. This feature was requested by people doing regression tests, for example, 0day robot. They want to have both filtered and full logs at hands. - Reduce the risk of softlockup: Pass the console owner in a busy loop. This is a new approach to the old problem. It was first proposed by Steven Rostedt on Kernel Summit 2017. It marks a context in which the console_lock owner calls console drivers and could not sleep. On the other side, printk() callers could detect this state and use a busy wait instead of a simple console_trylock(). Finally, the console_lock owner checks if there is a busy waiter at the end of the special context and eventually passes the console_lock to the waiter. The hand-off works surprisingly well and helps in many situations. Well, there is still a possibility of the softlockup, for example, when the flood of messages stops and the last owner still has too much to flush. There is increasing number of people having problems with printk-related softlockups. We might eventually need to get better solution. Anyway, this looks like a good start and promising direction. - Do not allow to schedule in console_unlock() called from printk(): This reverts an older controversial commit. The reschedule helped to avoid softlockups. But it also slowed down the console output. This patch is obsoleted by the new console waiter logic described above. In fact, the reschedule made the hand-off less effective. - Deprecate "%pf" and "%pF" format specifier: It was needed on ia64, ppc64 and parisc64 to dereference function descriptors and show the real function address. It is done transparently by "%ps" and "pS" format specifier now. Sergey Senozhatsky found that all the function descriptors were in a special elf section and could be easily detected. - Remove printk_symbol() API: It has been obsoleted by "%pS" format specifier, and this change helped to remove few continuous lines and a less intuitive old API. - Remove redundant memsets: Sergey removed unnecessary memset when processing printk.devkmsg command line option. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: (27 commits) printk: drop redundant devkmsg_log_str memsets printk: Never set console_may_schedule in console_trylock() printk: Hide console waiter logic into helpers printk: Add console owner and waiter logic to load balance console writes kallsyms: remove print_symbol() function checkpatch: add pF/pf deprecation warning symbol lookup: introduce dereference_symbol_descriptor() parisc64: Add .opd based function descriptor dereference powerpc64: Add .opd based function descriptor dereference ia64: Add .opd based function descriptor dereference sections: split dereference_function_descriptor() openrisc: Fix conflicting types for _exext and _stext lib: do not use print_symbol() irq debug: do not use print_symbol() sysfs: do not use print_symbol() drivers: do not use print_symbol() x86: do not use print_symbol() unicore32: do not use print_symbol() sh: do not use print_symbol() mn10300: do not use print_symbol() ...
2018-02-01annotate ep_scan_ready_list()Al Viro
make it always return __poll_t and have its callbacks do the same Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-01ep_send_events_proc(): return result via esed->resAl Viro
preparations for not mixing __poll_t and int in ep_scan_ready_list() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-01use linux/poll.h instead of asm/poll.hAl Viro
The only place that has any business including asm/poll.h is linux/poll.h. Fortunately, asm/poll.h had only been included in 3 places beyond that one, and all of them are trivial to switch to using linux/poll.h. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-02-01Merge branch 'KASAN-read_word_at_a_time'Linus Torvalds
Merge KASAN word-at-a-time fixups from Andrey Ryabinin. The word-at-a-time optimizations have caused headaches for KASAN, since the whole point is that we access byte streams in bigger chunks, and KASAN can be unhappy about the potential extra access at the end of the string. We used to have a horrible hack in dcache, and then people got complaints from the strscpy() case. This fixes it all up properly, by adding an explicit helper for the "access byte stream one word at a time" case. * emailed patches from Andrey Ryabinin <aryabinin@virtuozzo.com>: fs: dcache: Revert "manually unpoison dname after allocation to shut up kasan's reports" fs/dcache: Use read_word_at_a_time() in dentry_string_cmp() lib/strscpy: Shut up KASAN false-positives in strscpy() compiler.h: Add read_word_at_a_time() function. compiler.h, kasan: Avoid duplicating __read_once_size_nocheck()
2018-02-01fs: dcache: Revert "manually unpoison dname after allocation to shut up ↵Andrey Ryabinin
kasan's reports" This reverts commit df4c0e36f1b1782b0611a77c52cc240e5c4752dd. It's no longer needed since dentry_string_cmp() now uses read_word_at_a_time() to avoid kasan's reports. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01fs/dcache: Use read_word_at_a_time() in dentry_string_cmp()Andrey Ryabinin
dentry_string_cmp() performs the word-at-a-time reads from 'cs' and may read slightly more than it was requested in kmallac(). Normally this would make KASAN to report out-of-bounds access, but this was workarounded by commit df4c0e36f1b1 ("fs: dcache: manually unpoison dname after allocation to shut up kasan's reports"). This workaround is not perfect, since it allows out-of-bounds access to dentry's name for all the code, not just in dentry_string_cmp(). So it would be better to use read_word_at_a_time() instead and revert commit df4c0e36f1b1. Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-02-01gfs2: Glock dump performance regression fixAndreas Gruenbacher
Restore an optimization removed in commit 7f19449553 "Fix debugfs glocks dump": keep the glock hash table iterator active while the glock dump file is held open. This avoids having to rescan the hash table from the start for each read, with quadratically rising runtime. In addition, use rhastable_walk_peek for resuming a glock dump at the current position: when a glock doesn't fit in the provided buffer anymore, the next read must revisit the same glock. Finally, also restart the dump from the first entry when we notice that the hash table has been resized in gfs2_glock_seq_start. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01gfs2: Fix the crc32c dependencyAndreas Gruenbacher
Depend on LIBCRC32C which uses the crypto API to select the appropriate crc32c implementation. With the CRYPTO and CRYPTO_CRC32C dependencies, gfs2 would still need to use the crypto API directly like ext4 and btrfs do, which isn't necessary. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-02-01Merge tag 'driver-core-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here is the set of "big" driver core patches for 4.16-rc1. The majority of the work here is in the firmware subsystem, with reworks to try to attempt to make the code easier to handle in the long run, but no functional change. There's also some tree-wide sysfs attribute fixups with lots of acks from the various subsystem maintainers, as well as a handful of other normal fixes and changes. And finally, some license cleanups for the driver core and sysfs code. All have been in linux-next for a while with no reported issues" * tag 'driver-core-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (48 commits) device property: Define type of PROPERTY_ENRTY_*() macros device property: Reuse property_entry_free_data() device property: Move property_entry_free_data() upper firmware: Fix up docs referring to FIRMWARE_IN_KERNEL firmware: Drop FIRMWARE_IN_KERNEL Kconfig option USB: serial: keyspan: Drop firmware Kconfig options sysfs: remove DEBUG defines sysfs: use SPDX identifiers drivers: base: add coredump driver ops sysfs: add attribute specification for /sysfs/devices/.../coredump test_firmware: fix missing unlock on error in config_num_requests_store() test_firmware: make local symbol test_fw_config static sysfs: turn WARN() into pr_warn() firmware: Fix a typo in fallback-mechanisms.rst treewide: Use DEVICE_ATTR_WO treewide: Use DEVICE_ATTR_RO treewide: Use DEVICE_ATTR_RW sysfs.h: Use octal permissions component: add debugfs support bus: simple-pm-bus: convert bool SIMPLE_PM_BUS to tristate ...
2018-02-01Merge tag 'staging-4.16-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging/IIO updates from Greg KH: "Here is the big Staging and IIO driver patches for 4.16-rc1. There is the normal amount of new IIO drivers added, like all releases. The networking IPX and the ncpfs filesystem are moved into the staging tree, as they are on their way out of the kernel due to lack of use anymore. The visorbus subsystem finall has started moving out of the staging tree to the "real" part of the kernel, and the most and fsl-mc codebases are almost ready to move out, that will probably happen for 4.17-rc1 if all goes well. Other than that, there is a bunch of license header cleanups in the tree, along with the normal amount of coding style churn that we all know and love for this codebase. I also got frustrated at the Meltdown/Spectre mess and took it out on the dgnc tty driver, deleting huge chunks of it that were never even being used. Full details of everything is in the shortlog. All of these patches have been in linux-next for a while with no reported issues" * tag 'staging-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (627 commits) staging: rtlwifi: remove redundant initialization of 'cfg_cmd' staging: rtl8723bs: remove a couple of redundant initializations staging: comedi: reformat lines to 80 chars or less staging: lustre: separate a connection destroy from free struct kib_conn Staging: rtl8723bs: Use !x instead of NULL comparison Staging: rtl8723bs: Remove dead code Staging: rtl8723bs: Change names to conform to the kernel code staging: ccree: Fix missing blank line after declaration staging: rtl8188eu: remove redundant initialization of 'pwrcfgcmd' staging: rtlwifi: remove unused RTLHALMAC_ST and RTLPHYDM_ST staging: fbtft: remove unused FB_TFT_SSD1325 kconfig staging: comedi: dt2811: remove redundant initialization of 'ns' staging: wilc1000: fix alignments to match open parenthesis staging: wilc1000: removed unnecessary defined enums typedef staging: wilc1000: remove unnecessary use of parentheses staging: rtl8192u: remove redundant initialization of 'timeout' staging: sm750fb: fix CamelCase for dispSet var staging: lustre: lnet/selftest: fix compile error on UP build staging: rtl8723bs: hal_com_phycfg: Remove unneeded semicolons staging: rts5208: Fix "seg_no" calculation in reset_ms_card() ...
2018-02-01fscrypt: fix build with pre-4.6 gcc versionsEric Biggers
gcc versions prior to 4.6 require an extra level of braces when using a designated initializer for a member in an anonymous struct or union. This caused a compile error with the 'struct qstr' initialization in __fscrypt_encrypt_symlink(). Fix it by using QSTR_INIT(). Reported-by: Andrew Morton <akpm@linux-foundation.org> Fixes: 76e81d6d5048 ("fscrypt: new helper functions for ->symlink()") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-02-01iversion: Rename make inode_cmp_iversion{+raw} to inode_eq_iversion{+raw}Goffredo Baroncelli
The function inode_cmp_iversion{+raw} is counter-intuitive, because it returns true when the counters are different and false when these are equal. Rename it to inode_eq_iversion{+raw}, which will returns true when the counters are equal and false otherwise. Signed-off-by: Goffredo Baroncelli <kreijack@inwind.it> Signed-off-by: Jeff Layton <jlayton@redhat.com>
2018-01-31xfs: fix u32 type usage in sb validation functionDarrick J. Wong
Don't use u32, use uint32_t, because this won't work in xfsprogs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
2018-01-31Merge tag 'docs-4.16' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "Documentation updates for 4.16. New stuff includes refcount_t documentation, errseq documentation, kernel-doc support for nested structure definitions, the removal of lots of crufty kernel-doc support for unused formats, SPDX tag documentation, the beginnings of a manual for subsystem maintainers, and lots of fixes and updates. As usual, some of the changesets reach outside of Documentation/ to effect kerneldoc comment fixes. It also adds the new LICENSES directory, of which Thomas promises I do not need to be the maintainer" * tag 'docs-4.16' of git://git.lwn.net/linux: (65 commits) linux-next: docs-rst: Fix typos in kfigure.py linux-next: DOC: HWPOISON: Fix path to debugfs in hwpoison.txt Documentation: Fix misconversion of #if docs: add index entry for networking/msg_zerocopy Documentation: security/credentials.rst: explain need to sort group_list LICENSES: Add MPL-1.1 license LICENSES: Add the GPL 1.0 license LICENSES: Add Linux syscall note exception LICENSES: Add the MIT license LICENSES: Add the BSD-3-clause "Clear" license LICENSES: Add the BSD 3-clause "New" or "Revised" License LICENSES: Add the BSD 2-clause "Simplified" license LICENSES: Add the LGPL-2.1 license LICENSES: Add the LGPL 2.0 license LICENSES: Add the GPL 2.0 license Documentation: Add license-rules.rst to describe how to properly identify file licenses scripts: kernel_doc: better handle show warnings logic fs/*/Kconfig: drop links to 404-compliant http://acl.bestbits.at doc: md: Fix a file name to md-fault.c in fault-injection.txt errseq: Add to documentation tree ...
2018-01-31Merge branch 'work.dcache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: "Neil Brown's d_move()/d_path() race fix" * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: VFS: close race between getcwd() and d_move()