linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2016-12-10	ext4: do not perform data journaling when data is encrypted	Sergey Karamov
	Currently data journalling is incompatible with encryption: enabling both at the same time has never been supported by design, and would result in unpredictable behavior. However, users are not precluded from turning on both features simultaneously. This change programmatically replaces data journaling for encrypted regular files with ordered data journaling mode. Background: Journaling encrypted data has not been supported because it operates on buffer heads of the page in the page cache. Namely, when the commit happens, which could be up to five seconds after caching, the commit thread uses the buffer heads attached to the page to copy the contents of the page to the journal. With encryption, it would have been required to keep the bounce buffer with ciphertext for up to the aforementioned five seconds, since the page cache can only hold plaintext and could not be used for journaling. Alternatively, it would be required to setup the journal to initiate a callback at the commit time to perform deferred encryption - in this case, not only would the data have to be written twice, but it would also have to be encrypted twice. This level of complexity was not justified for a mode that in practice is very rarely used because of the overhead from the data journalling. Solution: If data=journaled has been set as a mount option for a filesystem, or if journaling is enabled on a regular file, do not perform journaling if the file is also encrypted, instead fall back to the data=ordered mode for the file. Rationale: The intent is to allow seamless and proper filesystem operation when journaling and encryption have both been enabled, and have these two conflicting features gracefully resolved by the filesystem. Fixes: 4461471107b7 Signed-off-by: Sergey Karamov <skaramov@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-12-10	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller

2016-12-10	ext4: return -ENOMEM instead of success	Dan Carpenter
	We should set the error code if kzalloc() fails. Fixes: 67cf5b09a46f ("ext4: add the basic function for inline data support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
2016-12-10	ext4: reject inodes with negative size	Darrick J. Wong
	Don't load an inode with a negative size; this causes integer overflow problems in the VFS. [ Added EXT4_ERROR_INODE() to mark file system as corrupted. -TYT] Fixes: a48380f769df (ext4: rename i_dir_acl to i_size_high) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2016-12-09	Merge tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-client	Linus Torvalds
	Pull ceph fix from Ilya Dryomov: "A fix for an issue with ->d_revalidate() in ceph, causing frequent kernel crashes. Marked for stable - it goes back to 4.6, but started popping up only in 4.8" * tag 'ceph-for-4.9-rc9' of git://github.com/ceph/ceph-client: ceph: don't set req->r_locked_dir in ceph_d_revalidate
2016-12-08	ceph: don't set req->r_locked_dir in ceph_d_revalidate	Jeff Layton
	This function sets req->r_locked_dir which is supposed to indicate to ceph_fill_trace that the parent's i_rwsem is locked for write. Unfortunately, there is no guarantee that the dir will be locked when d_revalidate is called, so we really don't want ceph_fill_trace to do any dcache manipulation from this context. Clear req->r_locked_dir since it's clearly not safe to do that. What we really want to know with d_revalidate is whether the dentry still points to the same inode. ceph_fill_trace installs a pointer to the inode in req->r_target_inode, so we can just compare that to d_inode(dentry) to see if it's the same one after the lookup. Also, since we aren't generally interested in the parent here, we can switch to using a GETATTR to hint that to the MDS, which also means that we only need to reserve one cap. Finally, just remove the d_unhashed check. That's really outside the purview of a filesystem's d_revalidate. If the thing became unhashed while we're checking it, then that's up to the VFS to handle anyway. Fixes: 200fd27c8fa2 ("ceph: use lookup request to revalidate dentry") Link: http://tracker.ceph.com/issues/18041 Reported-by: Donatas Abraitis <donatas.abraitis@gmail.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: "Yan, Zheng" <zyan@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-12-07	f2fs: fix to access nullified flush_cmd_control pointer	Jaegeuk Kim
	f2fs_sync_file() remount_ro - f2fs_readonly - destroy_flush_cmd_control - f2fs_issue_flush - no fcc pointer! So, this patch doesn't free fcc in this case, but just stop its kernel thread which sends flush commands. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07	f2fs: free meta pages if sanity check for ckpt is failed	Jaegeuk Kim
	This fixes missing freeing meta pages in the error case. Tested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-07	f2fs: detect wrong layout	Jaegeuk Kim
	Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect that as well. Refer this in f2fs-tools. mkfs.f2fs: detect small partition by overprovision ratio and # of segments Reported-and-Tested-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-06	fuse: fix clearing suid, sgid for chown()	Miklos Szeredi
	Basically, the pjdfstests set the ownership of a file to 06555, and then chowns it (as root) to a new uid/gid. Prior to commit a09f99eddef4 ("fuse: fix killing s[ug]id in setattr"), fuse would send down a setattr with both the uid/gid change and a new mode. Now, it just sends down the uid/gid change. Technically this is NOTABUG, since POSIX doesn't _require_ that we clear these bits for a privileged process, but Linux (wisely) has done that and I think we don't want to change that behavior here. This is caused by the use of should_remove_suid(), which will always return 0 when the process has CAP_FSETID. In fact we really don't need to be calling should_remove_suid() at all, since we've already been indicated that we should remove the suid, we just don't want to use a (very) stale mode for that. This patch should fix the above as well as simplify the logic. Reported-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: a09f99eddef4 ("fuse: fix killing s[ug]id in setattr") Cc: <stable@vger.kernel.org> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-12-05	f2fs: call sync_fs when f2fs is idle	Jaegeuk Kim
	The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-05	Revert "f2fs: use percpu_counter for # of dirty pages in inode"	Jaegeuk Kim
	This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76. The perpcu_counter doesn't provide atomicity in single core and consume more DRAM. That incurs fs_mark test failure due to ENOMEM. Cc: stable@vger.kernel.org # 4.7+ Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-12-03	ext4: remove another test in ext4_alloc_file_blocks()	Dan Carpenter
	Before commit c3fe493ccdb1 ('ext4: remove unneeded test in ext4_alloc_file_blocks()') then it was possible for "depth" to be -1 but now, it's not possible that it is negative. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03	ext4: fix checks for data=ordered and journal_async_commit options	Jan Kara
	Combination of data=ordered mode and journal_async_commit mount option is invalid. However the check in parse_options() fails to detect the case where we simply end up defaulting to data=ordered mode and we detect the problem only on remount which triggers hard to understand failure to remount the filesystem. Fix the checking of mount options to take into account also the default mode by moving the check somewhat later in the mount sequence. Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-03	mbcache: document that "find" functions only return reusable entries	Eric Biggers
	mb_cache_entry_find_first() and mb_cache_entry_find_next() only return cache entries with the 'e_reusable' bit set. This should be documented. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03	mbcache: use consistent type for entry count	Eric Biggers
	mbcache used several different types to represent the number of entries in the cache. For consistency within mbcache and with the shrinker API, always use unsigned long. This does not change behavior for current mbcache users (ext2 and ext4) since they limit the entry count to a value which easily fits in an int. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03	mbcache: remove unnecessary module_get/module_put	Eric Biggers
	When mbcache is built as a module, any modules that use it (ext2 and/or ext4) will depend on its symbols directly, incrementing its reference count. Therefore, there is no need to do module_get/module_put. Also note that since the module_get/module_put were in the mbcache module itself, executing those lines of code was already dependent on another reference to the mbcache module being held. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03	mbcache: don't BUG() if entry cache cannot be allocated	Eric Biggers
	mbcache can be a module that is loaded long after startup, when someone asks to mount an ext2 or ext4 filesystem. Therefore it should not BUG() if kmem_cache_create() fails, but rather just fail the module load. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03	mbcache: correctly handle 'e_referenced' bit	Eric Biggers
	mbcache entries have an 'e_referenced' bit which users can set with mb_cache_entry_touch() to indicate that an entry should be given another pass through the LRU list before the shrinker can delete it. However, mb_cache_shrink() actually would, when seeing an e_referenced entry at the front of the list (the least-recently used end), place it right at the front of the list again. The next iteration would then remove the entry from the list and delete it. Consequently, e_referenced had essentially no effect, so ext2/ext4 xattr blocks would sometimes not be reused as often as expected. Fix this by making the shrinker move e_referenced entries to the back of the list rather than the front. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
2016-12-03	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	Couple conflicts resolved here: 1) In the MACB driver, a bug fix to properly initialize the RX tail pointer properly overlapped with some changes to support variable sized rings. 2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix overlapping with a reorganization of the driver to support ACPI, OF, as well as PCI variants of the chip. 3) In 'net' we had several probe error path bug fixes to the stmmac driver, meanwhile a lot of this code was cleaned up and reorganized in 'net-next'. 4) The cls_flower classifier obtained a helper function in 'net-next' called __fl_delete() and this overlapped with Daniel Borkamann's bug fix to use RCU for object destruction in 'net'. It also overlapped with Jiri's change to guard the rhashtable_remove_fast() call with a check against tc_skip_sw(). 5) In mlx4, a revert bug fix in 'net' overlapped with some unrelated changes in 'net-next'. 6) In geneve, a stale header pointer after pskb_expand_head() bug fix in 'net' overlapped with a large reorganization of the same code in 'net-next'. Since the 'net-next' code no longer had the bug in question, there was nothing to do other than to simply take the 'net-next' hunks. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-02	ext4: fix reading new encrypted symlinks on no-journal file systems	Theodore Ts'o
	On a filesystem with no journal, a symlink longer than about 32 characters (exact length depending on padding for encryption) could not be followed or read immediately after being created in an encrypted directory. This happened because when the symlink data went through the delayed allocation path instead of the journaling path, the symlink was incorrectly detected as a "fast" symlink rather than a "slow" symlink until its data was written out. To fix this, disable delayed allocation for symlinks, since there is no benefit for delayed allocation anyway. Reported-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01	ext4: validate s_first_meta_bg at mount time	Eryu Guan
	Ralf Spenneberg reported that he hit a kernel crash when mounting a modified ext4 image. And it turns out that kernel crashed when calculating fs overhead (ext4_calculate_overhead()), this is because the image has very large s_first_meta_bg (debug code shows it's 842150400), and ext4 overruns the memory in count_overhead() when setting bitmap buffer, which is PAGE_SIZE. ext4_calculate_overhead(): buf = get_zeroed_page(GFP_NOFS); <=== PAGE_SIZE buffer blks = count_overhead(sb, i, buf); count_overhead(): for (j = ext4_bg_num_gdb(sb, grp); j > 0; j--) { <=== j = 842150400 ext4_set_bit(EXT4_B2C(sbi, s++), buf); <=== buffer overrun count++; } This can be reproduced easily for me by this script: #!/bin/bash rm -f fs.img mkdir -p /mnt/ext4 fallocate -l 16M fs.img mke2fs -t ext4 -O bigalloc,meta_bg,^resize_inode -F fs.img debugfs -w -R "ssv first_meta_bg 842150400" fs.img mount -o loop fs.img /mnt/ext4 Fix it by validating s_first_meta_bg first at mount time, and refusing to mount if its value exceeds the largest possible meta_bg number. Reported-by: Ralf Spenneberg <ralf@os-t.de> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-12-01	ext4: correctly detect when an xattr value has an invalid size	Eric Biggers
	It was possible for an xattr value to have a very large size, which would then pass validation on 32-bit architectures due to a pointer wraparound. Fix this by validating the size in a way which avoids pointer wraparound. It was also possible that a value's size would fit in the available space but its padded size would not. This would cause an out-of-bounds memory write in ext4_xattr_set_entry when replacing the xattr value. For example, if an xattr value of unpadded size 253 bytes went until the very end of the inode or block, then using setxattr(2) to replace this xattr's value with 256 bytes would cause a write to the 3 bytes past the end of the inode or buffer, and the new xattr value would be incorrectly truncated. Fix this by requiring that the padded size fit in the available space rather than the unpadded size. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01	ext4: don't read out of bounds when checking for in-inode xattrs	Eric Biggers
	With i_extra_isize equal to or close to the available space, it was possible for us to read past the end of the inode when trying to detect or validate in-inode xattrs. Fix this by checking for the needed extra space first. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-12-01	ext4: forbid i_extra_isize not divisible by 4	Eric Biggers
	i_extra_isize not divisible by 4 is problematic for several reasons: - It causes the in-inode xattr space to be misaligned, but the xattr header and entries are not declared __packed to express this possibility. This may cause poor performance or incorrect code generation on some platforms. - When validating the xattr entries we can read past the end of the inode if the size available for xattrs is not a multiple of 4. - It allows the nonsensical i_extra_isize=1, which doesn't even leave enough room for i_extra_isize itself. Therefore, update ext4_iget() to consider i_extra_isize not divisible by 4 to be an error, like the case where i_extra_isize is too large. This also matches the rule recently added to e2fsck for determining whether an inode has valid i_extra_isize. This patch shouldn't have any noticeable effect on non-corrupted/non-malicious filesystems, since the size of ext4_inode has always been a multiple of 4. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-12-01	Merge branch 'overlayfs-linus' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull overlayfs fix from Miklos Szeredi: "This fixes a regression introduced in 4.8" * 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: fix d_real() for stacked fs
2016-12-01	ext4: disable pwsalt ioctl when encryption disabled by config	Eric Biggers
	On a CONFIG_EXT4_FS_ENCRYPTION=n kernel, the ioctls to get and set encryption policies were disabled but EXT4_IOC_GET_ENCRYPTION_PWSALT was not. But there's no good reason to expose the pwsalt ioctl if the kernel doesn't support encryption. The pwsalt ioctl was also disabled pre-4.8 (via ext4_sb_has_crypto() previously returning 0 when encryption was disabled by config) and seems to have been enabled by mistake when ext4 encryption was refactored to use fs/crypto/. So let's disable it again. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01	ext4: get rid of ext4_sb_has_crypto()	Eric Biggers
	ext4_sb_has_crypto() just called through to ext4_has_feature_encrypt(), and all callers except one were already using the latter. So remove it and switch its one caller to ext4_has_feature_encrypt(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01	ext4: fix inode checksum calculation problem if i_extra_size is small	Daeho Jeong
	We've fixed the race condition problem in calculating ext4 checksum value in commit b47820edd163 ("ext4: avoid modifying checksum fields directly during checksum veficationon"). However, by this change, when calculating the checksum value of inode whose i_extra_size is less than 4, we couldn't calculate the checksum value in a proper way. This problem was found and reported by Nix, Thank you. Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Daeho Jeong <daeho.jeong@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01	ext4: warn when page is dirtied without buffers	Jan Kara
	Warn when a page is dirtied without buffers (as that will likely lead to a crash in ext4_writepages()) or when it gets newly dirtied without the page being locked (as there is nothing that prevents buffers to get stripped just before calling set_page_dirty() under memory pressure). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-12-01	block: protect iterate_bdevs() against concurrent close	Rabin Vincent
	If a block device is closed while iterate_bdevs() is handling it, the following NULL pointer dereference occurs because bdev->b_disk is NULL in bdev_get_queue(), which is called from blk_get_backing_dev_info() (in turn called by the mapping_cap_writeback_dirty() call in __filemap_fdatawrite_range()): BUG: unable to handle kernel NULL pointer dereference at 0000000000000508 IP: [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 PGD 9e62067 PUD 9ee8067 PMD 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: CPU: 1 PID: 2422 Comm: sync Not tainted 4.5.0-rc7+ #400 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) task: ffff880009f4d700 ti: ffff880009f5c000 task.ti: ffff880009f5c000 RIP: 0010:[<ffffffff81314790>] [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP: 0018:ffff880009f5fe68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff88000ec17a38 RCX: ffffffff81a4e940 RDX: 7fffffffffffffff RSI: 0000000000000000 RDI: ffff88000ec176c0 RBP: ffff880009f5fe68 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff88000ec17860 R13: ffffffff811b25c0 R14: ffff88000ec178e0 R15: ffff88000ec17a38 FS: 00007faee505d700(0000) GS:ffff88000fb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000508 CR3: 0000000009e8a000 CR4: 00000000000006e0 Stack: ffff880009f5feb8 ffffffff8112e7f5 0000000000000000 7fffffffffffffff 0000000000000000 0000000000000000 7fffffffffffffff 0000000000000001 ffff88000ec178e0 ffff88000ec17860 ffff880009f5fec8 ffffffff8112e81f Call Trace: [<ffffffff8112e7f5>] __filemap_fdatawrite_range+0x85/0x90 [<ffffffff8112e81f>] filemap_fdatawrite+0x1f/0x30 [<ffffffff811b25d6>] fdatawrite_one_bdev+0x16/0x20 [<ffffffff811bc402>] iterate_bdevs+0xf2/0x130 [<ffffffff811b2763>] sys_sync+0x63/0x90 [<ffffffff815d4272>] entry_SYSCALL_64_fastpath+0x12/0x76 Code: 0f 1f 44 00 00 48 8b 87 f0 00 00 00 55 48 89 e5 <48> 8b 80 08 05 00 00 5d RIP [<ffffffff81314790>] blk_get_backing_dev_info+0x10/0x20 RSP <ffff880009f5fe68> CR2: 0000000000000508 ---[ end trace 2487336ceb3de62d ]--- The crash is easily reproducible by running the following command, if an msleep(100) is inserted before the call to func() in iterate_devs(): while :; do head -c1 /dev/nullb0; done > /dev/null & while :; do sync; done Fix it by holding the bd_mutex across the func() call and only calling func() if the bdev is opened. Cc: stable@vger.kernel.org Fixes: 5c0d6b60a0ba ("vfs: Create function for iterating over block devices") Reported-and-tested-by: Wei Fang <fangwei1@huawei.com> Signed-off-by: Rabin Vincent <rabinv@axis.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
2016-11-30	isofs: add KERN_CONT to printing of ER records	Mike Rapoport
	The ER records are printed without explicit log level presuming line continuation until "\n". After the commit 4bcc595ccd8 (printk: reinstate KERN_CONT for printing continuation lines), the ER records are printed a character per line. Adding KERN_CONT to appropriate printk statements restores the printout behavior. Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-11-29	f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage	Chao Yu
	We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Miao Xie <miaoxie@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29	f2fs: do not activate auto_recovery for fallocated i_size	Jaegeuk Kim
	If a file needs to keep its i_size by fallocate, we need to turn off auto recovery during roll-forward recovery. This will resolve the below scenario. 1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync" 2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync" 3. md5sum /mnt/f2fs/file; 4. godown /mnt/f2fs/ 5. umount /mnt/f2fs/ 6. mount -t f2fs /dev/sdx /mnt/f2fs 7. md5sum /mnt/f2fs/file Reported-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-29	kernfs: Declare two local data structures static	Bart Van Assche
	This was spotted by the 'sparse' static checker. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-29	ext4: be more strict when verifying flags set via SETFLAGS ioctls	Jan Kara
	Currently we just silently ignore flags that we don't understand (or that cannot be manipulated) through EXT4_IOC_SETFLAGS and EXT4_IOC_FSSETXATTR ioctls. This makes it problematic for the unused flags to be used in future (some app may be inadvertedly setting them and we won't notice until the flag gets used). Also this is inconsistent with other filesystems like XFS or BTRFS which return EOPNOTSUPP when they see a flag they cannot set. ext4 has the additional problem that there are flags which are returned by EXT4_IOC_GETFLAGS ioctl but which cannot be modified via EXT4_IOC_SETFLAGS. So we have to be careful to ignore value of these flags and not fail the ioctl when they are set (as e.g. chattr(1) passes flags returned from EXT4_IOC_GETFLAGS to EXT4_IOC_SETFLAGS without any masking and thus we'd break this utility). Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-29	ext4: add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to modifiable mask	Jan Kara
	Add EXT4_JOURNAL_DATA_FL and EXT4_EXTENTS_FL to EXT4_FL_USER_MODIFIABLE to recognize that they are modifiable by userspace. So far we got away without having them there because ext4_ioctl_setflags() treats them in a special way. But it was really confusing like that. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2016-11-29	ovl: fix d_real() for stacked fs	Miklos Szeredi
	Handling of recursion in d_real() is completely broken. Recursion is only done in the 'inode != NULL' case. But when opening the file we have 'inode == NULL' hence d_real() will return an overlay dentry. This won't work since overlayfs doesn't define its own file operations, so all file ops will fail. Fix by doing the recursion first and the check against the inode second. Bash script to reproduce the issue written by Quentin: - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - tmpdir=$(mktemp -d) pushd ${tmpdir} mkdir -p {upper,lower,work} echo -n 'rocks' > lower/ksplice mount -t overlay level_zero upper -o lowerdir=lower,upperdir=upper,workdir=work cat upper/ksplice tmpdir2=$(mktemp -d) pushd ${tmpdir2} mkdir -p {upper,work} mount -t overlay level_one upper -o lowerdir=${tmpdir}/upper,upperdir=upper,workdir=work ls -l upper/ksplice cat upper/ksplice - 8< - - - - - 8< - - - - - 8< - - - - - 8< - - - - Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 2d902671ce1c ("vfs: merge .d_select_inode() into .d_real()") Cc: <stable@vger.kernel.org> # v4.8+
2016-11-28	CIFS: iterate over posix acl xattr entry correctly in ACL_to_cifs_posix()	Eryu Guan
	Commit 2211d5ba5c6c ("posix_acl: xattr representation cleanups") removes the typedefs and the zero-length a_entries array in struct posix_acl_xattr_header, and uses bare struct posix_acl_xattr_header and struct posix_acl_xattr_entry directly. But it failed to iterate over posix acl slots when converting posix acls to CIFS format, which results in several test failures in xfstests (generic/053 generic/105) when testing against a samba v1 server, starting from v4.9-rc1 kernel. e.g. [root@localhost xfstests]# diff -u tests/generic/105.out /root/xfstests/results//generic/105.out.bad --- tests/generic/105.out 2016-09-19 16:33:28.577962575 +0800 +++ /root/xfstests/results//generic/105.out.bad 2016-10-22 15:41:15.201931110 +0800 @@ -1,3 +1,4 @@ QA output created by 105 -rw-r--r-- root +setfacl: subdir: Invalid argument -rw-r--r-- root Fix it by introducing a new "ace" var, like what cifs_copy_posix_acl() does, and iterating posix acl xattr entries over it in the for loop. Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-11-28	Call echo service immediately after socket reconnect	Sachin Prabhu
	Commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect long after socket reconnect") changes the behaviour of the SMB2 echo service and causes it to renegotiate after a socket reconnect. However under default settings, the echo service could take up to 120 seconds to be scheduled. The patch forces the echo service to be called immediately resulting a negotiate call being made immediately on reconnect. Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <smfrench@gmail.com>
2016-11-28	CIFS: Fix BUG() in calc_seckey()	Sachin Prabhu
	Andy Lutromirski's new virtually mapped kernel stack allocations moves kernel stacks the vmalloc area. This triggers the bug kernel BUG at ./include/linux/scatterlist.h:140! at calc_seckey()->sg_init() Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
2016-11-28	f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack	Jaegeuk Kim
	We don't guarantee cp_addr is fixed by cp_version. This is to sync with f2fs-tools. Cc: stable@vger.kernel.org Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-26	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	David S. Miller
	udplite conflict is resolved by taking what 'net-next' did which removed the backlog receive method assignment, since it is no longer necessary. Two entries were added to the non-priv ethtool operations switch statement, one in 'net' and one in 'net-next, so simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
2016-11-26	fix default_file_splice_read()	Al Viro
	Botched calculation of number of pages. As the result, we were dropping pieces when doing splice to pipe from e.g. 9p. Reported-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-11-26	ext4: fix mmp use after free during unmount	Eric Sandeen
	In ext4_put_super, we call brelse on the buffer head containing the ext4 superblock, but then try to use it when we stop the mmp thread, because when the thread shuts down it does: write_mmp_block ext4_mmp_csum_set ext4_has_metadata_csum WARN_ON_ONCE(ext4_has_feature_metadata_csum(sb)...) which reaches into sb->s_fs_info->s_es->s_feature_ro_compat, which lives in the superblock buffer s_sbh which we just released. Fix this by moving the brelse down to a point where we are no longer using it. Reported-by: Wang Shu <shuwang@redhat.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca>
2016-11-25	f2fs: fix 32-bit build	Arnd Bergmann
	The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED on 32-bit machines because of a 64-bit division: fs/f2fs/f2fs.o: In function `__issue_discard_async': extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod' Fortunately, bdev_zone_size() is guaranteed to return a power-of-two number, so we can replace the % operator with a cheaper bit mask. Fixes: 792b84b74b54 ("f2fs: support multiple devices") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25	f2fs: set ->owner for debugfs status file's file_operations	Nicolai Stange
	The struct file_operations instance serving the f2fs/status debugfs file lacks an initialization of its ->owner. This means that although that file might have been opened, the f2fs module can still get removed. Any further operation on that opened file, releasing included, will cause accesses to unmapped memory. Indeed, Mike Marshall reported the following: BUG: unable to handle kernel paging request at ffffffffa0307430 IP: [<ffffffff8132a224>] full_proxy_release+0x24/0x90 <...> Call Trace: [] __fput+0xdf/0x1d0 [] ____fput+0xe/0x10 [] task_work_run+0x8e/0xc0 [] do_exit+0x2ae/0xae0 [] ? __audit_syscall_entry+0xae/0x100 [] ? syscall_trace_enter+0x1ca/0x310 [] do_group_exit+0x44/0xc0 [] SyS_exit_group+0x14/0x20 [] do_syscall_64+0x61/0x150 [] entry_SYSCALL64_slow_path+0x25/0x25 <...> ---[ end trace f22ae883fa3ea6b8 ]--- Fixing recursive fault but reboot is needed! Fix this by initializing the f2fs/status file_operations' ->owner with THIS_MODULE. This will allow debugfs to grab a reference to the f2fs module upon any open on that file, thus preventing it from getting removed. Fixes: 902829aa0b72 ("f2fs: move proc files to debugfs") Reported-by: Mike Marshall <hubcap@omnibond.com> Reported-by: Martin Brandenburg <martin@omnibond.com> Cc: stable@vger.kernel.org Signed-off-by: Nicolai Stange <nicstange@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25	f2fs: fix incorrect free inode count in ->statfs	Chao Yu
	While calculating inode count that we can create at most in the left space, we should consider space which data/node blocks occupied, since we create data/node mixly in main area. So fix the wrong calculation in ->statfs. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25	f2fs: drop duplicate header timer.h	Geliang Tang
	Drop duplicate header timer.h from segment.c. Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2016-11-25	f2fs: fix wrong AUTO_RECOVER condition	Jaegeuk Kim
	If i_size is not aligned to the f2fs's block size, we should not skip inode update during fsync. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>