summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-09-19xfs: set up per-AG free space reservationsDarrick J. Wong
One unfortunate quirk of the reference count and reverse mapping btrees -- they can expand in size when blocks are written to *other* allocation groups if, say, one large extent becomes a lot of tiny extents. Since we don't want to start throwing errors in the middle of CoWing, we need to reserve some blocks to handle future expansion. The transaction block reservation counters aren't sufficient here because we have to have a reserve of blocks in every AG, not just somewhere in the filesystem. Therefore, create two per-AG block reservation pools. One feeds the AGFL so that rmapbt expansion always succeeds, and the other feeds all other metadata so that refcountbt expansion never fails. Use the count of how many reserved blocks we need to have on hand to create a virtual reservation in the AG. Through selective clamping of the maximum length of allocation requests and of the length of the longest free extent, we can make it look like there's less free space in the AG unless the reservation owner is asking for blocks. In other words, play some accounting tricks in-core to make sure that we always have blocks available. On the plus side, there's nothing to clean up if we crash, which is contrast to the strategy that the rough draft used (actually removing extents from the freespace btrees). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-19xfs: defer should allow ->finish_item to request a new transactionDarrick J. Wong
When xfs_defer_finish calls ->finish_item, it's possible that (refcount) won't be able to finish all the work in a single transaction. When this happens, the ->finish_item handler should shorten the log done item's list count, update the work item to reflect where work should continue, and return -EAGAIN so that defer_finish knows to retain the pending item on the pending list, roll the transaction, and restart processing where we left off. Plumb in the code and document how this mechanism is supposed to work. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2016-09-19xfs: count the blocks in a btreeDarrick J. Wong
Provide a helper method to count the number of blocks in a short form btree. The refcount and rmap btrees need to know the number of blocks already in use to set up their per-AG block reservations during mount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-19xfs: create a standard btree size calculator codeDarrick J. Wong
Create a helper to generate AG btree height calculator functions. This will be used (much) later when we get to the refcount btree. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-19xfs: remove xfs_btree_bigkeyDarrick J. Wong
Remove the xfs_btree_bigkey mess and simply make xfs_btree_key big enough to hold both keys in-core. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-19xfs: convert RUI log formats to use variable length arraysDarrick J. Wong
Use variable length array declarations for RUI log items, and replace the open coded sizeof formulae with a single function. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-19iomap: add a flag to report shared extentsDarrick J. Wong
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-19fs: add iomap_file_dirtyChristoph Hellwig
Originally-From: Christoph Hellwig <hch@lst.de> This function uses the iomap infrastructure to re-write all pages in a given range. This is useful for doing a copy-up of COW ranges, and might be useful for scrubbing in the future. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-19xfs: Document error handlers behaviorCarlos Maiolino
Document the implementation of error handlers into sysfs. [dchinner: Added lots more detail.] Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com>
2016-09-14xfs: normalize "infinite" retries in error configsEric Sandeen
As it stands today, the "fail immediately" vs. "retry forever" values for max_retries and retry_timeout_seconds in the xfs metadata error configurations are not consistent. A retry_timeout_seconds of 0 means "retry forever," but a max_retries of 0 means "fail immediately." retry_timeout_seconds < 0 is disallowed, while max_retries == -1 means "retry forever." Make this consistent across the error configs, such that a value of 0 means "fail immediately" (i.e. wait 0 seconds, or retry 0 times), and a value of -1 always means "retry forever." This makes retry_timeout a signed long to accommodate the -1, even though it stores jiffies. Given our limit of a 1 day maximum timeout, this should be sufficient even at much higher HZ values than we have available today. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-14xfs: fix signed integer overflowXie XiuQi
Use 1U for unsigned int to avoid a overflow warning from UBSAN. [ 31.910858] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:889:25 [ 31.911252] signed integer overflow: [ 31.911478] -2147483648 - 1 cannot be represented in type 'int' [ 31.911846] CPU: 1 PID: 1011 Comm: tuned Tainted: G B ---- ------- 3.10.0-327.28.3.el7.x86_64 #1 [ 31.911857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011 [ 31.911866] 1ffff1004069cd3b 0000000076bec3fd ffff8802034e69a0 ffffffff81ee3140 [ 31.911883] ffff8802034e69b8 ffffffff81ee31fd ffffffffa0ad79e0 ffff8802034e6b20 [ 31.911898] ffffffff81ee46e2 0000002d515470c0 0000000000000001 0000000041b58ab3 [ 31.911913] Call Trace: [ 31.911932] [<ffffffff81ee3140>] dump_stack+0x1e/0x20 [ 31.911947] [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55 [ 31.911964] [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215 [ 31.912083] [<ffffffff81ee4798>] __ubsan_handle_sub_overflow+0x2a/0x31 [ 31.912204] [<ffffffffa08676fb>] xfs_buf_item_log+0x34b/0x3f0 [xfs] [ 31.912314] [<ffffffffa0880490>] xfs_trans_log_buf+0x120/0x260 [xfs] [ 31.912402] [<ffffffffa079a890>] xfs_btree_log_recs+0x80/0xc0 [xfs] [ 31.912490] [<ffffffffa07a29f8>] xfs_btree_delrec+0x11a8/0x2d50 [xfs] [ 31.913589] [<ffffffffa07a86f9>] xfs_btree_delete+0xc9/0x260 [xfs] [ 31.913762] [<ffffffffa075b5cf>] xfs_free_ag_extent+0x63f/0xe20 [xfs] [ 31.914339] [<ffffffffa075ec0f>] xfs_free_extent+0x2af/0x3e0 [xfs] [ 31.914641] [<ffffffffa0801b2b>] xfs_bmap_finish+0x32b/0x4b0 [xfs] [ 31.914841] [<ffffffffa083c2e7>] xfs_itruncate_extents+0x3b7/0x740 [xfs] [ 31.915216] [<ffffffffa08342fa>] xfs_setattr_size+0x60a/0x860 [xfs] [ 31.915471] [<ffffffffa08345ea>] xfs_vn_setattr+0x9a/0xe0 [xfs] [ 31.915590] [<ffffffff8149ad38>] notify_change+0x5c8/0x8a0 [ 31.915607] [<ffffffff81450f22>] do_truncate+0x122/0x1d0 [ 31.915640] [<ffffffff8147beee>] do_last+0x15de/0x2c80 [ 31.915707] [<ffffffff8147d777>] path_openat+0x1e7/0xcc0 [ 31.915802] [<ffffffff81480824>] do_filp_open+0xa4/0x160 [ 31.915848] [<ffffffff81453127>] do_sys_open+0x1b7/0x3f0 [ 31.915879] [<ffffffff81453392>] SyS_open+0x32/0x40 [ 31.915897] [<ffffffff81f08989>] system_call_fastpath+0x16/0x1b [ 240.086809] UBSAN: Undefined behaviour in fs/xfs/xfs_buf_item.c:866:34 [ 240.086820] signed integer overflow: [ 240.086830] -2147483648 - 1 cannot be represented in type 'int' [ 240.086846] CPU: 1 PID: 12969 Comm: rm Tainted: G B ---- ------- 3.10.0-327.28.3.el7.x86_64 #1 [ 240.086857] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 01/07/2011 [ 240.086868] 1ffff10040491def 00000000e2ea59c1 ffff88020248ef40 ffffffff81ee3140 [ 240.086885] ffff88020248ef58 ffffffff81ee31fd ffffffffa0ad79e0 ffff88020248f0c0 [ 240.086901] ffffffff81ee46e2 0000002d02488000 0000000000000001 0000000041b58ab3 [ 240.086915] Call Trace: [ 240.086938] [<ffffffff81ee3140>] dump_stack+0x1e/0x20 [ 240.086953] [<ffffffff81ee31fd>] ubsan_epilogue+0x12/0x55 [ 240.086971] [<ffffffff81ee46e2>] handle_overflow+0x1ba/0x215 ... Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-14Make __xfs_xattr_put_listen preperly report errors.Artem Savkov
Commit 2a6fba6 "xfs: only return -errno or success from attr ->put_listent" changes the returnvalue of __xfs_xattr_put_listen to 0 in case when there is insufficient space in the buffer assuming that setting context->count to -1 would be enough, but all of the ->put_listent callers only check seen_enough. This results in a failed assertion: XFS: Assertion failed: context->count >= 0, file: fs/xfs/xfs_xattr.c, line: 175 in insufficient buffer size case. This is only reproducible with at least 2 xattrs and only when the buffer gets depleted before the last one. Furthermore if buffersize is such that it is enough to hold the last xattr's name, but not enough to hold the sum of preceeding xattr names listxattr won't fail with ERANGE, but will suceed returning last xattr's name without the first character. The first character end's up overwriting data stored at (context->alist - 1). Signed-off-by: Artem Savkov <asavkov@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-14xfs: change mailing list addressDave Chinner
oss.sgi.com is going away, move contact details over to vger. cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-09-14xfs: undo block reservation correctly in xfs_trans_reserve()Eryu Guan
"blocks" should be added back to fdblocks at undo time, not taken away, i.e. the minus sign should not be used. This is a regression introduced by commit 0d485ada404b ("xfs: use generic percpu counters for free block counter"). And it's found by code inspection, I didn't it in real world, so there's no reproducer. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-30xfs: track log done items directly in the deferred pending work itemDarrick J. Wong
Christoph reports slab corruption when a deferred refcount update aborts during _defer_finish(). The cause of this was broken log item state tracking in xfs_defer_pending -- upon an abort, _defer_trans_abort() will call abort_intent on all intent items, including the ones that have already had a done item attached. This is incorrect because each intent item has 2 refcount: the first is released when the intent item is committed to the log; and the second is released when the _done_ item is committed to the log, or by the intent creator if there is no done item. In other words, once we log the done item, responsibility for releasing the intent item's second refcount is transferred to the done item and /must not/ be performed by anything else. The dfp_committed flag should have been tracking whether or not we had a done item so that _defer_trans_abort could decide if it needs to abort the intent item, but due to a thinko this was not the case. Rip it out and track the done item directly so that we do the right thing w.r.t. intent item freeing. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-29iomap: don't set FIEMAP_EXTENT_MERGED for extent based filesystemsChristoph Hellwig
Filesystems like XFS that use extents should not set the FIEMAP_EXTENT_MERGED flag in the fiemap extent structures. To allow for both behaviors for the upcoming gfs2 usage split the iomap type field into type and flags, and only set FIEMAP_EXTENT_MERGED if the IOMAP_F_MERGED flag is set. The flags field will also come in handy for future features such as shared extents on reflink-enabled file systems. Reported-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26xfs: prevent dropping ioend completions during buftarg waitBrian Foster
xfs_wait_buftarg() waits for all pending I/O, drains the ioend completion workqueue and walks the LRU until all buffers in the cache have been released. This is traditionally an unmount operation` but the mechanism is also reused during filesystem freeze. xfs_wait_buftarg() invokes drain_workqueue() as part of the quiesce, which is intended more for a shutdown sequence in that it indicates to the queue that new operations are not expected once the drain has begun. New work jobs after this point result in a WARN_ON_ONCE() and are otherwise dropped. With filesystem freeze, however, read operations are allowed and can proceed during or after the workqueue drain. If such a read occurs during the drain sequence, the workqueue infrastructure complains about the queued ioend completion work item and drops it on the floor. As a result, the buffer remains on the LRU and the freeze never completes. Despite the fact that the overall buffer cache cleanup is not necessary during freeze, fix up this operation such that it is safe to invoke during non-unmount quiesce operations. Replace the drain_workqueue() call with flush_workqueue(), which runs a similar serialization on pending workqueue jobs without causing new jobs to be dropped. This is safe for unmount as unmount independently locks out new operations by the time xfs_wait_buftarg() is invoked. cc: <stable@vger.kernel.org> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26xfs: fix superblock inprogress checkDave Chinner
From inspection, the superblock sb_inprogress check is done in the verifier and triggered only for the primary superblock via a "bp->b_bn == XFS_SB_DADDR" check. Unfortunately, the primary superblock is an uncached buffer, and hence it is configured by xfs_buf_read_uncached() with: bp->b_bn = XFS_BUF_DADDR_NULL; /* always null for uncached buffers */ And so this check never triggers. Fix it. cc: <stable@vger.kernel.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26xfs: simple btree query range should look right if LE lookup failsDarrick J. Wong
If the initial LOOKUP_LE in the simple query range fails to find anything, we should attempt to increment the btree cursor to see if there actually /are/ records for what we're trying to find. Without this patch, a bnobt range query of (0, $agsize) returns no results because the leftmost record never has a startblock of zero. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26xfs: fix some key handling problems in _btree_simple_query_rangeDarrick J. Wong
We only need the record's high key for the first record that we look at; for all records, we /definitely/ need the regular record key. Therefore, fix how the simple range query function gets its keys. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26xfs: don't log the entire end of the AGFDarrick J. Wong
When we're logging the last non-spare field in the AGF, we don't need to log the spare fields, so plumb in a new AGF logging flag to help us avoid that. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26xfs: disallow mounting of realtime + rmap filesystemsDarrick J. Wong
Since the kernel doesn't currently support the realtime rmapbt, don't allow such filesystems to be mounted. Support will appear in a future release. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-26xfs: don't perform lookups on zero-height btreesDarrick J. Wong
If the caller passes in a cursor to a zero-height btree (which is impossible), we never set block to anything but NULL, which causes the later dereference of it to crash. Instead, just return -EFSCORRUPTED. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17Merge branch 'iomap-fixes-4.8-rc3' into for-nextDave Chinner
2016-08-17xfs: remove OWN_AG rmap when allocating a block from the AGFLDarrick J. Wong
When we're really tight on space, xfs_alloc_ag_vextent_small() can allocate a block from the AGFL and give it to the caller. Since the caller is never the AGFL-fixing method, we must remove the OWN_AG reverse mapping because it will clash with whatever rmap the caller wants to set up. This bug was discovered by running generic/299 repeatedly. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: (re-)implement FIEMAP_FLAG_XATTRChristoph Hellwig
Use a special read-only iomap_ops implementation to support fiemap on the attr fork. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: simplify xfs_file_iomap_beginChristoph Hellwig
We'll never get nimap == 0 for a successful return from xfs_bmapi_read, so don't try to handle it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: mark ->iomap_end as optionalChristoph Hellwig
No need to implement it for read-only mappings. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: prepare iomap_fiemap for attribute mappingsDave Chinner
By bassing through an -ENOENT, similar to the old XFS implementation of FIEMAP_FLAG_XATTR. Signed-off-by: Dave Chinner <dchinner@redhat.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: fiemap should honor the FIEMAP_FLAG_SYNC flagDave Chinner
The flag is checked as supported, but then we do an unconditional sync of the file, regardless of whether the flag is set or not. Make the sync conditional on having the FIEMAP_FLAG_SYNC flag set. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: remove superflous pagefault_disable from iomap_write_actorChristoph Hellwig
iov_iter_copy_from_user_atomic disables page faults internally, no need to do it around the call. This also brings the iomap code in line with the original filemap version. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17iomap: remove superflous mark_page_accessed from iomap_write_actorChristoph Hellwig
This catches up with commit 2457ae ("mm: non-atomically mark page accessed during page cache allocation where possible"), which moved the initial access marking into the pagecache allocator. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: store rmapbt block count in the AGFDarrick J. Wong
Track the number of blocks used for the rmapbt in the AGF. When we get to the AG reservation code we need this counter to quickly make our reservation during mount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: don't invalidate whole file on DAX read/writeDave Chinner
When we do DAX IO, we try to invalidate the entire page cache held on the file. This is incorrect as it will trash the entire mapping tree that now tracks dirty state in exceptional entries in the radix tree slots. What we are trying to do is remove cached pages (e.g from reads into holes) that sit in the radix tree over the range we are about to write to. Hence we should just limit the invalidation to the range we are about to overwrite. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: fix bogus space reservation in xfs_iomap_write_allocateChristoph Hellwig
The space reservations was without an explaination in commit "Add error reporting calls in error paths that return EFSCORRUPTED" back in 2003. There is no reason to reserve disk blocks in the transaction when allocating blocks for delalloc space as we already reserved the space when creating the delalloc extent. With this fix we stop running out of the reserved pool in generic/229, which has happened for long time with small blocksize file systems, and has increased in severity with the new buffered write path. [ dchinner: we still need to pass the block reservation into xfs_bmapi_write() to ensure we don't deadlock during AG selection. See commit dbd5c8c ("xfs: pass total block res. as total xfs_bmapi_write() parameter") for more details on why this is necessary. ] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-17xfs: don't assert fail on non-async buffers on ioacct decrementBrian Foster
The buffer I/O accounting mechanism tracks async buffers under I/O. As an optimization, the buffer I/O count is incremented only once on the first async I/O for a given hold cycle of a buffer and decremented once the buffer is released to the LRU (or freed). xfs_buf_ioacct_dec() has an ASSERT() check for an XBF_ASYNC buffer, but we have one or two corner cases where a buffer can be submitted for I/O multiple times via different methods in a single hold cycle. If an async I/O occurs first, the I/O count is incremented. If a sync I/O occurs before the hold count drops, XBF_ASYNC is cleared by the time the I/O count is decremented. Remove the async assert check from xfs_buf_ioacct_dec() as this is a perfectly valid scenario. For the purposes of I/O accounting, we really only care about the buffer async state at I/O submission time. Discovered-and-analyzed-by: Dave Chinner <david@fromorbit.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2016-08-14Linux 4.8-rc2Linus Torvalds
2016-08-14Merge branch 'next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal updates from Zhang Rui: - Fix a race condition when updating cooling device, which may lead to a situation where a thermal governor never updates the cooling device. From Michele Di Giorgio. - Fix a zero division error when disabling the forced idle injection from the intel powerclamp. From Petr Mladek. - Add suspend/resume callback for intel_pch_thermal thermal driver. From Srinivas Pandruvada. - Another two fixes for clocking cooling driver and hwmon sysfs I/F. From Michele Di Giorgio and Kuninori Morimoto. [ Hmm. That suspend/resume callback for intel_pch_thermal doesn't look like a fix, but I'm letting it slide.. - Linus ] * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal: clock_cooling: Fix missing mutex_init() thermal: hwmon: EXPORT_SYMBOL_GPL for thermal hwmon sysfs thermal: fix race condition when updating cooling device thermal/powerclamp: Prevent division by zero when counting interval thermal: intel_pch_thermal: Add suspend/resume callback
2016-08-14Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu fix from Greg Ungerer: "This contains only a single fix for a register corruption problem on certain types of m68k flat format binaries" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68knommu: fix user a5 register being overwritten
2016-08-13Merge tag 'fixes-for-linus-4.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull h8300 and unicore32 architecture fixes from Guenter Roeck: "Two patches to fix h8300 and unicore32 builds. unicore32 builds have been broken since v4.6. The fix has been available in -next since March of this year. h8300 builds have been broken since the last commit window. The fix has been available in -next since June of this year" * tag 'fixes-for-linus-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: h8300: Add missing include file to asm/io.h unicore32: mm: Add missing parameter to arch_vma_access_permitted
2016-08-13Merge tag 'arm64-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - support for nr_cpus= command line argument (maxcpus was previously changed to allow secondary CPUs to be hot-plugged) - ARM PMU interrupt handling fix - fix potential TLB conflict in the hibernate code - improved handling of EL1 instruction aborts (better error reporting) - removal of useless jprobes code for stack saving/restoring - defconfig updates * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: defconfig: enable CONFIG_LOCALVERSION_AUTO arm64: defconfig: add options for virtualization and containers arm64: hibernate: handle allocation failures arm64: hibernate: avoid potential TLB conflict arm64: Handle el1 synchronous instruction aborts cleanly arm64: Remove stack duplicating code from jprobes drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" property drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlock arm64: Support hard limit of cpu count by nr_cpus
2016-08-13Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Radim Krčmář: "KVM: - lock kvm_device list to prevent corruption on device creation. PPC: - split debugfs initialization from creation of the xics device to unlock the newly taken kvm lock earlier. s390: - prevent userspace from triggering two WARN_ON_ONCE. MIPS: - fix several issues in the management of TLB faults (Cc: stable)" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: MIPS: KVM: Propagate kseg0/mapped tlb fault errors MIPS: KVM: Fix gfn range check in kseg0 tlb faults MIPS: KVM: Add missing gfn range check MIPS: KVM: Fix mapped fault broken commpage handling KVM: Protect device ops->create and list_add with kvm->lock KVM: PPC: Move xics_debugfs_init out of create KVM: s390: reset KVM_REQ_MMU_RELOAD if mapping the prefix failed KVM: s390: set the prefix initially properly
2016-08-13Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: - an NVMe fix from Gabriel, fixing a suspend/resume issue on some setups - addition of a few missing entries in the block queue sysfs documentation, from Joe - a fix for a sparse shadow warning for the bvec iterator, from Johannes - a writeback deadlock involving raid issuing barriers, and not flushing the plug when we wakeup the flusher threads. From Konstantin - a set of patches for the NVMe target/loop/rdma code, from Roland and Sagi * 'for-linus' of git://git.kernel.dk/linux-block: bvec: avoid variable shadowing warning doc: update block/queue-sysfs.txt entries nvme: Suspend all queues before deletion mm, writeback: flush plugged IO in wakeup_flusher_threads() nvme-rdma: Remove unused includes nvme-rdma: start async event handler after reconnecting to a controller nvmet: Fix controller serial number inconsistency nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads nvmet-rdma: Correctly handle RDMA device hot removal nvme-rdma: Make sure to shutdown the controller if we can nvme-loop: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Free the I/O tags when we delete the controller nvme-rdma: Remove duplicate call to nvme_remove_namespaces nvme-rdma: Fix device removal handling nvme-rdma: Queue ns scanning after a sucessful reconnection nvme-rdma: Don't leak uninitialized memory in connect request private data
2016-08-13h8300: Add missing include file to asm/io.hGuenter Roeck
h8300 builds fail with arch/h8300/include/asm/io.h:9:15: error: unknown type name ‘u8’ arch/h8300/include/asm/io.h:15:15: error: unknown type name ‘u16’ arch/h8300/include/asm/io.h:21:15: error: unknown type name ‘u32’ and many related errors. Fixes: 23c82d41bdf4 ("kexec-allow-architectures-to-override-boot-mapping-fix") Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net>
2016-08-13unicore32: mm: Add missing parameter to arch_vma_access_permittedGuenter Roeck
unicore32 fails to compile with the following errors. mm/memory.c: In function ‘__handle_mm_fault’: mm/memory.c:3381: error: too many arguments to function ‘arch_vma_access_permitted’ mm/gup.c: In function ‘check_vma_flags’: mm/gup.c:456: error: too many arguments to function ‘arch_vma_access_permitted’ mm/gup.c: In function ‘vma_permits_fault’: mm/gup.c:640: error: too many arguments to function ‘arch_vma_access_permitted’ Fixes: d61172b4b695b ("mm/core, x86/mm/pkeys: Differentiate instruction fetches") Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Acked-by: Guan Xuetao <gxt@mprc.pku.edu.cn>
2016-08-12Merge tag 'vfio-v4.8-rc2' of git://github.com/awilliam/linux-vfioLinus Torvalds
Pull VFIO fix from Alex Williamson: "Fix oops when dereferencing empty data (Alex Williamson)" * tag 'vfio-v4.8-rc2' of git://github.com/awilliam/linux-vfio: vfio/pci: Fix NULL pointer oops in error interrupt setup handling
2016-08-12Merge tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linuxLinus Torvalds
Pull nfsd fixes from Bruce Fields: "Fixes for the dentry refcounting leak I introduced in 4.8-rc1, and for races in the LOCK code which appear to go back to the big nfsd state lock removal from 3.17" * tag 'nfsd-4.8-1' of git://linux-nfs.org/~bfields/linux: nfsd: don't return an unhashed lock stateid after taking mutex nfsd: Fix race between FREE_STATEID and LOCK nfsd: fix dentry refcounting on create
2016-08-12Merge tag 'pm-4.8-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "Two hibernation fixes allowing it to work with the recently added randomization of the kernel identity mapping base on x86-64 and one cpufreq driver regression fix. Specifics: - Fix the x86 identity mapping creation helpers to avoid the assumption that the base address of the mapping will always be aligned at the PGD level, as it may be aligned at the PUD level if address space randomization is enabled (Rafael Wysocki). - Fix the hibernation core to avoid executing tracing functions before restoring the processor state completely during resume (Thomas Garnier). - Fix a recently introduced regression in the powernv cpufreq driver that causes it to crash due to an out-of-bounds array access (Akshay Adiga)" * tag 'pm-4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: PM / hibernate: Restore processor state before using per-CPU variables x86/power/64: Always create temporary identity mapping correctly cpufreq: powernv: Fix crash in gpstate_timer_handler()
2016-08-12Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This is bigger than usual - the reason is partly a pent-up stream of fixes after the merge window and partly accidental. The fixes are: - five patches to fix a boot failure on Andy Lutomirsky's laptop - four SGI UV platform fixes - KASAN fix - warning fix - documentation update - swap entry definition fix - pkeys fix - irq stats fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic/x2apic, smp/hotplug: Don't use before alloc in x2apic_cluster_probe() x86/efi: Allocate a trampoline if needed in efi_free_boot_services() x86/boot: Rework reserve_real_mode() to allow multiple tries x86/boot: Defer setup_real_mode() to early_initcall time x86/boot: Synchronize trampoline_cr4_features and mmu_cr4_features directly x86/boot: Run reserve_bios_regions() after we initialize the memory map x86/irq: Do not substract irq_tlb_count from irq_call_count x86/mm: Fix swap entry comment and macro x86/mm/kaslr: Fix -Wformat-security warning x86/mm/pkeys: Fix compact mode by removing protection keys' XSAVE buffer manipulation x86/build: Reduce the W=1 warnings noise when compiling x86 syscall tables x86/platform/UV: Fix kernel panic running RHEL kdump kernel on UV systems x86/platform/UV: Fix problem with UV4 BIOS providing incorrect PXM values x86/platform/UV: Fix bug with iounmap() of the UV4 EFI System Table causing a crash x86/platform/UV: Fix problem with UV4 Socket IDs not being contiguous x86/entry: Clarify the RF saving/restoring situation with SYSCALL/SYSRET x86/mm: Disable preemption during CR3 read+write x86/mm/KASLR: Increase BRK pages for KASLR memory randomization x86/mm/KASLR: Fix physical memory calculation on KASLR memory randomization x86, kasan, ftrace: Put APIC interrupt handlers into .irqentry.text
2016-08-12Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Misc fixes: a /dev/rtc regression fix, two APIC timer period calibration fixes, an ARM clocksource driver fix and a NOHZ power use regression fix" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hpet: Fix /dev/rtc breakage caused by RTC cleanup x86/timers/apic: Inform TSC deadline clockevent device about recalibration x86/timers/apic: Fix imprecise timer interrupts by eliminating TSC clockevents frequency roundoff error timers: Fix get_next_timer_interrupt() computation clocksource/arm_arch_timer: Force per-CPU interrupt to be level-triggered