summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-07-11xfs: remove struct xfs_bmalloca dfops fieldBrian Foster
Now that bma.dfops is only assigned from ->t_dfops, replace all accesses to the former with the latter and remove the unnecessary field. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_bmapi_remap() dfops paramBrian Foster
All xfs_bmapi_remap() callers already use ->t_dfops. Note that deferred completion context unconditionally sets ->t_dfops if it hasn't already been set by the caller. Remove the unnecessary parameter and access ->t_dfops directly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_bunmapi() dfops paramBrian Foster
Now that all xfs_bunmapi() callers use ->t_dfops, remove the unnecessary parameter and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_dfops for all xfs_bunmapi() callersBrian Foster
Use ->t_dfops for all remaining xfs_bunmapi() callers. This prepares the latter to no longer require a dfops parameter. Note that xfs_itruncate_extents_flags() associates a local dfops with a transaction provided from the caller. Since there are multiple callers, set and reset ->t_dfops before the function returns to avoid exposure of stack memory to the caller. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_bmapi_write() dfops paramBrian Foster
Now that all callers use ->t_dfops, the xfs_bmapi_write() dfops parameter is no longer necessary. Remove it and access ->t_dfops directly. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_dfops for all xfs_bmapi_write() callersBrian Foster
Attach ->t_dfops for all remaining callers of xfs_bmapi_write(). This prepares the latter to no longer require a separate dfops parameter. Note that xfs_symlink() already uses ->t_dfops. Fix up the local references for consistency. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_dfops in dqalloc transactionBrian Foster
xfs_dquot_disk_alloc() receives a transaction from the caller and passes a local dfops along to xfs_bmapi_write(). If we attach this dfops to the transaction, we have to make sure to clear it before returning to avoid invalid access of stack memory. Since xfs_qm_dqread_alloc() is the only caller, pull dfops into the caller and attach it to the transaction to eliminate this pattern entirely. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: replace xfs_da_args->dfops accesses with ->t_dfops and removeBrian Foster
Now that xfs_da_args->dfops is always assigned from a ->t_dfops pointer (or one that is immediately attached), replace all downstream accesses of the former with the latter and remove the field from struct xfs_da_args. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_dfops in extent split tx and remove paramBrian Foster
Attach the local dfops to ->t_dfops of the extent split transaction. Since this is the only caller of xfs_bmap_split_extent_at(), remove the dfops parameter as well. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove dfops param in attr fork add pathBrian Foster
Now that the attribute fork add tx carries dfops along with the transaction, it is unnecessary to pass it down the stack. Remove the dfops parameter and access ->t_dfops directly where necessary. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_dfops for attr set/remove operationsBrian Foster
Attach the local dfops to the transaction allocated for xattr add and remove operations. Add an earlier initialization in xfs_attr_remove() to ensure the structure is valid if it remains unused at transaction commit time. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use ->t_dfops for recovery of [b|c]ui log itemsBrian Foster
Log recovery passes down a central dfops structure to recovery handlers for bui and cui log items. Each of these handlers allocates and commits a transaction and defers any remaining operations to be completed by the main recovery sequence. Since dfops outlives the transaction in this context, set and clear ->t_dfops appropriately such that the *_finish_item() paths and below (i.e., xfs_bmapi*()) can expect to find the dfops in the transaction without it being committed with the dfops attached. This is required because transaction commit expects that an associated dfops is finished and in this context the dfops may be populated at commit time. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove dfops param from high level dirname callsBrian Foster
All callers of the directory create, rename and remove interfaces already associate the dfops with the transaction. Drop the dfops parameters in these calls in preparation for further cleanups in the layers below. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove dfops parameter from ifree call stackBrian Foster
The inode free callchain starting in xfs_inactive_ifree() already associates its dfops with the transaction. It still passes the dfops on the stack down through xfs_difree_inobt(), however. Clean up the call stack and reference dfops directly from the transaction. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: rename xfs_trans ->t_agfl_dfops to ->t_dfopsBrian Foster
The ->t_agfl_dfops field is currently used to defer agfl block frees from associated transaction contexts. While all known problematic contexts have already been updated to use ->t_agfl_dfops, the broader goal is defer agfl frees from all callers that already use a deferred operations structure. Further, the transaction field facilitates a good amount of code clean up where the transaction and dfops have historically been passed down through the stack separately. Rename the field to something more generic to prepare to use it as such throughout XFS. This patch does not change behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: cow unwritten conversion uses uninitialized dfopsBrian Foster
A couple COW fork unwritten extent conversion helpers pass an uninitialized dfops pointer to xfs_bmapi_write(). This does not cause problems because conversion does not use a transaction or the dfops structure for the COW fork. Drop the uninitialized usage of dfops in these codepaths and pass NULL along to xfs_bmapi_write() instead. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: update my copyrights for the writeback and iomap codeChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: add support for sub-pagesize writeback without buffer_headsChristoph Hellwig
Switch to using the iomap_page structure for checking sub-page uptodate status and track sub-page I/O completion status, and remove large quantities of boilerplate code working around buffer heads. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11iomap: add support for sub-pagesize buffered I/O without buffer headsChristoph Hellwig
After already supporting a simple implementation of buffered writes for the blocksize == PAGE_SIZE case in the last commit this adds full support even for smaller block sizes. There are three bits of per-block information in the buffer_head structure that really matter for the iomap read and write path: - uptodate status (BH_uptodate) - marked as currently under read I/O (BH_Async_Read) - marked as currently under write I/O (BH_Async_Write) Instead of having new per-block structures this now adds a per-page structure called struct iomap_page to track this information in a slightly different form: - a bitmap for the per-block uptodate status. For worst case of a 64k page size system this bitmap needs to contain 128 bits. For the typical 4k page size case it only needs 8 bits, although we still need a full unsigned long due to the way the atomic bitmap API works. - two atomic_t counters are used to track the outstanding read and write counts There is quite a bit of boilerplate code as the buffered I/O path uses various helper methods, but the actual code is very straight forward. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: allow writeback on pages without buffer headsChristoph Hellwig
Disable the IOMAP_F_BUFFER_HEAD flag on file systems with a block size equal to the page size, and deal with pages without buffer heads in writeback. Thanks to the previous refactoring this is basically trivial now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: refactor the tail of xfs_writepage_mapChristoph Hellwig
Rejuggle how we deal with the different error vs non-error and have ioends vs not have ioend cases to keep the fast path streamlined, and the duplicate code at a minimum. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_start_page_writebackChristoph Hellwig
This helper only has two callers, one of them with a constant error argument. Remove it to make pending changes to the code a little easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: move all writeback buffer_head manipulation into xfs_map_at_offsetChristoph Hellwig
This keeps it in a single place so it can be made otional more easily. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: don't look at buffer heads in xfs_add_to_ioendChristoph Hellwig
Calculate all information for the bio based on the passed in information without requiring a buffer_head structure. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove the imap_valid flagChristoph Hellwig
Simplify the way we check for a valid imap - we know we have a valid mapping after xfs_map_blocks returned successfully, and we know we can call xfs_imap_valid on any imap, as it will always fail on a zero-initialized map. We can also remove the xfs_imap_valid function and fold it into xfs_map_blocks now. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: simplify xfs_map_blocks by using xfs_iext_lookup_extent directlyChristoph Hellwig
xfs_bmapi_read adds zero value in xfs_map_blocks. Replace it with a direct call to the low-level extent lookup function. Note that we now always pass a 0 length to the trace points as we ask for an unspecified len. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_reflink_find_cow_mappingChristoph Hellwig
We only have one caller left, and open coding the simple extent list lookup in it allows us to make the code both more understandable and reuse calculations and variables already present. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove the now unused XFS_BMAPI_IGSTATE flagChristoph Hellwig
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: make xfs_writepage_map extent map centricDave Chinner
xfs_writepage_map() iterates over the bufferheads on a page to decide what sort of IO to do and what actions to take. However, when it comes to reflink and deciding when it needs to execute a COW operation, we no longer look at the bufferhead state but instead we ignore than and look up internal state held in the COW fork extent list. This means xfs_writepage_map() is somewhat confused. It does stuff, then ignores it, then tries to handle the impedence mismatch by shovelling the results inside the existing mapping code. It works, but it's a bit of a mess and it makes it hard to fix the cached map bug that the writepage code currently has. To unify the two different mechanisms, we first have to choose a direction. That's already been set - we're de-emphasising bufferheads so they are no longer a control structure as we need to do taht to allow for eventual removal. Hence we need to move away from looking at bufferhead state to determine what operations we need to perform. We can't completely get rid of bufferheads yet - they do contain some state that is absolutely necessary, such as whether that part of the page contains valid data or not (buffer_uptodate()). Other state in the bufferhead is redundant: BH_dirty - the page is dirty, so we can ignore this and just write it BH_delay - we have delalloc extent info in the DATA fork extent tree BH_unwritten - same as BH_delay BH_mapped - indicates we've already used it once for IO and it is mapped to a disk address. Needs to be ignored for COW blocks. The BH_mapped flag is an interesting case - it's supposed to indicate that it's already mapped to disk and so we can just use it "as is". In theory, we don't even have to do an extent lookup to find where to write it too, but we have to do that anyway to determine we are actually writing over a valid extent. Hence it's not even serving the purpose of avoiding a an extent lookup during writeback, and so we can pretty much ignore it. Especially as we have to ignore it for COW operations... Therefore, use the extent map as the source of information to tell us what actions we need to take and what sort of IO we should perform. The first step is to have xfs_map_blocks() set the io type according to what it looks up. This means it can easily handle both normal overwrite and COW cases. The only thing we also need to add is the ability to return hole mappings. We need to return and cache hole mappings now for the case of multiple blocks per page. We no longer use the BH_mapped to indicate a block over a hole, so we have to get that info from xfs_map_blocks(). We cache it so that holes that span two pages don't need separate lookups. This allows us to avoid ever doing write IO over a hole, too. Now that we have xfs_map_blocks() returning both a cached map and the type of IO we need to perform, we can rewrite xfs_writepage_map() to drop all the bufferhead control. It's also much simplified because it doesn't need to explicitly handle COW operations. Instead of iterating bufferheads, it iterates blocks within the page and then looks up what per-block state is required from the appropriate bufferhead. It then validates the cached map, and if it's not valid, we get a new map. If we don't get a valid map or it's over a hole, we skip the block. At this point, we have to remap the bufferhead via xfs_map_at_offset(). As previously noted, we had to do this even if the buffer was already mapped as the mapping would be stale for XFS_IO_DELALLOC, XFS_IO_UNWRITTEN and XFS_IO_COW IO types. With xfs_map_blocks() now controlling the type, even XFS_IO_OVERWRITE types need remapping, as converted-but-not-yet- written delalloc extents beyond EOF can be reported at XFS_IO_OVERWRITE. Bufferheads that span such regions still need their BH_Delay flags cleared and their block numbers calculated, so we now unconditionally map each bufferhead before submission. But wait! There's more - remember the old "treat unwritten extents as holes on read" hack? Yeah, that means we can have a dirty page with unmapped, unwritten bufferheads that contain data! What makes these so special is that the unwritten "hole" bufferheads do not have a valid block device pointer, so if we attempt to write them xfs_add_to_ioend() blows up. So we make xfs_map_at_offset() do the "realtime or data device" lookup from the inode and ignore what was or wasn't put into the bufferhead when the buffer was instantiated. The astute reader will have realised by now that this code treats unwritten extents in multiple-blocks-per-page situations differently. If we get any combination of unwritten blocks on a dirty page that contain valid data in the page, we're going to convert them to real extents. This can actually be a win, because it means that pages with interleaving unwritten and written blocks will get converted to a single written extent with zeros replacing the interspersed unwritten blocks. This is actually good for reducing extent list and conversion overhead, and it means we issue a contiguous IO instead of lots of little ones. The downside is that we use up a little extra IO bandwidth. Neither of these seem like a bad thing given that spinning disks are seek sensitive, and SSDs/pmem have bandwidth to burn and the lower Io latency/CPU overhead of fewer, larger IOs will result in better performance on them... As a result of all this, the only state we actually care about from the bufferhead is a single flag - BH_Uptodate. We still use the bufferhead to pass some information to the bio via xfs_add_to_ioend(), but that is trivial to separate and pass explicitly. This means we really only need 1 bit of state per block per page from the buffered write path in the writeback path. Everything else we do with the bufferhead is purely to make the buffered IO front end continue to work correctly. i.e we've pretty much marginalised bufferheads in the writeback path completely. Signed-off-By: Dave Chinner <dchinner@redhat.com> [hch: forward port, refactor and split off bits into other commits] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: rename the offset variable in xfs_writepage_mapChristoph Hellwig
Calling it file_offset makes the usage more clear, especially with a new poffset variable that will be added soon for the offset inside the page. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_map_cowChristoph Hellwig
We can handle the existing cow mapping case as a special case directly in xfs_writepage_map, and share code for allocating delalloc blocks with regular I/O in xfs_map_blocks. This means we need to always call xfs_map_blocks for reflink inodes, but we can still skip most of the work if it turns out that there is no COW mapping overlapping the current block. As a subtle detail we need to start caching holes in the wpc to deal with the case of COW reservations between EOF. But we'll need that infrastructure later anyway, so this is no big deal. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: remove xfs_reflink_trim_irec_to_next_cowChristoph Hellwig
We already have to check for overlapping COW extents everytime we come back to a page in xfs_writepage_map / xfs_map_cow, so this additional trim is not required. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: don't use XFS_BMAPI_IGSTATE in xfs_map_blocksChristoph Hellwig
We want to be able to use the extent state as a reliably indicator for the type of I/O, and stop using the buffer head state. For this we need to stop using the XFS_BMAPI_IGSTATE so that we don't see merged extents of different types. Based on a patch from Dave Chinner. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: don't clear imap_valid for a non-uptodate buffersChristoph Hellwig
Finding a buffer that isn't uptodate doesn't invalidate the mapping for any given block. The last_sector check will already take care of starting another ioend as soon as we find any non-update buffer, and if the current mapping doesn't include the next uptodate buffer the xfs_imap_valid check will take care of it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: do not set the page uptodate in xfs_writepage_mapChristoph Hellwig
We already track the page uptodate status based on the buffer uptodate status, which is updated whenever reading or zeroing blocks. This code has been there since commit a ptool commit in 2002, which claims to: "merge" the 2.4 fsx fix for block size < page size to 2.5. This needed major changes to actually fit. and isn't present in other writepage implementations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: move locking into xfs_bmap_punch_delalloc_rangeChristoph Hellwig
Both callers want the same looking, so do it only once. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: simplify xfs_aops_discard_pageChristoph Hellwig
Instead of looking at the buffer heads to see if a block is delalloc just call xfs_bmap_punch_delalloc_range on the whole page - this will leave any non-delalloc block intact and handle the iteration for us. As a side effect one more place stops caring about buffer heads and we can remove the xfs_check_page_type function entirely. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11xfs: use iomap for blocksize == PAGE_SIZE readpage and readpagesChristoph Hellwig
For file systems with a block size that equals the page size we never do partial reads, so we can use the buffer_head-less iomap versions of readpage and readpages without conflicting with the buffer_head structures create later in write_begin. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2018-07-11Merge branch 'iomap-4.19-merge' into xfs-4.19-mergeDarrick J. Wong
2018-07-08Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bugfixes from Ted Ts'o: "Bug fixes for ext4; most of which relate to vulnerabilities where a maliciously crafted file system image can result in a kernel OOPS or hang. At least one fix addresses an inline data bug could be triggered by userspace without the need of a crafted file system (although it does require that the inline data feature be enabled)" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: check superblock mapped prior to committing ext4: add more mount time checks of the superblock ext4: add more inode number paranoia checks ext4: avoid running out of journal credits when appending to an inline file jbd2: don't mark block as modified if the handle is out of credits ext4: never move the system.data xattr out of the inode body ext4: clear i_data in ext4_inode_info when removing inline data ext4: include the illegal physical block in the bad map ext4_error msg ext4: verify the depth of extent tree in ext4_find_extent() ext4: only look at the bg_flags field if it is valid ext4: make sure bitmaps and the inode table don't overlap with bg descriptors ext4: always check block group bounds in ext4_init_block_bitmap() ext4: always verify the magic number in xattr blocks ext4: add corruption check in ext4_xattr_set_entry() ext4: add warn_on_error mount option
2018-07-07Merge tag '4.18-rc3-smb3fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Five smb3/cifs fixes for stable (including for some leaks and memory overwrites) and also a few fixes for recent regressions in packet signing. Additional testing at the recent SMB3 test event, and some good work by Paulo and others spotted the issues fixed here. In addition to my xfstest runs on these, Aurelien and Stefano did additional test runs to verify this set" * tag '4.18-rc3-smb3fixes' of git://git.samba.org/sfrench/cifs-2.6: cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf() cifs: Fix infinite loop when using hard mount option cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting cifs: Fix memory leak in smb2_set_ea() cifs: fix SMB1 breakage cifs: Fix validation of signed data in smb2 cifs: Fix validation of signed data in smb3+ cifs: Fix use after free of a mid_q_entry
2018-07-05Fix up non-directory creation in SGID directoriesLinus Torvalds
sgid directories have special semantics, making newly created files in the directory belong to the group of the directory, and newly created subdirectories will also become sgid. This is historically used for group-shared directories. But group directories writable by non-group members should not imply that such non-group members can magically join the group, so make sure to clear the sgid bit on non-directories for non-members (but remember that sgid without group execute means "mandatory locking", just to confuse things even more). Reported-by: Jann Horn <jannh@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-05cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf()Stefano Brivio
smb{2,3}_create_lease_buf() store a lease key in the lease context for later usage on a lease break. In most paths, the key is currently sourced from data that happens to be on the stack near local variables for oplock in SMB2_open() callers, e.g. from open_shroot(), whereas smb2_open_file() properly allocates space on its stack for it. The address of those local variables holding the oplock is then passed to create_lease_buf handlers via SMB2_open(), and 16 bytes near oplock are used. This causes a stack out-of-bounds access as reported by KASAN on SMB2.1 and SMB3 mounts (first out-of-bounds access is shown here): [ 111.528823] BUG: KASAN: stack-out-of-bounds in smb3_create_lease_buf+0x399/0x3b0 [cifs] [ 111.530815] Read of size 8 at addr ffff88010829f249 by task mount.cifs/985 [ 111.532838] CPU: 3 PID: 985 Comm: mount.cifs Not tainted 4.18.0-rc3+ #91 [ 111.534656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 111.536838] Call Trace: [ 111.537528] dump_stack+0xc2/0x16b [ 111.540890] print_address_description+0x6a/0x270 [ 111.542185] kasan_report+0x258/0x380 [ 111.544701] smb3_create_lease_buf+0x399/0x3b0 [cifs] [ 111.546134] SMB2_open+0x1ef8/0x4b70 [cifs] [ 111.575883] open_shroot+0x339/0x550 [cifs] [ 111.591969] smb3_qfs_tcon+0x32c/0x1e60 [cifs] [ 111.617405] cifs_mount+0x4f3/0x2fc0 [cifs] [ 111.674332] cifs_smb3_do_mount+0x263/0xf10 [cifs] [ 111.677915] mount_fs+0x55/0x2b0 [ 111.679504] vfs_kern_mount.part.22+0xaa/0x430 [ 111.684511] do_mount+0xc40/0x2660 [ 111.698301] ksys_mount+0x80/0xd0 [ 111.701541] do_syscall_64+0x14e/0x4b0 [ 111.711807] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 111.713665] RIP: 0033:0x7f372385b5fa [ 111.715311] Code: 48 8b 0d 99 78 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 66 78 2c 00 f7 d8 64 89 01 48 [ 111.720330] RSP: 002b:00007ffff27049d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 111.722601] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f372385b5fa [ 111.724842] RDX: 000055c2ecdc73b2 RSI: 000055c2ecdc73f9 RDI: 00007ffff270580f [ 111.727083] RBP: 00007ffff2705804 R08: 000055c2ee976060 R09: 0000000000001000 [ 111.729319] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f3723f4d000 [ 111.731615] R13: 000055c2ee976060 R14: 00007f3723f4f90f R15: 0000000000000000 [ 111.735448] The buggy address belongs to the page: [ 111.737420] page:ffffea000420a7c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0 [ 111.739890] flags: 0x17ffffc0000000() [ 111.741750] raw: 0017ffffc0000000 0000000000000000 dead000000000200 0000000000000000 [ 111.744216] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 [ 111.746679] page dumped because: kasan: bad access detected [ 111.750482] Memory state around the buggy address: [ 111.752562] ffff88010829f100: 00 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00 [ 111.754991] ffff88010829f180: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00 [ 111.757401] >ffff88010829f200: 00 00 00 00 00 f1 f1 f1 f1 01 f2 f2 f2 f2 f2 f2 [ 111.759801] ^ [ 111.762034] ffff88010829f280: f2 02 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 [ 111.764486] ffff88010829f300: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 111.766913] ================================================================== Lease keys are however already generated and stored in fid data on open and create paths: pass them down to the lease context creation handlers and use them. Suggested-by: Aurélien Aptel <aaptel@suse.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Fixes: b8c32dbb0deb ("CIFS: Request SMB2.1 leases") Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05cifs: Fix infinite loop when using hard mount optionPaulo Alcantara
For every request we send, whether it is SMB1 or SMB2+, we attempt to reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying out the request. So, while server->tcpStatus != CifsNeedReconnect, we wait for the reconnection to succeed on wait_event_interruptible_timeout(). If it returns, that means that either the condition was evaluated to true, or timeout elapsed, or it was interrupted by a signal. Since we're not handling the case where the process woke up due to a received signal (-ERESTARTSYS), the next call to wait_event_interruptible_timeout() will _always_ fail and we end up looping forever inside either cifs_reconnect_tcon() or smb2_reconnect(). Here's an example of how to trigger that: $ mount.cifs //foo/share /mnt/test -o username=foo,password=foo,vers=1.0,hard (break connection to server before executing bellow cmd) $ stat -f /mnt/test & sleep 140 [1] 2511 $ ps -aux -q 2511 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2511 0.0 0.0 12892 1008 pts/0 S 12:24 0:00 stat -f /mnt/test $ kill -9 2511 (wait for a while; process is stuck in the kernel) $ ps -aux -q 2511 USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND root 2511 83.2 0.0 12892 1008 pts/0 R 12:24 30:01 stat -f /mnt/test By using 'hard' mount point means that cifs.ko will keep retrying indefinitely, however we must allow the process to be killed otherwise it would hang the system. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Cc: stable@vger.kernel.org Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE settingStefano Brivio
A "small" CIFS buffer is not big enough in general to hold a setacl request for SMB2, and we end up overflowing the buffer in send_set_info(). For instance: # mount.cifs //127.0.0.1/test /mnt/test -o username=test,password=test,nounix,cifsacl # touch /mnt/test/acltest # getcifsacl /mnt/test/acltest REVISION:0x1 CONTROL:0x9004 OWNER:S-1-5-21-2926364953-924364008-418108241-1000 GROUP:S-1-22-2-1001 ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff ACL:S-1-22-2-1001:ALLOWED/0x0/R ACL:S-1-22-2-1001:ALLOWED/0x0/R ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff ACL:S-1-1-0:ALLOWED/0x0/R # setcifsacl -a "ACL:S-1-22-2-1004:ALLOWED/0x0/R" /mnt/test/acltest this setacl will cause the following KASAN splat: [ 330.777927] BUG: KASAN: slab-out-of-bounds in send_set_info+0x4dd/0xc20 [cifs] [ 330.779696] Write of size 696 at addr ffff88010d5e2860 by task setcifsacl/1012 [ 330.781882] CPU: 1 PID: 1012 Comm: setcifsacl Not tainted 4.18.0-rc2+ #2 [ 330.783140] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [ 330.784395] Call Trace: [ 330.784789] dump_stack+0xc2/0x16b [ 330.786777] print_address_description+0x6a/0x270 [ 330.787520] kasan_report+0x258/0x380 [ 330.788845] memcpy+0x34/0x50 [ 330.789369] send_set_info+0x4dd/0xc20 [cifs] [ 330.799511] SMB2_set_acl+0x76/0xa0 [cifs] [ 330.801395] set_smb2_acl+0x7ac/0xf30 [cifs] [ 330.830888] cifs_xattr_set+0x963/0xe40 [cifs] [ 330.840367] __vfs_setxattr+0x84/0xb0 [ 330.842060] __vfs_setxattr_noperm+0xe6/0x370 [ 330.843848] vfs_setxattr+0xc2/0xd0 [ 330.845519] setxattr+0x258/0x320 [ 330.859211] path_setxattr+0x15b/0x1b0 [ 330.864392] __x64_sys_setxattr+0xc0/0x160 [ 330.866133] do_syscall_64+0x14e/0x4b0 [ 330.876631] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 330.878503] RIP: 0033:0x7ff2e507db0a [ 330.880151] Code: 48 8b 0d 89 93 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 bc 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 93 2c 00 f7 d8 64 89 01 48 [ 330.885358] RSP: 002b:00007ffdc4903c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc [ 330.887733] RAX: ffffffffffffffda RBX: 000055d1170de140 RCX: 00007ff2e507db0a [ 330.890067] RDX: 000055d1170de7d0 RSI: 000055d115b39184 RDI: 00007ffdc4904818 [ 330.892410] RBP: 0000000000000001 R08: 0000000000000000 R09: 000055d1170de7e4 [ 330.894785] R10: 00000000000002b8 R11: 0000000000000246 R12: 0000000000000007 [ 330.897148] R13: 000055d1170de0c0 R14: 0000000000000008 R15: 000055d1170de550 [ 330.901057] Allocated by task 1012: [ 330.902888] kasan_kmalloc+0xa0/0xd0 [ 330.904714] kmem_cache_alloc+0xc8/0x1d0 [ 330.906615] mempool_alloc+0x11e/0x380 [ 330.908496] cifs_small_buf_get+0x35/0x60 [cifs] [ 330.910510] smb2_plain_req_init+0x4a/0xd60 [cifs] [ 330.912551] send_set_info+0x198/0xc20 [cifs] [ 330.914535] SMB2_set_acl+0x76/0xa0 [cifs] [ 330.916465] set_smb2_acl+0x7ac/0xf30 [cifs] [ 330.918453] cifs_xattr_set+0x963/0xe40 [cifs] [ 330.920426] __vfs_setxattr+0x84/0xb0 [ 330.922284] __vfs_setxattr_noperm+0xe6/0x370 [ 330.924213] vfs_setxattr+0xc2/0xd0 [ 330.926008] setxattr+0x258/0x320 [ 330.927762] path_setxattr+0x15b/0x1b0 [ 330.929592] __x64_sys_setxattr+0xc0/0x160 [ 330.931459] do_syscall_64+0x14e/0x4b0 [ 330.933314] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 330.936843] Freed by task 0: [ 330.938588] (stack is not available) [ 330.941886] The buggy address belongs to the object at ffff88010d5e2800 which belongs to the cache cifs_small_rq of size 448 [ 330.946362] The buggy address is located 96 bytes inside of 448-byte region [ffff88010d5e2800, ffff88010d5e29c0) [ 330.950722] The buggy address belongs to the page: [ 330.952789] page:ffffea0004357880 count:1 mapcount:0 mapping:ffff880108fdca80 index:0x0 compound_mapcount: 0 [ 330.955665] flags: 0x17ffffc0008100(slab|head) [ 330.957760] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880108fdca80 [ 330.960356] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 330.963005] page dumped because: kasan: bad access detected [ 330.967039] Memory state around the buggy address: [ 330.969255] ffff88010d5e2880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 330.971833] ffff88010d5e2900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 330.974397] >ffff88010d5e2980: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc [ 330.976956] ^ [ 330.979226] ffff88010d5e2a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 330.981755] ffff88010d5e2a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 330.984225] ================================================================== Fix this by allocating a regular CIFS buffer in smb2_plain_req_init() if the request command is SMB2_SET_INFO. Reported-by: Jianhong Yin <jiyin@redhat.com> Fixes: 366ed846df60 ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function") CC: Stable <stable@vger.kernel.org> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Reviewed-and-tested-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05cifs: Fix memory leak in smb2_set_ea()Paulo Alcantara
This patch fixes a memory leak when doing a setxattr(2) in SMB2+. Signed-off-by: Paulo Alcantara <palcantara@suse.de> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-07-05cifs: fix SMB1 breakageRonnie Sahlberg
SMB1 mounting broke in commit 35e2cc1ba755 ("cifs: Use correct packet length in SMB2_TRANSFORM header") Fix it and also rename smb2_rqst_len to smb_rqst_len to make it less unobvious that the function is also called from CIFS/SMB1 Good job by Paulo reviewing and cleaning up Ronnie's original patch. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Paulo Alcantara <palcantara@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05cifs: Fix validation of signed data in smb2Paulo Alcantara
Fixes: c713c8770fa5 ("cifs: push rfc1002 generation down the stack") We failed to validate signed data returned by the server because __cifs_calc_signature() now expects to sign the actual data in iov but we were also passing down the rfc1002 length. Fix smb3_calc_signature() to calculate signature of rfc1002 length prior to passing only the actual data iov[1-N] to __cifs_calc_signature(). In addition, there are a few cases where no rfc1002 length is passed so we make sure there's one (iov_len == 4). Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05cifs: Fix validation of signed data in smb3+Paulo Alcantara
Fixes: c713c8770fa5 ("cifs: push rfc1002 generation down the stack") We failed to validate signed data returned by the server because __cifs_calc_signature() now expects to sign the actual data in iov but we were also passing down the rfc1002 length. Fix smb3_calc_signature() to calculate signature of rfc1002 length prior to passing only the actual data iov[1-N] to __cifs_calc_signature(). In addition, there are a few cases where no rfc1002 length is passed so we make sure there's one (iov_len == 4). Signed-off-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-07-05cifs: Fix use after free of a mid_q_entryLars Persson
With protocol version 2.0 mounts we have seen crashes with corrupt mid entries. Either the server->pending_mid_q list becomes corrupt with a cyclic reference in one element or a mid object fetched by the demultiplexer thread becomes overwritten during use. Code review identified a race between the demultiplexer thread and the request issuing thread. The demultiplexer thread seems to be written with the assumption that it is the sole user of the mid object until it calls the mid callback which either wakes the issuer task or deletes the mid. This assumption is not true because the issuer task can be woken up earlier by a signal. If the demultiplexer thread has proceeded as far as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer thread will happily end up calling cifs_delete_mid while the demultiplexer thread still is using the mid object. Inserting a delay in the cifs demultiplexer thread widens the race window and makes reproduction of the race very easy: if (server->large_buf) buf = server->bigbuf; + usleep_range(500, 4000); server->lstrp = jiffies; To resolve this I think the proper solution involves putting a reference count on the mid object. This patch makes sure that the demultiplexer thread holds a reference until it has finished processing the transaction. Cc: stable@vger.kernel.org Signed-off-by: Lars Persson <larper@axis.com> Acked-by: Paulo Alcantara <palcantara@suse.de> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>