summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2018-04-12SMB3: Fix length checking of SMB3.11 negotiate requestSteve French
The length checking for SMB3.11 negotiate request includes "negotiate contexts" which caused a buffer validation problem and a confusing warning message on SMB3.11 mount e.g.: SMB2 server sent bad RFC1001 len 236 not 170 Fix the length checking for SMB3.11 negotiate to account for the new negotiate context so that we don't log a warning on SMB3.11 mount by default but do log warnings if lengths returned by the server are incorrect. CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2018-04-12Merge tag 'xfs-4.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds
Pull more xfs updates from Darrick Wong: "Most of these are code cleanups, but there are a couple of notable use-after-free bug fixes. This series has been run through a full xfstests run over the week and through a quick xfstests run against this morning's master, with no major failures reported. - clean up unnecessary function call parameters - fix a use-after-free bug when aborting logging intents - refactor filestreams state data to avoid use-after-free bug - fix incorrect removal of cow extents when truncating extended attributes. - refactor open-coded __set_page_dirty in favor of using vfs function. - fix a deadlock when fstrim and fs shutdown race" * tag 'xfs-4.17-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: Force log to disk before reading the AGF during a fstrim Export __set_page_dirty xfs: only cancel cow blocks when truncating the data fork xfs: non-scrub - remove unused function parameters xfs: remove filestream item xfs_inode reference xfs: fix intent use-after-free on abort xfs: Remove "committed" argument of xfs_dir_ialloc
2018-04-12Merge tag 'gfs2-4.17.fixes2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull more gfs2 updates from Bob Peterson: "We decided to request the latest three patches to be merged into this merge window while it's still open. - The first patch adds a new function to lockref: lockref_put_not_zero - The second patch fixes GFS2's glock dump code so it uses the new lockref function. This fixes a problem whereby lock dumps could miss glocks. - I made a minor patch to update some comments and fix the lock ordering text in our gfs2-glocks.txt Documentation file" * tag 'gfs2-4.17.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Minor improvements to comments and documentation gfs2: Stop using rhashtable_walk_peek lockref: Add lockref_put_not_zero
2018-04-12Merge tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Fix corner cases when handling device removal # v4.12+ - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+ Features: - New sunrpc tracepoint for RPC pings - Finer grained NFSv4 attribute checking - Don't unnecessarily return NFS v4 delegations Other bugfixes and cleanups: - Several other small NFSoRDMA cleanups - Improvements to the sunrpc RTT measurements - A few sunrpc tracepoint cleanups - Various fixes for NFS v4 lock notifications - Various sunrpc and NFS v4 XDR encoding cleanups - Switch to the ida_simple API - Fix NFSv4.1 exclusive create - Forget acl cache after setattr operation - Don't advance the nfs_entry readdir cookie if xdr decoding fails" * tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits) NFS: advance nfs_entry cookie only after decoding completes successfully NFSv3/acl: forget acl cache after setattr NFSv4.1: Fix exclusive create NFSv4: Declare the size up to date after it was set. nfs: Use ida_simple API NFSv4: Fix the nfs_inode_set_delegation() arguments NFSv4: Clean up CB_GETATTR encoding NFSv4: Don't ask for attributes when ACCESS is protected by a delegation NFSv4: Add a helper to encode/decode struct timespec NFSv4: Clean up encode_attrs NFSv4; Clean up XDR encoding of type bitmap4 NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group SUNRPC: Add a helper for encoding opaque data inline SUNRPC: Add helpers for decoding opaque and string types NFSv4: Ignore change attribute invalidations if we hold a delegation NFS: More fine grained attribute tracking NFS: Don't force unnecessary cache invalidation in nfs_update_inode() NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() NFS: Don't force a revalidation of all attributes if change is missing NFS: Convert NFS_INO_INVALID flags to unsigned long ...
2018-04-12Merge branch 'work.thaw' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs thaw updates from Al Viro: "An ancient series that has fallen through the cracks in the previous cycle" * 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: buffer.c: call thaw_super during emergency thaw vfs: factor sb iteration out of do_emergency_remount
2018-04-12Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull AFS updates from Al Viro: "The AFS series posted by dhowells depended upon lookup_one_len() rework; now that prereq is in the mainline, that series had been rebased on top of it and got some exposure and testing..." * 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Do better accretion of small writes on newly created content afs: Add stats for data transfer operations afs: Trace protocol errors afs: Locally edit directory data for mkdir/create/unlink/... afs: Adjust the directory XDR structures afs: Split the directory content defs into a header afs: Fix directory handling afs: Split the dynroot stuff out and give it its own ops tables afs: Keep track of invalid-before version for dentry coherency afs: Rearrange status mapping afs: Make it possible to get the data version in readpage afs: Init inode before accessing cache afs: Introduce a statistics proc file afs: Dump bad status record afs: Implement @cell substitution handling afs: Implement @sys substitution handling afs: Prospectively look up extra files when doing a single lookup afs: Don't over-increment the cell usage count when pinning it afs: Fix checker warnings vfs: Remove the const from dir_context::actor
2018-04-12GFS2: Minor improvements to comments and documentationBob Peterson
This patch simply fixes some comments and the gfs2-glocks.txt file: Places where i_rwsem was called i_mutex, and adding i_rw_mutex. Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-12gfs2: Stop using rhashtable_walk_peekAndreas Gruenbacher
Function rhashtable_walk_peek is problematic because there is no guarantee that the glock previously returned still exists; when that key is deleted, rhashtable_walk_peek can end up returning a different key, which will cause an inconsistent glock dump. Fix this by keeping track of the current glock in the seq file iterator functions instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2018-04-12btrfs: add SPDX header to KconfigDavid Sterba
Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12btrfs: replace GPL boilerplate by SPDX -- sourcesDavid Sterba
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12btrfs: replace GPL boilerplate by SPDX -- headersDavid Sterba
Remove GPL boilerplate text (long, short, one-line) and keep the rest, ie. personal, company or original source copyright statements. Add the SPDX header. Unify the include protection macros to match the file names. Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12Btrfs: fix loss of prealloc extents past i_size after fsync log replayFilipe Manana
Currently if we allocate extents beyond an inode's i_size (through the fallocate system call) and then fsync the file, we log the extents but after a power failure we replay them and then immediately drop them. This behaviour happens since about 2009, commit c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log"), because it marks the inode as an orphan instead of dropping any extents beyond i_size before replaying logged extents, so after the log replay, and while the mount operation is still ongoing, we find the inode marked as an orphan and then perform a truncation (drop extents beyond the inode's i_size). Because the processing of orphan inodes is still done right after replaying the log and before the mount operation finishes, the intention of that commit does not make any sense (at least as of today). However reverting that behaviour is not enough, because we can not simply discard all extents beyond i_size and then replay logged extents, because we risk dropping extents beyond i_size created in past transactions, for example: add prealloc extent beyond i_size fsync - clears the flag BTRFS_INODE_NEEDS_FULL_SYNC from the inode transaction commit add another prealloc extent beyond i_size fsync - triggers the fast fsync path power failure In that scenario, we would drop the first extent and then replay the second one. To fix this just make sure that all prealloc extents beyond i_size are logged, and if we find too many (which is far from a common case), fallback to a full transaction commit (like we do when logging regular extents in the fast fsync path). Trivial reproducer: $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ xfs_io -f -c "pwrite -S 0xab 0 256K" /mnt/foo $ sync $ xfs_io -c "falloc -k 256K 1M" /mnt/foo $ xfs_io -c "fsync" /mnt/foo <power failure> # mount to replay log $ mount /dev/sdb /mnt # at this point the file only has one extent, at offset 0, size 256K A test case for fstests follows soon, covering multiple scenarios that involve adding prealloc extents with previous shrinking truncates and without such truncates. Fixes: c71bf099abdd ("Btrfs: Avoid orphan inodes cleanup while replaying log") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12Btrfs: clean up resources during umount after trans is abortedLiu Bo
Currently if some fatal errors occur, like all IO get -EIO, resources would be cleaned up when a) transaction is being committed or b) BTRFS_FS_STATE_ERROR is set However, in some rare cases, resources may be left alone after transaction gets aborted and umount may run into some ASSERT(), e.g. ASSERT(list_empty(&block_group->dirty_list)); For case a), in btrfs_commit_transaciton(), there're several places at the beginning where we just call btrfs_end_transaction() without cleaning up resources. For case b), it is possible that the trans handle doesn't have any dirty stuff, then only trans hanlde is marked as aborted while BTRFS_FS_STATE_ERROR is not set, so resources remain in memory. This makes btrfs also check BTRFS_FS_STATE_TRANS_ABORTED to make sure that all resources won't stay in memory after umount. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: David Sterba <dsterba@suse.com>
2018-04-12ovl: add support for "xino" mount and config optionsAmir Goldstein
With mount option "xino=on", mounter declares that there are enough free high bits in underlying fs to hold the layer fsid. If overlayfs does encounter underlying inodes using the high xino bits reserved for layer fsid, a warning will be emitted and the original inode number will be used. The mount option name "xino" goes after a similar meaning mount option of aufs, but in overlayfs case, the mapping is stateless. An example for a use case of "xino=on" is when upper/lower is on an xfs filesystem. xfs uses 64bit inode numbers, but it currently never uses the upper 8bit for inode numbers exposed via stat(2) and that is not likely to change in the future without user opting-in for a new xfs feature. The actual number of unused upper bit is much larger and determined by the xfs filesystem geometry (64 - agno_log - agblklog - inopblog). That means that for all practical purpose, there are enough unused bits in xfs inode numbers for more than OVL_MAX_STACK unique fsid's. Another use case of "xino=on" is when upper/lower is on tmpfs. tmpfs inode numbers are allocated sequentially since boot, so they will practially never use the high inode number bits. For compatibility with applications that expect 32bit inodes, the feature can be disabled with "xino=off". The option "xino=auto" automatically detects underlying filesystem that use 32bit inodes and enables the feature. The Kconfig option OVERLAY_FS_XINO_AUTO and module parameter of the same name, determine if the default mode for overlayfs mount is "xino=auto" or "xino=off". Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: consistent d_ino for non-samefs with xinoAmir Goldstein
When overlay layers are not all on the same fs, but all inode numbers of underlying fs do not use the high 'xino' bits, overlay st_ino values are constant and persistent. In that case, relax non-samefs constraint for consistent d_ino and always iterate non-merge dir using ovl_fill_real() actor so we can remap lower inode numbers to unique lower fs range. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: consistent i_ino for non-samefs with xinoAmir Goldstein
When overlay layers are not all on the same fs, but all inode numbers of underlying fs do not use the high 'xino' bits, overlay st_ino values are constant and persistent. In that case, set i_ino value to the same value as st_ino for nfsd readdirplus validator. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: constant st_ino for non-samefs with xinoAmir Goldstein
On 64bit systems, when overlay layers are not all on the same fs, but all inode numbers of underlying fs are not using the high bits, use the high bits to partition the overlay st_ino address space. The high bits hold the fsid (upper fsid is 0). This way overlay inode numbers are unique and all inodes use overlay st_dev. Inode numbers are also persistent for a given layer configuration. Currently, our only indication for available high ino bits is from a filesystem that supports file handles and uses the default encode_fh() operation, which encodes a 32bit inode number. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: allocate anon bdev per unique lower fsAmir Goldstein
Instead of allocating an anonymous bdev per lower layer, allocate one anonymous bdev per every unique lower fs that is different than upper fs. Every unique lower fs is assigned an fsid > 0 and the number of unique lower fs are stored in ofs->numlowerfs. The assigned fsid is stored in the lower layer struct and will be used also for inode number multiplexing. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: factor out ovl_map_dev_ino() helperAmir Goldstein
A helper for ovl_getattr() to map the values of st_dev and st_ino according to constant st_ino rules. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: cleanup ovl_update_time()Miklos Szeredi
No need to mess with an alias, the upperdentry can be retrieved directly from the overlay inode. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: add WARN_ON() for non-dir redirect casesMiklos Szeredi
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: cleanup setting OVL_INDEXVivek Goyal
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: set d->is_dir and d->opaque for last path elementVivek Goyal
Certain properties in ovl_lookup_data should be set only for the last element of the path. IOW, if we are calling ovl_lookup_single() for an absolute redirect, then d->is_dir and d->opaque do not make much sense for intermediate path elements. Instead set them only if dentry being lookup is last path element. As of now we do not seem to be making use of d->opaque if it is set for a path/dentry in lower. But just define the semantics so that future code can make use of this assumption. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: Do not check for redirect if this is last layerVivek Goyal
If we are looking in last layer, then there should not be any need to process redirect. redirect information is used only for lookup in next lower layer and there is no more lower layer to look into. So no need to process redirects. IOW, ignore redirects on lowest layer. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: lookup in inode cache first when decoding lower file handleAmir Goldstein
When decoding a lower file handle, we need to check if lower file was copied up and indexed and if it has a whiteout index, we need to check if this is an unlinked but open non-dir before returning -ESTALE. To find out if this is an unlinked but open non-dir we need to lookup an overlay inode in inode cache by lower inode and that requires decoding the lower file handle before looking in inode cache. Before this change, if the lower inode turned out to be a directory, we may have paid an expensive cost to reconnect that lower directory for nothing. After this change, we start by decoding a disconnected lower dentry and using the lower inode for looking up an overlay inode in inode cache. If we find overlay inode and dentry in cache, we avoid the index lookup overhead. If we don't find an overlay inode and dentry in cache, then we only need to decode a connected lower dentry in case the lower dentry is a non-indexed directory. The xfstests group overlay/exportfs tests decoding overlayfs file handles after drop_caches with different states of the file at encode and decode time. Overall the tests in the group call ovl_lower_fh_to_d() 89 times to decode a lower file handle. Before this change, the tests called ovl_get_index_fh() 75 times and reconnect_one() 61 times. After this change, the tests call ovl_get_index_fh() 70 times and reconnect_one() 59 times. The 2 cases where reconnect_one() was avoided are cases where a non-upper directory file handle was encoded, then the directory removed and then file handle was decoded. To demonstrate the affect on decoding file handles with hot inode/dentry cache, the drop_caches call in the tests was disabled. Without drop_caches, there are no reconnect_one() calls at all before or after the change. Before the change, there are 75 calls to ovl_get_index_fh(), exactly as the case with drop_caches. After the change, there are only 10 calls to ovl_get_index_fh(). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: do not try to reconnect a disconnected origin dentryAmir Goldstein
On lookup of non directory, we try to decode the origin file handle stored in upper inode. The origin file handle is supposed to be decoded to a disconnected non-dir dentry, which is fine, because we only need the lower inode of a copy up origin. However, if the origin file handle somehow turns out to be a directory we pay the expensive cost of reconnecting the directory dentry, only to get a mismatch file type and drop the dentry. Optimize this case by explicitly opting out of reconnecting the dentry. Opting-out of reconnect is done by passing a NULL acceptable callback to exportfs_decode_fh(). While the case described above is a strange corner case that does not really need to be optimized, the API added for this optimization will be used by a following patch to optimize a more common case of decoding an overlayfs file handle. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: disambiguate ovl_encode_fh()Amir Goldstein
Rename ovl_encode_fh() to ovl_encode_real_fh() to differentiate from the exportfs function ovl_encode_inode_fh() and change the latter to ovl_encode_fh() to match the exportfs method name. Rename ovl_decode_fh() to ovl_decode_real_fh() for consistency. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: set lower layer st_dev only if setting lower st_inoAmir Goldstein
For broken hardlinks, we do not return lower st_ino, so we should also not return lower pseudo st_dev. Fixes: a0c5ad307ac0 ("ovl: relax same fs constraint for constant st_ino") Cc: <stable@vger.kernel.org> #v4.15 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: fix lookup with middle layer opaque dir and absolute path redirectsAmir Goldstein
As of now if we encounter an opaque dir while looking for a dentry, we set d->last=true. This means that there is no need to look further in any of the lower layers. This works fine as long as there are no redirets or relative redircts. But what if there is an absolute redirect on the children dentry of opaque directory. We still need to continue to look into next lower layer. This patch fixes it. Here is an example to demonstrate the issue. Say you have following setup. upper: /redirect (redirect=/a/b/c) lower1: /a/[b]/c ([b] is opaque) (c has absolute redirect=/a/b/d/) lower0: /a/b/d/foo Now "redirect" dir should merge with lower1:/a/b/c/ and lower0:/a/b/d. Note, despite the fact lower1:/a/[b] is opaque, we need to continue to look into lower0 because children c has an absolute redirect. Following is a reproducer. Watch me make foo disappear: $ mkdir lower middle upper work work2 merged $ mkdir lower/origin $ touch lower/origin/foo $ mount -t overlay none merged/ \ -olowerdir=lower,upperdir=middle,workdir=work2 $ mkdir merged/pure $ mv merged/origin merged/pure/redirect $ umount merged $ mount -t overlay none merged/ \ -olowerdir=middle:lower,upperdir=upper,workdir=work $ mv merged/pure/redirect merged/redirect Now you see foo inside a twice redirected merged dir: $ ls merged/redirect foo $ umount merged $ mount -t overlay none merged/ \ -olowerdir=middle:lower,upperdir=upper,workdir=work After mount cycle you don't see foo inside the same dir: $ ls merged/redirect During middle layer lookup, the opaqueness of middle/pure is left in the lookup state and then middle/pure/redirect is wrongly treated as opaque. Fixes: 02b69b284cd7 ("ovl: lookup redirects") Cc: <stable@vger.kernel.org> #v4.10 Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-12ovl: Set d->last properly during lookupVivek Goyal
d->last signifies that this is the last layer we are looking into and there is no more. And that means this allows for some optimzation opportunities during lookup. For example, in ovl_lookup_single() we don't have to check for opaque xattr of a directory is this is the last layer we are looking into (d->last = true). But knowing for sure whether we are looking into last layer can be very tricky. If redirects are not enabled, then we can look at poe->numlower and figure out if the lookup we are about to is last layer or not. But if redircts are enabled then it is possible poe->numlower suggests that we are looking in last layer, but there is an absolute redirect present in found element and that redirects us to a layer in root and that means lookup will continue in lower layers further. For example, consider following. /upperdir/pure (opaque=y) /upperdir/pure/foo (opaque=y,redirect=/bar) /lowerdir/bar In this case pure is "pure upper". When we look for "foo", that time poe->numlower=0. But that alone does not mean that we will not search for a merge candidate in /lowerdir. Absolute redirect changes that. IOW, d->last should not be set just based on poe->numlower if redirects are enabled. That can lead to setting d->last while it should not have and that means we will not check for opaque xattr while we should have. So do this. - If redirects are not enabled, then continue to rely on poe->numlower information to determine if it is last layer or not. - If redirects are enabled, then set d->last = true only if this is the last layer in root ovl_entry (roe). Suggested-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Fixes: 02b69b284cd7 ("ovl: lookup redirects") Cc: <stable@vger.kernel.org> #v4.10
2018-04-12ovl: set i_ino to the value of st_ino for NFS exportAmir Goldstein
Eddie Horng reported that readdir of an overlayfs directory that was exported via NFSv3 returns entries with d_type set to DT_UNKNOWN. The reason is that while preparing the response for readdirplus, nfsd checks inside encode_entryplus_baggage() that a child dentry's inode number matches the value of d_ino returns by overlayfs readdir iterator. Because the overlayfs inodes use arbitrary inode numbers that are not correlated with the values of st_ino/d_ino, NFSv3 falls back to not encoding d_type. Although this is an allowed behavior, we can fix it for the case of all overlayfs layers on the same underlying filesystem. When NFS export is enabled and d_ino is consistent with st_ino (samefs), set the same value also to i_ino in ovl_fill_inode() for all overlayfs inodes, nfsd readdirplus sanity checks will pass. ovl_fill_inode() may be called from ovl_new_inode(), before real inode was created with ino arg 0. In that case, i_ino will be updated to real upper inode i_ino on ovl_inode_init() or ovl_inode_update(). Reported-by: Eddie Horng <eddiehorng.tw@gmail.com> Tested-by: Eddie Horng <eddiehorng.tw@gmail.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Fixes: 8383f1748829 ("ovl: wire up NFS export operations") Cc: <stable@vger.kernel.org> #v4.16 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2018-04-11Merge tag 'tags/upstream-4.17-rc1' of git://git.infradead.org/linux-ubifsLinus Torvalds
Pull UBI and UBIFS updates from Richard Weinberger: "Minor bug fixes and improvements" * tag 'tags/upstream-4.17-rc1' of git://git.infradead.org/linux-ubifs: ubi: Reject MLC NAND ubifs: Remove useless parameter of lpt_heap_replace ubifs: Constify struct ubifs_lprops in scan_for_leb_for_idx ubifs: remove unnecessary assignment ubi: Fix error for write access ubi: fastmap: Don't flush fastmap work on detach ubifs: Check ubifs_wbuf_sync() return code
2018-04-11CIFS: add ONCE flag for cifs_dbg typeAurelien Aptel
* Since cifs_vfs_error was just using pr_debug_ratelimited like the rest of cifs_dbg, move it there too * Add a ONCE type flag to call the pr_xxx_once() debug function instead of the ratelimited ones. To convert existing printk_once() calls to this we can run: perl -i -pE \ 's/printk_once\s*\(([^" \n]+)(.*)/cifs_dbg(VFS|ONCE,$2/g' \ fs/cifs/*.c Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-04-11cifs: Use ULL suffix for 64-bit constantGeert Uytterhoeven
On 32-bit (e.g. with m68k-linux-gnu-gcc-4.1): fs/cifs/inode.c: In function ‘simple_hashstr’: fs/cifs/inode.c:713: warning: integer constant is too large for ‘long’ type Fixes: 7ea884c77e5c97f1 ("smb3: Fix root directory when server returns inode number of zero") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-04-11SMB3: Log at least once if tree connect fails during reconnectSteve French
Adding an extra debug message to show if a tree connect failure during reconnect (and made it a log once so it doesn't spam the logs). Saw a case recently where tree connect repeatedly returned access denied on reconnect and it wasn't as easy to spot as it should have been. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com>
2018-04-11cifs: smb2pdu: Fix potential NULL pointer dereferenceGustavo A. R. Silva
tcon->ses is being dereferenced before it is null checked, hence there is a potential null pointer dereference. Fix this by moving the pointer dereference after tcon->ses has been properly null checked. Addresses-Coverity-ID: 1467426 ("Dereference before null check") Fixes: 93012bf98416 ("cifs: add server->vals->header_preamble_size") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
2018-04-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: - almost all of the rest of MM - kasan updates - lots of procfs work - misc things - lib/ updates - checkpatch - rapidio - ipc/shm updates - the start of willy's XArray conversion * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (140 commits) page cache: use xa_lock xarray: add the xa_lock to the radix_tree_root fscache: use appropriate radix tree accessors export __set_page_dirty unicore32: turn flush_dcache_mmap_lock into a no-op arm64: turn flush_dcache_mmap_lock into a no-op mac80211_hwsim: use DEFINE_IDA radix tree: use GFP_ZONEMASK bits of gfp_t for flags linux/const.h: refactor _BITUL and _BITULL a bit linux/const.h: move UL() macro to include/linux/const.h linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI xen, mm: allow deferred page initialization for xen pv domains elf: enforce MAP_FIXED on overlaying elf segments fs, elf: drop MAP_FIXED usage from elf_map mm: introduce MAP_FIXED_NOREPLACE MAINTAINERS: update bouncing aacraid@adaptec.com addresses fs/dcache.c: add cond_resched() in shrink_dentry_list() include/linux/kfifo.h: fix comment ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param ...
2018-04-11page cache: use xa_lockMatthew Wilcox
Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [willy@infradead.org: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/20180406145415.GB20605@bombadil.infradead.orgLink: http://lkml.kernel.org/r/20180313132639.17387-9-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11xarray: add the xa_lock to the radix_tree_rootMatthew Wilcox
This results in no change in structure size on 64-bit machines as it fits in the padding between the gfp_t and the void *. 32-bit machines will grow the structure from 8 to 12 bytes. Almost all radix trees are protected with (at least) a spinlock, so as they are converted from radix trees to xarrays, the data structures will shrink again. Initialising the spinlock requires a name for the benefit of lockdep, so RADIX_TREE_INIT() now needs to know the name of the radix tree it's initialising, and so do IDR_INIT() and IDA_INIT(). Also add the xa_lock() and xa_unlock() family of wrappers to make it easier to use the lock. If we could rely on -fplan9-extensions in the compiler, we could avoid all of this syntactic sugar, but that wasn't added until gcc 4.6. Link: http://lkml.kernel.org/r/20180313132639.17387-8-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11fscache: use appropriate radix tree accessorsMatthew Wilcox
Don't open-code accesses to data structure internals. Link: http://lkml.kernel.org/r/20180313132639.17387-7-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11export __set_page_dirtyMatthew Wilcox
XFS currently contains a copy-and-paste of __set_page_dirty(). Export it from buffer.c instead. Link: http://lkml.kernel.org/r/20180313132639.17387-6-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Acked-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Cc: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: Dave Chinner <david@fromorbit.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11elf: enforce MAP_FIXED on overlaying elf segmentsMichal Hocko
Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from elf_map" applied, some ELF binaries in his environment fail to start with [ 23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already [ 23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon The reason is that the above binary has overlapping elf segments: LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000 0x0000000000013a8c 0x0000000000013a8c R E 10000 LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40 0x00000000000002c0 0x00000000000005e8 RW 10000 LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328 0x0000000000000384 0x00000000000094a0 RW 10000 That binary has two RW LOAD segments, the first crosses a page border into the second 0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr) Handle this situation by enforcing MAP_FIXED when we establish a temporary brk VMA to handle overlapping segments. All other mappings will still use MAP_FIXED_NOREPLACE. Link: http://lkml.kernel.org/r/20180213100440.GM3443@dhcp22.suse.cz Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Anshuman Khandual <khandual@linux.vnet.ibm.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Andrei Vagin <avagin@openvz.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11fs, elf: drop MAP_FIXED usage from elf_mapMichal Hocko
Both load_elf_interp and load_elf_binary rely on elf_map to map segments on a controlled address and they use MAP_FIXED to enforce that. This is however dangerous thing prone to silent data corruption which can be even exploitable. Let's take CVE-2017-1000253 as an example. At the time (before commit eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE") ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from the stack top on 32b (legacy) memory layout (only 1GB away). Therefore we could end up mapping over the existing stack with some luck. The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much further from the stack (eab09532d400 and later by c715b72c1ba4: "mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive stack consumption early during execve fully stopped by da029c11e6b1 ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be safe and any attack should be impractical. On the other hand this is just too subtle assumption so it can break quite easily and hard to spot. I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still fundamentally dangerous. Moreover it shouldn't be even needed. We are at the early process stage and so there shouldn't be unrelated mappings (except for stack and loader) existing so mmap for a given address should succeed even without MAP_FIXED. Something is terribly wrong if this is not the case and we should rather fail than silently corrupt the underlying mapping. Address this issue by changing MAP_FIXED to the newly added MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an existing mapping clashing with the requested one without clobbering it. [mhocko@suse.com: fix build] [akpm@linux-foundation.org: coding-style fixes] [avagin@openvz.org: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC] Link: http://lkml.kernel.org/r/20171218184916.24445-1-avagin@openvz.org Link: http://lkml.kernel.org/r/20171213092550.2774-3-mhocko@kernel.org Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Kees Cook <keescook@chromium.org> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Cc: Joel Stanley <joel@jms.id.au> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11fs/dcache.c: add cond_resched() in shrink_dentry_list()Nikolay Borisov
As previously reported (https://patchwork.kernel.org/patch/8642031/) it's possible to call shrink_dentry_list with a large number of dentries (> 10000). This, in turn, could trigger the softlockup detector and possibly trigger a panic. In addition to the unmount path being vulnerable to this scenario, at SuSE we've observed similar situation happening during process exit on processes that touch a lot of dentries. Here is an excerpt from a crash dump. The number after the colon are the number of dentries on the list passed to shrink_dentry_list: PID 99760: 10722 PID 107530: 215 PID 108809: 24134 PID 108877: 21331 PID 141708: 16487 So we want to kill between 15k-25k dentries without yielding. And one possible call stack looks like: 4 [ffff8839ece41db0] _raw_spin_lock at ffffffff8152a5f8 5 [ffff8839ece41db0] evict at ffffffff811c3026 6 [ffff8839ece41dd0] __dentry_kill at ffffffff811bf258 7 [ffff8839ece41df0] shrink_dentry_list at ffffffff811bf593 8 [ffff8839ece41e18] shrink_dcache_parent at ffffffff811bf830 9 [ffff8839ece41e50] proc_flush_task at ffffffff8120dd61 10 [ffff8839ece41ec0] release_task at ffffffff81059ebd 11 [ffff8839ece41f08] do_exit at ffffffff8105b8ce 12 [ffff8839ece41f78] sys_exit at ffffffff8105bd53 13 [ffff8839ece41f80] system_call_fastpath at ffffffff81532909 While some of the callers of shrink_dentry_list do use cond_resched, this is not sufficient to prevent softlockups. So just move cond_resched into shrink_dentry_list from its callers. David said: I've found hundreds of occurrences of warnings that we emit when need_resched stays set for a prolonged period of time with the stack trace that is included in the change log. Link: http://lkml.kernel.org/r/1521718946-31521-1-git-send-email-nborisov@suse.com Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Goldwyn Rodrigues <rgoldwyn@suse.de> Cc: Jeff Mahoney <jeffm@suse.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11fs/proc/proc_sysctl.c: fix typo in sysctl_check_table_array()Waiman Long
Patch series "ipc: Clamp *mni to the real IPCMNI limit", v3. The sysctl parameters msgmni, shmmni and semmni have an inherent limit of IPC_MNI (32k). However, users may not be aware of that because they can write a value much higher than that without getting any error or notification. Reading the parameters back will show the newly written values which are not real. Enforcing the limit by failing sysctl parameter write, however, can break existing user applications. To address this delemma, a new flags field is introduced into the ctl_table. The value CTL_FLAGS_CLAMP_RANGE can be added to any ctl_table entries to enable a looser range clamping without returning any error. For example, .flags = CTL_FLAGS_CLAMP_RANGE, This flags value are now used for the range checking of shmmni, msgmni and semmni without breaking existing applications. If any out of range value is written to those sysctl parameters, the following warning will be printed instead. Kernel parameter "shmmni" was set out of range [0, 32768], clamped to 32768. Reading the values back will show 32768 instead of some fake values. This patch (of 6): Fix a typo. Link: http://lkml.kernel.org/r/1519926220-7453-2-git-send-email-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Luis R. Rodriguez <mcgrof@kernel.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11exec: pin stack limit during execKees Cook
Since the stack rlimit is used in multiple places during exec and it can be changed via other threads (via setrlimit()) or processes (via prlimit()), the assumption that the value doesn't change cannot be made. This leads to races with mm layout selection and argument size calculations. This changes the exec path to use the rlimit stored in bprm instead of in current. Before starting the thread, the bprm stack rlimit is stored back to current. Link: http://lkml.kernel.org/r/1518638796-20819-4-git-send-email-keescook@chromium.org Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec") Signed-off-by: Kees Cook <keescook@chromium.org> Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Reported-by: Andy Lutomirski <luto@kernel.org> Reported-by: Brad Spengler <spender@grsecurity.net> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Greg KH <greg@kroah.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11exec: introduce finalize_exec() before start_thread()Kees Cook
Provide a final callback into fs/exec.c before start_thread() takes over, to handle any last-minute changes, like the coming restoration of the stack limit. Link: http://lkml.kernel.org/r/1518638796-20819-3-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Brad Spengler <spender@grsecurity.net> Cc: Greg KH <greg@kroah.com> Cc: Hugh Dickins <hughd@google.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Willy Tarreau <w@1wt.eu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11exec: pass stack rlimit into mm layout functionsKees Cook
Patch series "exec: Pin stack limit during exec". Attempts to solve problems with the stack limit changing during exec continue to be frustrated[1][2]. In addition to the specific issues around the Stack Clash family of flaws, Andy Lutomirski pointed out[3] other places during exec where the stack limit is used and is assumed to be unchanging. Given the many places it gets used and the fact that it can be manipulated/raced via setrlimit() and prlimit(), I think the only way to handle this is to move away from the "current" view of the stack limit and instead attach it to the bprm, and plumb this down into the functions that need to know the stack limits. This series implements the approach. [1] 04e35f4495dd ("exec: avoid RLIMIT_STACK races with prlimit()") [2] 779f4e1c6c7c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"") [3] to security@kernel.org, "Subject: existing rlimit races?" This patch (of 3): Since it is possible that the stack rlimit can change externally during exec (either via another thread calling setrlimit() or another process calling prlimit()), provide a way to pass the rlimit down into the per-architecture mm layout functions so that the rlimit can stay in the bprm structure instead of sitting in the signal structure until exec is finalized. Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Willy Tarreau <w@1wt.eu> Cc: Hugh Dickins <hughd@google.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Rik van Riel <riel@redhat.com> Cc: Laura Abbott <labbott@redhat.com> Cc: Greg KH <greg@kroah.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11seq_file: account everything to kmemcgAlexey Dobriyan
All it takes to open a file and read 1 byte from it. seq_file will be allocated along with any private allocations, and more importantly seq file buffer which is 1 page by default. Link: http://lkml.kernel.org/r/20180310085252.GB17121@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Glauber Costa <glommer@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-04-11seq_file: allocate seq_file from kmem_cacheAlexey Dobriyan
For fine-grained debugging and usercopy protection. Link: http://lkml.kernel.org/r/20180310085027.GA17121@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Glauber Costa <glommer@gmail.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>