summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2015-06-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Add TX fast path in mac80211, from Johannes Berg. 2) Add TSO/GRO support to ibmveth, from Thomas Falcon 3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau. 4) Lots of new rhashtable tests, from Thomas Graf. 5) Run ingress qdisc lockless, from Alexei Starovoitov. 6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet. 7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov. 8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck. 9) Move page frag allocator under mm/, also from Alexander. 10) Add xmit_more support to hv_netvsc, from KY Srinivasan. 11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler. 12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko. 13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn. 14) Add netdev driver for GENEVE tunnels, from John W Linville. 15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso. 16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet. 17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann. 18) Add tail call support to BPF, from Alexei Starovoitov. 19) Add several pktgen helper scripts, from Jesper Dangaard Brouer. 20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa. 21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet. 22) Add Cavium ThunderX driver, from Sunil Goutham. 23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov. 24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai. 25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu. 26) Add more entropy inputs to flow dissector, from Tom Herbert. 27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen. 28) Convert ipset over to RCU locking, from Jozsef Kadlecsik. 29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
2015-06-23Merge tag 'for-linus-20150623' of git://git.infradead.org/linux-mtdLinus Torvalds
Pull MTD updates from Brian Norris: "JFFS2: - fix a theoretical unbalanced locking issue; the lock handling was a bit unclean, but AFAICT, it didn't actually lead to real deadlocks NAND: - brcmnand driver: new driver supporting NAND controller found originally on Broadcom STB SoCs (BCM7xxx), but now also found on BCM63xxx, iProc (e.g., Cygnus, BCM5301x), BCM3xxx, and more - begin factoring out BBT code so it can be shared between traditional (parallel) NAND drivers and upcoming SPI NAND drivers (WIP) - add common DT-based init support, so nand_base can pick up some flash properties automatically, using established common NAND DT properties - mxc_nand: support 8-bit ECC - pxa3xx_nand: * fix build for ARM64 * use a jiffies-based timeout SPI NOR: - add a few new IDs - clear out some unnecessary entries - make sure SECT_4K flags are correct for all (?) entries Core: - fix mtd->usecount race conditions (BUG_ON()) - switch to modern PM ops Other: - CFI: save code space by de-inlining large functions - clean up some partition parser selection code across several drivers - various miscellaneous changes, mostly minor" * tag 'for-linus-20150623' of git://git.infradead.org/linux-mtd: (57 commits) mtd: docg3: Fix kasprintf() usage mtd: docg3: Don't leak docg3->bbt in error path mtd: nandsim: Fix kasprintf() usage mtd: cs553x_nand: Fix kasprintf() usage mtd: r852: Fix device_create_file() usage mtd: brcmnand: drop unnecessary initialization mtd: propagate error codes from add_mtd_device() mtd: diskonchip: remove two-phase partitioning / registration mtd: dc21285: use raw spinlock functions for nw_gpio_lock mtd: chips: fixup dependencies, to prevent build error mtd: cfi_cmdset_0002: Initialize datum before calling map_word_load_partial mtd: cfi: deinline large functions mtd: lantiq-flash: use default partition parsers mtd: plat_nand: use default partition probe mtd: nand: correct indentation within conditional mtd: remove incorrect file name mtd: blktrans: use better error code for unimplemented ioctl() mtd: maps: Spelling s/reseved/reserved/ mtd: blktrans: change blktrans_getgeo return value mtd: mxc_nand: generate nand_ecclayout for 8 bit ECC ...
2015-06-22Merge branch 'timers-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "A rather largish update for everything time and timer related: - Cache footprint optimizations for both hrtimers and timer wheel - Lower the NOHZ impact on systems which have NOHZ or timer migration disabled at runtime. - Optimize run time overhead of hrtimer interrupt by making the clock offset updates smarter - hrtimer cleanups and removal of restrictions to tackle some problems in sched/perf - Some more leap second tweaks - Another round of changes addressing the 2038 problem - First step to change the internals of clock event devices by introducing the necessary infrastructure - Allow constant folding for usecs/msecs_to_jiffies() - The usual pile of clockevent/clocksource driver updates The hrtimer changes contain updates to sched, perf and x86 as they depend on them plus changes all over the tree to cleanup API changes and redundant code, which got copied all over the place. The y2038 changes touch s390 to remove the last non 2038 safe code related to boot/persistant clock" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) clocksource: Increase dependencies of timer-stm32 to limit build wreckage timer: Minimize nohz off overhead timer: Reduce timer migration overhead if disabled timer: Stats: Simplify the flags handling timer: Replace timer base by a cpu index timer: Use hlist for the timer wheel hash buckets timer: Remove FIFO "guarantee" timers: Sanitize catchup_timer_jiffies() usage hrtimer: Allow hrtimer::function() to free the timer seqcount: Introduce raw_write_seqcount_barrier() seqcount: Rename write_seqcount_barrier() hrtimer: Fix hrtimer_is_queued() hole hrtimer: Remove HRTIMER_STATE_MIGRATE selftest: Timers: Avoid signal deadlock in leap-a-day timekeeping: Copy the shadow-timekeeper over the real timekeeper last clockevents: Check state instead of mode in suspend/resume path selftests: timers: Add leap-second timer edge testing to leap-a-day.c ntp: Do leapsecond adjustment in adjtimex read path time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge ntp: Introduce and use SECS_PER_DAY macro instead of 86400 ...
2015-06-22Merge branch 'locking-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "The main changes are: - 'qspinlock' support, enabled on x86: queued spinlocks - these are now the spinlock variant used by x86 as they outperform ticket spinlocks in every category. (Waiman Long) - 'pvqspinlock' support on x86: paravirtualized variant of queued spinlocks. (Waiman Long, Peter Zijlstra) - 'qrwlock' support, enabled on x86: queued rwlocks. Similar to queued spinlocks, they are now the variant used by x86: CONFIG_ARCH_USE_QUEUED_SPINLOCKS=y CONFIG_QUEUED_SPINLOCKS=y CONFIG_ARCH_USE_QUEUED_RWLOCKS=y CONFIG_QUEUED_RWLOCKS=y - various lockdep fixlets - various locking primitives cleanups, further WRITE_ONCE() propagation" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) locking/lockdep: Remove hard coded array size dependency locking/qrwlock: Don't contend with readers when setting _QW_WAITING lockdep: Do not break user-visible string locking/arch: Rename set_mb() to smp_store_mb() locking/arch: Add WRITE_ONCE() to set_mb() rtmutex: Warn if trylock is called from hard/softirq context arch: Remove __ARCH_HAVE_CMPXCHG locking/rtmutex: Drop usage of __HAVE_ARCH_CMPXCHG locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS locking/pvqspinlock: Rename QUEUED_SPINLOCK to QUEUED_SPINLOCKS locking/pvqspinlock: Replace xchg() by the more descriptive set_mb() locking/pvqspinlock, x86: Enable PV qspinlock for Xen locking/pvqspinlock, x86: Enable PV qspinlock for KVM locking/pvqspinlock, x86: Implement the paravirt qspinlock call patching locking/pvqspinlock: Implement simple paravirt support for the qspinlock locking/qspinlock: Revert to test-and-set on hypervisors locking/qspinlock: Use a simple write to grab the lock locking/qspinlock: Optimize for smaller NR_CPUS locking/qspinlock: Extract out code snippets for the next patch locking/qspinlock: Add pending bit ...
2015-06-22Merge branch 'for-linus-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs updates from Al Viro: "In this pile: pathname resolution rewrite. - recursion in link_path_walk() is gone. - nesting limits on symlinks are gone (the only limit remaining is that the total amount of symlinks is no more than 40, no matter how nested). - "fast" (inline) symlinks are handled without leaving rcuwalk mode. - stack footprint (independent of the nesting) is below kilobyte now, about on par with what it used to be with one level of nested symlinks and ~2.8 times lower than it used to be in the worst case. - struct nameidata is entirely private to fs/namei.c now (not even opaque pointers are being passed around). - ->follow_link() and ->put_link() calling conventions had been changed; all in-tree filesystems converted, out-of-tree should be able to follow reasonably easily. For out-of-tree conversions, see Documentation/filesystems/porting for details (and in-tree filesystems for examples of conversion). That has sat in -next since mid-May, seems to survive all testing without regressions and merges clean with v4.1" * 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits) turn user_{path_at,path,lpath,path_dir}() into static inlines namei: move saved_nd pointer into struct nameidata inline user_path_create() inline user_path_parent() namei: trim do_last() arguments namei: stash dfd and name into nameidata namei: fold path_cleanup() into terminate_walk() namei: saner calling conventions for filename_parentat() namei: saner calling conventions for filename_create() namei: shift nameidata down into filename_parentat() namei: make filename_lookup() reject ERR_PTR() passed as name namei: shift nameidata inside filename_lookup() namei: move putname() call into filename_lookup() namei: pass the struct path to store the result down into path_lookupat() namei: uninline set_root{,_rcu}() namei: be careful with mountpoint crossings in follow_dotdot_rcu() Documentation: remove outdated information from automount-support.txt get rid of assorted nameidata-related debris lustre: kill unused helper lustre: kill unused macro (LOOKUP_CONTINUE) ...
2015-06-19seqcount: Rename write_seqcount_barrier()Peter Zijlstra
I'll shortly be introducing another seqcount primitive that's useful to provide ordering semantics and would like to use the write_seqcount_barrier() name for that. Seeing how there's only one user of the current primitive, lets rename it to invalidate, as that appears what its doing. While there, employ lockdep_assert_held() instead of assert_spin_locked() to not generate debug code for regular kernels. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: ktkhai@parallels.com Cc: rostedt@goodmis.org Cc: juri.lelli@gmail.com Cc: pang.xunlei@linaro.org Cc: Oleg Nesterov <oleg@redhat.com> Cc: wanpeng.li@linux.intel.com Cc: Paul McKenney <paulmck@linux.vnet.ibm.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: umgwanakikbuti@gmail.com Link: http://lkml.kernel.org/r/20150611124743.279926217@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
2015-06-02vfs: read file_handle only once in handle_to_pathSasha Levin
We used to read file_handle twice. Once to get the amount of extra bytes, and once to fetch the entire structure. This may be problematic since we do size verifications only after the first read, so if the number of extra bytes changes in userspace between the first and second calls, we'll have an incoherent view of file_handle. Instead, read the constant size once, and copy that over to the final structure without having to re-read it again. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/phy/amd-xgbe-phy.c drivers/net/wireless/iwlwifi/Kconfig include/net/mac80211.h iwlwifi/Kconfig and mac80211.h were both trivial overlapping changes. The drivers/net/phy/amd-xgbe-phy.c file got removed in 'net-next' and the bug fix that happened on the 'net' side is already integrated into the rest of the amd-xgbe driver. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-31Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fix from Al Viro: "Off-by-one in d_walk()/__dentry_kill() race fix. It's very hard to hit; possible in the same conditions as the original bug, except that you need the skipped branch to contain all the remaining evictables, so that the d_walk()-calling loop in d_invalidate() decides there's nothing more to do and doesn't go for another pass - otherwise that next pass will sweep the sucker. So it's not too urgent, but seeing that the fix is obvious and the original commit has spread into all -stable branches..." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: d_walk() might skip too much
2015-05-29Merge tag 'xfs-for-linus-4.1-rc6' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs Pull xfs fixes from Dave Chinner: "This is a little larger than I'd like late in the release cycle, but all the fixes are for regressions introduced in the 4.1-rc1 merge, or are needed back in -stable kernels fairly quickly as they are filesystem corruption or userspace visible correctness issues. Changes in this update: - regression fix for new rename whiteout code - regression fixes for new superblock generic per-cpu counter code - fix for incorrect error return sign introduced in 3.17 - metadata corruption fixes that need to go back to -stable kernels" * tag 'xfs-for-linus-4.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: xfs: fix broken i_nlink accounting for whiteout tmpfile inode xfs: xfs_iozero can return positive errno xfs: xfs_attr_inactive leaves inconsistent attr fork state behind xfs: extent size hints can round up extents past MAXEXTLEN xfs: inode and free block counters need to use __percpu_counter_compare percpu_counter: batch size aware __percpu_counter_compare() xfs: use percpu_counter_read_positive for mp->m_icount
2015-05-28d_walk() might skip too muchAl Viro
when we find that a child has died while we'd been trying to ascend, we should go into the first live sibling itself, rather than its sibling. Off-by-one in question had been introduced in "deal with deadlock in d_walk()" and the fix needs to be backported to all branches this one has been backported to. Cc: stable@vger.kernel.org # 3.2 and later Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-28omfs: fix potential integer overflow in allocatorBob Copeland
Both 'i' and 'bits_per_entry' are signed integers but the result is a u64 block number. Cast i to u64 to avoid truncation on 32-bit targets. Found by Coverity (CID 200679). Signed-off-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28omfs: fix sign confusion for bitmap loop counterBob Copeland
The count variable is used to iterate down to (below) zero from the size of the bitmap and handle the one-filling the remainder of the last partial bitmap block. The loop conditional expects count to be signed in order to detect when the final block is processed, after which count goes negative. Unfortunately, a recent change made this unsigned along with some other related fields. The result of is this is that during mount, omfs_get_imap will overrun the bitmap array and corrupt memory unless number of blocks happens to be a multiple of 8 * blocksize. Fix by changing count back to signed: it is guaranteed to fit in an s32 without overflow due to an enforced limit on the number of blocks in the filesystem. Signed-off-by: Bob Copeland <me@bobcopeland.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28omfs: set error return when d_make_root() failsBob Copeland
A static checker found the following issue in the error path for omfs_fill_super: fs/omfs/inode.c:552 omfs_fill_super() warn: missing error code here? 'd_make_root()' failed. 'ret' = '0' Fix by returning -ENOMEM in this case. Signed-off-by: Bob Copeland <me@bobcopeland.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28fs, omfs: add NULL terminator in the end up the token listSasha Levin
match_token() expects a NULL terminator at the end of the token list so that it would know where to stop. Not having one causes it to overrun to invalid memory. In practice, passing a mount option that omfs didn't recognize would sometimes panic the system. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Bob Copeland <me@bobcopeland.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-28fs/binfmt_elf.c:load_elf_binary(): return -EINVAL on zero-length mappingsAndrew Morton
load_elf_binary() returns `retval', not `error'. Fixes: a87938b2e246b81b4fb ("fs/binfmt_elf.c: fix bug in loading of PIE binaries") Reported-by: James Hogan <james.hogan@imgtec.com> Cc: Michael Davidson <md@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-05-29xfs: fix broken i_nlink accounting for whiteout tmpfile inodeBrian Foster
XFS uses the internal tmpfile() infrastructure for the whiteout inode used for RENAME_WHITEOUT operations. For tmpfile inodes, XFS allocates the inode, drops di_nlink, adds the inode to the agi unlinked list, calls d_tmpfile() which correspondingly drops i_nlink of the vfs inode, and then finishes the common inode setup (e.g., clear I_NEW and unlock). The d_tmpfile() call was originally made inxfs_create_tmpfile(), but was pulled up out of that function as part of the following commit to resolve a deadlock issue: 330033d6 xfs: fix tmpfile/selinux deadlock and initialize security As a result, callers of xfs_create_tmpfile() are responsible for either calling d_tmpfile() or fixing up i_nlink appropriately. The whiteout tmpfile allocation helper does neither. As a result, the vfs ->i_nlink becomes inconsistent with the on-disk ->di_nlink once xfs_rename() links it back into the source dentry and calls xfs_bumplink(). Update the assert in xfs_rename() to help detect this problem in the future and update xfs_rename_alloc_whiteout() to decrement the link count as part of the manual tmpfile inode setup. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29xfs: xfs_iozero can return positive errnoDave Chinner
It was missed when we converted everything in XFs to use negative error numbers, so fix it now. Bug introduced in 3.17 by commit 2451337 ("xfs: global error sign conversion"), and should go back to stable kernels. Thanks to Brian Foster for noticing it. cc: <stable@vger.kernel.org> # 3.17, 3.18, 3.19, 4.0 Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29xfs: xfs_attr_inactive leaves inconsistent attr fork state behindDave Chinner
xfs_attr_inactive() is supposed to clean up the attribute fork when the inode is being freed. While it removes attribute fork extents, it completely ignores attributes in local format, which means that there can still be active attributes on the inode after xfs_attr_inactive() has run. This leads to problems with concurrent inode writeback - the in-core inode attribute fork is removed without locking on the assumption that nothing will be attempting to access the attribute fork after a call to xfs_attr_inactive() because it isn't supposed to exist on disk any more. To fix this, make xfs_attr_inactive() completely remove all traces of the attribute fork from the inode, regardless of it's state. Further, also remove the in-core attribute fork structure safely so that there is nothing further that needs to be done by callers to clean up the attribute fork. This means we can remove the in-core and on-disk attribute forks atomically. Also, on error simply remove the in-memory attribute fork. There's nothing that can be done with it once we have failed to remove the on-disk attribute fork, so we may as well just blow it away here anyway. cc: <stable@vger.kernel.org> # 3.12 to 4.0 Reported-by: Waiman Long <waiman.long@hp.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29xfs: extent size hints can round up extents past MAXEXTLENDave Chinner
This results in BMBT corruption, as seen by this test: # mkfs.xfs -f -d size=40051712b,agcount=4 /dev/vdc .... # mount /dev/vdc /mnt/scratch # xfs_io -ft -c "extsize 16m" -c "falloc 0 30g" -c "bmap -vp" /mnt/scratch/foo which results in this failure on a debug kernel: XFS: Assertion failed: (blockcount & xfs_mask64hi(64-BMBT_BLOCKCOUNT_BITLEN)) == 0, file: fs/xfs/libxfs/xfs_bmap_btree.c, line: 211 .... Call Trace: [<ffffffff814cf0ff>] xfs_bmbt_set_allf+0x8f/0x100 [<ffffffff814cf18d>] xfs_bmbt_set_all+0x1d/0x20 [<ffffffff814f2efe>] xfs_iext_insert+0x9e/0x120 [<ffffffff814c7956>] ? xfs_bmap_add_extent_hole_real+0x1c6/0xc70 [<ffffffff814c7956>] xfs_bmap_add_extent_hole_real+0x1c6/0xc70 [<ffffffff814caaab>] xfs_bmapi_write+0x72b/0xed0 [<ffffffff811c72ac>] ? kmem_cache_alloc+0x15c/0x170 [<ffffffff814fe070>] xfs_alloc_file_space+0x160/0x400 [<ffffffff81ddcc29>] ? down_write+0x29/0x60 [<ffffffff815063eb>] xfs_file_fallocate+0x29b/0x310 [<ffffffff811d2bc8>] ? __sb_start_write+0x58/0x120 [<ffffffff811e3e18>] ? do_vfs_ioctl+0x318/0x570 [<ffffffff811cd680>] vfs_fallocate+0x140/0x260 [<ffffffff811ce6f8>] SyS_fallocate+0x48/0x80 [<ffffffff81ddec09>] system_call_fastpath+0x12/0x17 The tracepoint that indicates the extent that triggered the assert failure is: xfs_iext_insert: idx 0 offset 0 block 16777224 count 2097152 flag 1 Clearly indicating that the extent length is greater than MAXEXTLEN, which is 2097151. A prior trace point shows the allocation was an exact size match and that a length greater than MAXEXTLEN was asked for: xfs_alloc_size_done: agno 1 agbno 8 minlen 2097152 maxlen 2097152 ^^^^^^^ ^^^^^^^ We don't see this problem with extent size hints through the IO path because we can't do single IOs large enough to trigger MAXEXTLEN allocation. fallocate(), OTOH, is not limited in it's allocation sizes and so needs help here. The issue is that the extent size hint alignment is rounding up the extent size past MAXEXTLEN, because xfs_bmapi_write() is not taking into account extent size hints when calculating the maximum extent length to allocate. xfs_bmapi_reserve_delalloc() is already doing this, but direct extent allocation is not. Unfortunately, the calculation in xfs_bmapi_reserve_delalloc() is wrong, and it works only because delayed allocation extents are not limited in size to MAXEXTLEN in the in-core extent tree. hence this calculation does not work for direct allocation, and the delalloc code needs fixing. This may, in fact be the underlying bug that occassionally causes transaction overruns in delayed allocation extent conversion, so now we know it's wrong we should fix it, too. Many thanks to Brian Foster for finding this problem during review of this patch. Hence the fix, after much code reading, is to allow xfs_bmap_extsize_align() to align partial extents when full alignment would extend the alignment past MAXEXTLEN. We can safely do this because all callers have higher layer allocation loops that already handle short allocations, and so will simply run another allocation to cover the remainder of the requested allocation range that we ignored during alignment. The advantage of this approach is that it also removes the need for callers to do anything other than limit their requests to MAXEXTLEN - they don't really need to be aware of extent size hints at all. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29xfs: inode and free block counters need to use __percpu_counter_compareDave Chinner
Because the counters use a custom batch size, the comparison functions need to be aware of that batch size otherwise the comparison does not work correctly. This leads to ASSERT failures on generic/027 like this: XFS: Assertion failed: 0, file: fs/xfs/xfs_mount.c, line: 1099 ------------[ cut here ]------------ .... Call Trace: [<ffffffff81522a39>] xfs_mod_icount+0x99/0xc0 [<ffffffff815285cb>] xfs_trans_unreserve_and_mod_sb+0x28b/0x5b0 [<ffffffff8152f941>] xfs_log_commit_cil+0x321/0x580 [<ffffffff81528e17>] xfs_trans_commit+0xb7/0x260 [<ffffffff81503d4d>] xfs_bmap_finish+0xcd/0x1b0 [<ffffffff8151da41>] xfs_inactive_ifree+0x1e1/0x250 [<ffffffff8151dbe0>] xfs_inactive+0x130/0x200 [<ffffffff81523a21>] xfs_fs_evict_inode+0x91/0xf0 [<ffffffff811f3958>] evict+0xb8/0x190 [<ffffffff811f433b>] iput+0x18b/0x1f0 [<ffffffff811e8853>] do_unlinkat+0x1f3/0x320 [<ffffffff811d548a>] ? filp_close+0x5a/0x80 [<ffffffff811e999b>] SyS_unlinkat+0x1b/0x40 [<ffffffff81e0892e>] system_call_fastpath+0x12/0x71 This is a regression introduced by commit 501ab32 ("xfs: use generic percpu counters for inode counter"). This patch fixes the same problem for both the inode counter and the free block counter in the superblocks. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-29xfs: use percpu_counter_read_positive for mp->m_icountGeorge Wang
Function percpu_counter_read just return the current counter, which can be negative. This will cause the checking of "allocated inode counts <= m_maxicount" false positive. Use percpu_counter_read_positive can solve this problem, and be consistent with the purpose to introduce percpu mechanism to xfs. Signed-off-by: George Wang <xuw2015@gmail.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-05-27Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds
Pull cifs fixes from Steve French: "Back from SambaXP - now have 8 small CIFS bug fixes to merge" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSE Fix to convert SURROGATE PAIR cifs: potential missing check for posix_lock_file_wait Fix to check Unique id and FileType when client refer file directly. CIFS: remove an unneeded NULL check [cifs] fix null pointer check Fix that several functions handle incorrect value of mapchars cifs: Don't replace dentries for dfs mounts
2015-05-27Merge branch 'overlayfs-next' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs Pull two overlayfs fixes from Miklos Szeredi: "Overlayfs rmdir() failed to check for emptiness in one case; this was introduced in 4.0. The other bug was there since day one: failure to mount if upper fs is full, which bit some OpenWRT folks" * 'overlayfs-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: ovl: mount read-only if workdir can't be created ovl: don't remove non-empty opaque directory
2015-05-25net: af_unix: implement splice for stream af_unix socketsHannes Frederic Sowa
unix_stream_recvmsg is refactored to unix_stream_read_generic in this patch and enhanced to deal with pipe splicing. The refactoring is inneglible, we mostly have to deal with a non-existing struct msghdr argument. Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-23Merge branch 'for-linus-4.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "I fixed up a regression from 4.0 where conversion between different raid levels would sometimes bail out without converting. Filipe tracked down a race where it was possible to double allocate chunks on the drive. Mark has a fix for fiemap. All three will get bundled off for stable as well" * 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix regression in raid level conversion Btrfs: fix racy system chunk allocation when setting block group ro btrfs: clear 'ret' in btrfs_check_shared() loop
2015-05-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/ethernet/cadence/macb.c drivers/net/phy/phy.c include/linux/skbuff.h net/ipv4/tcp.c net/switchdev/switchdev.c Switchdev was a case of RTNH_H_{EXTERNAL --> OFFLOAD} renaming overlapping with net-next changes of various sorts. phy.c was a case of two changes, one adding a local variable to a function whilst the second was removing one. tcp.c overlapped a deadlock fix with the addition of new tcp_info statistic values. macb.c involved the addition of two zyncq device entries. skbuff.h involved adding back ipv4_daddr to nf_bridge_info whilst net-next changes put two other existing members of that struct into a union. Signed-off-by: David S. Miller <davem@davemloft.net>
2015-05-20CIFS: Fix race condition on RFC1002_NEGATIVE_SESSION_RESPONSEFederico Sauter
This patch fixes a race condition that occurs when connecting to a NT 3.51 host without specifying a NetBIOS name. In that case a RFC1002_NEGATIVE_SESSION_RESPONSE is received and the SMB negotiation is reattempted, but under some conditions it leads SendReceive() to hang forever while waiting for srv_mutex. This, in turn, sets the calling process to an uninterruptible sleep state and makes it unkillable. The solution is to unlock the srv_mutex acquired in the demux thread *before* going to sleep (after the reconnect error) and before reattempting the connection.
2015-05-20Fix to convert SURROGATE PAIRNakajima Akira
Garbled characters happen by using surrogate pair for filename. (replace each 1 character to ??) [Steps to Reproduce for bug] client# touch $(echo -e '\xf0\x9d\x9f\xa3') client# touch $(echo -e '\xf0\x9d\x9f\xa4') client# ls -li You see same inode number, same filename(=?? and ??) . Fix the bug about these functions do not consider about surrogate pair (and IVS). cifs_utf16_bytes() cifs_mapchar() cifs_from_utf16() cifsConvertToUTF16() Reported-by: Nakajima Akira <nakajima.akira@nttcom.co.jp> Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp> Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20cifs: potential missing check for posix_lock_file_waitChengyu Song
posix_lock_file_wait may fail under certain circumstances, and its result is usually checked/returned. But given the complexity of cifs, I'm not sure if the result is intentially left unchecked and always expected to succeed. Signed-off-by: Chengyu Song <csong84@gatech.edu> Acked-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20Fix to check Unique id and FileType when client refer file directly.Nakajima Akira
When you refer file directly on cifs client, (e.g. ls -li <filename>, cd <dir>, stat <filename>) the function return old inode number and filetype from old inode cache, though server has different inode number or filetype. When server is Windows, cifs client has same problem. When Server is Windows , This patch fixes bug in different filetype, but does not fix bug in different inode number. Because QUERY_PATH_INFO response by Windows does not include inode number(Index Number) . BUG INFO https://bugzilla.kernel.org/show_bug.cgi?id=90021 https://bugzilla.kernel.org/show_bug.cgi?id=90031 Reported-by: Nakajima Akira <nakajima.akira@nttcom.co.jp> Signed-off-by: Nakajima Akira <nakajima.akira@nttcom.co.jp> Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20Btrfs: fix regression in raid level conversionChris Mason
Commit 2f0810880f082fa8ba66ab2c33b02e4ff9770a5e changed btrfs_set_block_group_ro to avoid trying to allocate new chunks with the new raid profile during conversion. This fixed failures when there was no space on the drive to allocate a new chunk, but the metadata reserves were sufficient to continue the conversion. But this ended up causing a regression when the drive had plenty of space to allocate new chunks, mostly because reduce_alloc_profile isn't using the new raid profile. Fixing btrfs_reduce_alloc_profile is a bigger patch. For now, do a partial revert of 2f0810880, and don't error out if we hit ENOSPC. Signed-off-by: Chris Mason <clm@fb.com> Tested-by: Dave Sterba <dsterba@suse.cz> Reported-by: Holger Hoffstaette <holger.hoffstaette@googlemail.com>
2015-05-20CIFS: remove an unneeded NULL checkDan Carpenter
Smatch complains because we dereference "ses->server" without checking some lines earlier inside the call to get_next_mid(ses->server). fs/cifs/cifssmb.c:4921 CIFSGetDFSRefer() warn: variable dereferenced before check 'ses->server' (see line 4899) There is only one caller for this function get_dfs_path() and it always passes a non-null "ses->server" pointer so this NULL check can be removed. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <smfrench@gmail.com>
2015-05-20[cifs] fix null pointer checkSteve French
Dan Carpenter pointed out an inconsistent null pointer check in smb2_hdr_assemble that was pointed out by static checker. Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Sachin Prabhu <sprabhu@redhat.com> CC: Dan Carpenter <dan.carpenter@oracle.com>w
2015-05-19Btrfs: fix racy system chunk allocation when setting block group roFilipe Manana
If while setting a block group read-only we end up allocating a system chunk, through check_system_chunk(), we were not doing it while holding the chunk mutex which is a problem if a concurrent chunk allocation is happening, through do_chunk_alloc(), as it means both block groups can end up using the same logical addresses and physical regions in the device(s). So make sure we hold the chunk mutex. Cc: stable@vger.kernel.org # 4.0+ Fixes: 2f0810880f08 ("btrfs: delete chunk allocation attemp when setting block group ro") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-05-19btrfs: clear 'ret' in btrfs_check_shared() loopMark Fasheh
btrfs_check_shared() is leaking a return value of '1' from find_parent_nodes(). As a result, callers (in this case, extent_fiemap()) are told extents are shared when they are not. This in turn broke fiemap on btrfs for kernels v3.18 and up. The fix is simple - we just have to clear 'ret' after we are done processing the results of find_parent_nodes(). It wasn't clear to me at first what was happening with return values in btrfs_check_shared() and find_parent_nodes() - thanks to Josef for the help on irc. I added documentation to both functions to make things more clear for the next hacker who might come across them. If we could queue this up for -stable too that would be great. Signed-off-by: Mark Fasheh <mfasheh@suse.de> Reviewed-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com>
2015-05-19Merge tag 'nfs-for-4.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds
Pull two NFS client bugfixes from Trond Myklebust: "Highlights include: - fix a Linux-4.1 regression affecting stat() - take an extra reference to fl->fl_file when running a setlk" * tag 'nfs-for-4.1-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: nfs: take extra reference to fl->fl_file when running a setlk nfs: stat(2) fails during cthon04 basic test5 on NFSv4.0
2015-05-19ovl: mount read-only if workdir can't be createdMiklos Szeredi
OpenWRT folks reported that overlayfs fails to mount if upper fs is full, because workdir can't be created. Wordir creation can fail for various other reasons too. There's no reason that the mount itself should fail, overlayfs can work fine without a workdir, as long as the overlay isn't modified. So mount it read-only and don't allow remounting read-write. Add a couple of WARN_ON()s for the impossible case of workdir being used despite being read-only. Reported-by: Bastian Bittorf <bittorf@bluebottle.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: <stable@vger.kernel.org> # v3.18+
2015-05-19locking/arch: Rename set_mb() to smp_store_mb()Peter Zijlstra
Since set_mb() is really about an smp_mb() -- not a IO/DMA barrier like mb() rename it to match the recent smp_load_acquire() and smp_store_release(). Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-18Merge tag 'v4.1-rc4' into MTD's -nextBrian Norris
2015-05-16Merge branch 'for-linus-4.1-rc4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML hostfs fix from Richard Weinberger: "This contains a single fix for a regression introduced in 4.1-rc1" * 'for-linus-4.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: hostfs: Use correct mask for file mode
2015-05-16Merge tag 'for_linus_stable' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Fix a number of ext4 bugs; the most serious of which is a bug in the lazytime mount optimization code where we could end up updating the timestamps to the wrong inode" * tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix an ext3 collapse range regression in xfstests jbd2: fix r_count overflows leading to buffer overflow in journal recovery ext4: check for zero length extent explicitly ext4: fix NULL pointer dereference when journal restart fails ext4: remove unused function prototype from ext4.h ext4: don't save the error information if the block device is read-only ext4: fix lazytime optimization
2015-05-16Merge branch 'for-linus-4.1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "The first commit is a fix from Filipe for a very old extent buffer reuse race that triggered a BUG_ON. It hasn't come up often, I looked through old logs at FB and we hit it a handful of times over the last year. The rest are other corners he hit during testing" * 'for-linus-4.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: fix race when reusing stale extent buffers that leads to BUG_ON Btrfs: fix race between block group creation and their cache writeout Btrfs: fix panic when starting bg cache writeout after IO error Btrfs: fix crash after inode cache writeback failure
2015-05-15Merge branch 'parisc-4.1-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "One important patch which fixes crashes due to stack randomization on architectures where the stack grows upwards (currently parisc and metag only). This bug went unnoticed on parisc since kernel 3.14 where the flexible mmap memory layout support was added by commit 9dabf60dc4ab. The changes in fs/exec.c are inside an #ifdef CONFIG_STACK_GROWSUP section and will not affect other platforms. The other two patches rename args of the kthread_arg() function and fixes a printk output" * 'parisc-4.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures parisc: copy_thread(): rename 'arg' argument to 'kthread_arg' parisc: %pf is only for function pointers
2015-05-15turn user_{path_at,path,lpath,path_dir}() into static inlinesAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: move saved_nd pointer into struct nameidataAl Viro
these guys are always declared next to each other; might as well put the former (pointer to previous instance) into the latter and simplify the calling conventions for {set,restore}_nameidata() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_create()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15inline user_path_parent()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-15namei: trim do_last() argumentsAl Viro
now that struct filename is stashed in nameidata we have no need to pass it in Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>