summaryrefslogtreecommitdiff
path: root/tools
AgeCommit message (Collapse)Author
2020-06-04Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: - More MM work. 100ish more to go. Mike Rapoport's "mm: remove __ARCH_HAS_5LEVEL_HACK" series should fix the current ppc issue - Various other little subsystems * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (127 commits) lib/ubsan.c: fix gcc-10 warnings tools/testing/selftests/vm: remove duplicate headers selftests: vm: pkeys: fix multilib builds for x86 selftests: vm: pkeys: use the correct page size on powerpc selftests/vm/pkeys: override access right definitions on powerpc selftests/vm/pkeys: test correct behaviour of pkey-0 selftests/vm/pkeys: introduce a sub-page allocator selftests/vm/pkeys: detect write violation on a mapped access-denied-key page selftests/vm/pkeys: associate key on a mapped page and detect write violation selftests/vm/pkeys: associate key on a mapped page and detect access violation selftests/vm/pkeys: improve checks to determine pkey support selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust() selftests/vm/pkeys: fix number of reserved powerpc pkeys selftests/vm/pkeys: introduce powerpc support selftests/vm/pkeys: introduce generic pkey abstractions selftests: vm: pkeys: use the correct huge page size selftests/vm/pkeys: fix alloc_random_pkey() to make it really random selftests/vm/pkeys: fix assertion in pkey_disable_set/clear() selftests/vm/pkeys: fix pkey_disable_clear() selftests: vm: pkeys: add helpers for pkey bits ...
2020-06-04tools/testing/selftests/vm: remove duplicate headersJagadeesh Pagadala
Code cleanup: Remove duplicate headers which are included twice. Signed-off-by: Jagadeesh Pagadala <jagdsh.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Shuah Khan <shuah@kernel.org> Cc: Brian Geffon <bgeffon@google.com> Link: http://lkml.kernel.org/r/1587278984-18847-1-git-send-email-jagdsh.linux@gmail.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests: vm: pkeys: fix multilib builds for x86Sandipan Das
This ensures that both 32-bit and 64-bit binaries are generated when this is built on a x86_64 system. Most of the changes have been borrowed from tools/testing/selftests/x86/Makefile. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Dave Hansen <dave.hansen@intel.com> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/0326a442214d7a1b970d38296e63df3b217f5912.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests: vm: pkeys: use the correct page size on powerpcSandipan Das
Both 4K and 64K pages are supported on powerpc. Parts of the selftest code perform alignment computations based on the PAGE_SIZE macro which is currently hardcoded to 64K for powerpc. This causes some test failures on kernels configured with 4K page size. In some cases, we need to enforce function alignment on page size. Since this can only be done at build time, 64K is used as the alignment factor as that also ensures 4K alignment. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/5dcdfbf3353acdc90f315172e800b49f5ca21299.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: override access right definitions on powerpcRam Pai
Some platforms hardcode the x86 values for PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE such as those in: /usr/include/bits/mman-shared.h. This overrides the definitions with correct values for powerpc. [sandipan@linux.ibm.com: fix powerpc access right definitions] Link: http://lkml.kernel.org/r/1ba86fd8a94f38131cfe2d9f277001dd1ad1d34e.1588959697.git.sandipan@linux.ibm.com Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/f6eb38cb3a1e12eb2cdc9da6300bc5a5dfba0db9.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: test correct behaviour of pkey-0Ram Pai
Ensure that pkey-0 is allocated on start and that it can be attached dynamically in various modes, without failures. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/9b7c54a9b4261894fe0c7e884c70b87214ff8fbb.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: introduce a sub-page allocatorRam Pai
This introduces a new allocator that allocates 4K hardware pages to back 64K linux pages. This allocator is available only on powerpc. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/c4a82fa962ec71015b994fab1aaf83bdfd091553.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: detect write violation on a mapped access-denied-key pageRam Pai
Detect write-violation on a page to which access-disabled key is associated much after the page is mapped. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/6a7dd4069ee18a2a51b207a55aa197f3f3c59753.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: associate key on a mapped page and detect write violationRam Pai
Detect write-violation on a page to which write-disabled key is associated much after the page is mapped. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/6bfe3b3832f8bcfb07d7f2cf116b45197f4587dd.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: associate key on a mapped page and detect access violationRam Pai
Detect access-violation on a page to which access-disabled key is associated much after the page is mapped. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/4a19cf9252c03dd883887e9002881599e6900d06.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: improve checks to determine pkey supportRam Pai
For the pkeys subsystem to work, both the CPU and the kernel need to have support. So, additionally check if the kernel supports pkeys apart from the CPU feature checks. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/8fb76c63ebdadcf068ecd2d23731032e195cd364.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: fix assertion in test_pkey_alloc_exhaust()Ram Pai
Some pkeys which are valid on the hardware are reserved and not available for application use. These keys cannot be allocated. test_pkey_alloc_exhaust() tries to account for these and has an assertion which validates if all available pkeys have been exahaustively allocated. However, the expression that is currently used is only valid for x86. On powerpc, a pkey is additionally reserved as compared to x86. Hence, the assertion is made to use an arch-specific helper to get the correct count of reserved pkeys. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/38b08d0318820ae46af3aa6048384fd8056c3df7.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: fix number of reserved powerpc pkeysDesnes A. Nunes do Rosario
The number of reserved pkeys in a PowerNV environment is different from that on PowerVM or KVM. Tested on PowerVM and PowerNV environments. Signed-off-by: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/0341a0ca961166814b44c9e724774672c18d54ca.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: introduce powerpc supportRam Pai
This makes use of the abstractions added earlier and introduces support for powerpc. For powerpc, after receiving the SIGSEGV, the signal handler must explicitly restore access permissions for the faulting pkey to allow the test to continue. As this makes use of pkey_access_allow(), all of its dependencies and other similar functions have been moved ahead of the signal handler. [sandipan@linux.ibm.com: fix powerpc access right updates] Link: http://lkml.kernel.org/r/5f65cf37be993760de8112a88da194e3ccbb2bf8.1588959697.git.sandipan@linux.ibm.com Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/b121e9fd33789ed9195276e32fe4e80bb6b88a31.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: introduce generic pkey abstractionsRam Pai
This introduces some generic abstractions and provides the corresponding architecture-specfic implementations for these abstractions. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/1c977915e69fb7767fb0dbd55ac7656554b15b93.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests: vm: pkeys: use the correct huge page sizeSandipan Das
The huge page size can vary across architectures. This will ensure that the correct huge page size is used when accessing the hugetlb controls under sysfs. Instead of using a hardcoded page size (i.e. 2MB), this now uses the HPAGE_SIZE macro which is arch-specific. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/66882a5d6e45c73c3a52bc4aef9754e48afa4f88.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: fix alloc_random_pkey() to make it really randomRam Pai
alloc_random_pkey() was allocating the same pkey every time. Not all pkeys were geting tested. This fixes it. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/0162f55816d4e783a0d6e49e554d0ab9a3c9a23b.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: fix assertion in pkey_disable_set/clear()Ram Pai
In some cases, a pkey's bits need not necessarily change in a way that the value of the pkey register increases when performing a pkey_disable_set() or decreases when performing a pkey_disable_clear(). For example, on powerpc, if a pkey's current state is PKEY_DISABLE_ACCESS and we perform a pkey_write_disable() on it, the bits still remain the same. We will observe something similar when the pkey's current state is 0 and a pkey_access_enable() is performed on it. Either case would cause some assertions to fail. This fixes the problem. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/8240665131e43fc93eed4eea8194676c1ea39a7f.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: fix pkey_disable_clear()Ram Pai
Currently, pkey_disable_clear() sets the specified bits instead clearing them. This has been dead code up to now because its only callers i.e. pkey_access/write_allow() are also unused. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/1f70bca60330a85dca42c3cd98212bb1cdf5a076.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests: vm: pkeys: add helpers for pkey bitsSandipan Das
This introduces some functions that help with setting or clearing bits of a particular pkey. This also adds an abstraction for getting a pkey's bit position in the pkey register as this may vary across architectures. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/2ad9705f4f68ca7e72155cc583415e5a979546f1.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests: vm: pkeys: Use sane types for pkey registerSandipan Das
The size of the pkey register can vary across architectures. This converts the data type of all its references to u64 in preparation for multi-arch support. To keep the definition of the u64 type consistent and remove format specifier related warnings, __SANE_USERSPACE_TYPES__ is defined as suggested by Michael Ellerman. Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/d3e271798455d940e395e56e1ff1e82a31bcb7aa.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: make gcc check arguments of sigsafe_printf()Thiago Jung Bauermann
This will help us ensure we print pkey_reg_t values correctly in different architectures. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ram Pai <linuxram@us.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/b40b7a95fdd4045d62530a2a34452934caf3b0bc.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: move some definitions to arch-specific headerThiago Jung Bauermann
In preparation for multi-arch support, move definitions which have arch-specific values to x86-specific header. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/d58eba2930059c8b209eefd6d5b48fe922a5b010.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: move generic definitions to header fileRam Pai
Moved all the generic definition and helper functions to the header file. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/57177f99e92a51295956715d5f2d5688a4d13927.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/vm/pkeys: rename all references to pkru to a generic nameRam Pai
This renames PKRU references to "pkey_reg" or "pkey" based on the usage. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/2c6970bc6d2e99796cd5cc1101bd2ecf7eccb937.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04selftests/x86/pkeys: move selftests to arch-neutral directoryRam Pai
Patch series "selftests, powerpc, x86: Memory Protection Keys", v19. Memory protection keys enables an application to protect its address space from inadvertent access by its own code. This feature is now enabled on powerpc and has been available since 4.16-rc1. The patches move the selftests to arch neutral directory and enhance their test coverage. Tested on powerpc64 and x86_64 (Skylake-SP). This patch (of 24): Move selftest files from tools/testing/selftests/x86/ to tools/testing/selftests/vm/. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.ibm.com> Signed-off-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@intel.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Florian Weimer <fweimer@redhat.com> Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Michal Suchanek <msuchanek@suse.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuah@kernel.org> Link: http://lkml.kernel.org/r/14d25194c3e2e652e0047feec4487e269e76e8c9.1585646528.git.sandipan@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04lib: make a test module with set/clear bitJesse Brandeburg
Test some bit clears/sets to make sure assembly doesn't change, and that the set_bit and clear_bit functions work and don't cause sparse warnings. Instruct Kbuild to build this file with extra warning level -Wextra, to catch new issues, and also doesn't hurt to build with C=1. This was used to test changes to arch/x86/include/asm/bitops.h. In particular, sparse (C=1) was very concerned when the last bit before a natural boundary, like 7, or 31, was being tested, as this causes sign extension (0xffffff7f) for instance when clearing bit 7. Recommended usage: make defconfig scripts/config -m CONFIG_TEST_BITOPS make modules_prepare make C=1 W=1 lib/test_bitops.ko objdump -S -d lib/test_bitops.ko insmod lib/test_bitops.ko rmmod lib/test_bitops.ko <check dmesg>, there should be no compiler/sparse warnings and no error messages in log. Link: http://lkml.kernel.org/r/20200310221747.2848474-2-jesse.brandeburg@intel.com Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> CcL Ingo Molnar <mingo@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-04Merge branch 'exec-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull execve updates from Eric Biederman: "Last cycle for the Nth time I ran into bugs and quality of implementation issues related to exec that could not be easily be fixed because of the way exec is implemented. So I have been digging into exec and cleanup up what I can. I don't think I have exec sorted out enough to fix the issues I started with but I have made some headway this cycle with 4 sets of changes. - promised cleanups after introducing exec_update_mutex - trivial cleanups for exec - control flow simplifications - remove the recomputation of bprm->cred The net result is code that is a bit easier to understand and work with and a decrease in the number of lines of code (if you don't count the added tests)" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (24 commits) exec: Compute file based creds only once exec: Add a per bprm->file version of per_clear binfmt_elf_fdpic: fix execfd build regression selftests/exec: Add binfmt_script regression test exec: Remove recursion from search_binary_handler exec: Generic execfd support exec/binfmt_script: Don't modify bprm->buf and then return -ENOEXEC exec: Move the call of prepare_binprm into search_binary_handler exec: Allow load_misc_binary to call prepare_binprm unconditionally exec: Convert security_bprm_set_creds into security_bprm_repopulate_creds exec: Factor security_bprm_creds_for_exec out of security_bprm_set_creds exec: Teach prepare_exec_creds how exec treats uids & gids exec: Set the point of no return sooner exec: Move handling of the point of no return to the top level exec: Run sync_mm_rss before taking exec_update_mutex exec: Fix spelling of search_binary_handler in a comment exec: Move the comment from above de_thread to above unshare_sighand exec: Rename flush_old_exec begin_new_exec exec: Move most of setup_new_exec into flush_old_exec exec: In setup_new_exec cache current in the local variable me ...
2020-06-04Merge branch 'proc-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull proc updates from Eric Biederman: "This has four sets of changes: - modernize proc to support multiple private instances - ensure we see the exit of each process tid exactly - remove has_group_leader_pid - use pids not tasks in posix-cpu-timers lookup Alexey updated proc so each mount of proc uses a new superblock. This allows people to actually use mount options with proc with no fear of messing up another mount of proc. Given the kernel's internal mounts of proc for things like uml this was a real problem, and resulted in Android's hidepid mount options being ignored and introducing security issues. The rest of the changes are small cleanups and fixes that came out of my work to allow this change to proc. In essence it is swapping the pids in de_thread during exec which removes a special case the code had to handle. Then updating the code to stop handling that special case" * 'proc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: proc_pid_ns takes super_block as an argument remove the no longer needed pid_alive() check in __task_pid_nr_ns() posix-cpu-timers: Replace __get_task_for_clock with pid_for_clock posix-cpu-timers: Replace cpu_timer_pid_type with clock_pid_type posix-cpu-timers: Extend rcu_read_lock removing task_struct references signal: Remove has_group_leader_pid exec: Remove BUG_ON(has_group_leader_pid) posix-cpu-timer: Unify the now redundant code in lookup_task posix-cpu-timer: Tidy up group_leader logic in lookup_task proc: Ensure we see the exit of each process tid exactly once rculist: Add hlists_swap_heads_rcu proc: Use PIDTYPE_TGID in next_tgid Use proc_pid_ns() to get pid_namespace from the proc superblock proc: use named enums for better readability proc: use human-readable values for hidepid docs: proc: add documentation for "hidepid=4" and "subset=pid" options and new mount behavior proc: add option to mount only a pids subset proc: instantiate only pids that we can ptrace on 'hidepid=4' mount option proc: allow to mount many instances of proc in one pid namespace proc: rename struct proc_fs_info to proc_fs_opts
2020-06-04Merge tag 'perf-tools-2020-06-02' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tooling updates from Arnaldo Carvalho de Melo: "These are additional changes to the perf tools, on top of what Ingo already submitted. - Further Intel PT call-trace fixes - Improve SELinux docs and tool warnings - Fix race at exit in 'perf record' using eventfd. - Add missing build tests to the default set of 'make -C tools/perf build-test' - Sync msr-index.h getting new AMD MSRs to decode and filter in 'perf trace'. - Fix fallback to libaudit in 'perf trace' for arches not using per-arch *.tbl files. - Fixes for 'perf ftrace'. - Fixes and improvements for the 'perf stat' metrics. - Use dummy event to get PERF_RECORD_{FORK,MMAP,etc} while synthesizing those metadata events for pre-existing threads. - Fix leaks detected using clang tooling. - Improvements to PMU event metric testing. - Report summary for 'perf stat' interval mode at the end, summing up all the intervals. - Improve pipe mode, i.e. this now works as expected, continuously dumping samples: # perf record -g -e raw_syscalls:sys_enter | perf --no-pager script - Fixes for event grouping, detecting incompatible groups such as: # perf stat -e '{cycles,power/energy-cores/}' -v WARNING: group events cpu maps do not match, disabling group: anon group { power/energy-cores/, cycles } power/energy-cores/: 0 cycles: 0-7 - Fixes for 'perf probe': blacklist address checking, number of kretprobe instances, etc. - JIT processing improvements and fixes plus the addition of a 'perf test' entry for the java demangler. - Add support for synthesizing first/last level cache, TLB and remove access events from HW tracing in the auxtrace code, first to use is ARM SPE. - Vendor events updates and fixes, including for POWER9 and Intel. - Allow using ~/.perfconfig for removing the ',' separators in 'perf stat' output. - Opt-in support for libpfm4" * tag 'perf-tools-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (120 commits) perf tools: Remove some duplicated includes perf symbols: Fix kernel maps for kcore and eBPF tools arch x86: Sync the msr-index.h copy with the kernel sources perf stat: Ensure group is defined on top of the same cpu mask perf libdw: Fix off-by 1 relative directory includes perf arm-spe: Support synthetic events perf auxtrace: Add four itrace options perf tools: Move arm-spe-pkt-decoder.h/c to the new dir perf test: Initialize memory in dwarf-unwind perf tests: Don't tail call optimize in unwind test tools compiler.h: Add attribute to disable tail calls perf build: Add a LIBPFM4=1 build test entry perf tools: Add optional support for libpfm4 perf tools: Correct license on jsmn JSON parser perf jit: Fix inaccurate DWARF line table perf jvmti: Remove redundant jitdump line table entries perf build: Add NO_SDT=1 to the default set of build tests perf build: Add NO_LIBCRYPTO=1 to the default set of build tests perf build: Add NO_SYSCALL_TABLE=1 to the build tests perf build: Remove libaudit from the default feature checks ...
2020-06-03Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "More mm/ work, plenty more to come Subsystems affected by this patch series: slub, memcg, gup, kasan, pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs, thp, mmap, kconfig" * akpm: (131 commits) arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined riscv: support DEBUG_WX mm: add DEBUG_WX support drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid() powerpc/mm: drop platform defined pmd_mknotpresent() mm: thp: don't need to drain lru cache when splitting and mlocking THP hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs sparc32: register memory occupied by kernel as memblock.memory include/linux/memblock.h: fix minor typo and unclear comment mm, mempolicy: fix up gup usage in lookup_node tools/vm/page_owner_sort.c: filter out unneeded line mm: swap: memcg: fix memcg stats for huge pages mm: swap: fix vmstats for huge pages mm: vmscan: limit the range of LRU type balancing mm: vmscan: reclaim writepage is IO cost mm: vmscan: determine anon/file pressure balance at the reclaim root mm: balance LRU lists based on relative thrashing mm: only count actual rotations as LRU reclaim cost ...
2020-06-03tools/vm/page_owner_sort.c: filter out unneeded lineChanghee Han
To see a sorted result from page_owner, there must be a tiresome preprocessing step before running page_owner_sort. This patch simply filters out lines which start with "PFN" while reading the page owner report. Signed-off-by: Changhee Han <ch0.han@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Jonathan Corbet <corbet@lwn.net> Link: http://lkml.kernel.org/r/20200429052940.16968-1-ch0.han@lge.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03khugepaged: introduce 'max_ptes_shared' tunableKirill A. Shutemov
'max_ptes_shared' specifies how many pages can be shared across multiple processes. Exceeding the number would block the collapse:: /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared A higher value may increase memory footprint for some workloads. By default, at least half of pages has to be not shared. [colin.king@canonical.com: fix several spelling mistakes] Link: http://lkml.kernel.org/r/20200420084241.65433-1-colin.king@canonical.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Zi Yan <ziy@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Link: http://lkml.kernel.org/r/20200416160026.16538-9-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03khugepaged: add self testKirill A. Shutemov
Patch series "thp/khugepaged improvements and CoW semantics", v4. The patchset adds khugepaged selftest (anon-THP only for now), expands cases khugepaged can handle and switches anon-THP copy-on-write handling to 4k. This patch (of 8): The test checks if khugepaged is able to recover huge page where we expect to do so. It only covers anon-THP for now. Currently the test shows few failures. They are going to be addressed by the following patches. [colin.king@canonical.com: fix several spelling mistakes] Link: http://lkml.kernel.org/r/20200420084241.65433-1-colin.king@canonical.com [aneesh.kumar@linux.ibm.com: replace the usage of system(3) in the test] Link: http://lkml.kernel.org/r/20200429110727.89388-1-aneesh.kumar@linux.ibm.com [kirill@shutemov.name: fixup for issues I've noticed] Link: http://lkml.kernel.org/r/20200429124816.jp272trghrzxx5j5@box [jhubbard@nvidia.com: add khugepaged to .gitignore] Link: http://lkml.kernel.org/r/20200517002509.362401-1-jhubbard@nvidia.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Tested-by: Zi Yan <ziy@nvidia.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: Yang Shi <yang.shi@linux.alibaba.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Link: http://lkml.kernel.org/r/20200416160026.16538-1-kirill.shutemov@linux.intel.com Link: http://lkml.kernel.org/r/20200416160026.16538-2-kirill.shutemov@linux.intel.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds
Pull networking updates from David Miller: 1) Allow setting bluetooth L2CAP modes via socket option, from Luiz Augusto von Dentz. 2) Add GSO partial support to igc, from Sasha Neftin. 3) Several cleanups and improvements to r8169 from Heiner Kallweit. 4) Add IF_OPER_TESTING link state and use it when ethtool triggers a device self-test. From Andrew Lunn. 5) Start moving away from custom driver versions, use the globally defined kernel version instead, from Leon Romanovsky. 6) Support GRO vis gro_cells in DSA layer, from Alexander Lobakin. 7) Allow hard IRQ deferral during NAPI, from Eric Dumazet. 8) Add sriov and vf support to hinic, from Luo bin. 9) Support Media Redundancy Protocol (MRP) in the bridging code, from Horatiu Vultur. 10) Support netmap in the nft_nat code, from Pablo Neira Ayuso. 11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina Dubroca. Also add ipv6 support for espintcp. 12) Lots of ReST conversions of the networking documentation, from Mauro Carvalho Chehab. 13) Support configuration of ethtool rxnfc flows in bcmgenet driver, from Doug Berger. 14) Allow to dump cgroup id and filter by it in inet_diag code, from Dmitry Yakunin. 15) Add infrastructure to export netlink attribute policies to userspace, from Johannes Berg. 16) Several optimizations to sch_fq scheduler, from Eric Dumazet. 17) Fallback to the default qdisc if qdisc init fails because otherwise a packet scheduler init failure will make a device inoperative. From Jesper Dangaard Brouer. 18) Several RISCV bpf jit optimizations, from Luke Nelson. 19) Correct the return type of the ->ndo_start_xmit() method in several drivers, it's netdev_tx_t but many drivers were using 'int'. From Yunjian Wang. 20) Add an ethtool interface for PHY master/slave config, from Oleksij Rempel. 21) Add BPF iterators, from Yonghang Song. 22) Add cable test infrastructure, including ethool interfaces, from Andrew Lunn. Marvell PHY driver is the first to support this facility. 23) Remove zero-length arrays all over, from Gustavo A. R. Silva. 24) Calculate and maintain an explicit frame size in XDP, from Jesper Dangaard Brouer. 25) Add CAP_BPF, from Alexei Starovoitov. 26) Support terse dumps in the packet scheduler, from Vlad Buslov. 27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei. 28) Add devm_register_netdev(), from Bartosz Golaszewski. 29) Minimize qdisc resets, from Cong Wang. 30) Get rid of kernel_getsockopt and kernel_setsockopt in order to eliminate set_fs/get_fs calls. From Christoph Hellwig. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits) selftests: net: ip_defrag: ignore EPERM net_failover: fixed rollback in net_failover_open() Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv" Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv" vmxnet3: allow rx flow hash ops only when rss is enabled hinic: add set_channels ethtool_ops support selftests/bpf: Add a default $(CXX) value tools/bpf: Don't use $(COMPILE.c) bpf, selftests: Use bpf_probe_read_kernel s390/bpf: Use bcr 0,%0 as tail call nop filler s390/bpf: Maintain 8-byte stack alignment selftests/bpf: Fix verifier test selftests/bpf: Fix sample_cnt shared between two threads bpf, selftests: Adapt cls_redirect to call csum_level helper bpf: Add csum_level helper for fixing up csum levels bpf: Fix up bpf_skb_adjust_room helper's skb csum setting sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf() crypto/chtls: IPv6 support for inline TLS Crypto/chcr: Fixes a coccinile check error Crypto/chcr: Fixes compilations warnings ...
2020-06-03Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull kvm updates from Paolo Bonzini: "ARM: - Move the arch-specific code into arch/arm64/kvm - Start the post-32bit cleanup - Cherry-pick a few non-invasive pre-NV patches x86: - Rework of TLB flushing - Rework of event injection, especially with respect to nested virtualization - Nested AMD event injection facelift, building on the rework of generic code and fixing a lot of corner cases - Nested AMD live migration support - Optimization for TSC deadline MSR writes and IPIs - Various cleanups - Asynchronous page fault cleanups (from tglx, common topic branch with tip tree) - Interrupt-based delivery of asynchronous "page ready" events (host side) - Hyper-V MSRs and hypercalls for guest debugging - VMX preemption timer fixes s390: - Cleanups Generic: - switch vCPU thread wakeup from swait to rcuwait The other architectures, and the guest side of the asynchronous page fault work, will come next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits) KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test KVM: check userspace_addr for all memslots KVM: selftests: update hyperv_cpuid with SynDBG tests x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls x86/kvm/hyper-v: enable hypercalls regardless of hypercall page x86/kvm/hyper-v: Add support for synthetic debugger interface x86/hyper-v: Add synthetic debugger definitions KVM: selftests: VMX preemption timer migration test KVM: nVMX: Fix VMX preemption timer migration x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit KVM: x86/pmu: Support full width counting KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT KVM: x86: acknowledgment mechanism for async pf page ready notifications KVM: x86: interrupt based APF 'page ready' event delivery KVM: introduce kvm_read_guest_offset_cached() KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" KVM: VMX: Replace zero-length array with flexible-array ...
2020-06-03Merge tag 'threads-v5.8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread updates from Christian Brauner: "We have been discussing using pidfds to attach to namespaces for quite a while and the patches have in one form or another already existed for about a year. But I wanted to wait to see how the general api would be received and adopted. This contains the changes to make it possible to use pidfds to attach to the namespaces of a process, i.e. they can be passed as the first argument to the setns() syscall. When only a single namespace type is specified the semantics are equivalent to passing an nsfd. That means setns(nsfd, CLONE_NEWNET) equals setns(pidfd, CLONE_NEWNET). However, when a pidfd is passed, multiple namespace flags can be specified in the second setns() argument and setns() will attach the caller to all the specified namespaces all at once or to none of them. Specifying 0 is not valid together with a pidfd. Here are just two obvious examples: setns(pidfd, CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWNET); setns(pidfd, CLONE_NEWUSER); Allowing to also attach subsets of namespaces supports various use-cases where callers setns to a subset of namespaces to retain privilege, perform an action and then re-attach another subset of namespaces. Apart from significantly reducing the number of syscalls needed to attach to all currently supported namespaces (eight "open+setns" sequences vs just a single "setns()"), this also allows atomic setns to a set of namespaces, i.e. either attaching to all namespaces succeeds or we fail without having changed anything. This is centered around a new internal struct nsset which holds all information necessary for a task to switch to a new set of namespaces atomically. Fwiw, with this change a pidfd becomes the only token needed to interact with a container. I'm expecting this to be picked-up by util-linux for nsenter rather soon. Associated with this change is a shiny new test-suite dedicated to setns() (for pidfds and nsfds alike)" * tag 'threads-v5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: selftests/pidfd: add pidfd setns tests nsproxy: attach to namespaces via pidfds nsproxy: add struct nsset
2020-06-02selftests: net: ip_defrag: ignore EPERMThadeu Lima de Souza Cascardo
When running with conntrack rules, the dropped overlap fragments may cause EPERM to be returned to sendto. Instead of completely failing, just ignore those errors and continue. If this causes packets with overlap fragments to be dropped as expected, that is okay. And if it causes packets that are expected to be received to be dropped, which should not happen, it will be detected as failure. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-02Merge tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "Core block changes that have been queued up for this release: - Remove dead blk-throttle and blk-wbt code (Guoqing) - Include pid in blktrace note traces (Jan) - Don't spew I/O errors on wouldblock termination (me) - Zone append addition (Johannes, Keith, Damien) - IO accounting improvements (Konstantin, Christoph) - blk-mq hardware map update improvements (Ming) - Scheduler dispatch improvement (Salman) - Inline block encryption support (Satya) - Request map fixes and improvements (Weiping) - blk-iocost tweaks (Tejun) - Fix for timeout failing with error injection (Keith) - Queue re-run fixes (Douglas) - CPU hotplug improvements (Christoph) - Queue entry/exit improvements (Christoph) - Move DMA drain handling to the few drivers that use it (Christoph) - Partition handling cleanups (Christoph)" * tag 'for-5.8/block-2020-06-01' of git://git.kernel.dk/linux-block: (127 commits) block: mark bio_wouldblock_error() bio with BIO_QUIET blk-wbt: rename __wbt_update_limits to wbt_update_limits blk-wbt: remove wbt_update_limits blk-throttle: remove tg_drain_bios blk-throttle: remove blk_throtl_drain null_blk: force complete for timeout request blk-mq: drain I/O when all CPUs in a hctx are offline blk-mq: add blk_mq_all_tag_iter blk-mq: open code __blk_mq_alloc_request in blk_mq_alloc_request_hctx blk-mq: use BLK_MQ_NO_TAG in more places blk-mq: rename BLK_MQ_TAG_FAIL to BLK_MQ_NO_TAG blk-mq: move more request initialization to blk_mq_rq_ctx_init blk-mq: simplify the blk_mq_get_request calling convention blk-mq: remove the bio argument to ->prepare_request nvme: force complete cancelled requests blk-mq: blk-mq: provide forced completion method block: fix a warning when blkdev.h is included for !CONFIG_BLOCK builds block: blk-crypto-fallback: remove redundant initialization of variable err block: reduce part_stat_lock() scope block: use __this_cpu_add() instead of access by smp_processor_id() ...
2020-06-02Merge tag 'for-linus-hmm' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull hmm updates from Jason Gunthorpe: "This series adds a selftest for hmm_range_fault() and several of the DEVICE_PRIVATE migration related actions, and another simplification for hmm_range_fault()'s API. - Simplify hmm_range_fault() with a simpler return code, no HMM_PFN_SPECIAL, and no customizable output PFN format - Add a selftest for hmm_range_fault() and DEVICE_PRIVATE related functionality" * tag 'for-linus-hmm' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: MAINTAINERS: add HMM selftests mm/hmm/test: add selftests for HMM mm/hmm/test: add selftest driver for HMM mm/hmm: remove the customizable pfn format from hmm_range_fault mm/hmm: remove HMM_PFN_SPECIAL drm/amdgpu: remove dead code after hmm_range_fault() mm/hmm: make hmm_range_fault return 0 or -1
2020-06-02Merge tag 'pm-5.8-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These rework the system-wide PM driver flags, make runtime switching of cpuidle governors easier, improve the user space hibernation interface code, add intel-speed-select interface documentation, add more debug messages to the ACPI code handling suspend to idle, update the cpufreq core and drivers, fix a minor issue in the cpuidle core and update two cpuidle drivers, improve the PM-runtime framework, update the Intel RAPL power capping driver, update devfreq core and drivers, and clean up the cpupower utility. Specifics: - Rework the system-wide PM driver flags to make them easier to understand and use and update their documentation (Rafael Wysocki, Alan Stern). - Allow cpuidle governors to be switched at run time regardless of the kernel configuration and update the related documentation accordingly (Hanjun Guo). - Improve the resume device handling in the user space hibernarion interface code (Domenico Andreoli). - Document the intel-speed-select sysfs interface (Srinivas Pandruvada). - Make the ACPI code handing suspend to idle print more debug messages to help diagnose issues with it (Rafael Wysocki). - Fix a helper routine in the cpufreq core and correct a typo in the struct cpufreq_driver kerneldoc comment (Rafael Wysocki, Wang Wenhu). - Update cpufreq drivers: - Make the intel_pstate driver start in the passive mode by default on systems without HWP (Rafael Wysocki). - Add i.MX7ULP support to the imx-cpufreq-dt driver and add i.MX7ULP to the cpufreq-dt-platdev blacklist (Peng Fan). - Convert the qoriq cpufreq driver to a platform one, make the platform code create a suitable device object for it and add platform dependencies to it (Mian Yousaf Kaukab, Geert Uytterhoeven). - Fix wrong compatible binding in the qcom driver (Ansuel Smith). - Build the omap driver by default for ARCH_OMAP2PLUS (Anders Roxell). - Add r8a7742 SoC support to the dt cpufreq driver (Lad Prabhakar). - Update cpuidle core and drivers: - Fix three reference count leaks in error code paths in the cpuidle core (Qiushi Wu). - Convert Qualcomm SPM to a generic cpuidle driver (Stephan Gerhold). - Fix up the execution order when entering a domain idle state in the PSCI driver (Ulf Hansson). - Fix a reference counting issue related to clock management and clean up two oddities in the PM-runtime framework (Rafael Wysocki, Andy Shevchenko). - Add ElkhartLake support to the Intel RAPL power capping driver and remove an unused local MSR definition from it (Jacob Pan, Sumeet Pawnikar). - Update devfreq core and drivers: - Replace strncpy() with strscpy() in the devfreq core and use lockdep asserts instead of manual checks for a locked mutex in it (Dmitry Osipenko, Krzysztof Kozlowski). - Add a generic imx bus scaling driver and make it register an interconnect device (Leonard Crestez, Gustavo A. R. Silva). - Make the cpufreq notifier in the tegra30 driver take boosting into account and delete an unuseful error message from that driver (Dmitry Osipenko, Markus Elfring). - Remove unneeded semicolon from the cpupower code (Zou Wei)" * tag 'pm-5.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (51 commits) cpuidle: Fix three reference count leaks PM: runtime: Replace pm_runtime_callbacks_present() PM / devfreq: Use lockdep asserts instead of manual checks for locked mutex PM / devfreq: imx-bus: Fix inconsistent IS_ERR and PTR_ERR PM / devfreq: Replace strncpy with strscpy PM / devfreq: imx: Register interconnect device PM / devfreq: Add generic imx bus scaling driver PM / devfreq: tegra30: Delete an error message in tegra_devfreq_probe() PM / devfreq: tegra30: Make CPUFreq notifier to take into account boosting PM: hibernate: Restrict writes to the resume device PM: runtime: clk: Fix clk_pm_runtime_get() error path cpuidle: Convert Qualcomm SPM driver to a generic CPUidle driver ACPI: EC: PM: s2idle: Extend GPE dispatching debug message ACPI: PM: s2idle: Print type of wakeup debug messages powercap: RAPL: remove unused local MSR define PM: runtime: Make clear what we do when conditions are wrong in rpm_suspend() Documentation: admin-guide: pm: Document intel-speed-select PM: hibernate: Split off snapshot dev option PM: hibernate: Incorporate concurrency handling Documentation: ABI: make current_governer_ro as a candidate for removal ...
2020-06-02selftests/bpf: Add a default $(CXX) valueIlya Leoshkevich
When using make kselftest TARGETS=bpf, tools/bpf is built with MAKEFLAGS=rR, which causes $(CXX) to be undefined, which in turn causes the build to fail with CXX test_cpp /bin/sh: 2: g: not found Fix by adding a default $(CXX) value, like tools/build/feature/Makefile already does. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200602175649.2501580-3-iii@linux.ibm.com
2020-06-02tools/bpf: Don't use $(COMPILE.c)Ilya Leoshkevich
When using make kselftest TARGETS=bpf, tools/bpf is built with MAKEFLAGS=rR, which causes $(COMPILE.c) to be undefined, which in turn causes the build to fail with CC kselftest/bpf/tools/build/bpftool/map_perf_ring.o /bin/sh: 1: -MMD: not found Fix by using $(CC) $(CFLAGS) -c instead of $(COMPILE.c). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200602175649.2501580-2-iii@linux.ibm.com
2020-06-02Merge tag 'platform-drivers-x86-v5.8-1' of ↵Linus Torvalds
git://git.infradead.org/linux-platform-drivers-x86 Pull x86 platform driver updates from Andy Shevchenko: - Add a support of the media keys on the ASUS laptop UX325JA/UX425JA - ASUS WMI driver can now handle 2-in-1 models T100TA, T100CHI, T100HA, T200TA - Big refactoring of Intel SCU driver with Elkhart Lake support has been added - Slim Bootloarder firmware update signaling WMI driver has been added - Thinkpad ACPI driver can handle dual fan configuration on new P and X models - Touchscreen DMI driver has been extended to support - MP-man MPWIN895CL tablet - ONDA V891 v5 tablet - techBite Arc 11.6 - Trekstor Twin 10.1 - Trekstor Yourbook C11B - Vinga J116 - Virtual Button driver got a few fixes to detect mode of 2-in-1 tablet models - Intel Speed Select tools update - Plenty of small cleanups here and there * tag 'platform-drivers-x86-v5.8-1' of git://git.infradead.org/linux-platform-drivers-x86: (89 commits) platform/x86: dcdbas: Check SMBIOS for protected buffer address platform/x86: asus_wmi: Reserve more space for struct bias_args platform/x86: intel-vbtn: Only blacklist SW_TABLET_MODE on the 9 / "Laptop" chasis-type platform/x86: intel-hid: Add a quirk to support HP Spectre X2 (2015) platform/x86: touchscreen_dmi: Update Trekstor Twin 10.1 entry platform/x86: touchscreen_dmi: Add info for the Trekstor Yourbook C11B platform/x86: hp-wmi: Introduce HPWMI_POWER_FW_OR_HW as convenient shortcut platform/x86: hp-wmi: Convert simple_strtoul() to kstrtou32() platform/x86: hp-wmi: Refactor postcode_store() to follow standard patterns platform/x86: acerhdf: replace space by * in modalias platform/x86: ISST: Increase timeout tools/power/x86/intel-speed-select: Fix invalid core mask tools/power/x86/intel-speed-select: Increase CPU count tools/power/x86/intel-speed-select: Fix json perf-profile output output platform/x86: dell-wmi: Ignore keyboard attached / detached events platform/x86: dell-laptop: don't register micmute LED if there is no token platform/x86: thinkpad_acpi: Replace custom approach by kstrtoint() platform/x86: thinkpad_acpi: Use strndup_user() in dispatch_proc_write() platform/x86: thinkpad_acpi: Replace next_cmd(&buf) with strsep(&buf, ",") platform/x86: intel-vbtn: Detect switch position before registering the input-device ...
2020-06-02bpf, selftests: Use bpf_probe_read_kernelIlya Leoshkevich
Since commit 0ebeea8ca8a4 ("bpf: Restrict bpf_probe_read{, str}() only to archs where they work") 44 verifier tests fail on s390 due to not having bpf_probe_read anymore. Fix by using bpf_probe_read_kernel. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200602174448.2501214-1-iii@linux.ibm.com
2020-06-02selftests/bpf: Fix verifier testAlexei Starovoitov
Adjust verifier test due to addition of new field. Fixes: c3c16f2ea6d2 ("bpf: Add rx_queue_mapping to bpf_sock") Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2020-06-02selftests/bpf: Fix sample_cnt shared between two threadsAndrii Nakryiko
Make sample_cnt volatile to fix possible selftests failure due to compiler optimization preventing latest sample_cnt value to be visible to main thread. sample_cnt is incremented in background thread, which is then joined into main thread. So in terms of visibility sample_cnt update is ok. But because it's not volatile, compiler might make optimizations that would prevent main thread to see latest updated value. Fix this by marking global variable volatile. Fixes: cb1c9ddd5525 ("selftests/bpf: Add BPF ringbuf selftests") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20200602050349.215037-1-andriin@fb.com
2020-06-02bpf, selftests: Adapt cls_redirect to call csum_level helperDaniel Borkmann
Adapt bpf_skb_adjust_room() to pass in BPF_F_ADJ_ROOM_NO_CSUM_RESET flag and use the new bpf_csum_level() helper to inc/dec the checksum level by one after the encap/decap. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/e7458f10e3f3d795307cbc5ad870112671d9c6f7.1591108731.git.daniel@iogearbox.net
2020-06-02bpf: Add csum_level helper for fixing up csum levelsDaniel Borkmann
Add a bpf_csum_level() helper which BPF programs can use in combination with bpf_skb_adjust_room() when they pass in BPF_F_ADJ_ROOM_NO_CSUM_RESET flag to the latter to avoid falling back to CHECKSUM_NONE. The bpf_csum_level() allows to adjust CHECKSUM_UNNECESSARY skb->csum_levels via BPF_CSUM_LEVEL_{INC,DEC} which calls __skb_{incr,decr}_checksum_unnecessary() on the skb. The helper also allows a BPF_CSUM_LEVEL_RESET which sets the skb's csum to CHECKSUM_NONE as well as a BPF_CSUM_LEVEL_QUERY to just return the current level. Without this helper, there is no way to otherwise adjust the skb->csum_level. I did not add an extra dummy flags as there is plenty of free bitspace in level argument itself iff ever needed in future. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Acked-by: Lorenz Bauer <lmb@cloudflare.com> Link: https://lore.kernel.org/bpf/279ae3717cb3d03c0ffeb511493c93c450a01e1a.1591108731.git.daniel@iogearbox.net
2020-06-02bpf: Fix up bpf_skb_adjust_room helper's skb csum settingDaniel Borkmann
Lorenz recently reported: In our TC classifier cls_redirect [0], we use the following sequence of helper calls to decapsulate a GUE (basically IP + UDP + custom header) encapsulated packet: bpf_skb_adjust_room(skb, -encap_len, BPF_ADJ_ROOM_MAC, BPF_F_ADJ_ROOM_FIXED_GSO) bpf_redirect(skb->ifindex, BPF_F_INGRESS) It seems like some checksums of the inner headers are not validated in this case. For example, a TCP SYN packet with invalid TCP checksum is still accepted by the network stack and elicits a SYN ACK. [...] That is, we receive the following packet from the driver: | ETH | IP | UDP | GUE | IP | TCP | skb->ip_summed == CHECKSUM_UNNECESSARY ip_summed is CHECKSUM_UNNECESSARY because our NICs do rx checksum offloading. On this packet we run skb_adjust_room_mac(-encap_len), and get the following: | ETH | IP | TCP | skb->ip_summed == CHECKSUM_UNNECESSARY Note that ip_summed is still CHECKSUM_UNNECESSARY. After bpf_redirect()'ing into the ingress, we end up in tcp_v4_rcv(). There, skb_checksum_init() is turned into a no-op due to CHECKSUM_UNNECESSARY. The bpf_skb_adjust_room() helper is not aware of protocol specifics. Internally, it handles the CHECKSUM_COMPLETE case via skb_postpull_rcsum(), but that does not cover CHECKSUM_UNNECESSARY. In this case skb->csum_level of the original skb prior to bpf_skb_adjust_room() call was 0, that is, covering UDP. Right now there is no way to adjust the skb->csum_level. NICs that have checksum offload disabled (CHECKSUM_NONE) or that support CHECKSUM_COMPLETE are not affected. Use a safe default for CHECKSUM_UNNECESSARY by resetting to CHECKSUM_NONE and add a flag to the helper called BPF_F_ADJ_ROOM_NO_CSUM_RESET that allows users from opting out. Opting out is useful for the case where we don't remove/add full protocol headers, or for the case where a user wants to adjust the csum level manually e.g. through bpf_csum_level() helper that is added in subsequent patch. The bpf_skb_proto_{4_to_6,6_to_4}() for NAT64/46 translation from the BPF bpf_skb_change_proto() helper uses bpf_skb_net_hdr_{push,pop}() pair internally as well but doesn't change layers, only transitions between v4 to v6 and vice versa, therefore no adoption is required there. [0] https://lore.kernel.org/bpf/20200424185556.7358-1-lmb@cloudflare.com/ Fixes: 2be7e212d541 ("bpf: add bpf_skb_adjust_room helper") Reported-by: Lorenz Bauer <lmb@cloudflare.com> Reported-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/CACAyw9-uU_52esMd1JjuA80fRPHJv5vsSg8GnfW3t_qDU4aVKQ@mail.gmail.com/ Link: https://lore.kernel.org/bpf/11a90472e7cce83e76ddbfce81fdfce7bfc68808.1591108731.git.daniel@iogearbox.net