summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2017-07-04net, ipv6: convert inet6_ifaddr.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert inet6_dev.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-04net, ipv6: convert ipv6_txoptions.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03Merge tag 'docs-4.13' of git://git.lwn.net/linuxLinus Torvalds
Pull documentation updates from Jonathan Corbet: "There has been a fair amount of activity in the docs tree this time around. Highlights include: - Conversion of a bunch of security documentation into RST - The conversion of the remaining DocBook templates by The Amazing Mauro Machine. We can now drop the entire DocBook build chain. - The usual collection of fixes and minor updates" * tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits) scripts/kernel-doc: handle DECLARE_HASHTABLE Documentation: atomic_ops.txt is core-api/atomic_ops.rst Docs: clean up some DocBook loose ends Make the main documentation title less Geocities Docs: Use kernel-figure in vidioc-g-selection.rst Docs: fix table problems in ras.rst Docs: Fix breakage with Sphinx 1.5 and upper Docs: Include the Latex "ifthen" package doc/kokr/howto: Only send regression fixes after -rc1 docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters doc: Document suitability of IBM Verse for kernel development Doc: fix a markup error in coding-style.rst docs: driver-api: i2c: remove some outdated information Documentation: DMA API: fix a typo in a function name Docs: Insert missing space to separate link from text doc/ko_KR/memory-barriers: Update control-dependencies example Documentation, kbuild: fix typo "minimun" -> "minimum" docs: Fix some formatting issues in request-key.rst doc: ReSTify keys-trusted-encrypted.txt doc: ReSTify keys-request-key.txt ...
2017-07-03Merge tag 'tty-4.13-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial updates from Greg KH: "Here is the large tty/serial patchset for 4.13-rc1. A lot of tty and serial driver updates are in here, along with some fixups for some __get/put_user usages that were reported. Nothing huge, just lots of development by a number of different developers, full details in the shortlog. All of these have been in linux-next for a while" * tag 'tty-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (71 commits) tty: serial: lpuart: add a more accurate baud rate calculation method tty: serial: lpuart: add earlycon support for imx7ulp tty: serial: lpuart: add imx7ulp support dt-bindings: serial: fsl-lpuart: add i.MX7ULP support tty: serial: lpuart: add little endian 32 bit register support tty: serial: lpuart: refactor lpuart32_{read|write} prototype tty: serial: lpuart: introduce lpuart_soc_data to represent SoC property serial: imx-serial - move DMA buffer configuration to DT serial: imx: Enable RTSD only when needed serial: imx: Remove unused members from imx_port struct serial: 8250: 8250_omap: Fix race b/w dma completion and RX timeout serial: 8250: Fix THRE flag usage for CAP_MINI tty/serial: meson_uart: update to stable bindings dt-bindings: serial: Add bindings for the Amlogic Meson UARTs serial: Delete dead code for CIR serial ports serial: sirf: make of_device_ids const serial/mpsc: switch to dma_alloc_attrs tty: serial: Add Actions Semi Owl UART earlycon dt-bindings: serial: Document Actions Semi Owl UARTs tty/serial: atmel: make the driver DT only ...
2017-07-03Merge branch 'sched-core-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The main changes in this cycle were: - Add the SYSTEM_SCHEDULING bootup state to move various scheduler debug checks earlier into the bootup. This turns silent and sporadically deadly bugs into nice, deterministic splats. Fix some of the splats that triggered. (Thomas Gleixner) - A round of restructuring and refactoring of the load-balancing and topology code (Peter Zijlstra) - Another round of consolidating ~20 of incremental scheduler code history: this time in terms of wait-queue nomenclature. (I didn't get much feedback on these renaming patches, and we can still easily change any names I might have misplaced, so if anyone hates a new name, please holler and I'll fix it.) (Ingo Molnar) - sched/numa improvements, fixes and updates (Rik van Riel) - Another round of x86/tsc scheduler clock code improvements, in hope of making it more robust (Peter Zijlstra) - Improve NOHZ behavior (Frederic Weisbecker) - Deadline scheduler improvements and fixes (Luca Abeni, Daniel Bristot de Oliveira) - Simplify and optimize the topology setup code (Lauro Ramos Venancio) - Debloat and decouple scheduler code some more (Nicolas Pitre) - Simplify code by making better use of llist primitives (Byungchul Park) - ... plus other fixes and improvements" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (103 commits) sched/cputime: Refactor the cputime_adjust() code sched/debug: Expose the number of RT/DL tasks that can migrate sched/numa: Hide numa_wake_affine() from UP build sched/fair: Remove effective_load() sched/numa: Implement NUMA node level wake_affine() sched/fair: Simplify wake_affine() for the single socket case sched/numa: Override part of migrate_degrades_locality() when idle balancing sched/rt: Move RT related code from sched/core.c to sched/rt.c sched/deadline: Move DL related code from sched/core.c to sched/deadline.c sched/cpuset: Only offer CONFIG_CPUSETS if SMP is enabled sched/fair: Spare idle load balancing on nohz_full CPUs nohz: Move idle balancer registration to the idle path sched/loadavg: Generalize "_idle" naming to "_nohz" sched/core: Drop the unused try_get_task_struct() helper function sched/fair: WARN() and refuse to set buddy when !se->on_rq sched/debug: Fix SCHED_WARN_ON() to return a value on !CONFIG_SCHED_DEBUG as well sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming sched/wait: Move bit_wait_table[] and related functionality from sched/core.c to sched/wait_bit.c sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> sched/wait: Re-adjust macro line continuation backslashes in <linux/wait.h> ...
2017-07-03openvswitch: fix mis-ordered comment lines for ovs_skb_cbDaniel Axtens
I was trying to wrap my head around meaning of mru, and realised that the second line of the comment defining it had somehow ended up after the line defining cutlen, leading to much confusion. Reorder the lines to make sense. Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03Merge tag 'wireless-drivers-next-for-davem-2017-07-03' of ↵David S. Miller
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for 4.13 Last minute changes to get new hardware and firmware support for iwlwifi and few other changes I was able to squeeze in. Also two patches for ieee80211.h and nl80211 as Johannes is away. Major changes: iwlwifi * some important fixes for 9000 HW * support for version 30 of the FW API for 8000 and 9000 series * a few new PCI IDs for 9000 series * reorganization of common files brcmfmac * support 4-way handshake offloading for WPA/WPA2-PSK and 802.1X ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: make sk_ehashfn() staticEric Dumazet
sk_ehashfn() is only used from a single file. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: avoid one splat in fib_nl_delrule()Eric Dumazet
We need to use refcount_set() on a newly created rule to avoid following error : [ 64.601749] ------------[ cut here ]------------ [ 64.601757] WARNING: CPU: 0 PID: 6476 at lib/refcount.c:184 refcount_sub_and_test+0x75/0xa0 [ 64.601758] Modules linked in: w1_therm wire cdc_acm ehci_pci ehci_hcd mlx4_en ib_uverbs mlx4_ib ib_core mlx4_core [ 64.601769] CPU: 0 PID: 6476 Comm: ip Tainted: G W 4.12.0-smp-DEV #274 [ 64.601771] task: ffff8837bf482040 task.stack: ffff8837bdc08000 [ 64.601773] RIP: 0010:refcount_sub_and_test+0x75/0xa0 [ 64.601774] RSP: 0018:ffff8837bdc0f5c0 EFLAGS: 00010286 [ 64.601776] RAX: 0000000000000026 RBX: 0000000000000001 RCX: 0000000000000000 [ 64.601777] RDX: 0000000000000026 RSI: 0000000000000096 RDI: ffffed06f7b81eae [ 64.601778] RBP: ffff8837bdc0f5d0 R08: 0000000000000004 R09: fffffbfff4a54c25 [ 64.601779] R10: 00000000cbc500e5 R11: ffffffffa52a6128 R12: ffff881febcf6f24 [ 64.601779] R13: ffff881fbf4eaf00 R14: ffff881febcf6f80 R15: ffff8837d7a4ed00 [ 64.601781] FS: 00007ff5a2f6b700(0000) GS:ffff881fff800000(0000) knlGS:0000000000000000 [ 64.601782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 64.601783] CR2: 00007ffcdc70d000 CR3: 0000001f9c91e000 CR4: 00000000001406f0 [ 64.601783] Call Trace: [ 64.601786] refcount_dec_and_test+0x11/0x20 [ 64.601790] fib_nl_delrule+0xc39/0x1630 [ 64.601793] ? is_bpf_text_address+0xe/0x20 [ 64.601795] ? fib_nl_newrule+0x25e0/0x25e0 [ 64.601798] ? depot_save_stack+0x133/0x470 [ 64.601801] ? ns_capable+0x13/0x20 [ 64.601803] ? __netlink_ns_capable+0xcc/0x100 [ 64.601806] rtnetlink_rcv_msg+0x23a/0x6a0 [ 64.601808] ? rtnl_newlink+0x1630/0x1630 [ 64.601811] ? memset+0x31/0x40 [ 64.601813] netlink_rcv_skb+0x2d7/0x440 [ 64.601815] ? rtnl_newlink+0x1630/0x1630 [ 64.601816] ? netlink_ack+0xaf0/0xaf0 [ 64.601818] ? kasan_unpoison_shadow+0x35/0x50 [ 64.601820] ? __kmalloc_node_track_caller+0x4c/0x70 [ 64.601821] rtnetlink_rcv+0x28/0x30 [ 64.601823] netlink_unicast+0x422/0x610 [ 64.601824] ? netlink_attachskb+0x650/0x650 [ 64.601826] netlink_sendmsg+0x7b7/0xb60 [ 64.601828] ? netlink_unicast+0x610/0x610 [ 64.601830] ? netlink_unicast+0x610/0x610 [ 64.601832] sock_sendmsg+0xba/0xf0 [ 64.601834] ___sys_sendmsg+0x6a9/0x8c0 [ 64.601835] ? copy_msghdr_from_user+0x520/0x520 [ 64.601837] ? __alloc_pages_nodemask+0x160/0x520 [ 64.601839] ? memcg_write_event_control+0xd60/0xd60 [ 64.601841] ? __alloc_pages_slowpath+0x1d50/0x1d50 [ 64.601843] ? kasan_slab_free+0x71/0xc0 [ 64.601845] ? mem_cgroup_commit_charge+0xb2/0x11d0 [ 64.601847] ? lru_cache_add_active_or_unevictable+0x7d/0x1a0 [ 64.601849] ? __handle_mm_fault+0x1af8/0x2810 [ 64.601851] ? may_open_dev+0xc0/0xc0 [ 64.601852] ? __pmd_alloc+0x2c0/0x2c0 [ 64.601853] ? __fdget+0x13/0x20 [ 64.601855] __sys_sendmsg+0xc6/0x150 [ 64.601856] ? __sys_sendmsg+0xc6/0x150 [ 64.601857] ? SyS_shutdown+0x170/0x170 [ 64.601859] ? handle_mm_fault+0x28a/0x650 [ 64.601861] SyS_sendmsg+0x12/0x20 [ 64.601863] entry_SYSCALL_64_fastpath+0x13/0x94 Fixes: 717d1e993ad8 ("net: convert fib_rule.refcnt from atomic_t to refcount_t") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03net: core: Fix slab-out-of-bounds in netdev_stats_to_stats64Alban Browaeys
commit 9256645af098 ("net/core: relax BUILD_BUG_ON in netdev_stats_to_stats64") made an attempt to read beyond the size of the source a possibility. Fix to only copy src size to dest. As dest might be bigger than src. ================================================================== BUG: KASAN: slab-out-of-bounds in netdev_stats_to_stats64+0xe/0x30 at addr ffff8801be248b20 Read of size 192 by task VBoxNetAdpCtl/6734 CPU: 1 PID: 6734 Comm: VBoxNetAdpCtl Tainted: G O 4.11.4prahal+intel+ #118 Hardware name: LENOVO 20CDCTO1WW/20CDCTO1WW, BIOS GQET52WW (1.32 ) 05/04/2017 Call Trace: dump_stack+0x63/0x86 kasan_object_err+0x1c/0x70 kasan_report+0x270/0x520 ? netdev_stats_to_stats64+0xe/0x30 ? sched_clock_cpu+0x1b/0x190 ? __module_address+0x3e/0x3b0 ? unwind_next_frame+0x1ea/0xb00 check_memory_region+0x13c/0x1a0 memcpy+0x23/0x50 netdev_stats_to_stats64+0xe/0x30 dev_get_stats+0x1b9/0x230 rtnl_fill_stats+0x44/0xc00 ? nla_put+0xc6/0x130 rtnl_fill_ifinfo+0xe9e/0x3700 ? rtnl_fill_vfinfo+0xde0/0xde0 ? sched_clock+0x9/0x10 ? sched_clock+0x9/0x10 ? sched_clock_local+0x120/0x130 ? __module_address+0x3e/0x3b0 ? unwind_next_frame+0x1ea/0xb00 ? sched_clock+0x9/0x10 ? sched_clock+0x9/0x10 ? sched_clock_cpu+0x1b/0x190 ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? depot_save_stack+0x1d8/0x4a0 ? depot_save_stack+0x34f/0x4a0 ? depot_save_stack+0x34f/0x4a0 ? save_stack+0xb1/0xd0 ? save_stack_trace+0x16/0x20 ? save_stack+0x46/0xd0 ? kasan_slab_alloc+0x12/0x20 ? __kmalloc_node_track_caller+0x10d/0x350 ? __kmalloc_reserve.isra.36+0x2c/0xc0 ? __alloc_skb+0xd0/0x560 ? rtmsg_ifinfo_build_skb+0x61/0x120 ? rtmsg_ifinfo.part.25+0x16/0xb0 ? rtmsg_ifinfo+0x47/0x70 ? register_netdev+0x15/0x30 ? vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp] ? vboxNetAdpCreate+0x210/0x400 [vboxnetadp] ? VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? do_vfs_ioctl+0x17f/0xff0 ? SyS_ioctl+0x74/0x80 ? do_syscall_64+0x182/0x390 ? __alloc_skb+0xd0/0x560 ? __alloc_skb+0xd0/0x560 ? save_stack_trace+0x16/0x20 ? init_object+0x64/0xa0 ? ___slab_alloc+0x1ae/0x5c0 ? ___slab_alloc+0x1ae/0x5c0 ? __alloc_skb+0xd0/0x560 ? sched_clock+0x9/0x10 ? kasan_unpoison_shadow+0x35/0x50 ? kasan_kmalloc+0xad/0xe0 ? __kmalloc_node_track_caller+0x246/0x350 ? __alloc_skb+0xd0/0x560 ? kasan_unpoison_shadow+0x35/0x50 ? memset+0x31/0x40 ? __alloc_skb+0x31f/0x560 ? napi_consume_skb+0x320/0x320 ? br_get_link_af_size_filtered+0xb7/0x120 [bridge] ? if_nlmsg_size+0x440/0x630 rtmsg_ifinfo_build_skb+0x83/0x120 rtmsg_ifinfo.part.25+0x16/0xb0 rtmsg_ifinfo+0x47/0x70 register_netdevice+0xa2b/0xe50 ? __kmalloc+0x171/0x2d0 ? netdev_change_features+0x80/0x80 register_netdev+0x15/0x30 vboxNetAdpOsCreate+0xc0/0x1c0 [vboxnetadp] vboxNetAdpCreate+0x210/0x400 [vboxnetadp] ? vboxNetAdpComposeMACAddress+0x1d0/0x1d0 [vboxnetadp] ? kasan_check_write+0x14/0x20 VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] ? VBoxNetAdpLinuxOpen+0x20/0x20 [vboxnetadp] ? lock_acquire+0x11c/0x270 ? __audit_syscall_entry+0x2fb/0x660 do_vfs_ioctl+0x17f/0xff0 ? __audit_syscall_entry+0x2fb/0x660 ? ioctl_preallocate+0x1d0/0x1d0 ? __audit_syscall_entry+0x2fb/0x660 ? kmem_cache_free+0xb2/0x250 ? syscall_trace_enter+0x537/0xd00 ? exit_to_usermode_loop+0x100/0x100 SyS_ioctl+0x74/0x80 ? do_sys_open+0x350/0x350 ? do_vfs_ioctl+0xff0/0xff0 do_syscall_64+0x182/0x390 entry_SYSCALL64_slow_path+0x25/0x25 RIP: 0033:0x7f7e39a1ae07 RSP: 002b:00007ffc6f04c6d8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007ffc6f04c730 RCX: 00007f7e39a1ae07 RDX: 00007ffc6f04c730 RSI: 00000000c0207601 RDI: 0000000000000007 RBP: 00007ffc6f04c700 R08: 00007ffc6f04c780 R09: 0000000000000008 R10: 0000000000000541 R11: 0000000000000206 R12: 0000000000000007 R13: 00000000c0207601 R14: 00007ffc6f04c730 R15: 0000000000000012 Object at ffff8801be248008, in cache kmalloc-4096 size: 4096 Allocated: PID = 6734 save_stack_trace+0x16/0x20 save_stack+0x46/0xd0 kasan_kmalloc+0xad/0xe0 __kmalloc+0x171/0x2d0 alloc_netdev_mqs+0x8a7/0xbe0 vboxNetAdpOsCreate+0x65/0x1c0 [vboxnetadp] vboxNetAdpCreate+0x210/0x400 [vboxnetadp] VBoxNetAdpLinuxIOCtlUnlocked+0x14b/0x280 [vboxnetadp] do_vfs_ioctl+0x17f/0xff0 SyS_ioctl+0x74/0x80 do_syscall_64+0x182/0x390 return_from_SYSCALL_64+0x0/0x6a Freed: PID = 5600 save_stack_trace+0x16/0x20 save_stack+0x46/0xd0 kasan_slab_free+0x73/0xc0 kfree+0xe4/0x220 kvfree+0x25/0x30 single_release+0x74/0xb0 __fput+0x265/0x6b0 ____fput+0x9/0x10 task_work_run+0xd5/0x150 exit_to_usermode_loop+0xe2/0x100 do_syscall_64+0x26c/0x390 return_from_SYSCALL_64+0x0/0x6a Memory state around the buggy address: ffff8801be248a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8801be248b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8801be248b80: 00 00 00 00 00 00 00 00 00 00 00 07 fc fc fc fc ^ ffff8801be248c00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff8801be248c80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Signed-off-by: Alban Browaeys <alban.browaeys@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03iucv: Convert sk_wmem_alloc accesses to refcount_t.David S. Miller
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: simplify narrower ctx accessDaniel Borkmann
This work tries to make the semantics and code around the narrower ctx access a bit easier to follow. Right now everything is done inside the .is_valid_access(). Offset matching is done differently for read/write types, meaning writes don't support narrower access and thus matching only on offsetof(struct foo, bar) is enough whereas for read case that supports narrower access we must check for offsetof(struct foo, bar) + offsetof(struct foo, bar) + sizeof(<bar>) - 1 for each of the cases. For read cases of individual members that don't support narrower access (like packet pointers or skb->cb[] case which has its own narrow access logic), we check as usual only offsetof(struct foo, bar) like in write case. Then, for the case where narrower access is allowed, we also need to set the aux info for the access. Meaning, ctx_field_size and converted_op_size have to be set. First is the original field size e.g. sizeof(<bar>) as in above example from the user facing ctx, and latter one is the target size after actual rewrite happened, thus for the kernel facing ctx. Also here we need the range match and we need to keep track changing convert_ctx_access() and converted_op_size from is_valid_access() as both are not at the same location. We can simplify the code a bit: check_ctx_access() becomes simpler in that we only store ctx_field_size as a meta data and later in convert_ctx_accesses() we fetch the target_size right from the location where we do convert. Should the verifier be misconfigured we do reject for BPF_WRITE cases or target_size that are not provided. For the subsystems, we always work on ranges in is_valid_access() and add small helpers for ranges and narrow access, convert_ctx_accesses() sets target_size for the relevant instruction. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Cc: Yonghong Song <yhs@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf: add bpf_skb_adjust_room helperDaniel Borkmann
This work adds a helper that can be used to adjust net room of an skb. The helper is generic and can be further extended in future. Main use case is for having a programmatic way to add/remove room to v4/v6 header options along with cls_bpf on egress and ingress hook of the data path. It reuses most of the infrastructure that we added for the bpf_skb_change_type() helper which can be used in nat64 translations. Similarly, the helper only takes care of adjusting the room so that related data is populated and csum adapted out of the BPF program using it. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03bpf, net: add skb_mac_header_len helperDaniel Borkmann
Add a small skb_mac_header_len() helper similarly as the skb_network_header_len() we have and replace open coded places in BPF's bpf_skb_change_proto() helper. Will also be used in upcoming work. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-03ipv6: dad: don't remove dynamic addresses if link is downSabrina Dubroca
Currently, when the link for $DEV is down, this command succeeds but the address is removed immediately by DAD (1): ip addr add 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 In the same situation, this will succeed and not remove the address (2): ip addr add 1111::12/64 dev $DEV ip addr change 1111::12/64 dev $DEV valid_lft 3600 preferred_lft 1800 The comment in addrconf_dad_begin() when !IF_READY makes it look like this is the intended behavior, but doesn't explain why: * If the device is not ready: * - keep it tentative if it is a permanent address. * - otherwise, kill it. We clearly cannot prevent userspace from doing (2), but we can make (1) work consistently with (2). addrconf_dad_stop() is only called in two cases: if DAD failed, or to skip DAD when the link is down. In that second case, the fix is to avoid deleting the address, like we already do for permanent addresses. Fixes: 3c21edbd1137 ("[IPV6]: Defer IPv6 device initialization until the link becomes ready.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-02bpf: fix to bpf_setsockopsLawrence Brakmo
Fixed build error due to misplaced "#ifdef CONFIG_INET" (moved 1 statement up). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Adds support for setting sndcwnd clampLawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_SNDCWND_CLAMP, which sets the initial congestion window. It is useful to limit the sndcwnd when the host are close to each other (small RTT). Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Adds support for setting initial cwndLawrence Brakmo
Adds a new bpf_setsockopt for TCP sockets, TCP_BPF_IW, which sets the initial congestion window. This can be used when the hosts are far apart (large RTTs) and it is safe to start with a large inital cwnd. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Add support for changing congestion controlLawrence Brakmo
Added support for changing congestion control for SOCK_OPS bpf programs through the setsockopt bpf helper function. It also adds a new SOCK_OPS op, BPF_SOCK_OPS_NEEDS_ECN, that is needed for congestion controls, like dctcp, that need to enable ECN in the SYN packets. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Add TCP connection BPF callbacksLawrence Brakmo
Added callbacks to BPF SOCK_OPS type program before an active connection is intialized and after a passive or active connection is established. The following patch demostrates how they can be used to set send and receive buffer sizes. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Add setsockopt helper function to bpfLawrence Brakmo
Added support for calling a subset of socket setsockopts from BPF_PROG_TYPE_SOCK_OPS programs. The code was duplicated rather than making the changes to call the socket setsockopt function because the changes required would have been larger. The ops supported are: SO_RCVBUF SO_SNDBUF SO_MAX_PACING_RATE SO_PRIORITY SO_RCVLOWAT SO_MARK Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Support for setting initial receive windowLawrence Brakmo
This patch adds suppport for setting the initial advertized window from within a BPF_SOCK_OPS program. This can be used to support larger initial cwnd values in environments where it is known to be safe. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: Support for per connection SYN/SYN-ACK RTOsLawrence Brakmo
This patch adds support for setting a per connection SYN and SYN_ACK RTOs from within a BPF_SOCK_OPS program. For example, to set small RTOs when it is known both hosts are within a datacenter. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01bpf: BPF support for sock_opsLawrence Brakmo
Created a new BPF program type, BPF_PROG_TYPE_SOCK_OPS, and a corresponding struct that allows BPF programs of this type to access some of the socket's fields (such as IP addresses, ports, etc.). It uses the existing bpf cgroups infrastructure so the programs can be attached per cgroup with full inheritance support. The program will be called at appropriate times to set relevant connections parameters such as buffer sizes, SYN and SYN-ACK RTOs, etc., based on connection information such as IP addresses, port numbers, etc. Alghough there are already 3 mechanisms to set parameters (sysctls, route metrics and setsockopts), this new mechanism provides some distinct advantages. Unlike sysctls, it can set parameters per connection. In contrast to route metrics, it can also use port numbers and information provided by a user level program. In addition, it could set parameters probabilistically for evaluation purposes (i.e. do something different on 10% of the flows and compare results with the other 90% of the flows). Also, in cases where IPv6 addresses contain geographic information, the rules to make changes based on the distance (or RTT) between the hosts are much easier than route metric rules and can be global. Finally, unlike setsockopt, it oes not require application changes and it can be updated easily at any time. Although the bpf cgroup framework already contains a sock related program type (BPF_PROG_TYPE_CGROUP_SOCK), I created the new type (BPF_PROG_TYPE_SOCK_OPS) beccause the existing type expects to be called only once during the connections's lifetime. In contrast, the new program type will be called multiple times from different places in the network stack code. For example, before sending SYN and SYN-ACKs to set an appropriate timeout, when the connection is established to set congestion control, etc. As a result it has "op" field to specify the type of operation requested. The purpose of this new program type is to simplify setting connection parameters, such as buffer sizes, TCP's SYN RTO, etc. For example, it is easy to use facebook's internal IPv6 addresses to determine if both hosts of a connection are in the same datacenter. Therefore, it is easy to write a BPF program to choose a small SYN RTO value when both hosts are in the same datacenter. This patch only contains the framework to support the new BPF program type, following patches add the functionality to set various connection parameters. This patch defines a new BPF program type: BPF_PROG_TYPE_SOCKET_OPS and a new bpf syscall command to load a new program of this type: BPF_PROG_LOAD_SOCKET_OPS. Two new corresponding structs (one for the kernel one for the user/BPF program): /* kernel version */ struct bpf_sock_ops_kern { struct sock *sk; __u32 op; union { __u32 reply; __u32 replylong[4]; }; }; /* user version * Some fields are in network byte order reflecting the sock struct * Use the bpf_ntohl helper macro in samples/bpf/bpf_endian.h to * convert them to host byte order. */ struct bpf_sock_ops { __u32 op; union { __u32 reply; __u32 replylong[4]; }; __u32 family; __u32 remote_ip4; /* In network byte order */ __u32 local_ip4; /* In network byte order */ __u32 remote_ip6[4]; /* In network byte order */ __u32 local_ip6[4]; /* In network byte order */ __u32 remote_port; /* In network byte order */ __u32 local_port; /* In host byte horder */ }; Currently there are two types of ops. The first type expects the BPF program to return a value which is then used by the caller (or a negative value to indicate the operation is not supported). The second type expects state changes to be done by the BPF program, for example through a setsockopt BPF helper function, and they ignore the return value. The reply fields of the bpf_sockt_ops struct are there in case a bpf program needs to return a value larger than an integer. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01Merge branch 'for-upstream' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2017-07-01 Here are some more Bluetooth patches for the 4.13 kernel: - Added support for Broadcom BCM43430 controllers - Added sockaddr length checks before accessing sa_family - Fixed possible "might sleep" errors in bnep, cmtp and hidp modules - A few other minor fixes Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: Add peeloff-flags socket optionNeil Horman
Based on a request raised on the sctp devel list, there is a need to augment the sctp_peeloff operation while specifying the O_CLOEXEC and O_NONBLOCK flags (simmilar to the socket syscall). Since modifying the SCTP_SOCKOPT_PEELOFF socket option would break user space ABI for existing programs, this patch creates a new socket option SCTP_SOCKOPT_PEELOFF_FLAGS, which accepts a third flags parameter to allow atomic assignment of the socket descriptor flags. Tested successfully by myself and the requestor Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Vlad Yasevich <vyasevich@gmail.com> CC: "David S. Miller" <davem@davemloft.net> CC: Andreas Steinmetz <ast@domdv.de> CC: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01Merge tag 'nfc-next-4.13-1' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next Samuel Ortiz says: ==================== NFC 4.13 pull request This is the NFC pull requesy for 4.13. We have: - A conversion to unified device and GPIO APIs for the fdp, pn544, and st{21,-nci} drivers. - A fix for NFC device IDs allocation. - A fix for the nfcmrvl driver firmware download mechanism. - A trf7970a DT and GPIO cleanup and clock setting fix. - A few fixes for potential overflows in the digital and LLCP code. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01datapath: Avoid using stack larger than 1024.Tonghao Zhang
When compiling OvS-master on 4.4.0-81 kernel, there is a warning: CC [M] /root/ovs/datapath/linux/datapath.o /root/ovs/datapath/linux/datapath.c: In function 'ovs_flow_cmd_set': /root/ovs/datapath/linux/datapath.c:1221:1: warning: the frame size of 1040 bytes is larger than 1024 bytes [-Wframe-larger-than=] This patch factors out match-init and action-copy to avoid "Wframe-larger-than=1024" warning. Because mask is only used to get actions, we new a function to save some stack space. Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_init_chunk_tXin Long
This patch is to remove the typedef sctp_init_chunk_t, and replace with struct sctp_init_chunk in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_inithdr_tXin Long
This patch is to remove the typedef sctp_inithdr_t, and replace with struct sctp_inithdr in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_data_chunk_tXin Long
This patch is to remove the typedef sctp_data_chunk_t, and replace with struct sctp_data_chunk in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_datahdr_tXin Long
This patch is to remove the typedef sctp_datahdr_t, and replace with struct sctp_datahdr in the places where it's using this typedef. It is also to use izeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_param_tXin Long
This patch is to remove the typedef sctp_param_t, and replace with struct sctp_paramhdr in the places where it's using this typedef. It is also to remove the useless declaration sctp_addip_addr_config and fix the lack of params for some other functions' declaration. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_paramhdr_tXin Long
This patch is to remove the typedef sctp_paramhdr_t, and replace with struct sctp_paramhdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_cid_tXin Long
This patch is to remove the typedef sctp_cid_t, and replace with struct sctp_cid in the places where it's using this typedef. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_chunkhdr_tXin Long
This patch is to remove the typedef sctp_chunkhdr_t, and replace with struct sctp_chunkhdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type)., especially in sctp_new. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove the typedef sctp_sctphdr_tXin Long
This patch is to remove the typedef sctp_sctphdr_t, and replace with struct sctphdr in the places where it's using this typedef. It is also to fix some indents and use sizeof(variable) instead of sizeof(type). Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01sctp: remove an unnecessary check from sctp_endpoint_destroyXin Long
ep->base.sk gets it's value since sctp_endpoint_new, nowhere will change it. So there's no need to check if it's null, as it can never be null. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert packet_fanout.sk_ref from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert netlbl_lsm_cache.refcount from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert net.passive from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert inet_frag_queue.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert fib_rule.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert unix_address.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert netpoll_info.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert in_device.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert ip_mc_list.refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert sock.sk_refcnt from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. This patch uses refcount_inc_not_zero() instead of atomic_inc_not_zero_hint() due to absense of a _hint() version of refcount API. If the hint() version must be used, we might need to revisit API. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-07-01net: convert sock.sk_wmem_alloc from atomic_t to refcount_tReshetova, Elena
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: David Windsor <dwindsor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>