summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-12-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Lots of overlapping changes. Also on the net-next side the XDP state management is handled more in the generic layers so undo the 'net' nfp fix which isn't applicable in net-next. Include a necessary change by Jakub Kicinski, with log message: ==================== cls_bpf no longer takes care of offload tracking. Make sure netdevsim performs necessary checks. This fixes a warning caused by TC trying to remove a filter it has not added. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller" "What's a holiday weekend without some networking bug fixes? [1] 1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function calls, from Daniel Borkmann. 2) Fix regression from errata limiting change to marvell PHY driver, from Zhao Qiang. 3) Fix u16 overflow in SCTP, from Xin Long. 4) Fix potential memory leak during bridge newlink, from Nikolay Aleksandrov. 5) Fix BPF selftest build on s390, from Hendrik Brueckner. 6) Don't append to cfg80211 automatically generated certs file, always write new ones from scratch. From Thierry Reding. 7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai. 8) Fix hang on tg3 MTU change with certain chips, from Brian King. 9) Add stall detection to arc emac driver and reset chip when this happens, from Alexander Kochetkov. 10) Fix MTU limitng in GRE tunnel drivers, from Xin Long. 11) Fix stmmac timestamping bug due to mis-shifting of field. From Fredrik Hallenberg. 12) Fix metrics match when deleting an ipv4 route. The kernel sets some internal metrics bits which the user isn't going to set when it makes the delete request. From Phil Sutter. 13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix from Yelena Krivosheev. 14) Fix double free and memory corruption in get_net_ns_by_id, from Eric W. Biederman. 15) Flush ipv4 FIB tables in the reverse order. Some tables can share their actual backing data, in particular this happens for the MAIN and LOCAL tables. We have to kill the LOCAL table first, because it uses MAIN's backing memory. Fix from Ido Schimmel. 16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann Horn, and Alexei Starovoitov. 17) Make changes to ipv6 autoflowlabel sysctl really propagate to sockets, unless the socket has set the per-socket value explicitly. From Shaohua Li. 18) Fix leaks and double callback invocations of zerocopy SKBs, from Willem de Bruijn" [1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits) skbuff: skb_copy_ubufs must release uarg even without user frags skbuff: orphan frags before zerocopy clone net: reevalulate autoflowlabel setting after sysctl setting openvswitch: Fix pop_vlan action for double tagged frames ipv6: Honor specified parameters in fibmatch lookup bpf: do not allow root to mangle valid pointers selftests/bpf: add tests for recent bugfixes bpf: fix integer overflows bpf: don't prune branches when a scalar is replaced with a pointer bpf: force strict alignment checks for stack pointers bpf: fix missing error return in check_stack_boundary() bpf: fix 32-bit ALU op verification bpf: fix incorrect tracking of register size truncation bpf: fix incorrect sign extension in check_alu_op() bpf/verifier: fix bounds calculation on BPF_RSH ipv4: Fix use-after-free when flushing FIB tables s390/qeth: fix error handling in checksum cmd callback tipc: remove joining group member from congested list selftests: net: Adding config fragment CONFIG_NUMA=y nfp: bpf: keep track of the offloaded program ...
2017-12-21Merge branch 'flow_dissector-Provide-basic-batman-adv-unicast-handling'David S. Miller
Sven Eckelmann says: ==================== flow_dissector: Provide basic batman-adv unicast handling we are currently starting to use batman-adv as mesh protocol on multicore embedded devices. These usually don't have a lot of CPU power per core but are reasonable fast when using multiple cores. It was noticed that sending was working very well but receiving was basically only using on CPU core per neighbor. The reason for that is format of the (normal) incoming packet: +--------------------+ | ip(v6)hdr | +--------------------+ | inner ethhdr | +--------------------+ | batadv unicast hdr | +--------------------+ | outer ethhdr | +--------------------+ The flow dissector will therefore stop after parsing the outer ethernet header and will not parse the actual ipv(4|6)/... header of the packet. Our assumption was now that it would help us to add minimal support to the flow dissector to jump over the batman-adv unicast and inner ethernet header (like in gre ETH_P_TEB). The patch was implemented in a slightly hacky way [1] and the results looked quite promising. I didn't get any feedback how the files should actually be named. So I am now just using the names from RFC v3 The discussion of the RFC v3 can be found in the related patches of https://patchwork.ozlabs.org/cover/849345/ The discussion of the RFC v2 can be found in the related patches of https://patchwork.ozlabs.org/cover/844783/ Changes in v4: ============== * added patch to change the u8/u16 to __u8/__u16 in include/uapi/linux/batadv_packet.h - requested by Willem de Bruijn <willemdebruijn.kernel@gmail.com> Changes in v3: ============== * removed change of uapi/linux/batman_adv.h to uapi/linux/batadv_genl.h - requested by Willem de Bruijn <willemdebruijn.kernel@gmail.com> * removed naming fixes for enums/defines in uapi/linux/batadv_genl.h - requested by Willem de Bruijn <willemdebruijn.kernel@gmail.com> * renamed uapi/linux/batadv.h to uapi/linux/batadv_packet.h * moved batadv dissector functionality in own function - requested by Tom Herbert <tom@herbertland.com> * added support for flags FLOW_DISSECTOR_F_STOP_AT_ENCAP and FLOW_DIS_ENCAPSULATION - requested by Willem de Bruijn <willemdebruijn.kernel@gmail.com> Changes in v2: ============== * removed the batman-adv unicast packet header definition from flow_dissector.c * moved the batman-adv packet.h/uapi headers around to provide the correct definitions to flow_dissector.c ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21flow_dissector: Parse batman-adv unicast headersSven Eckelmann
The batman-adv unicast packets contain a full layer 2 frame in encapsulated form. The flow dissector must therefore be able to parse the batman-adv unicast header to reach the layer 2+3 information. +--------------------+ | ip(v6)hdr | +--------------------+ | inner ethhdr | +--------------------+ | batadv unicast hdr | +--------------------+ | outer ethhdr | +--------------------+ The obtained information from the upper layer can then be used by RPS to schedule the processing on separate cores. This allows better distribution of multiple flows from the same neighbor to different cores. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Convert packet.h to uapi headerSven Eckelmann
The header file is used by different userspace programs to inject packets or to decode sniffed packets. It should therefore be available to them as userspace header. Also other components in the kernel (like the flow dissector) require access to the packet definitions to be able to decode ETH_P_BATMAN ethernet packets. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Remove kernel fixed width types in packet.hSven Eckelmann
The uapi headers use the __u8/__u16/... version of the fixed width types instead of u8/u16/... The use of the latter must be avoided before packet.h is copied to include/uapi/linux/. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Remove usage of BIT(x) in packet.hSven Eckelmann
The BIT(x) macro is no longer available for uapi headers because it is defined outside of it (linux/bitops.h). The use of it must therefore be avoided and replaced by an appropriate other representation. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21batman-adv: Let packet.h include its headers directlySven Eckelmann
The headers used by packet.h should also be included by it directly. main.h is currently dealing with it in batman-adv, but this will no longer work when this header is moved to include/uapi/linux/. Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Merge branch 'sfc-Medford2'David S. Miller
Bert Kenward says: ==================== sfc: support extra stats on Medford2 X2000-series NICs add port stats for two new features: FEC (Forward Error Correction, used on 25G links) and CTPIO (cut-through programmed I/O). This patch series adds support for reporting both of these sets of stats v2: add additional Signed-off-by ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21sfc: expose CTPIO stats on NICs that support themBert Kenward
While the Linux driver doesn't use CTPIO ('cut-through programmed I/O'), other drivers on the same port might, so if we're responsible for reporting per-port stats we need to include the CTPIO stats. Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21sfc: expose FEC stats on Medford2Edward Cree
There's no explicit capability bit, so we just condition them on having efx->num_mac_stats >= MC_CMD_MAC_NSTATS_V2. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21sfc: support variable number of MAC statsEdward Cree
Medford2 NICs support more than MC_CMD_MAC_NSTATS stats, and report the new count in a field of MC_CMD_GET_CAPABILITIES_V4. This also means that the end generation count moves (it is, as before, the last 64 bits of the DMA buffer, but that is no longer MC_CMD_MAC_GENERATION_END). So read num_mac_stats from the GET_CAPABILITIES response, if present; otherwise assume MC_CMD_MAC_NSTATS; and always use num_mac_stats - 1 rather than MC_CMD_MAC_GENERATION_END. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21sfc: update MCDI protocol headersEdward Cree
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21cxgb4: add new T5 and T6 device id'sGanesh Goudar
Add device id's 0x50ac, 0x6087 for T5 and T6 cards respectively. Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: dwc-xlgmac: Get rid of custom hex_dump_to_buffer()Jie Deng
Get rid of custom hex_dump_to_buffer(). The output is slightly changed, i.e. each byte followed by white space. Note, we don't use print_hex_dump() here since the original code uses nedev_dbg(). Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Jie Deng <jiedeng@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21phylink: avoid attaching more than one PHYRussell King
Attaching more than one PHY to phylink is bad news, as we store a pointer to the PHY in a single location. Error out if more than one PHY is attempted to be attached. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Merge branch 'net-zerocopy-fixes'David S. Miller
Saeed Mahameed says: =================== Mellanox, mlx5 fixes 2017-12-19 The follwoing series includes some fixes for mlx5 core and etherent driver. Please pull and let me know if there is any problem. This series doesn't introduce any conflict with the ongoing mlx5 for-next submission. For -stable: kernels >= v4.7.y ("net/mlx5e: Fix possible deadlock of VXLAN lock") ("net/mlx5e: Add refcount to VXLAN structure") ("net/mlx5e: Prevent possible races in VXLAN control flow") ("net/mlx5e: Fix features check of IPv6 traffic") kernels >= v4.9.y ("net/mlx5: Fix error flow in CREATE_QP command") ("net/mlx5: Fix rate limit packet pacing naming and struct") kernels >= v4.13.y ("net/mlx5: FPGA, return -EINVAL if size is zero") kernels >= v4.14.y ("Revert "mlx5: move affinity hints assignments to generic code") All above patches apply and compile with no issues on corresponding -stable. =================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: skb_copy_ubufs must release uarg even without user fragsWillem de Bruijn
skb_copy_ubufs creates a private copy of frags[] to release its hold on user frags, then calls uarg->callback to notify the owner. Call uarg->callback even when no frags exist. This edge case can happen when zerocopy_sg_from_iter finds enough room in skb_headlen to copy all the data. Fixes: 3ece782693c4 ("sock: skb_copy_ubufs support for compound pages") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21skbuff: orphan frags before zerocopy cloneWillem de Bruijn
Call skb_zerocopy_clone after skb_orphan_frags, to avoid duplicate calls to skb_uarg(skb)->callback for the same data. skb_zerocopy_clone associates skb_shinfo(skb)->uarg from frag_skb with each segment. This is only safe for uargs that do refcounting, which is those that pass skb_orphan_frags without dropping their shared frags. For others, skb_orphan_frags drops the user frags and sets the uarg to NULL, after which sock_zerocopy_clone has no effect. Qemu hangs were reported due to duplicate vhost_net_zerocopy_callback calls for the same data causing the vhost_net_ubuf_ref_>refcount to drop below zero. Link: http://lkml.kernel.org/r/<CAF=yD-LWyCD4Y0aJ9O0e_CHLR+3JOeKicRRTEVCPxgw4XOcqGQ@mail.gmail.com> Fixes: 1f8b977ab32d ("sock: enable MSG_ZEROCOPY") Reported-by: Andreas Hartmann <andihartmann@01019freenet.de> Reported-by: David Hill <dhill@redhat.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "It's been a few weeks, so here's a small collection of fixes that should go into the current series. This contains: - NVMe pull request from Christoph, with a few important fixes. - kyber hang fix from Omar. - A blk-throttl fix from Shaohua, fixing a case where we double charge a bio. - Two call_single_data alignment fixes from me, fixing up some unfortunate changes that went into 4.14 without being properly reviewed on the block side (since nobody was CC'ed on the patch...). - A bounce buffer fix in two parts, one from me and one from Ming. - Revert bdi debug error handling patch. It's causing boot issues for some folks, and a week down the line, we're still no closer to a fix. Revert this patch for now until it's figured out, then we can retry for 4.16" * 'for-linus' of git://git.kernel.dk/linux-block: Revert "bdi: add error handle for bdi_debug_register" null_blk: unalign call_single_data block: unalign call_single_data in struct request block-throttle: avoid double charge block: fix blk_rq_append_bio block: don't let passthrough IO go into .make_request_fn() nvme: setup streams after initializing namespace head nvme: check hw sectors before setting chunk sectors nvme: call blk_integrity_unregister after queue is cleaned up nvme-fc: remove double put reference if admin connect fails nvme: set discard_alignment to zero kyber: fix another domain token wait queue hang
2017-12-21Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "ARM fixes: - A bug in handling of SPE state for non-vhe systems - A fix for a crash on system shutdown - Three timer fixes, introduced by the timer optimizations for v4.15 x86 fixes: - fix for a WARN that was introduced in 4.15 - fix for SMM when guest uses PCID - fixes for several bugs found by syzkaller ... and a dozen papercut fixes for the kvm_stat tool" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits) tools/kvm_stat: sort '-f help' output kvm: x86: fix RSM when PCID is non-zero KVM: Fix stack-out-of-bounds read in write_mmio KVM: arm/arm64: Fix timer enable flow KVM: arm/arm64: Properly handle arch-timer IRQs after vtimer_save_state KVM: arm/arm64: timer: Don't set irq as forwarded if no usable GIC KVM: arm/arm64: Fix HYP unmapping going off limits arm64: kvm: Prevent restoring stale PMSCR_EL1 for vcpu KVM/x86: Check input paging mode when cs.l is set tools/kvm_stat: add line for totals tools/kvm_stat: stop ignoring unhandled arguments tools/kvm_stat: suppress usage information on command line errors tools/kvm_stat: handle invalid regular expressions tools/kvm_stat: add hint on '-f help' to man page tools/kvm_stat: fix child trace events accounting tools/kvm_stat: fix extra handling of 'help' with fields filter tools/kvm_stat: fix missing field update after filter change tools/kvm_stat: fix drilldown in events-by-guests mode tools/kvm_stat: fix command line option '-g' kvm: x86: fix WARN due to uninitialized guest FPU state ...
2017-12-21net: ibm: emac: support RGMII-[RX|TX]ID phymodeChristian Lamparter
The RGMII spec allows compliance for devices that implement an internal delay on TXC and/or RXC inside the transmitter. This patch adds the necessary RGMII_[RX|TX]ID mode code to handle such PHYs with the emac driver. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: ibm: emac: replace custom PHY_MODE_* macrosChristian Lamparter
The ibm_emac driver predates the PHY_INTERFACE_MODE_* enums by a few years. And while the driver has been retrofitted to use the PHYLIB, the old definitions have stuck around to this day. This patch replaces all occurences of PHY_MODE_* with the respective equivalent PHY_INTERFACE_MODE_* enum. And finally, it purges the old macros for good. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: ibm: emac: replace custom rgmii_mode_name with phy_modesChristian Lamparter
phy_modes() in the common phy.h already defines the same phy mode names in lower case. The deleted rgmii_mode_name() is used only in one place and for a "notice-level" printk. Hence, it will not be missed. Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: reevalulate autoflowlabel setting after sysctl settingShaohua Li
sysctl.ip6.auto_flowlabels is default 1. In our hosts, we set it to 2. If sockopt doesn't set autoflowlabel, outcome packets from the hosts are supposed to not include flowlabel. This is true for normal packet, but not for reset packet. The reason is ipv6_pinfo.autoflowlabel is set in sock creation. Later if we change sysctl.ip6.auto_flowlabels, the ipv6_pinfo.autoflowlabel isn't changed, so the sock will keep the old behavior in terms of auto flowlabel. Reset packet is suffering from this problem, because reset packet is sent from a special control socket, which is created at boot time. Since sysctl.ipv6.auto_flowlabels is 1 by default, the control socket will always have its ipv6_pinfo.autoflowlabel set, even after user set sysctl.ipv6.auto_flowlabels to 1, so reset packset will always have flowlabel. Normal sock created before sysctl setting suffers from the same issue. We can't even turn off autoflowlabel unless we kill all socks in the hosts. To fix this, if IPV6_AUTOFLOWLABEL sockopt is used, we use the autoflowlabel setting from user, otherwise we always call ip6_default_np_autolabel() which has the new settings of sysctl. Note, this changes behavior a little bit. Before commit 42240901f7c4 (ipv6: Implement different admin modes for automatic flow labels), the autoflowlabel behavior of a sock isn't sticky, eg, if sysctl changes, existing connection will change autoflowlabel behavior. After that commit, autoflowlabel behavior is sticky in the whole life of the sock. With this patch, the behavior isn't sticky again. Cc: Martin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <tom@quantonium.net> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21openvswitch: Fix pop_vlan action for double tagged framesEric Garver
skb_vlan_pop() expects skb->protocol to be a valid TPID for double tagged frames. So set skb->protocol to the TPID and let skb_vlan_pop() shift the true ethertype into position for us. Fixes: 5108bbaddc37 ("openvswitch: add processing of L3 packets") Signed-off-by: Eric Garver <e@erig.me> Reviewed-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Merge branch 'net-sched-extack'David S. Miller
Alexander Aring says: ==================== net: sched: sch: introduce extack support this patch series basically add support for extack in common qdisc handling. Additional it adds extack pointer to common qdisc callback handling this offers per qdisc implementation to setting the extack message for each failure over netlink. The extack message will be set deeper in qdisc functions but going not deeper as net core api. For qdisc module callback handling, the extack will not be set. This will be part of per qdisc extack handling. I also want to prepare patches to handle extack per qdisc module... so there will come a lot of more patches, just cut them down to make it reviewable. There are some above 80-chars width warnings, which I ignore because it looks more ugly otherwise. This patch-series based on patches by David Ahern which gave me some hints how to deal with extack support. Cc: David Ahern <dsahern@gmail.com> changes since v4: - rebase on current net-next/master - fix several typos (also David Ahren to Ahern, I am sorry) - Add acked by Jamal changes since v3: - remove patch 2/2 lib: nlattr: set extack msg if validate_nla fails since David Ahern has a better solution - Remove check on net admin permission since -EPERM indicates it already - Change rtab to "rate table" - this is what it's stands for - Fix cbs *not* support messages - Fix tcf block error message for allocation, allocation will be still there because there are multiple places which returns -ENOMEM - Finnally also took care about sch_atm, sorry somehow I forgot this one and I hope I didn't forgot any sch implementation to add new callback parameters changes since v2: - add fix coding style patch to catch all checkpatch warnings - add patch for setting netlink extack msg if validate_nla fails - changes in handle generic qdisc errors - remove NL_SET_ERR_MSG from memory allocation errors - remove NL_SET_ERR_MSG from device not found - change STAB to table size - add various new patches to add extack support for common TC functions like qdisc_get_rtab, tcf_block_get, qdisc_alloc and qdisc_create_dflt - users which are interessted in the detailed error messages can assign extack, otherwise NULL. - Add sch_cbq as example for qdisc_ops callback: init, qdisc_class_ops callbacks: change and graft - Add sch_cbs as example for qdisc_ops callback: change - Add sch_drr as example for qdisc_class ops callbacks: tcf_block ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: sch_drr: add extack supportAlexander Aring
This patch adds extack support for the drr qdisc implementation by adding NL_SET_ERR_MSG in validation of user input. Also it serves to illustrate a use case of how the infrastructure ops api changes are to be used by individual qdiscs. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: sch_cbs: add extack supportAlexander Aring
This patch adds extack support for the cbs qdisc implementation by adding NL_SET_ERR_MSG in validation of user input. Also it serves to illustrate a use case of how the infrastructure ops api changes are to be used by individual qdiscs. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: sch_cbq: add extack supportAlexander Aring
This patch adds extack support for the cbq qdisc implementation by adding NL_SET_ERR_MSG in validation of user input. Also it serves to illustrate a use case of how the infrastructure ops api changes are to be used by individual qdiscs. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in qdisc_create_dfltAlexander Aring
This patch adds extack support for the function qdisc_create_dflt which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_create_dflt failed. The function qdisc_create_dflt will also call an init callback which can fail by any per-qdisc specific handling. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in qdisc_allocAlexander Aring
This patch adds extack support for the function qdisc_alloc which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_alloc failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in tcf_block_getAlexander Aring
This patch adds extack support for the function tcf_block_get which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why tcf_block_get failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sch: api: add extack support in qdisc_get_rtabAlexander Aring
This patch adds extack support for the function qdisc_get_rtab which is a common used function in the tc subsystem. Callers which are interested in the receiving error can assign extack to get a more detailed information why qdisc_get_rtab failed. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for graft callbackAlexander Aring
This patch adds extack support for graft callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for block callbackAlexander Aring
This patch adds extack support for block callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack to change classAlexander Aring
This patch adds extack support for class change callback api. This prepares to handle extack support inside each specific class implementation. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for change qdisc opsAlexander Aring
This patch adds extack support for change callback for qdisc ops structtur to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch: add extack for init callbackAlexander Aring
This patch adds extack support for init callback to prepare per-qdisc specific changes for extack. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: sch_api: handle generic qdisc errorsAlexander Aring
This patch adds extack support for generic qdisc handling. The extack will be set deeper to each called function which is not part of netdev core api. Cc: David Ahern <dsahern@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21net: sched: fix coding style issuesAlexander Aring
This patch fix checkpatch issues for upcomming patches according to the sched api file. It changes mostly how to check on null pointer. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Alexander Aring <aring@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21Revert "bdi: add error handle for bdi_debug_register"Jens Axboe
This reverts commit a0747a859ef6d3cc5b6cd50eb694499b78dd0025. It breaks some booting for some users, and more than a week into this, there's still no good fix. Revert this commit for now until a solution has been found. Reported-by: Laura Abbott <labbott@redhat.com> Reported-by: Bruno Wolff III <bruno@wolff.to> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-21ipv6: Honor specified parameters in fibmatch lookupIdo Schimmel
Currently, parameters such as oif and source address are not taken into account during fibmatch lookup. Example (IPv4 for reference) before patch: $ ip -4 route show 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 198.51.100.0/24 dev dummy1 proto kernel scope link src 198.51.100.1 $ ip -6 route show 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium 2001:db8:2::/64 dev dummy1 proto kernel metric 256 pref medium fe80::/64 dev dummy0 proto kernel metric 256 pref medium fe80::/64 dev dummy1 proto kernel metric 256 pref medium $ ip -4 route get fibmatch 192.0.2.2 oif dummy0 192.0.2.0/24 dev dummy0 proto kernel scope link src 192.0.2.1 $ ip -4 route get fibmatch 192.0.2.2 oif dummy1 RTNETLINK answers: No route to host $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium After: $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy0 2001:db8:1::/64 dev dummy0 proto kernel metric 256 pref medium $ ip -6 route get fibmatch 2001:db8:1::2 oif dummy1 RTNETLINK answers: Network is unreachable The problem stems from the fact that the necessary route lookup flags are not set based on these parameters. Instead of duplicating the same logic for fibmatch, we can simply resolve the original route from its copy and dump it instead. Fixes: 18c3a61c4264 ("net: ipv6: RTM_GETROUTE: return matched fib result when requested") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21tools/kvm_stat: sort '-f help' outputStefan Raspl
Sort the fields returned by specifying '-f help' on the command line. While at it, simplify the code a bit, indent the output and eliminate an extra blank line at the beginning. Signed-off-by: Stefan Raspl <raspl@linux.vnet.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-21kvm: x86: fix RSM when PCID is non-zeroPaolo Bonzini
rsm_load_state_64() and rsm_enter_protected_mode() load CR3, then CR4 & ~PCIDE, then CR0, then CR4. However, setting CR4.PCIDE fails if CR3[11:0] != 0. It's probably easier in the long run to replace rsm_enter_protected_mode() with an emulator callback that sets all the special registers (like KVM_SET_SREGS would do). For now, set the PCID field of CR3 only after CR4.PCIDE is 1. Reported-by: Laszlo Ersek <lersek@redhat.com> Tested-by: Laszlo Ersek <lersek@redhat.com> Fixes: 660a5d517aaab9187f93854425c4c63f4a09195c Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2017-12-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2017-12-21 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix multiple security issues in the BPF verifier mostly related to the value and min/max bounds tracking rework in 4.14. Issues range from incorrect bounds calculation in some BPF_RSH cases, to improper sign extension and reg size handling on 32 bit ALU ops, missing strict alignment checks on stack pointers, and several others that got fixed, from Jann, Alexei and Edward. 2) Fix various build failures in BPF selftests on sparc64. More specifically, librt needed to be added to the libs to link against and few format string fixups for sizeof, from David. 3) Fix one last remaining issue from BPF selftest build that was still occuring on s390x from the asm/bpf_perf_event.h include which could not find the asm/ptrace.h copy, from Hendrik. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2017-12-21bpf: do not allow root to mangle valid pointersAlexei Starovoitov
Do not allow root to convert valid pointers into unknown scalars. In particular disallow: ptr &= reg ptr <<= reg ptr += ptr and explicitly allow: ptr -= ptr since pkt_end - pkt == length 1. This minimizes amount of address leaks root can do. In the future may need to further tighten the leaks with kptr_restrict. 2. If program has such pointer math it's likely a user mistake and when verifier complains about it right away instead of many instructions later on invalid memory access it's easier for users to fix their progs. 3. when register holding a pointer cannot change to scalar it allows JITs to optimize better. Like 32-bit archs could use single register for pointers instead of a pair required to hold 64-bit scalars. 4. reduces architecture dependent behavior. Since code: r1 = r10; r1 &= 0xff; if (r1 ...) will behave differently arm64 vs x64 and offloaded vs native. A significant chunk of ptr mangling was allowed by commit f1174f77b50c ("bpf/verifier: rework value tracking") yet some of it was allowed even earlier. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21Merge branch 'bpf-verifier-sec-fixes'Daniel Borkmann
Alexei Starovoitov says: ==================== This patch set addresses a set of security vulnerabilities in bpf verifier logic discovered by Jann Horn. All of the patches are candidates for 4.14 stable. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21selftests/bpf: add tests for recent bugfixesJann Horn
These tests should cover the following cases: - MOV with both zero-extended and sign-extended immediates - implicit truncation of register contents via ALU32/MOV32 - implicit 32-bit truncation of ALU32 output - oversized register source operand for ALU32 shift - right-shift of a number that could be positive or negative - map access where adding the operation size to the offset causes signed 32-bit overflow - direct stack access at a ~4GiB offset Also remove the F_LOAD_WITH_STRICT_ALIGNMENT flag from a bunch of tests that should fail independent of what flags userspace passes. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2017-12-21bpf: fix integer overflowsAlexei Starovoitov
There were various issues related to the limited size of integers used in the verifier: - `off + size` overflow in __check_map_access() - `off + reg->off` overflow in check_mem_access() - `off + reg->var_off.value` overflow or 32-bit truncation of `reg->var_off.value` in check_mem_access() - 32-bit truncation in check_stack_boundary() Make sure that any integer math cannot overflow by not allowing pointer math with large values. Also reduce the scope of "scalar op scalar" tracking. Fixes: f1174f77b50c ("bpf/verifier: rework value tracking") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>