summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-04-26blk-mq: count allocated but not started requests in iostats inflightOmar Sandoval
In the legacy block case, we increment the counter right after we allocate the request, not when the driver handles it. In both the legacy and blk-mq cases, part_inc_in_flight() is called from blk_account_io_start() right after we've allocated the request. blk-mq only considers requests started requests as inflight, but this is inconsistent with the legacy definition and the intention in the code. This removes the started condition and instead counts all allocated requests. Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant") Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25Merge tag 'for_v4.17-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fix from Jan Kara: "A fix of a fsnotify race causing panics / softlockups" * tag 'for_v4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: Fix fsnotify_mark_connector race
2018-04-25Merge tag 'scsi-fixes' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eight bug fixes, one spelling update and one tracepoint addition. The most serious is probably the mptsas write same fix because it means anyone using these controllers sees errors when modern filesystems try to issue discards" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: target: fix crash with iscsi target and dvd scsi: sd_zbc: Avoid that resetting a zone fails sporadically scsi: sd: Defer spinning up drive while SANITIZE is in progress scsi: megaraid_sas: Do not log an error if FW successfully initializes. scsi: ufs: add trace event for ufs upiu scsi: core: remove reference to scsi_show_extd_sense() scsi: mptsas: Disable WRITE SAME scsi: fnic: fix spelling mistake in fnic stats "Abord" -> "Abort" scsi: scsi_debug: IMMED related delay adjustments scsi: iscsi: respond to netlink with unicast when appropriate
2018-04-25Merge tag 'for-linus-20180425' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block updates from Jens Axboe: "I ended up sitting on this about a week longer than I wanted to, since we were hashing out details with a timeout change. I've now killed that patch, so we can flush the existing queue in due time. This contains: - Fix for an old regression, where entering the queue can be disturbed by a signal to the process. This can cause spurious EIO. Fix from Alan Jenkins. - cdrom information leak fix from Dan. - Trivial helper for testing queue FUA from Dave Chinner, part of his O_DIRECT FUA series. - Series of swim fixes from Finn that actually makes it work again. - Loop O_DIRECT corruption fix, which caused data corruption in production for us. From me. - BFQ crash fix from me. - bcache maintainer update. Michael no longer has the time to do it, Coly has stepped up to serve as the new maintainer. - blkcg locking fixes from Jiang Biao. - Revert of a change from this merge window from Ming, that causes an issue on some hardware. - Minor clarification doc addition from Linus Walleij" * tag 'for-linus-20180425' of git://git.kernel.dk/linux-block: (22 commits) Revert "blk-mq: remove code for dealing with remapping queue" block: mq: Add some minor doc for core structs bcache: mark Coly Li as bcache maintainer MAINTAINERS: Remove me as maintainer of bcache blkcg: init root blkcg_gq under lock blkcg: small fix on comment in blkcg_init_queue blkcg: don't hold blkcg lock when deactivating policy block: add blk_queue_fua() helper function cdrom: information leak in cdrom_ioctl_media_changed() bfq-iosched: ensure to clear bic/bfqq pointers when preparing request blk-mq: start request gstate with gen 1 block/swim: Select appropriate drive on device open block/swim: Fix IO error at end of medium block/swim: Check drive type block/swim: Rename macros to avoid inconsistent inverted logic block/swim: Don't log an error message for an invalid ioctl block/swim: Remove extra put_disk() call from error path block/swim: Fix array bounds check m68k/mac: Don't remap SWIM MMIO region loop: handle short DIO reads ...
2018-04-25Merge tag 'riscv-for-linus-4.17-rc3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux Pull RISC-V fixes from Palmer Dabbelt: "This contains three small fixes related to the RISC-V port that I'd like to target for 4.17-rc3: - a Kconfig cleanup to select DMA_DIRECT_OPS instead of redefining it in arch/riscv - the removal of asm/handle_irq.h, which doesn't exist, from our arch header list - the addition of "-no-pie" the link rules for our VDSO-related files, which fixes the build on systems where PIE is enabled by default" * tag 'riscv-for-linus-4.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux: RISC-V: build vdso-dummy.o with -no-pie riscv: there is no <asm/handle_irq.h> riscv: select DMA_DIRECT_OPS instead of redefining it
2018-04-25Merge tag 'dma-mapping-4.17-3' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds
Pull dma-mapping fixes from Christoph Hellwig: "A few small dma-mapping fixes for Linux 4.17-rc3: - don't loop to try GFP_DMA allocations if ZONE_DMA is not actually enabled (regression in 4.16) - don't try to do virt_to_page before we know we actuall have a valid page in dma_common_mmap - a comment fixup related to the above fix" * tag 'dma-mapping-4.17-3' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: postpone cpu addr translation on mmap dma-coherent: clarify dma_mmap_from_dev_coherent documentation dma-direct: don't retry allocation for no-op GFP_DMA
2018-04-25Revert "blk-mq: remove code for dealing with remapping queue"Ming Lei
This reverts commit 37c7c6c76d431dd7ef9c29d95f6052bd425f004c. Turns out some drivers(most are FC drivers) may not use managed IRQ affinity, and has their customized .map_queues meantime, so still keep this code for avoiding regression. Reported-by: Laurence Oberman <loberman@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Stefan Haberland <sth@linux.vnet.ibm.com> Cc: Ewan Milne <emilne@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25block: mq: Add some minor doc for core structsLinus Walleij
As it came up in discussion on the mailing list that the semantic meaning of 'blk_mq_ctx' and 'blk_mq_hw_ctx' isn't completely obvious to everyone, let's add some minimal kerneldoc for a starter. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25bcache: mark Coly Li as bcache maintainerJens Axboe
Since Michael had to step back, Coly has agreed to be the new maintainer. Mark him as such. Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-25MAINTAINERS: Remove me as maintainer of bcacheMichael Lyle
Too much to do with other projects. I've enjoyed working with everyone here, and hope to occasionally contribute on bcache. Signed-off-by: Michael Lyle <mlyle@lyle.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-24Merge branch 'userns-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull userns bug fix from Eric Biederman: "Just a small fix to properly set the return code on error" * 'userns-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: commoncap: Handle memory allocation failure.
2018-04-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) Fix rtnl deadlock in ipvs, from Julian Anastasov. 2) s390 qeth fixes from Julian Wiedmann (control IO completion stalls, bad MAC address update sequence, request side races on command IO timeouts). 3) Handle seq_file overflow properly in l2tp, from Guillaume Nault. 4) Fix VLAN priority mappings in cpsw driver, from Ivan Khoronzhuk. 5) Packet scheduler ife action fixes (malformed TLV lengths, etc.) from Alexander Aring. 6) Fix out of bounds access in tcp md5 option parser, from Jann Horn. 7) Missing netlink attribute policies in rtm_ipv6_policy table, from Eric Dumazet. 8) Missing socket address length checks in l2tp and pppoe connect, from Guillaume Nault. 9) Fix netconsole over team and bonding, from Xin Long. 10) Fix race with AF_PACKET socket state bitfields, from Willem de Bruijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (51 commits) ice: Fix insufficient memory issue in ice_aq_manage_mac_read sfc: ARFS filter IDs net: ethtool: Add missing kernel doc for FEC parameters packet: fix bitfield update race ice: Do not check INTEVENT bit for OICR interrupts ice: Fix incorrect comment for action type ice: Fix initialization for num_nodes_added igb: Fix the transmission mode of queue 0 for Qav mode ixgbevf: ensure xdp_ring resources are free'd on error exit team: fix netconsole setup over team amd-xgbe: Only use the SFP supported transceiver signals amd-xgbe: Improve KR auto-negotiation and training amd-xgbe: Add pre/post auto-negotiation phy hooks pppoe: check sockaddr length in pppoe_connect() l2tp: check sockaddr length in pppol2tp_connect() net: phy: marvell: clear wol event before setting it ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave tcp: don't read out-of-bounds opsize ibmvnic: Clean actual number of RX or TX pools ...
2018-04-24Merge branch '1GbE' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue Jeff Kirsher says: ==================== Intel Wired LAN Driver Updates 2018-04-24 This series contains fixes to ixgbevf, igb and ice drivers. Colin Ian King fixes the return value on error for the new XDP support that went into ixgbevf for 4.17. Vinicius provides a fix for queue 0 for igb, which was not receiving all the credits it needed when QAV mode was enabled. Anirudh provides several fixes for the new ice driver, starting with properly initializing num_nodes_added to zero. Fixed up a code comment to better reflect what is really going on in the code. Fixed how to detect if an OICR interrupt has occurred to a more reliable method. Md Fahad fixes the ice driver to allocate the right amount of memory when reading and storing the devices MAC addresses. The device can have up to 2 MAC addresses (LAN and WoL), while WoL is currently not supported, we need to ensure it can be properly handled when support is added. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24ice: Fix insufficient memory issue in ice_aq_manage_mac_readMd Fahad Iqbal Polash
For the MAC read operation, the device can return up to two (LAN and WoL) MAC addresses. Without access to adequate memory, the device will return an error. Fixed this by allocating the right amount of memory. Also, logic to detect and copy the LAN MAC address into the port_info structure has been added. Note that the WoL MAC address is ignored currently as the WoL feature isn't supported yet. Fixes: dc49c7723676 ("ice: Get MAC/PHY/link info and scheduler topology") Signed-off-by: Md Fahad Iqbal Polash <md.fahad.iqbal.polash@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24RISC-V: build vdso-dummy.o with -no-pieAurelien Jarno
Debian toolcahin defaults to PIE, and I guess that will also be the case of most distributions. This causes the following build failure: AS arch/riscv/kernel/vdso/getcpu.o AS arch/riscv/kernel/vdso/flush_icache.o VDSOLD arch/riscv/kernel/vdso/vdso.so.dbg OBJCOPY arch/riscv/kernel/vdso/vdso.so AS arch/riscv/kernel/vdso/vdso.o VDSOLD arch/riscv/kernel/vdso/vdso-dummy.o LD arch/riscv/kernel/vdso/vdso-syms.o riscv64-linux-gnu-ld: attempted static link of dynamic object `arch/riscv/kernel/vdso/vdso-dummy.o' make[2]: *** [arch/riscv/kernel/vdso/Makefile:43: arch/riscv/kernel/vdso/vdso-syms.o] Error 1 make[1]: *** [scripts/Makefile.build:575: arch/riscv/kernel/vdso] Error 2 make: *** [Makefile:1018: arch/riscv/kernel] Error 2 While the root Makefile correctly passes "-fno-PIE" to build individual object files, the RISC-V kernel also builds vdso-dummy.o as an executable, which is therefore linked as PIE. Fix that by updating this specific link rule to also include "-no-pie". Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-04-24riscv: there is no <asm/handle_irq.h>Christoph Hellwig
So don't list it as generic-y. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-04-24riscv: select DMA_DIRECT_OPS instead of redefining itChristoph Hellwig
DMA_DIRECT_OPS is defined in lib/Kconfig, so don't duplicate it in arch/riscv/Kconfig. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Palmer Dabbelt <palmer@sifive.com>
2018-04-24sfc: ARFS filter IDsEdward Cree
Associate an arbitrary ID with each ARFS filter, allowing to properly query for expiry. The association is maintained in a hash table, which is protected by a spinlock. v3: fix build warnings when CONFIG_RFS_ACCEL is disabled (thanks lkp-robot). v2: fixed uninitialised variable (thanks davem and lkp-robot). Fixes: 3af0f34290f6 ("sfc: replace asynchronous filter operations") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24net: ethtool: Add missing kernel doc for FEC parametersFlorian Fainelli
While adding support for ethtool::get_fecparam and set_fecparam, kernel doc for these functions was missed, add those. Fixes: 1a5f3da20bd9 ("net: ethtool: add support for forward error correction modes") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24packet: fix bitfield update raceWillem de Bruijn
Updates to the bitfields in struct packet_sock are not atomic. Serialize these read-modify-write cycles. Move po->running into a separate variable. Its writes are protected by po->bind_lock (except for one startup case at packet_create). Also replace a textual precondition warning with lockdep annotation. All others are set only in packet_setsockopt. Serialize these updates by holding the socket lock. Analogous to other field updates, also hold the lock when testing whether a ring is active (pg_vec). Fixes: 8dc419447415 ("[PACKET]: Add optional checksum computation for recvmsg") Reported-by: DaeRyong Jeong <threeearcat@gmail.com> Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-24ice: Do not check INTEVENT bit for OICR interruptsBen Shelton
According to the hardware spec, checking the INTEVENT bit isn't a reliable way to detect if an OICR interrupt has occurred. This is because this bit can be cleared by the hardware/firmware before the interrupt service routine has run. So instead, just check for OICR events every time. Fixes: 940b61af02f4 ("ice: Initialize PF and setup miscellaneous interrupt") Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24ice: Fix incorrect comment for action typeAnirudh Venkataramanan
Action type 5 defines large action generic values. Fix comment to reflect that better. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24ice: Fix initialization for num_nodes_addedAnirudh Venkataramanan
ice_sched_add_nodes_to_layer is used recursively, and so we start with num_nodes_added being 0. This way, in case of an error or if num_nodes is NULL, the function just returns 0 to indicate that no nodes were added. Fixes: 5513b920a4f7 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support") Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Tony Brelinski <tonyx.brelinski@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24igb: Fix the transmission mode of queue 0 for Qav modeVinicius Costa Gomes
When Qav mode is enabled, queue 0 should be kept on Stream Reservation mode. From the i210 datasheet, section 8.12.19: "Note: Queue0 QueueMode must be set to 1b when TransmitMode is set to Qav." ("QueueMode 1b" represents the Stream Reservation mode) The solution is to give queue 0 the all the credits it might need, so it has priority over queue 1. A situation where this can happen is when cbs is "installed" only on queue 1, leaving queue 0 alone. For example: $ tc qdisc replace dev enp2s0 handle 100: parent root mqprio num_tc 3 \ map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0 $ tc qdisc replace dev enp2s0 parent 100:2 cbs locredit -1470 \ hicredit 30 sendslope -980000 idleslope 20000 offload 1 Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24ixgbevf: ensure xdp_ring resources are free'd on error exitColin Ian King
The current error handling for failed resource setup for xdp_ring data is a break out of the loop and returning 0 indicated everything was OK, when in fact it is not. Fix this by exiting via the error exit label err_setup_tx that will clean up the resources correctly and return and error status. Detected by CoverityScan, CID#1466879 ("Logically dead code") Fixes: 21092e9ce8b1 ("ixgbevf: Add support for XDP_TX action") Signed-off-by: Colin Ian King <colin.king@canonical.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-24team: fix netconsole setup over teamXin Long
The same fix in Commit dbe173079ab5 ("bridge: fix netconsole setup over bridge") is also needed for team driver. While at it, remove the unnecessary parameter *team from team_port_enable_netpoll(). v1->v2: - fix it in a better way, as does bridge. Fixes: 0fb52a27a04a ("team: cleanup netpoll clode") Reported-by: João Avelino Bellomo Filho <jbellomo@redhat.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23Merge branch 'amd-xgbe-fixes'David S. Miller
aTom Lendacky says: ==================== amd-xgbe: AMD XGBE driver fixes 2018-04-23 This patch series addresses some issues in the AMD XGBE driver. The following fixes are included in this driver update series: - Improve KR auto-negotiation and training (2 patches) - Add pre and post auto-negotiation hooks - Use the pre and post auto-negotiation hooks to disable CDR tracking during auto-negotiation page exchange in KR mode - Check for SFP tranceiver signal support and only use the signal if the SFP indicates that it is supported This patch series is based on net. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23amd-xgbe: Only use the SFP supported transceiver signalsTom Lendacky
The SFP eeprom indicates the transceiver signals (Rx LOS, Tx Fault, etc.) that it supports. Update the driver to include checking the eeprom data when deciding whether to use a transceiver signal. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23amd-xgbe: Improve KR auto-negotiation and trainingTom Lendacky
Update xgbe-phy-v2.c to make use of the auto-negotiation (AN) phy hooks to improve the ability to successfully complete Clause 73 AN when running at 10gbps. Hardware can sometimes have issues with CDR lock when the AN DME page exchange is being performed. The AN and KR training hooks are used as follows: - The pre AN hook is used to disable CDR tracking in the PHY so that the DME page exchange can be successfully and consistently completed. - The post KR training hook is used to re-enable the CDR tracking so that KR training can successfully complete. - The post AN hook is used to check for an unsuccessful AN which will increase a CDR tracking enablement delay (up to a maximum value). Add two debugfs entries to allow control over use of the CDR tracking workaround. The debugfs entries allow the CDR tracking workaround to be disabled and determine whether to re-enable CDR tracking before or after link training has been initiated. Also, with these changes the receiver reset cycle that is performed during the link status check can be performed less often. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23amd-xgbe: Add pre/post auto-negotiation phy hooksTom Lendacky
Add hooks to the driver auto-negotiation (AN) flow to allow the different phy implementations to perform any steps necessary to improve AN. Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23pppoe: check sockaddr length in pppoe_connect()Guillaume Nault
We must validate sockaddr_len, otherwise userspace can pass fewer data than we expect and we end up accessing invalid data. Fixes: 224cf5ad14c0 ("ppp: Move the PPP drivers") Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23l2tp: check sockaddr length in pppol2tp_connect()Guillaume Nault
Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that it actually points to valid data. Fixes: fd558d186df2 ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts") Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23net: phy: marvell: clear wol event before setting itJingju Hou
If WOL event happened once, the LED[2] interrupt pin will not be cleared unless we read the CSISR register. If interrupts are in use, the normal interrupt handling will clear the WOL event. Let's clear the WOL event before enabling it if !phy_interrupt_is_valid(). Signed-off-by: Jingju Hou <Jingju.Hou@synaptics.com> Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net The following patchset contains Netfilter/IPVS fixes for your net tree, they are: 1) Fix SIP conntrack with phones sending session descriptions for different media types but same port numbers, from Florian Westphal. 2) Fix incorrect rtnl_lock mutex logic from IPVS sync thread, from Julian Anastasov. 3) Skip compat array allocation in ebtables if there is no entries, also from Florian. 4) Do not lose left/right bits when shifting marks from xt_connmark, from Jack Ma. 5) Silence false positive memleak in conntrack extensions, from Cong Wang. 6) Fix CONFIG_NF_REJECT_IPV6=m link problems, from Arnd Bergmann. 7) Cannot kfree rule that is already in list in nf_tables, switch order so this error handling is not required, from Florian Westphal. 8) Release set name in error path, from Florian. 9) include kmemleak.h in nf_conntrack_extend.c, from Stepheh Rothwell. 10) NAT chain and extensions depend on NF_TABLES. 11) Out of bound access when renaming chains, from Taehee Yoo. 12) Incorrect casting in xt_connmark leads to wrong bitshifting. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policyEric Dumazet
KMSAN reported use of uninit-value that I tracked to lack of proper size check on RTA_TABLE attribute. I also believe RTA_PREFSRC lacks a similar check. Fixes: 86872cb57925 ("[IPv6] route: FIB6 configuration using struct fib6_config") Fixes: c3968a857a6b ("ipv6: RTA_PREFSRC support for ipv6 route source address selection") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslaveXin Long
After Commit 8a8efa22f51b ("bonding: sync netpoll code with bridge"), it would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev if bond->dev->npinfo was set. However now slave_dev npinfo is set with bond->dev->npinfo before calling slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup(). It causes that the lower dev of this slave dev can't set its npinfo. One way to reproduce it: # modprobe bonding # brctl addbr br0 # brctl addif br0 eth1 # ifconfig bond0 192.168.122.1/24 up # ifenslave bond0 eth2 # systemctl restart netconsole # ifenslave bond0 br0 # ifconfig eth2 down # systemctl restart netconsole The netpoll won't really work. This patch is to remove that slave_dev npinfo setting in bond_enslave(). Fixes: 8a8efa22f51b ("bonding: sync netpoll code with bridge") Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23tcp: don't read out-of-bounds opsizeJann Horn
The old code reads the "opsize" variable from out-of-bounds memory (first byte behind the segment) if a broken TCP segment ends directly after an opcode that is neither EOL nor NOP. The result of the read isn't used for anything, so the worst thing that could theoretically happen is a pagefault; and since the physmap is usually mostly contiguous, even that seems pretty unlikely. The following C reproducer triggers the uninitialized read - however, you can't actually see anything happen unless you put something like a pr_warn() in tcp_parse_md5sig_option() to print the opsize. ==================================== #define _GNU_SOURCE #include <arpa/inet.h> #include <stdlib.h> #include <errno.h> #include <stdarg.h> #include <net/if.h> #include <linux/if.h> #include <linux/ip.h> #include <linux/tcp.h> #include <linux/in.h> #include <linux/if_tun.h> #include <err.h> #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <string.h> #include <stdio.h> #include <unistd.h> #include <sys/ioctl.h> #include <assert.h> void systemf(const char *command, ...) { char *full_command; va_list ap; va_start(ap, command); if (vasprintf(&full_command, command, ap) == -1) err(1, "vasprintf"); va_end(ap); printf("systemf: <<<%s>>>\n", full_command); system(full_command); } char *devname; int tun_alloc(char *name) { int fd = open("/dev/net/tun", O_RDWR); if (fd == -1) err(1, "open tun dev"); static struct ifreq req = { .ifr_flags = IFF_TUN|IFF_NO_PI }; strcpy(req.ifr_name, name); if (ioctl(fd, TUNSETIFF, &req)) err(1, "TUNSETIFF"); devname = req.ifr_name; printf("device name: %s\n", devname); return fd; } #define IPADDR(a,b,c,d) (((a)<<0)+((b)<<8)+((c)<<16)+((d)<<24)) void sum_accumulate(unsigned int *sum, void *data, int len) { assert((len&2)==0); for (int i=0; i<len/2; i++) { *sum += ntohs(((unsigned short *)data)[i]); } } unsigned short sum_final(unsigned int sum) { sum = (sum >> 16) + (sum & 0xffff); sum = (sum >> 16) + (sum & 0xffff); return htons(~sum); } void fix_ip_sum(struct iphdr *ip) { unsigned int sum = 0; sum_accumulate(&sum, ip, sizeof(*ip)); ip->check = sum_final(sum); } void fix_tcp_sum(struct iphdr *ip, struct tcphdr *tcp) { unsigned int sum = 0; struct { unsigned int saddr; unsigned int daddr; unsigned char pad; unsigned char proto_num; unsigned short tcp_len; } fakehdr = { .saddr = ip->saddr, .daddr = ip->daddr, .proto_num = ip->protocol, .tcp_len = htons(ntohs(ip->tot_len) - ip->ihl*4) }; sum_accumulate(&sum, &fakehdr, sizeof(fakehdr)); sum_accumulate(&sum, tcp, tcp->doff*4); tcp->check = sum_final(sum); } int main(void) { int tun_fd = tun_alloc("inject_dev%d"); systemf("ip link set %s up", devname); systemf("ip addr add 192.168.42.1/24 dev %s", devname); struct { struct iphdr ip; struct tcphdr tcp; unsigned char tcp_opts[20]; } __attribute__((packed)) syn_packet = { .ip = { .ihl = sizeof(struct iphdr)/4, .version = 4, .tot_len = htons(sizeof(syn_packet)), .ttl = 30, .protocol = IPPROTO_TCP, /* FIXUP check */ .saddr = IPADDR(192,168,42,2), .daddr = IPADDR(192,168,42,1) }, .tcp = { .source = htons(1), .dest = htons(1337), .seq = 0x12345678, .doff = (sizeof(syn_packet.tcp)+sizeof(syn_packet.tcp_opts))/4, .syn = 1, .window = htons(64), .check = 0 /*FIXUP*/ }, .tcp_opts = { /* INVALID: trailing MD5SIG opcode after NOPs */ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 19 } }; fix_ip_sum(&syn_packet.ip); fix_tcp_sum(&syn_packet.ip, &syn_packet.tcp); while (1) { int write_res = write(tun_fd, &syn_packet, sizeof(syn_packet)); if (write_res != sizeof(syn_packet)) err(1, "packet write failed"); } } ==================================== Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-23dma-mapping: postpone cpu addr translation on mmapJacopo Mondi
Postpone calling virt_to_page() translation on memory locations not guaranteed to be backed by a struct page. Try first to map memory from the device coherent memory pool, then perform translation if that fails. On some architectures, specifically SH when configured with the SPARSEMEM memory model, assuming a struct page is always assigned to a memory address lead to unexpected hangs during the virtual to page address translation. This patch fixes that specific issue but applies in the general case too. Suggested-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-04-23dma-coherent: clarify dma_mmap_from_dev_coherent documentationRobin Murphy
The use of "correctly mapped" here is misleading, since it can give the wrong expectation in the case that the memory *should* have been mapped from the per-device pool, but doing so failed for other reasons. Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-04-23dma-direct: don't retry allocation for no-op GFP_DMATakashi Iwai
When an allocation with lower dma_coherent mask fails, dma_direct_alloc() retries the allocation with GFP_DMA. But, this is useless for architectures that hav no ZONE_DMA. Fix it by adding the check of CONFIG_ZONE_DMA before retrying the allocation. Fixes: 95f183916d4b ("dma-direct: retry allocations using GFP_DMA for small masks") Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-04-22Linux 4.17-rc2Linus Torvalds
2018-04-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller
Daniel Borkmann says: ==================== pull-request: bpf 2018-04-21 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Fix a deadlock between mm->mmap_sem and bpf_event_mutex when one task is detaching a BPF prog via perf_event_detach_bpf_prog() and another one dumping through bpf_prog_array_copy_info(). For the latter we move the copy_to_user() out of the bpf_event_mutex lock to fix it, from Yonghong. 2) Fix test_sock and test_sock_addr.sh failures. The former was hitting rlimit issues and the latter required ping to specify the address family, from Yonghong. 3) Remove a dead check in sockmap's sock_map_alloc(), from Jann. 4) Add generated files to BPF kselftests gitignore that were previously missed, from Anders. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22ibmvnic: Clean actual number of RX or TX poolsThomas Falcon
Avoid using value stored in the login response buffer when cleaning TX and RX buffer pools since these could be inconsistent depending on the device state. Instead use the field in the driver's private data that tracks the number of active pools. Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22Merge branch 'net-sched-ife-malformed-ife-packet-fixes'David S. Miller
Alexander Aring says: ==================== net: sched: ife: malformed ife packet fixes As promised at netdev 2.2 tc workshop I am working on adding scapy support for tdc testing. It is still work in progress. I will submit the patches to tdc later (they are not in good shape yet). The good news is I have been able to find bugs which normal packet testing would not be able to find. With fuzzy testing I was able to craft certain malformed packets that IFE action was not able to deal with. This patch set fixes those bugs. changes since v4: - use pskb_may_pull before pointer assign changes since v3: - use pskb_may_pull changes since v2: - remove inline from __ife_tlv_meta_valid - add const to cast to meta_tlvhdr - add acked and reviewed tags ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22net: sched: ife: check on metadata lengthAlexander Aring
This patch checks if sk buffer is available to dererence ife header. If not then NULL will returned to signal an malformed ife packet. This avoids to crashing the kernel from outside. Signed-off-by: Alexander Aring <aring@mojatatu.com> Reviewed-by: Yotam Gigi <yotam.gi@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22net: sched: ife: handle malformed tlv lengthAlexander Aring
There is currently no handling to check on a invalid tlv length. This patch adds such handling to avoid killing the kernel with a malformed ife packet. Signed-off-by: Alexander Aring <aring@mojatatu.com> Reviewed-by: Yotam Gigi <yotam.gi@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22net: sched: ife: signal not finding metaidAlexander Aring
We need to record stats for received metadata that we dont know how to process. Have find_decode_metaid() return -ENOENT to capture this. Signed-off-by: Alexander Aring <aring@mojatatu.com> Reviewed-by: Yotam Gigi <yotam.gi@gmail.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22strparser: Do not call mod_delayed_work with a timeout of LONG_MAXDoron Roberts-Kedes
struct sock's sk_rcvtimeo is initialized to LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling mod_delayed_work with a timeout of LONG_MAX causes spurious execution of the work function. timer->expires is set equal to jiffies + LONG_MAX. When timer_base->clk falls behind the current value of jiffies, the delta between timer_base->clk and jiffies + LONG_MAX causes the expiration to be in the past. Returning early from strp_start_timer if timeo == LONG_MAX solves this problem. Found while testing net/tls_sw recv path. Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages") Reviewed-by: Tejun Heo <tj@kernel.org> Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22ipv6: sr: fix NULL pointer dereference in seg6_do_srh_encap()- v4 pktsAhmed Abdelsalam
In case of seg6 in encap mode, seg6_do_srh_encap() calls set_tun_src() in order to set the src addr of outer IPv6 header. The net_device is required for set_tun_src(). However calling ip6_dst_idev() on dst_entry in case of IPv4 traffic results on the following bug. Using just dst->dev should fix this BUG. [ 196.242461] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 [ 196.242975] PGD 800000010f076067 P4D 800000010f076067 PUD 10f060067 PMD 0 [ 196.243329] Oops: 0000 [#1] SMP PTI [ 196.243468] Modules linked in: nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd input_leds glue_helper led_class pcspkr serio_raw mac_hid video autofs4 hid_generic usbhid hid e1000 i2c_piix4 ahci pata_acpi libahci [ 196.244362] CPU: 2 PID: 1089 Comm: ping Not tainted 4.16.0+ #1 [ 196.244606] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 196.244968] RIP: 0010:seg6_do_srh_encap+0x1ac/0x300 [ 196.245236] RSP: 0018:ffffb2ce00b23a60 EFLAGS: 00010202 [ 196.245464] RAX: 0000000000000000 RBX: ffff8c7f53eea300 RCX: 0000000000000000 [ 196.245742] RDX: 0000f10000000000 RSI: ffff8c7f52085a6c RDI: ffff8c7f41166850 [ 196.246018] RBP: ffffb2ce00b23aa8 R08: 00000000000261e0 R09: ffff8c7f41166800 [ 196.246294] R10: ffffdce5040ac780 R11: ffff8c7f41166828 R12: ffff8c7f41166808 [ 196.246570] R13: ffff8c7f52085a44 R14: ffffffffb73211c0 R15: ffff8c7e69e44200 [ 196.246846] FS: 00007fc448789700(0000) GS:ffff8c7f59d00000(0000) knlGS:0000000000000000 [ 196.247286] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 196.247526] CR2: 0000000000000000 CR3: 000000010f05a000 CR4: 00000000000406e0 [ 196.247804] Call Trace: [ 196.247972] seg6_do_srh+0x15b/0x1c0 [ 196.248156] seg6_output+0x3c/0x220 [ 196.248341] ? prandom_u32+0x14/0x20 [ 196.248526] ? ip_idents_reserve+0x6c/0x80 [ 196.248723] ? __ip_select_ident+0x90/0x100 [ 196.248923] ? ip_append_data.part.50+0x6c/0xd0 [ 196.249133] lwtunnel_output+0x44/0x70 [ 196.249328] ip_send_skb+0x15/0x40 [ 196.249515] raw_sendmsg+0x8c3/0xac0 [ 196.249701] ? _copy_from_user+0x2e/0x60 [ 196.249897] ? rw_copy_check_uvector+0x53/0x110 [ 196.250106] ? _copy_from_user+0x2e/0x60 [ 196.250299] ? copy_msghdr_from_user+0xce/0x140 [ 196.250508] sock_sendmsg+0x36/0x40 [ 196.250690] ___sys_sendmsg+0x292/0x2a0 [ 196.250881] ? _cond_resched+0x15/0x30 [ 196.251074] ? copy_termios+0x1e/0x70 [ 196.251261] ? _copy_to_user+0x22/0x30 [ 196.251575] ? tty_mode_ioctl+0x1c3/0x4e0 [ 196.251782] ? _cond_resched+0x15/0x30 [ 196.251972] ? mutex_lock+0xe/0x30 [ 196.252152] ? vvar_fault+0xd2/0x110 [ 196.252337] ? __do_fault+0x1f/0xc0 [ 196.252521] ? __handle_mm_fault+0xc1f/0x12d0 [ 196.252727] ? __sys_sendmsg+0x63/0xa0 [ 196.252919] __sys_sendmsg+0x63/0xa0 [ 196.253107] do_syscall_64+0x72/0x200 [ 196.253305] entry_SYSCALL_64_after_hwframe+0x3d/0xa2 [ 196.253530] RIP: 0033:0x7fc4480b0690 [ 196.253715] RSP: 002b:00007ffde9f252f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 196.254053] RAX: ffffffffffffffda RBX: 0000000000000040 RCX: 00007fc4480b0690 [ 196.254331] RDX: 0000000000000000 RSI: 000000000060a360 RDI: 0000000000000003 [ 196.254608] RBP: 00007ffde9f253f0 R08: 00000000002d1e81 R09: 0000000000000002 [ 196.254884] R10: 00007ffde9f250c0 R11: 0000000000000246 R12: 0000000000b22070 [ 196.255205] R13: 20c49ba5e353f7cf R14: 431bde82d7b634db R15: 00007ffde9f278fe [ 196.255484] Code: a5 0f b6 45 c0 41 88 41 28 41 0f b6 41 2c 48 c1 e0 04 49 8b 54 01 38 49 8b 44 01 30 49 89 51 20 49 89 41 18 48 8b 83 b0 00 00 00 <48> 8b 30 49 8b 86 08 0b 00 00 48 8b 40 20 48 8b 50 08 48 0b 10 [ 196.256190] RIP: seg6_do_srh_encap+0x1ac/0x300 RSP: ffffb2ce00b23a60 [ 196.256445] CR2: 0000000000000000 [ 196.256676] ---[ end trace 71af7d093603885c ]--- Fixes: 8936ef7604c11 ("ipv6: sr: fix NULL pointer dereference when setting encap source address") Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Acked-by: David Lebrun <dlebrun@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-22Merge tag 'drm-fixes-for-v4.17-rc2' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Exynos, i915, vc4, amdgpu fixes. i915: - an oops fix - two race fixes - some gvt fixes amdgpu: - dark screen fix - clk/voltage fix - vega12 smu fix vc4: - memory leak fix exynos just drops some code" * tag 'drm-fixes-for-v4.17-rc2' of git://people.freedesktop.org/~airlied/linux: (23 commits) drm/amd/powerplay: header file interface to SMU update drm/amd/pp: Fix bug voltage can't be OD separately on VI drm/amd/display: Don't program bypass on linear regamma LUT drm/i915: Fix LSPCON TMDS output buffer enabling from low-power state drm/i915/audio: Fix audio detection issue on GLK drm/i915: Call i915_perf_fini() on init_hw error unwind drm/i915/bios: filter out invalid DDC pins from VBT child devices drm/i915/pmu: Inspect runtime PM state more carefully while estimating RC6 drm/i915: Do no use kfree() to free a kmem_cache_alloc() return value drm/exynos: exynos_drm_fb -> drm_framebuffer drm/exynos: Move dma_addr out of exynos_drm_fb drm/exynos: Move GEM BOs to drm_framebuffer drm: Fix HDCP downstream dev count read drm/vc4: Fix memory leak during BO teardown drm/i915/execlists: Clear user-active flag on preemption completion drm/i915/gvt: Add drm_format_mod update drm/i915/gvt: Disable primary/sprite/cursor plane at virtual display initialization drm/i915/gvt: Delete redundant error message in fb_decode.c drm/i915/gvt: Cancel dma map when resetting ggtt entries drm/i915/gvt: Missed to cancel dma map for ggtt entries ...