summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2016-12-23ipvlan: fix multicast processingMahesh Bandewar
In an IPvlan setup when master is set in loopback mode e.g. ethtool -K eth0 set loopback on where eth0 is master device for IPvlan setup. The failure is caused by the faulty logic that determines if the packet is from TX-path vs. RX-path by just looking at the mac- addresses on the packet while processing multicast packets. In the loopback-mode where this crash was happening, the packets that are sent out are reflected by the NIC and are processed on the RX path, but mac-address check tricks into thinking this packet is from TX path and falsely uses dev_forward_skb() to pass packets to the slave (virtual) devices. This patch records the path while queueing packets and eliminates logic of looking at mac-addresses for the same decision. ------------[ cut here ]------------ kernel BUG at include/linux/skbuff.h:1737! Call Trace: [<ffffffff921fbbc2>] dev_forward_skb+0x92/0xd0 [<ffffffffc031ac65>] ipvlan_process_multicast+0x395/0x4c0 [ipvlan] [<ffffffffc031a9a7>] ? ipvlan_process_multicast+0xd7/0x4c0 [ipvlan] [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91cdff09>] process_one_work+0x1a9/0x660 [<ffffffff91cdfea7>] ? process_one_work+0x147/0x660 [<ffffffff91ce086d>] worker_thread+0x11d/0x360 [<ffffffff91ce0750>] ? rescuer_thread+0x350/0x350 [<ffffffff91ce960b>] kthread+0xdb/0xe0 [<ffffffff91c05c70>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 [<ffffffff92348b7a>] ret_from_fork+0x9a/0xd0 [<ffffffff91ce9530>] ? flush_kthread_worker+0xc0/0xc0 Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23ipvlan: fix various issues in ipvlan_process_multicast()Eric Dumazet
1) netif_rx() / dev_forward_skb() should not be called from process context. 2) ipvlan_count_rx() should be called with preemption disabled. 3) We should check if ipvlan->dev is up before feeding packets to netif_rx() 4) We need to prevent device from disappearing if some packets are in the multicast backlog. 5) One kfree_skb() should be a consume_skb() eventually Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: 1) We have to be careful to not try and place a checksum after the end of a rawv6 packet, fix from Dave Jones with help from Hannes Frederic Sowa. 2) Missing memory barriers in tcp_tasklet_func() lead to crashes, from Eric Dumazet. 3) Several bug fixes for the new XDP support in virtio_net, from Jason Wang. 4) Increase headroom in RX skbs in be2net driver to accomodate encapsulations such as geneve. From Kalesh A P. 5) Fix SKB frag unmapping on TX in mvpp2, from Thomas Petazzoni. 6) Pre-pulling UDP headers created a regression in RECVORIGDSTADDR socket option support, from Willem de Bruijn. 7) UID based routing added a potential OOPS in ip_do_redirect() when we see an SKB without a socket attached. We just need it for the network namespace which we can get from skb->dev instead. Fix from Lorenzo Colitti. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits) sctp: fix recovering from 0 win with small data chunks sctp: do not loose window information if in rwnd_over virtio-net: XDP support for small buffers virtio-net: remove big packet XDP codes virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is support virtio-net: make rx buf size estimation works for XDP virtio-net: unbreak csumed packets for XDP_PASS virtio-net: correctly handle XDP_PASS for linearized packets virtio-net: fix page miscount during XDP linearizing virtio-net: correctly xmit linearized page on XDP_TX virtio-net: remove the warning before XDP linearizing mlxsw: spectrum_router: Correctly remove nexthop groups mlxsw: spectrum_router: Don't reflect dead neighs neigh: Send netevent after marking neigh as dead ipv6: handle -EFAULT from skb_copy_bits inet: fix IP(V6)_RECVORIGDSTADDR for udp sockets net/sched: cls_flower: Mandate mask when matching on flags net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6 stmmac: CSR clock configuration fix net: ipv4: Don't crash if passing a null sk to ip_do_redirect. ...
2016-12-23sctp: fix recovering from 0 win with small data chunksMarcelo Ricardo Leitner
Currently if SCTP closes the receive window with window pressure, mostly caused by excessive skb overhead on payload/overheads ratio, SCTP will close the window abruptly while saving the delta on rwnd_press. It will start recovering rwnd as the chunks are consumed by the application and the rwnd_press will be only recovered after rwnd reach the same value as of rwnd_press, mostly to prevent silly window syndrome. Thing is, this is very inefficient with small data chunks, as with those it will never reach back that value, and thus it will never recover from such pressure. This means that we will not issue window updates when recovering from 0 window and will rely on a sender retransmit to notice it. The fix here is to remove such threshold, as no value is good enough: it depends on the (avg) chunk sizes being used. Test with netperf -t SCTP_STREAM -- -m 1, and trigger 0 window by sending SIGSTOP to netserver, sleep 1.2, and SIGCONT. Rate limited to 845kbps, for visibility. Capture done at netserver side. Previously: 01.500751 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632372996] [a_rwnd 99153] [ 01.500752 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632372997] [SID: 0] [SS 01.517471 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS 01.517483 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 01.517485 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS 01.517488 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 01.534168 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373096] [SID: 0] [SS 01.534180 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 01.534181 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373169] [SID: 0] [SS 01.534185 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap 02.525978 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS 02.526021 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373009] [a_rwnd 0] [#gap (window update missed) 04.573807 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373010] [SID: 0] [SS 04.779370 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373082] [a_rwnd 859] [#g 04.789162 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373083] [SID: 0] [SS 04.789323 IP A.36925 > B.48277: sctp (1) [DATA] (B)(E) [TSN: 632373156] [SID: 0] [SS 04.789372 IP B.48277 > A.36925: sctp (1) [SACK] [cum ack 632373228] [a_rwnd 786] [#g After: 02.568957 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098728] [a_rwnd 99153] 02.568961 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098729] [SID: 0] [S 02.585631 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S 02.585666 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 02.585671 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S 02.585683 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 02.602330 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098828] [SID: 0] [S 02.602359 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 02.602363 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098901] [SID: 0] [S 02.602372 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 03.600788 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S 03.600830 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 0] [#ga 03.619455 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 13508] 03.619479 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 27017] 03.619497 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 40526] 03.619516 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 54035] 03.619533 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 67544] 03.619552 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 81053] 03.619570 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098741] [a_rwnd 94562] (following data transmission triggered by window updates above) 03.633504 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098742] [SID: 0] [S 03.836445 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098814] [a_rwnd 100000] 03.843125 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098815] [SID: 0] [S 03.843285 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098888] [SID: 0] [S 03.843345 IP B.50536 > A.55173: sctp (1) [SACK] [cum ack 2490098960] [a_rwnd 99894] 03.856546 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490098961] [SID: 0] [S 03.866450 IP A.55173 > B.50536: sctp (1) [DATA] (B)(E) [TSN: 2490099011] [SID: 0] [S Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23sctp: do not loose window information if in rwnd_overMarcelo Ricardo Leitner
It's possible that we receive a packet that is larger than current window. If it's the first packet in this way, it will cause it to increase rwnd_over. Then, if we receive another data chunk (specially as SCTP allows you to have one data chunk in flight even during 0 window), rwnd_over will be overwritten instead of added to. In the long run, this could cause the window to grow bigger than its initial size, as rwnd_over would be charged only for the last received data chunk while the code will try open the window for all packets that were received and had its value in rwnd_over overwritten. This, then, can lead to the worsening of payload/buffer ratio and cause rwnd_press to kick in more often. The fix is to sum it too, same as is done for rwnd_press, so that if we receive 3 chunks after closing the window, we still have to release that same amount before re-opening it. Log snippet from sctp_test exhibiting the issue: [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1! [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1! [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) [ 146.209232] sctp: sctp_assoc_rwnd_decrease: association:ffff88013928e000 has asoc->rwnd:0, asoc->rwnd_over:1! [ 146.209232] sctp: sctp_assoc_rwnd_decrease: asoc:ffff88013928e000 rwnd decreased by 1 to (0, 1, 114221) Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull final vfs updates from Al Viro: "Assorted cleanups and fixes all over the place" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: sg_write()/bsg_write() is not fit to be called under KERNEL_DS ufs: fix function declaration for ufs_truncate_blocks fs: exec: apply CLOEXEC before changing dumpable task flags seq_file: reset iterator to first record for zero offset vfs: fix isize/pos/len checks for reflink & dedupe [iov_iter] fix iterate_all_kinds() on empty iterators move aio compat to fs/aio.c reorganize do_make_slave() clone_private_mount() doesn't need to touch namespace_sem remove a bogus claim about namespace_sem being held by callers of mnt_alloc_id()
2016-12-23Merge branch 'virtio-net-xdp-fixes'David S. Miller
Jason Wang says: ==================== several fixups for virtio-net XDP Merry Xmas and a Happy New year to all: This series tries to fixes several issues for virtio-net XDP which could be categorized into several parts: - fix several issues during XDP linearizing - allow csumed packet to work for XDP_PASS - make EWMA rxbuf size estimation works for XDP - forbid XDP when GUEST_UFO is support - remove big packet XDP support - add XDP support or small buffer Please see individual patches for details. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: XDP support for small buffersJason Wang
Commit f600b6905015 ("virtio_net: Add XDP support") leaves the case of small receive buffer untouched. This will confuse the user who want to set XDP but use small buffers. Other than forbid XDP in small buffer mode, let's make it work. XDP then can only work at skb->data since virtio-net create skbs during refill, this is sub optimal which could be optimized in the future. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: remove big packet XDP codesJason Wang
Now we in fact don't allow XDP for big packets, remove its codes. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: forbid XDP when VIRTIO_NET_F_GUEST_UFO is supportJason Wang
When VIRTIO_NET_F_GUEST_UFO is negotiated, host could still send UFO packet that exceeds a single page which could not be handled correctly by XDP. So this patch forbids setting XDP when GUEST_UFO is supported. While at it, forbid XDP for ECN (which comes only from GRO) too to prevent user from misconfiguration. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: make rx buf size estimation works for XDPJason Wang
We don't update ewma rx buf size in the case of XDP. This will lead underestimation of rx buf size which causes host to produce more than one buffers. This will greatly increase the possibility of XDP page linearization. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: unbreak csumed packets for XDP_PASSJason Wang
We drop csumed packet when do XDP for packets. This breaks XDP_PASS when GUEST_CSUM is supported. Fix this by allowing csum flag to be set. With this patch, simple TCP works for XDP_PASS. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: correctly handle XDP_PASS for linearized packetsJason Wang
When XDP_PASS were determined for linearized packets, we try to get new buffers in the virtqueue and build skbs from them. This is wrong, we should create skbs based on existed buffers instead. Fixing them by creating skb based on xdp_page. With this patch "ping 192.168.100.4 -s 3900 -M do" works for XDP_PASS. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: fix page miscount during XDP linearizingJason Wang
We don't put page during linearizing, the would cause leaking when xmit through XDP_TX or the packet exceeds PAGE_SIZE. Fix them by put page accordingly. Also decrease the number of buffers during linearizing to make sure caller can free buffers correctly when packet exceeds PAGE_SIZE. With this patch, we won't get OOM after linearize huge number of packets. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: correctly xmit linearized page on XDP_TXJason Wang
After we linearize page, we should xmit this page instead of the page of first buffer which may lead unexpected result. With this patch, we can see correct packet during XDP_TX. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23virtio-net: remove the warning before XDP linearizingJason Wang
Since we use EWMA to estimate the size of rx buffer. When rx buffer size is underestimated, it's usual to have a packet with more than one buffers. Consider this is not a bug, remove the warning and correct the comment before XDP linearizing. Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23Merge tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befsLinus Torvalds
Pull befs updates from Luis de Bethencourt: "A series of small fixes and adding NFS export support" * tag 'befs-v4.10-rc1' of git://github.com/luisbg/linux-befs: befs: add NFS export support befs: remove trailing whitespaces befs: remove signatures from comments befs: fix style issues in header files befs: fix style issues in linuxvfs.c befs: fix typos in linuxvfs.c befs: fix style issues in io.c befs: fix style issues in inode.c befs: fix style issues in debug.c
2016-12-23Merge tag 'drm-fixes-for-4.10-rc1' of ↵Linus Torvalds
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Some fixes came in while I was out, mostly intel and amdgpu ones, with one ast fix" Daniel Vetter says: "This should also shut up the WARN_ON(!intel_dp->lane_count) noise" * tag 'drm-fixes-for-4.10-rc1' of git://people.freedesktop.org/~airlied/linux: (35 commits) drm/amdgpu: update tile table for oland/hainan drm/amdgpu: update tile table for verde drm/amdgpu: update rev id for verde drm/amdgpu: update golden setting for verde drm/amdgpu: update rev id for oland drm/amdgpu: update golden setting for oland drm/amdgpu: update rev id for hainan drm/amdgpu: update golden setting for hainan drm/amdgpu: update rev id for pitcairn drm/amdgpu: update golden setting for pitcairn drm/amdgpu: update golden setting/tiling table of tahiti drm/i915: skip the first 4k of stolen memory on everything >= gen8 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping drm/i915: Fix use after free in logical_render_ring_init drm/i915: disable PSR by default on HSW/BDW drm/i915: Fix setting of boost freq tunable drm/i915: tune down the fast link training vs boot fail drm/i915: Reorder phys backing storage release drm/i915/gen9: Fix PCODE polling during SAGV disabling drm/i915/gen9: Fix PCODE polling during CDCLK change notification ...
2016-12-23Merge tag 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma fixes from Doug Ledford: "First round of -rc fixes for 4.10 kernel: - a series of qedr fixes - a series of rxe fixes - one i40iw fix - one cma fix - one cxgb4 fix" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: IB/rxe: Don't check for null ptr in send() IB/rxe: Drop future atomic/read packets rather than retrying IB/rxe: Use BTH_PSN_MASK when ACKing duplicate sends qedr: Always notify the verb consumer of flushed CQEs qedr: clear the vendor error field in the work completion qedr: post_send/recv according to QP state qedr: ignore inline flag in read verbs qedr: modify QP state to error when destroying it qedr: return correct value on modify qp qedr: return error if destroy CQ failed qedr: configure the number of CQEs on CQ creation i40iw: Set 128B as the only supported RQ WQE size IB/cma: Fix a race condition in iboe_addr_get_sgid() IB/rxe: Fix a memory leak in rxe_qp_cleanup() iw_cxgb4: set correct FetchBurstMax for QPs
2016-12-23Merge tag 'scsi-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull late SCSI updates from James Bottomley: "This is mostly stuff which missed the initial pull. There's a new driver: qedi, and some ufs, ibmvscsis and ncr5380 updates plus some assorted driver fixes and also a fix for the bug where if a device goes into a blocked state between configuration and sysfs device add (which can be a long time under async probing) it would become permanently blocked" * tag 'scsi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (30 commits) scsi: avoid a permanent stop of the scsi device's request queue scsi: mpt3sas: Recognize and act on iopriority info scsi: qla2xxx: Fix Target mode handling with Multiqueue changes. scsi: qla2xxx: Add Block Multi Queue functionality. scsi: qla2xxx: Add multiple queue pair functionality. scsi: qla2xxx: Utilize pci_alloc_irq_vectors/pci_free_irq_vectors calls. scsi: qla2xxx: Only allow operational MBX to proceed during RESET. scsi: hpsa: remove memory allocate failure message scsi: Update 3ware driver email addresses scsi: zfcp: fix rport unblock race with LUN recovery scsi: zfcp: do not trace pure benign residual HBA responses at default level scsi: zfcp: fix use-after-"free" in FC ingress path after TMF scsi: libcxgbi: return error if interface is not up scsi: cxgb4i: libcxgbi: add missing module_put() scsi: cxgb4i: libcxgbi: cxgb4: add T6 iSCSI completion feature scsi: cxgb4i: libcxgbi: add active open cmd for T6 adapters scsi: cxgb4i: use cxgb4_tp_smt_idx() to get smt_idx scsi: qedi: Add QLogic FastLinQ offload iSCSI driver framework. scsi: aacraid: remove wildcard for series 9 controllers scsi: ibmvscsi: add write memory barrier to CRQ processing ...
2016-12-23Merge tag 'arc-4.10-rc1-part2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull more ARC updates from Vineet Gupta: - Fix for aliasing VIPT dcache in old ARC700 cores - micro-optimization in ARC700 ProtV handler - Enable SG_CHAIN [Vladimir] - ARC HS38 core intc default to prio 1 * tag 'arc-4.10-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: mm: arc700: Don't assume 2 colours for aliasing VIPT dcache ARC: mm: No need to save cache version in @cpuinfo ARC: enable SG chaining ARCv2: intc: default all interrupts to priority 1 ARCv2: entry: document intr disable in hard isr ARC: ARCompact entry: elide re-reading ECR in ProtV handler
2016-12-23Merge branch 'mlxsw-router-fixes'David S. Miller
Jiri Pirko says: ==================== mlxsw: Router fixes Ido says: First two patches ensure we remove from the device's table neighbours that are considered to be dead by the neighbour core. The last patch removes nexthop groups from the device when they are no longer valid. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23mlxsw: spectrum_router: Correctly remove nexthop groupsIdo Schimmel
At the end of the nexthop initialization process we determine whether the nexthop should be offloaded or not based on the NUD state of the neighbour representing it. After all the nexthops were initialized we refresh the nexthop group and potentially offload it to the device, in case some of the nexthops were resolved. Make the destruction of a nexthop group symmetric with its creation by marking all nexthops as invalid and then refresh the nexthop group to make sure it was removed from the device's tables. Fixes: b2157149b0b0 ("mlxsw: spectrum_router: Add the nexthop neigh activity update") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23mlxsw: spectrum_router: Don't reflect dead neighsIdo Schimmel
When a neighbour is considered to be dead, we should remove it from the device's table regardless of its NUD state. Without this patch, after setting a port to be administratively down we get the following errors when we periodically try to update the kernel about neighbours activity: [ 461.947268] mlxsw_spectrum 0000:03:00.0 sw1p3: Failed to find matching neighbour for IP=192.168.100.2 Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23neigh: Send netevent after marking neigh as deadIdo Schimmel
neigh_cleanup_and_release() is always called after marking a neighbour as dead, but it only notifies user space and not in-kernel listeners of the netevent notification chain. This can cause multiple problems. In my specific use case, it causes the listener (a switch driver capable of L3 offloads) to believe a neighbour entry is still valid, and is thus erroneously kept in the device's table. Fix that by sending a netevent after marking the neighbour as dead. Fixes: a6bf9e933daf ("mlxsw: spectrum_router: Offload neighbours based on NUD state change") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23ipv6: handle -EFAULT from skb_copy_bitsDave Jones
By setting certain socket options on ipv6 raw sockets, we can confuse the length calculation in rawv6_push_pending_frames triggering a BUG_ON. RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40 RSP: 0018:ffff881f6c4a7c18 EFLAGS: 00010282 RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002 RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00 RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009 R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030 R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80 Call Trace: [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170 [<ffffffff816da27e>] SyS_sendto+0xe/0x10 [<ffffffff81002910>] do_syscall_64+0x50/0xa0 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25 Handle by jumping to the failure path if skb_copy_bits gets an EFAULT. Reproducer: #include <stdio.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #include <sys/types.h> #include <sys/socket.h> #include <netinet/in.h> #define LEN 504 int main(int argc, char* argv[]) { int fd; int zero = 0; char buf[LEN]; memset(buf, 0, LEN); fd = socket(AF_INET6, SOCK_RAW, 7); setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4); setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN); sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110); } Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23inet: fix IP(V6)_RECVORIGDSTADDR for udp socketsWillem de Bruijn
Socket cmsg IP(V6)_RECVORIGDSTADDR checks that port range lies within the packet. For sockets that have transport headers pulled, transport offset can be negative. Use signed comparison to avoid overflow. Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Reported-by: Nisar Jagabar <njagabar@cloudmark.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23Merge branch 'cls_flower-act_tunnel_key-fixes'David S. Miller
Or Gerlitz says: ==================== net/sched fixes for cls_flower and act_tunnel_key This small series contain a fix to the matching flags support in flower and to the tunnel key action MD prep for IPv6. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23net/sched: cls_flower: Mandate mask when matching on flagsOr Gerlitz
When matching on flags, we should require the user to provide the mask and avoid using an all-ones mask. Not doing so causes matching on flags provided w.o mask to hit on the value being unset for all flags, which may not what the user wanted to happen. Fixes: faa3ffce7829 ('net/sched: cls_flower: Add support for matching on flags') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Paul Blakey <paulb@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23net/sched: act_tunnel_key: Fix setting UDP dst port in metadata under IPv6Or Gerlitz
The UDP dst port was provided to the helper function which sets the IPv6 IP tunnel meta-data under a wrong param order, fix that. Fixes: 75bfbca01e48 ('net/sched: act_tunnel_key: Add UDP dst port option') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-23stmmac: CSR clock configuration fixjpinto
When testing stmmac with my QoS reference design I checked a problem in the CSR clock configuration that was impossibilitating the phy discovery, since every read operation returned 0x0000ffff. This patch fixes the issue. Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2016-12-22Merge branch 'work.namespace' into for-linusAl Viro
2016-12-22sg_write()/bsg_write() is not fit to be called under KERNEL_DSAl Viro
Both damn things interpret userland pointers embedded into the payload; worse, they are actually traversing those. Leaving aside the bad API design, this is very much _not_ safe to call with KERNEL_DS. Bail out early if that happens. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22ufs: fix function declaration for ufs_truncate_blocksJeff Layton
sparse says: fs/ufs/inode.c:1195:6: warning: symbol 'ufs_truncate_blocks' was not declared. Should it be static? Note that the forward declaration in the file is already marked static. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22fs: exec: apply CLOEXEC before changing dumpable task flagsAleksa Sarai
If you have a process that has set itself to be non-dumpable, and it then undergoes exec(2), any CLOEXEC file descriptors it has open are "exposed" during a race window between the dumpable flags of the process being reset for exec(2) and CLOEXEC being applied to the file descriptors. This can be exploited by a process by attempting to access /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE. The race in question is after set_dumpable has been (for get_link, though the trace is basically the same for readlink): [vfs] -> proc_pid_link_inode_operations.get_link -> proc_pid_get_link -> proc_fd_access_allowed -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS); Which will return 0, during the race window and CLOEXEC file descriptors will still be open during this window because do_close_on_exec has not been called yet. As a result, the ordering of these calls should be reversed to avoid this race window. This is of particular concern to container runtimes, where joining a PID namespace with file descriptors referring to the host filesystem can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect against access of CLOEXEC file descriptors -- file descriptors which may reference filesystem objects the container shouldn't have access to). Cc: dev@opencontainers.org Cc: <stable@vger.kernel.org> # v3.2+ Reported-by: Michael Crosby <crosbymichael@gmail.com> Signed-off-by: Aleksa Sarai <asarai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22seq_file: reset iterator to first record for zero offsetTomasz Majchrzak
If kernfs file is empty on a first read, successive read operations using the same file descriptor will return no data, even when data is available. Default kernfs 'seq_next' implementation advances iterator position even when next object is not there. Kernfs 'seq_start' for following requests will not return iterator as position is already on the second object. This defect doesn't allow to monitor badblocks sysfs files from MD raid. They are initially empty but if data appears at some stage, userspace is not able to read it. Signed-off-by: Tomasz Majchrzak <tomasz.majchrzak@intel.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22vfs: fix isize/pos/len checks for reflink & dedupeDarrick J. Wong
Strengthen the checking of pos/len vs. i_size, clarify the return values for the clone prep function, and remove pointless code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22[iov_iter] fix iterate_all_kinds() on empty iteratorsAl Viro
Problem similar to ones dealt with in "fold checks into iterate_and_advance()" and followups, except that in this case we really want to do nothing when asked for zero-length operation - unlike zero-length iterate_and_advance(), zero-length iterate_all_kinds() has no side effects, and callers are simpler that way. That got exposed when copy_from_iter_full() had been used by tipc, which builds an msghdr with zero payload and (now) feeds it to a primitive based on iterate_all_kinds() instead of iterate_and_advance(). Reported-by: Jon Maloy <jon.maloy@ericsson.com> Tested-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22move aio compat to fs/aio.cAl Viro
... and fix the minor buglet in compat io_submit() - native one kills ioctx as cleanup when put_user() fails. Get rid of bogus compat_... in !CONFIG_AIO case, while we are at it - they should simply fail with ENOSYS, same as for native counterparts. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2016-12-22Merge branch 'misc' into for-linusJames Bottomley
2016-12-23Merge tag 'drm-intel-next-fixes-2016-12-22' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-intel into drm-fixes First set of i915 fixes for code in next. * tag 'drm-intel-next-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-intel: drm/i915: skip the first 4k of stolen memory on everything >= gen8 drm/i915: Fallback to single PAGE_SIZE segments for DMA remapping drm/i915: Fix use after free in logical_render_ring_init drm/i915: disable PSR by default on HSW/BDW drm/i915: Fix setting of boost freq tunable drm/i915: tune down the fast link training vs boot fail drm/i915: Reorder phys backing storage release drm/i915/gen9: Fix PCODE polling during SAGV disabling drm/i915/gen9: Fix PCODE polling during CDCLK change notification drm/i915/dsi: Fix chv_exec_gpio disabling the GPIOs it is setting drm/i915/dsi: Fix swapping of MIPI_SEQ_DEASSERT_RESET / MIPI_SEQ_ASSERT_RESET drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating drm/i915: drop the struct_mutex when wedged or trying to reset
2016-12-23Merge tag 'drm-misc-fixes-2016-12-22' of ↵Dave Airlie
git://anongit.freedesktop.org/git/drm-misc into drm-fixes Here's the one lonely bugfix I talked about on irc. * tag 'drm-misc-fixes-2016-12-22' of git://anongit.freedesktop.org/git/drm-misc: drivers/gpu/drm/ast: Fix infinite loop if read fails
2016-12-23Merge branch 'drm-next-4.10' of git://people.freedesktop.org/~agd5f/linux ↵Dave Airlie
into drm-fixes - fix display regression on DCE6/8 - Powergating fixes for GFX8 - amdgpu SI fixes (golden settings, proper rev id setup, etc.) * 'drm-next-4.10' of git://people.freedesktop.org/~agd5f/linux: (21 commits) drm/amdgpu: update tile table for oland/hainan drm/amdgpu: update tile table for verde drm/amdgpu: update rev id for verde drm/amdgpu: update golden setting for verde drm/amdgpu: update rev id for oland drm/amdgpu: update golden setting for oland drm/amdgpu: update rev id for hainan drm/amdgpu: update golden setting for hainan drm/amdgpu: update rev id for pitcairn drm/amdgpu: update golden setting for pitcairn drm/amdgpu: update golden setting/tiling table of tahiti drm/amdgpu: fix cursor setting of dce6/dce8 drm/amdgpu: refine set clock gating for tonga/polaris drm/amdgpu: initialize cg flags for tonga/polaris10/polaris11. drm/amdgpu: add new gfx cg flags. drm/amdgpu: fix pg can't be disabled by PG mask. drm/amdgpu: always initialize gfx pg for gfx_v8.0. drm/amdgpu: enable AMD_PG_SUPPORT_CP in Carrizo/Stoney. drm/amdgpu: fix init save/restore list in gfx_v8.0 drm/amdgpu: fix enable_cp_power_gating in gfx_v8.0. ...
2016-12-22Merge tag 'leds_for_4.10_email_update' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds Pull LED maintainer email update from Jacek Anaszewski: "Update Jacek Anaszewski's email address" * tag 'leds_for_4.10_email_update' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds: MAINTAINERS: Update Jacek Anaszewski's email address
2016-12-22Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer fixes from Jens Axboe: "Just a set of small fixes that have either been queued up after the original pull for this merge window, or just missed the original pull request. - a few bcache fixes/changes from Eric and Kent - add WRITE_SAME to the command filter whitelist frm Mauricio - kill an unused struct member from Ritesh - partition IO alignment fix from Stefan - nvme sysfs printf fix from Stephen" * 'for-linus' of git://git.kernel.dk/linux-block: block: check partition alignment nvme : Use correct scnprintf in cmb show block: allow WRITE_SAME commands with the SG_IO ioctl block: Remove unused member (busy) from struct blk_queue_tag bcache: partition support: add 16 minors per bcacheN device bcache: Make gc wakeup sane, remove set_task_state()
2016-12-22Merge tag 'acpi-extra-4.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more ACPI updates from Rafael Wysocki: "Here are new versions of two ACPICA changes that were deferred previously due to a problem they had introduced, two cleanups on top of them and the removal of a useless warning message from the ACPI core. Specifics: - Move some Linux-specific functionality to upstream ACPICA and update the in-kernel users of it accordingly (Lv Zheng) - Drop a useless warning (triggered by the lack of an optional object) from the ACPI namespace scanning code (Zhang Rui)" * tag 'acpi-extra-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI / osl: Remove deprecated acpi_get_table_with_size()/early_acpi_os_unmap_memory() ACPI / osl: Remove acpi_get_table_with_size()/early_acpi_os_unmap_memory() users ACPICA: Tables: Allow FADT to be customized with virtual address ACPICA: Tables: Back port acpi_get_table_with_size() and early_acpi_os_unmap_memory() from Linux kernel ACPI: do not warn if _BQC does not exist
2016-12-22Merge tag 'pm-fixes-4.10-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "They fix one bug introduced recently, a build warning and a kerneldoc function description. Specifics: - Prevent the acpi-cpufreq driver from crashing on exit by fixing a check against the __cpuhp_setup_state() return value and fix the kerneldoc description of that function to make it clear that it may return positive numbers on success too (Boris Ostrovsky) - Drop an incorrect __init annotation of a function in the s3c64xx cpufreq driver and fix a build warning generated (by older compilers) because of it (Arnd Bergmann)" * tag 'pm-fixes-4.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: s3c64xx: remove incorrect __init annotation cpufreq: Remove CPU hotplug callbacks only if they were initialized CPU/hotplug: Clarify description of __cpuhp_setup_state() return value
2016-12-22Merge tag 'mmc-v4.10-3' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - further fix thread wake-up for requests - use a bounce buffer to fix DMA issue for SSR register read MMC host: - sdhci: Fix a regression for runtime PM - sdhci-cadence: Add a proper SoC specific DT compatible" * tag 'mmc-v4.10-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: sd: Meet alignment requirements for raw_ssr DMA mmc: core: Further fix thread wake-up mmc: sdhci: Fix to handle MMC_POWER_UNDEFINED mmc: sdhci-cadence: add Socionext UniPhier specific compatible string
2016-12-22Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull SElinux fix from James Morris: "From Paul: 'A small SELinux patch to fix some clang/llvm compiler warnings and ensure the tools under scripts work well in the face of kernel changes'" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: selinux: use the kernel headers when building scripts/selinux
2016-12-22Merge branch 'x86-cache-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache allocation interface from Thomas Gleixner: "This provides support for Intel's Cache Allocation Technology, a cache partitioning mechanism. The interface is odd, but the hardware interface of that CAT stuff is odd as well. We tried hard to come up with an abstraction, but that only allows rather simple partitioning, but no way of sharing and dealing with the per package nature of this mechanism. In the end we decided to expose the allocation bitmaps directly so all combinations of the hardware can be utilized. There are two ways of associating a cache partition: - Task A task can be added to a resource group. It uses the cache partition associated to the group. - CPU All tasks which are not member of a resource group use the group to which the CPU they are running on is associated with. That allows for simple CPU based partitioning schemes. The main expected user sare: - Virtualization so a VM can only trash only the associated part of the cash w/o disturbing others - Real-Time systems to seperate RT and general workloads. - Latency sensitive enterprise workloads - In theory this also can be used to protect against cache side channel attacks" [ Intel RDT is "Resource Director Technology". The interface really is rather odd and very specific, which delayed this pull request while I was thinking about it. The pull request itself came in early during the merge window, I just delayed it until things had calmed down and I had more time. But people tell me they'll use this, and the good news is that it is _so_ specific that it's rather independent of anything else, and no user is going to depend on the interface since it's pretty rare. So if push comes to shove, we can just remove the interface and nothing will break ] * 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) x86/intel_rdt: Implement show_options() for resctrlfs x86/intel_rdt: Call intel_rdt_sched_in() with preemption disabled x86/intel_rdt: Update task closid immediately on CPU in rmdir and unmount x86/intel_rdt: Fix setting of closid when adding CPUs to a group x86/intel_rdt: Update percpu closid immeditately on CPUs affected by changee x86/intel_rdt: Reset per cpu closids on unmount x86/intel_rdt: Select KERNFS when enabling INTEL_RDT_A x86/intel_rdt: Prevent deadlock against hotplug lock x86/intel_rdt: Protect info directory from removal x86/intel_rdt: Add info files to Documentation x86/intel_rdt: Export the minimum number of set mask bits in sysfs x86/intel_rdt: Propagate error in rdt_mount() properly x86/intel_rdt: Add a missing #include MAINTAINERS: Add maintainer for Intel RDT resource allocation x86/intel_rdt: Add scheduler hook x86/intel_rdt: Add schemata file x86/intel_rdt: Add tasks files x86/intel_rdt: Add cpus file x86/intel_rdt: Add mkdir to resctrl file system x86/intel_rdt: Add "info" files to resctrl file system ...