summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2019-05-15tcp: do not recycle cloned skbsEric Dumazet
It is illegal to change arbitrary fields in skb_shared_info if the skb is cloned. Before calling skb_zcopy_clear() we need to ensure this rule, therefore we need to move the test from sk_stream_alloc_skb() to sk_wmem_free_skb() Fixes: 4f661542a402 ("tcp: fix zerocopy and notsent_lowat issues") Signed-off-by: Eric Dumazet <edumazet@google.com> Diagnosed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14net: replace CONFIG_DEBUG_KERNEL with CONFIG_DEBUG_MISCSinan Kaya
CONFIG_DEBUG_KERNEL should not impact code generation. Use the newly defined CONFIG_DEBUG_MISC instead to keep the current code. Link: http://lkml.kernel.org/r/20190413224438.10802-6-okaya@kernel.org Signed-off-by: Sinan Kaya <okaya@kernel.org> Acked-by: Florian Westphal <fw@strlen.de> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: Florian Westphal <fw@strlen.de> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Anders Roxell <anders.roxell@linaro.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Christophe Leroy <christophe.leroy@c-s.fr> Cc: Chris Zankel <chris@zankel.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: James Hogan <jhogan@kernel.org> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Paul Burton <paul.burton@mips.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Thomas Bogendoerfer <tbogendoerfer@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14rtnetlink: always put IFLA_LINK for links with a link-netnsidSabrina Dubroca
Currently, nla_put_iflink() doesn't put the IFLA_LINK attribute when iflink == ifindex. In some cases, a device can be created in a different netns with the same ifindex as its parent. That device will not dump its IFLA_LINK attribute, which can confuse some userspace software that expects it. For example, if the last ifindex created in init_net and foo are both 8, these commands will trigger the issue: ip link add parent type dummy # ifindex 9 ip link add link parent netns foo type macvlan # ifindex 9 in ns foo So, in case a device puts the IFLA_LINK_NETNSID attribute in a dump, always put the IFLA_LINK attribute as well. Thanks to Dan Winship for analyzing the original OpenShift bug down to the missing netlink attribute. v2: change Fixes tag, it's been here forever, as Nicolas Dichtel said add Nicolas' ack v3: change Fixes tag fix subject typo, spotted by Edward Cree Analyzed-by: Dan Winship <danw@redhat.com> Fixes: d8a5ec672768 ("[NET]: netlink support for moving devices between network namespaces.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14tcp: fix retrans timestamp on passive Fast OpenYuchung Cheng
Commit c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open") sets the start of SYNACK retransmission time on passive Fast Open in "retrans_stamp". However the timestamp is not reset upon the handshake has completed. As a result, future data packet retransmission may not update it in tcp_retransmit_skb(). This may lead to socket aborting earlier unexpectedly by retransmits_timed_out() since retrans_stamp remains the SYNACK rtx time. This bug only manifests on passive TFO sender that a) suffered SYNACK timeout and then b) stalls on very first loss recovery. Any successful loss recovery would reset the timestamp to avoid this issue. Fixes: c7d13c8faa74 ("tcp: properly track retry time on passive Fast Open") Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: - a few misc things and hotfixes - ocfs2 - almost all of MM * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (139 commits) kernel/memremap.c: remove the unused device_private_entry_fault() export mm: delete find_get_entries_tag mm/huge_memory.c: make __thp_get_unmapped_area static mm/mprotect.c: fix compilation warning because of unused 'mm' variable mm/page-writeback: introduce tracepoint for wait_on_page_writeback() mm/vmscan: simplify trace_reclaim_flags and trace_shrink_flags mm/Kconfig: update "Memory Model" help text mm/vmscan.c: don't disable irq again when count pgrefill for memcg mm: memblock: make keeping memblock memory opt-in rather than opt-out hugetlbfs: always use address space in inode for resv_map pointer mm/z3fold.c: support page migration mm/z3fold.c: add structure for buddy handles mm/z3fold.c: improve compression by extending search mm/z3fold.c: introduce helper functions mm/page_alloc.c: remove unnecessary parameter in rmqueue_pcplist mm/hmm: add ARCH_HAS_HMM_MIRROR ARCH_HAS_HMM_DEVICE Kconfig mm/vmscan.c: simplify shrink_inactive_list() fs/sync.c: sync_file_range(2) may use WB_SYNC_ALL writeback xen/privcmd-buf.c: convert to use vm_map_pages_zero() xen/gntdev.c: convert to use vm_map_pages() ...
2019-05-14mm/gup: change GUP fast to use flags rather than a write 'bool'Ira Weiny
To facilitate additional options to get_user_pages_fast() change the singular write parameter to be gup_flags. This patch does not change any functionality. New functionality will follow in subsequent patches. Some of the get_user_pages_fast() call sites were unchanged because they already passed FOLL_WRITE or 0 for the write parameter. NOTE: It was suggested to change the ordering of the get_user_pages_fast() arguments to ensure that callers were converted. This breaks the current GUP call site convention of having the returned pages be the final parameter. So the suggestion was rejected. Link: http://lkml.kernel.org/r/20190328084422.29911-4-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-4-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marshall <hubcap@omnibond.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Hogan <jhogan@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14mm/gup: replace get_user_pages_longterm() with FOLL_LONGTERMIra Weiny
Pach series "Add FOLL_LONGTERM to GUP fast and use it". HFI1, qib, and mthca, use get_user_pages_fast() due to its performance advantages. These pages can be held for a significant time. But get_user_pages_fast() does not protect against mapping FS DAX pages. Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast() which retains the performance while also adding the FS DAX checks. XDP has also shown interest in using this functionality.[1] In addition we change get_user_pages() to use the new FOLL_LONGTERM flag and remove the specialized get_user_pages_longterm call. [1] https://lkml.org/lkml/2019/3/19/939 "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Secondly, it depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use *_fast. There are patches submitted to the RDMA list which would allow the use of *_fast (they reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use *_fast. As an aside, Jasons pointed out in my previous submission that *_fast and *_unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. This patch (of 7): This patch starts a series which aims to support FOLL_LONGTERM in get_user_pages_fast(). Some callers who would like to do a longterm (user controlled pin) of pages with the fast variant of GUP for performance purposes. Rather than have a separate get_user_pages_longterm() call, introduce FOLL_LONGTERM and change the longterm callers to use it. This patch does not change any functionality. In the short term "longterm" or user controlled pins are unsafe for Filesystems and FS DAX in particular has been blocked. However, callers of get_user_pages_fast() were not "protected". FOLL_LONGTERM can _only_ be supported with get_user_pages[_fast]() as it requires vmas to determine if DAX is in use. NOTE: In merging with the CMA changes we opt to change the get_user_pages() call in check_and_migrate_cma_pages() to a call of __get_user_pages_locked() on the newly migrated pages. This makes the code read better in that we are calling __get_user_pages_locked() on the pages before and after a potential migration. As a side affect some of the interfaces are cleaned up but this is not the primary purpose of the series. In review[1] it was asked: <quote> > This I don't get - if you do lock down long term mappings performance > of the actual get_user_pages call shouldn't matter to start with. > > What do I miss? A couple of points. First "longterm" is a relative thing and at this point is probably a misnomer. This is really flagging a pin which is going to be given to hardware and can't move. I've thought of a couple of alternative names but I think we have to settle on if we are going to use FL_LAYOUT or something else to solve the "longterm" problem. Then I think we can change the flag to a better name. Second, It depends on how often you are registering memory. I have spoken with some RDMA users who consider MR in the performance path... For the overall application performance. I don't have the numbers as the tests for HFI1 were done a long time ago. But there was a significant advantage. Some of which is probably due to the fact that you don't have to hold mmap_sem. Finally, architecturally I think it would be good for everyone to use *_fast. There are patches submitted to the RDMA list which would allow the use of *_fast (they reworking the use of mmap_sem) and as soon as they are accepted I'll submit a patch to convert the RDMA core as well. Also to this point others are looking to use *_fast. As an asside, Jasons pointed out in my previous submission that *_fast and *_unlocked look very much the same. I agree and I think further cleanup will be coming. But I'm focused on getting the final solution for DAX at the moment. </quote> [1] https://lore.kernel.org/lkml/20190220180255.GA12020@iweiny-DESK2.sc.intel.com/T/#md6abad2569f3bf6c1f03686c8097ab6563e94965 [ira.weiny@intel.com: v3] Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190328084422.29911-2-ira.weiny@intel.com Link: http://lkml.kernel.org/r/20190317183438.2057-2-ira.weiny@intel.com Signed-off-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Rich Felker <dalias@libc.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Mike Marshall <hubcap@omnibond.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-05-14bpf: sockmap fix msg->sg.size account on ingress skbJohn Fastabend
When converting a skb to msg->sg we forget to set the size after the latest ktls/tls code conversion. This patch can be reached by doing a redir into ingress path from BPF skb sock recv hook. Then trying to read the size fails. Fix this by setting the size. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-14bpf: sockmap remove duplicate queue freeJohn Fastabend
In tcp bpf remove we free the cork list and purge the ingress msg list. However we do this before the ref count reaches zero so it could be possible some other access is in progress. In this case (tcp close and/or tcp_unhash) we happen to also hold the sock lock so no path exists but lets fix it otherwise it is extremely fragile and breaks the reference counting rules. Also we already check the cork list and ingress msg queue and free them once the ref count reaches zero so its wasteful to check twice. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-14bpf: sockmap, only stop/flush strp if it was enabled at some pointJohn Fastabend
If we try to call strp_done on a parser that has never been initialized, because the sockmap user is only using TX side for example we get the following error. [ 883.422081] WARNING: CPU: 1 PID: 208 at kernel/workqueue.c:3030 __flush_work+0x1ca/0x1e0 ... [ 883.422095] Workqueue: events sk_psock_destroy_deferred [ 883.422097] RIP: 0010:__flush_work+0x1ca/0x1e0 This had been wrapped in a 'if (psock->parser.enabled)' logic which was broken because the strp_done() was never actually being called because we do a strp_stop() earlier in the tear down logic will set parser.enabled to false. This could result in a use after free if work was still in the queue and was resolved by the patch here, 1d79895aef18f ("sk_msg: Always cancel strp work before freeing the psock"). However, calling strp_stop(), done by the patch marked in the fixes tag, only is useful if we never initialized a strp parser program and never initialized the strp to start with. Because if we had initialized a stream parser strp_stop() would have been called by sk_psock_drop() earlier in the tear down process. By forcing the strp to stop we get past the WARNING in strp_done that checks the stopped flag but calling cancel_work_sync on work that has never been initialized is also wrong and generates the warning above. To fix check if the parser program exists. If the program exists then the strp work has been initialized and must be sync'd and cancelled before free'ing any structures. If no program exists we never initialized the stream parser in the first place so skip the sync/cancel logic implemented by strp_done. Finally, remove the strp_done its not needed and in the case where we are using the stream parser has already been called. Fixes: e8e3437762ad9 ("bpf: Stop the psock parser before canceling its work") Signed-off-by: John Fastabend <john.fastabend@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2019-05-13flow_dissector: disable preemption around BPF callsEric Dumazet
Various things in eBPF really require us to disable preemption before running an eBPF program. syzbot reported : BUG: assuming atomic context at net/core/flow_dissector.c:737 in_atomic(): 0, irqs_disabled(): 0, pid: 24710, name: syz-executor.3 2 locks held by syz-executor.3/24710: #0: 00000000e81a4bf1 (&tfile->napi_mutex){+.+.}, at: tun_get_user+0x168e/0x3ff0 drivers/net/tun.c:1850 #1: 00000000254afebd (rcu_read_lock){....}, at: __skb_flow_dissect+0x1e1/0x4bb0 net/core/flow_dissector.c:822 CPU: 1 PID: 24710 Comm: syz-executor.3 Not tainted 5.1.0+ #6 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x172/0x1f0 lib/dump_stack.c:113 __cant_sleep kernel/sched/core.c:6165 [inline] __cant_sleep.cold+0xa3/0xbb kernel/sched/core.c:6142 bpf_flow_dissect+0xfe/0x390 net/core/flow_dissector.c:737 __skb_flow_dissect+0x362/0x4bb0 net/core/flow_dissector.c:853 skb_flow_dissect_flow_keys_basic include/linux/skbuff.h:1322 [inline] skb_probe_transport_header include/linux/skbuff.h:2500 [inline] skb_probe_transport_header include/linux/skbuff.h:2493 [inline] tun_get_user+0x2cfe/0x3ff0 drivers/net/tun.c:1940 tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037 call_write_iter include/linux/fs.h:1872 [inline] do_iter_readv_writev+0x5fd/0x900 fs/read_write.c:693 do_iter_write fs/read_write.c:970 [inline] do_iter_write+0x184/0x610 fs/read_write.c:951 vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015 do_writev+0x15b/0x330 fs/read_write.c:1058 __do_sys_writev fs/read_write.c:1131 [inline] __se_sys_writev fs/read_write.c:1128 [inline] __x64_sys_writev+0x75/0xb0 fs/read_write.c:1128 do_syscall_64+0x103/0x670 arch/x86/entry/common.c:298 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: d58e468b1112 ("flow_dissector: implements flow dissector BPF hook") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Petar Penkov <ppenkov@google.com> Cc: Stanislav Fomichev <sdf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Postpone chain policy update to drop after transaction is complete, from Florian Westphal. 2) Add entry to flowtable after confirmation to fix UDP flows with packets going in one single direction. 3) Reference count leak in dst object, from Taehee Yoo. 4) Check for TTL field in flowtable datapath, from Taehee Yoo. 5) Fix h323 conntrack helper due to incorrect boundary check, from Jakub Jankowski. 6) Fix incorrect rcu dereference when fetching basechain stats, from Florian Westphal. 7) Missing error check when adding new entries to flowtable, from Taehee Yoo. 8) Use version field in nfnetlink message to honor the nfgen_family field, from Kristian Evensen. 9) Remove incorrect configuration check for CONFIG_NF_CONNTRACK_IPV6, from Subash Abhinov Kasiviswanathan. 10) Prevent dying entries from being added to the flowtable, from Taehee Yoo. 11) Don't hit WARN_ON() with malformed blob in ebtables with trailing data after last rule, reported by syzbot, patch from Florian Westphal. 12) Remove NFT_CT_TIMEOUT enumeration, never used in the kernel code. 13) Fix incorrect definition for NFT_LOGLEVEL_MAX, from Florian Westphal. This batch comes with a conflict that can be fixed with this patch: diff --cc include/uapi/linux/netfilter/nf_tables.h index 7bdb234f3d8c,f0cf7b0f4f35..505393c6e959 --- a/include/uapi/linux/netfilter/nf_tables.h +++ b/include/uapi/linux/netfilter/nf_tables.h @@@ -966,6 -966,8 +966,7 @@@ enum nft_socket_keys * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address) * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address) * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address) - * @NFT_CT_TIMEOUT: connection tracking timeout policy assigned to conntrack + * @NFT_CT_ID: conntrack id */ enum nft_ct_keys { NFT_CT_STATE, @@@ -991,6 -993,8 +992,7 @@@ NFT_CT_DST_IP, NFT_CT_SRC_IP6, NFT_CT_DST_IP6, - NFT_CT_TIMEOUT, + NFT_CT_ID, __NFT_CT_MAX }; #define NFT_CT_MAX (__NFT_CT_MAX - 1) That replaces the unused NFT_CT_TIMEOUT definition by NFT_CT_ID. If you prefer, I can also solve this conflict here, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-12net: dccp : proto: remove Unneeded variable "err"Hariprasad Kelam
Fix below issue reported by coccicheck net/dccp/proto.c:266:5-8: Unneeded variable: "err". Return "0" on line 310 Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-12net: dsa: Initialize DSA_SKB_CB(skb)->deferred_xmit variableVladimir Oltean
The sk_buff control block can have any contents on xmit put there by the stack, so initialization is mandatory, since we are checking its value after the actual DSA xmit (the tagger may have changed it). The DSA_SKB_ZERO() macro could have been used for this purpose, but: - Zeroizing a 48-byte memory region in the hotpath is best avoided. - It would have triggered a warning with newer compilers since __dsa_skb_cb contains a structure within a structure, and the {0} initializer was incorrect for that purpose. So simply remove the DSA_SKB_ZERO() macro and initialize the deferred_xmit variable by hand (which should be done for all further dsa_skb_cb variables which need initialization - currently none - to avoid the performance penalty). Fixes: 97a69a0dea9a ("net: dsa: Add support for deferred xmit") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-11net: qrtr: use protocol endiannes variableNicholas Mc Guire
sparse was unable to verify endiannes correctness due to reassignment from le32_to_cpu to the same variable - fix this warning up by providing a proper __le32 type and initializing it. This is not actually fixing any bug - rather just addressing the sparse warning. Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-10dsa: tag_brcm: Fix build error without CONFIG_NET_DSA_TAG_BRCM_PREPENDYueHaibing
Fix gcc build error: net/dsa/tag_brcm.c:211:16: error: brcm_prepend_netdev_ops undeclared here (not in a function); did you mean brcm_netdev_ops? DSA_TAG_DRIVER(brcm_prepend_netdev_ops); ^ ./include/net/dsa.h:708:10: note: in definition of macro DSA_TAG_DRIVER .ops = &__ops, \ ^~~~~ ./include/net/dsa.h:701:36: warning: dsa_tag_driver_brcm_prepend_netdev_ops defined but not used [-Wunused-variable] #define DSA_TAG_DRIVER_NAME(__ops) dsa_tag_driver ## _ ## __ops ^ ./include/net/dsa.h:707:30: note: in expansion of macro DSA_TAG_DRIVER_NAME static struct dsa_tag_driver DSA_TAG_DRIVER_NAME(__ops) = { \ ^~~~~~~~~~~~~~~~~~~ net/dsa/tag_brcm.c:211:1: note: in expansion of macro DSA_TAG_DRIVER DSA_TAG_DRIVER(brcm_prepend_netdev_ops); Like the CONFIG_NET_DSA_TAG_BRCM case, brcm_prepend_netdev_ops and DSA_TAG_PROTO_BRCM_PREPEND should be wrappeed by CONFIG_NET_DSA_TAG_BRCM_PREPEND. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: b74b70c44986 ("net: dsa: Support prepended Broadcom tag") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-10bridge: Fix error path for kobject_init_and_add()Tobin C. Harding
Currently error return from kobject_init_and_add() is not followed by a call to kobject_put(). This means there is a memory leak. We currently set p to NULL so that kfree() may be called on it as a noop, the code is arguably clearer if we move the kfree() up closer to where it is called (instead of after goto jump). Remove a goto label 'err1' and jump to call to kobject_put() in error return from kobject_init_and_add() fixing the memory leak. Re-name goto label 'put_back' to 'err1' now that we don't use err1, following current nomenclature (err1, err2 ...). Move call to kfree out of the error code at bottom of function up to closer to where memory was allocated. Add comment to clarify call to kfree(). Signed-off-by: Tobin C. Harding <tobin@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds
Pull networking fixes from David Miller: "Several bug fixes, many are quick merge-window regression cures: - When NLM_F_EXCL is not set, allow same fib rule insertion. From Hangbin Liu. - Several cures in sja1105 DSA driver (while loop exit condition fix, return of negative u8, etc.) from Vladimir Oltean. - Handle tx/rx delays in realtek PHY driver properly, from Serge Semin. - Double free in cls_matchall, from Pieter Jansen van Vuuren. - Disable SIOCSHWTSTAMP in macvlan/vlan containers, from Hangbin Liu. - Endainness fixes in aqc111, from Oliver Neukum. - Handle errors in packet_init properly, from Haibing Yue. - Various W=1 warning fixes in kTLS, from Jakub Kicinski" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (34 commits) nfp: add missing kdoc net/tls: handle errors from padding_length() net/tls: remove set but not used variables docs/btf: fix the missing section marks nfp: bpf: fix static check error through tightening shift amount adjustment selftests: bpf: initialize bpf_object pointers where needed packet: Fix error path in packet_init net/tcp: use deferred jump label for TCP acked data hook net: aquantia: fix undefined devm_hwmon_device_register_with_info reference aqc111: fix double endianness swap on BE aqc111: fix writing to the phy on BE aqc111: fix endianness issue in aqc111_change_mtu vlan: disable SIOCSHWTSTAMP in container macvlan: disable SIOCSHWTSTAMP in container tipc: fix hanging clients using poll with EPOLLOUT flag tuntap: synchronize through tfiles array instead of tun->numqueues tuntap: fix dividing by zero in ebpf queue selection dwmac4_prog_mtl_tx_algorithms() missing write operation ptp_qoriq: fix NULL access if ptp dt node missing net/sched: avoid double free on matchall reoffload ...
2019-05-09net/tls: handle errors from padding_length()Jakub Kicinski
At the time padding_length() is called the record header is still part of the message. If malicious TLS 1.3 peer sends an all-zero record padding_length() will stop at the record header, and return full length of the data including the tail_size. Subsequent subtraction of prot->overhead_size from rxm->full_len will cause rxm->full_len to turn negative. skb accessors, however, will always catch resulting out-of-bounds operation, so in practice this fix comes down to returning the correct error code. It also fixes a set but not used warning. This code was added by commit 130b392c6cd6 ("net: tls: Add tls 1.3 support"). CC: Dave Watson <davejwatson@fb.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09net/tls: remove set but not used variablesJakub Kicinski
Commit 4504ab0e6eb8 ("net/tls: Inform user space about send buffer availability") made us report write_space regardless whether partial record push was successful or not. Remove the now unused return value to clean up the following W=1 warning: net/tls/tls_device.c: In function ‘tls_device_write_space’: net/tls/tls_device.c:546:6: warning: variable ‘rc’ set but not used [-Wunused-but-set-variable] int rc = 0; ^~ CC: Vakul Garg <vakul.garg@nxp.com> CC: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09Merge tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds
Pull NFS client updates from Anna Schumaker: "Highlights include: Stable bugfixes: - Fall back to MDS if no deviceid is found rather than aborting # v4.11+ - NFS4: Fix v4.0 client state corruption when mount Features: - Much improved handling of soft mounts with NFS v4.0: - Reduce risk of false positive timeouts - Faster failover of reads and writes after a timeout - Added a "softerr" mount option to return ETIMEDOUT instead of EIO to the application after a timeout - Increase number of xprtrdma backchannel requests - Add additional xprtrdma tracepoints - Improved send completion batching for xprtrdma Other bugfixes and cleanups: - Return -EINVAL when NFS v4.2 is passed an invalid dedup mode - Reduce usage of GFP_ATOMIC pages in SUNRPC - Various minor NFS over RDMA cleanups and bugfixes - Use the correct container namespace for upcalls - Don't share superblocks between user namespaces - Various other container fixes - Make nfs_match_client() killable to prevent soft lockups - Don't mark all open state for recovery when handling recallable state revoked flag" * tag 'nfs-for-5.2-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (69 commits) SUNRPC: Rebalance a kref in auth_gss.c NFS: Fix a double unlock from nfs_match,get_client nfs: pass the correct prototype to read_cache_page NFSv4: don't mark all open state for recovery when handling recallable state revoked flag SUNRPC: Fix an error code in gss_alloc_msg() SUNRPC: task should be exit if encode return EKEYEXPIRED more times NFS4: Fix v4.0 client state corruption when mount PNFS fallback to MDS if no deviceid found NFS: make nfs_match_client killable lockd: Store the lockd client credential in struct nlm_host NFS: When mounting, don't share filesystems between different user namespaces NFS: Convert NFSv2 to use the container user namespace NFSv4: Convert the NFS client idmapper to use the container user namespace NFS: Convert NFSv3 to use the container user namespace SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall SUNRPC: Use the client user namespace when encoding creds NFS: Store the credential of the mount process in the nfs_server SUNRPC: Cache cred of process creating the rpc_client xprtrdma: Remove stale comment xprtrdma: Update comments that reference ib_drain_qp ...
2019-05-09packet: Fix error path in packet_initYueHaibing
kernel BUG at lib/list_debug.c:47! invalid opcode: 0000 [#1 CPU: 0 PID: 12914 Comm: rmmod Tainted: G W 5.1.0+ #47 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:__list_del_entry_valid+0x53/0x90 Code: 48 8b 32 48 39 fe 75 35 48 8b 50 08 48 39 f2 75 40 b8 01 00 00 00 5d c3 48 89 fe 48 89 c2 48 c7 c7 18 75 fe 82 e8 cb 34 78 ff <0f> 0b 48 89 fe 48 c7 c7 50 75 fe 82 e8 ba 34 78 ff 0f 0b 48 89 f2 RSP: 0018:ffffc90001c2fe40 EFLAGS: 00010286 RAX: 000000000000004e RBX: ffffffffa0184000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffff888237a17788 RDI: 00000000ffffffff RBP: ffffc90001c2fe40 R08: 0000000000000000 R09: 0000000000000000 R10: ffffc90001c2fe10 R11: 0000000000000000 R12: 0000000000000000 R13: ffffc90001c2fe50 R14: ffffffffa0184000 R15: 0000000000000000 FS: 00007f3d83634540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000555c350ea818 CR3: 0000000231677000 CR4: 00000000000006f0 Call Trace: unregister_pernet_operations+0x34/0x120 unregister_pernet_subsys+0x1c/0x30 packet_exit+0x1c/0x369 [af_packet __x64_sys_delete_module+0x156/0x260 ? lockdep_hardirqs_on+0x133/0x1b0 ? do_syscall_64+0x12/0x1f0 do_syscall_64+0x6e/0x1f0 entry_SYSCALL_64_after_hwframe+0x49/0xbe When modprobe af_packet, register_pernet_subsys fails and does a cleanup, ops->list is set to LIST_POISON1, but the module init is considered to success, then while rmmod it, BUG() is triggered in __list_del_entry_valid which is called from unregister_pernet_subsys. This patch fix error handing path in packet_init to avoid possilbe issue if some error occur. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09SUNRPC: Rebalance a kref in auth_gss.cChuck Lever
Restore the kref_get that matches the gss_put_auth(gss_msg->auth) done by gss_release_msg(). Fixes: ac83228a7101 ("SUNRPC: Use namespace of listening daemon ...") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-09SUNRPC: Fix an error code in gss_alloc_msg()Dan Carpenter
If kstrdup_const() then this function returns zero (success) but it should return -ENOMEM. Fixes: ac83228a7101 ("SUNRPC: Use namespace of listening daemon in the client AUTH_GSS upcall") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-09SUNRPC: task should be exit if encode return EKEYEXPIRED more timesZhangXiaoxu
If the rpc.gssd always return cred success, but now the cred is expired, then the task will loop in call_refresh and call_transmit. Exit the rpc task after retry. Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2019-05-09net/tcp: use deferred jump label for TCP acked data hookJakub Kicinski
User space can flip the clean_acked_data_enabled static branch on and off with TLS offload when CONFIG_TLS_DEVICE is enabled. jump_label.h suggests we use the delayed version in this case. Deferred branches now also don't take the branch mutex on decrement, so we avoid potential locking issues. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09Merge tag 'batadv-net-for-davem-20190509' of git://git.open-mesh.org/linux-mergeDavid S. Miller
Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich (we forgot to include this patch previously ...) - fix multicast tt/tvlv worker locking, by Linus Lüssing ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09vlan: disable SIOCSHWTSTAMP in containerHangbin Liu
With NET_ADMIN enabled in container, a normal user could be mapped to root and is able to change the real device's rx filter via ioctl on vlan, which would affect the other ptp process on host. Fix it by disabling SIOCSHWTSTAMP in container. Fixes: a6111d3c93d0 ("vlan: Pass SIOC[SG]HWTSTAMP ioctls to real device") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09tipc: fix hanging clients using poll with EPOLLOUT flagParthasarathy Bhuvaragan
commit 517d7c79bdb398 ("tipc: fix hanging poll() for stream sockets") introduced a regression for clients using non-blocking sockets. After the commit, we send EPOLLOUT event to the client even in TIPC_CONNECTING state. This causes the subsequent send() to fail with ENOTCONN, as the socket is still not in TIPC_ESTABLISHED state. In this commit, we: - improve the fix for hanging poll() by replacing sk_data_ready() with sk_state_change() to wake up all clients. - revert the faulty updates introduced by commit 517d7c79bdb398 ("tipc: fix hanging poll() for stream sockets"). Fixes: 517d7c79bdb398 ("tipc: fix hanging poll() for stream sockets") Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.se> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This has been a smaller cycle than normal. One new driver was accepted, which is unusual, and at least one more driver remains in review on the list. Summary: - Driver fixes for hns, hfi1, nes, rxe, i40iw, mlx5, cxgb4, vmw_pvrdma - Many patches from MatthewW converting radix tree and IDR users to use xarray - Introduction of tracepoints to the MAD layer - Build large SGLs at the start for DMA mapping and get the driver to split them - Generally clean SGL handling code throughout the subsystem - Support for restricting RDMA devices to net namespaces for containers - Progress to remove object allocation boilerplate code from drivers - Change in how the mlx5 driver shows representor ports linked to VFs - mlx5 uapi feature to access the on chip SW ICM memory - Add a new driver for 'EFA'. This is HW that supports user space packet processing through QPs in Amazon's cloud" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (186 commits) RDMA/ipoib: Allow user space differentiate between valid dev_port IB/core, ipoib: Do not overreact to SM LID change event RDMA/device: Don't fire uevent before device is fully initialized lib/scatterlist: Remove leftover from sg_page_iter comment RDMA/efa: Add driver to Kconfig/Makefile RDMA/efa: Add the efa module RDMA/efa: Add EFA verbs implementation RDMA/efa: Add common command handlers RDMA/efa: Implement functions that submit and complete admin commands RDMA/efa: Add the ABI definitions RDMA/efa: Add the com service API definitions RDMA/efa: Add the efa_com.h file RDMA/efa: Add the efa.h header file RDMA/efa: Add EFA device definitions RDMA: Add EFA related definitions RDMA/umem: Remove hugetlb flag RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/umem: Add API to find best driver supported page size in an MR ...
2019-05-09netfilter: ebtables: CONFIG_COMPAT: reject trailing data after last ruleFlorian Westphal
If userspace provides a rule blob with trailing data after last target, we trigger a splat, then convert ruleset to 64bit format (with trailing data), then pass that to do_replace_finish() which then returns -EINVAL. Erroring out right away avoids the splat plus unneeded translation and error unwind. Fixes: 81e675c227ec ("netfilter: ebtables: add CONFIG_COMPAT support") Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-05-08net/sched: avoid double free on matchall reoffloadPieter Jansen van Vuuren
Avoid freeing cls_mall.rule twice when failing to setup flow_action offload used in the hardware intermediate representation. This is achieved by returning 0 when the setup fails but the skip software flag has not been set. Fixes: f00cbf196814 ("net/sched: use the hardware intermediate representation for matchall") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08openvswitch: Replace removed NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)Geert Uytterhoeven
Commit 4806e975729f99c7 ("netfilter: replace NF_NAT_NEEDED with IS_ENABLED(CONFIG_NF_NAT)") removed CONFIG_NF_NAT_NEEDED, but a new user popped up afterwards. Fixes: fec9c271b8f1bde1 ("openvswitch: load and reference the NAT helper.") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08ipv4: Fix raw socket lookup for local trafficDavid Ahern
inet_iif should be used for the raw socket lookup. inet_iif considers rt_iif which handles the case of local traffic. As it stands, ping to a local address with the '-I <dev>' option fails ever since ping was changed to use SO_BINDTODEVICE instead of cmsg + IP_PKTINFO. IPv6 works fine. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-08fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL ↵Hangbin Liu
not supplied With commit 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule") we now able to check if a rule already exists. But this only works with iproute2. For other tools like libnl, NetworkManager, it still could add duplicate rules with only NLM_F_CREATE flag, like [localhost ~ ]# ip rule 0: from all lookup local 32766: from all lookup main 32767: from all lookup default 100000: from 192.168.7.5 lookup 5 100000: from 192.168.7.5 lookup 5 As it doesn't make sense to create two duplicate rules, let's just return 0 if the rule exists. Fixes: 153380ec4b9 ("fib_rules: Added NLM_F_EXCL support to fib_nl_newrule") Reported-by: Thomas Haller <thaller@redhat.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds
Pull networking updates from David Miller: "Highlights: 1) Support AES128-CCM ciphers in kTLS, from Vakul Garg. 2) Add fib_sync_mem to control the amount of dirty memory we allow to queue up between synchronize RCU calls, from David Ahern. 3) Make flow classifier more lockless, from Vlad Buslov. 4) Add PHY downshift support to aquantia driver, from Heiner Kallweit. 5) Add SKB cache for TCP rx and tx, from Eric Dumazet. This reduces contention on SLAB spinlocks in heavy RPC workloads. 6) Partial GSO offload support in XFRM, from Boris Pismenny. 7) Add fast link down support to ethtool, from Heiner Kallweit. 8) Use siphash for IP ID generator, from Eric Dumazet. 9) Pull nexthops even further out from ipv4/ipv6 routes and FIB entries, from David Ahern. 10) Move skb->xmit_more into a per-cpu variable, from Florian Westphal. 11) Improve eBPF verifier speed and increase maximum program size, from Alexei Starovoitov. 12) Eliminate per-bucket spinlocks in rhashtable, and instead use bit spinlocks. From Neil Brown. 13) Allow tunneling with GUE encap in ipvs, from Jacky Hu. 14) Improve link partner cap detection in generic PHY code, from Heiner Kallweit. 15) Add layer 2 encap support to bpf_skb_adjust_room(), from Alan Maguire. 16) Remove SKB list implementation assumptions in SCTP, your's truly. 17) Various cleanups, optimizations, and simplifications in r8169 driver. From Heiner Kallweit. 18) Add memory accounting on TX and RX path of SCTP, from Xin Long. 19) Switch PHY drivers over to use dynamic featue detection, from Heiner Kallweit. 20) Support flow steering without masking in dpaa2-eth, from Ioana Ciocoi. 21) Implement ndo_get_devlink_port in netdevsim driver, from Jiri Pirko. 22) Increase the strict parsing of current and future netlink attributes, also export such policies to userspace. From Johannes Berg. 23) Allow DSA tag drivers to be modular, from Andrew Lunn. 24) Remove legacy DSA probing support, also from Andrew Lunn. 25) Allow ll_temac driver to be used on non-x86 platforms, from Esben Haabendal. 26) Add a generic tracepoint for TX queue timeouts to ease debugging, from Cong Wang. 27) More indirect call optimizations, from Paolo Abeni" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1763 commits) cxgb4: Fix error path in cxgb4_init_module net: phy: improve pause mode reporting in phy_print_status dt-bindings: net: Fix a typo in the phy-mode list for ethernet bindings net: macb: Change interrupt and napi enable order in open net: ll_temac: Improve error message on error IRQ net/sched: remove block pointer from common offload structure net: ethernet: support of_get_mac_address new ERR_PTR error net: usb: smsc: fix warning reported by kbuild test robot staging: octeon-ethernet: Fix of_get_mac_address ERR_PTR check net: dsa: support of_get_mac_address new ERR_PTR error net: dsa: sja1105: Fix status initialization in sja1105_get_ethtool_stats vrf: sit mtu should not be updated when vrf netdev is the link net: dsa: Fix error cleanup path in dsa_init_module l2tp: Fix possible NULL pointer dereference taprio: add null check on sched_nest to avoid potential null pointer dereference net: mvpp2: cls: fix less than zero check on a u32 variable net_sched: sch_fq: handle non connected flows net_sched: sch_fq: do not assume EDT packets are ordered net: hns3: use devm_kcalloc when allocating desc_cb net: hns3: some cleanup for struct hns3_enet_ring ...
2019-05-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Minor conflict with the DSA legacy code removal. Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07Merge tag 'driver-core-5.2-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core/kobject updates from Greg KH: "Here is the "big" set of driver core patches for 5.2-rc1 There are a number of ACPI patches in here as well, as Rafael said they should go through this tree due to the driver core changes they required. They have all been acked by the ACPI developers. There are also a number of small subsystem-specific changes in here, due to some changes to the kobject core code. Those too have all been acked by the various subsystem maintainers. As for content, it's pretty boring outside of the ACPI changes: - spdx cleanups - kobject documentation updates - default attribute groups for kobjects - other minor kobject/driver core fixes All have been in linux-next for a while with no reported issues" * tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits) kobject: clean up the kobject add documentation a bit more kobject: Fix kernel-doc comment first line kobject: Remove docstring reference to kset firmware_loader: Fix a typo ("syfs" -> "sysfs") kobject: fix dereference before null check on kobj Revert "driver core: platform: Fix the usage of platform device name(pdev->name)" init/config: Do not select BUILD_BIN2C for IKCONFIG Provide in-kernel headers to make extending kernel easier kobject: Improve doc clarity kobject_init_and_add() kobject: Improve docs for kobject_add/del driver core: platform: Fix the usage of platform device name(pdev->name) livepatch: Replace klp_ktype_patch's default_attrs with groups cpufreq: schedutil: Replace default_attrs field with groups padata: Replace padata_attr_type default_attrs field with groups irqdesc: Replace irq_kobj_type's default_attrs field with groups net-sysfs: Replace ktype default_attrs field with groups block: Replace all ktype default_attrs with groups samples/kobject: Replace foo_ktype's default_attrs field with groups kobject: Add support for default attribute groups to kobj_type driver core: Postpone DMA tear-down until after devres release for probe failure ...
2019-05-07net/sched: remove block pointer from common offload structurePieter Jansen van Vuuren
Based on feedback from Jiri avoid carrying a pointer to the tcf_block structure in the tc_cls_common_offload structure. Instead store a flag in driver private data which indicates if offloads apply to a shared block at block binding time. Suggested-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07net: ethernet: support of_get_mac_address new ERR_PTR errorPetr Štetiar
There was NVMEM support added to of_get_mac_address, so it could now return ERR_PTR encoded error values, so we need to adjust all current users of of_get_mac_address to this new fact. While at it, remove superfluous is_valid_ether_addr as the MAC address returned from of_get_mac_address is always valid and checked by is_valid_ether_addr anyway. Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") Signed-off-by: Petr Štetiar <ynezz@true.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07net: dsa: support of_get_mac_address new ERR_PTR errorPetr Štetiar
There was NVMEM support added to of_get_mac_address, so it could now return ERR_PTR encoded error values, so we need to adjust all current users of of_get_mac_address to this new fact. While at it, remove superfluous is_valid_ether_addr as the MAC address returned from of_get_mac_address is always valid and checked by is_valid_ether_addr anyway. Fixes: d01f449c008a ("of_net: add NVMEM support to of_get_mac_address") Signed-off-by: Petr Štetiar <ynezz@true.cz> Tested-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07vrf: sit mtu should not be updated when vrf netdev is the linkStephen Suryaputra
VRF netdev mtu isn't typically set and have an mtu of 65536. When the link of a tunnel is set, the tunnel mtu is changed from 1480 to the link mtu minus tunnel header. In the case of VRF netdev is the link, then the tunnel mtu becomes 65516. So, fix it by not setting the tunnel mtu in this case. Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07net: dsa: Fix error cleanup path in dsa_init_moduleYueHaibing
BUG: unable to handle kernel paging request at ffffffffa01c5430 PGD 3270067 P4D 3270067 PUD 3271063 PMD 230bc5067 PTE 0 Oops: 0000 [#1 CPU: 0 PID: 6159 Comm: modprobe Not tainted 5.1.0+ #33 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:raw_notifier_chain_register+0x16/0x40 Code: 63 f8 66 90 e9 5d ff ff ff 90 90 90 90 90 90 90 90 90 90 90 55 48 8b 07 48 89 e5 48 85 c0 74 1c 8b 56 10 3b 50 10 7e 07 eb 12 <39> 50 10 7c 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 08 RSP: 0018:ffffc90001c33c08 EFLAGS: 00010282 RAX: ffffffffa01c5420 RBX: ffffffffa01db420 RCX: 4fcef45928070a8b RDX: 0000000000000000 RSI: ffffffffa01db420 RDI: ffffffffa01b0068 RBP: ffffc90001c33c08 R08: 000000003e0a33d0 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000094443661 R12: ffff88822c320700 R13: ffff88823109be80 R14: 0000000000000000 R15: ffffc90001c33e78 FS: 00007fab8bd08540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffa01c5430 CR3: 00000002297ea000 CR4: 00000000000006f0 Call Trace: register_netdevice_notifier+0x43/0x250 ? 0xffffffffa01e0000 dsa_slave_register_notifier+0x13/0x70 [dsa_core ? 0xffffffffa01e0000 dsa_init_module+0x2e/0x1000 [dsa_core do_one_initcall+0x6c/0x3cc ? do_init_module+0x22/0x1f1 ? rcu_read_lock_sched_held+0x97/0xb0 ? kmem_cache_alloc_trace+0x325/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe Cleanup allocated resourses if there are errors, otherwise it will trgger memleak. Fixes: c9eb3e0f8701 ("net: dsa: Add support for learning FDB through notification") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07l2tp: Fix possible NULL pointer dereferenceYueHaibing
BUG: unable to handle kernel NULL pointer dereference at 0000000000000128 PGD 0 P4D 0 Oops: 0000 [#1 CPU: 0 PID: 5697 Comm: modprobe Tainted: G W 5.1.0-rc7+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:__lock_acquire+0x53/0x10b0 Code: 8b 1c 25 40 5e 01 00 4c 8b 6d 10 45 85 e4 0f 84 bd 06 00 00 44 8b 1d 7c d2 09 02 49 89 fe 41 89 d2 45 85 db 0f 84 47 02 00 00 <48> 81 3f a0 05 70 83 b8 00 00 00 00 44 0f 44 c0 83 fe 01 0f 86 3a RSP: 0018:ffffc90001c07a28 EFLAGS: 00010002 RAX: 0000000000000000 RBX: ffff88822f038440 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000128 RBP: ffffc90001c07a88 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000000128 R15: 0000000000000000 FS: 00007fead0811540(0000) GS:ffff888237a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000128 CR3: 00000002310da000 CR4: 00000000000006f0 Call Trace: ? __lock_acquire+0x24e/0x10b0 lock_acquire+0xdf/0x230 ? flush_workqueue+0x71/0x530 flush_workqueue+0x97/0x530 ? flush_workqueue+0x71/0x530 l2tp_exit_net+0x170/0x2b0 [l2tp_core ? l2tp_exit_net+0x93/0x2b0 [l2tp_core ops_exit_list.isra.6+0x36/0x60 unregister_pernet_operations+0xb8/0x110 unregister_pernet_device+0x25/0x40 l2tp_init+0x55/0x1000 [l2tp_core ? 0xffffffffa018d000 do_one_initcall+0x6c/0x3cc ? do_init_module+0x22/0x1f1 ? rcu_read_lock_sched_held+0x97/0xb0 ? kmem_cache_alloc_trace+0x325/0x3b0 do_init_module+0x5b/0x1f1 load_module+0x1db1/0x2690 ? m_show+0x1d0/0x1d0 __do_sys_finit_module+0xc5/0xd0 __x64_sys_finit_module+0x15/0x20 do_syscall_64+0x6b/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fead031a839 Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe8d9acca8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000560078398b80 RCX: 00007fead031a839 RDX: 0000000000000000 RSI: 000056007659dc2e RDI: 0000000000000003 RBP: 000056007659dc2e R08: 0000000000000000 R09: 0000560078398b80 R10: 0000000000000003 R11: 0000000000000246 R12: 0000000000000000 R13: 00005600783a04a0 R14: 0000000000040000 R15: 0000560078398b80 Modules linked in: l2tp_core(+) e1000 ip_tables ipv6 [last unloaded: l2tp_core CR2: 0000000000000128 ---[ end trace 8322b2b8bf83f8e1 If alloc_workqueue fails in l2tp_init, l2tp_net_ops is unregistered on failure path. Then l2tp_exit_net is called which will flush NULL workqueue, this patch add a NULL check to fix it. Fixes: 67e04c29ec0d ("l2tp: unregister l2tp_net_ops on failure path") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07taprio: add null check on sched_nest to avoid potential null pointer dereferenceColin Ian King
The call to nla_nest_start_noflag can return a null pointer and currently this is not being checked and this can lead to a null pointer dereference when the null pointer sched_nest is passed to function nla_nest_end. Fix this by adding in a null pointer check. Addresses-Coverity: ("Dereference null return value") Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linuxLinus Torvalds
Pull stream_open conversion from Kirill Smelkov: - remove unnecessary double nonseekable_open from drivers/char/dtlk.c as noticed by Pavel Machek while reviewing nonseekable_open -> stream_open mass conversion. - the mass conversion patch promised in commit 10dce8af3422 ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") and is automatically generated by running $ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci I've verified each generated change manually - that it is correct to convert - and each other nonseekable_open instance left - that it is either not correct to convert there, or that it is not converted due to current stream_open.cocci limitations. More details on this in the patch. - finally, change VFS to pass ppos=NULL into .read/.write for files that declare themselves streams. It was suggested by Rasmus Villemoes and makes sure that if ppos starts to be erroneously used in a stream file, such bug won't go unnoticed and will produce an oops instead of creating illusion of position change being taken into account. Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to use stream_open()" that will be hopefully coming via FUSE tree, because fs/fuse/ uses new-style .read_iter/.write_iter, and for these accessors position is still passed as non-pointer kiocb.ki_pos . * tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux: vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files *: convert stream-like files from nonseekable_open -> stream_open dtlk: remove double call to nonseekable_open
2019-05-07net_sched: sch_fq: handle non connected flowsEric Dumazet
FQ packet scheduler assumed that packets could be classified based on their owning socket. This means that if a UDP server uses one UDP socket to send packets to different destinations, packets all land in one FQ flow. This is unfair, since each TCP flow has a unique bucket, meaning that in case of pressure (fully utilised uplink), TCP flows have more share of the bandwidth. If we instead detect unconnected sockets, we can use a stochastic hash based on the 4-tuple hash. This also means a QUIC server using one UDP socket will properly spread the outgoing packets to different buckets, and in-kernel pacing based on EDT model will no longer risk having big rb-tree on one flow. Note that UDP application might provide the skb->hash in an ancillary message at sendmsg() time to avoid the cost of a dissection in fq packet scheduler. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07net_sched: sch_fq: do not assume EDT packets are orderedEric Dumazet
TCP stack makes sure packets for a given flow are monotically increasing, but we want to allow UDP packets to use EDT as well, so that QUIC servers can use in-kernel pacing. This patch adds a per-flow rb-tree on which packets might be stored. We still try to use the linear list for the typical cases where packets are queued with monotically increasing skb->tstamp, since queue/dequeue packets on a standard list is O(1). Note that the ability to store packets in arbitrary EDT order will allow us to implement later a per TCP socket mechanism adding delays (with jitter eventually) and reorders, to implement convenient network emulators. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-05-07Merge branch 'work.icache' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs inode freeing updates from Al Viro: "Introduction of separate method for RCU-delayed part of ->destroy_inode() (if any). Pretty much as posted, except that destroy_inode() stashes ->free_inode into the victim (anon-unioned with ->i_fops) before scheduling i_callback() and the last two patches (sockfs conversion and folding struct socket_wq into struct socket) are excluded - that pair should go through netdev once davem reopens his tree" * 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits) orangefs: make use of ->free_inode() shmem: make use of ->free_inode() hugetlb: make use of ->free_inode() overlayfs: make use of ->free_inode() jfs: switch to ->free_inode() fuse: switch to ->free_inode() ext4: make use of ->free_inode() ecryptfs: make use of ->free_inode() ceph: use ->free_inode() btrfs: use ->free_inode() afs: switch to use of ->free_inode() dax: make use of ->free_inode() ntfs: switch to ->free_inode() securityfs: switch to ->free_inode() apparmor: switch to ->free_inode() rpcpipe: switch to ->free_inode() bpf: switch to ->free_inode() mqueue: switch to ->free_inode() ufs: switch to ->free_inode() coda: switch to ->free_inode() ...
2019-05-07libceph: make ceph_pr_addr take an struct ceph_entity_addr pointerJeff Layton
GCC9 is throwing a lot of warnings about unaligned accesses by callers of ceph_pr_addr. All of the current callers are passing a pointer to the sockaddr inside struct ceph_entity_addr. Fix it to take a pointer to a struct ceph_entity_addr instead, and then have the function make a copy of the sockaddr before printing it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>