summaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)Author
2018-01-25net/ipv6: Do not allow route add with a device that is downDavid Ahern
IPv6 allows routes to be installed when the device is not up (admin up). Worse, it does not mark it as LINKDOWN. IPv4 does not allow it and really there is no reason for IPv6 to allow it, so check the flags and deny if device is admin down. Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net/smc: check for healthy link group resp. connectionsUrsula Braun
If a problem for at least one connection of a link group is detected, the whole link group and all its connections are terminated. This patch adds a check for healthy link group when trying to reserve a work request, and checks for healthy connections before starting a tx worker. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net/smc: wake up wr_reg_wait when terminating a link groupUrsula Braun
If a new connection with a new rmb is added to a link group, its memory region is registered. If a link group is terminated, a pending registration requires a wake up. And consolidate setting of tx_flag peer_conn_abort in smc_lgr_terminate(). Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net/smc: do not reuse a linkgroup with setup problemsUrsula Braun
Once a linkgroup is created successfully, it stays alive for a certain time to service more connections potentially created. If one of the initialization steps for a new linkgroup fails, the linkgroup should not be reused by other connections following. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net/smc: terminate link group for ib_post_send problemsUrsula Braun
If ib_post_send() fails, terminate all connections of this link group. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net/smc: handle state SMC_PEERFINCLOSEWAIT correctlyUrsula Braun
A state transition from closing state SMC_PEERFINCLOSEWAIT to closing state SMC_APPFINCLOSEWAIT is not allowed. Once a closing indication from the peer has been received, the socket reaches state SMC_CLOSED. And receiving a peer_conn_abort just changes the state of the socket into one of the states SMC_PROCESSABORT or SMC_CLOSED; sending a peer_conn_abort occurs in smc_close_active() for state SMC_PROCESSABORT only. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net/smc: cancel tx worker in case of socket abortsUrsula Braun
If an SMC socket is aborted, the tx worker should be cancelled. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-25net: Move net:netns_ids destruction out of rtnl_lock() and document locking ↵Kirill Tkhai
scheme Currently, we unhash a dying net from netns_ids lists under rtnl_lock(). It's a leftover from the time when net::netns_ids was introduced. There was no net::nsid_lock, and rtnl_lock() was mostly need to order modification of alive nets nsid idr, i.e. for: for_each_net(tmp) { ... id = __peernet2id(tmp, net); idr_remove(&tmp->netns_ids, id); ... } Since we have net::nsid_lock, the modifications are protected by this local lock, and now we may introduce better scheme of netns_ids destruction. Let's look at the functions peernet2id_alloc() and get_net_ns_by_id(). Previous commits taught these functions to work well with dying net acquired from rtnl unlocked lists. And they are the only functions which can hash a net to netns_ids or obtain from there. And as easy to check, other netns_ids operating functions works with id, not with net pointers. So, we do not need rtnl_lock to synchronize cleanup_net() with all them. The another property, which is used in the patch, is that net is unhashed from net_namespace_list in the only place and by the only process. So, we avoid excess rcu_read_lock() or rtnl_lock(), when we'are iterating over the list in unhash_nsid(). All the above makes possible to keep rtnl_lock() locked only for net->list deletion, and completely avoid it for netns_ids unhashing and destruction. As these two doings may take long time (e.g., memory allocation to send skb), the patch should positively act on the scalability and signify decrease the time, which rtnl_lock() is held in cleanup_net(). Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge branch 'rebased-net-ioctl' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24kill kernel_sock_ioctl()Al Viro
no users since 2014 Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24dev_ioctl(): move copyin/copyout to callersAl Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24ipconfig: use dev_set_mtu()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24lift handling of SIOCIW... out of dev_ioctl()Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24kill dev_ifname32()Al Viro
same story... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24kill bond_ioctl()Al Viro
Same story as with dev_ifsioc(), except that the last cases with non-trivial conversions had been taken out in 2013... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24kill dev_ifsioc()Al Viro
Once upon a time net/socket.c:dev_ifsioc() used to handle SIOCSHWTSTAMP and SIOCSIFMAP. These have different native and compat layout, so the format conversion had been needed. In 2009 these two cases had been taken out, turning the rest into a convoluted way to calling sock_do_ioctl(). We copy compat structure into native one, call sock_do_ioctl() on that and copy the result back for the in/out ioctls. No layout transformation anywhere, so we might as well just call sock_do_ioctl() and skip all the headache with copying. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24ip_rt_ioctl(): take copyin to callerAl Viro
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24devinet_ioctl(): take copyin/copyout to callerAl Viro
Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24net: separate SIOCGIFCONF handling from dev_ioctl()Al Viro
Only two of dev_ioctl() callers may pass SIOCGIFCONF to it. Separating that codepath from the rest of dev_ioctl() allows both to simplify dev_ioctl() itself (all other cases work with struct ifreq *) *and* seriously simplify the compat side of that beast: all it takes is passing to inet_gifconf() an extra argument - the size of individual records (sizeof(struct ifreq) or sizeof(struct compat_ifreq)). With dev_ifconf() called directly from sock_do_ioctl()/compat_dev_ifconf() that's easy to arrange. As the result, compat side of SIOCGIFCONF doesn't need any allocations, copy_in_user() back and forth, etc. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2018-01-24ip_tunnel: Use mark in skb by defaultThomas Winter
This allows marks set by connmark in iptables to be used for route lookups. Signed-off-by: Thomas Winter <thomas.winter@alliedtelesis.co.nz> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_u32: propagate extack to delete callbackJakub Kicinski
Propagate extack on removal of offloaded filter. Don't pass extack from error paths. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_u32: pass offload flags to tc_cls_common_offload_init()Jakub Kicinski
Pass offload flags to the new implementation of tc_cls_common_offload_init(). Extack will now only be set if user requested skip_sw. hnodes need to hold onto the flags now to be able to reuse them on filter removal. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_flower: propagate extack to delete callbackJakub Kicinski
Propagate extack on removal of offloaded filter. Don't pass extack from error paths. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_flower: pass offload flags to tc_cls_common_offload_init()Jakub Kicinski
Pass offload flags to the new implementation of tc_cls_common_offload_init(). Extack will now only be set if user requested skip_sw. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_matchall: propagate extack to delete callbackJakub Kicinski
Propagate extack on removal of offloaded filter. Don't pass extack from error paths. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_matchall: pass offload flags to tc_cls_common_offload_init()Jakub Kicinski
Pass offload flags to the new implementation of tc_cls_common_offload_init(). Extack will now only be set if user requested skip_sw. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_bpf: propagate extack to offload delete callbackJakub Kicinski
Propagate extack on removal of offloaded filter. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_bpf: pass offload flags to tc_cls_common_offload_init()Jakub Kicinski
Pass offload flags to the new implementation of tc_cls_common_offload_init(). Extack will now only be set if user requested skip_sw. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24cls_bpf: remove gen_flags from bpf_offloadJakub Kicinski
cls_bpf now guarantees that only device-bound programs are allowed with skip_sw. The drivers no longer pay attention to flags on filter load, therefore the bpf_offload member can be removed. If flags are needed again they should probably be added to struct tc_cls_common_offload instead. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net: sched: prepare for reimplementation of tc_cls_common_offload_init()Jakub Kicinski
Rename the tc_cls_common_offload_init() helper function to tc_cls_common_offload_init_deprecated() and add a new implementation which also takes flags argument. We will only set extack if flags indicate that offload is forced (skip_sw) otherwise driver errors should be ignored, as they don't influence the overall filter installation. Note that we need the tc_skip_hw() helper for new version, therefore it is added later in the file. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net: sched: propagate extack to cls->destroy callbacksJakub Kicinski
Propagate extack to cls->destroy callbacks when called from non-error paths. On error paths pass NULL to avoid overwriting the failure message. Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24kcm: Check if sk_user_data already set in kcm_attachTom Herbert
This is needed to prevent sk_user_data being overwritten. The check is done under the callback lock. This should prevent a socket from being attached twice to a KCM mux. It also prevents a socket from being attached for other use cases of sk_user_data as long as the other cases set sk_user_data under the lock. Followup work is needed to unify all the use cases of sk_user_data to use the same locking. Reported-by: syzbot+114b15f2be420a8886c3@syzkaller.appspotmail.com Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Signed-off-by: Tom Herbert <tom@quantonium.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24kcm: Only allow TCP sockets to be attached to a KCM muxTom Herbert
TCP sockets for IPv4 and IPv6 that are not listeners or in closed stated are allowed to be attached to a KCM mux. Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+8865eaff7f9acd593945@syzkaller.appspotmail.com Signed-off-by: Tom Herbert <tom@quantonium.net> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24pktgen: Clean read user supplied flag messDmitry Safonov
Don't use error-prone-brute-force way. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24pktgen: Remove brute-force printing of flagsDmitry Safonov
Add macro generated pkt_flag_names array, with a little help of which the flags can be printed by using an index. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24pktgen: Add behaviour flags macro to generate flags/namesDmitry Safonov
PKT_FALGS macro will be used to add package behavior names definitions to simplify the code that prints/reads pkg flags. Sorted the array in order of printing the flags in pktgen_if_show() Note: Renamed IPSEC_ON => IPSEC for simplicity. No visible behavior change expected. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24pktgen: Add missing !flag parametersDmitry Safonov
o FLOW_SEQ now can be disabled with pgset "flag !FLOW_SEQ" o FLOW_SEQ and FLOW_RND are antonyms, as it's shown by pktgen_if_show() o IPSEC now may be disabled Note, that IPV6 is enabled with dst6/src6 parameters, not with a flag parameter. Signed-off-by: Dmitry Safonov <dima@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net: sched: em_nbyte: don't add the data offset twiceWolfgang Bumiller
'ptr' is shifted by the offset and then validated, the memcmp should not add it a second time. Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net/smc: continue waiting if peer signals write_shutdownUrsula Braun
If the peer sends a shutdown WRITE, this should not affect sending in general, and waiting for send buffer space in particular. Stop waiting of the local socket for send buffer space only, if peer signals closing, but not if peer signals just shutdown WRITE. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net/smc: improve state change handling after close waitUrsula Braun
When a socket is closed or shutdown, smc waits for data being transmitted in certain states. If the state changes during this wait, the close switch depending on state should be reentered. In addition, state change is avoided if sending of close or shutdown fails. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net/smc: make wait for work request uninterruptibleUrsula Braun
Work requests are needed for every ib_post_send(), among them the ib_post_send() to signal closing. If an smc socket program is cancelled, the smc connections should be cleaned up, and require sending of closing signals to the peer. This may fail, if a wait for a free work request is needed, but is cancelled immediately due to the cancel interrupt. To guarantee notification of the peer, the wait for a work request is changed to uninterruptible. And the area to receive work request completion info with ib_poll_cq() is cleared first. And _tx_ variable names are used in the _tx_routines for the demultiplexing common type in the header. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net/smc: get rid of tx_pend waits in socket closingUrsula Braun
There is no need to wait for confirmation of pending tx requests for a closing connection, since pending tx slots are dismissed when finishing a connection. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net/smc: simplify function smc_clcsock_accept()Ursula Braun
Cleanup to avoid duplicate code in smc_clcsock_accept(). No functional change. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24net/smc: use local struct sock variables consistentlyUrsula Braun
Cleanup to consistently exploit the local struct sock definitions. No functional change. Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-24Merge branch 'master' of ↵David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2018-01-24 1) Only offloads SAs after they are fully initialized. Otherwise a NIC may receive packets on a SA we can not yet handle in the stack. From Yossi Kuperman. 2) Fix negative refcount in case of a failing offload. From Aviad Yehezkel. 3) Fix inner IP ptoro version when decapsulating from interaddress family tunnels. From Yossi Kuperman. 4) Use true or false for boolean variables instead of an integer value in xfrm_get_type_offload. From Gustavo A. R. Silva. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABELBen Hutchings
Commit 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl setting") removed the initialisation of ipv6_pinfo::autoflowlabel and added a second flag to indicate whether this field or the net namespace default should be used. The getsockopt() handling for this case was not updated, so it currently returns 0 for all sockets for which IPV6_AUTOFLOWLABEL is not explicitly enabled. Fix it to return the effective value, whether that has been set at the socket or net namespace level. Fixes: 513674b5a2c9 ("net: reevalulate autoflowlabel setting after sysctl ...") Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23net/sched: act_csum: don't use spinlock in the fast pathDavide Caratti
use RCU instead of spin_{,unlock}_bh() to protect concurrent read/write on act_csum configuration, to reduce the effects of contention in the data path when multiple readers are present. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23net/sched: act_csum: use per-core statisticsDavide Caratti
use per-CPU counters, like other TC actions do, instead of maintaining one set of stats across all cores. This allows updating act_csum stats without the need of protecting them using spin_{,un}lock_bh() invocations. Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2018-01-23net: link_watch: mark bonding link events urgentRoopa Prabhu
It takes 1sec for bond link down notification to hit user-space when all slaves of the bond go down. 1sec is too long for protocol daemons in user-space relying on bond notification to recover (eg: multichassis lag implementations in user-space). Since the link event code already marks team device port link events as urgent, this patch moves the code to cover all lag ports and master. Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>