summaryrefslogtreecommitdiff
path: root/net/core/net-sysfs.c
AgeCommit message (Collapse)Author
2013-02-23net/core: apply pm_runtime_set_memalloc_noio on network devicesMing Lei
Deadlock might be caused by allocating memory with GFP_KERNEL in runtime_resume and runtime_suspend callback of network devices in iSCSI situation, so mark network devices and its ancestor as 'memalloc_noio' with the introduced pm_runtime_set_memalloc_noio(). Signed-off-by: Ming Lei <ming.lei@canonical.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Decotigny <david.decotigny@google.com> Cc: Tom Herbert <therbert@google.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Oliver Neukum <oneukum@suse.de> Cc: Jiri Kosina <jiri.kosina@suse.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-01-10net: Add support for XPS without sysfs being definedAlexander Duyck
This patch makes it so that we can support transmit packet steering without sysfs needing to be enabled. The reason for making this change is to make it so that a driver can make use of the XPS even while the sysfs portion of the interface is not present. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2013-01-10net: Add functions netif_reset_xps_queue and netif_set_xps_queueAlexander Duyck
This patch adds two functions, netif_reset_xps_queue and netif_set_xps_queue. The main idea behind these two functions is to provide a mechanism through which drivers can update their defaults in regards to XPS. Currently no such mechanism exists and as a result we cannot use XPS for things such as ATR which would require a basic configuration to start in which the Tx queues are mapped to CPUs via a 1:1 mapping. With this change I am making it possible for drivers such as ixgbe to be able to use the XPS feature by controlling the default configuration. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-28net: allow to change carrier via sysfsJiri Pirko
Make carrier writable Signed-off-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Flavio Leitner <fbl@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-22CONFIG_HOTPLUG removal from networking coreGreg KH
CONFIG_HOTPLUG is always enabled now, so remove the unused code that was trying to be compiled out when this option was disabled, in the networking core. Cc: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: drivers/net/wireless/iwlwifi/pcie/tx.c Minor iwlwifi conflict in TX queue disabling between 'net', which removed a bogus warning, and 'net-next' which added some status register poking code. Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-19net: remove unnecessary wireless includesJohannes Berg
The wireless and wext includes in net-sysfs.c aren't needed, so remove them. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18net: Allow userns root control of the core of the network stack.Eric W. Biederman
Allow an unpriviled user who has created a user namespace, and then created a network namespace to effectively use the new network namespace, by reducing capable(CAP_NET_ADMIN) and capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns, CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls. Settings that merely control a single network device are allowed. Either the network device is a logical network device where restrictions make no difference or the network device is hardware NIC that has been explicity moved from the initial network namespace. In general policy and network stack state changes are allowed while resource control is left unchanged. Allow ethtool ioctls. Allow binding to network devices. Allow setting the socket mark. Allow setting the socket priority. Allow setting the network device alias via sysfs. Allow setting the mtu via sysfs. Allow changing the network device flags via sysfs. Allow setting the network device group via sysfs. Allow the following network device ioctls. SIOCGMIIPHY SIOCGMIIREG SIOCSIFNAME SIOCSIFFLAGS SIOCSIFMETRIC SIOCSIFMTU SIOCSIFHWADDR SIOCSIFSLAVE SIOCADDMULTI SIOCDELMULTI SIOCSIFHWBROADCAST SIOCSMIIREG SIOCBONDENSLAVE SIOCBONDRELEASE SIOCBONDSETHWADDR SIOCBONDCHANGEACTIVE SIOCBRADDIF SIOCBRDELIF SIOCSHWTSTAMP Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-16wireless: add back sysfs directoryJohannes Berg
commit 35b2a113cb0298d4f9a1263338b456094a414057 broke (at least) Fedora's networking scripts, they check for the existence of the wireless directory. As the files aren't used, add the directory back and not the files. Also do it for both drivers based on the old wireless extensions and cfg80211, regardless of whether the compat code for wext is built into cfg80211 or not. Cc: stable@vger.kernel.org [3.6] Reported-by: Dave Airlie <airlied@gmail.com> Reported-by: Bill Nottingham <notting@redhat.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-09-05net: add unknown state to sysfs NIC duplex exportNikolay Aleksandrov
Currently when the NIC duplex state is DUPLEX_UNKNOWN it is exported as full through sysfs, this patch adds support for DUPLEX_UNKNOWN. It is handled the same way as in ethtool. Signed-off-by: Nikolay Aleksandrov <naleksan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-05wireless: remove wext sysfsJohannes Berg
The only user of this was hal prior to its 0.5.12 release which happened over two years ago, so I'm sure this can be removed without issues. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
2012-04-15net: cleanup unsigned to unsigned intEric Dumazet
Use of "unsigned int" is preferred to bare "unsigned" in net tree. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-12net/core: simple_strtoul cleanupShuah Khan
Changed net/core/net-sysfs.c: netdev_store() to use kstrtoul() instead of obsolete simple_strtoul(). Signed-off-by: Shuah Khan <shuahkhan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24static keys: Introduce 'struct static_key', static_key_true()/false() and ↵Ingo Molnar
static_key_slow_[inc|dec]() So here's a boot tested patch on top of Jason's series that does all the cleanups I talked about and turns jump labels into a more intuitive to use facility. It should also address the various misconceptions and confusions that surround jump labels. Typical usage scenarios: #include <linux/static_key.h> struct static_key key = STATIC_KEY_INIT_TRUE; if (static_key_false(&key)) do unlikely code else do likely code Or: if (static_key_true(&key)) do likely code else do unlikely code The static key is modified via: static_key_slow_inc(&key); ... static_key_slow_dec(&key); The 'slow' prefix makes it abundantly clear that this is an expensive operation. I've updated all in-kernel code to use this everywhere. Note that I (intentionally) have not pushed through the rename blindly through to the lowest levels: the actual jump-label patching arch facility should be named like that, so we want to decouple jump labels from the static-key facility a bit. On non-jump-label enabled architectures static keys default to likely()/unlikely() branches. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: Jason Baron <jbaron@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: a.p.zijlstra@chello.nl Cc: mathieu.desnoyers@efficios.com Cc: davem@davemloft.net Cc: ddaney.cavm@gmail.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-01-17bql: Fix inconsistency between file mode and attr method.Hiroaki SHIMODA
There is no store() method for inflight attribute in the tx-<n>/byte_queue_limits sysfs directory. So remove S_IWUSR bit. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-12net: reintroduce missing rcu_assign_pointer() callsEric Dumazet
commit a9b3cd7f32 (rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTER) did a lot of incorrect changes, since it did a complete conversion of rcu_assign_pointer(x, y) to RCU_INIT_POINTER(x, y). We miss needed barriers, even on x86, when y is not NULL. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Stephen Hemminger <shemminger@vyatta.com> CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-24rfs: better sizing of dev_flow_tableEric Dumazet
Aim of this patch is to provide full range of rps_flow_cnt on 64bit arches. Theorical limit on number of flows is 2^32 Fix some buggy RPS/RFS macros as well. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> CC: Xi Wang <xi.wang@gmail.com> CC: Laurent Chavey <chavey@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller
Conflicts: net/bluetooth/l2cap_core.c Just two overlapping changes, one added an initialization of a local variable, and another change added a new local variable. Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-22rps: fix insufficient bounds checking in store_rps_dev_flow_table_cnt()Xi Wang
Setting a large rps_flow_cnt like (1 << 30) on 32-bit platform will cause a kernel oops due to insufficient bounds checking. if (count > 1<<30) { /* Enforce a limit to prevent overflow */ return -EINVAL; } count = roundup_pow_of_two(count); table = vmalloc(RPS_DEV_FLOW_TABLE_SIZE(count)); Note that the macro RPS_DEV_FLOW_TABLE_SIZE(count) is defined as: ... + (count * sizeof(struct rps_dev_flow)) where sizeof(struct rps_dev_flow) is 8. (1 << 30) * 8 will overflow 32 bits. This patch replaces the magic number (1 << 30) with a symbolic bound. Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-05bql: fix CONFIG_XPS=n buildEric Dumazet
netdev_queue_release() should be called even if CONFIG_XPS=n to properly release device reference. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29bql: Byte queue limitsTom Herbert
Networking stack support for byte queue limits, uses dynamic queue limits library. Byte queue limits are maintained per transmit queue, and a dql structure has been added to netdev_queue structure for this purpose. Configuration of bql is in the tx-<n> sysfs directory for the queue under the byte_queue_limits directory. Configuration includes: limit_min, bql minimum limit limit_max, bql maximum limit hold_time, bql slack hold time Also under the directory are: limit, current byte limit inflight, current number of bytes on the queue Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-29xps: Add xps_queue_release functionTom Herbert
This patch moves the xps specific parts in netdev_queue_release into its own function which netdev_queue_release can call. This allows netdev_queue_release to be more generic (for adding new attributes to tx queues). Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-17net: use jump_label to shortcut RPS if not setupEric Dumazet
Most machines dont use RPS/RFS, and pay a fair amount of instructions in netif_receive_skb() / netif_rx() / get_rps_cpu() just to discover RPS/RFS is not setup. Add a jump_label named rps_needed. If no device rps_map or global rps_sock_flow_table is setup, netif_receive_skb() / netif_rx() do a single instruction instead of many ones, including conditional jumps. jmp +0 (if CONFIG_JUMP_LABEL=y) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16net: new counter for tx_timeout errors in sysfsdavid decotigny
This adds the /sys/class/net/DEV/queues/Q/tx_timeout attribute containing the total number of timeout events on the given queue. It is always available with CONFIG_SYSFS, independently of CONFIG_RPS/XPS. Credits to Stephen Hemminger for a preliminary version of this patch. Tested: without CONFIG_SYSFS (compilation only) with sysfs and without CONFIG_RPS & CONFIG_XPS with sysfs and without CONFIG_RPS with sysfs and without CONFIG_XPS with defaults Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16net-sysfs: fixed minor sparse warningdavid decotigny
This commit fixes following warning: net/core/net-sysfs.c:921:6: warning: symbol 'numa_node' shadows an earlier one include/linux/topology.h:222:1: originally declared here Signed-off-by: David Decotigny <david.decotigny@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-31net: Add export.h for EXPORT_SYMBOL/THIS_MODULE to non-modulesPaul Gortmaker
These files are non modular, but need to export symbols using the macros now living in export.h -- call out the include so that things won't break when we remove the implicit presence of module.h from everywhere. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-09-15net: consolidate and fix ethtool_ops->get_settings callingJiri Pirko
This patch does several things: - introduces __ethtool_get_settings which is called from ethtool code and from drivers as well. Put ASSERT_RTNL there. - dev_ethtool_get_settings() is replaced by __ethtool_get_settings() - changes calling in drivers so rtnl locking is respected. In iboe_get_rate was previously ->get_settings() called unlocked. This fixes it. Also prb_calc_retire_blk_tmo() in af_packet.c had the same problem. Also fixed by calling __dev_get_by_index() instead of dev_get_by_index() and holding rtnl_lock for both calls. - introduces rtnl_lock in bnx2fc_vport_create() and fcoe_vport_create() so bnx2fc_if_create() and fcoe_if_create() are called locked as they are from other places. - use __ethtool_get_settings() in bonding code Signed-off-by: Jiri Pirko <jpirko@redhat.com> v2->v3: -removed dev_ethtool_get_settings() -added ASSERT_RTNL into __ethtool_get_settings() -prb_calc_retire_blk_tmo - use __dev_get_by_index() and lock around it and __ethtool_get_settings() call v1->v2: add missing export_symbol Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> [except FCoE bits] Acked-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-12net: cleanup some rcu_dereference_rawEric Dumazet
RCU api had been completed and rcu_access_pointer() or rcu_dereference_protected() are better than generic rcu_dereference_raw() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-02rcu: convert uses of rcu_assign_pointer(x, NULL) to RCU_INIT_POINTERStephen Hemminger
When assigning a NULL value to an RCU protected pointer, no barrier is needed. The rcu_assign_pointer, used to handle that but will soon change to not handle the special case. Convert all rcu_assign_pointer of NULL value. //smpl @@ expression P; @@ - rcu_assign_pointer(P, NULL) + RCU_INIT_POINTER(P, NULL) // </smpl> Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-14net: remove /sys/class/net/*/featuresMichał Mirosław
The same information and more can be obtained by using ethtool with ETHTOOL_GFEATURES. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-12Delay struct net freeing while there's a sysfs instance refering to itAl Viro
* new refcount in struct net, controlling actual freeing of the memory * new method in kobj_ns_type_operations (->drop_ns()) * ->current_ns() semantics change - it's supposed to be followed by corresponding ->drop_ns(). For struct net in case of CONFIG_NET_NS it bumps the new refcount; net_drop_ns() decrements it and calls net_free() if the last reference has been dropped. Method renamed to ->grab_current_ns(). * old net_free() callers call net_drop_ns() instead. * sysfs_exit_ns() is gone, along with a large part of callchain leading to it; now that the references stored in ->ns[...] stay valid we do not need to hunt them down and replace them with NULL. That fixes problems in sysfs_lookup() and sysfs_readdir(), along with getting rid of sb->s_instances abuse. Note that struct net *shutdown* logics has not changed - net_cleanup() is called exactly when it used to be called. The only thing postponed by having a sysfs instance refering to that struct net is actual freeing of memory occupied by struct net. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2011-05-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits) macvlan: fix panic if lowerdev in a bond tg3: Add braces around 5906 workaround. tg3: Fix NETIF_F_LOOPBACK error macvlan: remove one synchronize_rcu() call networking: NET_CLS_ROUTE4 depends on INET irda: Fix error propagation in ircomm_lmp_connect_response() irda: Kill set but unused variable 'bytes' in irlan_check_command_param() irda: Kill set but unused variable 'clen' in ircomm_connect_indication() rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport() be2net: Kill set but unused variable 'req' in lancer_fw_download() irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication() atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined. rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer(). rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler() rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection() rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window() pkt_sched: Kill set but unused variable 'protocol' in tc_classify() isdn: capi: Use pr_debug() instead of ifdefs. tg3: Update version to 3.119 tg3: Apply rx_discards fix to 5719/5720 ... Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c as per Davem.
2011-05-16net: convert to new cpumask APIKOSAKI Motohiro
We plan to remove cpu_xx() old api later. Thus this patch convert it. This patch has no functional change. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-05-07net,rcu: convert call_rcu(xps_dev_maps_release) to kfree_rcu()Lai Jiangshan
The rcu callback xps_dev_maps_release() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(xps_dev_maps_release). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07net,rcu: convert call_rcu(xps_map_release) to kfree_rcu()Lai Jiangshan
The rcu callback xps_map_release() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(xps_map_release). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-05-07net,rcu: convert call_rcu(rps_map_release) to kfree_rcu()Lai Jiangshan
The rcu callback rps_map_release() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(rps_map_release). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
2011-04-29ethtool: Call ethtool's get/set_settings callbacks with cleaned dataDavid Decotigny
This makes sure that when a driver calls the ethtool's get/set_settings() callback of another driver, the data passed to it is clean. This guarantees that speed_hi will be zeroed correctly if the called callback doesn't explicitely set it: we are sure we don't get a corrupted speed from the underlying driver. We also take care of setting the cmd field appropriately (ETHTOOL_GSET/SSET). This applies to dev_ethtool_get_settings(), which now makes sure it sets up that ethtool command parameter correctly before passing it to drivers. This also means that whoever calls dev_ethtool_get_settings() does not have to clean the ethtool command parameter. This function also becomes an exported symbol instead of an inline. All drivers visible to make allyesconfig under x86_64 have been updated. Signed-off-by: David Decotigny <decot@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-09net: rename group sysfs entry to netdev_groupXiaotian Feng
commit a512b92 adds sysfs entry for net device group, but before this commit, tun also uses group sysfs, so after this commit checkin, kernel warns like this: sysfs: cannot create duplicate filename '/devices/virtual/net/vnet0/group' Since tun has used this for years, rename sysfs under tun might break existing userspace, so rename group sysfs entry for net device group is a better choice. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24net: add sysfs entry for device groupVlad Dogaru
The group of a network device can be queried or changed from userspace using sysfs. For example, considering sysfs mounted in /sys, one can change the group that interface lo belongs to: echo 1 > /sys/class/net/lo/group Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-24net: change netdev->features to u32Michał Mirosław
Quoting Ben Hutchings: we presumably won't be defining features that can only be enabled on 64-bit architectures. Occurences found by `grep -r` on net/, drivers/net, include/ [ Move features and vlan_features next to each other in struct netdev, as per Eric Dumazet's suggestion -DaveM ] Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-16net: use NUMA_NO_NODE instead of the magic number -1Changli Gao
Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01net sched: use xps information for qdisc NUMA affinityEric Dumazet
Allocate qdisc memory according to NUMA properties of cpus included in xps map. To be effective, qdisc should be (re)setup after changes of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus I added a numa_node field in struct netdev_queue, containing NUMA node if all cpus included in xps_cpus share same node, else -1. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29xps: add __rcu annotationsEric Dumazet
Avoid sparse warnings : add __rcu annotations and use rcu_dereference_protected() where necessary. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29xps: NUMA allocations for per cpu dataEric Dumazet
store_xps_map() allocates maps that are used by single cpu, it makes sense to use NUMA allocations. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28xps: Add CONFIG_XPSTom Herbert
This patch adds XPS_CONFIG option to enable and disable XPS. This is done in the same manner as RPS_CONFIG. This is also fixes build failure in XPS code when SMP is not enabled. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24xps: Transmit Packet SteeringTom Herbert
This patch implements transmit packet steering (XPS) for multiqueue devices. XPS selects a transmit queue during packet transmission based on configuration. This is done by mapping the CPU transmitting the packet to a queue. This is the transmit side analogue to RPS-- where RPS is selecting a CPU based on receive queue, XPS selects a queue based on the CPU (previously there was an XPS patch from Eric Dumazet, but that might more appropriately be called transmit completion steering). Each transmit queue can be associated with a number of CPUs which will use the queue to send packets. This is configured as a CPU mask on a per queue basis in: /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus The mappings are stored per device in an inverted data structure that maps CPUs to queues. In the netdevice structure this is an array of num_possible_cpu structures where each structure holds and array of queue_indexes for queues which that CPU can use. The benefits of XPS are improved locality in the per queue data structures. Also, transmit completions are more likely to be done nearer to the sending thread, so this should promote locality back to the socket on free (e.g. UDP). The benefits of XPS are dependent on cache hierarchy, application load, and other factors. XPS would nominally be configured so that a queue would only be shared by CPUs which are sharing a cache, the degenerative configuration woud be that each CPU has it's own queue. Below are some benchmark results which show the potential benfit of this patch. The netperf test has 500 instances of netperf TCP_RR test with 1 byte req. and resp. bnx2x on 16 core AMD XPS (16 queues, 1 TX queue per CPU) 1234K at 100% CPU No XPS (16 queues) 996K at 100% CPU Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17net: zero kobject in rx_queue_releaseJohn Fastabend
netif_set_real_num_rx_queues() can decrement and increment the number of rx queues. For example ixgbe does this as features and offloads are toggled. Presumably this could also happen across down/up on most devices if the available resources changed (cpu offlined). The kobject needs to be zero'd in this case so that the state is not preserved across kobject_put()/kobject_init_and_add(). This resolves the following error report. ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong. Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169 Call Trace: [<ffffffff8121c940>] kobject_init+0x3a/0x83 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57 [<ffffffff8107b800>] ? mark_lock+0x21/0x267 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe] [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe] [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe] Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15net: Simplify RX queue allocationTom Herbert
This patch move RX queue allocation to alloc_netdev_mq and freeing of the queues to free_netdev (symmetric to TX queue allocation). Each kobject RX queue takes a reference to the queue's device so that the device can't be freed before all the kobjects have been released-- this obviates the need for reference counts specific to RX queues. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25rps: add __rcu annotationsEric Dumazet
Add __rcu annotations to : (struct netdev_rx_queue)->rps_map (struct netdev_rx_queue)->rps_flow_table struct rps_sock_flow_table *rps_sock_flow_table; And use appropriate rcu primitives. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-08net: Fix rxq ref countingTom Herbert
The rx->count reference is used to track reference counts to the number of rx-queue kobjects created for the device. This patch eliminates initialization of the counter in netif_alloc_rx_queues and instead increments the counter each time a kobject is created. This is now symmetric with the decrement that is done when an object is released. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>